I am reposting the query below with a little more info.
I was able to get a crash dump and took a look at the crash-data.0 file:
there was a single process group (6289) with 725 entries (all running ksh)
in the kdbx process table.
However, the process group leader (i.e. 6289) does not show up in this table
(i.e. there is no process table entry for proc 6289 with ppid 1),
even though it does show up in the first list (kernel process status list).
I am not yet sure whether this is a kernel bug or whether someone just started
a crazily respawning ksh.
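For anyone who wants to run the same check on their own dump, here is a minimal Python sketch of the test I did by eye: count the entries per process group and flag any group whose leader (the process whose pid equals the group id) has no entry of its own. The (pid, ppid, pgrp) tuple layout and the sample rows are made up for illustration. This is not kdbx's own output format, so the process table would have to be converted to this shape first.

```python
from collections import Counter

def groups_missing_leader(procs):
    """procs: iterable of (pid, ppid, pgrp) tuples (hypothetical layout).

    Returns {pgrp: member_count} for every process group whose leader,
    i.e. the process with pid == pgrp, has no entry in the table.
    """
    procs = list(procs)
    pids = {pid for pid, _ppid, _pgrp in procs}
    sizes = Counter(pgrp for _pid, _ppid, pgrp in procs)
    return {pgrp: n for pgrp, n in sizes.items() if pgrp not in pids}

# Made-up sample rows: group 100 has its leader, group 6289 does not.
sample = [
    (100, 1, 100),
    (101, 100, 100),
    (7001, 7000, 6289),
    (7002, 7001, 6289),
    (7003, 7002, 6289),
]

print(groups_missing_leader(sample))
```

A group that shows up here with hundreds of members would match what I saw for 6289.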
-----------------------------------
Hey, has anyone seen this one?
- DU 3.0
- Alpha 2100
- 128 MB memory, 392 MB swap (lazy mode)
- 7 RZ28s on 2 SCSI buses, using both LSM and ADVFS
- maxusers set to 256, maxuprc at 256
- about 30-40 users on at the time of the hang. This is a new system,
  and they will be running with at least twice that number of users when
  it is up to speed.
- running MUMPS (the OS/language, not the disease)
Note: MUMPS uses a couple of big chunks of shared memory:
# ipcs -ma
 
Shared Memory:
T      ID     KEY      MODE        OWNER    GROUP  CREATOR   CGROUP NATTCH     SEGSZ  CPID  LPID   ATIME    DTIME    CTIME
m       0     5732 --rw-------      root   system     root   system      2    524288   279   969 18:36:39 18:36:38 17:50:38
m       1      438 --rw-------        ix    rep23       ix    rep23      2    131584   546   547 17:51:08 no-entry 17:51:07
m       2        0 --rw-rw-rw-      root   system     root   system      4    387136  1073  1097 19:06:31 19:06:42 19:05:40
m       3        0 --rw-rw-rw-      root   system     root   system      4   5736640  1073  1097 19:06:31 19:06:42 19:05:40
m       4        0 --rw-rw-rw-      root   system     root   system      4   4737024  1073  1097 19:06:31 19:06:42 19:05:41
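
To put those segment sizes in perspective, a small Python sketch that adds up the SEGSZ column from the listing above. It assumes whitespace-separated fields with SEGSZ as the tenth field, which matches this particular listing but may differ on other systems. The total says nothing by itself about whether these segments are charged to the kernel heap.

```python
# The five segment rows, copied from the ipcs -ma output above.
IPCS_ROWS = """\
m       0     5732 --rw-------      root   system     root   system      2    524288   279   969 18:36:39 18:36:38 17:50:38
m       1      438 --rw-------        ix    rep23       ix    rep23      2    131584   546   547 17:51:08 no-entry 17:51:07
m       2        0 --rw-rw-rw-      root   system     root   system      4    387136  1073  1097 19:06:31 19:06:42 19:05:40
m       3        0 --rw-rw-rw-      root   system     root   system      4   5736640  1073  1097 19:06:31 19:06:42 19:05:40
m       4        0 --rw-rw-rw-      root   system     root   system      4   4737024  1073  1097 19:06:31 19:06:42 19:05:41
""".splitlines()

def total_segsz(lines):
    """Sum the SEGSZ field (assumed to be field index 9) of shared memory rows."""
    total = 0
    for line in lines:
        fields = line.split()
        # Shared memory rows start with the type letter "m".
        if len(fields) >= 10 and fields[0] == "m":
            total += int(fields[9])
    return total

total_bytes = total_segsz(IPCS_ROWS)
print(total_bytes, "bytes, about", round(total_bytes / 2**20, 1), "MiB")
```

That comes to roughly 11 MiB of shared memory in total on a 128 MB machine.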
The system hangs.
Syslog messages at the time of the hang show:
Mar 19 17:36:15 sunlab vmunix: h_kmem_alloc_memory_: 0xffffffff8143a780: request is stalled
Mar 19 17:36:15 sunlab vmunix: h_kmem_alloc_memory_: 0xffffffff813b03c0: request is stalled
Mar 19 17:36:15 sunlab vmunix: heap_thread: no space in heap 0xfffffc00005fcfd0
->Mar 19 17:36:16 sunlab vmunix: Default heap is empty.  Please increase the
->Mar 19 17:36:16 sunlab vmunix: configurable heappercent parameter and reboot.
Mar 19 17:36:16 sunlab vmunix: h_kmem_alloc_memory_: 0xffffffff814ce780: request is stalled
Mar 19 17:36:16 sunlab vmunix: heap_thread: no space in heap 0xfffffc00005fcfd0
Mar 19 17:36:16 sunlab vmunix: h_kmem_alloc_memory_: 0xffffffff8143a780: request is stalled
----- OK, vmunix, thanks for the suggestion about increasing heappercent,
but I guess I would like
a clue about what I'm doing before I do it...
So I RTFM'd, and the most informative entry is in the "System tuning manual":
heappercent
is "the virtual size of the kernel heap expressed as a percentage of physical
memory...kernel data structures are allocated from the kernel heap.  the kernel
heap wires physical memory as the kernel data structures are allocated."
Hmmm...
So what is eating up this "heap"?
(And from my college logic class: if you take away 1, is it still a heap? ;-)
Should I increase it to 8%? 9%?
How can I be sure that I actually need to increase it, and that the problem is
not the result of some 'runaway' process? Is the problem related to ADVFS
requirements? (Just a guess, but I like to blame things on ADVFS...)
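
For scale, a quick Python sketch of what those percentages would mean on this box. It simply applies the manual's definition (heap virtual size as a percentage of physical memory) and assumes the 128 MB of memory counts as 128 MiB and that the full amount is the base for the percentage. Both are my assumptions, not something the manual confirms.

```python
def heap_size_mib(physical_mib, heappercent):
    """Virtual size of the kernel heap in MiB for a given heappercent setting."""
    return physical_mib * heappercent / 100.0

PHYSICAL_MIB = 128  # assumption: "128m memory" taken as 128 MiB

for pct in (8, 9):
    print(f"heappercent={pct}: {heap_size_mib(PHYSICAL_MIB, pct):.2f} MiB")
```

So each extra percentage point is only about 1.28 MiB of heap, which is why I would like to know what is consuming it before I start turning the knob.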
TIA -
I will summarize!!!!!
Shanna Leonard
Unix Systems Specialist
Sunquest Information Systems
ssl_at_alpha.sunquest.com
Received on Wed Mar 20 1996 - 06:56:58 NZST