In my original query I wrote:
> We had our AlphaServer 1000A NFS server stop wanting to be an NFS
> server last night. The system didn't actually crash, and it was
> still possible to log in to the machine after NFS services
> stopped. Our Networker backups actually ran to completion after
> the NFS failure (although we haven't checked whether the tapes
> make sense yet). Any program that tried to look up kernel
> information, like 'ps', 'top', or 'uptime' would stall
> indefinitely and become unkillable, but other programs like
> 'swapon' and 'df' still worked. Eventually, once we shut down
> all the NFS client systems that depended on it, we were able to
> reboot the system and it's running fine again.
>
> The only messages we got in /var/adm/messages that seemed to be
> related to this were:
>
> Aug 11 14:21:00 aserver1 vmunix: malloc_mem_alloc: no space in map
>
> These started several hours before NFS failed, but only appeared
> every half-hour to hour.
>
> Anyone have any ideas?
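For anyone who wants to check how often the "no space in map" message shows up before a failure, here is a minimal Python sketch. It assumes the classic syslog line layout shown above ("Mon DD HH:MM:SS host tag: message"). That layout carries no year, so the caller has to supply one. The function names are made up for illustration. The sketch collects the timestamps of each occurrence and reports the gaps between them.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

# Kernel message quoted above; matched as a plain substring.
PATTERN = "malloc_mem_alloc: no space in map"


def map_exhaustion_times(lines: Iterable[str], year: int) -> List[datetime]:
    """Timestamps of every 'no space in map' line in syslog-style text.

    Assumes the layout "Mon DD HH:MM:SS host tag: message", which has
    no year field, so the caller supplies one.
    """
    times = []
    for line in lines:
        if PATTERN not in line:
            continue
        month, day, clock = line.split()[:3]  # e.g. "Aug", "11", "14:21:00"
        times.append(datetime.strptime(f"{year} {month} {day} {clock}",
                                       "%Y %b %d %H:%M:%S"))
    return times


def gaps(times: List[datetime]) -> List[timedelta]:
    """Intervals between consecutive occurrences, in log order."""
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

To use it, open /var/adm/messages, pass the file object and the year to map_exhaustion_times(), then print the result of gaps(). The output should be intervals of roughly 30 to 60 minutes if the pattern described above holds.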
I forgot to mention that the system in question runs Digital UNIX
4.0D with patch kit 2. As an NFS server it provides service
mostly to a Solaris 2.6 system and four other Digital UNIX
systems which run 4.0B or 4.0D, all of which use NFSv3 (in fact,
so far no NFSv2 accesses have occurred since we rebooted the
systems last night).
Two other people reported that they have experienced similar
problems and are pursuing the issue with DEC service:
"K. M. Peterson" <KMP_at_WI.MIT.EDU>
George Michaelson <ggm_at_dstc.edu.au>
K. M. Peterson writes:
> Hi there,
>
> We're seeing it as well. Exact same thing. There was a query from
> George Michaelson <ggm_at_dstc.edu.au> on 9 July that was not answered. I
> sent him email, but I suspect he may be on vacation.
>
> We're a member of Digital's ASAP program, so I tried asking them about
> it. They haven't replied to my latest missive.
>
> This is happening once every 2 or 3 days on a Digital Personal Work
> Station 600au; we're running 4.0D and Patch Kit 2. Swap is 2GB on this
> machine.
>
> Every five minutes we now write the output of 'ps auxw' and 'vmstat -M'
> to syslog. The ps output hasn't been very helpful, and I haven't really
> had time to dig into the vmstat output.
>
> FYI, the other message that we get is:
> Aug 12 14:34:20 fermium vmunix: malloc failed: bucket size = 524288, #of failures = 1, ra 0xfffffc000044087c
>
> This message (and others like it) can precede the NFS failure
> by several hours, but in our case the 'no space in map' generally
> happens a few minutes before folks report that NFS has keeled over.
>
> The only other unusual things happening are some weird SCSI errors, which
> do not seem to correlate with this problem at all.
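Below is a rough Python sketch of the five-minute monitoring K. M. Peterson describes: run 'ps auxw' and 'vmstat -M' and send each output line to syslog. This is not their actual script. The syslog ident "memwatch" and the LOG_LOCAL0 facility are hypothetical choices. The 'vmstat -M' option is taken from the message above and may behave differently, or not exist, on other systems.

```python
import subprocess
import syslog
import time
from typing import Callable, List, Sequence

# Commands named in the message above; 'vmstat -M' is the Digital UNIX
# form quoted there and may not exist elsewhere.
COMMANDS: List[List[str]] = [["ps", "auxw"], ["vmstat", "-M"]]
INTERVAL_SECONDS = 300  # every five minutes


def snapshot(commands: Sequence[Sequence[str]],
             emit: Callable[[str], None]) -> int:
    """Run each command once and pass every output line, tagged with the
    command it came from, to emit(). Returns the number of lines emitted."""
    count = 0
    for cmd in commands:
        try:
            result = subprocess.run(list(cmd), capture_output=True,
                                    text=True, check=False)
            output = result.stdout
        except OSError as exc:  # e.g. the command is not installed
            output = f"failed to run: {exc}\n"
        tag = " ".join(cmd)
        for line in output.splitlines():
            emit(f"[{tag}] {line}")
            count += 1
    return count


def main() -> None:
    """Log a snapshot every INTERVAL_SECONDS, forever."""
    # Hypothetical ident and facility; pick whatever your syslog.conf routes.
    syslog.openlog("memwatch", 0, syslog.LOG_LOCAL0)
    while True:
        snapshot(COMMANDS, lambda msg: syslog.syslog(syslog.LOG_INFO, msg))
        time.sleep(INTERVAL_SECONDS)
```

The snapshot() function takes the emit callback as a parameter, so the collection step can be tried without syslog, for example by passing list.append. You would call main() from cron or an rc script. A cron job running a single snapshot every five minutes would be an equally reasonable design.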
I've had a lengthy exchange with George Michaelson, during which
we seem to have eliminated NFSv2 and AdvFS as possible triggers,
since the system I'm working on uses neither of them.
George Michaelson writes:
> Are you running a multiple-of-8 farm of TCP/UDP contexts? The tuning
> guide still seems to recommend it (in /etc/rc.config).
> That makes me suspect it's the number of NFS clients; we have around 120
> of them on an 800 5/333.
He also received these tuning suggestions from DEC that he
forwarded to me; I have yet to evaluate or try them on our
system:
> From: Lance Gardner <Lance.Gardner_at_digital.com>
> To: "'ggm_at_dstc.edu.au'" <ggm_at_dstc.edu.au>
>
> Tuning Suggestion: vm-mapentries ( 200 ) should be changed to 16384 in
> /etc/sysconfigtab (via sysconfigdb(8)).
>
> vm:
> vm-mapentries=16384
>
> Tuning Suggestion: vm-vpagemax ( 16384 ) should be changed to 65536 in
> /etc/sysconfigtab (via sysconfigdb(8)).
>
> vm:
> vm-vpagemax=65536
>
> Tuning Suggestion: ubc-minpercent ( 10 ) should be changed to 20 in
> /etc/sysconfigtab (via sysconfigdb(8)).
>
> vm:
> ubc-minpercent=20
>
> Tuning Suggestion: Raise AdvfsCacheMaxPercent by 3%
> in /etc/sysconfigtab (via sysconfigdb(8))
>
> advfs:
> AdvfsCacheMaxPercent=10
>
> Monitor that this change does not adversely affect paging/swapping. The
> system will have to be rebooted for this to take effect.
>
> Tuning Suggestion: Adjust AdvfsMaxDevQLen
> in /etc/sysconfigtab (via sysconfigdb(8))
>
> advfs:
> AdvfsMaxDevQLen=32
>
> This should normally be set to 8-32 to avoid flooding the disk with too
> many I/O requests. For devices with a non-volatile write-back cache
> enabled, or for striped devices, it may be set toward the higher end of
> that range.
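I haven't evaluated these suggestions yet. If I do, it would help to have them in one place. This is a small Python sketch that collects the values above into one stanza per subsystem, dropping exact repeats and refusing conflicting ones. It follows the quoted subsystem:/attribute=value layout. The tab indentation of attribute lines is my assumption about the usual sysconfigtab style, so check the layout against sysconfigdb(8) before feeding the output to it. The advfs entries only matter on systems that actually use AdvFS.

```python
from typing import Dict, Iterable, Tuple

Suggestion = Tuple[str, str, str]  # (subsystem, attribute, new value)

# Values from the DEC tuning suggestions quoted above.
SUGGESTIONS = [
    ("vm", "vm-mapentries", "16384"),
    ("vm", "vm-vpagemax", "65536"),
    ("vm", "ubc-minpercent", "20"),
    ("advfs", "AdvfsCacheMaxPercent", "10"),
    ("advfs", "AdvfsMaxDevQLen", "32"),
]


def merge(suggestions: Iterable[Suggestion]) -> Dict[str, Dict[str, str]]:
    """Group attributes by subsystem in first-seen order.

    An exact repeat is silently dropped; a repeat with a different
    value raises ValueError so the conflict is not hidden.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for subsystem, attr, value in suggestions:
        attrs = merged.setdefault(subsystem, {})
        if attrs.get(attr, value) != value:
            raise ValueError(f"conflicting values for {subsystem}:{attr}")
        attrs[attr] = value
    return merged


def render(merged: Dict[str, Dict[str, str]]) -> str:
    """One stanza per subsystem: 'name:' then tab-indented attr=value lines."""
    blocks = []
    for subsystem, attrs in merged.items():
        lines = [f"{subsystem}:"] + [f"\t{a}={v}" for a, v in attrs.items()]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Calling print(render(merge(SUGGESTIONS))) prints a vm stanza followed by an advfs stanza. The advfs stanza can simply be dropped on a system without AdvFS.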
Received on Thu Aug 13 1998 - 08:29:09 NZST