I probably should have posted this earlier, since other people have had
this problem. Thanks to Martin Moore from DEC support, who was very helpful. I
have not yet had a chance to try the pre-release patch, but at least this
might explain some things.
-Saar Picker
=========================================================================
 Saar Picker				     saarp_at_socrates.berkeley.edu
 CCS/SDA - Administrative Unix 				  (510) 643-8168
=========================================================================
---------- Forwarded message ----------
Date: Fri, 24 Jul 1998 13:51:31 -0400 (EDT)
From: Martin Moore <martin_at_decatl.alf.dec.com>
To: Saar Picker <saarp_at_socrates.berkeley.edu>
Cc: martin_at_decatl.alf.dec.com
Subject: Re: processes not finishing problem
But there is a bug that was introduced in patch kit 1 (it's not fixed in
kit 2, which was just released, but the fix should be in kit 3; I know
Engineering has recently solved the bug.) 
If you have any application that uses mmap() on an AdvFS filesystem under either
of the following circumstances:
1.) The application does an mlockall() and then an mmap().
or
2.) The application is multithreaded, and does an mmap() in one thread and a
    read() or write() to the same file from another thread.
it will induce an AdvFS locking problem such that subsequent accesses to the
domain will hang.  Threads blocked by this hang can in turn hold locks
that subsequently block other threads, and so on.
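For anyone checking their own code for these patterns, here is a minimal C
sketch of the two call sequences described above. It is an illustration only,
not a reproducer supplied by DEC: the function names, the scratch-file helper,
the 4096-byte mapping length, and the mlockall() flags are all assumptions.
On a system without the bug, both functions should simply return 0.

```c
/* Sketch of the two access patterns described above.
 * All helper names, MAP_LEN, and the mlockall() flags are assumptions. */
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAP_LEN 4096

/* Create (or truncate) a scratch file of MAP_LEN bytes; returns fd or -1. */
static int open_scratch(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, MAP_LEN) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Pattern 1: mlockall() followed by mmap() of a file. Returns 0 on success. */
int pattern_mlockall_then_mmap(const char *path)
{
    int fd = open_scratch(path);
    if (fd < 0)
        return -1;

    /* mlockall() can fail without privilege; if so, continue unlocked. */
    int locked = (mlockall(MCL_CURRENT | MCL_FUTURE) == 0);
    if (!locked)
        perror("mlockall (continuing unlocked)");

    int rc = -1;
    void *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p != MAP_FAILED) {
        ((char *)p)[0] = 'x';          /* touch the mapped page */
        munmap(p, MAP_LEN);
        rc = 0;
    }
    if (locked)
        munlockall();
    close(fd);
    return rc;
}

struct job { int fd; int ok; };

/* Thread A: maps the file and writes through the mapping. */
static void *mapper(void *arg)
{
    struct job *j = arg;
    void *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, j->fd, 0);
    if (p != MAP_FAILED) {
        ((char *)p)[0] = 'y';
        munmap(p, MAP_LEN);
        j->ok = 1;
    }
    return NULL;
}

/* Thread B: read()s the same file through the same descriptor. */
static void *reader(void *arg)
{
    struct job *j = arg;
    char buf[64];
    if (read(j->fd, buf, sizeof buf) == (ssize_t)sizeof buf)
        j->ok = 1;
    return NULL;
}

/* Pattern 2: mmap() in one thread, read() of the same file in another. */
int pattern_mmap_and_read_threads(const char *path)
{
    int fd = open_scratch(path);
    if (fd < 0)
        return -1;

    struct job m = { fd, 0 }, r = { fd, 0 };
    pthread_t tm, tr;
    if (pthread_create(&tm, NULL, mapper, &m) != 0) {
        close(fd);
        return -1;
    }
    if (pthread_create(&tr, NULL, reader, &r) != 0) {
        pthread_join(tm, NULL);
        close(fd);
        return -1;
    }
    pthread_join(tm, NULL);
    pthread_join(tr, NULL);
    close(fd);
    return (m.ok && r.ok) ? 0 : -1;
}
```

Each function takes the path of a scratch file on the filesystem under test
and returns 0 when every call succeeded (link with -pthread where required).
On an affected system the calls would presumably hang rather than fail, so
anything like this belongs on a test machine only.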
The scenario you described is very similar to what I have seen induced by this
problem.  There's no way to tell for sure, though, without forcing a crash on
the system when in this state, and looking at the crash dump.  
If you have a software contract, I'd suggest logging a call to your CSC, so
you can have the problem analyzed and, if it's the problem described above,
you can get a pre-release copy of the patch.
Martin
-- 
Martin J. Moore                         5555 Windward Parkway West
Digital UNIX Expert Team                Alpharetta GA  30004-7407
Customer Services Division              +1-800-354-9000 x31679
Compaq Computer Corporation             mailto: martin_at_alf.dec.com
Here's the original posting:
> From saarp_at_socrates.berkeley.edu Tue Jul 28 09:41:38 1998
> Date: Fri, 24 Jul 1998 10:24:00 -0700 (PDT)
> From: Saar Picker <saarp_at_socrates.berkeley.edu>
> To: alpha-osf-managers_at_ornl.gov
> Subject: processes not finishing problem
> 
> 
> Hello all,
> 
> We've been experiencing some strange problems on our DEC8200 high volume
> mail server. We're running 4.0D with the latest patch kit. 
> 
> Every once in a while commands like 'ls' and 'ps' stop returning and
> cannot be killed. After a while, the whole machine slows down and dies as
> the memory and swap fill up. The 'ls' problem is strange because it will
> hang only on certain directories and work on others on the same
> filesystem.
> 
> Has anyone ever seen anything like this?
> 
> Thanks.
> -Saar Picker
> 
> =========================================================================
>  Saar Picker				     saarp_at_socrates.berkeley.edu
>  CCS/SDA - Administrative Unix 				  (510) 643-8168
> =========================================================================
> 
Received on Tue Jul 28 1998 - 16:48:38 NZST