Ask the Wizard Questions
System hangs under heavy load
The Question is:
On our MicroVAX 3900 we have been having a problem with
the system locking up. Usually, when the number of users
gets to around 40-45, the system no longer allows users to
log in. It does not say that you cannot log in; instead, after
you enter your username and password, it reports
"user authorization failed". This happens even from the SYSTEM account.
Everyone who is already logged on remains active, but
new users cannot get in. The only remedy is to
reboot the system.
We have increased the swap file and added a secondary
swap file. This seemed to help somewhat, but we still have the
problem from time to time.
The system has 32MB of memory.
Any suggestions (other than upgrading to an Alpha)?
The Answer is:
This sounds suspiciously like SECURITY_SERVER has gone into RWMBX state.
I am involved with one of these cases, which has been ongoing and unresolved
since July last year. I have another customer who is experiencing similar
problems, but to a lesser extent, and a further customer who has seen it once
(so far!).
Here is the (sanitised) text of a STARS article which describes the
problem. Interested internal people can look in STARS to see the
whole article, which contains a good deal of hidden text. Note that the patches
mentioned in this article have NOT fixed the problem for my original
customer.
[OpenVMS] System not Accepting Logins & SECURITY_SERVER Hung In RWMBX
COPYRIGHT (c) 1988, 1993 by Digital Equipment Corporation.
ALL RIGHTS RESERVED. No distribution except as provided under contract.
Copyright (c) Digital Equipment Corporation 1995, 1996. All rights reserved.
PRODUCT: OpenVMS VAX, Version 6.1
COMPONENT: Security Server Process (SECURITY_SERVER)
SOURCE: Digital Equipment Corporation
SYMPTOM:
The SECURITY_SERVER process hangs in a Resource Wait Mailbox (RWMBX)
state with a channel assigned to a high-numbered mailbox. Once the
process hangs, all user login attempts fail even though the
username/password pair entered is correct. The following error
message is returned:
User Authorization Failure
During this time, the following OPCOM messages are displayed on the
operator console:
%SECSRV-E-ASSIGNFAILED, security server failed to assign a
channel to a client
%SYSTEM-W-NOSUCHDEV, no such device available
%SECSRV-I-INVALIDTERMNAME, received invalid terminal name for
intruder/suspect
%SYSTEM-W-NOSUCHDEV, no such device available
Additional symptoms are that the hang typically occurs on a heavily
loaded system, and that the DCL command SHOW INTRUSION also hangs in
RWMBX state.
SOLUTION:
This problem is corrected in OpenVMS VAX V7.0. If you are unable to
upgrade, use one of the WORKAROUNDs below.
WORKAROUND 1:
Empty the high-numbered mailbox by copying it to the null device. For
example, use a DCL command similar to the following (substituting the
unit number of the mailbox involved):
$ COPY MBA3419: NLA0:
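To make the mechanism behind Workaround 1 concrete, here is a minimal Python sketch. It is a model, not VMS code: it assumes a bounded `queue.Queue` is a fair stand-in for a mailbox with a fixed buffer quota. A writer that finds the queue full blocks (the analogue of RWMBX) until something reads from it, and the hypothetical `drain()` helper plays the part of the COPY to NLA0: by reading and discarding every queued message.

```python
import queue
import threading

# Hypothetical model only: a bounded queue stands in for a VMS mailbox,
# and maxsize stands in for the mailbox's buffer quota.
MAILBOX_QUOTA = 2


def drain(mailbox: queue.Queue) -> int:
    """Read and discard queued messages (the COPY-to-NLA0: analogue)."""
    dropped = 0
    while True:
        try:
            mailbox.get_nowait()
        except queue.Empty:
            return dropped
        dropped += 1


def run_drain_demo() -> tuple[bool, bool]:
    """Return (writer_was_stuck, writer_finished_after_drain)."""
    mailbox: queue.Queue = queue.Queue(maxsize=MAILBOX_QUOTA)
    # Replies that the (now vanished) client never read fill the quota.
    for i in range(MAILBOX_QUOTA):
        mailbox.put(f"unread reply {i}")

    writer_done = threading.Event()

    def server_write() -> None:
        # put() blocks while the queue is full: the RWMBX analogue.
        mailbox.put("one more reply")
        writer_done.set()

    writer = threading.Thread(target=server_write, daemon=True)
    writer.start()

    # Nobody reads the mailbox, so the writer should still be waiting.
    was_stuck = not writer_done.wait(timeout=0.2)

    # Emptying the mailbox frees quota and lets the blocked put() finish.
    drain(mailbox)
    finished = writer_done.wait(timeout=2.0)
    writer.join(timeout=2.0)
    return was_stuck, finished


if __name__ == "__main__":
    print(run_drain_demo())  # expected: (True, True)
```

As in the workaround, draining throws the unread messages away; that is harmless here only because the client that should have read them no longer exists.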
WORKAROUND 2:
A VAXLOGI ECO kit may address the problem described in this article.
Refer to the ECO-SUMMARY article to determine if this ECO corrects the
problem for your specific configuration. The ECO-SUMMARY information
can be found by opening the ECO-SUMMARY database and querying for
VAXLOGI.
ANALYSIS:
The high-numbered mailbox appears to be a client mailbox. It is
believed that a client has requested a return message from the
SECURITY_SERVER process, but the client process no longer exists. The
SECURITY_SERVER process hangs in RWMBX state waiting for the client
process to read from the client mailbox.
It appears that during periods of heavy load, a small timing window
exists in which the SECURITY_SERVER is able to assign a channel to the
client mailbox before the client process dies, but is unable to write
to the mailbox until after the client process has died. Thus the
SECURITY_SERVER process hangs in RWMBX state.
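The timing window described above can also be modelled in a few lines of Python. This is a hypothetical illustration, not the SECURITY_SERVER's actual code: it assumes the client mailbox is already at its buffer quota when the server writes, so a write with no time limit blocks forever once the client is gone, while a write with a bounded wait gives up and lets the server carry on. The bounded wait is shown only as a generic defensive pattern; the article does not say how OpenVMS VAX V7.0 actually corrects the problem.

```python
import queue
import threading
from typing import Optional

# Hypothetical model of the race in the ANALYSIS section. Nothing here is
# SECURITY_SERVER code or a VMS system service; a bounded queue stands in
# for the client mailbox and maxsize for its buffer quota.


def server_reply(mailbox: queue.Queue, message: str,
                 timeout: Optional[float]) -> str:
    """Write a reply; timeout=None waits indefinitely, like the hang."""
    try:
        mailbox.put(message, timeout=timeout)
    except queue.Full:
        return "gave up: client never read its mailbox"
    return "delivered"


def simulate(timeout: Optional[float]) -> str:
    # Assumption: the client mailbox is already at its quota.
    client_mailbox: queue.Queue = queue.Queue(maxsize=1)
    client_mailbox.put("earlier reply")

    # Inside the timing window the server "assigns a channel" while the
    # client is still alive; here it simply keeps a reference to the mailbox.
    channel = client_mailbox
    # The client then exits, so nothing will ever read from the mailbox.

    outcome: list[str] = []
    server = threading.Thread(
        target=lambda: outcome.append(server_reply(channel, "reply", timeout)),
        daemon=True,
    )
    server.start()
    server.join(timeout=0.5)
    return outcome[0] if outcome else "hung (RWMBX analogue)"


if __name__ == "__main__":
    print(simulate(None))   # the unbounded write never completes
    print(simulate(0.05))   # the bounded write gives up and returns
```

Bounding the wait trades a silent hang for a dropped reply that the server could log and recover from.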