Frozen X Servers

From: Debra Alpert <alpert_at_fas.harvard.edu>
Date: Sat, 22 Nov 1997 17:36:58 -0500 (EST)

Our site has a student lab which is home to 34 AlphaStation 255/300s,
each with a PowerStorm 4D20 video card, all originally running Open3D
v4.4. The machines all run Digital Unix 4.0B with enhanced (C2) security
enabled. Each box originally had a local copy of the /tcb/files/auth
hierarchy. We decided that this presented a security risk, as the lab is
open 24 hours a day. Since NIS as a mechanism for distributing the
security databases was not an option (with almost 18,000 users it is much
too slow), we moved /tcb/files/auth to an AlphaServer 4100 from which the
workstations remotely mount the filesystem. The server is also running
Digital Unix 4.0B, as well as TruCluster v1.4, so the database NFS service
can fail over to another 4100 should a problem arise.

Ever since we made this change, the X servers on the workstations tend to
"freeze up." This does not happen while a user is logged in, nor has it
been observed immediately after the X server restarts after logout (the X
servers are configured to stop and restart at the end of each session).
What happens is that after a workstation has been idle for a time (for as
little as 5 minutes), users can't log in because keyboard input is not
accepted, nor is the screen sensitive to the mouse (although the pointer
can move). This does not happen all the time (a machine can sit for hours
without entering this state), but this behavior was never observed prior
to the remote mounting of /tcb/files/auth. In our busy lab, with not much
idle time for the machines, it happens as often as 20 times a day. Running
`/sbin/init.d/xlogin restart' resolves the problem. While the X server is
in this state, `ps' shows that it is idle.

Installing patch_kit5 did not remedy the situation (and has since been
rolled back because of security loopholes), nor has upgrading the Open3D
software to v4.5 (either with or without patch_kit5). It doesn't matter
whether NFS over TCP or UDP is used.

There may be a problem with the interaction of NFS locking and the
authentication mechanism, which requires a read-write mount. To this
point, we've had no success isolating the cause of the problem, and neither
syslog nor the xdm-errors file has provided any insight. We logged a
report with DEC weeks ago, and as yet, their engineers have not offered us
any solutions.
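
One way to test the locking theory from an idle workstation might be a
small probe that takes and releases an advisory lock on a scratch file on
the NFS mount and reports how long that took. The following is only a
minimal sketch under several assumptions: that Python is available on the
workstation, that fcntl-style locks on the mount are forwarded to the NFS
lock daemon, and that a scratch file may be created on the mount. The
function name probe_lock and the scratch file name are hypothetical. If
the lock daemon itself is hung, the lock call may still block, which
would be informative in its own right.

```python
import errno
import fcntl
import os
import time


def probe_lock(path, timeout=10.0, interval=0.5):
    """Try to take and release an exclusive advisory lock on `path`.

    Returns the number of seconds it took to obtain the lock, or None
    if the lock was still held elsewhere after `timeout` seconds.
    """
    # Create the scratch file if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    start = time.monotonic()
    try:
        while True:
            try:
                # Non-blocking request: raises immediately if another
                # process currently holds a conflicting lock.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                if exc.errno not in (errno.EACCES, errno.EAGAIN):
                    raise  # some other failure, e.g. ENOLCK
                if time.monotonic() - start >= timeout:
                    return None
                time.sleep(interval)
                continue
            elapsed = time.monotonic() - start
            fcntl.lockf(fd, fcntl.LOCK_UN)
            return elapsed
    finally:
        os.close(fd)
```

Calling probe_lock("/tcb/files/auth/lockprobe.tmp") periodically on an
idle machine, and again on one that has frozen, would show whether lock
requests slow down or fail around the time the X server stops accepting
input.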

Has anyone else observed this behavior, or does anyone have an idea why it
may be happening or how to fix it?

TIA,

Debra Alpert
FAS Unix Systems Group
Harvard University
Received on Sat Nov 22 1997 - 23:52:17 NZDT
