Hello,
QUESTION:
What is a good way to build a high availability distributed C2 environment?
I request your feedback on my proposed solution because NIS fails the test.
CURRENT ENVIRONMENT:
Two AlphaServers 2100, one 4/275 (NFS client) and a 5/250 (NFS server),
with each having 2GB of memory running Digital UNIX 3.2c with C2
enabled. Both systems will have the system disks local and RAID 0+1.
The apps, user, etc disks will be RAID 0+1 in a (SW300) StorageWorks
300/HSZ40 controller configuration. I have over 19,000 users (with a
potential for 25,000 user accounts), 530+ groups in /etc/group, almost
400 concurrent users during the day and about 200 or fewer concurrent
users during off hours.
I plan to use the DECsafe Available Server product to reduce down time
caused by a failed system. That is, the disks served via the SW300
change ownership. For example, the 5/250 has ownership of the disks
(NFS server) in the SW300 and crashes because I spill beer over the
system. The DECsafe software detects the failure and sets the 4/275 to
be the NFS server. The end result is the disks in the SW300 now appear
local to the 4/275 while I lap up the beer out of the 5/250. Raise
your hand if this does not make sense...good, let us press on...
OBVIOUS SOLUTION:
Richard, why don't you use NIS?
PROBLEM WITH OBVIOUS SOLUTION:
1. NIS/C2 environment actually has only a single server (master). C2
forces updates to be done only at the master. This was discussed in
a thread June/95 with Chua Koon Teck (koonteck_at_singnet.com.sg) and
Jon Buchanan (jonathan.buchanan_at_ska.com). It was mentioned that a
slave NIS server would not work (read: hang) if the master was down. This
is not high availability! I might as well not use DECsafe with
this NIS/C2 limitation.
2. Even if I was not concerned with high availability, my user
load does not make it practical. For example, I set up the 5/250
as the master NIS server and ran a simple test -- turned on NIS for only
/etc/group (remember it has 530+ entries) with a user load of
350 users. It felt like I hit the brakes...load average went from
1-2 to over 100, logins, su, etc. appeared hung, and ypserv was eating
almost 100% of the CPU. I ran the test later with 200 concurrent users
and saw ypserv constantly eating ~50% of the CPU, and all activities that
used the NIS group map were noticeably affected. Not good!
NOTE: the 4/275 was not using the 5/250 as a NIS server at that moment.
...scratch head...
PROPOSED SOLUTION:
Ditch NIS. Continue to maintain key files (/etc/group, /etc/networks,
/etc/hosts, aliases, /etc/passwd) on the 5/250 as the master copy.
Use rdist to push the key files to the 4/275 when they change. I
disabled chsh, chfn, and the passwd -f capabilities, so /etc/passwd
changes only when accounts are added or deleted -- this happens once
a day. The files are then 'local' to both systems, with good user
response time. Good so far...
(BTW - why isn't /etc/group dbm'ed?)
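To make the "push only when something changed" idea concrete, here is a
minimal sketch in Python. It is not how rdist works internally (rdist does
its own comparison); it only shows one way to decide which key files need
pushing. The function names and the JSON state file are hypothetical,
introduced purely for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(paths, state_file):
    """Return the paths whose contents differ from the digests in state_file.

    Paths that are not regular files are skipped. The state file
    (a hypothetical JSON file) is rewritten with the current digests,
    so a second call with no changes returns an empty list.
    """
    state_path = Path(state_file)
    old = json.loads(state_path.read_text()) if state_path.exists() else {}
    new = {}
    changed = []
    for p in paths:
        key = str(p)
        if not Path(p).is_file():
            continue
        new[key] = file_digest(p)
        if old.get(key) != new[key]:
            changed.append(key)
    state_path.write_text(json.dumps(new))
    return changed
```

A cron job on the 5/250 could call changed_files() with the key files
listed above and run rdist only when the returned list is non-empty.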
What about the /tcb/files/auth/[a-z]?
Hmm...
Store the /tcb/files/auth/[a-z] files on the SW300 in /sys/auth/[a-z]
and mount /sys/auth on top of /tcb/files/auth. This should work
normally on the 5/250. Now NFS export /sys/auth to the 4/275 and mount
it on top of the 4/275 /tcb/files/auth. I have not tried this but
expect smooth updates from both systems using network file locks if
necessary. If the 5/250 goes down (remember the beer), then DECsafe
now makes the SW300 disks local to the 4/275 and the /sys/auth disk is
now mounted on top of the 4/275 /tcb/files/auth mount point.
The only problem I see with this scheme is if the 4/275 is in single
user mode and it can't mount /sys/auth on /tcb/files/auth, then root
has no /tcb/files/auth/r/root. The solution is to always have
/tcb/files/auth/r/root exist on both the 5/250 and 4/275. The
end result is the REAL /tcb/files/auth contains only one entry -- r/root.
If the 4/275 is in multi-user mode, then the /tcb/files/auth/r/root
is covered by the NFS-mounted /tcb... files.
The key is to have /tcb/files/auth/r/root updated prior to the
mount point being covered. It should be done on both systems.
This is easy. After mounting /sys/auth on /tcb/files/auth, copy
/tcb/files/auth/r/root to /tcb/files/root. Then, during each boot,
copy /tcb/files/root into the real /tcb/files/auth/r/root before
mounting /sys/auth on /tcb/files/auth.
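The save/restore ordering described above can be sketched as two small
helpers. This is only an illustration under assumptions: the function
names are hypothetical, the directories are passed in as parameters, and
the mount step itself is left to the normal boot scripts. Note that
shutil.copy2 preserves mode and timestamps but not ownership, which
would need separate handling for real auth files.

```python
import shutil
from pathlib import Path


def save_root_entry(auth_dir, saved_copy):
    """Run AFTER the shared area is mounted on auth_dir.

    Copies the current r/root entry to a location outside the mount
    point so it survives when the mount is absent.
    """
    shutil.copy2(Path(auth_dir) / "r" / "root", saved_copy)


def restore_root_entry(auth_dir, saved_copy):
    """Run at boot BEFORE the shared area is mounted on auth_dir.

    Refreshes the real (uncovered) r/root entry from the saved copy,
    creating the r/ directory if it does not exist yet.
    """
    target = Path(auth_dir) / "r" / "root"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(saved_copy, target)
```

In the scheme above, auth_dir would be /tcb/files/auth and saved_copy
would be /tcb/files/root, with restore_root_entry() called early in the
boot sequence and save_root_entry() called once the mount succeeds.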
Does anyone see any problems with this scheme?
Does anyone have a large user base and have to solve the same problem?
Does anyone use C2/NIS with many concurrent users without problems?
Thank you for your consideration, time, and patience.
--
Regards,
Richard Jackson George Mason University
Computer Systems Engineer UCIS / ISO
Computer Systems Engineering
Received on Thu Sep 14 1995 - 03:36:17 NZST