--
Regards,
Richard Jackson
George Mason University
Computer Systems Engineer
UCIS / ISO
Computer Systems Engineering

QUESTION:

What is a good way to build a high availability distributed C2 environment? I request your feedback on my proposed solution because NIS fails the test.

CURRENT ENVIRONMENT:

Two AlphaServer 2100s, a 4/275 (NFS client) and a 5/250 (NFS server), each with 2GB of memory and running Digital UNIX 3.2c with C2 enabled. Both systems will have local system disks with RAID 0+1. The apps, user, etc. disks will be RAID 0+1 in a StorageWorks 300 (SW300)/HSZ40 controller configuration. I have over 19,000 users (with a potential for 25,000 user accounts), 530+ groups in /etc/group, almost 400 concurrent users during the day, and about 200 or fewer concurrent users during off hours.

I plan to use the DECsafe Available Server product to reduce downtime caused by a failed system. That is, the disks served via the SW300 change ownership. For example, the 5/250 has ownership of the disks in the SW300 (it is the NFS server) and crashes because I spill beer over the system. The DECsafe software detects the failure and sets the 4/275 to be the NFS server. The end result is the disks in the SW300 now appear local to the 4/275 while I lap up the beer out of the 5/250. Raise your hand if this does not make sense...good, let us press on...

OBVIOUS SOLUTION:

Richard, why don't you use NIS?

PROBLEM WITH OBVIOUS SOLUTION:

1. A NIS/C2 environment actually has only a single server (master). C2 forces updates to be done only at the master. This was discussed in a thread in June/95 with Chua Koon Teck (koonteck_at_singnet.com.sg) and Jon Buchanan (jonathan.buchanan_at_ska.com). It was mentioned that a slave NIS server would not work (read: hang) if the master was down. This is not high availability! I might as well not use DECsafe with this NIS/C2 limitation.

2. Even if I were not concerned with high availability, my user load does not make it practical.
For example, I set up the 5/250 as the master and ran a simple test -- turned on NIS for only /etc/group (remember it has 530+ entries) with a user load of 350 users. It felt like I hit the brakes...load average went from 1-2 to over 100; logins, su, etc. appeared hung; and ypserv was eating almost 100% of the CPU. I ran the test later with 200 concurrent users and saw ypserv constantly eating ~50% of the CPU, and all activities that used the NIS group map were noticeably affected. Not good! NOTE: the 4/275 was not using the 5/250 as a NIS server at that moment. ...scratch head...

PROPOSED SOLUTION:

Ditch NIS. Continue to maintain key files (/etc/group, /etc/networks, /etc/hosts, aliases, /etc/passwd) on the 5/250 as the master copy. Use rdist to push the key files to the 4/275 when they change. I disabled chsh, chfn, and the passwd -f capabilities. This results in /etc/passwd changing only when accounts are added or deleted -- this happens once a day. The files are then local to both systems, with good user response time. Good so far... (BTW - why isn't /etc/group dbm'ed?)

What about the /tcb/files/auth/[a-z]? Hmm... Store the /tcb/files/auth/[a-z] files on the SW300 in /sys/auth/[a-z] and mount /sys/auth on top of /tcb/files/auth. This should work normally on the 5/250. Now NFS export /sys/auth to the 4/275 and mount it on top of the 4/275 /tcb/files/auth. I have not tried this but expect smooth updates from both systems, using network file locks if necessary. If the 5/250 goes down (remember the beer), then DECsafe makes the SW300 disks local to the 4/275 and the /sys/auth disk is mounted on top of the 4/275 /tcb/files/auth mount point.

The only problem I see with this scheme is if the 4/275 is in single-user mode and it can't mount /sys/auth on /tcb/files/auth, then root has no /tcb/files/auth/r/root. The solution is to always have /tcb/files/auth/r/root exist on both the 5/250 and 4/275.
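
One way to keep root's entry present underneath the mount point is to save a copy while the shared directory is mounted and to restore it into the real directory at boot, before the mount happens. The following is only a minimal Python sketch of those two steps, not a tested Digital UNIX procedure. The function names are hypothetical, the directories are passed in as parameters (on the real systems they would be /tcb/files/auth and a saved copy such as /tcb/files/root), and the mount itself is not performed here.

```python
import shutil
from pathlib import Path


def save_root_entry(auth_dir: Path, saved_copy: Path) -> None:
    """Run while the shared directory is mounted on auth_dir.

    Keeps a copy of root's current entry outside the mount point.
    """
    shutil.copy2(auth_dir / "r" / "root", saved_copy)


def restore_root_entry(auth_dir: Path, saved_copy: Path) -> None:
    """Run at boot BEFORE the shared directory is mounted on auth_dir.

    Writes the saved entry into the real (uncovered) directory so that
    root still has an entry if the mount later fails.
    """
    target = auth_dir / "r" / "root"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(saved_copy, target)
```

Note that shutil.copy2 preserves permission bits and timestamps but not file ownership, so a real procedure would also need to check the owner and group of the restored file.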
The end result is the REAL /tcb/files/auth contains only one entry -- r/root. If the 4/275 is in multi-user mode, then the /tcb/files/auth/r/root is replaced with the NFS-exported /tcb... files. The key is to have /tcb/files/auth/r/root updated prior to the mount point being covered. It should be done on both systems. This is easy. After mounting /sys/auth on /tcb/files/auth, copy /tcb/files/auth/r/root to /tcb/files/root. Then, during each boot, copy /tcb/files/root into the real /tcb/files/auth/r/root before mounting /sys/auth on /tcb/files/auth.

Does anyone see any problems with this scheme? Does anyone have a large user base and have to solve the same problem? Does anyone use C2/NIS with many concurrent users without problems?

Thank you for your consideration, time, and patience.

RESPONSES:

--------------------------------------------------

It IS possible to have NIS set up such that ASE can switch the master server from one machine to the other as an ASE service. We've tested that solution for Jon Buchanan _at_ SKA. (I don't think it's officially supported, but it works just fine.)

Pascal Pederiva

--------------------------------------------------

I have a configuration close to yours, and I am in the process of implementing C2/NIS. I would appreciate it if you kept me posted about your findings. I like your idea of rdist. The version that I have supports directory duplication; it even removes files that are not in the main source directory. That will help you with the /tcb/files/auth/[a-z].

The version that I have is rdist-6.1.0. The current official version of rdist is available via anonymous ftp on usc.edu under /pub/rdist. The current version is always retrievable as file "/pub/rdist/rdist.tar.gz".

The following is from the man pages and explains one of the options:

-R Removes extraneous files. If a directory is being updated, any files that exist on the remote host that do not exist in the master directory are removed.
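
As an illustration of what the -R behaviour quoted above amounts to, here is a minimal local sketch in Python. It is not rdist and does no network transfer: `mirror_tree` and its parameters are hypothetical names chosen for this example, both directories are assumed to be reachable as local paths, and every file is copied unconditionally rather than only when it has changed.

```python
import shutil
from pathlib import Path


def mirror_tree(master: Path, replica: Path) -> None:
    """Make replica an identical copy of master (files and subdirectories)."""
    replica.mkdir(parents=True, exist_ok=True)
    master_names = {p.name for p in master.iterdir()}

    # Remove anything in the replica that the master does not have.
    for entry in replica.iterdir():
        if entry.name not in master_names:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Copy (or overwrite) everything the master does have.
    for entry in master.iterdir():
        dest = replica / entry.name
        if entry.is_dir():
            if dest.exists() and not dest.is_dir():
                dest.unlink()  # a file is in the way of a directory
            mirror_tree(entry, dest)
        else:
            if dest.is_dir():
                shutil.rmtree(dest)  # a directory is in the way of a file
            shutil.copy2(entry, dest)
```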
This is useful for maintaining identical copies of directories.

Please keep me posted, I am implementing DEC-Safe/C2/NIS at we mail each other

By the way, I am an alumnus of George Mason; I was on the MSIS

Carlos Touzard

--------------------------------------------------

I received the document, and DEC has recognized that this is a problem. I am looking at Tivoli; it does the job and more, BUT IT IS REALLY EXPENSIVE.

OK, this is what is going on: after talking to DEC about NIS/ASE, they propose the same thing you are doing. We mounted the /tcb directory on all the machines (using ASE), and it works fine. We also attempted to put a copy of /etc/passwd in this mounted directory and link to it from /etc, so that updating the one in /etc would update all of them. But there was a little problem: the passwd command breaks the link and rewrites the file. So at present we have written a shell script using rdist to distribute all the files that are affected by the passwd, XIsso and XSysAdmin commands. Changing passwords is OK; we want to automate what happens when you create a new account or change user data (fields, shell, etc.) in the passwd file. We have asked them to work around the broken link and automate this process. DEC is coming to my office this week to work on it; I will let you know the final solution.

Carlos Touzard

--------------------------------------------------

A crucial error in your trashing of NIS... NIS slave servers do not 'hang' when their master is down. Their current idea of the NIS database just stays static, i.e. no changes can be reliably made to the database until another master is set up and all the slaves' databases are copied from it. Typically this would be done by keeping the NIS map source files replicated on one or more of the slave servers so it/they can be quickly switched to being the master.

Another possibility is to keep two masters up all the time.
If their database files can be kept in sync (the difficulty of this depends on whether users are allowed to change their own passwd entries and how dynamic the other NIS maps are), this will cause no significant problems, and make switch-over even faster (and maybe even automatic, though I haven't experimented with this).

NIS has many other security and performance problems, so I would not suggest it be used in a secure environment (C2), but hanging when the master is down isn't one of them.

Belonis

--------------------------------------------------

Sounds like you have your ducks in a row. I have no knowledge of C2 hacks that may have been done to NIS. I don't use and have never seen C2 in operation. But I don't understand how they could cause 'slaves' to feed anything back to the master. All changes should be made directly to the master and it should feed to the slaves. The major reason for NIS is to have a single master server. The major reason for slaves is to not have to depend on that master being up all the time.

It sounds like bogosity in the implementation, i.e. non-NIS activity hanging NIS. Or maybe for performance reasons, they are making changes on the machine originating the changes directly instead of waiting for the master to update like the spec requires. They could also be keeping auxiliary information like 'last login' time etc. in the master password file, and want to update it synchronously.

You need not reply. I'm just clarifying my statement.

Belonis

--------------------------------------------------

We have faced the same issues and have purchased a software package that solves the problem nicely. It is called BoKS and is manufactured by Dynasoft in Sweden and marketed by a company called Securix, in Burlingame, California. BoKS has true 24x7 availability, and allows you to manage your userbase centrally.
It also provides a very nice admin GUI that will let you control user access to servers by network service, i.e., telnet, rsh, rlogin, su, xdm, ftp, login, etc. We are using it on our DEC Alpha platforms and it seems to be doing very well. If you are in a position to buy something, I heartily recommend it. You can get info on BoKS from Securix, Inc., Burlingame, CA, at 415-343-8999.

Chuck Jones

--------------------------------------------------

Received on Wed Oct 04 1995 - 15:48:55 NZDT