SUMMARY: NIS/C2 Doesn't cut the mustard

From: Richard L Jackson Jr <rjackson_at_osf1.gmu.edu>
Date: Wed, 4 Oct 1995 10:17:11 -0400 (EDT)

Hello,

RESPONSES FROM:
"Pascal Pederiva (Digital UNIX Support Switzerland)" <pascal_at_zuo.dec.com>
carlos.touzard_at_citicorp.com
Jim Belonis <belonis_at_dirac.phys.washington.edu>
Chuck Jones <chuck.jones_at_citicorp.com>

SUMMARY:
The two main problems with NIS/C2 are high availability and performance.
DEC unofficially (I have not received an official statement) recognizes that a
problem exists with high availability and unofficially thinks the DECsafe
Available Server (ASE) product could solve this issue. I have reported
the poor performance problem with ypserv and it has been elevated to
engineering.

I have implemented my proposed solution, mentioned below (i.e., using rdist
to distribute key admin files and using NFS to export /tcb/files/auth). We had
450+ concurrent users yesterday and performance was excellent. The only
difference is that I used nrdist instead of rdist.

I have appended the responses...

-- 
Regards,
Richard Jackson                                George Mason University
Computer Systems Engineer                      UCIS / ISO
                                               Computer Systems Engineering
QUESTION: 
  What is a good way to build a high availability distributed C2 environment?
I request your feedback on my proposed solution because NIS fails the test.
CURRENT ENVIRONMENT:
Two AlphaServers 2100, one 4/275 (NFS client) and a 5/250 (NFS server),
with each having 2GB of memory running Digital UNIX 3.2c with C2
enabled.  Both systems will have the system disks local and RAID 0+1.
The apps, user, etc disks will be RAID 0+1 in a (SW300) StorageWorks
300/HSZ40 controller configuration.  I have over 19,000 users (with a
potential for 25,000 user accounts), 530+ groups in /etc/group, almost
400 concurrent users during the day and about 200 or less concurrent
users during off hours.
I plan to use the DECsafe Available Server product to reduce down time
caused by a failed system.  That is, the disks served via the SW300
change ownership.  For example, the 5/250 has ownership of the disks
(NFS server) in the SW300 and crashes because I spill beer over the
system.  The DECsafe software detects the failure and sets the 4/275 to
be the NFS server.  The end result is the disks in the SW300 now appear
local to the 4/275 while I lap up the beer out of the 5/250.  Raise
your hand if this does not make sense...good, let us press on...
OBVIOUS SOLUTION:
Richard, why don't you use NIS?
PROBLEM WITH OBVIOUS SOLUTION:
1. A NIS/C2 environment actually has only a single server (master).  C2
forces updates to be done only at the master.  This was discussed in
a thread in June/95 with Chua Koon Teck (koonteck_at_singnet.com.sg) and
Jon Buchanan (jonathan.buchanan_at_ska.com).  It was mentioned that a
slave NIS server would not work (read: hang) if the master was down.  This
is not high availability!  I might as well not use DECsafe with
this NIS/C2 limitation.
2. Even if I were not concerned with high availability, my user
load does not make it practical.  For example, I set up the 5/250
as the master and ran a simple test -- turned on NIS for only
/etc/group (remember it has 530+ entries) with a user load of
350 users.  It felt like I hit the brakes...load average went from
1-2 to over 100, logins, su, etc. appeared hung, and ypserv was eating
almost 100% of the CPU.  I ran the test later with 200 concurrent users
and saw ypserv constantly eating ~50% of the CPU, and all activities that
used the NIS group map were noticeably affected.  Not good!
NOTE: the 4/275 was not using the 5/250 as a NIS server at that moment.
...scratch head...
PROPOSED SOLUTION:
Ditch NIS. Continue to maintain key files (/etc/group, /etc/networks,
/etc/hosts, aliases, /etc/passwd) on the 5/250 as the master copy.
Use rdist to migrate the key files to the 4/275 when they change.  I
disabled chsh, chfn, and the passwd -f capabilities.  This results
in /etc/passwd changing only when accounts are added or deleted -- this
happens once a day.  This results in the files being local to
both systems with good user response time.  Good so far...
(BTW - why isn't /etc/group dbm'ed?)
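The "push the key files only when they change" step can be sketched roughly as below. This is a minimal illustration, not what was actually deployed: the function names, the digest-based change check, and the `push` hook are all assumptions made for the example. In practice the hook would be whatever wraps rdist (or nrdist) on the master.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths: Iterable[Path], last_pushed: Dict[str, str]) -> List[Path]:
    """Return the files whose digest differs from the one recorded at the last push."""
    return [p for p in paths if last_pushed.get(str(p)) != file_digest(p)]


def push_changes(
    paths: Iterable[Path],
    last_pushed: Dict[str, str],
    push: Callable[[List[Path]], None],
) -> List[Path]:
    """Push only the changed files, then record their new digests."""
    todo = changed_files(paths, last_pushed)
    if todo:
        # Hypothetical hook: stands in for a wrapper that runs rdist
        # against the master copies. It is not part of the original post.
        push(todo)
        for p in todo:
            last_pushed[str(p)] = file_digest(p)
    return todo
```

Called from cron with the list of key files (/etc/group, /etc/passwd, and so on), this does nothing on most runs and pushes only after the daily account changes. rdist does its own comparison between hosts, so the digest check is just a cheap way to avoid starting a transfer at all.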
What about the /tcb/files/auth/[a-z]?
Hmm...
Store the /tcb/files/auth/[a-z] files on the SW300 in /sys/auth/[a-z]
and mount /sys/auth on top of /tcb/files/auth.  This should work
normally on the 5/250.  Now NFS export /sys/auth to the 4/275 and mount
it on top of the 4/275 /tcb/files/auth.  I have not tried this but
expect smooth updates from both systems using network file locks if
necessary.  If the 5/250 goes down (remember the beer), then DECsafe
now makes the SW300 disks local to the 4/275 and the /sys/auth disk is
now mounted on top of the 4/275 /tcb/files/auth mount point.
The only problem I see with this scheme is if the 4/275 is in single
user mode and it can't mount /sys/auth on /tcb/files/auth, then root
has no /tcb/files/auth/r/root.  The solution is to always have
/tcb/files/auth/r/root exist on both the 5/250 and 4/275.  The
end result is the REAL /tcb/files/auth contains only one entry -- r/root.
If the 4/275 is in multi-user mode, then the /tcb/files/auth/r/root
is replaced with the NFS-exported /tcb... files.
The key is to have /tcb/files/auth/r/root updated prior to the 
mount point being covered.  It should be done on both systems.
This is easy.  After mounting /sys/auth on /tcb/files/auth, copy 
/tcb/files/auth/r/root to /tcb/files/root.  Then, during each boot,
copy /tcb/files/root into the real /tcb/files/auth/r/root before
mounting /sys/auth on /tcb/files/auth. 
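The ordering described above (save r/root while the shared tree is mounted, restore it into the real directory at boot, and only then mount) can be sketched as follows. This is an illustration under assumptions: the function names are invented here, and `mount_shared` is a hypothetical hook standing in for the real mount of /sys/auth, which on the actual systems would be done by the boot scripts.

```python
import shutil
from pathlib import Path
from typing import Callable


def save_root_entry(auth_dir: Path, saved_copy: Path) -> None:
    """With the shared tree mounted, stash the current r/root outside the mount point."""
    shutil.copy2(auth_dir / "r" / "root", saved_copy)


def boot_prepare_auth(
    auth_dir: Path,
    saved_copy: Path,
    mount_shared: Callable[[], None],
) -> None:
    """At boot: refresh the local r/root first, then cover the directory with the mount."""
    local_root = auth_dir / "r" / "root"
    local_root.parent.mkdir(parents=True, exist_ok=True)
    if saved_copy.exists():
        # Must happen before the mount; afterwards the real directory is hidden.
        shutil.copy2(saved_copy, local_root)
    # Hypothetical hook: stands in for mounting /sys/auth on /tcb/files/auth.
    mount_shared()
```

With `auth_dir` as /tcb/files/auth and `saved_copy` as /tcb/files/root, the first function matches the "after mounting" copy and the second matches the "during each boot" copy. If the mount then fails or is skipped in single-user mode, the real directory still holds a current entry for root.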
Does anyone see any problems with this scheme?
Does anyone have a large user base and have to solve the same problem?
Does anyone use C2/NIS with many concurrent users without problems?
Thank you for your consideration, time, and patience.
RESPONSES:
--------------------------------------------------
It IS possible to have NIS set up such that ASE can switch the master
server from one machine to the other as an ASE service. We've tested
that solution for Jon Buchanan _at_ SKA.
(I don't think it's officially supported, but it works just fine.)
Pascal Pederiva
--------------------------------------------------
I have a configuration close to yours, and I am in the process of implementing
C2/NIS. I would appreciate it if you kept me posted about your findings.
I like your idea of rdist.
The version that I have supports directory duplication, it even removes files 
that are not on the main source directory. That will help you with the 
/tcb/files/auth/[a-z]
The version that I have is:  rdist-6.1.0 
The current official version of rdist is available via anonymous ftp
on usc.edu under /pub/rdist.  The current version is always
retrievable as file "/pub/rdist/rdist.tar.gz".
The following, from the man page, explains one of the options:
  -R  Removes extraneous files. If a directory is being updated, any files
      that exist on the remote host that do not exist in the master directory
      are removed.  This is useful for maintaining identical copies of
      directories.
Please keep me posted; I am implementing DEC-Safe/C2/NIS as we mail each other.
By the way, I am an alumnus of George Mason; I was in the MSIS.
 
Carlos Touzard
--------------------------------------------------
I received the document, and DEC have recognized that this is a problem.
I am looking at Tivoli, it does the job and more, BUT IT IS REALLY EXPENSIVE.
OK, this is what is going on:
After talking to DEC about NIS/ASE, they propose the same thing you are doing.
We mounted the /tcb directory on all the machines (using ASE); it works fine.
We also attempted to put a copy of /etc/passwd in this mounted directory and
make a link to the one in /etc, so that when you update the one in /etc it
updates all of them. But there was a little problem: the passwd command
breaks the link and rewrites the file.
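One likely reason a link gets broken this way is that the updating program writes a new file and renames it over the old name, which replaces the link itself and leaves its target alone. The sketch below only demonstrates that general behaviour. It rests on two assumptions not stated in the post: that the link is a symbolic link, and that passwd updates by write-then-rename. The file names are stand-ins.

```python
import os
import tempfile
from pathlib import Path


def rewrite_via_rename(path: Path, new_text: str) -> None:
    """Update a file by writing a sibling temp file and renaming it over the original."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(new_text)
    # rename() replaces the directory entry at `path`; if that entry is a
    # symlink, the symlink is overwritten and its target is left untouched.
    os.replace(tmp_name, path)


def demo(workdir: Path) -> dict:
    """Show what happens to a symlinked file after a write-then-rename update."""
    shared = workdir / "shared_passwd"  # stands in for the copy on the shared mount
    local = workdir / "etc_passwd"      # stands in for /etc/passwd
    shared.write_text("root:*:0:0::/:/bin/sh\n")
    os.symlink(shared, local)
    rewrite_via_rename(local, "root:*:0:0::/:/bin/sh\nnew:*:100:100::/:/bin/sh\n")
    return {
        "local_is_still_link": local.is_symlink(),
        "shared_was_updated": "new:" in shared.read_text(),
    }
```

After `demo()` runs, the local name is a regular file holding the new contents and the shared copy is unchanged, which matches the symptom reported here. A hard link would break the same way, since the rename gives the name a new inode.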
So at the present time we wrote a shell script using rdist to distribute all the
files that get affected by the passwd, XIsso and XSysAdmin commands.
Changing passwords is OK; we want to automate what happens when you create a new
account or change user data (fields, shell, etc.) in the passwd file.
We have asked them to work around the broken link and automate this
process.
DEC is coming this week to work at my office; I will let you know the final
solution.
Carlos Touzard
--------------------------------------------------
A crucial error in your trashing of NIS...
NIS slave servers do not 'hang' when their master is down.
Their current idea of the NIS database just stays static.
I.e. no changes can be reliably made to the database
until another master is set up and all the slaves' databases are copied from it.
Typically this would be done by keeping the NIS map source files
replicated on one or more of the slave servers so it/they can be quickly
switched to being the master.
Another possibility is to keep two masters up all the time.
If their database files can be kept in sync (the difficulty of this
depends on whether users are allowed to change their own passwd entries and
how dynamic the other NIS maps are), this will cause no significant
problems, and make switch-over even faster (and maybe even automatic,
though I haven't experimented with this).
NIS has many other security and performance problems, 
so I would not suggest it be used in a
secure environment (C2), but hanging when the master is down isn't one of them.
Belonis
--------------------------------------------------
Sounds like you have your ducks in a row.
I have no knowledge of C2 hacks that may have been done to NIS.
I don't use and have never seen C2 in operation.
But I don't understand how they could cause 'slaves' to feed anything back to
the master.  All changes should be made directly to the master
and it should feed to the slaves.
The major reason for NIS is to have a single master server.
The major reason for slaves is to not have to depend on that master
being up all the time.
It sounds like bogosity in the implementation.
I.e. non-NIS activity hanging NIS.  Or maybe for performance reasons,
they are making changes on the machine originating the changes directly
instead of waiting for the master to update like the spec requires.
They could also be keeping auxiliary information like 'last login' time
etc. in the master password file, and want to update it synchronously.
You need not reply.  I'm just clarifying my statement.
Belonis
--------------------------------------------------
We have faced the same issues and have purchased a software package that solves
the problem nicely.  It is called BoKS and is manufactured by Dynasoft in Sweden
and marketed by a company called Securix, in Burlingame, California. BoKS has
true 24x7 availability, and allows you to manage your userbase centrally.  It also
provides a very nice admin GUI that will let you control user access to servers
by network service, i.e., telnet, rsh, rlogin, su, xdm, ftp, login, etc.  We are
using it on our DEC Alpha platforms and it seems to be doing very well.  If you
are in a position to buy something, I heartily recommend it.
You can get info on BoKS from Securix, Inc,
Burlingame, CA.
at 415-343-8999 
Chuck Jones
--------------------------------------------------
Received on Wed Oct 04 1995 - 15:48:55 NZDT
