----

> 3) a side question: has anyone had problems with Digital UNIX 4.0a
> on the 4100 series? At any rate, are there any known patches which
> are required (or generally are considered a good idea) on this release?
> We know about the ping patch already.

As of 11/20 there were 44 'generally released' v4.0a patches; ftp
atlanta.service.digital.com:/pub/patches/osfv40a/README for the summary.
There are likely at least a half dozen which may be applicable to your
configuration (mileage varies). If you conclude there are some you want,
then ftp patches.tar.Z. If you need some tools to de-consolidate
patches.tar.Z, ftp raven.alaska.edu:/pub/sois/UA_DUtools.tar.Z

I think by early next year Digital plans to have the patches and READMEs
available on the web.

----

I'll point out that soft mounts are somewhat dangerous, because there may
be programs that blindly expect writes to work and don't cope when they
don't. Hard mounts with the interrupt feature are safer.

You need to track down what is being saturated; that may suggest its own
solution. Since the network doesn't seem to be the problem and the login
server seems OK, network and memory bottlenecks are unlikely. That leaves
I/O and CPU usage. What is the CPU utilization on the server? What is the
I/O rate and bandwidth utilization of the array? With enough I/O, all
those fast write caches between the client and the array will saturate,
and at that point writes to the array could slow down considerably.

----

Pine is somewhat inefficient when reading large mail files, since it reads
the whole thing into memory. Are you sure you're not paging or swapping a
lot when the system gets slow?

----

Which version of pine are you using? With 3.91 there is a noticeable delay
(about 4-5 seconds) when you open or close a folder. In version 3.95 the
delay disappears completely.

Anyway, if it is an NFS problem, you can try to configure pine to use IMAP
instead of accessing the mail folders directly on an NFS-mounted drive.
IMAP is, IMHO, a much better protocol than POP, and there is an IMAP
server included in the pine distribution.

----

> Node B: mail hub. (2 300MHz CPU, 3 DE500 ethernet, 500MB RAM).
> This node handles all SMTP, IMAP, and POP processing for the users
> on node A, and NFS-serves a 16GB RAID 5 array to the login server
> and multiple workstations scattered within a single building (all are on
> 10MB ethernet). This system also has a Prestoserve 8MB module
> that enhances NFS on the mailspool disk. This system runs 15 nfsiods
> and 64 nfsds.
>
> MAXUSERS on both nodes is at 1024.

This seems kind of high. MAXUSERS ties up a lot of resources, and you only
need one per user who will be logged in simultaneously. On an NFS server
this could be as low as 64, 128 or 256, since nobody logs in to the
system. On a user system, it depends on how many people will be logged in
at once.

Does this RAID use AdvFS? If so, AdvFS is parallelized in 4.0a and the
support is excellent; in 3.2x the support is weak and AdvFS can only be
used by cpu0. AdvFS is a great performance boost for many reasons, and it
also ensures that if your power dies you won't sit through an fsck that
eats up your drive.

Here are some things to look at: iostat 1, vmstat 1, swapon -s,
nfsstat -rs / -rc. See below.

Look for:

$ iostat 1
      tty        fd0        re0        re1        dk3        cpu
 tin tout   bps  tps   bps  tps   bps  tps   bps  tps  us ni sy id
   0    8     0    0    70    4   990   60     0    0   4  0 16 80
   0   58     0    0     8    1   587   61     0    0   5  0 23 72
   0   59     0    0   111    8  1654   71     0    0   7  0 33 60

cpu (id) is the percentage of CPU idle, and bps for a disk is how busy
that disk is.
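If you'd rather not eyeball it, a rough one-liner like the following will
flag the busy intervals. The field numbers are tied to the column layout
in the sample above (re0 bps is field 5, cpu idle is field 14) and the 20%
idle threshold is arbitrary, so adjust both for your own device list:

  $ iostat 1 | awk '$14 ~ /^[0-9]+$/ && $14 < 20 { print "only " $14 "% idle, re0 at " $5 " bps" }'

Anything persistently below the threshold (or a disk pinned at its maximum
bps) points at the CPU or that spindle.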
vmstat 1 tells you free memory; multiply free by 8192 (the page size) to
get bytes.

Virtual Memory Statistics: (pagesize = 8192)
  procs       memory         pages                               intr        cpu
  r   w   u  act free wire  fault  cow  zero react  pin pout    in  sy  cs us sy id
  5 103  22  24K  29K 9065   720M 212M  157M   371 142M    0   248  8K 511  4 16 80

For example, here I have 237MB of memory free (29K pages x 8192 bytes).

$ swapon -s
Swap partition /dev/re0b (default swap):
    Allocated space:       183172 pages (1431MB)
    In-use space:             537 pages (  0%)
    Free space:            182635 pages ( 99%)

Total swap allocation:
    Allocated space:       183172 pages (1431MB)
    Reserved space:         16957 pages (  9%)
    In-use space:             537 pages (  0%)
    Available space:       166215 pages ( 90%)

Look at in-use space and reserved space. In-use space is swap actually
being used; reserved space shows up because, by default, Digital UNIX
assigns swap space to a process before it attempts to use the memory, to
ensure that there are always enough system resources.

nfsstat -rc (for the client) and nfsstat -rs (for the server): see whether
you have a large number of timeouts, badxids, retrans, etc. Depending on
the combination, this could mean too many nfsds, not enough nfsds, or a
bandwidth problem. Also look at netstat -ian and see whether you have a
very large number of collisions.

> There's a dedicated 100MB duplexed link between these two nodes--
> said link handles ONLY the NFS traffic between the two. Other systems
> talk to these systems via other ethernet channels.
>
> Other misc. nodes handle Web, administrative, printing, and other functions.
> They're not really a factor in this situation.
>
> When we peak out at 420-430 users on the login server (75% of
> whom are running Pine and therefore accessing their inbox files via
> NFS), mail performance drops off badly (1-2 minutes to open the

420-430 users is a lot on a system with 500MB of memory. Consider that a
shell itself takes a meg or two of memory, and pine takes more depending
on how much mail someone is reading. As soon as a system has to depend on
swap, it's never going to be the same.

> performance on the login server (i.e. faster access to the NFS-mounted
> mail directories? We've experimented with hard vs. soft mounts (the
> latter is preferred so the entire system doesn't hang when the mail
> server goes away) and timeouts on the mount with some success.
> Other ideas would be useful.

We mount NFS over FDDI and it works great. Our mounts are normal NFS v3
mounts.

> 2) does anyone have a set of "preferred" or suggested sysconfig
> parameters for either of these scenarios? Digital tell me that they're
> working on recommended parameters for various types of servers,
> but so far no data.

There is very little tuning that needs to be done (we investigated this
for a long, long time), but here are some things to look at:

proc:
        max-proc-per-user = 1000
        max-threads-per-user = 1000
        maxusers = 256

This says that each user can have at most 1000 processes; you may be able
to set it lower on your system.

vm:
        ubc-minpercent = 2
        ubc-maxpercent = 10
        ubc-borrowpercent = 2

ubc controls how much of memory can be used for file buffers. By default
the maximum is much, much higher and the system can eat a ton of your
memory just for buffers. The values above say: use a minimum of 2% of
memory for buffers, a maximum of 10%, and borrow in steps of 2%.
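In case it saves someone a trip to the manual: those proc: and vm: stanzas
go in /etc/sysconfigtab as written above and are picked up at the next
reboot. Before changing anything it's worth checking what a box is
currently running; the grep patterns here are just a convenience:

  $ sysconfig -q vm | grep ubc
  $ sysconfig -q proc | egrep 'max-proc|max-threads|maxusers'

I believe some of the vm attributes can also be changed on a running
system with sysconfig -r (e.g. sysconfig -r vm ubc-maxpercent=10), but
check the sysconfig(8) manpage before trusting me on which attributes are
runtime-tunable.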
> 3) a side question: has anyone had problems with Digital UNIX 4.0a
> on the 4100 series? At any rate, are there any known patches which
> are required (or generally are considered a good idea) on this release?
> We know about the ping patch already.

Digital UNIX 4.0A is excellent. There are a small number of minor patches
in addition to the ping patch that you will want to install, but they're
minimal. Our system has been rock solid since day 1 without a problem.

----

Update: the problem continues despite some very good suggestions. At this
point I've optimized the servers themselves about as much as is currently
possible; we're down to what I believe is a problem in the SCSI
subsystem--we're seeing timeouts on the built-in kzpaa on a fairly
frequent basis (anyone know if there's a known bug in 3.2G with this
controller???) and I'm trying to get a replacement from DEC. It appears
that the controller is either saturated (though iostat doesn't suggest
that) or flaky. I want to swap in a replacement to see if that helps; if
not, I have a kzpsa on order (note: this controller handles the mailspool
RAID array).

The optimizations done so far include:

1) updating the firmware on our Viper RAID array
2) decreasing ubc-maxpercent and ubc-minpercent to good values (they were
   way too high); they're now 1 and 2 on the mail server, and 1 and 5 on
   the login server
3) increasing timeo= from 300 to 600 on the soft mounts (example mount
   lines below)
4) increasing bufcache to 5 on the mail server

Unfortunately, at least for the time being, we're stuck with using
NFS-mounted mail spools in this configuration. As soon as we can come up
with a better plan, we'll go with it. I think that once I replace the
kzpaa controller (or find a bugfix for these timeouts & resets) we'll be
in better shape.

Thanks,

Dick Joltes
Manager, UNIX Systems & Multiplatform Services
Harvard Arts & Sciences Computer Services
joltes_at_fas.harvard.edu
http://www.fas.harvard.edu/~joltes
voice: 617-495-9281   fax: 617-495-1210
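----

For reference, the mount variants being kicked around in this thread look
roughly like this on the command line. The hostname and paths below are
made up, and timeo= is in tenths of a second; check mount(8) on your
release before copying anything:

  # soft mount with the longer timeout mentioned above
  mount -t nfs -o soft,bg,timeo=600 mailhub:/var/spool/mail /var/spool/mail

  # the safer alternative suggested earlier: hard mount, but interruptible
  mount -t nfs -o hard,intr,bg mailhub:/var/spool/mail /var/spool/mail

A soft mount hands errors back to the application once the retries run
out, which is exactly the failure mode warned about earlier; hard,intr
keeps retrying but still lets you interrupt the stuck process.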