Summary: System Management for Digital UNIX

From: Chua Koon Teck <koonteck_at_singnet.com.sg>
Date: Thu, 13 Jul 1995 08:58:23 +0800 (SST)

Hi

Below are two of the replies I received to my query on system
tuning for Digital UNIX.

Thanks to Allan Small and Crispin Harris for their replies.

My original question was:

On Fri, 30 Jun 1995, Chua Koon Teck wrote:

> Hi
>
> As this list consists of many Digital UNIX system managers, I would like
> to seek different advice for troubleshooting, planning and implementing
> expansion of Digital UNIX system.
>
> My questions are as follows :
>
> 1. As there can be many possible causes for the slowness, such as
> network bottleneck, memory, I/O, what is the most accurate method to
> determine which is the greatest contributor to the system slowness?
>
> 2. Is there any technology in Digital to distribute the system load
> across more than one server ? I heard that Digital has got the
> AdvantageCluster technology but it is not available yet.
>
> 3. Besides distributing the system load across servers to improve
> system performance, are there any other methods of improving system
> performance?
>
> 4. What can be the possible cause of a network bottleneck ? For
> example, the telnet application uses tcp to establish connection and
> provide the data transfer mechanism, is there any limitation on the data
> transfer.
>
> 5. Is there any limitation on the ethernet card of the DEC server ?
> Since this ethernet card is the interface between the server and the
> network, I am suspecting that this could be one of the network bottlenecks.
>
> 6. How to resolve I/O bottleneck in Digital UNIX and server ?
>
> Well, I would also like to hear any other opinions on system management
> which I might have missed out in this mail.
>
>
> Thank you.
>
> Have a nice day.
>
>
>
> Regards
>
>
> Chua Koon Teck
> koonteck_at_singnet.com.sg
> SingNet
> URL="http://www.singnet.com.sg/"
> Singapore Telecom
>
>

================================================================
>From small_at_gidday.enet.dec.com Thu Jul 13 08:50:39 1995
Date: Thu, 29 Jun 95 20:03:46 PDT
From: Allan Small <small_at_gidday.enet.dec.com>
To: koonteck_at_singnet.com.sg
Subject: RE: System Management for Digital UNIX

Hi,

>1. As there can be many possible causes for the slowness, such as
>network bottleneck, memory, I/O, what is the most accurate method to
>determine which is the greatest contributor to the system slowness?
>

The best place to start is with the vmstat, iostat and netstat utilities.
You can find a good description of these and other UNIX monitors in 'DEC
OSF/1 System Tuning and Performance Management Guide' (part of the Digital
UNIX docset).

The DEC PS software is also available as a layered product. It can be used
to identify potential performance problems.

>2. Is there any technology in Digital to distribute the system load
>across more than one server ? I heard that Digital has got the
>AdvantageCluster technology but it is not available yet.

Yes, the DECsafe ASE environment can be configured to load balance services
(at startup time) between two systems. You can also manually migrate
services between systems if you wish. It is available now.

>5. Is there any limitation on the ethernet card of the DEC server ?
>Since this ethernet card is the interface between the server and the
>network, I am suspecting that this could be one of the network bottlenecks.

The bottleneck is likely to be the network itself. Check the number of
collisions on the ethernet adaptor.
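
As a rough illustration of the collision check, the sketch below turns two
snapshots of an interface's cumulative counters into a collision percentage
for the interval between them. It assumes the counters are read off a
netstat -i style listing with output-packet and collision columns; the
function name and the sample figures are hypothetical, and no particular
threshold is implied.

```python
def collision_rate(before, after):
    """Percentage of packets sent between two snapshots that collided.

    Each snapshot is an (output_packets, collisions) pair of cumulative
    counters, as assumed to be reported per interface.
    """
    sent = after[0] - before[0]
    collided = after[1] - before[1]
    if sent <= 0:
        return 0.0          # nothing sent in the interval, nothing to report
    return 100.0 * collided / sent


# Hypothetical counters taken some minutes apart on the same interface.
earlier = (1_000_000, 20_000)
later = (1_250_000, 25_000)
print(f"{collision_rate(earlier, later):.1f}% of output packets collided")
```

Working from the difference between two snapshots, rather than the raw
totals, shows the rate during the busy period instead of the average since
boot.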

Hope this helps
Allan

===================================================================

>From crispin%itd.dsto.gov.au%augean.ua.oz_at_communica.oz.au Thu Jul 13 08:50:45 1995
Date: Fri, 30 Jun 1995 12:03:24 +0930 (CST)
From: Crispin Harris - Communica consultant
     <crispin%itd.dsto.gov.au%augean.ua.oz_at_communica.oz.au>
To: koonteck%singnet.com.sg%augean.ua.oz_at_communica.oz.au
Subject: Re: System Management for Digital UNIX

> 1. As there can be many possible causes for the slowness, such as
> network bottleneck, memory, I/O, what is the most accurate method to
> determine which is the greatest contributor to the system slowness?

With the SysV environment installed, you can use the sar utility
to gather some very powerful statistics about your system usage
and load levels. These range from things as nebulous as 'cpu utilisation'
to disk load, memory paging, network packets, etc.

It is still not all that useful for finding out how much time is being
spent _servicing_ network requests.

Alternatively, you can have a look at vmstat and iostat as a method of
determining system load statistics.

Reading the output from these takes a fair degree of skill, and is
better thought of as an art than a science.

> 3. Besides distributing the system load across servers to improve
> system performance, are there any other methods of improving system
> performance?

That really does depend on what the system is doing, and where the
majority of the load is being generated. If the load is disk based,
then you can play with multiple SCSI buses, RAID-4, striping, mirroring,
disk load balancing and so forth. If the load is memory/swap based,
then adding more memory is a good idea.
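
To illustrate why striping spreads disk load, here is a minimal sketch of
how a striped volume might map a logical block number to a disk and an
offset on that disk. The stripe unit size, the disk count and the function
name are assumptions for illustration, not the layout of any particular
Digital UNIX volume manager.

```python
from collections import Counter


def locate_block(block, stripe_unit, disks):
    """Map a logical block to (disk index, block offset on that disk)."""
    # Which stripe unit the block falls in, and its position inside that unit.
    stripe_number, within = divmod(block, stripe_unit)
    disk = stripe_number % disks      # stripe units rotate across the disks
    row = stripe_number // disks      # complete rows of units before this one
    return disk, row * stripe_unit + within


# A sequential read of 24 blocks over 3 disks with a 4-block stripe unit.
load = Counter(locate_block(b, 4, 3)[0] for b in range(24))
print(dict(load))
```

In this sketch a long sequential transfer lands evenly on all three disks,
which is the effect striping is meant to achieve.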

> 4. What can be the possible cause of a network bottleneck ? For
> example, the telnet application uses tcp to establish connection and
> provide the data transfer mechanism, is there any limitation on the data
> transfer.

There are a number of ways in which a network interface can be a bottleneck
and these are best described by looking at the path that network traffic
takes when entering the system:

Cable <-> Card <-> packets <-> IP stack <-> TCP stack <-> Application
+-----1------+     +--2--+     +---------3----------+     +----4----+

1) Hardware layer.
 Ethernet cable: could have a high collision rate. This would cause the
network to 'feel' slow and unresponsive. It is easily checked by comparing
ping times from various machines at varying load times.
 Ethernet card: 2 points
   - Ethernet hardware for coping with packets. This is the rate at which
the card itself can process an incoming/outgoing ethernet packet.
Digital have a reputation for being good but not blinding in this respect.
   - I/O with system architecture. This is the rate at which the ethernet
card can transfer packets to the operating system. Each packet generates
a hardware interrupt, which must be serviced quickly. Digital have a good
reputation for servicing network traffic quickly, and with a minimum of
fuss.
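
Since each packet is said to generate an interrupt, the packet rate puts an
upper bound on the interrupt load from one interface. The sketch below works
out the maximum frame rate of a 10 Mbit/s Ethernet, assuming the standard
8 bytes of preamble and 12 byte-times of inter-frame gap per frame. The
function and constant names are made up for illustration.

```python
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times


def max_frames_per_second(link_bits_per_second, frame_bytes):
    """Upper bound on frames per second for a given frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bits_per_second / bits_on_wire


# Smallest and largest standard Ethernet frame sizes.
for size in (64, 1518):
    print(size, int(max_frames_per_second(10_000_000, size)))
```

Under these assumptions a link saturated with minimum-size frames carries
roughly 14,880 frames per second, against roughly 812 for maximum-size
frames, so small-packet traffic such as telnet keystrokes is the expensive
case per byte moved.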

2) Processing Ethernet Packets.
 This is mostly buffer manipulation and memory moves. The packet is first
checked for its type, and then handed to the correct stack to handle the
rest of the processing. On some systems this can take a considerable time,
as the packet may require as many as 2 or 3 memcopy/memmove operations.
This is not a problem that OSF/Alpha suffers from to any great extent.

3) Protocol layer.
 This is where we actually start to get into the intensive stuff. During
the IP stack processing, the packet may need to be queued to wait for
the rest of the IP transaction to arrive (this is because of the
possibility of fragmented, or split, transactions). Anywhere between 1 and
4 or 5 memmove/memcopy operations may be necessary in this phase. Again,
OSF/Alpha avoids the BIG overheads that sometimes occur here in other OSes.
 During TCP stack processing we also need to worry about the sequence of
inbound packets and sliding window transfers, as well as redundancy
checking to ensure that the transfer has not been garbled.
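
The redundancy check mentioned above can be illustrated with the 16-bit
ones' complement checksum that TCP and IP use (described in RFC 1071). This
is a minimal sketch over a byte string, not the kernel's implementation, and
it leaves out the pseudo-header that real TCP includes in the sum.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum of data, as an integer."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF                   # ones' complement of the folded sum


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(hex(checksum))

# The receiver sums the payload together with the checksum; 0 means intact.
print(internet_checksum(payload + checksum.to_bytes(2, "big")))
```

The receiver repeats the same sum over the data plus the transmitted
checksum, and any non-zero result means the segment was damaged in transit.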

The protocol layer is the point at which it is often possible to have
a bottleneck, mostly because of the need to avoid race conditions and
device contention problems. It is in this area that SysV-derived
UNIX products often have networking bottlenecks (especially on
multi-processor machines). Often this contention is handled by the use
of mutexes (mutual exclusion locks). From the Sun Microsystems man page
on mutex: "They are typically used to ensure that only one thread
executes a critical section of code at any one time (mutual exclusion)."

The use of these sorts of constructs in this portion of the operating
system is very important, and often not tested thoroughly enough.
Unfortunately, there is rarely anything that the System Administrator
can do to change the situation.
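
A minimal sketch of the mutual exclusion idea, using Python's threading.Lock
in place of a kernel mutex: several threads update a shared packet counter,
and the lock ensures that only one of them is inside the critical section at
a time. The class name, thread count and packet count are arbitrary. This
illustrates the construct only, not how any UNIX network stack is written.

```python
import threading


class PacketCounter:
    """Shared counter whose update is protected by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def record(self):
        with self._lock:        # critical section: one thread at a time
            self.count += 1


def run(threads=4, per_thread=10_000):
    """Have several threads record packets and return the final count."""
    counter = PacketCounter()

    def worker():
        for _ in range(per_thread):
            counter.record()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.count


print(run())
```

Every call to record() has to wait its turn for the same lock, which keeps
the count correct but also shows where the contention comes from when many
processors handle packets at once.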

> 5. Is there any limitation on the ethernet card of the DEC server ?
> Since this ethernet card is the interface between the server and the
> network, I am suspecting that this could be one of the network bottlenecks.

Hmmm, I am not sure. (Sorry :-(

> 6. How to resolve I/O bottleneck in Digital UNIX and server ?

One thing to note is that anything that produces a hardware interrupt
is in fact queue jumping, in that the scheduler cannot say 'wait your
turn' to a serial device. (Or rather it has only a very limited ability
to do so.) This means that a machine that is suffering a large number
of interrupts is not operating at anywhere near its peak performance.

TTY terminals should be put on terminal servers for this reason.
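
As a back-of-the-envelope illustration of the interrupt load from directly
attached terminals, the sketch below assumes one interrupt per character and
10 bits on the wire per character (8 data bits plus start and stop bits).
Both are assumptions, and the figure is a worst case with every line busy at
full speed; serial hardware with buffering would interrupt less often.

```python
def serial_interrupts_per_second(baud, terminals=1, bits_per_char=10):
    """Worst-case interrupts per second, assuming one interrupt per character."""
    return terminals * baud // bits_per_char


print(serial_interrupts_per_second(9600))                # one busy 9600 baud line
print(serial_interrupts_per_second(9600, terminals=32))  # 32 busy lines
```

A terminal server collects those characters itself and forwards them in
network packets, so under these assumptions the host sees far fewer
interrupts for the same terminal traffic.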


I hope that this helps.

Regards,
crispin harris
Received on Thu Jul 13 1995 - 03:05:02 NZST
