FW: SUMM: (partial) System monitoring from Unix Administrator on 1997-12-06 (tru64-unix-managers)

From: Unix Administrator <unixadmin_at_gmd.com.pe>
Date: Fri, 05 Dec 1997 10:32:39 -0500

> Our thanks to:
>
> Gary Jarrell
> alan_at_nabeth.cxo.dec.com
>
> Explanations for item 1: from Alan
> 1. Get a version of lsof. It will certainly show how many
> files are in use, what files they are and what the
> associated processes are.
>
> The file table may be dynamic, so the size may change. This
> makes it hard to see how much is in use, because it would be
> as big as necessary. Lots of formerly static tables are
> dynamic in Digital UNIX. I just don't know which are which.
>
> Also, Gary suggests using df -i.
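As a rough illustration of Alan's lsof suggestion, the sketch below tallies open files per process from lsof's field output (`lsof -F pf`, where each `p` line starts a process and each `f` line is one file descriptor). The function name `count_open_files` and the sample text are made up for this example, and it assumes an lsof build that supports `-F`. It does not answer the second half of the question (how many the system is allowed to use).

```python
from collections import Counter

def count_open_files(field_output: str) -> Counter:
    """Count 'f' (file descriptor) lines under each 'p' (process ID) line."""
    counts = Counter()
    pid = None
    for line in field_output.splitlines():
        if line.startswith("p"):
            pid = int(line[1:])          # a new process section begins
        elif line.startswith("f") and pid is not None:
            counts[pid] += 1             # one open file for the current process
    return counts

# Hypothetical sample text. On a live system the text might be captured with
# subprocess.run(["lsof", "-F", "pf"], capture_output=True, text=True).stdout
SAMPLE = "p101\nfcwd\nf0\nf1\np202\nf3\n"

per_process = count_open_files(SAMPLE)
total_open = sum(per_process.values())
print(dict(per_process), total_open)
```

Summing the per-process counts gives a system-wide figure, though the same file opened by two processes is counted twice.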
>
> Explanation for item 2:
> Gary says that the messages are pretty clear, and suggests taking a
> look at the related RFCs.
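For reference, the TCP connection states are defined in RFC 793. The sketch below pairs each state with a one-line paraphrase and tallies the states seen in netstat-style output. The state spellings follow common BSD-style netstat output and may differ slightly between systems; the helper name `tally_states`, the sample lines, and the host names in them are invented for illustration.

```python
from collections import Counter

# One-line paraphrases of the TCP states defined in RFC 793.
TCP_STATES = {
    "LISTEN": "waiting for a connection request from a remote peer",
    "SYN_SENT": "connection request sent, waiting for the reply",
    "SYN_RCVD": "request received and answered, waiting for the final ACK",
    "ESTABLISHED": "connection open, data can flow both ways",
    "FIN_WAIT_1": "local side closed, FIN sent, waiting for ACK or peer's FIN",
    "FIN_WAIT_2": "local close acknowledged, waiting for the peer's FIN",
    "CLOSE_WAIT": "peer closed, waiting for the local application to close",
    "CLOSING": "both sides closed at once, waiting for the ACK of our FIN",
    "LAST_ACK": "peer closed first, our FIN sent, waiting for its ACK",
    "TIME_WAIT": "closed, lingering so that stray segments can expire",
    "CLOSED": "no connection",
}

def tally_states(netstat_output: str) -> Counter:
    """Count TCP lines by the state named in their last column."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("tcp") and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts

# Hypothetical sample in the usual column layout of `netstat -a`.
SAMPLE = """\
tcp        0      0  host.telnet   client.1025   ESTABLISHED
tcp        0      0  host.smtp     peer.2301     CLOSE_WAIT
tcp        0      0  *.ftp         *.*           LISTEN
udp        0      0  *.syslog      *.*
tcp        0      0  host.telnet   client.1030   ESTABLISHED
"""

state_counts = tally_states(SAMPLE)
for state, n in state_counts.most_common():
    print(f"{n:3d} {state:12s} {TCP_STATES[state]}")
```

A steadily growing CLOSE_WAIT count usually points at a local application that is not closing its sockets.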
>
> Explanations for item 3: A nice Digital UNIX-internals summary from
> Alan:
>
> 3. A CPU can run only one process at a time. When the CPU switches
> processes that's a context switch. There may also be context
> switches when going from kernel mode to user mode and back, but
> that merely switches a particular process's state. A CPU-bound
> process usually runs until its time slice runs out and then some
> other process will be run. These are involuntary context
> switches.
> When processes make system calls they often have to wait for a
> resource (I/O to complete, a kernel lock, etc) and the context
> is switched to another process while the first one waits. These
> are voluntary context switches. I don't think the kernel counts
> which are which.
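Some Unix systems do report the two kinds of context switch separately, per process, through getrusage(2); whether a given Digital UNIX release fills in those fields is not something this summary confirms. The sketch below reads them through Python's standard `resource` module. The helper name `context_switches` is made up, and the sleep and busy loop are only there to provoke one switch of each kind.

```python
import resource
import time

def context_switches():
    """Return (voluntary, involuntary) context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw

vol_before, invol_before = context_switches()

time.sleep(0.05)                      # blocking wait: typically a voluntary switch
sum(i * i for i in range(200_000))    # CPU-bound work: may be preempted

vol_after, invol_after = context_switches()
vol_delta = vol_after - vol_before
invol_delta = invol_after - invol_before
print(f"voluntary: +{vol_delta}, involuntary: +{invol_delta}")
```

On a system that does not maintain these counters, both values simply stay at zero.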
>
> Interrupts are when devices or the system itself informs the
> operating system software that something needs to be done to
> handle a particular condition. An interrupt may be more than
> a device indicating that an I/O is complete or the system
> clock indicating the passage of time. Or, it could be the
> memory subsystem indicating that it had to correct a memory
> error, the CPU indicating that some instruction failed, or
> any number of other hardware failures. Often, these hardware
> interruptions are divided into groups such as device interrupts,
> clock interrupts, unsolicited but not unexpected traps, and
> machine checks. They're all interrupts though.
>
> These are handled in a special context using the lowest level
> operating system code. Older operating systems may have done
> considerable work in the interrupt context. Recent systems
> tend to have the interrupt context do as little work as possible
> and then have a kernel thread continue handling the condition
> in normal kernel mode.
>
> I don't recall whether clock interrupts get counted in what
> vmstat reports as interrupts, though I think not.
>
> On demand-paged virtual memory systems, a page fault is simply
> an access to unmapped memory. The access generates an interrupt
> and the interrupt handler (or thread) determines if the memory
> is valid or not. When the memory is valid, the page fault
> handler determines what it needs to do to make it accessible. It
> may simply have been an access to a page that was on the free
> list or a page that was recently created, but had never had
> memory backing it. It could be a page that had been written
> to the page/swap space and needs to be paged in. There are
> many kinds of page faults and you have to look at the other
> paging statistics to see what is really going on. Page faults
> not requiring I/O are relatively cheap, but they do represent
> extra work to reference memory.
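The split between faults that need I/O and faults that do not can be observed per process on systems whose getrusage(2) reports it: `ru_minflt` counts faults serviced without I/O and `ru_majflt` counts those that required I/O. The sketch below touches freshly allocated memory and compares the counters before and after. The helper name `page_faults` and the 16 MB buffer size are arbitrary choices for illustration, and the exact numbers depend on the system.

```python
import resource

PAGE = resource.getpagesize()

def page_faults():
    """Return (minor, major) page fault counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt

minor_before, major_before = page_faults()

# Allocate fresh memory and write one byte per page so each page is touched.
buf = bytearray(16 * 1024 * 1024)
for offset in range(0, len(buf), PAGE):
    buf[offset] = 1

minor_after, major_after = page_faults()
print(f"minor (no I/O): +{minor_after - minor_before}")
print(f"major (I/O):    +{major_after - major_before}")
```

First-touch faults on new memory normally land in the minor count, which matches Alan's point that such faults are cheap but not free.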
>
>
> Thanks and regards,
> UA
>
> -----Original Message-----
>
>
> Hi all,
> Three questions about system monitoring:
>
> 1. Is there a way to see how many open file descriptors are being used
> by the system and how many the system is allowed to use?
>
> 2. When you monitor the network with `netstat -a`, in the "(state)"
> column there appear different states: "LISTEN", "ESTABLISHED",
> "FIN_WAIT", "CLOSE_WAIT", and so on. Is there a place to look up the
> meaning of these states?
>
> 3. How should we understand the info that we obtain from `vmstat`,
> specifically: context switches, interrupts and page faults? What
> should we do when there is a high percentage of system CPU (60~70%)
> in use?
>
> Any advice is welcome. For sure, I'll summarize.
> Thanks in advance
> UA
Received on Fri Dec 05 1997 - 16:27:55 NZDT
