Summary: Alpha Pauses

From: WHITTAKER, Bruce <bjw_at_ansto.gov.au>
Date: Wed, 13 Nov 1996 09:49:22 +1100

Hi folks,

Sorry that this summary has taken so long but I've been kinda slack. I actually have no solution as yet but I did have some mighty good suggestions from a number of people. The main problem is that the Pause is occurring intermittently but not often, and I'm not getting any messages in any of the system logs. The Pause also doesn't seem tied to system activity because I have seen it occur while very little was going on (I was the only user logged on so I had a good idea of the load).

My next step is to try the suggestion from alan at DEC who recommended running monitor all of the time and trying to observe the system during and after the pause to see what goes on.

Again, many thanks to those who offered help, they are.....
        alan_at_nabeth.cxo.dec.com -> Ta, will try your suggestion.
        Dave Nye (evil_at_Empire.Net) -> Thanks Dave, nothing in UERF.
                                            All SCSI cables 'look' fine too :-)
        Keith Chiles (kchiles_at_hccsf.com)->All our drives are DEC ones. They used
                                            to run on DECstations with no problems so
                                            I'm pretty sure that the termination should be fine.
                                            We may try shifting the drives to another machine
                                            and see if the 'Pause' moves with it.
        Steve Mclaughlin -> Thanks Steve, but there's nothing in ANY of the logs...
        (mclaughlin_at_nssdca.gsfc.nasa.gov)

If I work out the solution, I'll post a second summary. If I can't work it out I'll yell at DEC (or should I ask politely??)

Oh well, back to putting bugs into working programs.

Bruce Whittaker.
-----------------------------------------------------------------------------------------------------------
This is the original posting.....

Hi all,

I've been noticing a problem with our AlphaStation 200 4/166. Every now and again it seems to pause for about 5-10 seconds and I can't find out what it's related to. Has anyone else noticed a similar problem?

The machine configuration is

Memory: Bank 0 has 2x8 Meg SIMMs
                Bank 1 has 2x8 Meg SIMMs
                Bank 2 has 2x16 Meg SIMMs

Drives: 2xRZ56's (old 650 meg Drives)


I suspect that the problem is either tied to the memory or the drives. I actually lean more towards memory being the problem as it's mixed.

Can anyone help out?

-----------------------------------------------------------------------------------------------------------
Responses follow.
-----------------------------------------------------------------------------------------------------------
From alan_at_nabeth.cxo.dec.com

One possibility related to those drives is that Digital UNIX uses
a unified buffer cache; all of free memory can be the cache. If
your I/O load is heavy sequential writes, much of memory will be
used for the cache and when it comes time to flush the cache it
may take a little while. Those drives are fairly slow compared
to modern drives and could slow down the process.

The other side of the cache behavior is on sequential reads, where
lots of barely idle pages get written out to provide more cache
and then have to be paged back in when something runs.

You might want to run a real-time performance monitoring program
(something small is better) and see what else is happening when
the system pauses, or just after as the performance monitor
catches up. One choice is Monitor, available from:

ftp://www.digital.com/pub/DEC/monitor.alpha.tar.Z
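
A minimal sketch of the "something small" idea above, assuming a Python
interpreter is available on the machine (the thread does not say so). It is
not a substitute for Monitor: it only records when a pause happened and
roughly how long it lasted, so the time can be compared with other
observations. The names find_stalls and watch, and the default interval and
threshold, are made up for illustration.

```python
import time


def find_stalls(timestamps, interval, threshold):
    """Return (start, overshoot) for each gap between consecutive samples
    that exceeds the expected interval by more than threshold seconds."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        overshoot = (cur - prev) - interval
        if overshoot > threshold:
            stalls.append((prev, overshoot))
    return stalls


def watch(interval=0.5, threshold=2.0, samples=None):
    """Sleep in a short loop and print a line whenever a wake-up arrives
    much later than requested. Runs forever unless samples is given."""
    last = time.monotonic()
    count = 0
    while samples is None or count < samples:
        time.sleep(interval)
        now = time.monotonic()
        overshoot = (now - last) - interval
        if overshoot > threshold:
            # Wall-clock time is printed so the event can be matched
            # against logs or other monitoring output afterwards.
            print(f"{time.ctime()}: stalled for about {overshoot:.1f} s")
        last = now
        count += 1
```

Calling watch() in a spare window and leaving it running would give a
timestamp for each pause of more than a couple of seconds. It only shows
that this one process was not scheduled on time; it says nothing about why.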

----------------------------------------------------------------------------------------------------------------------------

From Dave Nye,


Check your uerf for scsi reset messages. I had a bad cable recently that
caused these exact same symptoms. Replaced cable, and it was gone, as were
the SCSI resets.

----------------------------------------------------------------------------------------------------------------------------

     
Bruce,

I read your posting with interest because I had the same problem a year ago on
two systems. I don't know if this is your problem, but I finally traced my
problem down to an add-on SCSI board that came from the factory with internal
SCSI termination turned off as the default. This left me with a SCSI bus that
was open on the inside.

Keith
----------------------------------------------------------------------------------------------------------------------------

Hi Bruce,

 Look at the uerf output. For example, %uerf -o full -R | more
 and see if anything is timing out (like on scsi resets, etc..).
 We had this problem too when one of our drives went bad. Hope
 this helps.

Steve Mc
mclaughlin_at_nssdca.gsfc.nasa.gov
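
Paging through a full uerf report by hand is slow, so one option is to filter
it for the words Steve mentions. The sketch below is a plain keyword search
over lines of text; it assumes nothing about the actual layout of uerf
output, and the keyword list and the name suspicious_lines are illustrative
guesses rather than anything taken from the uerf documentation.

```python
import re

# Assumed keywords: "reset", "timeout", "timed out", "time-out", "timing out".
# Adjust to whatever wording the real report uses.
PATTERN = re.compile(r"reset|timed?[ -]?out|timing out", re.IGNORECASE)


def suspicious_lines(lines, pattern=PATTERN):
    """Yield (line_number, text) for every line that matches pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")
```

To use it as a filter, loop over suspicious_lines(sys.stdin) (after an
import sys), print each pair, and pipe the uerf command above into the
script instead of into more.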



Bruce Whittaker
ANSTO - Physics Division
Phone - +61 (02) 9717 3662
Fax - +61 (02) 9717 3257
Received on Wed Nov 13 1996 - 00:57:08 NZDT
