Thanks to John P Speno for this little gem...
> Here's the problem with your DS10, I'll bet.
> If you've got a DS10 system with any KZPCM* cards, be advised that there is
> a bug in the driver for said cards. Your DS10 will eventually hang, and be
> unresponsive in most ways (the RMC will still work, so you can power cycle
> it).
> 
> The good news is that Compaq has a patch which seems to fix this problem.
> Refer to BLITZ: TD 2689.
> 
> *The KZPCM is the 2 port ultra scsi plus 10/100 NIC combo PCI card. 
I called Compaq and got the patch.  I have applied it and the machine has
been up for over 30 minutes!  Seriously, I will post again if this doesn't
fix it.  The file that the patch contained, /sys/BINARY/i2c.mod, was
in the crash dump, so I am hoping this does the trick.  Thanks to all
who had ideas.
Bryan
---------- Forwarded message ----------
Date: Wed, 8 Sep 1999 09:17:33 -0400 (EDT)
From: Bryan Rank <bryan_at_compgen.com>
To: tru64-unix-managers_at_ornl.gov
Cc: eric_at_compgen.com
Subject: Personal Workstation 433au and PWRMRG
Hi Everyone,
I have a follow-up to Paul Crittenden's PWRMGR_ENABLED post...
We have a brand new Compaq AlphaServer DS10 and the darned thing won't
stay up for more than a couple of hours. It seems prone to SCSI bus
resets, after which we lose our tape drive.
scu> show edt
CAM Equipment Device Table (EDT) Information:
    Device: CDR-8435   Bus: 0, Target: 0, Lun: 0, Type: Read-Only Direct Access
    Device: RZ2DD-KS   Bus: 2, Target: 0, Lun: 0, Type: Direct Access
    Device: RZ2DD-KS   Bus: 2, Target: 1, Lun: 0, Type: Direct Access
    Device: TLZ10      Bus: 2, Target: 6, Lun: 0, Type: Sequential Access
The SCSI card appears to be a multi-port ultra-wide card that also has a
10/100 UTP NIC on it...
Sep  8 08:35:02 goose vmunix: ata0 at pci0 slot 13
Sep  8 08:35:02 goose vmunix: ata0: ACER M1543C
Sep  8 08:35:02 goose vmunix: scsi0 at ata0 slot 0
Sep  8 08:35:02 goose vmunix: rz0 at scsi0 target 0 lun 0 (LID=0) (COMPAQ  CDR-8435         0013)
Sep  8 08:35:02 goose vmunix: scsi1 at ata0 slot 1
Sep  8 08:35:02 goose vmunix: comet0 at pci0 slot 14
Sep  8 08:35:02 goose vmunix: pci2000 at pci0 slot 15
Sep  8 08:35:02 goose vmunix: itpsa0 at pci2000 slot 0
Sep  8 08:35:02 goose vmunix: IntraServer ROM Version V2.0 (c)1998
Sep  8 08:35:02 goose vmunix: scsi2 at itpsa0 slot 0
Sep  8 08:35:02 goose vmunix: rz16 at scsi2 target 0 lun 0 (LID=1) (DEC     RZ2DD-KS (C) DEC 0306) (Wide16)
Sep  8 08:35:02 goose vmunix: rz17 at scsi2 target 1 lun 0 (LID=2) (DEC     RZ2DD-KS (C) DEC 0306) (Wide16)
Sep  8 08:35:02 goose vmunix: tz22 at scsi2 target 6 lun 0 (LID=3) (DEC     TLZ10    (C) DEC 04a8)
Sep  8 08:35:02 goose vmunix: itpsa1 at pci2000 slot 1
Sep  8 08:35:02 goose vmunix: IntraServer ROM Version V2.0 (c)1998
Sep  8 08:35:02 goose vmunix: scsi3 at itpsa1 slot 0
Sep  8 08:35:02 goose vmunix: tu2: DECchip 21140: Revision: 2.2
Sep  8 08:35:02 goose vmunix: tu2: auto negotiation capable device
Sep  8 08:35:02 goose vmunix: tu2 at pci2000 slot 2
Sep  8 08:35:02 goose vmunix: tu2: DEC TULIP (10/100) Ethernet Interface, hardware address: 00-06-2B-00-D3-44
Sep  8 08:35:02 goose vmunix: tu2: auto negotiation off: selecting 100BaseTX (UTP) port: half duplex
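For anyone reading along, the "X at Y" attach lines in the boot messages above can be turned into a parent/child map to see what hangs off the combo card. This is only a minimal Python sketch, assuming the log format shown here; the name build_tree and the regular expression are illustrative and not part of any Tru64 tool.

```python
import re
from collections import defaultdict

# Matches the "<device> at <parent> " part of a vmunix boot message.
# Lines such as "tu2: DECchip 21140" have no " at " and are skipped.
ATTACH_RE = re.compile(r"vmunix: (\w+) at (\w+) ")

def build_tree(lines):
    """Map each parent device to the children attached to it, in log order."""
    children = defaultdict(list)
    for line in lines:
        match = ATTACH_RE.search(line)
        if match:
            child, parent = match.groups()
            children[parent].append(child)
    return dict(children)

# A subset of the boot messages from this post.
SAMPLE = [
    "Sep  8 08:35:02 goose vmunix: pci2000 at pci0 slot 15",
    "Sep  8 08:35:02 goose vmunix: itpsa0 at pci2000 slot 0",
    "Sep  8 08:35:02 goose vmunix: IntraServer ROM Version V2.0 (c)1998",
    "Sep  8 08:35:02 goose vmunix: scsi2 at itpsa0 slot 0",
    "Sep  8 08:35:02 goose vmunix: rz16 at scsi2 target 0 lun 0 (LID=1)",
    "Sep  8 08:35:02 goose vmunix: rz17 at scsi2 target 1 lun 0 (LID=2)",
    "Sep  8 08:35:02 goose vmunix: tz22 at scsi2 target 6 lun 0 (LID=3)",
    "Sep  8 08:35:02 goose vmunix: itpsa1 at pci2000 slot 1",
    "Sep  8 08:35:02 goose vmunix: scsi3 at itpsa1 slot 0",
    "Sep  8 08:35:02 goose vmunix: tu2: DECchip 21140: Revision: 2.2",
    "Sep  8 08:35:02 goose vmunix: tu2 at pci2000 slot 2",
]

if __name__ == "__main__":
    for parent, kids in build_tree(SAMPLE).items():
        print(parent, "->", ", ".join(kids))
```

Read this way, the two SCSI channels (itpsa0, itpsa1) and the tu2 Ethernet interface all sit behind the same pci2000 device, and both disks and the tape drive share scsi2.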
I was happy to see Paul Crittenden's note about PWRMGR_ENABLED in the 
kernel config file.  I went to remove it, and it's not in there.
Just for completeness I looked with "dia -R -o full"
******************************** ENTRY    2 ******************************** 
####################SNIP####################
ASCII Message 
    Alpha boot: available memory from 0x138e000 to 0x1ff8e000 
    Digital UNIX V4.0F  (Rev. 1229); Thu Sep  2 18:29:31 EDT 1999  
    physical memory = 512.00 megabytes. 
    available memory = 492.03 megabytes. 
    using 1956 buffers containing 15.28 megabytes of memory 
    Firmware revision: 5.4-2 
    PALcode: Digital UNIX version 1.50-48 
    COMPAQ AlphaServer DS10 466 MHz 
We applied patch0001 for 4.0F...
KITNAME><DUV40FAS0001-19990609> OSF440
Any ideas?  We have a call open, but would like to have this thing fixed
as soon as possible.  Anyone having similar problems?
Thanks 
Bryan Rank
On Tue, 7 Sep 1999, Paul Crittenden wrote:
> This is an FYI for anyone that has an Alpha Personal Workstation 433au.  On
> this system I noticed that we were getting Disk SCSI errors at the rate of
> 1 to 2 a day.  I spoke with Compaq and they wanted to replace the disk
> drive.  Since this is our Library database system and it would have to be
> rebuilt from scratch by the vendor this was going to be a major pain.  The
> vendor, Innovative Software, stated that they had seen this problem before
> and suggested we remove the item PWRMGR_ENABLED in the system configuration
> file, rebuild the kernel, replace the old kernel with the new one and
> reboot.  I did this and as advertised the SCSI errors went away.  The
> system apparently was not that busy and the disks would spin down to save
> power and then when they were addressed by the software we would get an
> error.  I was sure thankful that this fixed the problem because it saved me
> a very late night.
> 
> Just thought I would post this in case someone else sees this problem; it
> might help you.
> 
> Paul Crittenden
> Computer System Manager
> Simpson College
> e-mail: crittend_at_simpson.edu
> 
> Y2K?  Why not 3?
> 
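Before rebuilding a kernel on this advice, it is worth checking whether the option is set at all. Below is a minimal Python sketch of that check. It works on the text of a configuration file and assumes that '#' starts a comment and that the option appears as a whitespace-separated word on its line. The function name find_option and the sample text are hypothetical; the location and exact syntax of your system configuration file should be confirmed locally.

```python
def find_option(config_text, option="PWRMGR_ENABLED"):
    """Return (line_number, line) pairs for uncommented lines naming the option."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        active = line.split("#", 1)[0]  # assumption: '#' begins a comment
        if option in active.split():
            hits.append((number, line.strip()))
    return hits

# Hypothetical sample text, not taken from a real system.
SAMPLE_CONFIG = """ident GOOSE
options PWRMGR_ENABLED
# options PWRMGR_ENABLED
"""

if __name__ == "__main__":
    found = find_option(SAMPLE_CONFIG)
    if found:
        for number, line in found:
            print(f"line {number}: {line}")
    else:
        print("PWRMGR_ENABLED is not set in this configuration text")
```

An empty result corresponds to the DS10 case above, where the option was not in the file and removing it was not an available fix.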
Received on Wed Sep 08 1999 - 15:53:01 NZST