Hello Admins,
I have an AS4100 running 4.0E with patches.
It has all 4 CPUs, 2 GB of memory,
and 2 power supplies (specifically 0 and 2).
The system panicked this morning.
The "show power" command at the SRM prompt points to an intermittent problem
with power supply 2. I am willing to believe this.
But dia says:
System type register x00000016 Alpha 4000/1200 Series
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 100. CPU Machine Check Errors
CPU Minor class 2. 660 Entry
Software Flags x0000000300000000
IOD 0 Register Subpkt Pres
IOD 1 Register Subpkt Pres
Active CPUs x0000000F
Hardware Rev x00000000
System Serial Number <Deleted>
Module Serial Number
Module Type x0000
System Revision x00000000
Machine Check Reason x0208 Fatal Environmental Event Interrupt
Environmental Entry ---> System Environmental Register Follows
======================== =====================================
Sys Environmental Regs x000017CB Function Reg<15:8>: x00000017
Failure Reg <7:0> : x000000CB
>> Invalid Pwr Supply 0 Status Bits Sequence
>> Power Supply 1 Present and Ok
>> Invalid Pwr Supply 2 Status Bits Sequence
System Fans are OK
>> PROBLEM with CPU Fan 0 and 2
Temperature is OK
PALcode Revision Palcode Rev: 1.21-26
As you can see, the decoded Environmental Register bits claim that power
supplies 0 and 2 have invalid status bits and power supply 1 (which doesn't
exist) is present and ok.
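
For anyone who wants to check the decode by hand, here is a minimal Python sketch that splits the x000017CB value into the two fields dia prints (Function Reg<15:8> and Failure Reg<7:0>) and lists which bits are set in each. The function names are hypothetical, and the meaning of each individual bit is not given in the dia output above, so the sketch only reports bit positions and does not try to name them.

```python
def split_env_register(value):
    """Split a 16-bit environmental register value into its two byte fields."""
    function_reg = (value >> 8) & 0xFF  # bits <15:8>
    failure_reg = value & 0xFF          # bits <7:0>
    return function_reg, failure_reg


def set_bits(byte):
    """Return the positions (0 = least significant) of the bits set in a byte."""
    return [i for i in range(8) if byte & (1 << i)]


# Value taken from the "Sys Environmental Regs" line of the dia output.
function_reg, failure_reg = split_env_register(0x17CB)
print(f"Function Reg<15:8>: x{function_reg:02X}  bits set: {set_bits(function_reg)}")
print(f"Failure Reg <7:0> : x{failure_reg:02X}  bits set: {set_bits(failure_reg)}")
```

Mapping those bit positions to specific power supplies or fans would need the bit definitions from the hardware documentation, which I do not have in front of me.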
So the real question is: does this suggest that the monitoring circuits
are giving false readings, or should I trust the SRM and buy a
replacement power supply?
As a follow-up question: does anyone know where I can find a quality, yet
inexpensive, replacement power supply?
Thanks!!
Ken Lawrence <mailto:lawrenk_at_wes.army.mil>
BAE Systems
USAE-Engineer Research & Development Center
Coastal & Hydraulics Lab
Unix System Administrator
601.634.3813
Received on Wed Jul 09 2003 - 21:58:48 NZST