Hello Managers,
I am having trouble with my DEC 3000 500X. It crashes occasionally, complaining about Hardware Check
Errors. I called DEC about this and they said that it was the cache memory and that the whole system board
(motherboard) would have to be replaced. They couldn't tell which level of cache it was. They also said
that neither the processor nor the L2 cache chips could simply be replaced because they can't be removed (they are
in sockets, however). I can't justify spending $5000 on an infrequent problem which they diagnosed by
deciphering the registers in my uerf output. On their recommendation I blew out all the dust from inside the
box. That didn't seem to help. Now the problem is more frequent and I need help before I commit to the $5000.
In the archives there was a mention of CAM SCSI errors caused by motherboard faults. I am also
getting these errors. Another symptom is that my cron backup routine reports bad reads. These
errors do not occur all the time. Sometimes there are a lot of them (after 16 the dump is aborted), sometimes a few,
sometimes none. These errors could be from bad files.
They occur on only one file system. These three symptoms could be related.
Attached is my uerf output (uerf -o full -R). I would really appreciate any insight. Thank you in
advance.
Uerf output:
********************************* ENTRY 1. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 1.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Oct 14 18:34:37 1996
OCCURRED ON SYSTEM gore
SYSTEM ID x00020004 CPU TYPE: DEC 3000
SYSTYPE x00000000
----- UNIT INFORMATION -----
UNIT CLASS CPU
----- KN15AA CPU 630/620 STACK FRAME -----
PROCESSOR OFFSET x00000018
SYSTEM OFFSET x00000048
BIU_STAT x0000000000000340
BIU_CMD CYCLE CLASS IS READ_BLOCK
FILL_ECC PRI. CACHE FILL FROM EXT.
_CACHE HAD ECC ERROR
BIU_ADDR x0000000000108018
PHYSICAL ADDRESS OF CACHE BLOCK WITH ERROR IS x8400
DC_STAT x0000000000000007
DC_HIT LAST LOAD OR STORE MISSED
_DCACHE
OPCODE RA FIELD - INTEGER REGISTER IS R 0.
FILL_SYNDROME x0000000000002C00 SINGLE BIT ERROR IS NO ERRORS
SINGLE BIT ERROR IS DATA BIT 05
FILL_ADDR x00000000042E1548
PHYSICAL ADDRESS OF QUADWORD WITH ERROR x2170AA
BC_TAG x0000000000404295 EXTERNAL CACHE TAG CONTROL BITS
_EXTERNAL CACHE HIT
D BIT - CACHE BLOCK DIRTY
V BIT - CACHE BLOCK VALID
TAG ADDRESS IS x214
EXTERNAL CACHE TAG CONTROL BITS TAG
_ADDRESS PARITY BIT
INT_EXC_IDENT x0000000000000000
INTERRUPT OR EXCEPTION IS NONE
********************************* ENTRY 2. *********************************
----- EVENT INFORMATION -----
EVENT CLASS OPERATIONAL EVENT
OS EVENT TYPE 300. SYSTEM STARTUP
SEQUENCE NUMBER 0.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Oct 14 12:01:14 1996
OCCURRED ON SYSTEM gore
SYSTEM ID x00020004 CPU TYPE: DEC 3000
SYSTYPE x00000000
MESSAGE LK401 keyboard, language English
_(American)
{ cropped}
********************************* ENTRY 3. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 2.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Oct 14 11:58:24 1996
OCCURRED ON SYSTEM gore
SYSTEM ID x00020004 CPU TYPE: DEC 3000
SYSTYPE x00000000
MESSAGE panic (cpu 0): Machine check -
_Hardware error
********************************* ENTRY 4. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 1.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Mon Oct 14 11:58:24 1996
OCCURRED ON SYSTEM gore
SYSTEM ID x00020004 CPU TYPE: DEC 3000
SYSTYPE x00000000
----- UNIT INFORMATION -----
UNIT CLASS CPU
----- LEP MACHINE CHECK STACK FRAME -----
PROCESSOR OFFSET x00000110
SYSTEM OFFSET x000001A0
PALTEMP1 x0000000000000000
PALTEMP2 x000C06F800000004
PALTEMP3 x0000000000000000
PALTEMP4 x000000000000000F
PALTEMP5 x0000000000000000
PALTEMP6 x000003FF80017E48
PALTEMP7 x0000000000104000
PALTEMP8 x0000000000000000
PALTEMP9 x0000000000000008
PALTEMP10 xFFFFFC00003C3050
PALTEMP11 x0000000000000000
PALTEMP12 xFFFFFC00003C33F0
PALTEMP13 xFFFFFC00003C3420
PALTEMP14 xFFFFFC00003C3480
PALTEMP15 xFFFFFC00003C31F0
PALTEMP16 xFFFFFC00003C2F00
PALTEMP17 x0000000000000000
PALTEMP18 x000000011FFFF520
PALTEMP19 xFFFFFFFF88B5FA58
PALTEMP20 xFFFFFC00004CDC10
PALTEMP21 x0000000000000000
PALTEMP22 x6068686C7C7C7C7C
PALTEMP23 x00000062000007F9
PALTEMP24 x0000000000000000
PALTEMP25 x0000000000010000
PALTEMP26 x0000000000000000
PALTEMP27 x0000000000000000
PALTEMP28 x000000000191A000
PALTEMP29 xFFFFFFFC00000000
PALTEMP30 x0000000000000001
PALTEMP31 x00000000048CBA58
EXC_ADDR x0000000080017E86
EXCEPTING OR EXECUTING INSTRUCTION DID NOT COMPLETE PC IS xE0005FA1
EXC_SUM x0000000000000000
EXC_MSK x0000000000000000
ICCSR x0000000000000000
PC0 INT ENABLED AFTER 2**16 EVENTS
PC1 INT ENABLED AFTER 2**12 EVENTS
PC0 COUNTER INPUT TOTAL ISSUES DIVIDED
_BY 2
PC1 COUNTER INPUT DCACHE MISSES
FP INSTRUCTIONS CAUSE FEN EXCEPTIONS
ADDRESS SPACE NUMBER = x0
PAL_BASE x0000000000060000
BASE ADDRESS FOR PALCODE = x18
HIER x00000000000018F0
CORRECTABLE READ ERROR INTERRUPT
_ENABLED
CPU HARDWARE INTERRUPT ENABLED ON PIN
_3
CPU HARDWARE INTERRUPT ENABLED ON PIN
_4
CPU HARDWARE INTERRUPT ENABLED ON PIN
_5
PC1 INTERRUPT DISABLED
PC0 INTERRUPT DISABLED
CPU HARDWARE INTERRUPT ENABLED ON PIN
_1
CPU HARDWARE INTERRUPT ENABLED ON PIN
_2
HIRR x0000000000000000
MM_CSR x0000000000003640
INTEGER REGISTER USED IS R 4.
DC_STAT x0000000000000007
DC_HIT LAST LOAD OR STORE MISSED
_DCACHE
OPCODE RA FIELD - INTEGER REGISTER IS R 0.
DC_ADDR x00000000FFFFFFFF SEO SECOND ERROR OCCURRED
ABOX_CTL x000000000000042E
FUNCTIONS ENABLED - MCHECK ENABLED FOR
_UNCORRECTABLE ERRORS
FUNCTIONS ENABLED - CRD CORRECTED READ
_DATA INTERRUPT ENABLED
FUNCTIONS ENABLED - SINGLE ENTRY ICACHE
_STREAM BUFFER ENABLED
FUNCTIONS ENABLED - DCACHE ENABLED
BIU_STAT x0000000000000140
BIU_CMD CYCLE CLASS IS READ_BLOCK
FILL_ECC PRI. CACHE FILL FROM EXT.
_CACHE HAD ECC ERROR
BIU_ADDR x0000000002983520
PHYSICAL ADDRESS OF CACHE BLOCK WITH ERROR IS x14C1A9
BIU_CTL x0000000020007447
EXTERNAL CACHE ENABLED
EXTERNAL CACHE ECC ENABLED
EXTERNAL CACHE READ/WRITE SPEED IN CPU CYCLES IS
_3
EXTERNAL CACHE WRITE ENABLE TIMING BIT FIELD IS x1
FILL_SYNDROME x0000000000000900 SINGLE BIT ERROR IS NO ERRORS
FILL_ADDR x0000000002983520
PHYSICAL ADDRESS OF QUADWORD WITH ERROR x14C1A9
VA x00000000001011F0 D-STREAM FAULT OR DTB MISS - VIRTUAL ADDRESS IS x1011F0
BC_TAG x0000000000002995 EXTERNAL CACHE TAG CONTROL BITS
_EXTERNAL CACHE HIT
D BIT - CACHE BLOCK DIRTY
V BIT - CACHE BLOCK VALID
TAG ADDRESS IS x14C
----- KN15AA CPU SPECIFIC STACK FRAME -----
INT_EXC_IDENT x0000000000000088
INTERRUPT OR EXCEPTION IS NONE
MCR_STAT x0000000011808080 BANK 0 32 MBYTES
BANK 1 32 MBYTES
BANK 2 32 MBYTES
BANK 4 32 MBYTES
IOSLOT x0000000000100000
TURBOCHANEL OPTION SLOT 1 PARITY
_DISABLED
TURBOCHANEL OPTION SLOT 2 PARITY
_DISABLED
TURBOCHANEL OPTION SLOT 4 PARITY
_DISABLED
TURBOCHANEL OPTION SLOT 5 PARITY
_DISABLED
TURBOCHANEL OPTION SLOT 6 PARITY
_DISABLED
TC OPTION SCSI ADAPTER PARITY DISABLED
TC OPTION CORE I/O PARITY DISABLED
TC OPTION CXTURBO PARITY DISABLED
TC_CONFIG x0000000000000016 MAGIC # FOR DMA CONTROL IS x16
PAGE SIZE IS 8KBYTES
IR x000000000007FE00
SECOND ERROR OCCURED
DMA BUFFER ERROR - UNDER/OVER FLOW
CROSSED 2K BOUNDARY ON DMA
TC RESET IN PROGRESS
TC PARITY ERROR
TAG ERROR DURING DMA
SINGLE BIT ERROR ON I/O WRITE OR DMA
_READ
DOUBLE BIT ERROR ON I/O WRITE OR DMA
_READ
TC TIMEOUT ON I/O REQUEST
--------------
{another system startup}
--------------
********************************* ENTRY 6. *********************************
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 5.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Sun Oct 13 21:17:08 1996
OCCURRED ON SYSTEM gore
SYSTEM ID x00020004 CPU TYPE: DEC 3000
SYSTYPE x00000000
MESSAGE panic (cpu 0): Machine check -
_Hardware error
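
For anyone who wants to tally entries in output like the above (for example, to see how often CPU exceptions and panics occur), here is a rough Python sketch. It is not part of uerf or any DEC tool. It assumes the text layout shown in this post: a starred "ENTRY n." banner, an "OS EVENT TYPE" line and an "OCCURRED/LOGGED ON" line per entry. The function names summarize and count_types are made up for illustration.

```python
import re
from collections import Counter

# Patterns assume the flattened layout shown in this post, not a documented format.
ENTRY_RE = re.compile(r"^\*+ ENTRY\s+(\d+)\. \*+$")
TYPE_RE = re.compile(r"^OS EVENT TYPE\s+(\d+)\.\s+(.*)$")
WHEN_RE = re.compile(r"^OCCURRED/LOGGED ON\s+(.*)$")


def summarize(text):
    """Return a list of (entry_number, event_type, timestamp) tuples."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = ENTRY_RE.match(line)
        if m:
            # A new banner starts a new entry record.
            current = {"entry": int(m.group(1)), "type": None, "when": None}
            entries.append(current)
            continue
        if current is None:
            continue  # skip anything before the first banner
        m = TYPE_RE.match(line)
        if m:
            current["type"] = m.group(2).strip()
            continue
        m = WHEN_RE.match(line)
        if m:
            current["when"] = m.group(1).strip()
    return [(e["entry"], e["type"], e["when"]) for e in entries]


def count_types(summary):
    """Count how many entries there are of each event type."""
    return Counter(event_type for _, event_type, _ in summary)
```

With the output saved to a file (uerf.txt is a hypothetical name), count_types(summarize(open("uerf.txt").read())) gives a per-type count. Entries that were cropped by hand simply show up with missing fields.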
Received on Wed Oct 16 1996 - 21:02:09 NZDT