Hello Managers,
        I am having trouble with my DEC 3000 500X.  It crashes occasionally, complaining about Hardware Check 
Errors.  I called DEC about this and they said that it was the cache memory and that the whole system board 
(motherboard) would have to be replaced.  They couldn't tell which level of cache was at fault.  They also said 
that neither the processor nor the L2 cache chips could simply be replaced because they can't be removed (they 
are in sockets, however).  I can't justify spending $5000 on an infrequent problem that they diagnosed by 
deciphering the registers in my uerf output.  On their recommendation I blew out all the dust from inside the 
box.  That didn't seem to help.  Now the problem is more frequent and I need help before I commit to the $5000.  
        In the archives there was a mention of CAM SCSI errors caused by motherboard faults.  I am also 
getting these errors.  Another symptom is that my cron backup routine reports bad reads.  These 
errors do not occur all the time.  Sometimes there are a lot of them (after 16 the dump is aborted), sometimes 
a few, sometimes none.  These errors could be from bad files.  They occur on only one file system.  These 
three symptoms could be related.  
        Attached is my uerf output (uerf -o full -R).  I would really appreciate any insight.  Thank you in 
advance.  
Uerf output:
********************************* ENTRY     1. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT
OS EVENT TYPE                  100.     CPU EXCEPTION
SEQUENCE NUMBER                  1.
OPERATING SYSTEM                        DEC OSF/1
OCCURRED/LOGGED ON                      Mon Oct 14 18:34:37 1996
OCCURRED ON SYSTEM                      gore
SYSTEM ID                 x00020004     CPU TYPE:  DEC 3000
SYSTYPE                   x00000000
----- UNIT INFORMATION -----
UNIT CLASS                              CPU
----- KN15AA CPU 630/620 STACK FRAME -----
PROCESSOR OFFSET          x00000018
SYSTEM OFFSET             x00000048
BIU_STAT              x0000000000000340
                                        BIU_CMD CYCLE CLASS IS  READ_BLOCK
                                        FILL_ECC PRI. CACHE FILL FROM EXT.
                                         _CACHE HAD ECC ERROR
BIU_ADDR              x0000000000108018
                                        PHYSICAL ADDRESS OF CACHE BLOCK WITH ERROR IS x8400
DC_STAT               x0000000000000007
                                        DC_HIT LAST LOAD OR STORE MISSED
                                         _DCACHE
                                        OPCODE RA FIELD - INTEGER REGISTER IS R 0.
FILL_SYNDROME         x0000000000002C00     SINGLE BIT ERROR IS NO ERRORS
                                        SINGLE BIT ERROR IS DATA BIT 05
FILL_ADDR             x00000000042E1548
                                        PHYSICAL ADDRESS OF QUADWORD WITH ERROR x2170AA
BC_TAG                x0000000000404295     EXTERNAL CACHE TAG CONTROL BITS
                                         _EXTERNAL CACHE HIT
                                        D BIT - CACHE BLOCK DIRTY
                                        V BIT - CACHE BLOCK VALID
                                        TAG ADDRESS IS x214
                                        EXTERNAL CACHE TAG CONTROL BITS TAG
                                         _ADDRESS PARITY BIT
INT_EXC_IDENT         x0000000000000000
                                        INTERRUPT OR EXCEPTION IS NONE
********************************* ENTRY     2. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             OPERATIONAL EVENT
OS EVENT TYPE                  300.     SYSTEM STARTUP
SEQUENCE NUMBER                  0.
OPERATING SYSTEM                        DEC OSF/1
OCCURRED/LOGGED ON                      Mon Oct 14 12:01:14 1996
OCCURRED ON SYSTEM                      gore
SYSTEM ID                 x00020004     CPU TYPE:  DEC 3000
SYSTYPE                   x00000000
MESSAGE                                 LK401 keyboard, language English
                                         _(American)
{ cropped}
********************************* ENTRY     3. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT
OS EVENT TYPE                  302.     PANIC
SEQUENCE NUMBER                  2.
OPERATING SYSTEM                        DEC OSF/1
OCCURRED/LOGGED ON                      Mon Oct 14 11:58:24 1996
OCCURRED ON SYSTEM                      gore
SYSTEM ID                 x00020004     CPU TYPE:  DEC 3000
SYSTYPE                   x00000000
MESSAGE                                 panic (cpu 0): Machine check -
                                         _Hardware error
********************************* ENTRY     4. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT
OS EVENT TYPE                  100.     CPU EXCEPTION
SEQUENCE NUMBER                  1.
OPERATING SYSTEM                        DEC OSF/1
OCCURRED/LOGGED ON                      Mon Oct 14 11:58:24 1996
OCCURRED ON SYSTEM                      gore
SYSTEM ID                 x00020004     CPU TYPE:  DEC 3000
SYSTYPE                   x00000000
----- UNIT INFORMATION -----
UNIT CLASS                              CPU
----- LEP MACHINE CHECK STACK FRAME -----
PROCESSOR OFFSET          x00000110
SYSTEM OFFSET             x000001A0
PALTEMP1              x0000000000000000
PALTEMP2              x000C06F800000004
PALTEMP3              x0000000000000000
PALTEMP4              x000000000000000F
PALTEMP5              x0000000000000000
PALTEMP6              x000003FF80017E48
PALTEMP7              x0000000000104000
PALTEMP8              x0000000000000000
PALTEMP9              x0000000000000008
PALTEMP10             xFFFFFC00003C3050
PALTEMP11             x0000000000000000
PALTEMP12             xFFFFFC00003C33F0
PALTEMP13             xFFFFFC00003C3420
PALTEMP14             xFFFFFC00003C3480
PALTEMP15             xFFFFFC00003C31F0
PALTEMP16             xFFFFFC00003C2F00
PALTEMP17             x0000000000000000
PALTEMP18             x000000011FFFF520
PALTEMP19             xFFFFFFFF88B5FA58
PALTEMP20             xFFFFFC00004CDC10
PALTEMP21             x0000000000000000
PALTEMP22             x6068686C7C7C7C7C
PALTEMP23             x00000062000007F9
PALTEMP24             x0000000000000000
PALTEMP25             x0000000000010000
PALTEMP26             x0000000000000000
PALTEMP27             x0000000000000000
PALTEMP28             x000000000191A000
PALTEMP29             xFFFFFFFC00000000
PALTEMP30             x0000000000000001
PALTEMP31             x00000000048CBA58
EXC_ADDR              x0000000080017E86
                                        EXCEPTING OR EXECUTING INSTRUCTION DID NOT COMPLETE PC IS  xE0005FA1
EXC_SUM               x0000000000000000
EXC_MSK               x0000000000000000
ICCSR                 x0000000000000000
                                        PC0 INT ENABLED AFTER  2**16 EVENTS
                                        PC1 INT ENABLED AFTER  2**12 EVENTS
                                        PC0 COUNTER INPUT  TOTAL ISSUES DIVIDED
                                         _BY 2
                                        PC1 COUNTER INPUT  DCACHE MISSES
                                        FP INSTRUCTIONS CAUSE FEN EXCEPTIONS
                                        ADDRESS SPACE NUMBER = x0
PAL_BASE              x0000000000060000
                                        BASE ADDRESS FOR PALCODE = x18
HIER                  x00000000000018F0
                                        CORRECTABLE READ ERROR INTERRUPT
                                         _ENABLED
                                        CPU HARDWARE INTERRUPT ENABLED ON PIN
                                         _3
                                        CPU HARDWARE INTERRUPT ENABLED ON PIN
                                         _4
                                        CPU HARDWARE INTERRUPT ENABLED ON PIN
                                         _5
                                        PC1 INTERRUPT  DISABLED
                                        PC0 INTERRUPT  DISABLED
                                        CPU HARDWARE INTERRUPT ENABLED ON PIN
                                         _1
                                        CPU HARDWARE INTERRUPT ENABLED ON PIN
                                         _2
HIRR                  x0000000000000000
MM_CSR                x0000000000003640
                                        INTEGER REGISTER USED IS R 4.
DC_STAT               x0000000000000007
                                        DC_HIT LAST LOAD OR STORE MISSED
                                         _DCACHE
                                        OPCODE RA FIELD - INTEGER REGISTER IS R 0.
DC_ADDR               x00000000FFFFFFFF     SEO SECOND ERROR OCCURRED
ABOX_CTL              x000000000000042E
                                        FUNCTIONS ENABLED - MCHECK ENABLED FOR
                                         _UNCORRECTABLE ERRORS
                                        FUNCTIONS ENABLED - CRD CORRECTED READ
                                         _DATA INTERRUPT ENABLED
                                        FUNCTIONS ENABLED - SINGLE ENTRY ICACHE
                                         _STREAM BUFFER ENABLED
                                        FUNCTIONS ENABLED - DCACHE ENABLED
BIU_STAT              x0000000000000140
                                        BIU_CMD CYCLE CLASS IS  READ_BLOCK
                                        FILL_ECC PRI. CACHE FILL FROM EXT.
                                         _CACHE HAD ECC ERROR
BIU_ADDR              x0000000002983520
                                        PHYSICAL ADDRESS OF CACHE BLOCK WITH ERROR IS x14C1A9
BIU_CTL               x0000000020007447
                                        EXTERNAL CACHE ENABLED
                                        EXTERNAL CACHE ECC ENABLED
                                        EXTERNAL CACHE READ/WRITE SPEED IN CPU CYCLES IS
                                         _3
                                        EXTERNAL CACHE WRITE ENABLE TIMING BIT FIELD IS x1
FILL_SYNDROME         x0000000000000900     SINGLE BIT ERROR IS NO ERRORS
FILL_ADDR             x0000000002983520
                                        PHYSICAL ADDRESS OF QUADWORD WITH ERROR x14C1A9
VA                    x00000000001011F0     D-STREAM FAULT OR DTB MISS - VIRTUAL ADDRESS IS x1011F0
BC_TAG                x0000000000002995     EXTERNAL CACHE TAG CONTROL BITS
                                         _EXTERNAL CACHE HIT
                                        D BIT - CACHE BLOCK DIRTY
                                        V BIT - CACHE BLOCK VALID
                                        TAG ADDRESS IS x14C
----- KN15AA CPU SPECIFIC STACK FRAME -----
INT_EXC_IDENT         x0000000000000088
                                        INTERRUPT OR EXCEPTION IS NONE
MCR_STAT              x0000000011808080     BANK 0 32 MBYTES
                                        BANK 1 32 MBYTES
                                        BANK 2 32 MBYTES
                                        BANK 4 32 MBYTES
IOSLOT                x0000000000100000
                                        TURBOCHANEL OPTION SLOT 1 PARITY
                                         _DISABLED
                                        TURBOCHANEL OPTION SLOT 2 PARITY
                                         _DISABLED
                                        TURBOCHANEL OPTION SLOT 4 PARITY
                                         _DISABLED
                                        TURBOCHANEL OPTION SLOT 5 PARITY
                                         _DISABLED
                                        TURBOCHANEL OPTION SLOT 6 PARITY
                                         _DISABLED
                                        TC OPTION SCSI ADAPTER PARITY DISABLED
                                        TC OPTION CORE I/O PARITY DISABLED
                                        TC OPTION CXTURBO PARITY DISABLED
TC_CONFIG             x0000000000000016     MAGIC # FOR DMA CONTROL IS x16
                                        PAGE SIZE IS 8KBYTES
IR                    x000000000007FE00
                                        SECOND ERROR OCCURED
                                        DMA BUFFER ERROR - UNDER/OVER FLOW
                                        CROSSED 2K BOUNDARY ON DMA
                                        TC RESET IN PROGRESS
                                        TC PARITY ERROR
                                        TAG ERROR DURING DMA
                                        SINGLE BIT ERROR ON I/O WRITE OR DMA
                                         _READ
                                        DOUBLE BIT ERROR ON I/O WRITE OR DMA
                                         _READ
                                        TC TIMEOUT ON I/O REQUEST
--------------
{another system startup}
--------------
********************************* ENTRY     6. *********************************
----- EVENT INFORMATION -----
EVENT CLASS                             ERROR EVENT
OS EVENT TYPE                  302.     PANIC
SEQUENCE NUMBER                  5.
OPERATING SYSTEM                        DEC OSF/1
OCCURRED/LOGGED ON                      Sun Oct 13 21:17:08 1996
OCCURRED ON SYSTEM                      gore
SYSTEM ID                 x00020004     CPU TYPE:  DEC 3000
SYSTYPE                   x00000000
MESSAGE                                 panic (cpu 0): Machine check -
                                         _Hardware error
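
To line the crashes up against the backup and SCSI errors by time, the entries above can be boiled down to one row each. The following is only a minimal sketch, assuming the uerf text has been saved to a file and that every entry follows the banner / "OS EVENT TYPE" / "OCCURRED/LOGGED ON" layout shown above. The function name summarize_uerf and the file name are hypothetical and are not part of uerf itself.

```python
import re

# Banner that precedes each entry, e.g. "***** ENTRY     1. *****"
ENTRY_RE = re.compile(r"^\*+ ENTRY\s+(\d+)\. \*+$")
# e.g. "OS EVENT TYPE                  100.     CPU EXCEPTION"
TYPE_RE = re.compile(r"^OS EVENT TYPE\s+(\d+)\.\s+(.*\S)\s*$")
# e.g. "OCCURRED/LOGGED ON                      Mon Oct 14 18:34:37 1996"
WHEN_RE = re.compile(r"^OCCURRED/LOGGED ON\s+(.*\S)\s*$")


def summarize_uerf(text):
    """Return one (entry, event_code, event_name, timestamp) tuple per entry.

    Assumption: the input looks like the 'uerf -o full' text quoted above.
    Fields that are missing from an entry are reported as None.
    """
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = ENTRY_RE.match(line)
        if m:
            # A new banner starts a new entry record.
            current = {"entry": int(m.group(1)), "code": None,
                       "name": None, "when": None}
            entries.append(current)
            continue
        if current is None:
            continue  # skip anything before the first banner
        m = TYPE_RE.match(line)
        if m:
            current["code"] = int(m.group(1))
            current["name"] = m.group(2)
            continue
        m = WHEN_RE.match(line)
        if m:
            current["when"] = m.group(1)
    return [(e["entry"], e["code"], e["name"], e["when"]) for e in entries]
```

With the log saved as, say, uerf.txt (a hypothetical name), summarize_uerf(open("uerf.txt").read()) gives a short list of rows that can be compared against the times of the cron dump failures.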
Received on Wed Oct 16 1996 - 21:02:09 NZDT