Dear Managers,
        All my humblest apologies for my rather unclear summary.
A more complete version appears below.
It WAS a memory problem.  I called the guys at Digital support
and they gave me the information I needed to find out which
module was the culprit.
        The UERF utility does not parse these error entries
properly and mistakes memory errors for CPU errors.  From the
information the Digital guy gave me, I found out that the
machine was trying to report uncorrectable memory errors.
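
For anyone who wants to line up the memory frames of an entry side by
side, here is a minimal Python sketch.  It only assumes the text layout
of the report quoted further down (a dashed frame header followed by
"NAME  xVALUE" lines); the function names are my own, and the sample is
trimmed from two of the memory frames in that report.  It does not
decode the registers.  Which bits actually identify the failing module
still has to come from Digital.

```python
import re

# Frame headers look like "----- DIGITAL 2100 A500 MEMORY SPECIFIC FRAME -----".
FRAME_HEADER = re.compile(r"^-{3,}\s*(.*?)\s*-{3,}$")
# Register lines look like "MERR            xE2000008E2000008":
# a name, at least two spaces, then "x" and exactly 16 hex digits.
REGISTER = re.compile(r"^(\S+(?: \S+)*?)\s{2,}x([0-9A-Fa-f]{16})\b")


def memory_frames(report_text):
    """Return one {register name: integer value} dict per memory frame."""
    frames = []
    current = None
    for raw in report_text.splitlines():
        line = raw.rstrip()
        header = FRAME_HEADER.match(line.strip())
        if header:
            # Start collecting only inside memory-specific frames.
            if "MEMORY SPECIFIC FRAME" in header.group(1):
                current = {}
                frames.append(current)
            else:
                current = None
            continue
        if current is None:
            continue
        reg = REGISTER.match(line)  # indented annotation lines do not match
        if reg:
            current[reg.group(1)] = int(reg.group(2), 16)
    return frames


def differing_registers(frames):
    """Sorted names of registers whose value is not the same in every frame."""
    names = sorted({name for frame in frames for name in frame})
    return [n for n in names if len({f.get(n) for f in frames}) > 1]


# Trimmed from two memory frames of the report (not all registers shown).
SAMPLE_FRAMES = """\
----- DIGITAL 2100 A500 MEMORY SPECIFIC FRAME -----
MODULE NUMBER   x0000000000000000
MERR            xE2400008E2800008
MEDC2           x2000000020000000
FILTER          x0000005800000008
----- DIGITAL 2100 A500 MEMORY SPECIFIC FRAME -----
MODULE NUMBER   x0000000000000001
MERR            xE2C00008065B7280
MEDC2           x2000000020000000
FILTER          x0000000000000000
"""

if __name__ == "__main__":
    frames = memory_frames(SAMPLE_FRAMES)
    for name in differing_registers(frames):
        values = ["x%016X" % f.get(name, 0) for f in frames]
        print(name.ljust(16), "  ".join(values))
```

Run against a whole saved report, this at least shows which frame
stands out from the others before you pick up the phone.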
        Thanks to the following for pointing me in the correct
direction.
        Claude Soma soma_c_at_decus.fr
        Clifford Krieger ckrieger_at_psi.prc.com
        Isaac Oribioye I.O.Oribioye_at_herts.ac.uk
        Jim Skoog skoog_at_netcom.com
        Karl Marble kmarble_at_ultranet.com
        Kurt Carlson SXKAC_at_orca.alaska.edu
        Kurt Wild Kurt.Wild_at_ska.com
        Melvin Smith msmith_at_quix.robins.af.mil
        Nick Hill NMH1_at_axpr11.r1.ac.uk
No one was able to tell me where I may obtain documentation on
how I may go about analysing that information for myself, though.
My original question was:
Hello managers,
        I am writing again to enlist your aid in solving a very
perplexing problem.  I have an AlphaServer 2100 machine running
Digital Unix 3.0b with 512 MB of RAM and 8 RZ28 disk drives and
the machine is filling up its binary errorlog file
(/var/adm/binary.errlog) with entries of the form:
*********************** ENTRY   1. ************************
----- EVENT INFORMATION -----
EVENT CLASS                     ERROR EVENT
OS EVENT TYPE           100.    CPU EXCEPTION
SEQUENCE NUMBER           1.
OPERATING SYSTEM                DEC OSF/1
OCCURRED/LOGGED ON              Mon Jul 15 04:15:00 1996
OCCURRED ON SYSTEM              orpheus
SYSTEM ID           x00000009   CPU TYPE: DEC 2100
SYSTYPE             x00000000
-----  UNIT INFORMATION  -----
UNIT CLASS                      CPU
-----  LEP MACHINE CHECK STACK FRAME  -----
PROCESSOR OFFSET    x00000110
SYSTEM OFFSET       x000001A0
PALTEMP1        x0000000140002058
PALTEMP2        x000782F800000004
PALTEMP3        x0000000000000001
PALTEMP4        x0000000000000000
PALTEMP5        x0000000000000000
PALTEMP6        x0000000000000240
PALTEMP7        x0000000000004200
PALTEMP8        x0000000000000400
PALTEMP9        x0000000000000000
PALTEMP10       xFFFFFC000044D630
PALTEMP11       x0000000000000000
PALTEMP12       xFFFFFC000044D9C0
PALTEMP13       xFFFFFC000044D9F0
PALTEMP14       xFFFFFC000044DA50
PALTEMP15       xFFFFFC000044D7D0
PALTEMP16       xFFFFFC000044D4F0
PALTEMP17       x00000000000192D0
PALTEMP18       x000000011FFFFB70
PALTEMP19       xFFFFFFFFB1F1BA58
PALTEMP20       xFFFFFC00005A39B0
PALTEMP21       x0000000000000000
PALTEMP22       x40424272727E7E7E
PALTEMP23       xFFCFDFFBBFFFBEE5
PALTEMP24       x0000000000000000
PALTEMP25       x0000000000010000
PALTEMP26       x0000000000000000
PALTEMP27       x0000000000000000
PALTEMP28       x00000000178A8000
PALTEMP29       xFFFFFFFC00000000
PALTEMP30       x0000000000000001
PALTEMP31       x00000000041AFA58
EXC_ADDR        x00000000200049F0
                     EXCEPTING OR EXECUTING INSTRUCTION DID NOT COMPLETE PC IS x4800127C
EXC_SUM         x0000000000000000
EXC_MSK         x0000000000000000
ICCSR           x0000000000000004
                                PC0 INT ENABLED AFTER 2**16 EVENTS
                                PC1 INT ENABLED AFTER 2**8 EVENTS
                                PC0 COUNTER INPUT TOTAL ISSUES DIVIDED BY 2
                                PC1 COUNTER INPUT DCACHE MISSES
                                FP INSTRUCTIONS CAUSE FEN EXCEPTIONS
                                ADDRESS SPACE NUMBER = x0
PAL_BASE        x0000000000014000
                                BASE ADDRESS FOR PALCODE = x5
HIER            x0000000000001CF0
                                CORRRECTABLE READ ERROR INTERRUPT ENABLED
                                CPU HARDWARE INTERRUPT ENABLED ON PIN 3
                                CPU HARDWARE INTERRUPT ENABLED ON PIN 4
                                CPU HARDWARE INTERRUPT ENABLED ON PIN 5
                                PC1 INTERRUPT DISABLED
                                PC0 INTERRUPT DISABLED
                                CPU HARDWARE INTERRUPT ENABLED ON PIN 0
                                CPU HARDWARE INTERRUPT ENABLED ON PIN 1
                                CPU HARDWARE INTERRUPT ENABLED ON PIN 2
HIRR            x0000000000000000
MM_CSR          x0000000000003640
                                INTEGER REGISTER USED IS R4.
DC_STAT         x0000000000000007
                                DC_HIT LAST LOAD OR STORE MISSED _DCACHE
                                OPCODE RA FIELD - INTEGER REGISTER IS R0.
DC_ADDR         x00000000FFFFFFFF SE O SECOND ERROR OCCURRED
ABOX_CTL        x000000000000142E
                                FUNCTIONS ENABLED - MCHECK ENABLED FOR UNCORRECTABLE ERRORS
                                FUNCTIONS ENABLED - CRD CORRECTED READ _DATA INTERRUPT ENABLED
                                FUNCTIONS ENABLED - SINGLE ENTRY ICACHE _STREAM BUFFER ENABLED
                                FUNCTIONS ENABLED - DCACHE ENABLED
BIU_STAT        x0000000000000240
                                BIU_CMD CYCLE CLASS IS READ_BLOCK
BIU_ADDR        x00000000000192D0
                                PHYSICAL ADDRESS OF CACHE BLOCK WITH ERROR IS xC96
BIU_CTL         x0000000030006477
                                EXTERNAL CACHE ENABLED
                                EXTERNAL CACHE ECC ENABLED
                                EXTERNAL CACHE READ/WRITE SPEED IN CPU CYCLES IS _3
                                EXTERNAL CACHE WRITE ENABLE TIMING BIT FIELD IS x4001
FILL_SYNDROME   x0000000000000000 SINGLE BIT ERROR IS NO ERRORS
                                SINGLE BIT ERROR IS NO ERRORS
FILL_ADDRESS    x0000000000006100
                                PHYSICAL ADDRESS OF QUADWORD WITH ERROR x308
VA              x0000000000006170 D-STREAM FAULT OR DTB MISS - VIRTUAL ADDRESS IS x6170
BC_TAG          x0000000024961248
                                S BIT - CACHE BLOCK SHARED TAG ADDRESS IS xB092
-----  DIGITAL 2100 A500 CPU SPECIFIC FRAME -----
BCC_CSR0        x0000000000000220
                                FILL WRONG DUP TAG STORE PAR ENB B-CACHE COND I/O UPDATES
BCCE_CSR1       x000001A000000110
BCCEA_CSR2      x000000010000008A
BCUE_CSR3       x0000000040002058
                                UNCORRECTABLE ERROR
                                EDC SYNDROME 0 x0
                                EDC SYNDROME 2 x20
                                EDC SYNDROME 1 x0
                                EDC SYNDROME 3 x0
BCUEA_CSR4      x0000000000000004 B-CACHE MAP OFFSET x4
                                TAG VALUE x0
                                B-CACHE MAP OFFSET H x182F8
                                PREDICTED TAG PARITY H
                                TAG PARITY H
                                TAG VALUE H x0
DTER_CSR5       x0000000000000001 MISSED ERROR OCCURRED
                                DUP TAG STORE OFFSET x0
                                DUP TAG x0
                                DUP TAG STORE OFFSET H x0
                                DUP TAG H x0
CBCTL_CSR6      x0000000000000000
                                C/A WRONG PARITY x0
                                COMMANDER ID x0
                                ARB CONTROL MASK x0
                                C/A WRONG PARITY H x0
                                COMMANDER ID H x0
                                ARB CONTROL MASK H x0
CBE_CSR7        x0000000000000000
                                MISS COUNT x0
                                MISS COUNT H x0
CBEAL_CSR8      x0000000000000240
                                ADDRESS x90
                                ADDRESS H x0
CBEAH_CSR9      x0000000000004200
PMBX_CSR10      x0000000000000400
IPIR_CSR11      x0000000000000000
SIC_CSR13       xFFFFFC000044D630
ADLK_CSR13      x0000000000000000
MADRL_CSR14     xFFFFFC000044D9C0
CRREV4          xFFFFFC000044D9F0
----- DIGITAL 2100 A500P T2 SPECIFIC FRAME -----
IOCSR           x0000000000000000
CERR1           xE3800010E3800010
CERR2           x0020004320200043
CERR3           x0000000000000000
PERR1           x000000064061A3C0
HAE0_1          x0000000000000000
HAE0_2          x00000000400807FF
WBASE1          x000000003FF00000
WMASK1          x0000000000000000
TBASE1          x00000000000C00FF
WBASE2          x000000000FF00000
WMASK2          x0000000000460000
TBASE2          x0000000000000000
TLBBR           x0000002400000000
IVR             x0000000000000000
HAE0_3          x0000000000000003
HAE0_4          x0000000000000000
TDR0            x0000002400000000
TDR1            x0000000000000000
TDR2            x0000000000000003
TDR3            x0000000000000000
TDR4            x0000000000000000
TDR5            x0000000000000000
TDR6            x0000000000000000
TDR7            x0000005800000008
----- DIGITAL 2100 A500 MEMORY SPECIFIC FRAME -----
MODULE NUMBER   x0000000000000000
MERR            xE2000008E2000008
MCMD1           x0020004320200043
MCMD2           x800150A0800150A0
MCONF           x0EC4055F0EC90669
MEDC1           x000000170000000D
MEDC2           x2000000200000000
MEDCC           x0000080000000800
MREF            x0000000000000000
FILTER          x0000005800000008
----- DIGITAL 2100 A 500 MEMORY SPECIFIC FRAME -----
MODULE NUMBER   x0000000000000000
MERR            xE2400008E2400008
MCMD1           x0020004320200043
MCMD2           x8201505182015051
MCONF           x01CB06F10CB60A7B
MEDC1           x000000170000000D
MEDC2           x2000000020000000
MEDCC           x0000080000000800
MSCTL           x000001D8000001D8
MREF            x0000000000000000
FILTER          x0000005800000008
----- DIGITAL 2100 A500 MEMORY SPECIFIC FRAME -----
MODULE NUMBER   x0000000000000000
MERR            xE2400008E2800008
MCMD1           x0020004320200043
MCMD2           x8401505284015052
MCONF           x01CB06F10CB61A7B
MEDC1           x000000170000000D
MEDC2           x2000000020000000
MEDCC           x0000080000000800
MSCTL           x000001D8000001D8
MREF            x0000000000000000
FILTER          x0000005800000008
----- DIGITAL 2100 A500 MEMORY SPECIFIC FRAME -----
MODULE NUMBER   x0000000000000001
MERR            xE2C00008065B7280
MCMD1           x00200043002000DF
MCMD2           x8601505386015053
MCONF           x0C1405FB0C140F4E
MEDC1           x00000017000004DF
MEDC2           x2000000020000000
MEDCC           x0000080000000800
MSCTL           x000001D8000001D8
MREF            x0000000000000000
FILTER          x0000000000000000
I have the full error report from the system, that is, the
output of the command "uerf -R -o full -c err".  I checked how
many entries like this there were in the file and found 3085
such entries, all recorded in the space of 5 minutes.
The binary errorlog file has grown to over 400 MB in size!!!
I have replaced the CPU module and the problem still seems to
occur as of 15-Jul-1996 09:00:00.  Any ideas?
        Also, can anyone suggest to me where I may obtain
documentation to allow me to analyse these error entries for
myself?
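
The entry count and time span mentioned above can be pulled out of a
saved text report with a few lines of Python.  This is only a sketch:
it assumes the "ENTRY n." banner and "OCCURRED/LOGGED ON" lines look
exactly as in the entry quoted above, that the date is printed with
English day and month names, and the names used here are my own.  The
second timestamp in the sample is made up purely for illustration.

```python
import re
from datetime import datetime

# Entry banners look like "*********** ENTRY   1. ***********".
ENTRY = re.compile(r"^\*+\s*ENTRY\s+(\d+)\.\s*\*+$")
# Timestamp lines look like "OCCURRED/LOGGED ON   Mon Jul 15 04:15:00 1996".
STAMP = re.compile(r"^OCCURRED/LOGGED ON\s+(.+)$")
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # assumes English names in the report


def summarise(report_text):
    """Return (number of entries, time between first and last entry)."""
    count = 0
    stamps = []
    for raw in report_text.splitlines():
        line = raw.strip()
        if ENTRY.match(line):
            count += 1
            continue
        found = STAMP.match(line)
        if found:
            stamps.append(datetime.strptime(found.group(1), STAMP_FORMAT))
    span = max(stamps) - min(stamps) if stamps else None
    return count, span


# First timestamp is from the report; the second is a hypothetical example.
SAMPLE_ENTRIES = """\
*********************** ENTRY   1. ************************
OCCURRED/LOGGED ON              Mon Jul 15 04:15:00 1996
*********************** ENTRY   2. ************************
OCCURRED/LOGGED ON              Mon Jul 15 04:19:30 1996
"""

if __name__ == "__main__":
    count, span = summarise(SAMPLE_ENTRIES)
    print(count, "entries over", span)
```

To use it on a real report, read the saved uerf output from a file and
pass its contents to summarise() in place of the sample string.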
-- 
Yours sincerely,
Robert Honore
robert_at_digi-data.com
Phone: 623 6658 Fax: 623 0978
Snail Mail: Digi Data systems limited, 96 Wrightson Road,
Trinidad, W. I.
> If one didn't have to WORK for a living, WORK would be MUCH MORE FUN!
Received on Fri Jul 26 1996 - 15:24:59 NZST