I am having some trouble with my AS2100A.  I have rebuilt this server,
upgrading the 2 processors from EV4/275's to EV5/375's, and adding a
KZPSC-XE swxcr RAID controller card.  I had DEC install the hardware and
upgrade me to v5.1 firmware.
Shortly after the upgrade I began seeing b-cache errors logged by cpu1.
Assuming the CPU was bad, I had DEC replace it.  Several days later,
cpu0 is now logging these errors!
I've considered all sorts of bizarre possibilities:  an Oracle 7.3.4
bug, something wrong with the firmware upgrade, a CPU module that needs
reseating.  Oddly, I was unable to upgrade to DecEvent 2.8.  I received
an error during bit-to-text translation (didn't capture it), so I rolled
back to 2.6.  I wonder if this is somehow related.
Before I have DEC replace the other CPU, I want to know if anyone has
seen this issue before.  I found notes in the archives about b-cache
errors, but in that instance they were causing kernel panics, and mine
are not.  As far as I can tell, the system is ignoring the errors.
Someone even suggested to me that this is an error caused by code
compiled for 32-bit operating systems that is "out-of-sync" with my
64-bit architecture.  That sounded a little off to me, but I am willing
to consider anything at this point.
My output from DecEvent:
******************************** ENTRY    1 ********************************
Logging OS                        2. Digital UNIX
System Architecture               2. Alpha
Event sequence number            39.
Timestamp of occurrence              18-JUN-1998 15:28:33
Host name                            robin
System type register      x00000018  AlphaServer 2000A or 2100A
Number of CPUs (mpnum)    x00000002
CPU logging event (mperr) x00000000
Event validity                    1. O/S claims event is valid
Event severity                    1. Severe Priority
Entry type                      100. CPU Machine Check Errors
CPU Minor class                   3. Bcache error (630 entry)
Entry Body Size:          x00000078
Entry body:
          15--<-12  11--<-08  07--<-04  03--<-00   :Byte Order
 0000:    80000000  00000060  00000060  00000023   *#...`...`.......*
 0010:    00000000  00000086  00000038  00000018   *....8...........*
 0020:    00000000  000000F4  FFFFFF00  1E654AAF   *.Je.............*
 0030:    00000001  00000000  FFFFFFF0  81FFFFFF   *................*
 0040:    00000000  00000000  480013F2  48001002   *...H...H........*
 0050:    00000000  00000000  000000E1  00000061   *a...............*
 0060:    00000000  00000000  B800000A  B800000A   *................*
 0070:                        5E3C7E25  00000000   *        ....%~<^*
*****************************************************************************************
/var/adm/messages says:
Jun 18 15:14:06 robin vmunix: Machine Check error corrected by processor
Jun 18 15:19:18 robin vmunix: Machine Check error corrected by processor
Jun 18 15:28:33 robin vmunix: Machine Check error corrected by processor
Jun 18 15:28:33 robin vmunix: Machine Check error corrected by processor
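
For anyone wanting to track how often these corrected machine checks
occur, here is a minimal Python sketch that tallies the messages per
hour.  It assumes the syslog line format shown above; the function name
count_corrected_checks and the sample list are my own illustration, not
part of any DEC tool.

```python
import re
from collections import Counter

# Matches syslog lines of the form shown above, e.g.
# "Jun 18 15:14:06 robin vmunix: Machine Check error corrected by processor"
PATTERN = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+vmunix:\s+"
    r"Machine Check error corrected by processor"
)

def count_corrected_checks(lines):
    """Return a Counter keyed by (month, day, hour) of matching messages."""
    counts = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:
            month, day, hour = m.groups()
            counts[(month, int(day), int(hour))] += 1
    return counts

# Sample input copied from the /var/adm/messages excerpt above.
sample = [
    "Jun 18 15:14:06 robin vmunix: Machine Check error corrected by processor",
    "Jun 18 15:19:18 robin vmunix: Machine Check error corrected by processor",
    "Jun 18 15:28:33 robin vmunix: Machine Check error corrected by processor",
    "Jun 18 15:28:33 robin vmunix: Machine Check error corrected by processor",
]

for key, n in sorted(count_corrected_checks(sample).items()):
    print(key, n)
```

To run it against the real log, pass an open file handle for
/var/adm/messages instead of the sample list.  A count that climbs over
time would be worth mentioning to DEC.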
TIA
susrod_at_hbsi.com
Received on Fri Jun 19 1998 - 00:46:42 NZST