[Q] Bcache error

From: Susan Rodriguez <SUSROD_at_HBSI.COM>
Date: Thu, 18 Jun 1998 15:50:09 -0700

I am having some trouble with my AS2100A. I have rebuilt this server,
upgrading the 2 processors from EV4/275's to EV5/375's, and adding a
KZPSC-XE swxcr RAID controller card. I had DEC install the hardware and
upgrade me to v5.1 firmware.

Shortly after the upgrade I began seeing b-cache errors logged by cpu1.
Assuming the cpu was bad, I had DEC replace it. Several days later,
cpu0 is now logging the same errors!

I've considered all sorts of bizarre possibilities: an Oracle 7.3.4
bug, something wrong with the firmware upgrade, a cpu module that needs
reseating. Oddly, I was unable to upgrade to DecEvent 2.8. I received an
error during bit-to-text translation (didn't capture it), so I rolled
back to 2.6. I wonder if this is somehow related.

Before I have DEC replace the other CPU, I want to know if anyone has
seen this issue before. I found notes in the archives about b-cache
errors, but in that instance they were causing kernel panics and mine
are not. As far as I can tell, the system is ignoring the errors.
Someone even suggested to me that this is an error caused by code
compiled for 32-bit operating systems that is "out-of-sync" with my
64-bit architecture. That sounded a little off to me, but I am willing
to consider anything at this point.


My output from DecEvent:

******************************** ENTRY 1 ********************************


Logging OS                 2. Digital UNIX
System Architecture        2. Alpha
Event sequence number      39.
Timestamp of occurrence    18-JUN-1998 15:28:33
Host name                  robin

System type register       x00000018 AlphaServer 2000A or 2100A
Number of CPUs (mpnum)     x00000002
CPU logging event (mperr)  x00000000

Event validity             1. O/S claims event is valid
Event severity             1. Severe Priority
Entry type                 100. CPU Machine Check Errors

CPU Minor class            3. Bcache error (630 entry)

Entry Body Size: x00000078
Entry body:

          15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
 0000: 80000000 00000060 00000060 00000023 *#...`...`.......*
 0010: 00000000 00000086 00000038 00000018 *....8...........*
 0020: 00000000 000000F4 FFFFFF00 1E654AAF *.Je.............*
 0030: 00000001 00000000 FFFFFFF0 81FFFFFF *................*
 0040: 00000000 00000000 480013F2 48001002 *...H...H........*
 0050: 00000000 00000000 000000E1 00000061 *a...............*
 0060: 00000000 00000000 B800000A B800000A *................*
 0070: 5E3C7E25 00000000 * ....%~<^*

*****************************************************************************************
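
For reading the entry body by hand, the layout can be worked out from
the dump itself: the "Byte Order" header puts bytes 15-12 on the left
and 03-00 on the right, and the ASCII column runs in ascending byte
order (the final row's "%~<^" is the longword 5E3C7E25 read low byte
first). A minimal Python sketch under that assumption follows;
decode_row and printable are hypothetical helper names, not part of
DECevent, and the sketch does not interpret the fields of the 630 entry.

```python
# Hypothetical helpers (not part of DECevent): turn one row of the
# "Entry body" dump into bytes in ascending address order.

def decode_row(row):
    """Return (offset, data) for a row such as
    ' 0070: 5E3C7E25 00000000 * ....%~<^*'."""
    offset_text, rest = row.split(":", 1)
    hex_part = rest.split("*", 1)[0]      # drop the ASCII column
    words = hex_part.split()              # leftmost word holds the highest bytes
    data = bytearray()
    for word in reversed(words):          # rightmost word sits at the lowest offset
        # Assumption: each 32-bit word is stored low byte first.
        data += int(word, 16).to_bytes(4, "little")
    return int(offset_text.strip(), 16), bytes(data)


def printable(data):
    """Render bytes the way the ASCII column does: '.' for non-printables."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)
```

Applied to the last row of the dump, decode_row gives offset 0x70 and
eight bytes whose printable form is "....%~<^", matching the ASCII
column.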
/var/adm/messages says:

Jun 18 15:14:06 robin vmunix: Machine Check error corrected by processor
Jun 18 15:19:18 robin vmunix: Machine Check error corrected by processor
Jun 18 15:28:33 robin vmunix: Machine Check error corrected by processor
Jun 18 15:28:33 robin vmunix: Machine Check error corrected by processor
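
To see how often the corrected machine checks occur, the messages file
can be tallied with a short script. This is a minimal sketch in Python,
assuming the syslog layout shown above (month, day, time, host, then
the message); count_corrected is a hypothetical helper name.

```python
from collections import Counter

MESSAGE = "Machine Check error corrected by processor"


def count_corrected(lines):
    """Count matching syslog lines per (host, timestamp)."""
    counts = Counter()
    for line in lines:
        if MESSAGE not in line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue                      # not in the assumed layout
        stamp = " ".join(fields[:3])      # e.g. "Jun 18 15:28:33"
        host = fields[3]                  # e.g. "robin"
        counts[(host, stamp)] += 1
    return counts
```

Run over the four lines above, it reports two events at 15:28:33 and
one each at 15:14:06 and 15:19:18.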


TIA

susrod_at_hbsi.com
Received on Fri Jun 19 1998 - 00:46:42 NZST
