Dear managers!
I'm administering several Alphas (5 workstations, model 3000, and
2 servers, model 2100). It seems that I have memory errors on one
workstation and one server. I need your help to decide whether to order
new memory chips, and which ones should be replaced. I will try to explain
in detail what information I have, so please excuse my long letter (and my
ignorance of hardware).
On one workstation I have the following messages in /var/adm/messages:
--------------------------------------------------------------------------
Dec 16 17:15:01 ignosc3 vmunix: Memory error corrected by processor
Dec 16 17:15:01 ignosc3 vmunix: biu_stat = 0000000000001b40
Dec 16 17:15:01 ignosc3 vmunix: biu_addr = 00000001d4000018
Dec 16 17:15:01 ignosc3 vmunix: dc_stat = 0000000000000003
Dec 16 17:15:01 ignosc3 vmunix: fill_syndrome = 0000000000000015
Dec 16 17:15:01 ignosc3 vmunix: fill_addr = 0000000002b39740
Dec 16 17:15:01 ignosc3 vmunix: bc_tag = 0000000000402c12
Dec 16 17:15:01 ignosc3 vmunix: ident = 0
--------------------------------------------------------------------------
This occurs about once every two months. What does it mean: a bad
memory chip, or something else? I have run test mem from the boot monitor
a number of times, but it did not find any memory problem. Should I worry
about this? Should I replace the chip (and how can I find out which one)?
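To see whether the same address or syndrome keeps coming back, the entries could be tallied with something like the sketch below. It assumes the log format is exactly as pasted above; the function names are my own, and every value is read as hexadecimal, which is a guess for fields such as ident.

```python
import re
from collections import Counter

# Marks the first line of one corrected-error report in the log.
HEADER = "Memory error corrected by processor"
# Matches a field line such as "... vmunix: fill_addr = 0000000002b39740".
FIELD_RE = re.compile(r"vmunix:\s+(\w+)\s*=\s*([0-9a-fA-F]+)\s*$")


def parse_corrected_errors(lines):
    """Return one dict (field name -> integer value) per report found."""
    reports = []
    current = None
    for line in lines:
        if HEADER in line:
            current = {}
            reports.append(current)
            continue
        match = FIELD_RE.search(line)
        if match and current is not None:
            name, value = match.groups()
            # Assumption: all values are hexadecimal.
            current[name] = int(value, 16)
        else:
            # Any other line ends the current report.
            current = None
    return reports


def tally(reports):
    """Count reports per (fill_addr, fill_syndrome) pair."""
    return Counter(
        (r["fill_addr"], r["fill_syndrome"])
        for r in reports
        if "fill_addr" in r and "fill_syndrome" in r
    )
```

Calling parse_corrected_errors(open("/var/adm/messages")) and passing the result to tally would show whether the errors cluster on one address. A repeated pair would suggest one weak location rather than random events, but that interpretation is only my assumption.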
On one server, during the boot test, I get this message:
--------------------------------------------------------------------------
Testing Memory bank 0
***Error - Memory Board 2 ***
Failing address: 005c0820
Bank Number: 0
ASIC ID: 0
Error Type: 0
Error Syndrome: 000006c7
Configuring Memory Modules
....
Memory Testing and Configuration Status
Module   Size    Base Addr   Intlv Mode   Intlv Unit   Status
------   -----   ---------   ----------   ----------   ------
  2      128MB   00000000      2-Way          0        Passed
  3      128MB   00000000      2-Way          1        Passed
Total Bad Pages 1
--------------------------------------------------------------------------
This error does not occur on every reboot, only about half the
time (we do not reboot the server often, but I have experimented a little).
After getting to the boot monitor, the show error command gives me the
following:
--------------------------------------------------------------------------
MEM2 Module EEROM Event Log
Test Directed Errors
No Entries Found
Symptom Directed Errors
Entry  Fail Address  Bits/Syndrome  Bank #  ASIC #  Source  Event Type
  00     005c0220        06c7         0       0       1         00
  01     005c0820        06c7         0       0       1         00
  02     005c0620        06c7         0       0       1         00
--------------------------------------------------------------------------
and the show fru command gives me the following:
--------------------------------------------------------------------------
                         Rev      Events logged
Slot  Option  Part#      Hw  Sw   Serial#      SDD  TDD
0     IO      B2110-AA   K3  0    AY52803390   00   00
1     CPU2    B2040-AB   B1  37   AY63504963   00   00
2     CPU0    B2040-AB   B1  37   AY60916498   00   00
3     CPU1    B2040-AB   B1  37   AY62702853   00   00
5     CPU3    B2040-AB   B1  0    AY62121147   00   00
6     MEM2    B2021-CA   B1  0    AY45013015   03   00
7     MEM3    B2022-DA   B1  0    AY53407756   00   00
Slot Option Hose 0, Bus 0, PCI on Standard I/O
6 DECchip 21040-AA PCI Option Slot 0
8 DEC PCI FDDI PCI Option Slot 2
Slot Option Hose 0, Bus 1, EISA on Standard I/O
2 CPQ3111
--------------------------------------------------------------------------
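The three failing addresses logged for MEM2 in the show error output all carry the same syndrome (06c7), so I compared them bit by bit. This is only a small sketch: the helper name is my own, and the result does not by itself identify a chip.

```python
def varying_bits(addresses):
    """Return (mask of bits that differ between addresses, bits shared by all)."""
    first = addresses[0]
    mask = 0
    for addr in addresses[1:]:
        # XOR marks every bit position where this address differs from the first.
        mask |= first ^ addr
    return mask, first & ~mask


# Failing addresses from the "Symptom Directed Errors" entries 00, 01 and 02.
logged = [0x005C0220, 0x005C0820, 0x005C0620]
mask, common = varying_bits(logged)
print(f"varying bits: {mask:#010x}  common part: {common:#010x}")
```

The addresses differ only in bits 9 to 11, so they sit close together, which seems consistent with the "Total Bad Pages 1" line in the boot test output.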
So I think I should replace the memory chip in slot 6 of the
server, and probably replace memory in the workstation. Am I correct? How
can I tell which chip to replace in the workstation? Thank you in advance!
---
Vladas Lapinskas, mailto:lapinskas_at_mail.iae.lt
Received on Thu Apr 01 1999 - 12:21:09 NZST