RAID error messages

From: paulo <paulo_at_dexel.co.za>
Date: Tue, 14 Jan 1997 17:10:29 +0200

Hello osf-managers,

I have a customer with a 7-member RAID 5 system (7 x 4GB) on an Alpha
2000 with 256MB of memory.

I have installed the RAID Event Notification Daemon and Monitor
(swxcrmon), and it mails root and logs the following error message. This
has happened twice in one day, once early in the morning (5:00am) when
no one was working!?

This is what the error looks like:
-------------
SWXCR XCR0 Event Notification from node xxxx.xxxx.co.za

The hard disk at channel 0, target 0 had a hard error.

-------------
This resulted in their Oracle database becoming corrupted. Fortunately,
this only affected a small table, and they were able to drop the
offending table without any serious consequences!?

The same kind of message has been logged before, but only as "soft
errors": approximately once a month for 3 months, on "target 2" rather
than target 0, and with no disruption like the one that happened now.

My questions:
1. At what point do you consider these errors serious enough to take
action, such as replacing a RAID disk?
2. How often should these soft errors be logged before you become
worried? Is there any formal Digital policy on this, or is one just
meant to use one's professional discretion and judgement?
3. Since this is RAID 5, could a bad block on one disk actually have
caused data corruption?
4. What is the best way of replacing a faulty RAID disk: via a software
RAID utility (any suggestions?) or at the hardware level from the
chevron prompt (using the RCU diskettes or something similar)?

Any suggestions or advice would be welcome.
Regards
Paulo
Received on Tue Jan 14 1997 - 16:36:33 NZDT
