SUMMARY: Error in HSZ40

From: Bompreco <super3_at_svn.com.br>
Date: Thu, 09 Oct 1997 11:28:40 -0300

My original message:
 
> Hi,
>
> I have an HSZ40 cluster and I had a serious I/O problem with my
> database (Informix) in a defined chunk (rza8f partition). I found
> this message in my /var/adm/syslog.dated:
>
> daemon.log:Sep 25 16:00:10 UXfinanceiro DECsafe: UXfinanceiro Agent
> ***ALERT: hard device error on /dev/rza8f from
> UXfinanceiro.supermar.com.br
>
> The message points to a hardware error in the volume rza8, which is a
> logical volume of 6 RZ29B 4.3 GB disks grouped in a RAIDset (RAID
> 5), so I can't tell on which physical device the error occurred. The
> cluster only wrote the log message below, and didn't identify the
> device:
>
> Instance Code: 01010302
> Description: An unrecoverable hardware detected fault occurred.
> Reporting Component: 1.(01)
> Description: Executive Services
> Reporting component's event number: 1.(01)
> Event Threshold: 2.(02)
> Classification: HARD. Failure of a component that affects controller
> performance or precludes access to a device connected to the
> controller is indicated.
> Last Failure Code: 018800A0 (No Last Failure Parameters)
> Last Failure Code: 018800A0
> Description: A processor interrupt was generated with an indication
> that the program card was removed.
>
> My immediate workaround was to stop using the partition rza8f in the
> database. But I'm losing 2 GB (the size of rza8f) and I still can't
> identify the physical device with the problem.
>
> Any ideas?

The Solution:

The HSZ40 didn't identify the error. I checked all the logs, the FMU (Fault
Management Utility) and UERF, and I spoke with DEC hardware and software
support, but all we had was a message about a hardware error on the logical
device, and nobody could explain why the cluster didn't flag the damaged
device (by flashing its error LED).
The hszterm 'show disks', 'show failedsets' and all the other commands
didn't show any problem.
So the only way to find the failing physical device was to use the DECevent
software (command: dia -t s:25-sep-1997:08:00:00 e:25-sep-1997:14:00:00):
.....
------- HSZ Data -------
Instance Code          x0252000A   The last block of data returned contains a
                                   forced error. A forced error occurs when a
                                   disk block is successfully reassigned,
                                   but the data in that block is lost.
                                   Re-writing the disk block will clear the
                                   forced error condition. The Device Sense
                                   Data Information Bytes contain the block
                                   number of the first block in error.

                                   Component ID = Value Added Services.
                                   Event Number = x00000052
                                   Repair Action = x00000000
                                   NR Threshold = x0000000A
Template Type          x51         Disk Transfer Error.
Template Flags         x01         HCE = 1, Event occurred during Host
                                   Command Execution.
Ctrl Serial #          ZG62003670
Ctrl Software Revision V30Z
RAIDSET State          x00         NORMAL. All members present and
                                   reconstructed, IF LUN is configured as a
                                   RAIDSET.
Error Count            1.
Retry Count            0.
Most Recent ASC        x80
Most Recent ASCQ       x00
Next Most Recent ASC   x00
Next Most Recent ASCQ  x00
Device Locator         x000003     Port = 3.
                                   Target = 0.
                                   LUN = 0.
Command Opcode         x28         Read (10 byte)
Original CDB
---------------------------------------------------------------------------
So in hszterm I ran: locate ptl 3 0 0, which pointed me to the faulty
RZ29B, and DEC replaced the disk.
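
For anyone who has to do this lookup more than once, here is a minimal
Python sketch of the last step. It is not part of DECevent or hszterm. It
assumes the report text from dia has been saved into a string and that the
Device Locator entry is printed as "Port = N.", "Target = N.", "LUN = N.",
as in the output above. The function names extract_ptl and locate_command
are made up for this illustration, and only the first Device Locator entry
in the text is used.

```python
import re

# Matches "Port = 3.", "Target = 0." and "LUN = 0." as they appear in the
# DECevent report quoted above (assumed layout, taken from that output only).
FIELD_RE = re.compile(r"\b(Port|Target|LUN)\s*=\s*(\d+)\.")


def extract_ptl(report_text):
    """Return (port, target, lun) from the first Device Locator entry, or None."""
    # Start scanning at the Device Locator label so earlier fields are ignored.
    start = report_text.find("Device Locator")
    if start < 0:
        return None
    fields = {}
    for name, value in FIELD_RE.findall(report_text[start:]):
        # Keep only the first value seen for each field (first event wins).
        fields.setdefault(name, int(value))
    if not all(key in fields for key in ("Port", "Target", "LUN")):
        return None
    return fields["Port"], fields["Target"], fields["LUN"]


def locate_command(ptl):
    """Build the hszterm command text; nothing is sent to the controller."""
    port, target, lun = ptl
    return f"locate ptl {port} {target} {lun}"


# Excerpt copied from the report above.
SAMPLE = """\
Device Locator x000003 Port = 3.
                                     Target = 0.
                                     LUN = 0.
Command Opcode x28 Read (10 byte)
"""

if __name__ == "__main__":
    print(locate_command(extract_ptl(SAMPLE)))
```

The sketch only builds the command string, so it can be checked by eye
before typing it into hszterm.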
Received on Thu Oct 09 1997 - 16:13:50 NZDT
