Managers:
I have an RA7000 RAID array with an HSZ70 controller attached to a DS20.
Some time ago one of the controllers failed and has not yet been
replaced, so we have been running on a single controller. It has
been working all right, but last week and again today we experienced I/O
errors on one of the RAID sets, which repeatedly wrote this error to the
messages log:
May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for block(0x31a440, 0x31a440) on device 19,162
May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for block(0x3065c0, 0x3065c0) on device 19,162
May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for block(0x595440, 0x595440) on device 19,162
The only way I found to clear the error was to reboot the server.
Could this be caused by overloading the single remaining controller? It
only seems to occur when we have sustained high I/O, such as during a
database export. Any advice would be helpful.
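One rough way to check whether the errors really cluster around the export window is to tally the "Deferring I/O" messages per minute and per device. The Python sketch below is only an illustration: the function name tally_deferred_io and the SAMPLE text are assumptions made for the example, and the pattern is based solely on the three log lines quoted above, so it may need adjusting for other message variants.

```python
import re
from collections import Counter

# Matches one "Deferring I/O" message. The \s+ between "for" and "block"
# tolerates the line wrap seen in the quoted log excerpt.
PATTERN = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d):\d\d\s+(?P<host>\S+)\s+vmunix:\s+"
    r"Deferring I/O \(errno (?P<errno>\d+)\)\s+for\s+"
    r"block\((?P<block>0x[0-9a-fA-F]+),\s*0x[0-9a-fA-F]+\)\s+"
    r"on device (?P<major>\d+),\s*(?P<minor>\d+)"
)


def tally_deferred_io(log_text):
    """Count deferred-I/O messages per (minute, major, minor) tuple."""
    counts = Counter()
    for m in PATTERN.finditer(log_text):
        key = (m.group("stamp"), int(m.group("major")), int(m.group("minor")))
        counts[key] += 1
    return counts


# Sample input copied from the log excerpt in this message.
SAMPLE = """\
May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for
block(0x31a440, 0x31a440) on device 19,162
May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for
block(0x3065c0, 0x3065c0) on device 19,162
May 22 10:45:30 whistler vmunix: Deferring I/O (errno 5) for
block(0x595440, 0x595440) on device 19,162
"""

if __name__ == "__main__":
    for (minute, major, minor), n in sorted(tally_deferred_io(SAMPLE).items()):
        print(f"{minute}  device {major},{minor}: {n} deferred I/O messages")
```

Run against the full messages log instead of SAMPLE, the per-minute counts can be compared with the start and end times of the database export.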
The attached message is sent to the root account.
Thank you,
Bryan Daniel
Systems Administrator
University College of the Fraser Valley
Abbotsford, BC Canada
Subject: EVM ALERT [700]: SCSI event
======================= Binary Error Log event =======================
EVM event name: sys.unix.binlog.hw.scsi
Binary error log events are posted through the binlogd daemon, and
stored in the binary error log file, /var/adm/binary.errlog. This
event is used to report all SCSI device errors, including disk,
tape, HSZ raid events, and adapter errors.
======================================================================
Formatted Message:
SCSI event
Event Data Items:
Event Name : sys.unix.binlog.hw.scsi
Priority : 700
PID : 326
PPID : 1
Event Id : 2054
Timestamp : 21-May-2003 13:43:33
Host IP address : 198.162.97.2
Host Name : whistler
User Name : root
Format : SCSI event
Reference : cat:evmexp.cat:300
Variable Items:
subid_class (INT32) = 199
subid_num (INT32) = 2
subid_unit_num (INT32) = 2047
subid_type (INT32) = 55
binlog_event (OPAQUE) = [OPAQUE VALUE: 360 bytes]
============================ Translation =============================
Sequence number of error: 2099250215
Time of error entry: 21-May-2003 13:43:33
Host name: whistler
SCSI CAM ERROR PACKET
Controller type: DISK
SCSI device class: UNKNOWN
Bus Number: 2
Target number: 7
Lun Number: 7
Name of routine that logged the event: isp_reinit
Event information: Beginning Adapter/Chip reinitialization (0x1)
======================================================================
Received on Thu May 22 2003 - 22:10:49 NZST