Managers,
I had a hard lock on my Alpha server this morning (ES40 / HSZ70 (RA7000) / 5.1B PK4). The keyboard/console was not responding, the system did not answer ping, and the halt button had no effect. I hit the restart button on the ES40. On reboot (during the firmware check) I received memory errors on the LCD, and the system stopped booting before I ever got to the console. I then performed a cold boot of the system with the halt button in to get to the SRM. At the SRM I performed test mem, etc., and received no errors. I then continued to boot the system (rc3) successfully.
I have no errors in any of my OS logs or alert logs, and no core files. The only errors I found were in my binary error log (below). The errors refer to SCSI CAM LUN 0, target 1, which is on my HSZ70. On the RA7000, "show" reports that the state is good, with no errors on this LUN/target (R5). Correct me if I'm wrong, but SCSI CAM errors wouldn't cause a system lock; I would expect at least a kernel panic out of the deal.
Any thoughts/leads on my issue would be greatly appreciated.
Thanks,
David Knight
UERF:
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 2263.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Sun Aug 24 04:44:36 2003
OCCURRED ON SYSTEM alpha0
SYSTEM ID x000D0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000
----- UNIT INFORMATION -----
CLASS x0037
SUBSYSTEM x0037
BUS # x0000
x0008 LUN x0
TARGET x1
_____________________________________________________
======================= Binary Error Log event =======================
EVM event name: sys.unix.binlog.hw.scsi
Binary error log events are posted through the binlogd daemon, and
stored in the binary error log file, /var/adm/binary.errlog. This
event is used to report all SCSI device errors, including disk,
tape, HSZ raid events and adapter errors.
Action: Use Compaq Analyze or DECevent to read and analyze the
system error log to determine if a SCSI device may need to be
replaced.
======================================================================
Formatted Message:
SCSI event
Event Data Items:
Event Name : sys.unix.binlog.hw.scsi
Priority : 700
PID : 466
PPID : 1
Event Id : 1660
Timestamp : 25-Aug-2003 06:03:04
Host IP address : 10.34.80.2
Host Name : alpha0
User Name : root
Format : SCSI event
Reference : cat:evmexp.cat:300
Variable Items:
subid_class (INT32) = 199
subid_num (INT32) = 0
subid_unit_num (INT32) = 8
subid_type (INT32) = 34
binlog_event (OPAQUE) = [OPAQUE VALUE: 1224 bytes]
============================ Translation =============================
Sequence number of error: -129694387
Time of error entry: 25-Aug-2003 06:03:04
Host name: alpha0
SCSI CAM ERROR PACKET
SCSI device class: DEC SIM
Bus Number: 0
Target number: 1
Lun Number: 0
Name of routine that logged the event: ss_perform_timeout
Event information: timeout on disconnected request
############### Entry End ###############
Event information: Active CCB at time of error
############### Entry End ###############
======================================================================
Received on Mon Aug 25 2003 - 15:18:58 NZST