funny SCSI problem

From: Lucio Chiappetti <lucio_at_ifctr.mi.cnr.it>
Date: Wed, 10 Dec 1997 09:33:39 +0100 (MET)

Last Friday morning one of our Alpha 255s suddenly hung. One could ping it,
but not telnet into it. One could move the mouse and press the session
manager's buttons, but not create dxterms or type into existing ones. It was
therefore not possible to shut down. I had to press the "reset" button.

After that the machine could not boot. A "show device" at PROM level listed
dva0, ewa0 and pka0, but none of the three attached disks.

After some thinking I issued an "init" command at PROM level. After this the
disks were visible again. The first boot stopped (silently) after "jumping to
bootstrap code", but a further reset and boot was successful (this sequence
sometimes happens after a halt).

At this point I logged in and found two bursts of SCSI errors in "uerf".
One was dated Thursday night and the other Thursday afternoon. I called DEC
and e-mailed them the uerf log (apparently e-mail from Italy to DEC Italy via
DEC USA takes half a day).

After that we had another hang early on Friday afternoon. It was preceded by
another burst of SCSI errors. This time the "init" trick did not seem to cure
it (no disks visible), but as soon as I touched (not tightened, just touched)
the SCSI connector, the disks were visible again.

Since then the machine has performed OK.
DEC just phoned (after a bank holiday period) and suggested leaving it
running for a while before they come. The puzzle is: is this a hardware
problem (with the disks, or more likely with the SCSI controller)? Or is it an
environmental problem (static buildup on the cables)?

We have just moved to a new building with a new forced-air climate control
system. Our impression was that the place was quite hot (24 C) and dry. In
fact we had a couple of funny problems with two other machines after the move.
Incidentally, I had my colleague (in whose office the 255 is) switch the
heating off on Friday.

Comments, anyone?

A selection of the uerf errors is reported in brief below. SCSI targets 0 and
1 are, respectively, the internal disk and one of two RZ28s in an external BA
enclosure.

----------------------------------------------------------------------------
Lucio Chiappetti - IFCTR/CNR - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.ifctr.mi.cnr.it/~lucio/personal.html
----------------------------------------------------------------------------

The Thursday night burst:

EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 105.
----- UNIT INFORMATION -----
CLASS x0022 DEC SIM
SUBSYSTEM x0000 DISK
BUS # x0000
                              x0000 LUN x0
                                        TARGET x0
----- CAM STRING -----
ROUTINE NAME ss_abort_done
----- CAM STRING -----
                                        SCSI abort has been performed

EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 104.
----- UNIT INFORMATION -----
CLASS x0022 DEC SIM
SUBSYSTEM x0000 DISK
BUS # x0000
                              x0008 LUN x0
                                        TARGET x1
----- CAM STRING -----
ROUTINE NAME ss_abort_done
----- CAM STRING -----
                                        SCSI abort tag has been performed
[... and a few more like]
----- CAM STRING -----
ROUTINE NAME ss_perform_timeout
----- CAM STRING -----
                                        timeout on disconnected request
----- UNSUPPORTED ENTRY -----
CAM ENTRY x0000040E SIM_WS


The Thursday afternoon burst was similar,

while the Friday afternoon one ended with:

ROUTINE NAME psiop_hardintr
----- CAM STRING -----
                                        Bus reset detected
----- UNSUPPORTED ENTRY -----
CAM ENTRY x00000430

----- CAM STRING -----
ROUTINE NAME ss_perform_timeout
----- CAM STRING -----
                                        Reached max abort count, scheduled bus
                                         _reset
----- UNSUPPORTED ENTRY -----
CAM ENTRY x0000040E SIM_WS
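
When comparing bursts like the ones above, it can help to tally the entries by SCSI target and reporting routine. The following is only a minimal sketch in Python, assuming the brief uerf output has been saved as text and follows the layout quoted in this message. The function name tally_uerf and the SAMPLE text are hypothetical and not part of uerf; other uerf report formats may need different patterns.

```python
import re
from collections import Counter

# Field patterns inferred from the excerpt quoted in this message.
# Assumption: other uerf versions or report modes may format these differently.
TARGET_RE = re.compile(r"^\s*TARGET\s+x([0-9A-Fa-f]+)\s*$")
ROUTINE_RE = re.compile(r"^\s*ROUTINE NAME\s+(\S+)\s*$")


def tally_uerf(text):
    """Count (target, routine) pairs in brief uerf-style CAM SCSI output.

    A TARGET line sets the current target, and each later ROUTINE NAME line
    is counted against it. An "EVENT CLASS" line starts a new entry and
    clears the target. Routines with no known target are counted under None.
    """
    counts = Counter()
    target = None
    for line in text.splitlines():
        if line.startswith("EVENT CLASS"):
            target = None
            continue
        m = TARGET_RE.match(line)
        if m:
            target = int(m.group(1), 16)
            continue
        m = ROUTINE_RE.match(line)
        if m:
            counts[(target, m.group(1))] += 1
    return counts


# Hypothetical sample, abridged from the two complete entries quoted above.
SAMPLE = """\
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 105.
----- UNIT INFORMATION -----
BUS # x0000
                              x0000 LUN x0
                                        TARGET x0
----- CAM STRING -----
ROUTINE NAME ss_abort_done

EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 104.
----- UNIT INFORMATION -----
BUS # x0000
                              x0008 LUN x0
                                        TARGET x1
----- CAM STRING -----
ROUTINE NAME ss_abort_done
"""

if __name__ == "__main__":
    for (target, routine), n in tally_uerf(SAMPLE).items():
        print(f"target {target}: {routine} x{n}")
```

If the errors cluster on one target, that would point towards a single disk; errors spread across both the internal and external targets would be more consistent with the controller, cabling or termination.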
Received on Wed Dec 10 1997 - 09:31:52 NZDT
