SUMMARY - Alpha hanging problem

From: <mclaughl_at_nssdc.gsfc.nasa.gov>
Date: Fri, 25 Aug 95 08:46:26 -0400

Thanks to the _quick_ responses from:

Anthony D'Atri <aad_at_nwnet.net>
Jon Buchanan <Jonathan.Buchanan_at_ska.com>
"Mark F. Rondinaro" <mfr_at_lns598.lns.cornell.edu>

My original post:

 About once every two months, one of our Alphas
 starts hanging for about twenty seconds every
 two to three minutes. The CPU activity is very low
 and no unusual processes are running. The only clue
 I have is the uerf output:

  ********************************* ENTRY 18. *********************************

 -snip-
--------- CAM STRING -----
ROUTINE NAME ss_perform_timeout
--------- CAM STRING -----
                                        timeout on disconnected request
--------- UNSUPPORTED ENTRY -----
CAM ENTRY x0000040E SIM_WS

  ********************************* ENTRY 19. *********************************



The problem was a tape drive on that bus. Mark Rondinaro
hit the nail on the head with this response:


The relevant lines from your error log are:

>CLASS x0022 DEC SIM
>SUBSYSTEM x0000 DISK
>BUS # x0001
> x0040 LUN x0
> TARGET x0
>
>--------- CAM STRING -----
>
>ROUTINE NAME ss_perform_timeout
>
>--------- CAM STRING -----
>
> timeout on disconnected request

This indicates that disk 0 started a SCSI command and disconnected from
the bus to allow another device to transfer data while it was completing
the command. I think that the command is probably a read or write command,
as these take the most time to complete. Upon disconnect the system starts
a timer, which times out if the device doesn't reconnect to finish the
request within the timeout period. Since the timeout occurred, my guess
would be that either the device has an intermittent electronic failure, that
the SCSI controller is failing, or that a cable is bad or loose, so the
reconnect message from the device gets missed by the controller. It could also
be that the drive takes significantly longer to complete some particular
disconnected requests and the timeout value is too low, but I have only seen
this with tape drives - never disk drives.
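
The disconnect timer described above can be illustrated with a small
sketch. This is only a toy model, not the CAM driver's actual code: the
Request class, the check_timeouts function, the target numbers and the
10-second timeout are all hypothetical, chosen for illustration. It shows
that a device that is merely slow and a device whose reconnect never
reaches the controller look the same to the host once the deadline passes.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One command that has disconnected from the bus (hypothetical model)."""
    target: int             # SCSI target ID of the device
    disconnected_at: float  # time (seconds) the device released the bus
    timeout: float          # seconds the host waits for a reconnect (assumed value)


def check_timeouts(pending, reconnects, now):
    """Return the targets whose disconnected request has timed out.

    pending:    list of Request objects awaiting a reconnect
    reconnects: dict mapping target -> time the host saw its reconnect
    now:        current time in seconds
    """
    timed_out = []
    for req in pending:
        deadline = req.disconnected_at + req.timeout
        seen = reconnects.get(req.target)
        if seen is not None and seen <= deadline:
            continue  # reconnect arrived in time, request completes normally
        if now > deadline:
            # Slow device, bad cable or failing controller: the host
            # cannot tell which, it only knows no reconnect was seen.
            timed_out.append(req.target)
    return timed_out


pending = [
    Request(target=0, disconnected_at=0.0, timeout=10.0),
    Request(target=5, disconnected_at=0.0, timeout=10.0),
]
# Target 0 reconnected quickly; no reconnect from target 5 has been seen.
print(check_timeouts(pending, {0: 0.02}, now=12.0))
```

In this model a request is reported only after its deadline has passed
with no reconnect seen, which is the condition the "timeout on
disconnected request" entry describes.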
Received on Fri Aug 25 1995 - 15:21:19 NZST
