Summary: Disk error

From: Cai Xuejun <caixj_at_hptc1.ihep.ac.cn>
Date: Sat, 17 Oct 1998 12:02:16 +0800

Thank you to all who helped me with my question.

I still have not found the cause or a solution, though. The only thing I
could do was make a copy of the failing disk. :-(

Below are the informative replies these kind people sent.

From Wesley Darlington:
W Darlington wrote:
>
> Hi Cai,
>
> My instinct would be to see if I can get it to stay up long enough to back
> up what's on the disk and then get it replaced under warranty. It's often a
> good idea to keep such a disk *really* cold while doing such a last backup.
>
> It's possible the disk isn't dead and is just recalibrating (whatever
> that is!) or something, but that's unlikely.
>

From alan_at_nabeth.cxo.dec.com:
> Two kinds of errors are common from SCSI devices; protocol
> errors and device errors. Entries for device errors will
> nearly always have SCSI Request Sense data included in the
> entry somewhere and dia(8) knows how to translate all of
> the standard SCSI error codes. Just look for the Sense Key,
> Additional Sense Code (ASC) and Additional Sense Code Qualifier
> (ASCQ). Short of having decent vendor documentation on
> the SCSI error codes, that translation of the ASC and ASCQ
> is all the explanation you'll get.
>
> Protocol errors usually include sequences of timeouts and
> bus resets. These usually indicate cabling or termination
> problems on the SCSI bus.
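
To illustrate the Sense Key / ASC / ASCQ lookup described above, here is a
minimal Python sketch that decodes fixed-format SCSI Request Sense data. It
is not what dia(8) does internally. The function name `decode_fixed_sense`,
the deliberately tiny lookup tables, and the sample bytes are all assumptions
made for illustration; the sample is not taken from my error log.

```python
# Small, illustrative subset of the standard SCSI sense keys.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# Small, illustrative subset of standard (ASC, ASCQ) pairs.
ASC_ASCQ = {
    (0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE",
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
}


def decode_fixed_sense(sense: bytes) -> dict:
    """Extract Sense Key, ASC and ASCQ from fixed-format sense data."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes to read ASC and ASCQ")
    response_code = sense[0] & 0x7F  # top bit of byte 0 is the VALID flag
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F            # low nibble of byte 2
    asc, ascq = sense[12], sense[13]
    return {
        "sense_key": SENSE_KEYS.get(key, f"unknown key 0x{key:X}"),
        "asc": asc,
        "ascq": ascq,
        "meaning": ASC_ASCQ.get((asc, ascq), "not in this small table"),
    }


# Hypothetical sample: the kind of data a spun-down disk might return.
sample = bytes.fromhex("70 00 02 00 00 00 00 0a 00 00 00 00 04 02 00 00 00 00")
print(decode_fixed_sense(sample))
```

For any code not in these small tables, the full ASC/ASCQ assignments in the
SCSI standard are the place to look.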

>Based on the messages, it appears that the disk was
>spun down when the errors occurred and that the driver
>expected this could be resolved by sending the appropriate
>command. Error #24 is from the SCSI adapter, indicating
>that either the host or the device wasn't following the
>SCSI protocol; a Bus Free coming at an unexpected time.
>The last message suggests that the driver for the adapter
>aborted the outstanding command. The sequence probably
>repeats at each I/O:
>
> 1. The disk reports that it isn't ready and needs
>    some command to initialize it (Start Unit).
> 2. The disk frees the bus at an unexpected time.
> 3. The driver aborts the command.
>
>You can send an explicit Start Unit command using scu(8).
>If that fails, the disk is probably history. If it works,
>make a copy of the data as quickly as you can and start
>working to replace the disk.
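
For reference, the Start Unit command mentioned above is the standard SCSI
START STOP UNIT command (opcode 0x1B). The Python sketch below only builds
the 6-byte command block to show which bits are involved; it does not send
anything to a device, and the helper name `build_start_stop_unit_cdb` is
hypothetical. On Digital UNIX, scu(8) is the tool that actually issues the
command.

```python
START_STOP_UNIT = 0x1B  # standard SCSI opcode


def build_start_stop_unit_cdb(start: bool = True,
                              immediate: bool = False,
                              load_eject: bool = False) -> bytes:
    """Build the 6-byte START STOP UNIT command descriptor block."""
    cdb = bytearray(6)
    cdb[0] = START_STOP_UNIT
    if immediate:
        cdb[1] |= 0x01  # IMMED: return status before the spin-up completes
    if start:
        cdb[4] |= 0x01  # START: 1 = spin up, 0 = spin down
    if load_eject:
        cdb[4] |= 0x02  # LOEJ: load or eject removable media
    return bytes(cdb)


# Start Unit: ask the drive to spin up and wait until it is done.
print(build_start_stop_unit_cdb().hex(" "))
```

Leaving `immediate` off means the command completes only once the drive has
finished spinning up (or has failed to), which is the more useful behaviour
when the point is to find out whether the disk can still become ready.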

From tpb_at_doctor.zk3.dec.com:
> Came through as plain text. Much easier to read :^) I can't tell for sure,
> but it looks like the drive drops into a "not ready" state and the system is
> trying to recover. It could be a drive defect, or it could be that there is
> a need for a drive specific entry in the system's DDR database to change the
> timeouts or other parameters for dealing with this specific drive. Good
> luck on figuring out how to get it to work reliably, unless you get really
> lucky and someone else has the same drives and managed to get them to work
> by adjusting the software configuration.

> It sounds to me like you have a defective disk. It made strange noises, you
> powered it off and back on again, now you're getting weird errors on your
> SCSI bus that look like some device is reporting bad addresses. If it were
> my system, I'd replace that disk.

From digiunix_at_bellatlantic.net:
> did you do:
>
> # scu
> scu> scan edt
> scu> show edt
>
> or
>
> # uerf -R -r 300 |more

From rdbowma_at_tsi.clemson.edu:
> Hi Cai -
> How were you backing up the data? It might be that the one disk
> in question has bad sectors that are causing the errors during
> backup. I ran into that problem a while back while using NSR.
> Dump would also fail. The thing to do would be to run the
> scu utility, and at the prompt (help provides lots of info, more than
> the man page) use verify media and verify the entire disk. There
> are ways to reassign bad blocks if that is indeed the problem.
> You can search on my name in the archives and come up with
> more detailed information. To start scu I believe you have to
> specify a disk, and that is covered in the man page.
>
> Lastly, you did not include the log files for the latest errors.
>
> I hope this helps some.
> Ron Bowman
> Techno-Sciences, Inc.
> rdbowma_at_tsi.clemson.edu
> 864-646-4028
>
> Alpha EB 21164, 333MHz, 1 CPU
> DU 4.0B (564) Patch #6 installed
> Searchable Archive URLs:
> http://www-archive.ornl.gov:8000/ (simple search)
> http://www-archive.ornl.gov:8000/archive/power.htm (more detailed)

-- 
**************************************
*** mailto:caixj_at_hptc1.ihep.ac.cn    *
*** http://alpha01.ihep.ac.cn/~caixj *
**************************************
Received on Sat Oct 17 1998 - 04:07:14 NZDT
