SUMMARY - CAM SCSI Error and strange behaviors from Tim Minton on 1996-07-02 (tru64-unix-managers)

From: Tim Minton <tminton_at_GCPL.LIB.OH.US>
Date: Mon, 01 Jul 1996 17:40:25 EDT

SUMMARY :

Sorry it has taken so long to get back to everyone.

> Hello OSF managers,
>
> I am running into something very odd on one box. Maybe you can help.
> Following this description is a sample of the error I get when I try to
> access this disk (my current system disk). We noticed the problem yesterday
> after I upgraded the machine to 96 MB of RAM. One of the developers tried to use
> vi on the machine and got the following message:
>
> Bus Error (core dump)
>
> My question is: has anyone run into this problem?
>
> Is it a drive going bad?
>
> Almost everything else on the drive runs. I have also had trouble with
> uerf.
>
> I have checked all the cables to make sure they are seated properly and even
> switched cables going to this drive.
>
> My current system is a DEC 3000 model 400 at firmware 6.0 and running OSF/1 3.2
> (rev 214)
>
> Thanks in advance for any help.
>
> Tim Minton tminton_at_gcpl.lib.oh.us
> Network Administrator
> Greene County Public Library
> Xenia, Ohio
>

Well, it has been determined that the drive is going bad, and it is being
replaced with a new 4 GB drive :).

Thanks to all who gave so much help in figuring out the problem. The most
useful information is below.

From: MX%"alan_at_nabeth.cxo.dec.com" 23-JUN-1996 21:25:33.98

An unrecoverable medium error means that one or more sectors of the
disk have become so unreadable that whatever error correction
mechanisms the disk supports can't recover the data. More
simply, the disk has at least one bad block.
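If the error log shows the raw SCSI sense bytes for the failure, they say
both what kind of error it was and, often, which block failed. The sketch
below decodes fixed-format sense data (response code 0x70/0x71) as laid out
in the SCSI primary commands standard: sense key 0x3 is MEDIUM ERROR, and
ASC/ASCQ 0x11/0x00 is "unrecovered read error". It assumes you already have
the bytes in hand; the function names and the sample bytes are hypothetical,
not taken from the log in this thread.

```python
from dataclasses import dataclass
from typing import Optional

MEDIUM_ERROR = 0x3  # SCSI sense key: MEDIUM ERROR


@dataclass
class SenseInfo:
    sense_key: int
    asc: int             # additional sense code
    ascq: int            # additional sense code qualifier
    lbn: Optional[int]   # INFORMATION field; None when the VALID bit is clear


def decode_fixed_sense(sense: bytes) -> SenseInfo:
    """Decode fixed-format (response code 0x70 or 0x71) SCSI sense data."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data needs at least 14 bytes")
    response_code = sense[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: 0x{response_code:02x}")
    valid = bool(sense[0] & 0x80)             # bit 7 of byte 0
    sense_key = sense[2] & 0x0F               # low nibble of byte 2
    info = int.from_bytes(sense[3:7], "big")  # bytes 3-6, big-endian
    return SenseInfo(sense_key, sense[12], sense[13], info if valid else None)


def is_unrecovered_read_error(info: SenseInfo) -> bool:
    # MEDIUM ERROR with ASC 0x11 / ASCQ 0x00 = unrecovered read error
    return info.sense_key == MEDIUM_ERROR and info.asc == 0x11 and info.ascq == 0x00
```

With the hypothetical bytes F0 00 03 00 01 2C 40 0A 00 00 00 00 11 00,
decode_fixed_sense reports sense key 3, ASC 0x11, and lbn 76864 (0x00012C40),
which is the logical block number you would note for later.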

You could have lots of bad blocks, indicating a fairly substantial
failure of the disk, or just a few. Since the error shows up when
trying to read vi, it is probably safe to guess that part of the
vi executable uses that block.
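Telling "lots of bad blocks" apart from "just a few" means reading the
whole disk once and counting failures. The sketch below is a read-only
stand-in for that idea, not a description of what scu does: it reads every
block and records the ones whose read raises an OSError (an unrecoverable
read typically surfaces as EIO). The function names are hypothetical, it
assumes a Unix-like system with os.pread, and it remaps nothing.

```python
import os
from typing import Callable, List


def scan_blocks(read_block: Callable[[int], bytes], n_blocks: int) -> List[int]:
    """Read blocks 0..n_blocks-1 once; return the numbers whose read failed."""
    bad = []
    for lbn in range(n_blocks):
        try:
            read_block(lbn)
        except OSError:  # e.g. EIO from an unrecoverable medium error
            bad.append(lbn)
    return bad


def scan_device(path: str, n_blocks: int, block_size: int = 512) -> List[int]:
    """Scan a raw device or disk image read-only, one block_size read per block."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return scan_blocks(lambda lbn: os.pread(fd, block_size, lbn * block_size),
                           n_blocks)
    finally:
        os.close(fd)
```

Keeping the per-block reader separate from the loop makes the counting logic
easy to exercise without a failing disk. On real hardware you would pass the
raw device for the suspect disk; a long list of failures points to a dying
drive, a short one to a few isolated bad blocks.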

The correction is:

1. Use scu(8) to scan the disk for more bad blocks. Many SCSI
    disks support automatic replacement when the retry and
    error correction mechanisms manage to recover the data:
    a replacement block is allocated in place of the bad one,
    and the recovered data is written to it. The disk keeps
    track of the remapping.

2. More often, the data isn't recovered and it is necessary to
    force replacement. The driver and disk will try hard to
    read a good copy of the data, but this will probably fail.
    The new block won't have the right data, so you have to
    note the original logical block number, find out which file
    uses that block (using icheck and ncheck for UFS), and restore
    that file from backup. scu can be used to force replacement of
    a bad block.
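Before icheck can say which file owns the bad block, the disk-relative
logical block number has to be turned into an address inside the file
system's partition. The arithmetic below assumes 512-byte sectors, a
partition start offset read from the disk label, and a UFS fragment size
(1024 bytes is only a default chosen for the example). The function name is
hypothetical, and the unit icheck(8) expects for its block argument should
be confirmed in its man page before use.

```python
def lbn_to_fs_fragment(lbn: int, partition_start: int,
                       sector_size: int = 512, frag_size: int = 1024) -> int:
    """Map an absolute disk LBN to a fragment number within one partition."""
    if lbn < partition_start:
        raise ValueError("block lies before the start of this partition")
    # Byte offset of the bad sector from the start of the partition
    byte_offset = (lbn - partition_start) * sector_size
    # File system addresses count in fragments
    return byte_offset // frag_size
```

For example, a bad LBN of 76864 on a partition that starts at sector 65536
(both values hypothetical) is 11328 sectors into the partition, which is
fragment 5664 with 1024-byte fragments.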

Other things to check were:

        SCSI termination, cables, and the memory.
        All of these were checked. When I checked the drive, I found errors.
        Almost all of the SCSI errors were also associated with this drive;
        /usr, /tmp and / all reside on it (luckily I had / on a spare drive).

        Thanks for all your help.

===========================================================================
Tim Minton - Network Administrator Greene County Public Library
                                              Xenia, Ohio.
        TMINTON_at_gcpl.lib.oh.us

Anybody can make a mistake but to really foul things up requires a
computer!

Disclaimer - My views are my own. Not the library's.
===========================================================================
Received on Mon Jul 01 1996 - 23:54:58 NZST