SUMMARY : advfs or disk

From: Mirat Satoglu <mirat_at_bornova.ege.edu.tr>
Date: Wed, 05 May 1999 16:17:55 +0400 (EET DST)

Here is my question again:

>
> Hello all,
>
> My system keeps logging the following messages. Is this an advfs
> problem, or does the disk need to be changed?
>
> DEC 3000 M500, osf v. 3.2c rev 148.
>
> May 3 09:44:59 bornova vmunix: advfs I/O error: setId 0x31a45b2b.000a0d70.1.8001 tag 0x00000001.8001u page 6288
> May 3 09:44:59 bornova vmunix: vd 1 blk 5456720 blkCnt 16
> May 3 09:44:59 bornova vmunix: read error = 5
> May 3 09:45:41 bornova vmunix: advfs I/O error: setId 0x31a45b2b.000a0d70.1.8001 tag 0x00000001.8001u page 6288
> May 3 09:45:41 bornova vmunix: vd 1 blk 5456720 blkCnt 16
> May 3 09:45:41 bornova vmunix: read error = 5
> May 3 09:45:42 bornova vmunix: advfs I/O error: setId 0x31a45b2b.000a0d70.1.8001 tag 0x00000001.8001u page 6288
> May 3 09:45:42 bornova vmunix: vd 1 blk 5456720 blkCnt 16
> May 3 09:45:42 bornova vmunix: read error = 5
> May 3 09:45:56 bornova vmunix: advfs I/O error: setId 0x31a45b2b.000a0d70.1.8001 tag 0x00000001.8001u page 6288
> May 3 09:45:56 bornova vmunix: vd 1 blk 5456720 blkCnt 16
> May 3 09:45:56 bornova vmunix: read error = 5
> May 3 09:45:58 bornova vmunix: advfs I/O error: setId 0x31a45b2b.000a0d70.1.8001 tag 0x00000001.8001u page 6288
> May 3 09:45:58 bornova vmunix: vd 1 blk 5456720 blkCnt 16
> May 3 09:45:58 bornova vmunix: read error = 5
> May 3 09:45:59 bornova vmunix: advfs I/O error: setId 0x31a45b2b.000a0d70.1.8001 tag 0x00000001.8001u page 6288
> May 3 09:45:59 bornova vmunix: vd 1 blk 5456720 blkCnt 16
> May 3 09:45:59 bornova vmunix: read error = 5
>
>


Sorry for the late summary; I am still working on the problem.
Thanks to Olle Eriksson, Kjell Andresen, Umut Ceyhan, Dr. Tom Blinn,
Alan Rollow and Robin Kundert.

The answers are as follows:


From: Olle Eriksson <olle_at_cb.uu.se>

You should look in the system error log to see what is logged:

uerf -R -o full | more


From: Kjell Andresen <kjell_at_dod.no>

I've seen "read error = 5" three times lately, and in two of those
cases the disk was the problem. I'm investigating the third now, and
my guess is the disk again.

From: Umut Ceyhan <ceyhan_at_bornova.ege.edu.tr>

man scu....


From: "Dr. Tom Blinn, 603-884-0646" <tpb_at_doctor.zk3.dec.com>

You have a bad spot on the disk. It appears to be affecting one file, since
all of the messages reference the same tag in the same fileset. You might
want to go look at your binary error log with the uerf utility, since it will
provide a simpler way of figuring out which block on which physical disk is
causing the problem. If the file is small enough, you can make the errors go
away by moving the file elsewhere on the disk; just copy it to a new name,
then delete the old version, then name the new version to the old version's
name. Or if the file isn't needed, just delete it. In any case, you'll get
those errors repeatedly until you stop using the file or at least the bad spot
on the disk.
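
The copy / delete / rename sequence described above could be sketched roughly as follows. This is only an illustration: relocate_file is a hypothetical helper name, not an AdvFS utility, and the copy itself may fail with an I/O error if the bad spot cannot be read at all, in which case the original is left untouched and a restore from backup is the fallback.

```shell
#!/bin/sh
# Sketch of "copy to a new name, delete the old version, rename the copy".
# relocate_file is a hypothetical helper, not part of AdvFS.
relocate_file() {
    src=$1
    tmp="$src.new.$$"

    # Copy to a new name so the data is written to newly allocated blocks.
    # If the bad spot is unreadable, cp fails and the original stays in place.
    if ! cp -p "$src" "$tmp"; then
        rm -f "$tmp"
        echo "copy of $src failed; consider restoring it from backup" >&2
        return 1
    fi

    # Delete the old version, then give the copy the old version's name.
    rm -f "$src" && mv "$tmp" "$src"
}
```

It would be called with the affected file's path once that is known, for example "relocate_file /some/path/to/file" (a placeholder path, not one from this thread).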


From: Alan Rollow - Dr. File System's Home for Wayward Inodes.
    <alan_at_nabeth.cxo.dec.com>

It looks like you have a block that needs to be replaced. You are
getting read I/O errors on the domain with that particular setId, in
the file with the indicated tag. There is an AdvFS utility to convert
the tag to a file name (tag2name, I think). The logical block number
is "blk", and it is the same one each time. I don't know if the AdvFS
message normalizes the block number for the partition offset when a
domain doesn't start at the beginning of a disk, so you'll have to
double check that yourself.
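
For that double check, the arithmetic is simply the "blk" value from the log plus the partition's starting sector. The sketch below assumes that "blk" is relative to the start of the partition (which, as noted above, is unconfirmed) and that both numbers are in the same sector units. The offset value is made up for illustration; the real one has to be read from the disk label. The function names are hypothetical.

```shell
#!/bin/sh
# Pull the "blk" value out of an AdvFS "vd N blk N blkCnt N" log line.
advfs_blk() {
    sed -n 's/.* blk \([0-9][0-9]*\) blkCnt.*/\1/p'
}

# Add a partition start offset to a block number.
# Assumption: blk is partition-relative. If AdvFS already reports
# disk-relative block numbers, the offset to use is 0.
absolute_lba() {
    echo $(( $1 + $2 ))
}

line='May 3 09:44:59 bornova vmunix: vd 1 blk 5456720 blkCnt 16'
blk=$(echo "$line" | advfs_blk)
offset=131072   # hypothetical partition offset, NOT taken from the original post
echo "blk=$blk lba=$(absolute_lba "$blk" "$offset")"
```

Comparing both candidates (with and without the offset) against the block number reported in the uerf output should show which interpretation applies.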

For ordinary SCSI disks you can force a replacement of a bad block
with the scu(8) command "reassign lba". HSZ family controllers
will have replaced the block already, but continue to remember
that the data in the block is corrupt and fake an I/O error when
you try to read it. The only way to clear the condition is to
overwrite the block. In either case deleting the file and restoring
it from a backup should fix the error. I think SWXCR family
controllers require the use of the configuration utility to fix
bad block errors.



From: Robin Kundert <rkundert_at_spu.edu>

Given my experience, I would say that you have a disk that is going bad.
I'd recommend migrating EVERYTHING off of that disk NOW!!! Then replace
the failing disk and move things around as you wish. Hopefully a restore
from backup is NOT in your immediate future....

RESULT:

uerf says that it is a hardware problem.
I have not yet been able to determine the name of the file sitting on
the bad spot. I am still trying to find it, and I am also working on
scu's "reassign lba", plus backups of course.
Received on Wed May 05 1999 - 14:05:45 NZST
