Summary: I/O errors from Geert Jan Bex on 1997-02-21 (tru64-unix-managers)

From: Geert Jan Bex <gjb_at_luc.ac.be>
Date: Thu, 20 Feb 1997 19:10:20 +0100

Let me start by thanking all of you for your quick and accurate responses;
this list is great. ("Rainer Landes"
<rlandes_at_fphws01.physik.uni-karlsruhe.de>, Martin J Lamb
<M.Lamb_at_Queens-Belfast.AC.UK>, Andres Henckens <henckens_at_luc.ac.be>,
alan_at_nabeth.cxo.dec.com (Alan Rollow), Stephane.Branchoux_at_univ-perp.fr
(stephane BRANCHOUX), sxkac_at_java.sois.alaska.edu (Kurt Carlson), Vincent
Willems <vinnie_at_skynet.be>)

I created a new fileset in another file domain to hold the contents of
/usr. Fortunately, only the binary error log (which was partly unreadable
by uerf) and the NSR database were damaged. The former caused the error
messages on boot and the latter caused the first crash related to this
problem. Checking the disk with scu showed only three additional bad
blocks, so reformatting should give me a usable filesystem. I haven't done
this so far, since my users are not exactly queueing to have their data
stored on this disk partition ;-)
Everything is up and running again without I/O errors, so I hope the bad
blocks were the only cause. I'll keep an eye on the disk to see whether
further damage occurs.

The original post:

>For some time I have been getting I/O errors on a Digital Alpha/AXP
>600/266 station. It started with a crash due to a read error on rz0.
>Later these errors also occurred on other SCSI devices, such as a DAT
>drive and an external hard disk connected to the system. Only now,
>because of a crash, have I been able to bring the system down completely
>(this is a compute server, people are counting on it).
>
>Below are some entries from the log files related to the problem:
>Jan 16 10:21:33 salma vmunix: Hard Error Detected
>Jan 16 10:21:33 salma vmunix: DEC RZ28M
>Jan 16 10:21:33 salma vmunix: Active CCB at time of error
>Jan 16 10:21:33 salma vmunix: CCB request completed with an error
>Jan 16 10:21:33 salma vmunix: Error, exception, or abnormal condition
>Jan 16 10:21:33 salma vmunix: MEDIUM ERROR - Nonrecoverable medium error
>This was on rz0:
>Jan 16 10:21:31 salma vmunix: rz0 at scsi0 bus 0 target 0 lun 0 (DEC RZ28M (C) DEC 0466)
>
>This is the error that caused today's crash:
>Feb 17 11:42:11 salma vmunix: advfs I/O error: setId 0x31c83d14.000d5430.1.8001 tag 0x00004022.8001u page 5
>Feb 17 11:42:11 salma vmunix: vd 1 blk 1889008 blkCnt 128
>Feb 17 11:42:11 salma vmunix: read error = 5
>Feb 17 11:42:12 salma vmunix: advfs I/O error: setId 0x31c83d14.000d5430.1.8001 tag 0x00004022.8001u page 13
>Feb 17 11:42:12 salma vmunix: vd 1 blk 1889136 blkCnt 128
>Feb 17 11:42:12 salma vmunix: read error = 5
>
>Below are console messages generated while doing a vdump on DAT (external):
>vdump: unable to read file
><./roger/spock/server/Physics:2F19:2F4:2F:2F96/ElasticPendulum/ElasticPendulum.ma>; [5] I/O error
>vdump: unable to write to device </dev/rmt0h>; [5] I/O error
>: do you want to retry? n
>
>AdvFS is installed on the partition where the problems first occurred.
>
>I would be grateful if someone could point me in the right direction:
>hardware/software...
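
The advfs entries quoted above give the virtual disk (vd), starting block
(blk) and block count (blkCnt) of each failed read, so the affected block
range can be worked out directly from the log. Below is a minimal Python
sketch of that arithmetic. It assumes blk and blkCnt are expressed in the
same units and that extents on the same vd can be merged when they touch;
the function names are hypothetical and not part of any Tru64 tool.

```python
import re

# Matches the "vd 1 blk 1889008 blkCnt 128" line of an advfs I/O error entry.
VD_LINE = re.compile(r"vd\s+(\d+)\s+blk\s+(\d+)\s+blkCnt\s+(\d+)")


def failing_extents(log_text):
    """Return sorted, de-duplicated (vd, first_blk, last_blk) tuples."""
    extents = set()
    for match in VD_LINE.finditer(log_text):
        vd, blk, cnt = (int(group) for group in match.groups())
        extents.add((vd, blk, blk + cnt - 1))
    return sorted(extents)


def merge_adjacent(extents):
    """Merge sorted extents on the same vd that touch or overlap."""
    merged = []
    for vd, first, last in extents:
        if merged and merged[-1][0] == vd and first <= merged[-1][2] + 1:
            _, prev_first, prev_last = merged[-1]
            merged[-1] = (vd, prev_first, max(prev_last, last))
        else:
            merged.append((vd, first, last))
    return merged


# The two vd lines (and their read errors) from the Feb 17 log entries.
SAMPLE = """\
Feb 17 11:42:11 salma vmunix: vd 1 blk 1889008 blkCnt 128
Feb 17 11:42:11 salma vmunix: read error = 5
Feb 17 11:42:12 salma vmunix: vd 1 blk 1889136 blkCnt 128
Feb 17 11:42:12 salma vmunix: read error = 5
"""

if __name__ == "__main__":
    for vd, first, last in merge_adjacent(failing_extents(SAMPLE)):
        print(f"vd {vd}: blocks {first}-{last} ({last - first + 1} blocks)")
```

On the two Feb 17 entries this reports a single range on vd 1, blocks
1889008-1889263 (256 blocks), i.e. the two failed reads are adjacent.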

Many thanks and best regards,

                                            -gjb-
                                   http://www.luc.ac.be/~gjb/
                            PGP public key: finger gjb_at_alpha.luc.ac.be
Received on Thu Feb 20 1997 - 19:31:55 NZDT
