Thanks to everyone who responded to my plea for help. Original message below
and responses follow.
---------------------------------------------
"We are running DU 4.0E. The machine hung this morning, with the following
error in /usr/var/adm/messages. I saw similar errors when I searched the
archives but there appears to be no summary of a solution (perhaps there
isn't one). If anyone has resolved this problem in the past, I would be
grateful for your input.
Mar 28 07:35:26 ncalpha vmunix: AdvFS I/O error:
Mar 28 07:35:26 ncalpha vmunix: Volume: /dev/rz17a
Mar 28 07:35:26 ncalpha vmunix: Tag: 0xfffffffa.0000
Mar 28 07:35:26 ncalpha vmunix: Page: 3
Mar 28 07:35:26 ncalpha vmunix: Block: 8400
Mar 28 07:35:26 ncalpha vmunix: Block count: 16
Mar 28 07:35:26 ncalpha vmunix: Type of operation: Write
Mar 28 07:35:26 ncalpha vmunix: Error: 5
Mar 28 07:35:26 ncalpha vmunix: Tag: 0xfffffffa.0000
Mar 28 07:35:26 ncalpha vmunix: Page: 3
Mar 28 07:35:26 ncalpha vmunix: Block: 8400
Mar 28 07:35:26 ncalpha vmunix: Block count: 16
Mar 28 07:35:26 ncalpha vmunix: Type of operation: Write
Mar 28 07:35:26 ncalpha vmunix: Error: 5
Mar 28 07:35:26 ncalpha vmunix: ADVFS EXCEPTION
Mar 28 07:35:26 ncalpha vmunix: Module = msfs_config.c, Line = 505
Mar 28 07:35:26 ncalpha vmunix: bs_osf_complete: metadata write failed
Mar 28 07:35:26 ncalpha vmunix: device string for dump = SCSI 0 2009 0 1 100 0 0."
---------------------------------------------
From Gareth Dempsey:
What Patch Kit are you on?
We experienced these types of errors for some time on two machines, an 8400 and
an ES40, both running 4.0E PK3.
We installed PK4 a week ago (which is supposed to have included a fix for
these errors) and so far no problems.
I forgot to mention that it may be beneficial to run a verify on the domain
associated with /dev/rz17a.
E.g., unmount the directory and run /sbin/advfs/verify -f 'fdmn name'
From Bryan Lavelle:
I/O error 5's are a hardware problem. Something is wrong with your
underlying storage, rz17a.
From Alex Gorbachev:
Run DECEvent (dia) to find out the exact reason for the error from the
binary errorlog. This could well be a h/w problem.
From alan_at_nabeth.cxo.dec.com:
AdvFS tried to write to the indicated block of the indicated
disk and it failed. Apparently it was metadata space as
shown by the bs_osf_complete part of the message. This
means that the file system is now inconsistent, so AdvFS
panicked the system. I/O failures are nearly always the
result of a problem on the I/O subsystem; a disk block
gone bad, a disk gone bad, etc. If possible, reboot the
system and arrange for that file system not to be mounted at
boot. Check the error log for errors on the device to see
what the problem is and take appropriate corrective action
(replace the block, replace the disk, etc).
Then see if you can mount the file system without the system
crashing again.
Now, if it happens that rz17a is the partition used for the
root domain, the write failure could have been the result
of a whole device or subsystem failure. Since the system
dump space would be on the same disk (generally), the hang
could be a sign that the whole device/subsystem had a
problem.
---------------------------------------------
Everything seems to be pointing to hardware. I have no resolution yet, but I
have some great info to start with. Thanks guys!
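
For anyone who wants to see whether the same volume keeps turning up, here is
a minimal Python sketch that pulls the AdvFS I/O error records out of the text
messages file. It assumes only the line layout shown in the excerpt above; the
function name, field list and sample data are my own, and it reads the
plain-text messages log, not the binary error log that DECevent decodes.

```python
import re
from collections import Counter

# Matches syslog-style kernel lines such as:
#   Mar 28 07:35:26 ncalpha vmunix: Volume: /dev/rz17a
# and captures everything after "vmunix:".
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+vmunix:\s*(.*)$")

# Field names as they appear in the excerpt; anything else is ignored.
FIELDS = {"Volume", "Tag", "Page", "Block", "Block count",
          "Type of operation", "Error"}


def parse_advfs_errors(lines):
    """Return one dict per 'AdvFS I/O error:' record found in lines."""
    records = []
    current = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        body = match.group(1).strip()
        if body.startswith("AdvFS I/O error"):
            # A new record starts here; later field lines fill it in.
            current = {}
            records.append(current)
            continue
        if current is None:
            continue
        key, sep, value = body.partition(":")
        if sep and key.strip() in FIELDS:
            # Repeated fields within one record simply overwrite each other.
            current[key.strip()] = value.strip()
    return records


# Sample lines copied from the excerpt in the original message.
SAMPLE = [
    "Mar 28 07:35:26 ncalpha vmunix: AdvFS I/O error:",
    "Mar 28 07:35:26 ncalpha vmunix: Volume: /dev/rz17a",
    "Mar 28 07:35:26 ncalpha vmunix: Tag: 0xfffffffa.0000",
    "Mar 28 07:35:26 ncalpha vmunix: Page: 3",
    "Mar 28 07:35:26 ncalpha vmunix: Block: 8400",
    "Mar 28 07:35:26 ncalpha vmunix: Block count: 16",
    "Mar 28 07:35:26 ncalpha vmunix: Type of operation: Write",
    "Mar 28 07:35:26 ncalpha vmunix: Error: 5",
    "Mar 28 07:35:26 ncalpha vmunix: ADVFS EXCEPTION",
]

records = parse_advfs_errors(SAMPLE)
per_volume = Counter(r.get("Volume", "unknown") for r in records)
for volume, count in per_volume.items():
    print(volume, count)
```

To run it against a real log, pass the lines of the messages file instead of
SAMPLE, for example parse_advfs_errors(open(path)), where path is wherever
your messages file lives.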
Jason Carter
Network Administrator
Newcrest Mining Ltd
Ph: +61 8 9270 7093
Mb: 041 993 8578
Fx: +61 8 9270 7150
Received on Mon Apr 03 2000 - 00:48:51 NZST