Dear managers!
The AdvFS error I described in my previous mail has occurred on a
different machine this morning. Again, I had to reboot the machine
(arrgh ...). The disk appeared to be mounted but was completely
inaccessible. Unmounting didn't help either; verify and disklabel both
complained about the device being non-existent.
The machine configuration is virtually identical (CPU, controllers, disk
shelf). I changed the OS version from 4.0B to 4.0F a few days ago.
I had never seen such errors on this machine before. What puzzles me
most is that only one particular type of disk is involved
(identification string QUANTUM FIREBALL SE8.4S PJ09 or PJ0A). Disks from
other vendors have not been affected so far.
uerf output (-c err -o full) shows lots of entries, all looking roughly
as follows:
----- CAM STRING -----
ROUTINE NAME ss_perform_timeout
----- CAM STRING -----
timeout on disconnected request
----- CAM STRING -----
Unknown frame - SIM_WS
----- CAM STRING -----
Active CCB at time of error
----- ENT_CCB_SCSIIO -----
All this makes me believe that I can exclude errors in cabling,
controllers or StorageWorks shelves. I don't even think age is an issue.
What am I missing? Could it be a misconfigured device database resulting
in timing problems (or should I say 'unconfigured' because I haven't
touched it at all yet)?
Any further help is greatly appreciated.
Regards
Peter
Original message follows:
I encountered the following AdvFS error on two disks on one of my
servers (AS 200 4/166, DU 4.0F, PK2):
Jan 2 22:03:39 server vmunix: AdvFS I/O error:
Jan 2 22:03:39 server vmunix: Volume: /dev/rz9g
Jan 2 22:03:39 server vmunix: Tag: 0xfffffffa.0000
Jan 2 22:03:39 server vmunix: Page: 67
Jan 2 22:03:39 server vmunix: Block: 12352
Jan 2 22:03:39 server vmunix: Block count: 16
Jan 2 22:03:40 server vmunix: Type of operation: Write
Jan 2 22:03:40 server vmunix: Error: 5
Jan 2 22:03:40 server vmunix:
Jan 2 22:03:40 server vmunix: bs_osf_complete: metadata write failed
Jan 2 22:03:40 server vmunix: AdvFS Domain Panic; Domain user17_dmn Id
0x35028ebe.0004b672
Jan 2 22:03:40 server vmunix: An AdvFS domain panic has occurred due to
either a metadata write error or an internal inconsistency. This domain
is being rendered inaccessible.
Jan 2 22:03:40 server vmunix: Please refer to guidelines in AdvFS Guide
to File System Administration regarding what steps to take to recover
this domain.
Jan 2 22:03:40 server vmunix: AdvFS I/O error:
Jan 2 22:03:40 server vmunix: Volume: /dev/rz9g
Jan 2 22:03:40 server vmunix: Tag: 0xfffffffa.0000
Jan 2 22:03:40 server vmunix: Page: 77
Jan 2 22:03:40 server vmunix: Block: 12512
Jan 2 22:03:40 server vmunix: Block count: 16
Jan 2 22:03:40 server vmunix: Type of operation: Write
Jan 2 22:03:40 server vmunix: Error: 5
... more
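For anyone who wants to compare notes, here is a rough sketch of how
entries like the ones above could be tallied per volume. It is only an
illustration, not a tool I actually ran: the function name
summarize_advfs_errors and the regular expression are my own
assumptions, and the field names are simply the ones visible in the
excerpt above.

```python
import re
from collections import defaultdict

# Field names are copied from the log excerpt; anything else (e.g. "Tag")
# is ignored. The "vmunix:" prefix is assumed to precede every field.
FIELD = re.compile(
    r"vmunix:\s*(Volume|Page|Block count|Block|Type of operation|Error):\s*(\S+)"
)


def summarize_advfs_errors(lines):
    """Group AdvFS I/O error records by volume (hypothetical helper).

    A record is assumed to start at an "AdvFS I/O error:" line and to end
    at its "Error:" line, as in the excerpt above.
    """
    by_volume = defaultdict(list)
    record = None
    for line in lines:
        if "AdvFS I/O error:" in line:
            record = {}          # start a new record
            continue
        if record is None:
            continue             # not inside a record, e.g. panic text
        match = FIELD.search(line)
        if not match:
            continue
        key, value = match.groups()
        record[key] = value
        if key == "Error":       # last field of a record
            by_volume[record.get("Volume", "?")].append(record)
            record = None
    return dict(by_volume)
```

Feeding it the lines of the messages file (for example via
open(path) with whatever path your syslog uses) returns a dictionary
keyed by volume, each holding the page, block and error number of every
failed operation.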
The only way to recover from this error is to reboot the machine because
the disks are no longer accessible (annoying ...). Verifying the disks
after the reboot reveals nothing unusual. Subsequently, the disks appear
to function normally.
I defragment my disks regularly at 10pm. I upgraded the OS to version
4.0F recently (Y2K, you know). I never came across this kind of
error while I was running V4.0B. Any clues as to what could be going wrong?
--
Dipl.-Phys. Rolf-Peter Kienzle
Dept. of Remote Sensing and Land Information Systems
University of Freiburg, D-79085 Freiburg, Germany
Tel/Fax +49 761 203 8643/8640 e-mail: kienzlep_at_uni-freiburg.de
Received on Tue Jan 04 2000 - 18:44:23 NZDT