Summarized issue:
ES45, Tru64 5.1A (no patches)
External Hardware RAID: Western Scientific F4 Tornado RAID IDE-SCSI
3TB partitioned and presented to Tru64 as 2TB and 1.3TB LUNs. Each LUN
was set up as a single AdvFS domain with a single fileset.
Both worked, as is, until they were ~40% and ~56% full, whereupon
AdvFS I/O errors began, followed in short order by domain panics and
withdrawal of the domain from service.
fixfdmn showed the following:
fixfdmn -n d12
fixfdmn: Checking the RBMT.
fixfdmn: Can't read page at block -660733904 on '/dev/disk/dsk12c'.
fixfdmn: Invalid argument
fixfdmn: Error correcting the RBMT.
Was this OS or hardware related?
Additional evidence came later from examining the disklabel I had applied:
#            size      offset  fstype  fsize bsize  cpg  # ~Cyl values
  a:       131072           0  unused      0     0       # 0 - 7
  b:       262144      131072  unused      0     0       # 8 - 23
  c:  -1651834880           0  AdvFS                     # 0 - 161323
  d:            0           0  unused      0     0       # 0 - 0
  e:            0           0  unused      0     0       # 0 - 0
  f:            0           0  unused      0     0       # 0 - 0
  g:   1321369600      393216  unused      0     0       # 24 - 80673
  h:   1321369600  1321762816  unused      0     0       # 80674 - 161323
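
A quick arithmetic check on the label above, as a sketch only. It assumes
(neither is stated in the label) that the size field is a signed 32-bit
integer and that sectors are 512 bytes. Under those assumptions the negative
'c' size is a wrapped value: read as unsigned, it equals the end of
partition h exactly, and works out to about 1.35TB, consistent with the
1.3TB LUN. The helper name as_unsigned32 is made up for illustration.

```python
# Sketch: reinterpret the negative 'c' partition size as an unsigned 32-bit value.
# Assumptions: the label field is a signed 32-bit integer; sectors are 512 bytes.

def as_unsigned32(value):
    """Return the unsigned 32-bit interpretation of a signed 32-bit integer."""
    return value & 0xFFFFFFFF

c_size = -1651834880    # 'c' size exactly as printed in the label
h_offset = 1321762816   # 'h' offset from the label
h_size = 1321369600     # 'h' size from the label

c_unsigned = as_unsigned32(c_size)
end_of_h = h_offset + h_size   # first sector past the end of partition h

print(c_unsigned)                 # 2643132416
print(end_of_h == c_unsigned)     # True
print(c_unsigned > 2**31 - 1)     # True: too large for a signed 32-bit field
print(c_unsigned * 512 / 1e12)    # roughly 1.35 (TB, decimal)
```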
-------------------------
Answer: The problem was twofold, and was not hardware related.
1. Patch Kit 3, at minimum, is required for its AdvFS fixes
(I have installed Patch Kit 6 for 5.1A)
2. The disklabel applied to the LUNs was wrong, as hinted
by the negative-integer partition sizes in the label.
I had applied a default disklabel by doing
disklabel -rw dsk12
This is wrong! I should have used the following syntax
which forces disklabel to query the disk, in this case
the hardware RAID controller, for disk info:
disklabel -rwt advfs dsk12 junk
where 'junk' is anything not found in /etc/disktab
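
Since the negative sizes were the giveaway, a label can be screened for
them before a domain is built on it. The following is a minimal sketch, not
part of any Tru64 tool: it assumes the partition lines look like the ones
quoted in this summary, and the function name find_negative_partitions is
hypothetical.

```python
import re

# Matches partition lines such as "  c: -1651834880 0 AdvFS # 0 - 161323".
# Assumption: fields are whitespace-separated as in the label quoted above.
PARTITION_LINE = re.compile(r"^\s*([a-h]):\s+(-?\d+)\s+(-?\d+)\s+(\S+)")

def find_negative_partitions(label_text):
    """Return (partition, size, offset) tuples whose size or offset is negative."""
    suspects = []
    for line in label_text.splitlines():
        match = PARTITION_LINE.match(line)
        if not match:
            continue  # header, comment, or other non-partition line
        name = match.group(1)
        size = int(match.group(2))
        offset = int(match.group(3))
        if size < 0 or offset < 0:
            suspects.append((name, size, offset))
    return suspects

sample = """\
a: 131072 0 unused 0 0 # 0 - 7
c: -1651834880 0 AdvFS # 0 - 161323
g: 1321369600 393216 unused 0 0 # 24 - 80673
"""
print(find_negative_partitions(sample))   # [('c', -1651834880, 0)]
```

In practice the text would come from saved disklabel output for the disk in
question; any partition reported here means the label should be rewritten
before the disk is used.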
Many thanks to:
John Farmer
Bob Harris
Robert Collins
Alan Rollow
--
Neil R. Smith, Comp. Sys. Mngr. neils_at_tamu.edu
Dept. Atmospheric Sci., Texas A&M Univ. 979/845-6272 FAX:979/862-4466
Received on Fri Apr 09 2004 - 20:29:06 NZST