SUMMARY: What's going on here?

From: c.j.bol <bol_at_Axp1.IenD.wau.nl>
Date: Fri, 07 Jan 2000 11:31:26 +0100

I received the following messages from Mr. Thomas P. Blinn of Compaq (thanks
a lot):

1)
What does AdvFS' verify utility say when you run it on the filesets in
the domain? You've got hardware problems that are corrupting your AdvFS
domain, most probably. By the way, there is AdvFS file system admin
documentation available on the Tru64 UNIX documentation web pages; if
you haven't read it, you should.
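For reference, verify is run against a domain. Here is a rough sketch of
checking every domain on the system, assuming domains appear as directories
under /etc/fdmns and that 'verify <domain>' is the basic form; check the
verify(8) reference page before relying on this:

    import os
    import subprocess

    FDMNS = "/etc/fdmns"   # AdvFS domains appear as directories here

    for domain in sorted(os.listdir(FDMNS)):
        if not os.path.isdir(os.path.join(FDMNS, domain)):
            continue
        print("=== verifying domain %s ===" % domain)
        rc = subprocess.call(["verify", domain])   # non-zero on problems
        if rc != 0:
            print("*** verify reported problems in %s" % domain)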
---
2)
Some problems show up in AdvFS because of hardware errors.  AdvFS looks at
both hardware faults and software (data) consistency.  In this case, it sure
looks like some of the data written onto disk /dev/rz30c was wrong -- since
AdvFS is looking at data structures on that disk and finding things that are
clearly bogus, like a record size of -4 when it should be 20.  Why this has
happened isn't clear, but hardware faults could make it happen.
The hardware faults that were reported in the past could have been the
events that led to what looks like metadata corruption.  Without having the
full picture of what's in the system (physically), the maintenance history,
whether all the components are qualified for use with Tru64 UNIX, and so on,
it's impossible to know for sure.  But I do know that I've seen disks that
appeared to work fine with, say, UFS file systems but didn't work reliably
with AdvFS.  I know it seems weird, but it's true.
Sorry I can't be more specifically helpful, and I do hope your local Compaq
services people can get some solid answers for you.
-----
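To make the "record size of -4" symptom concrete: that kind of check is just
a range test on a field read back from disk. A toy Python sketch (the field
layout and bounds are invented for illustration; this is not AdvFS's real
on-disk format):

    import struct

    MIN_REC_SIZE, MAX_REC_SIZE = 1, 512

    def check_record(raw):
        # hypothetical layout: a little-endian signed 16-bit size field
        (size,) = struct.unpack_from('<h', raw, 0)
        if not (MIN_REC_SIZE <= size <= MAX_REC_SIZE):
            raise ValueError("bogus record size %d" % size)
        return size

    check_record(struct.pack('<h', 20))   # fine: 20 is in range
    check_record(struct.pack('<h', -4))   # raises: bogus record size -4

Corrupted bytes in the right place turn a sane value like 20 into something
clearly impossible like -4, which is why AdvFS can spot the damage even
though the hardware reported no error.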
A verify run did indeed report problems for one disk, so I'm now quite sure
that one disk is responsible for these problems, and I will return it to the
dealer for checking/replacement.
I wonder what happens with RAID-5 systems when one of the disks develops this
kind of problem, i.e. problems that don't show up in the error log.
Will this kind of problem be detected by the RAID controller, or can it
corrupt a large RAID-5 set?
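My rough understanding, for what it's worth: RAID-5 parity is just the XOR of
the data blocks in a stripe, and on a normal read the controller returns the
data block without checking it against parity. So a disk that silently
returns wrong data is not caught; parity is only consulted when a disk
reports a failure, during a rebuild, or during an explicit parity scrub. A
toy Python sketch of the XOR relationship (block contents invented for
illustration):

    # Toy RAID-5 stripe: parity is the XOR of the data blocks.
    d0 = bytes([0xAA] * 4)
    d1 = bytes([0x55] * 4)
    d2 = bytes([0x0F] * 4)
    parity = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))

    # If disk 0 REPORTS a failure, its block is rebuilt from the
    # survivors plus parity:
    rebuilt = bytes(b ^ c ^ p for b, c, p in zip(d1, d2, parity))
    assert rebuilt == d0

    # But if disk 0 SILENTLY returns wrong bytes, a normal read just
    # passes them through; nothing compares the data to parity, and a
    # later rebuild of ANOTHER disk would even fold the bad data in.
    silently_corrupt_d0 = bytes([0x00] * 4)   # no error flagged anywhere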
Kees Bol
--------------------------------------------------------------------------
Original Question:
I have 4 identical AdvFS partitions, each consisting of 3 disks behind a
RAID Array 230 controller (RAID 0) and one 9 GB SEAGATE ST19171W (differential).
Since the upgrade from V4.0 to V4.0E (+patches) I have restored one partition
twice, and today I am seeing problems on that partition again.
The first time (last November) there were some bad blocks that damaged the
partition and gave messages like:
   'filename not found' when doing an 'ls -l'
   and 'Bad file number' during a Networker backup.
The second time (23 December) there was a fatal AdvFS panic.
Now I am seeing messages like 'filename not found' and 'Bad file number' in
the same partition again.
Nothing appears in the binary error log or in /var/adm/messages.
Any idea what's going on here?
Kees Bol
Received on Fri Jan 07 2000 - 10:32:34 NZDT
