This is one of the strangest problems I've ever encountered. I have an
AlphaServer 4000 with 2 pairs of dual-redundant HSZ70's. Each pair of
controllers has five (5) RAID5 raid sets, bringing my total number of
logical devices to ten. Beginning sometime around Monday night, the
integrity of one of the raid sets (it happens to be the mount point we call
/stripe9) came into question, based on the database we're running.
To make a long troubleshooting story short: I can issue a UNIX copy (cp)
command from any of my logical devices to any other and, with one exception,
it works fine. No errors are reported by the cp command, and if I compare
the source and destination files using cmp there are no differences.
However, if I copy from any logical device to /stripe9, the copy completes
successfully, BUT a cmp almost always shows that the files are NOT
identical! No errors are being reported by my controllers. I have only seen
one error show up in DECevent, and admittedly it does seem to point to a
problem with /stripe9, although the error message seemed to indicate it
was a self-correcting problem.
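For anyone who wants to repeat the copy-and-compare test in a scripted form, here is a minimal sketch in Python. It is not the cp/cmp commands I actually ran, only an equivalent under stated assumptions: the function names (first_mismatch, copy_and_verify) and the 64 KB chunk size are made up for illustration. Note also that reading the file back immediately may be served from the buffer cache rather than the disk, so a clean result here does not by itself prove the on-disk copy is good.

```python
import shutil


def first_mismatch(path_a, path_b, chunk_size=64 * 1024):
    """Return the byte offset of the first difference, or None if identical.

    Roughly what cmp reports; chunk_size is an arbitrary illustrative value.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # All shared bytes match, so one file is simply shorter.
                return offset + min(len(a), len(b))
            if not a:
                # Both files reached end-of-file with no difference.
                return None
            offset += len(a)


def copy_and_verify(src, dst):
    """Copy src to dst, then compare them byte for byte (hypothetical helper)."""
    shutil.copyfile(src, dst)
    return first_mismatch(src, dst)
```

Calling copy_and_verify with a source file and a destination path under /stripe9 would return None for a clean copy, or the offset of the first bad byte otherwise. Knowing the offsets of the mismatches (and whether they fall on any regular boundary) may be useful to pass along to whoever is looking at the controllers.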
I have already logged a call to Digital F/S and they're preparing to do a
dial-in. Has anyone seen ANYTHING like this before? Any suggestions as to
what may have happened? (Note: I am using ufs on all my devices ...)
cknorr_at_hops.com
305-827-8600 ext. 238 (voice)
305-827-0999 (fax)
Received on Wed Sep 30 1998 - 15:41:50 NZST