Summary: Shattered lsm mirrors

From: <Noboom2_at_aol.com>
Date: Tue, 02 Nov 1999 17:37:25 -0500 (EST)

I received a few replies from some helpful folks who wanted to
know if there were hardware problems either in the binary error
log or the kernel syslog file. Nope, both show up clean as far as
I can see.

The fortunate thing is that the problems haven't recurred for
about a week, so I'm relaxing about that a little bit. If anyone
has any additional info, please let me know, but for now I'm
considering the question closed.

Thanks all.
Sue


Original message follows:

Hello all.

I manage a fairly heavily loaded TruCluster Production Server that uses LSM
to mirror several volumes, particularly our mail spool volume, which accounts
for a large amount of system use. During peak usage, for the second time, four
of the six volumes that are ASE disk services have 'failed-out' of the disk
sets. We have 2 pairs of HSZ70 controllers. In both incidents, the failures
were 3 from one controller and 1 from the other.

An inspection of the physical disks that were marked as 'failed-out' showed
no problems, and the HSZ70 controllers reported all disks 'normal'. If I check
the 'volprint' for this group, I see the disk is listed as 'disabled'.

# volprint -htA (edited for this disk group)
dg maildg 956910516.1940.myhost.here.com

dm rz19c rz19c simple 1024 177737480 /dev/rrz19c
dm rz27c - - - - -

v mailvol fsgen ENABLED ACTIVE 177737480 SELECT -
pl mailvol-01 mailvol ENABLED ACTIVE 177737480 CONCAT - RW
sd rz19c-01 mailvol-01 0 0 177737480 rz19c rz19c
pl pl-01 mailvol DISABLED NODEVICE 177737480 CONCAT - RW
sd rz27c-01 pl-01 0 0 177737480 rz27c -

I can restore this volume to the disk group with 'voldg -g maildg -k adddisk
rz27c' and then run a volume restore on it, but my REAL question is why this
happens, and what can be done to prevent it from occurring again. Is it just
the load on the system that prevents the controllers from keeping up with the
mirror? Here are the log messages:

    Oct 26 12:44:34 myhost vmunix: io/vol.c(volerror): Uncorrectable read error
        on volume mailvol, plex pl-01, block 83859168
    Oct 26 12:44:36 myhost vmunix: io/vol.c(volerror): Uncorrectable read error
        on volume mailvol, plex pl-01, block 30672
    Oct 26 12:44:36 myhost vmunix: io/vol.c(volerror): Uncorrectable read error
        on volume mailvol, plex pl-01, block 46819936
    Oct 26 12:44:36 tempest vmunix: io/vol.c(volerror): Uncorrectable read error
        on volume mailvol, plex pl-01, block 94045072
    Oct 26 12:44:36 tempest vmunix: io/vol.c(volerror): Uncorrectable read error
        on volume homevol, plex pl-01, block 958528
    Oct 26 12:44:36 tempest vmunix: io/vol.c(volerror): Uncorrectable read error
        on volume mailvol, plex pl-01, block 94044928
    Oct 26 12:44:36 tempest vmunix: voliod_error: plex detach - volume mailvol, plex pl-01
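
For anyone who wants to see at a glance which volumes and plexes are taking
the errors, here is a rough Python sketch that tallies the "Uncorrectable read
error" messages per volume/plex pair. It is only an illustration: the function
name count_read_errors and the sample text are made up for this example, and
the pattern assumes the message wording shown in the excerpt above.

```python
import re
from collections import Counter

# Matches LSM read-error messages worded like the ones in the log excerpt.
# \s+ lets the pattern span line breaks introduced by mail wrapping.
READ_ERROR = re.compile(
    r"Uncorrectable\s+read\s+error\s+on\s+volume\s+(\S+),"
    r"\s+plex\s+(\S+),\s+block\s+(\d+)"
)


def count_read_errors(log_text):
    """Return a Counter keyed by (volume, plex) of read errors in log_text."""
    return Counter(
        (volume, plex) for volume, plex, _block in READ_ERROR.findall(log_text)
    )


# Two entries copied from the excerpt; the first is left mail-wrapped on purpose.
sample = """\
Oct 26 12:44:34 myhost vmunix: io/vol.c(volerror): Uncorrectable read
error
    on volume mailvol, plex pl-01, block 83859168
Oct 26 12:44:36 tempest vmunix: io/vol.c(volerror): Uncorrectable read error
    on volume homevol, plex pl-01, block 958528
"""

for (volume, plex), n in sorted(count_read_errors(sample).items()):
    print(f"{volume} {plex}: {n} read error(s)")
```

Feeding it the full syslog text instead of the sample would give a per-plex
count that might help show whether the errors cluster on one side of the
mirror.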

Thanks in advance for your assistance.

Sue
Received on Tue Nov 02 1999 - 22:39:16 NZDT
