Help with detached-stale LSM plex & read error on active plex

From: Douglas C. Stephens <stephens_at_ameslab.gov>
Date: Tue, 22 Jun 1999 23:32:35 -0500

Dear Tru64-Unix-Managers:

We have an AlphaServer 800 with DU4.0E bl1 loaded with two RZ1BB-CS disks,
rz0 and rz1, configured as mirrored root/swap/usr volumes according to the
"volencap"/"volrootmir -a" procedure outlined in chap. 5 as well as sections
C.15 and C.17 of the LSM manual on the v4.0E Doc CD. A third disk, an
RZ2DA-LA, is configured with its rz2c partition to be AdvFS and holds /var.

In order to test disaster recovery booting, we powered down and removed the
default boot disk rz0, then powered up and booted successfully to its mirror
on rz1. After we powered down again, re-inserted the rz0 disk and rebooted
to rz1 again, LSM relocated, re-attached, sync'ed, and activated the -01
plexes for rootvol and swapvol on the rz0 disk. It did not do the same
for the vol-rz0g mirrored volume containing /usr, although the volume did
come online and was mountable as /usr by way of its vol-rz0g-02 plex on
the rz1 disk.

Doing a "volprint -ht" revealed that plex vol-rz0g-01 was detached and
stale. When we tried to execute a "volplex att vol-rz0g vol-rz0g-01" to
resolve the situation, the operation failed with a read error occurring on
the vol-rz0g-02 plex, which now contains the only current copy of /usr.
This error is repeatable and happens at exactly the same block number
each time. I've included the output from one of these attempts here:

# volplex att vol-rz0g vol-rz0g-01
fsgen/volplex: Volume vol-rz0g, plex vol-rz0g-02, block 1955317: Plex read:
        Error: Read failure
fsgen/volplex: I/O error on volume vol-rz0g, plex vol-rz0g-01 not attached
#
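
To put the failing block in context, here is a minimal Python sketch that
converts the block number from the error into a byte offset and a position
within the volume. It assumes LSM counts in 512-byte sectors (an assumption,
not something the error message states); the volume length is the vol-rz0g
LENGTH from the "volprint -ht" output later in this message. Since subdisk
rz1g-01 sits at plex offset 0 and disk offset 0, the same block number should
also apply within the rz1g partition. The function name is hypothetical.

```python
# Assumption: LSM block numbers and lengths are in 512-byte sectors.
SECTOR_BYTES = 512
BAD_BLOCK = 1955317        # block reported by volplex for plex vol-rz0g-02
VOLUME_SECTORS = 2797776   # vol-rz0g LENGTH from "volprint -ht"


def locate(block, length, sector_bytes=SECTOR_BYTES):
    """Return (byte_offset, fraction_of_volume) for a block number."""
    if not 0 <= block < length:
        raise ValueError("block lies outside the volume")
    return block * sector_bytes, block / length


offset, fraction = locate(BAD_BLOCK, VOLUME_SECTORS)
print(f"byte offset {offset}, {fraction:.1%} of the way into the volume")
```

If the sector-size assumption holds, the bad block lies roughly 70% of the
way into the volume, so everything before it should still be readable.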

So what we have is rootvol and swapvol using both their -01 and -02 plexes
and /usr on vol-rz0g using only its -02 plex, which just happens to be the
plex with a read error on it. Further, I cannot get volplex to resync any
other plex of vol-rz0g, because the read from the -02 plex fails during the
sync.

Can someone more familiar with LSM please suggest a course of action that
would allow us to re-activate the -01 plex of the vol-rz0g containing /usr
so that we can take the rz1 disk out and have it replaced without taking
the system out of action for a tape restore? A brief sojourn to "init s"
or "init 2", or even just unmounting /usr to disable vol-rz0g and work with
it, would be ok. Perhaps unmirroring and unencapsulating /usr?
I don't think I'm completely screwed yet, but I'm not certain.

For reference, I've included output from /etc/fstab, "volprint -ht", and
"voldisk list" below.

I will summarize any responses.
Thanks in advance.



# cat /etc/fstab
/dev/vol/rootdg/rootvol / ufs rw 1 1
/proc /proc procfs rw 0 0
/dev/vol/rootdg/vol-rz0g /usr ufs rw 1 2
var_domain#var /var advfs rw 0 0
/dev/vol/rootdg/swapvol swap1 ufs sw 0 2

# volprint -ht
DG NAME         GROUP-ID
DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   PUBPATH
V  NAME         USETYPE      KSTATE   STATE    LENGTH   READPOL   PREFPLEX
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    ST-WIDTH MODE
SD NAME         PLEX         PLOFFS   DISKOFFS LENGTH   DISK-NAME DEVICE

dg rootdg       929742104.1025.alchemy.ameslab.gov

dm rz0a         rz0a         nopriv   0        262816   /dev/rrz0a
dm rz0b         rz0b         nopriv   0        1048864  /dev/rrz0b
dm rz0d         rz0d         simple   1024     0        /dev/rrz0d
dm rz0g         rz0g         nopriv   0        2797776  /dev/rrz0g
dm rz1a         rz1a         nopriv   0        262816   /dev/rrz1a
dm rz1b         rz1b         nopriv   0        1048864  /dev/rrz1b
dm rz1d         rz1d         simple   1024     0        /dev/rrz1d
dm rz1g         rz1g         nopriv   0        2797776  /dev/rrz1g

v  rootvol      root         ENABLED  ACTIVE   262816   ROUND     -
pl rootvol-01   rootvol      ENABLED  ACTIVE   262816   CONCAT    -        RW
sd rz0a-01p     rootvol-01   0        0        16       rz0a      rz0a
sd rz0a-01      rootvol-01   16       16       262800   rz0a      rz0a
pl rootvol-02   rootvol      ENABLED  ACTIVE   262816   CONCAT    -        RW
sd rz1a-01p     rootvol-02   0        0        16       rz1a      rz1a
sd rz1a-01      rootvol-02   16       16       262800   rz1a      rz1a

v  swapvol      swap         ENABLED  ACTIVE   1048864  ROUND     -
pl swapvol-01   swapvol      ENABLED  ACTIVE   1048864  CONCAT    -        RW
sd rz0b-01      swapvol-01   0        0        1048864  rz0b      rz0b
pl swapvol-02   swapvol      ENABLED  ACTIVE   1048864  CONCAT    -        RW
sd rz1b-01      swapvol-02   0        0        1048864  rz1b      rz1b

v  vol-rz0g     fsgen        ENABLED  ACTIVE   2797776  SELECT    -
pl vol-rz0g-01  vol-rz0g     DETACHED STALE    2797776  CONCAT    -        WO
sd rz0g-01      vol-rz0g-01  0        0        2797776  rz0g      rz0g
pl vol-rz0g-02  vol-rz0g     ENABLED  ACTIVE   2797776  CONCAT    -        RW
sd rz1g-01      vol-rz0g-02  0        0        2797776  rz1g      rz1g

# voldisk list
DEVICE       TYPE      DISK         GROUP        STATUS
rz0a         nopriv    rz0a         rootdg       online
rz0b         nopriv    rz0b         rootdg       online
rz0d         simple    rz0d         rootdg       online
rz0g         nopriv    rz0g         rootdg       online
rz1a         nopriv    rz1a         rootdg       online
rz1b         nopriv    rz1b         rootdg       online
rz1d         simple    rz1d         rootdg       online
rz1g         nopriv    rz1g         rootdg       online
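
For anyone who wants to pick out the problem plex from a listing like the
one above without reading it by eye, here is a rough Python sketch that scans
"volprint -ht" text for plex records that are not ENABLED/ACTIVE. It assumes
the whitespace-separated field order shown in the PL header line (NAME,
VOLUME, KSTATE, STATE, ...); the function name and the SAMPLE excerpt are
illustrative only, and the script does not call LSM itself.

```python
# Excerpt of the "volprint -ht" output from this message, used as sample input.
SAMPLE = """\
v vol-rz0g fsgen ENABLED ACTIVE 2797776 SELECT -
pl vol-rz0g-01 vol-rz0g DETACHED STALE 2797776 CONCAT - WO
sd rz0g-01 vol-rz0g-01 0 0 2797776 rz0g rz0g
pl vol-rz0g-02 vol-rz0g ENABLED ACTIVE 2797776 CONCAT - RW
sd rz1g-01 vol-rz0g-02 0 0 2797776 rz1g rz1g
"""


def unhealthy_plexes(text):
    """Return (plex, volume, kstate, state) for plexes not ENABLED/ACTIVE."""
    found = []
    for line in text.splitlines():
        fields = line.split()
        # Plex records start with lowercase "pl"; the "PL" header is skipped.
        if len(fields) >= 5 and fields[0] == "pl":
            name, volume, kstate, state = fields[1:5]
            if (kstate, state) != ("ENABLED", "ACTIVE"):
                found.append((name, volume, kstate, state))
    return found


for plex in unhealthy_plexes(SAMPLE):
    print(*plex)
```

On the excerpt above this reports only vol-rz0g-01, the DETACHED/STALE plex
on rz0g.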

--
Douglas C. Stephens             | UNIX/VMS/WinNT/Network/DNS Admin
System Support Specialist       | Postmaster / Webmaster
Information Systems             | Phone: (515) 294-6102
Ames Laboratory, US DOE         | Email: stephens_at_ameslab.gov
Received on Wed Jun 23 1999 - 04:34:52 NZST