Thank you to all who replied. I could not use most of the responses
because I get a read error on the active plex of vol-rz0g while
mirror-copying that plex to the good plex. Because this error was on an
unused block, step 7 is OK.
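For orientation, the failing block number can be pulled out of the volplex error message quoted in my original post below and turned into a byte offset. This is only a minimal sketch: the helper name bad_block_info is made up for illustration, and the 512-byte block size is an assumption (it is consistent with the partition lengths in the volprint output, but nothing in this thread states it). It says nothing about whether the block is in use by the file system.

```shell
# bad_block_info LINE PLEXLEN
# Illustrative helper (not an LSM command): extracts the block number from a
# volplex "Plex read" error line and reports where it falls in the plex.
# Assumption: LSM blocks are 512 bytes.
bad_block_info() {
    blk=$(printf '%s\n' "$1" | sed -n 's/.*block \([0-9][0-9]*\):.*/\1/p')
    [ -n "$blk" ] || return 1          # no block number found in the line
    echo "block=$blk byte_offset=$((blk * 512)) percent_into_plex=$((blk * 100 / $2))"
}
```

Given the first error line from the original post and the plex length of 2797776, this prints block=1955317 byte_offset=1001122304 percent_into_plex=69.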
The solution that worked is as follows:
0) "Have good backup of /usr"
1) init s
2) vold
3) voldctl mode
4) umount /usr (if not already unmounted)
5) mount -r /dev/rz4c /usr (DU4.0E OS CD)
6) volplex dis vol-rz0g-01
7) volplex -o rerr cp vol-rz0g vol-rz0g-01 (ignore SCSI/CAM error messages)
8) volume stop vol-rz0g
9) volplex att vol-rz0g vol-rz0g-01
10) /usr/isl/pisl/volmend fix clean vol-rz0g-01
11) volplex dis vol-rz0g-02
12) volume start vol-rz0g
13) umount /usr (DU4.0E OS CD)
14) mount /dev/vol/rootdg/vol-rz0g /usr
15) "Verify vol-rz0g contents are intact and correct"
16) shutdown -r now
17) "Call Digital/Compaq and get replacement for rz1 physical disk"
18) "Follow steps in sections 14.7.1.2 and 5.3.1 of LSM manual to re-init and
re-mirror disks"
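The command steps above (2-3 and 5-14) could be collected into a dry-run checklist script along these lines. It is only a sketch, not something that was run during the recovery: run(), recover_usr(), DRY_RUN and the variable names are illustrative, and the device, volume and plex names are the ones from this particular system. Steps 0, 1, 4, 15 and later stay manual.

```shell
#!/bin/sh
# Sketch only: replays steps 2-3 and 5-14 of the list above.
# DRY_RUN, run() and recover_usr() are illustrative names, not LSM features.
DRY_RUN=${DRY_RUN:-1}   # default: print the commands, execute nothing
VOL=vol-rz0g
STALE=vol-rz0g-01       # detached, stale plex on rz0
ACTIVE=vol-rz0g-02      # plex on rz1 that shows the read error
CDDEV=/dev/rz4c         # DU4.0E OS CD

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        # Live mode stops at the first non-zero exit status. Whether step 7
        # exits zero when it skips a read error is not stated in the post.
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

recover_usr() {
    run vold                                      # step 2
    run voldctl mode                              # step 3
    run mount -r "$CDDEV" /usr                    # step 5
    run volplex dis "$STALE"                      # step 6
    run volplex -o rerr cp "$VOL" "$STALE"        # step 7
    run volume stop "$VOL"                        # step 8
    run volplex att "$VOL" "$STALE"               # step 9
    run /usr/isl/pisl/volmend fix clean "$STALE"  # step 10
    run volplex dis "$ACTIVE"                     # step 11
    run volume start "$VOL"                       # step 12
    run umount /usr                               # step 13
    run mount /dev/vol/rootdg/"$VOL" /usr         # step 14
}
```

With DRY_RUN left at its default of 1, calling recover_usr only prints the twelve commands, each prefixed with "+ ", so they can be checked against the list before anything is typed for real.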
Proposed solutions and my original message are appended below:
-------------------------------------------------------------------------
From: Carlos_CHUA_at_asia.paribas.com
How about trying "# volrecover -sb rz0g" ?
-------------------------------------------------------------------------
Date: Wed, 23 Jun 1999 13:42:35 +0800
From: Girish Phadke <pgirish_at_maxisnet.com.my>
Hello
You may like to verify whether there are really any errors.
When a plex has been detached, 'voldisk list' would normally also show the
drive as removed. Since yours is not shown as removed, you can run volrecover:
volrecover -sb -g rootdg vol-rz0g # this runs the recovery in the background;
# once it has finished, the plex will become RW
If that does not work, you can try removing the disk and re-adding it:
voldg -k rmdisk rz0g=rz0g # -k option will preserve the name & info
voldg -k adddisk rz0g=rz0g
voldisk online rz0g
then volrecover
Incidentally, your mirrors look to be on the same controller (rz0 and rz1)?
-------------------------------------------------------------------------
From: "Leonard, Roger" <rleonard_at_cvty.com>
Well, let's see...
I am referring to Veritas rather than LSM, but it should correlate.
Veritas uses vx commands where LSM uses vol commands; they should cross-reference.
first:
Pull up the GUI and look at the tutil fields and putil fields.
if any have errors, enter
volmend clear tutilX plexname or volname
and restart the vols
if this doesn't work, try
voledit set failing=off diskname
do a voldiskadm & readd it. Answer yes when it asks you to reattach it.
restart the volume
I have had this fix problems numerous times when an HSZ would go out
and the disks would go into an errored state. The voldiskadm thing seems a
bit scary, but as long as you reattach the disk and do not initialize it, you
should be OK. Good luck.
-------------------------------------------------------------------------
>Date: Tue, 22 Jun 1999 23:32:35 -0500
>To: tru64-unix-managers_at_ornl.gov
>From: "Douglas C. Stephens" <stephens_at_ameslab.gov>
>Subject: Help with detached-stale LSM plex & read error on active plex
>Cc: taylora_at_ameslab.gov
>
>Dear Tru64-Unix-Managers:
>
>We have an AlphaServer 800 with DU4.0E bl1 loaded with two RZ1BB-CS disks,
>rz0 and rz1, configured as mirrored root/swap/usr volumes according to the
>"volencap"/"volrootmir -a" procedure outlined in chap. 5 as well as sections
>C.15 and C.17 of the LSM manual on the v4.0E Doc CD. A third disk, an
>RZ2DA-LA, is configured with its rz2c partition to be AdvFS and holds /var.
>
>In order to test disaster recovery booting, we powered down and removed the
>default boot disk rz0, then powered up and booted successfully to its mirror
>on rz1. After we powered down again, re-inserted the rz0 disk and rebooted
>to rz1 again, LSM relocated, re-attached, sync'ed, and activated the -01
>plexes for root-vol and swap-vol on the rz0 disk. It did not do the same
>for the vol-rz0g mirrored volume containing /usr, although the volume did
>come online and was mountable as /usr by way of its vol-rz0g-02 plex on
>the rz1 disk.
>
>Doing a "volprint -ht" revealed that plex vol-rz0g-01 was detached and
>stale. When we tried to execute a "volplex att vol-rz0g vol-rz0g-01" to
>resolve the situation, the operation failed with a read error occurring on
>the vol-rz0g-02 plex, which now contains the only current copy of /usr.
>This error is repeatable and happens at exactly the same block number
>each time. I've included the output from one of these attempts here:
>
># volplex att vol-rz0g vol-rz0g-01
>fsgen/volplex: Volume vol-rz0g, plex vol-rz0g-02, block 1955317: Plex read:
> Error: Read failure
>fsgen/volplex: I/O error on volume vol-rz0g, plex vol-rz0g-01 not attached
>#
>
>So what we have is rootvol and swapvol using both their -01 and -02 plexes
>and /usr on vol-rz0g using only its -02 plex, which just happens to be the
>plex with a read error on it. Further, I cannot get volplex to sync with
>any other associated plex of vol-rz0g due to the read error during sync
>read.
>
>Can someone more familiar with LSM please suggest a course of action which
>would allow us to re-activate the -01 plex of the vol-rz0g containing /usr
>so that we can take the rz1 disk out and have it replaced without taking
>the system out of action for a tape restore? A brief sojourn to "init s"
>or "init 2", or even just umounting /usr to disable vol-rz0g and work with
>it would be ok. Perhaps unmirroring and unencapsulating the /usr?
>I don't think I'm completely screwed yet, but I'm not certain.
>
>For reference, I've included output from /etc/fstab, "volprint -ht", and
>"voldisk list" below.
>
>I will summarize any responses.
>Thanks in advance.
>
>
>
># cat /etc/fstab
>/dev/vol/rootdg/rootvol / ufs rw 1 1
>/proc /proc procfs rw 0 0
>/dev/vol/rootdg/vol-rz0g /usr ufs rw 1 2
>var_domain#var /var advfs rw 0 0
>/dev/vol/rootdg/swapvol swap1 ufs sw 0 2
>
># volprint -ht
>DG NAME        GROUP-ID
>DM NAME        DEVICE      TYPE     PRIVLEN  PUBLEN  PUBPATH
>V  NAME        USETYPE     KSTATE   STATE    LENGTH  READPOL   PREFPLEX
>PL NAME        VOLUME      KSTATE   STATE    LENGTH  LAYOUT    ST-WIDTH MODE
>SD NAME        PLEX        PLOFFS   DISKOFFS LENGTH  DISK-NAME DEVICE
>
>dg rootdg      929742104.1025.alchemy.ameslab.gov
>
>dm rz0a        rz0a        nopriv   0        262816  /dev/rrz0a
>dm rz0b        rz0b        nopriv   0        1048864 /dev/rrz0b
>dm rz0d        rz0d        simple   1024     0       /dev/rrz0d
>dm rz0g        rz0g        nopriv   0        2797776 /dev/rrz0g
>dm rz1a        rz1a        nopriv   0        262816  /dev/rrz1a
>dm rz1b        rz1b        nopriv   0        1048864 /dev/rrz1b
>dm rz1d        rz1d        simple   1024     0       /dev/rrz1d
>dm rz1g        rz1g        nopriv   0        2797776 /dev/rrz1g
>
>v  rootvol     root        ENABLED  ACTIVE   262816  ROUND     -
>pl rootvol-01  rootvol     ENABLED  ACTIVE   262816  CONCAT    -        RW
>sd rz0a-01p    rootvol-01  0        0        16      rz0a      rz0a
>sd rz0a-01     rootvol-01  16       16       262800  rz0a      rz0a
>pl rootvol-02  rootvol     ENABLED  ACTIVE   262816  CONCAT    -        RW
>sd rz1a-01p    rootvol-02  0        0        16      rz1a      rz1a
>sd rz1a-01     rootvol-02  16       16       262800  rz1a      rz1a
>
>v  swapvol     swap        ENABLED  ACTIVE   1048864 ROUND     -
>pl swapvol-01  swapvol     ENABLED  ACTIVE   1048864 CONCAT    -        RW
>sd rz0b-01     swapvol-01  0        0        1048864 rz0b      rz0b
>pl swapvol-02  swapvol     ENABLED  ACTIVE   1048864 CONCAT    -        RW
>sd rz1b-01     swapvol-02  0        0        1048864 rz1b      rz1b
>
>v  vol-rz0g    fsgen       ENABLED  ACTIVE   2797776 SELECT    -
>pl vol-rz0g-01 vol-rz0g    DETACHED STALE    2797776 CONCAT    -        WO
>sd rz0g-01     vol-rz0g-01 0        0        2797776 rz0g      rz0g
>pl vol-rz0g-02 vol-rz0g    ENABLED  ACTIVE   2797776 CONCAT    -        RW
>sd rz1g-01     vol-rz0g-02 0        0        2797776 rz1g      rz1g
>
># voldisk list
>DEVICE  TYPE    DISK    GROUP   STATUS
>rz0a    nopriv  rz0a    rootdg  online
>rz0b    nopriv  rz0b    rootdg  online
>rz0d    simple  rz0d    rootdg  online
>rz0g    nopriv  rz0g    rootdg  online
>rz1a    nopriv  rz1a    rootdg  online
>rz1b    nopriv  rz1b    rootdg  online
>rz1d    simple  rz1d    rootdg  online
>rz1g    nopriv  rz1g    rootdg  online
>
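To spot a detached or stale plex without reading the whole listing, the "volprint -ht" output can be filtered with awk. This is a minimal sketch, assuming the field order shown in the PL header line above (name, volume, KSTATE, STATE); list_bad_plexes is an illustrative name, not an LSM command.

```shell
# list_bad_plexes: reads `volprint -ht` output on stdin and prints every
# plex ("pl" record) whose KSTATE/STATE pair is not ENABLED/ACTIVE.
# Assumes fields: $1=record type, $2=plex, $3=volume, $4=KSTATE, $5=STATE.
list_bad_plexes() {
    awk '$1 == "pl" && ($4 != "ENABLED" || $5 != "ACTIVE") {
        printf "%s (volume %s): %s %s\n", $2, $3, $4, $5
    }'
}
```

It would be used as "volprint -ht | list_bad_plexes". Fed the listing above (without the ">" quote marks), it reports only vol-rz0g-01 as DETACHED STALE.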
--
Douglas C. Stephens | Network/DNS/Unix/WinNT/VMS Administrator
System Support Specialist | Postmaster / Webmaster
Information Systems | Phone: (515) 294-6102
Ames Laboratory, US DOE | Email: stephens_at_ameslab.gov
Received on Wed Jun 23 1999 - 19:14:49 NZST