disklabel panics the system

From: Amr Galal Fahim <agahmed_at_ns2.emirates.net.ae>
Date: Wed, 25 Oct 1995 13:08:38 +0400

Hi OSF-managers,
        Yesterday I was at a customer's site to test a failover procedure.
Because a hardware error had been reported on one of the disks (an RZ26,
with too many bad blocks), the hardware engineer was also there to replace
the failing disk.

I had to take care of reinitializing the new disk and reconfiguring it into
the LSM mirrored volume and the AdvFS domain. I took the plex that contains
the failing disk (rz27 in the configuration below) out of the mirrored
volume (vol01), then removed the failing disk from the disassociated plex.
The hardware engineer replaced the disk. My only concern here is that he
replaced the RZ26 with an RZ26L. Could that make a difference? I know there
is a performance difference due to the banding mechanism of the RZ26L, but
the disk geometry looks the same to me in the disktab. In any case, I
understand that as long as I take care of the mirrored plex sizes for LSM,
everything should be fine!

Anyway, the hardware engineer installed the disk into the BA350 expansion
box, then I attempted to initialize the disk. Believe it or not, a simple
"disklabel -r", issued just to read the existing label on the new disk,
caused the mighty OSF/1 version 3.2a Rev. 17 to crash with:
        panic : pte valid

Don't jump to conclusions. I have learned my lesson working in this
unbelievably stable .... (I mean IT, of course). So I was quite conservative
before labeling the disk or even reading the disk label. I made sure that
the LSM volume was stopped and everything was quiet on the system. I even
told the hardware engineer to make the hot swap of the disks as COLD as
possible!! I also shut the system down to single-user mode and did the same,
only to get the same result. The new disk was then replaced with another new
disk (also an RZ26L), and then the entire SCSI controller was replaced,
again with the same result.

I will call the CSC today, but I thought maybe you could help me faster!

********************************************************************************
About the environment :
-----------------------

2 x 3000/500s servers running OSF/1 3.2a Rev. 17 (will be upgraded to 3.2c
very soon).
2 x ANCOT SCSI switches linking the 2 servers in a high data availability
configuration based on DECwatchdog/Autopilot. (You don't want me to tell you
about Watchdog ....!! If you do, just ASK !!!)

The following output should tell you more :
-------------------------------------------
# df -k
Filesystem        1024-blocks     Used    Avail Capacity  Mounted on
/dev/rz3a               63231    46686    10221      82%  /
/dev/rz3g              674852   472560   134806      78%  /usr
pof1_dmn#pof1app      2050344  1741592   244584      88%  /appl

# showfdmn pof1_dmn
               Id              Date Created  LogPgs  Domain Name
2ec5fae9.000eeff0  Sun Nov 13 15:39:21 1994     512  pof1_dmn

  Vol   512-Blks     Free  % Used  Cmode  Rblks  Wblks  Vol Name
   1L    4100694   489376     88%     on    128    128  /dev/vol/vol01
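
The showfdmn figures above can be sanity-checked with a little arithmetic.
What follows is only a minimal sketch in Python, assuming that "% Used" is
the truncated integer percentage of used 512-byte blocks (how showfdmn
actually rounds is an assumption here); the variable names are illustrative.

```python
# Figures copied from the showfdmn output above (units: 512-byte blocks).
total_blks = 4100694
free_blks = 489376

# Blocks in use, and the integer percentage used (truncated; assumption).
used_blks = total_blks - free_blks
pct_used = 100 * used_blks // total_blks

# Volume size converted from 512-byte blocks to 1024-byte blocks (KB).
size_kb = total_blks * 512 // 1024

print("used 512-byte blocks:", used_blks)
print("percent used:", pct_used)
print("size in 1024-byte blocks:", size_kb)
```

With these inputs the percentage comes out at 88, in line with the "% Used"
column, and the size at 2050347 1024-byte blocks, which equals the PUBLEN of
a single disk in the volprint output.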

# volprint -t
DG NAME        GROUP-ID
DM NAME        DEVICE       TYPE     PRIVLEN  PUBLEN   PUBPATH
V  NAME        USETYPE      KSTATE   STATE    LENGTH   READPOL   PREFPLEX
PL NAME        VOLUME       KSTATE   STATE    LENGTH   LAYOUT    ST-WIDTH MODE
SD NAME        PLEX         PLOFFS   DISKOFFS LENGTH   DISK-NAME DEVICE

dg rootdg      777031503.1025.pof1
dm rz17        -            -        -        -        -
dm rz19        -            -        -        -        -
dm rz25        rz25         sliced   512      2050347  /dev/rrz25g
dm rz27        rz27         sliced   512      2050347  /dev/rrz27g
dm rz4         rz4          sliced   512      2050347  /dev/rrz4g
dm rz5         rz5          sliced   512      2050347  /dev/rrz5g
sd rz17-01     pl-03        0        0        2050347  rz17      -
sd rz19-01     pl-03        2050347  0        2050347  rz19      -
sd rz25-01     pl-02        0        0        2050347  rz25      rz25
sd rz27-01     pl-02        2050347  0        2050347  rz27      rz27
sd rz4-01      pl-01        0        0        2050347  rz4       rz4
sd rz5-01      pl-01        2050347  0        2050347  rz5       rz5
pl pl-01       vol01        ENABLED  ACTIVE   4100694  STRIPE    57       RW
pl pl-02       vol01        ENABLED  ACTIVE   4100694  STRIPE    57       RW
pl pl-03       vol02        DISABLED NODEVICE 4100694  STRIPE    57       RW
v  vol01       fsgen        ENABLED  ACTIVE   4100694  SELECT    -
v  vol02       fsgen        DISABLED ACTIVE   4100694  SELECT    -
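
As a cross-check on the plex sizes mentioned earlier, the subdisk lengths in
the volprint output can be summed per plex and compared with each plex's
LENGTH. This is only a sketch in Python, not something that was run at the
site: the record text is copied from the output above, the field positions
are taken from its SD and PL header lines, and the helper name plex_report
is made up for illustration.

```python
# sd and pl records copied from the `volprint -t` output above.
VOLPRINT = """\
sd rz17-01 pl-03 0 0 2050347 rz17 -
sd rz19-01 pl-03 2050347 0 2050347 rz19 -
sd rz25-01 pl-02 0 0 2050347 rz25 rz25
sd rz27-01 pl-02 2050347 0 2050347 rz27 rz27
sd rz4-01 pl-01 0 0 2050347 rz4 rz4
sd rz5-01 pl-01 2050347 0 2050347 rz5 rz5
pl pl-01 vol01 ENABLED ACTIVE 4100694 STRIPE 57 RW
pl pl-02 vol01 ENABLED ACTIVE 4100694 STRIPE 57 RW
pl pl-03 vol02 DISABLED NODEVICE 4100694 STRIPE 57 RW
"""


def plex_report(text):
    """Map each plex name to (plex LENGTH, sum of its subdisk LENGTHs)."""
    sd_total = {}   # plex name -> summed subdisk length
    plex_len = {}   # plex name -> LENGTH from the pl record
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "sd":
            # SD NAME PLEX PLOFFS DISKOFFS LENGTH DISK-NAME DEVICE
            plex = fields[2]
            sd_total[plex] = sd_total.get(plex, 0) + int(fields[5])
        elif fields[0] == "pl":
            # PL NAME VOLUME KSTATE STATE LENGTH LAYOUT ST-WIDTH MODE
            plex_len[fields[1]] = int(fields[5])
    return {name: (length, sd_total.get(name, 0))
            for name, length in plex_len.items()}


report = plex_report(VOLPRINT)
for name in sorted(report):
    length, total = report[name]
    status = "ok" if length == total else "MISMATCH"
    print(name, length, total, status)
```

For the configuration shown, each plex has two subdisks of 2050347 blocks,
so every plex sums to its listed LENGTH of 4100694. The same comparison
could be repeated once the replacement disk's subdisk has been recreated.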

********************************************************************************

Do you have any clues ?!

Thanks & Regards

       /-------------------------------------------------------------\
      /  Amr Galal Fahim Ahmed            agahmed_at_emirates.net.ae  \
     /   Senior Software Engineer                                      \
     \   Computer Network Systems                                      /
      \  DEC distributor in U.A.E.                                    /
       \-------------------------------------------------------------/
Received on Wed Oct 25 1995 - 10:55:31 NZDT
