Thanks to -
Dr. Tom Blinn
Colin Walters
Alan
Irving Waldo
Responses
1.
What happened is exactly what is supposed to happen. The HSZ70 tells
the operating system some set of "WWID" values for the logical units
on the controller. If you replace the HSZ70 with a different unit,
you have to configure the new unit to export exactly the same WWID
values for the equivalent logical units, or else, from the perspective
of the UNIX operating system, it is as if the old unit were simply
off-line and the new unit were totally unknown. In fact, it sounds as
though, as far as UNIX is concerned, the old unit is still active
(do you have any mounted file systems or open database files or
whatever on those units?).
I'm sure this is all documented somewhere, probably in the most up to
date HSZ70 hardware manuals, since it's not really a UNIX issue, it's
an HSZ configuration issue. There MIGHT be information on it in the
TruCluster documentation, I'm not sure.
And yes, what happened is exactly what is supposed to happen; if these
were direct attached disks instead of being behind a SCSI attached
RAID controller, UNIX would have seen that the WWIDs in the disks
themselves (if they are new enough to have WWIDs) haven't changed, and
it would've preserved the "dsk" names. That works with direct
attached disks even if you replace the SCSI controller with a
different model, or move it to a different I/O slot. But with the
RAID attached disks, the thing that UNIX sees is a software construct
provided by the RAID controller, and you have to tell the RAID
controller to export the same values for the logical unit WWIDs that
were exported by the old unit.
Tom Blinn
2.
Alay,
Use hwmgr -show scsi to determine the SCSI DEVICEID (did) for the
devices, and then use the redirect option:
# hwmgr -redirect scsi -src <source_did> -dest <destination_did>
For example:
# hwmgr -redirect scsi -src 18 -dest 1
Note that if the output from -show scsi indicates that there are no
valid paths for either disk, you might have to use -delete scsi to
remove the pathless disk. You can then use dsfmgr.
Regards,
Colin
Colin Walters
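A minimal sketch of the redirect step as a dry run, for anyone who
wants to check the command line before touching the hardware database.
It only prints the command from the reply above; it does not call
hwmgr. The function name redirect_cmd is hypothetical, and the DIDs
must still be read from "hwmgr -show scsi" on your own system.

```shell
# Dry-run helper: prints the hwmgr command instead of running it.
# redirect_cmd is a hypothetical name, not part of hwmgr.
redirect_cmd() {
    src_did=$1    # value for -src, taken from "hwmgr -show scsi"
    dest_did=$2   # value for -dest, taken from "hwmgr -show scsi"

    # Refuse anything that is not a plain decimal number.
    for did in "$src_did" "$dest_did"; do
        case "$did" in
            ''|*[!0-9]*)
                echo "usage: redirect_cmd <source_did> <destination_did>" >&2
                return 1
                ;;
        esac
    done

    echo "hwmgr -redirect scsi -src $src_did -dest $dest_did"
}

# The example from the reply above.
redirect_cmd 18 1
```

Printing first and pasting the line afterwards keeps a mistyped DID
from reaching the real command.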
3.
For parallel SCSI devices, V5 constructs a unique ID for the device by
using the serial number from one of the Inquiry mode pages. Since an
HSZ70 presents multiple logical units, it probably uses its own serial
number and something unique for each LUN as the serial number for the
LUN. Odds are, when you changed the controller, the serial number
changed, which changed each device. With the new serial number, the
operating system had no choice but to treat them as new devices.
I have a recollection of some internal commentary on this subject and
it may have made it into the "Best Practices" document. I don't have a
clue where to look for this though.
Unrelated to this, I have to point out, that as far as I remember,
Storage doesn't support running pairs of redundant controllers except
as a redundant pair.
Alan
What did I do-
I realised that the problem I was facing was because of the change in
the HSZ70 disk controller. The serial numbers changed and the OS gave
the controller a new HWID, and this started all the problems. In the
end I did the following:
(Remember dsk1 was my original device and dsk18 was what the system
was calling it now)
#dsfmgr -Z rm_cluster_hwid <dsk1's hwidno> 0
#dsfmgr -Z rm_cluster_hwid <dsk1's hwidno> 0
If you cannot find the HWID in the "hwmgr -show devices" output, look
in /etc/dfsc.dat; the value in the fourth column is the HWID assigned
to your dskxxx.
Then a simple
#dsfmgr -m dsk18 dsk1
A reboot to ensure everything was okay !!!
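The lookup and rename steps above can be sketched as two small shell
helpers. This is an illustration under assumptions, not a tested
procedure: lookup_hwid and print_rename_plan are hypothetical names,
the layout of /etc/dfsc.dat is assumed to be whitespace-separated with
the device name as a whole field and the HWID in field 4, and the
in-order mapping of new names to old ones is only confirmed above for
dsk18 -> dsk1. The second helper prints dsfmgr commands and does not
run them.

```shell
# Hypothetical helper: print the fourth field of the first line in the
# file that contains the device name as a whole field.
# Assumed layout: whitespace-separated fields, HWID in field 4.
lookup_hwid() {
    dev=$1                     # device name, e.g. dsk1
    file=${2:-/etc/dfsc.dat}   # path named in the summary above
    awk -v dev="$dev" '
        { for (i = 1; i <= NF; i++) if ($i == dev) { print $4; exit } }
    ' "$file"
}

# Hypothetical helper: print (do not run) one "dsfmgr -m" line per unit.
# Assumes new names map to old names in order.
print_rename_plan() {
    old_first=$1   # first original unit number, e.g. 1
    new_first=$2   # first newly assigned unit number, e.g. 18
    count=$3       # number of units, e.g. 7
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "dsfmgr -m dsk$((new_first + i)) dsk$((old_first + i))"
        i=$((i + 1))
    done
}
```

For example, "lookup_hwid dsk1" would print the HWID recorded for dsk1,
and "print_rename_plan 1 18 7" would list the seven renames from
dsk18...dsk24 back to dsk1...dsk7 for review before running them by
hand.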
Once again, Thanks to all who helped!!!
Tks & Rgds,
Alay Shah.
______________________________ Forward Header
__________________________________
Subject: Trouble with Device Special Files on V5.1
Author: ^IT_MPG_UNIX_ADMIN at MPG001
Date: 3/2/01 1:17 PM
We just replaced a disk controller (an HSZ70). This is a cluster of 2
HSZ70s (not configured for dual redundancy) and 2 4100s running V5.1
Patch Kit 2. Before the replacement, the units on this disk controller
were dsk1...dsk7; afterwards the system built new device special file
names for them, dsk18...dsk24. Why is this? Is this supposed to happen
when you replace the disk controller, even if the bus stays the same?
To fix this I tried "dsfmgr -m dsk18 dsk1", but I get a "dsk1a is
active" message.
Help!!
Tks & Rgds,
Alay.
Allegiance Healthcare Corporation
Email - shahal_at_allegiance.net
Received on Fri Mar 02 2001 - 21:05:32 NZDT