SUMMARY: Removing device from cluster

From: Robert Mulley <robert.mulley_at_travelcorporation.com.au>
Date: Wed, 27 Mar 2002 10:01:56 +1100

Hello,

Sorry in advance for the long summary.
Thanks go to Thomas Blinn and Raul Sossa Sossa for their helpful replies,
also to Udo de Boer and Pat O'Brien. The general indication was that this is
a bug, probably because no testing has been done in an environment like the
one I have here. So I will escalate this call through the ranks and see if
something can be done about it. I've got to get the support people here in
Australia to escalate it to the support engineering group in the US, so it
could take a while. However, I will let you know the end result.
Raul gave me a step-by-step method for replacing disks, which unfortunately
didn't work in my situation. As it is the most complete and succinct
procedure I've seen for this, I'll list it here. It will be good to
have in the archives.

1. # grep "dsk XX" /etc/dsfsc.dat
2. dsfmgr -Z rm_cluster_hwid yyyy 0 (where yyyy is the hardware ID obtained
from the command in step 1).
3. dsfmgr -Z rm_local_hwid yyyy 0 (the same yyyy as in step 2).
4. dsfmgr -D dskXX (removes all device special files for the disk under
/dev/).
5. If the device still appears in "hwmgr -view devices" and/or
"hwmgr -show scsi",
   use "hwmgr -delete component -id NN" (where NN is the ID shown by hwmgr).
6. To monitor the insertion process, open a new window and type
"# evmwatch -A &". New SCSI cluster events will then be displayed in this
window.
7. Continue working in the other window.
8. Insert the new disk on a local bus, or create and present a new unit on
the HSG80 controller.
9. Run "# hwmgr -scan scsi" on the server that must see the new unit (if
local).
10. Run "# hwmgr -scan component -category scsi_bus" to update the
TruCluster Server 5.1 device database (because "hwmgr -scan scsi" does not
update it).
11. You will see some informational events in the other window; if there is
an error, you will see it there too. Usually they are all informational
messages.
12. At this point the new disk device should have been recognized by the
system, and "hwmgr -view devices" should show it.
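
As a rough sketch of the command order in the procedure above, the shell
functions below only print the commands rather than run them, so the plan can
be reviewed before anything is typed on a live system. The function names
(print_removal_plan, print_rescan_plan) and the example values dsk12 and 87
are made up for illustration; the commands themselves are the ones listed in
the steps.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands from the procedure above.
# It does NOT call dsfmgr or hwmgr itself.

# print_removal_plan DISK HWID
#   DISK  device name, e.g. dsk12 (hypothetical example value)
#   HWID  hardware ID found via the grep in step 1
print_removal_plan() {
    disk=$1
    hwid=$2
    if [ -z "$disk" ] || [ -z "$hwid" ]; then
        echo "usage: print_removal_plan DISK HWID" >&2
        return 1
    fi
    # Remove the cluster-wide entry, then the local one.
    echo "dsfmgr -Z rm_cluster_hwid $hwid 0"
    echo "dsfmgr -Z rm_local_hwid $hwid 0"
    # Remove the device special files for the disk.
    echo "dsfmgr -D $disk"
}

# print_rescan_plan
#   Commands to run after the new disk or unit has been inserted.
print_rescan_plan() {
    echo "hwmgr -scan scsi"
    echo "hwmgr -scan component -category scsi_bus"
    echo "hwmgr -view devices"
}
```

For example, "print_removal_plan dsk12 87" prints the three dsfmgr commands
for that disk, and "print_rescan_plan" prints the scan and verification
commands. If the device is still listed afterwards, the "hwmgr -delete
component -id NN" step still has to be done by hand.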

dsfmgr -Z xxxxxx is an undocumented feature of dsfmgr. Most people don't
like using undocumented features, so I hope Compaq will document it at some
point, as it seems to be the best (or only) way to remove errant devices. I
have used it successfully on stand-alone systems but not in a cluster.

Robert Mulley
The Travel Corporation

-----Original Question-----

Hello,

We've got a situation on our 5.1 cluster, running 5.1 pk4. When we add a new
disk device we know that you have to use hwmgr -scan component -cat
scsi_bus. However, if this device fails or becomes unavailable, how do you
get rid of it?
When I run hwmgr -delete component -id xxx I get the error message:
hwmgr: Error (95) Cannot start operation.
I know from other people and sites that you can use "dsfmgr -Z
rm_cluster_hwid xxx 0" followed by "dsfmgr -Z rm_local_hwid xxx 0". In our
case this doesn't seem to clear up the entry. The previous suggestion from
Compaq support was that we should reboot, and yes, this clears up the error.
However, who really wants to reboot a production machine every time they
need to add new storage or replace a failed disk? It makes a mockery of the
hot-swap concept.
So if anybody has any experience with this or can offer any assistance,
please help us.

Robert Mulley
The Travel Corporation.
Received on Tue Mar 26 2002 - 23:08:14 NZST
