Hello Managers,
While building a Tru64 UNIX V5.1A cluster, I've encountered an odd problem that appears to involve wwidmgr.
Some background:
The cluster will eventually consist of two DS20s with two KGPSA HBAs each, running SRM firmware 6.1 (the HBAs are in the two lowest PCI slots), plus two HSG80 controllers (ACS firmware 8.6) in multibus failover mode and two Compaq fabric switches. Each HSG controller is connected to an individual switch, and only via HSG port 1 (port 2 is not connected to anything). Each HBA in the DS20s connects to an individual switch. The HSG connections for the DS20 HBAs are set to the Tru64 UNIX operating system type. The SAN is currently in use by some Windows NT systems, so a reboot is not possible.
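For reference, the connection settings were made with the usual HSG80 CLI commands, along these lines (the !NEWCONnn names are just the factory defaults; substitute whatever SHOW CONNECTIONS reports on your HSG):

    HSG80> SHOW CONNECTIONS
    HSG80> SET !NEWCON01 OPERATING_SYSTEM=TRU64_UNIX
    HSG80> SET !NEWCON02 OPERATING_SYSTEM=TRU64_UNIX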
For the present, only one DS20 is connected, as the other is still a non-cluster production system.
The switches are zoned: one zone for four Windows NT systems, which includes the HSG and an MDR, and a UNIX zone, which for now contains only the one DS20, the HSG, and the MDR.
I've created nine units on the HSG, set the identifiers to match the unit numbers, and set ENABLE_ACCESS_PATH to allow only the two connections to the DS20. Everything looks good at this point.
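For each unit, the setup was of this form (D1 and the disk and connection names here are examples, not exact transcripts):

    HSG80> ADD UNIT D1 DISK10000
    HSG80> SET D1 IDENTIFIER=1
    HSG80> SET D1 DISABLE_ACCESS_PATH=ALL
    HSG80> SET D1 ENABLE_ACCESS_PATH=(!NEWCON01,!NEWCON02)
    HSG80> SHOW D1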
On the DS20, I used wwidmgr to set each adapter's topology to fabric. A "wwidmgr -show adapter" confirms the topology is set to fabric.
Here's the quirky part...
When I use wwidmgr, I get messages saying "pga0 not ready" and "pgb0 not ready" (twice each). I've confirmed the LEDs, cables, etc.
When I do a "wwidmgr -show wwid" I see some odd WWID numbers which do not directly relate to the WWIDs indicated by a "show unit" on the HSG. Also, the UDID only shows up for two of the units; the others all show 0. I have tried "wwidmgr -clear all" with the same results.
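For completeness, the exact sequence that produces the odd listing is just:

    P00>>> wwidmgr -clear all
    P00>>> wwidmgr -show wwid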
I went ahead and loaded V5.1A on the internal disk to begin the cluster load anyway, hoping things might clear up.
"hwmgr -view devices"  showed me all the HSG units including the UDID.  I 
was able to disklabel them all. I went ahead and built the cluster and 
everything worked fine as far as creating the cluster system disk, the 
quorum disk, and the member boot disk.
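The checks under V5.1A were along these lines (dsk10 and the HSG80 disk-type argument are examples; adjust for your own device names):

    # hwmgr -view devices
    # disklabel -rw dsk10 HSG80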
However, after the cluster install procedure indicated it had set the console variables (bootdef_dev, etc.) and rebooted, the system stopped at the SRM prompt.
The SRM variable bootdef_dev was cleared (empty). I did a "wwidmgr -show wwid", picked the WWID that most closely matched the WWID of the unit I wanted to set as my member boot disk, and used "wwidmgr -quickset -item # -unit #" to set it. A "show dev" then showed me 2 dga devices and 2 dgb devices with long strings of numbers after them. I used "set bootdef_dev" to set the boot devices to the dga and dgb devices.
When I went to boot, I received the usual message about doing an init, 
which I did.
However, after the init, when I did a "show bootdef_dev" it was again empty, and a "show dev" showed no dga or dgb devices.
I have repeatedly used wwidmgr to set, clear, and reset the boot device. Each time I do an init, I lose all the settings.
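Every pass ends the same way (again, a made-up transcript, but this is the shape of it):

    P00>>> init
    ...console initialization banner...
    P00>>> show bootdef_dev
    bootdef_dev
    P00>>> show dev
    (no dga or dgb devices listed any more)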
I have a call open to Compaq, but they are also baffled.
Has anyone experienced anything like this, or does anyone have any suggestions?
Thanks.
John
Received on Sat May 25 2002 - 18:51:09 NZST