UPDATE: Inconsistent devices on a cluster

From: Jim Fitzmaurice <jpfitz_at_fnal.gov>
Date: Mon, 19 Mar 2001 14:26:05 -0600

One other thing: when I try to remove these devices, the following message
appears on the screen of every user connected to the cluster.

Mar 19 09:57:29 d0olc vmunix: nil>

NOT just the console, and NOT just my screen; it goes to everybody connected
to the cluster. Does anybody know why?



> Hello,
>
> I have a two-system TruCluster with a GS80 and a 4100; they share an HSZ50,
> and we have just attached an HSG80. The problem is with the newly attached
> HSG80. The HSZ50 holds all the cluster system file systems and directories
> such as /home, etc. It was used to build the cluster and works great. Now we
> are adding the HSG80, which we plan on using for our databases. This isn't
> going as well as I'd like.
>
> A possible cause of the problem could have been when I added the second
> member, the 4100, to the cluster. The HSG was being used on a different 4100
> (one that is not part of the cluster at this time), but the HSG was also
> hooked up to the 4100 that I was adding to the cluster, even though it
> wasn't using it. I realize now that I should have disconnected that 4100
> from the HSG80, but since I wasn't using it on that system I'd forgotten it
> was attached. Well, when I booted the 4100 into the cluster, it went out
> and grabbed all the devices attached to it for the cluster. This caused some
> problems on the other 4100, which lost most of its AdvFS file systems when
> the newly clustered system brought its connection to the HSG online and
> placed the other system's connection offline. After a while we were able to
> correct that situation by disconnecting the clustered system and bringing
> the other system's connection back online.
>
> Now we have migrated all the data off the HSG80 and have attached it to
> the cluster systems. Unfortunately, it showed devices dsk35c to dsk47c;
> that's two more devices than we have on the array. Running disklabel on all
> the devices showed dsk35c to dsk39c and dsk42c to dsk47c to be the correct
> devices. The devices dsk40c and dsk41c are not valid. So I wanted to remove
> the devices using hwmgr -delete and re-add them using hwmgr -scan, to get
> rid of the two devices in the middle and avoid any confusion in the
> future. I was able to successfully remove all the HSG devices, EXCEPT for
> those two! I get the following error:
>
> # hwmgr -delete component -id 244
> hwmgr: Error (95) Cannot start operation.
>
> I discovered the devices were inconsistent by running the command:
>
> # hwmgr -show component -inconsistent
>
> HWID: HOSTNAME FLAGS SERVICE COMPONENT NAME
> -----------------------------------------------
> 244: d0olc rcd-i iomap
> SCSI-WWID:01000010:6000-1fe1-0008-4b30-0009-0300-5108-0044
> 245: d0olc rcd-i iomap
> SCSI-WWID:01000010:6000-1fe1-0008-4b30-0009-0300-5108-0045
> 252: d0olc rcdsi none
> SCSI-WWID:02000008:5000-1fe1-0008-4b30
>
> I assume HWID 252, the controller:
>
> 252: /dev/cport/scp3 HSG80CCL bus-4-targ-0-lun-0
>
> is showing up as inconsistent because the two inconsistent devices are
> attached to it. (I can't delete that device (252) either, same error.) The
> only documentation I found was in the hwmgr man page where it states:
>
> Note that this command does not fix database inconsistencies; it
> only detects inconsistencies. One possible fix may be to reboot
> the cluster.
>
> <sarcasm-mode-on>
> My, doesn't that sound reassuring... What a definitive answer... Surely
> there is no need to direct the reader to another document or resource where
> he might find more information....
> <sarcasm-mode-off>
>
> (Sorry about that.) Anyway I tried the hwmgr -refresh command but that
> didn't work either.
>
> A search for the error on the Compaq site revealed 22,229 page matches. I
> tried a couple of documents, but searching them revealed no information on
> that error. (Nope, sorry, I am NOT going to search all 22,229 matches.)
>
> System Info:
>
> Two-member cluster consisting of a GS80 and a 4100, with an HSZ50 and an
> HSG80.
> Running Tru64 V5.1 and TruCluster V5.1, patch kit 2.
>
> Has anybody seen this before, or can anyone direct me to some documentation
> before I blindly reboot my cluster?
>
> Jim Fitzmaurice
> jpfitz_at_fnal.gov
>
> UNIX is very user friendly; it's just very particular about who it makes
> friends with.
>
>
Received on Mon Mar 19 2001 - 20:26:05 NZST
