HP OpenVMS Systems

ask the wizard

Cluster Failover, Shared Roots?


The Question is:

 
I have two VAX 4000 705A systems in a cluster, and I would like to use two
others that I have for hardware failover. This is a DSSI cluster. I connected
all four systems together and set the boot flags so that the replacement
systems boot properly. The old system boots fine, but the new system boots
halfway and then hangs. I know I can only boot one of them at a time because
they boot from the same system segment. HELP
 


The Answer is :

 
  You cannot have the same host name and the same SCSSYSTEMID host address
  active in the same cluster at the same time -- and these values are
  paired, so the entire cluster must be rebooted if either the SCS node
  name or the SCS system ID (but not both) changes.
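
  As a rough illustration of the uniqueness rule above, the following
  Python sketch checks a planned node list for duplicate SCS node names
  or SCS system IDs before anything is booted. The function names and
  the sample node list are hypothetical. The area * 1024 + node formula
  is the usual DECnet Phase IV convention for deriving SCSSYSTEMID, and
  it is an assumption that this site assigns its IDs that way.

```python
from collections import Counter


def scssystemid_from_decnet(area: int, node: int) -> int:
    """Derive an SCSSYSTEMID from a DECnet Phase IV address (area.node).

    Assumption: the site follows the common area * 1024 + node convention.
    """
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError("DECnet Phase IV address out of range")
    return area * 1024 + node


def find_conflicts(nodes):
    """Return (duplicate_names, duplicate_ids) for a planned node list.

    nodes is a list of (scsnode, scssystemid) pairs. Node names are
    compared case-insensitively.
    """
    name_counts = Counter(name.upper() for name, _ in nodes)
    id_counts = Counter(system_id for _, system_id in nodes)
    duplicate_names = sorted(n for n, c in name_counts.items() if c > 1)
    duplicate_ids = sorted(i for i, c in id_counts.items() if c > 1)
    return duplicate_names, duplicate_ids


if __name__ == "__main__":
    # Hypothetical plan: the spare reuses the identity of the node it replaces.
    planned = [
        ("VAXA", scssystemid_from_decnet(1, 1)),
        ("VAXB", scssystemid_from_decnet(1, 2)),
        ("VAXA", scssystemid_from_decnet(1, 1)),  # spare sharing VAXA's root
    ]
    names, ids = find_conflicts(planned)
    print("duplicate node names:", names)
    print("duplicate system IDs:", ids)
```

  Any entry reported here is a pair of systems that must never be up in
  the cluster at the same time.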
 
  Bootstrapping this configuration should involve great care as well, as
  cluster configurations with invalid VOTES and/or invalid (or creative)
  EXPECTED_VOTES settings can trigger user data corruptions.  (For details
  on correctly setting the VOTES and EXPECTED_VOTES parameters, please
  see the OpenVMS FAQ.)
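
  To make the risk concrete, here is a small Python sketch of the
  quorum arithmetic. It uses the documented OpenVMS formula, quorum =
  (EXPECTED_VOTES + 2) / 2 with the fraction discarded, and then checks
  whether the voting members could split into two groups that each
  reach quorum, which is the partitioned-cluster case that can corrupt
  shared disks. The function names and the vote lists are hypothetical,
  and the sketch ignores quorum disks and the dynamic quorum adjustment
  a running cluster performs.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES, using integer division."""
    return (expected_votes + 2) // 2


def partition_risk(votes, expected_votes: int) -> bool:
    """Return True if the members can split into two groups that both
    reach quorum, i.e. the settings permit a partitioned cluster.

    votes is a list with one VOTES value per cluster member.
    """
    needed = quorum(expected_votes)
    total = sum(votes)
    # Collect every vote total that some subset of members can reach.
    reachable = {0}
    for v in votes:
        reachable |= {s + v for s in reachable}
    # A split is dangerous if one side and its complement both have quorum.
    return any(s >= needed and total - s >= needed for s in reachable)


if __name__ == "__main__":
    four_nodes = [1, 1, 1, 1]  # hypothetical: four members, one vote each
    print(quorum(4), partition_risk(four_nodes, 4))
    print(quorum(2), partition_risk(four_nodes, 2))
```

  With four one-vote members and EXPECTED_VOTES set to 4, no split can
  give both halves quorum. Setting EXPECTED_VOTES to 2 on the same four
  members lets a two-and-two split run as two independent clusters.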
 
  If you endeavour to have two nodes booting from one root, this should
  work -- assuming that the hardware configurations of the pairs are
  sufficiently similar of course, and assuming that one of the two nodes
  in the pair is always and reliably down.  That written, the OpenVMS
  Wizard would tend to use a cluster alias and would tend to keep all
  four nodes active, each with a unique host name and SCS host address.
  This tends to reduce the exposure to operator error, of course, and
  this allows all four hosts to be continuously tested through normal
  operation.
 
  You can turn on procedure verification and determine where the hang
  occurs, of course -- assuming the system has gotten that far.  (With
  no details on when the hang occurs, no specific answer is possible.)
 
  The OpenVMS Wizard would also look to the exposure involved in the
  operator input -- human error is a common source of problems -- and
  the exposure to hardware-level failures in the storage area.  DSSI
  is an old (and slow) storage technology, and the hardware involved
  in most DSSI configurations is aging and becoming correspondingly
  more prone to hardware failures.
 

answer written or last revised on ( 17-DEC-2003 )
