OpenVMS and Tru64 UNIX clusters use mechanisms such as a lock manager
to deal with file sharing and sustained availability. The cluster
configuration includes dual-ported disks to provide access from
multiple CPUs.
2.17.1.1 OpenVMS Clusters and Tru64 TruClusters
In Figure 2-6, N1 and N2 are nodes in an OpenVMS cluster. When the active process P1A fails, the standby process P1S takes over. When a standby takeover occurs, the standby server undergoes a recovery process in which it tries to recover any uncertain transactions that the active server was processing when the failure occurred.
If node N1 failed, then RTR on node N2 opens the failed node's RTR
journal and recovers any uncertain transactions from it, thereby
ensuring transaction consistency. If only the RTR server process failed, the failed node (N1) still has its journal open, so RTR on N2 does not try to open the journal directly. Instead, it asks the remote RTR system on N1 to recover any uncertain transactions. This behavior imposes
certain requirements on the accessibility of the journal.
2.17.1.2 Journal Location
Since the node that takes over needs to open the journal of the failed
node, this journal must be placed on the cluster file system. If the
journal is not on the cluster file system, the standby recovery process
will continue to scan the file systems for the journal and the
partition will never come out of recovery. As long as RTR is unable to
access the required journal and the system operator does not enter an
overriding system management command, the partition state remains in lcl_rec_fail.
2.17.1.3 Journal Locking
RTR uses the distributed lock manager (DLM) to coordinate access to the
journal file. Normally each node locks and opens its own journal file.
During recovery, some other node may receive the lock and open the
journal. However, when the owning node is restored, RTR will request
release of the journal. In this case, the remote node will release the
lock on this journal, and the owner node can open its journal. If the
node loses cluster quorum, then RTR releases locks on this journal and
lets another node take over.
2.17.1.4 Cluster Communications
When setting up networks and cluster communications in an OpenVMS or Tru64 cluster that are intended for RTR standby operations, avoid the situation where RTR loses quorum on the node while the OpenVMS or Tru64 cluster has quorum. This can happen if there is one interface for cluster traffic and a completely separate interface for network traffic (IP, DECnet). In this case, if the network interface breaks, then RTR will view the node as unreachable and therefore inquorate.
However, since cluster communication is still intact, the operating
system does not lose cluster quorum. Since RTR has lost quorum, another
node will try to take over, but since the operating system cluster has
not lost quorum, the lock on the journal will not be released and
recovery will not complete. The key point is to avoid situations where
a backend node can lose network communication to its RTR routers yet
remain a viable member of its cluster.
2.17.2 Windows Clusters
Windows clusters, unlike OpenVMS and Tru64 Clusters, are not shared-all
clusters. Windows clusters use the concept of host-based
clustering, that is, one node physically
mounts the shared disks and makes the shared disks available as a
network share to all other nodes in the cluster. If the host node
fails, then one of the other nodes will rehost the disks. This
rehosting is handled by the Windows clustering software. Only two-node
Windows cluster configurations are supported for RTR. In terms of
Windows clusters, RTR is an application and the RTR journals are the
database resource that fails over between the Windows cluster servers.
(A good reference for Windows clustering information is Joseph M.
Lamb's Windows 2000 Clustering and Load Balancing Handbook
available from Prentice-Hall.)
2.17.2.1 Journal Location
The RTR journal for both Windows NT servers must be located on the same disk on the SCSI bus that is shared between the two NT cluster servers. The RTR registry entry for the journal must be set to the same value on both server nodes. Furthermore, the registry entry should specify the journal disk using the path qualified by the cluster name. For example, if the cluster name is ALPHACLUSTER, and the journal disk has the cluster share name DISK1, then the RTR journal registry entry should be entered as
\\ALPHACLUSTER\DISK1
which can be modified using the Registry Editor. The registry key for the journal is found under
\HKEY_LOCAL_MACHINE\SOFTWARE\Compaq Computer Corporation\Reliable Transaction Router\Journal
There is no default and the value must be in the given format. If the
journal is not located on a shared disk in a Windows cluster
configuration, then RTR behaves as a standalone RTR node and no use is
made of cluster functionality.
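As a purely illustrative check (this program is not part of RTR), the following sketch reads the journal location from the registry path shown above and warns if it is not a cluster-qualified UNC share. It assumes the path is stored as the default (unnamed) value of the Journal key; confirm the exact value layout on your installation before relying on it.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HKEY hKey;
    char path[MAX_PATH] = "";
    DWORD size = sizeof(path);
    const char *subkey =
        "SOFTWARE\\Compaq Computer Corporation\\Reliable Transaction Router\\Journal";

    /* Open the RTR journal key under HKEY_LOCAL_MACHINE. */
    if (RegOpenKeyExA(HKEY_LOCAL_MACHINE, subkey, 0, KEY_READ, &hKey) != ERROR_SUCCESS) {
        fprintf(stderr, "RTR journal registry key not found\n");
        return 1;
    }
    /* Read the journal path (assumed here to be the key's default value). */
    if (RegQueryValueExA(hKey, NULL, NULL, NULL, (LPBYTE)path, &size) != ERROR_SUCCESS) {
        fprintf(stderr, "RTR journal registry value not readable\n");
        RegCloseKey(hKey);
        return 1;
    }
    RegCloseKey(hKey);

    /* The journal must be on the cluster share, for example \\ALPHACLUSTER\DISK1. */
    if (path[0] != '\\' || path[1] != '\\')
        fprintf(stderr, "Warning: journal path %s is not cluster-qualified\n", path);
    else
        printf("RTR journal path: %s\n", path);
    return 0;
}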
2.17.2.2 Facility Role Definition
The computers (nodes) participating in RTR Facilities that are using
the standby features must be configured with both a backend role and a
router role.
2.17.2.3 RTR Home Directory
In a Windows cluster configuration, the RTR home directory must not be
located on a shared SCSI disk. RTR creates lock files in the RTR home
directory and the journal directory during normal operation. These are
of the form N*.LCK or N*.BLK, and C*.LCK or C*.BLK. These files may be
left in these directories after RTR has been stopped, but they will be
reused once RTR is started again. There is no real need for a daemon to
purge these files at system boot time.
2.17.2.4 Cluster Failover
The cluster failover group containing the disk share on which the RTR
journal files are located must not have failback policy enabled. That
is, if the failover group fails over to the secondary cluster node due
to a primary server outage, the group must not fail back to the primary
node once the primary node is available again. As long as RTR facilities have been defined in a cluster configuration, the
failover group with the journal device must not be manually failed over
to the other cluster server by the cluster administrator. Failover
should only occur at the discretion of the cluster failover manager
software.
2.17.3 Unrecognized Clusters
Unrecognized or unsupported clusters behave differently from recognized or supported clusters.
The default behavior in unrecognized cluster systems is to treat them as non-clustered. However, RTR standby failover still works: RTR fails over to the standby server process if the active server process fails. This standby takeover also performs recovery. If only the active server process failed, then RTR can still recover any uncertain transactions through the remote RTR process. If, however, the node itself becomes unavailable (from, for example, an RTR crash, a node crash, or a network crash), then the recovery process performs a journal scan to locate the journal of the failed node.
But unlike the case of a recognized cluster, RTR does not wait for the
journal to become available. Instead, it changes to the active state
and continues to process transactions. Any incomplete transactions in
the failed node's journal will remain there; these transactions are not
lost. They are eventually recovered when the failed node becomes active
again, although their sequencing will be lost.
2.17.4 Enhancing Recovery on Sun Systems
RTR supports the use of external scripts to complement RTR standby failover in unclustered configurations. This behavior is enabled with the environment variable RTR_STANDBY_WITHOUT_CLUSTER. When this environment variable is set, it modifies the behavior of RTR standby failover as described in Failover for Sun. Note that this feature is currently only available on Solaris platforms.
When the active node or RTR goes down, the standby node begins to fail over. As part of its failover, it scans the available file systems for the RTR journal of the previously active node that failed. This scanning continues until the journal is found. External scripts can then be run to make the journal available using volume management, rehosting disks, or other methods; however, NFS mounts or network shares will not be accepted. Once the journal is available, the currently active node can open and lock the remote journal.
Failback
Since the current active node then has the remote journal locked, when
the standby node is restarted, it will not have the journal available.
When the facility is created on the standby node, the facility creation
event generated on the active node will close the remote journal.
Additionally, an external user-written script, freeremotedisk, is invoked. User-defined commands can be put in this script to migrate the disk back to its original owner. Once the journal is available in the standby node's file system, it is automatically opened. The user-defined freeremotedisk script should be located in /opt/rtr/RTR400/bin. Output from the execution of this script is sent to /rtr/freeremotedisk.LOG. Execution of this script is also logged to the RTR log file.
2.17.4.1 Restrictions
Whenever the RTR_STANDBY_WITHOUT_CLUSTER variable is set, it is also recommended that RTR_JAM_FAILOVER_WAIT_SECS be set to some suitable value, such as 20 seconds. This is the interval after which RTR will poll to find the remote journal during failover. By default, this is set to zero, which can lead to high CPU usage. This feature should be restricted to use in two-node configurations with each node assigned both a backend and a router role. Before changing these environment variables, make sure that all RTR processes have been shut down.
This section describes the concepts and operations of RTR partitions.
3.1.1 What is a Partition?
Partitions are subdivisions of a routing key range of values. They are used with a partitioned data model and RTR data-content routing. Partitions exist for each distinct range of values in the routing key for which a server is available to process transactions. RTR provides for failure tolerance by allowing system operators to start separate instances of partitions in a distributed network and by automatically managing the state and flow of transactions to the partition instances.
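To make the partitioned data model concrete, the following sketch (a C fragment in the style of the other examples in this chapter) describes two partitions that split a seven-character routing key into two ranges; each descriptor would be passed to rtr_open_channel() through the pkeyseg argument by the server that processes that range. Field names other than ks_type and ks_lo_bound, and the rtr_keyseg_string constant, are assumptions to verify against the RTR C API reference.

/* Sketch only: two key-range descriptors that split a seven-character
 * account-number routing key into two partitions. */
rtr_keyseg_t range_a;
range_a.ks_type     = rtr_keyseg_string;   /* string key field               */
range_a.ks_length   = 7;                   /* seven characters long          */
range_a.ks_offset   = 0;                   /* at offset 0 in the message     */
range_a.ks_lo_bound = "0000000";           /* lowest account in this range   */
range_a.ks_hi_bound = "4999999";           /* highest account in this range  */

rtr_keyseg_t range_b = range_a;            /* same key field, second range   */
range_b.ks_lo_bound = "5000000";
range_b.ks_hi_bound = "9999999";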
Partition instances support the following relationships:
The system operator can issue commands to control certain partition
characteristics, and to set preferences concerning partition behavior.
3.2 Partition Naming
A prerequisite for partition management is the ability to identify a
partition in the system that is to be the subject of management
commands. For this purpose, partitions have names, which are either generated by default, supplied by the programmer, or supplied by the system manager.
3.2.1 Name Format and Scope
A valid partition name can contain no more than 63 characters. It can
combine alphanumeric characters (abc123), the underscore (_), and the
dollar sign ($). Partition names must be unique within a facility name
and should be referenced on the command line with the facility name
when using partition commands. Partition names exist only on the
backend where the partition resides. You will not see the partition
names at the RTR routers.
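As a purely illustrative check (this helper is not part of the RTR API), the naming rules above can be expressed as follows:

#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the proposed partition name satisfies
 * the documented rules -- at most 63 characters, drawn only from
 * alphanumerics, underscore (_), and dollar sign ($). */
static int is_valid_partition_name(const char *name)
{
    size_t len = strlen(name);
    size_t i;
    if (len == 0 || len > 63)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '_' && c != '$')
            return 0;
    }
    return 1;
}

For example, is_valid_partition_name("Partition1") returns 1, while a name longer than 63 characters or one containing a hyphen returns 0.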
3.2.2 Default Partition Names
Partitions receive automatically generated default names, in the form RTR$DEFAULT_PARTITION, unless a name is supplied.
3.2.3 Programmer-Supplied Names
The application programmer can supply a name when opening a server channel with the rtr_open_channel() call. The pkeyseg argument specifies an additional item of type rtr_keyseg_t, assigning the following values:
Using this model, the partition segments and key ranges served by the
server are still specified by the server when the channel is opened.
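For example, the following sketch (modeled on the Section 3.4 example later in this chapter, with the elided rtr_open_channel() arguments kept as in that example) names the partition par_one while still supplying the server's own key-range descriptor. The ks_* fields used for the key segment beyond ks_type and ks_lo_bound are assumptions to verify against the C API reference.

rtr_keyseg_t pkeyseg[2];

/* Key segment: describes the routing key field and the range this server
 * processes (field names partly assumed; see the C API reference). */
pkeyseg[0].ks_type     = rtr_keyseg_string;
pkeyseg[0].ks_length   = 7;
pkeyseg[0].ks_offset   = 0;
pkeyseg[0].ks_lo_bound = "0000000";
pkeyseg[0].ks_hi_bound = "4999999";

/* Additional item: the partition name, as described above. */
pkeyseg[1].ks_type     = rtr_keyseg_partition;
pkeyseg[1].ks_lo_bound = "par_one";

status = rtr_open_channel(..., RTR_F_OPE_SERVER, ..., 2, pkeyseg);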
3.2.4 System-Manager Supplied Partition Names
The system manager can supply partition names using the create partition system management command, or by using rtr_open_channel() flag arguments. The system manager can set partition characteristics with this command, and applications can open channels to the partition by name. See Section 3.4 for an example of passing a partition name with rtr_open_channel().
3.3 Life Cycle of a Partition
This section describes the life cycle of partitions, including the ways
they can be created and their persistence.
3.3.1 Implicit Partition Creation
Partitions are created implicitly when an application program calls rtr_open_channel() to create a server channel, specifying the key segments and value ranges for the segments with the pkeyseg argument. Other partition attributes are established with the flags argument. Prior to RTR V3.2, this was the only way partitions could be created. Partitions created in this way are automatically deleted when the last server channel to the partition is closed.
3.3.2 Explicit Partition Creation
Partitions can also be created by the system operator before server application programs start up, using system management commands. This gives the operator more control over partition characteristics. Partitions created in this way remain in the system until they are either explicitly deleted by the operator or RTR is stopped.
3.3.3 Persistence of Partition Definitions
RTR stores partition definitions in the journal and records for each transaction the partition in which it was processed. This is convenient when viewing or editing the contents of the journal (using the SET TRANSACTION command), where the partition name can be used to select a subset of the transactions in the journal. RTR will not permit a change in the partition name or definition as long as transactions remain in the journal that were processed under the current name or definition for the partition. If transactions remain in the journal and you need to change the partition name or definition, you can take one of the following actions:
For a server application to be able to open a channel to an explicitly created partition, the application passes the name of the partition through the pkeyseg argument of the rtr_open_channel() call. It is not necessary to pass key segment descriptors, but if the application does, they must be compatible with the existing partition definition. You may pass partition characteristics through the flags argument, but these will be superseded by those of the existing partition.
RTR> create partition/KEY1=(type...) par_one

...

rtr_keyseg_t partition_name;

partition_name.ks_type = rtr_keyseg_partition;
partition_name.ks_lo_bound = "par_one";

status = rtr_open_channel(..., RTR_F_OPE_SERVER, ..., 1, &partition_name);
In summary, to fully decouple server applications from the definition
of the partitions to be processed, write applications that open server
channels where only the required partition name is passed. Leave the
management of the partition characteristics to the system managers and
operators.
3.5 Entering Partition Commands
Partitions can be managed by issuing partition commands directed at the required partitions after they are created. Partition commands can be entered in one of two ways:
Enter partition commands on the backend where the partition is located.
Note that commands that affect a partition state only take effect once
the first server joins a partition. Errors encountered at that time
will appear as log file entries. Using partition commands to change the
state of the system causes a log file entry.
3.5.1 Command Line Usage
Partition management in the RTR command language is implemented with the following command set:
The name of the facility in which the partition resides can be
specified with the /FACILITY command line qualifier, or as a
colon-separated prefix to the partition name (for example
Facility1:Partition1). Detailed descriptions of the command syntax are
given in the Command Line Reference section of this manual, and are
summarized in the following discussions. Examples in the following
sections use a partition name of Partition1 in the facility name of
Facility1.
3.5.2 Programmed Partition Management
Partition commands are programmed using rtr_set_info(). Usage of the arguments is as follows:
rtr_qualifier_value_t select_qualifiers[3];

select_qualifiers[0].qv_qualifier = rtr_facility_name;
select_qualifiers[0].qv_value = "your_facility_name_here";
select_qualifiers[1].qv_qualifier = rtr_partition_name;
select_qualifiers[1].qv_value = "your_partition_name_here";
select_qualifiers[2].qv_qualifier = rtr_qualifiers_end;
select_qualifiers[2].qv_value = NULL;
The rtr_set_info() call completes asynchronously. If the function call is successful, completion is signaled by the delivery of an RTR message of type rtr_mt_closed on the channel whose identifier is returned through the pchannel argument. The programmer should retrieve this message by using rtr_receive_message(). The data accompanying the message is of type rtr_status_data_t. The completion status of the partition command can be accessed as the status field of the message data.
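Putting the pieces together, the following sketch issues a partition command and waits for its completion, using the select_qualifiers array built above and a set_qualifiers array such as the one shown in Section 3.6.1. The elided rtr_set_info() arguments, the exact rtr_receive_message() argument list, and the rtr_msgsb_t field names are assumptions to check against the C API reference.

rtr_channel_t      channel;
rtr_status_data_t  status_data;
rtr_msgsb_t        msgsb;
rtr_status_t       sts;

/* Issue the partition command; the channel identifier is returned
 * through the pchannel argument. */
sts = rtr_set_info(&channel, ..., select_qualifiers, set_qualifiers);

if (sts == RTR_STS_OK) {
    /* Completion is signaled by an rtr_mt_closed message on that channel. */
    sts = rtr_receive_message(&channel, RTR_NO_FLAGS, channel,
                              &status_data, sizeof(status_data),
                              RTR_NO_TIMOUTMS, &msgsb);
    if (sts == RTR_STS_OK && msgsb.msgtype == rtr_mt_closed) {
        /* The command's completion status is the status field of the data. */
        if (status_data.status == RTR_STS_OK) {
            /* partition command succeeded */
        }
    }
}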
3.6 Managing Partitions
A set of commands or program calls is used to manage partitions.
Information on managing partitions is provided in this section.
3.6.1 Controlling Shadowing
Shadowing for a partition can be enabled or disabled. This can be useful in the following circumstances:
The following restrictions apply:
Once shadowing is disabled, the secondary site servers will be unable to start up in shadow mode until shadowing is enabled again. Shadowing for the partition can be turned on by entering the following command at the current active backend member or on any of its standbys:
RTR> SET PARTITION/SHADOW Facility1:Partition1
For further information, see the SET PARTITION command in Chapter 8.
To enable shadowing, program the set_qualifier argument of rtr_set_info() as follows:
rtr_qualifier_value_t set_qualifiers[2];
rtr_partition_state_t newState = rtr_partition_state_shadow;

set_qualifiers[0].qv_qualifier = rtr_partition_state;
set_qualifiers[0].qv_value = &newState;
set_qualifiers[1].qv_qualifier = rtr_qualifiers_end;
set_qualifiers[1].qv_value = NULL;
To disable shadowing, specify newState as rtr_partition_state_noshadow.