HP OpenVMS Cluster Systems
2.4.2 Losing a Member
Table 2-3 describes the phases of a transition caused by the failure of a current OpenVMS Cluster member.
2.5.1 Cluster Group Number
The cluster group number uniquely identifies each OpenVMS Cluster system on a LAN or an IP network, or one that communicates through a common memory region (that is, using SMCI). The group number must be in the range 1 to 4095 or 61440 to 65535.
Rule: If you plan to have more than one OpenVMS
Cluster system on a LAN or an IP network, you must coordinate the
assignment of cluster group numbers among system managers.
The cluster password prevents an unauthorized computer that is using the
cluster group number from joining the cluster. The password
must be from 1 to 31 characters; valid characters are letters, numbers,
the dollar sign ($), and the underscore (_).
The cluster group number and cluster password are maintained in the cluster authorization file, SYS$COMMON:[SYSEXE]CLUSTER_AUTHORIZE.DAT. This file is created during installation of the operating system if you indicate that you want to set up a cluster that uses shared memory or the LAN. The installation procedure then prompts you for the cluster group number and password.
Reference: For information about OpenVMS Cluster group data in the CLUSTER_AUTHORIZE.DAT file, see Sections 8.4 and 10.8.
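For example, if you need to change the cluster group number or password after installation, one common approach is the SYSMAN utility's CONFIGURATION SET CLUSTER_AUTHORIZATION command (the group number and password shown here are illustrative):

$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> CONFIGURATION SET CLUSTER_AUTHORIZATION/GROUP_NUMBER=100/PASSWORD=XYZZY12
SYSMAN> EXIT

A change to the cluster authorization data generally does not take effect until the cluster members are rebooted.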
2.6 Synchronizing Cluster Functions by the Distributed Lock Manager
The distributed lock manager is an OpenVMS feature for synchronizing functions required by the distributed file system, the distributed job controller, device allocation, user-written OpenVMS Cluster applications, and other OpenVMS products and software components.
The distributed lock manager uses the connection manager and SCS to
communicate information between OpenVMS Cluster computers.
The functions of the distributed lock manager include the following:
2.6.2 System Management of the Lock Manager
The lock manager is fully automated and usually requires no explicit system management. However, the LOCKDIRWT and LOCKRMWT system parameters can be used to adjust the distribution of activity and control of lock resource trees across the cluster.

A lock resource tree is an abstract entity on which locks can be placed. Multiple lock resource trees can exist within a cluster. For every resource tree, there is one node known as the directory node and another node known as the lock resource master node.

A lock resource master node controls a lock resource tree and is aware of all the locks on the lock resource tree. All locking operations on the lock tree must be sent to the resource master. These locks can come from any node in the cluster. All other nodes in the cluster know only about their own locks on the tree. Furthermore, all nodes in the cluster have many locks on many different lock resource trees, which can be mastered on different nodes. When creating a new lock resource tree, the directory node must first be queried to determine whether a resource master already exists.

The LOCKDIRWT parameter allocates a node as the directory node for a lock resource tree. The higher a node's LOCKDIRWT setting, the higher the probability that it will be the directory node for a given lock resource tree. For most configurations, large computers and boot nodes perform optimally when LOCKDIRWT is set to 1 and satellite nodes have LOCKDIRWT set to 0. These values are set automatically by the CLUSTER_CONFIG.COM procedure. Nodes with a LOCKDIRWT of 0 will not be the directory node for any resources unless all nodes in the cluster have a LOCKDIRWT of 0. In some circumstances, you may want to change the values of the LOCKDIRWT parameter across the cluster to control the extent to which nodes participate as directory nodes.

LOCKRMWT influences which node is chosen to remaster a lock resource tree. Because there is a performance advantage for nodes mastering a lock resource tree (no communication is required when performing a locking operation), the lock resource manager supports remastering lock trees to other nodes in the cluster. Remastering a lock resource tree means designating another node in the cluster as the lock resource master for that lock resource tree and moving the lock resource tree to it. A node is eligible to be a lock resource master node if it has locks on that lock resource tree. The selection of the new lock resource master node from the eligible nodes is based on each node's LOCKRMWT system parameter setting and each node's locking activity. LOCKRMWT can contain a value between 0 and 10; the default is 5. The following list describes how the value of the LOCKRMWT system parameter affects resource tree mastery and how lock activity can affect the decision:
In most cases, maintaining the default value of 5 for LOCKRMWT is
appropriate, but there may be cases where assigning some nodes a higher
or lower LOCKRMWT is useful for determining which nodes master a lock
tree. The LOCKRMWT parameter is dynamic, so it can be adjusted on a
running system if necessary.
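For example, a permanent change to these parameters is normally made by adding entries to MODPARAMS.DAT and running AUTOGEN; because LOCKRMWT is dynamic, it can also be changed on the running system with SYSGEN. The following sketch uses illustrative values for a large boot node:

! Entries added to SYS$SYSTEM:MODPARAMS.DAT
LOCKDIRWT = 1
LOCKRMWT = 7

$ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK   ! takes effect at the next reboot
$ RUN SYS$SYSTEM:SYSGEN                              ! or adjust LOCKRMWT immediately
SYSGEN> USE ACTIVE
SYSGEN> SET LOCKRMWT 7
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT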
The Enqueue process limit (ENQLM), which is set in the SYSUAF.DAT file and which controls the number of locks that a process can own, can be adjusted to meet the demands of large-scale databases and other server applications. Prior to OpenVMS Version 7.1, the limit was 32767. This limit was removed to enable the efficient operation of large-scale databases and other server applications. A process can now own up to 16,776,959 locks, the architectural maximum. Setting ENQLM in SYSUAF.DAT to 32767 (using the Authorize utility) automatically extends the lock limit to the maximum of 16,776,959 locks. $CREPRC can pass large quotas to the target process if it is called from a process whose SYSUAF ENQLM quota is 32767.
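For example, you might raise the quota for a database server account with the Authorize utility (the account name is illustrative); the new limit applies to processes created after the change:

$ SET DEFAULT SYS$SYSTEM
$ RUN AUTHORIZE
UAF> MODIFY DB_SERVER/ENQLM=32767
UAF> EXIT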
Reference: See the HP OpenVMS Programming Concepts Manual for additional
information about the distributed lock manager and resource trees. See
the HP OpenVMS System Manager's Manual for more information about Enqueue Quota.
Resource sharing in an OpenVMS Cluster system is enabled by the
distributed file system, RMS, and the distributed lock manager.
The OpenVMS Cluster distributed file system allows all
computers to share mass storage and files. The distributed file system
provides the same access to disks, tapes, and files across the OpenVMS
Cluster that is provided on a standalone computer.
The distributed file system and OpenVMS Record Management Services (RMS) use the distributed lock manager to coordinate clusterwide file access. RMS files can be shared to the record level. Almost any disk or tape device can be made available to the entire OpenVMS Cluster system. The devices can be:
All cluster-accessible devices appear as if they are connected to every
computer.
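For example, mounting a disk volume with the /CLUSTER qualifier makes it available to every node in the OpenVMS Cluster (the device name, volume label, and logical name are illustrative):

$ MOUNT/CLUSTER/SYSTEM $1$DGA100: DATA01 DISK$DATA01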
Locally connected disks can be served across an OpenVMS Cluster by the
MSCP server.
The MSCP server makes locally connected disks, including the following, available across the cluster:
In conjunction with the disk class driver (DUDRIVER), the MSCP server implements the storage server portion of the MSCP protocol on a computer, allowing the computer to function as a storage controller. The MSCP protocol defines conventions for the format and timing of messages sent and received for certain families of mass storage controllers and devices designed by HP. The MSCP server decodes and services MSCP I/O requests sent by remote cluster nodes.
Note: The MSCP server is not used by a computer to
access files on locally connected disks.
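For example, on a node that is running the MSCP server, the SHOW DEVICE command can display the devices being served to other cluster members (the output depends on your configuration):

$ SHOW DEVICE/SERVED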
Once a device is set up to be served:
2.8.3 Enabling the MSCP Server
The MSCP server is controlled by the MSCP_LOAD and MSCP_SERVE_ALL system parameters. The values of these parameters are set initially by answers to questions asked during the OpenVMS installation procedure (described in Section 8.4), or during the CLUSTER_CONFIG.COM procedure (described in Chapter 8). The default values for these parameters are as follows:
Reference: See Section 6.3 for more information about
setting system parameters for MSCP serving.
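For example, one common way to load the MSCP server and serve all available disks is to add entries such as the following to MODPARAMS.DAT and then run AUTOGEN (see Section 6.3 for the full range of values):

MSCP_LOAD = 1        ! load the MSCP server
MSCP_SERVE_ALL = 1   ! serve all available disks

$ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK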
Locally connected tapes can be served across an OpenVMS Cluster by the
TMSCP server.
The TMSCP server makes locally connected tapes available across the cluster, including the following:
The TMSCP server implements the TMSCP protocol, which is used to communicate with a controller for TMSCP tapes. In conjunction with the tape class driver (TUDRIVER), the TMSCP protocol is implemented on a processor, allowing the processor to function as a storage controller.
The processor accepts I/O requests from any node in the cluster and
submits them to the locally accessed tapes. In this way, the
TMSCP server makes locally connected tapes available to all nodes in
the cluster. The TMSCP server can also make HSG and HSV tapes accessible
to OpenVMS Cluster satellites.
The TMSCP server is controlled by the TMSCP_LOAD system parameter. The
value of this parameter is set initially by answers to questions asked
during the OpenVMS installation procedure (described in Section 4.2.3)
or during the CLUSTER_CONFIG.COM procedure (described in Section 8.4).
By default, the setting of the TMSCP_LOAD parameter does not load the
TMSCP server and does not serve any tapes.
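For example, to load the TMSCP server and serve all locally connected tapes, an entry such as the following can be added to MODPARAMS.DAT, followed by a run of AUTOGEN:

TMSCP_LOAD = 1       ! load the TMSCP server and serve local tapes

$ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK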
The distributed queue manager makes queues available across the cluster to achieve the following:
The distributed queue manager uses the distributed lock manager to signal other computers in the OpenVMS Cluster to examine the batch and print queue jobs to be processed.