HP OpenVMS Systems Documentation
OpenVMS Cluster Systems
10.12.1 Restoring Votes
For process execution to resume, the cluster votes total must be restored to a value greater than or equal to the cluster quorum value. Often, the required votes are added as computers join or rejoin the cluster. However, waiting for a computer to join the cluster and increasing the votes value is not always a simple or convenient remedy. An alternative solution, for example, might be to shut down and reboot all the computers with a reduced quorum value.
After the failure of a computer, you may want to run the Show Cluster utility and examine values for the VOTES, EXPECTED_VOTES, CL_VOTES, and CL_QUORUM fields. (See the OpenVMS System Management Utilities Reference Manual for a complete description of these fields.) The VOTES and EXPECTED_VOTES fields show the settings for each cluster member; the CL_VOTES and CL_QUORUM fields show the cluster votes total and the current cluster quorum value.
To examine these values, enter the following commands:
Note: If you want to enter SHOW CLUSTER commands interactively, you must specify the /CONTINUOUS qualifier as part of the SHOW CLUSTER command string. If you do not specify this qualifier, SHOW CLUSTER displays cluster status information returned by the DCL command SHOW CLUSTER and returns you to the DCL command level.
If the display from the Show Cluster utility shows the CL_VOTES value
equal to the CL_QUORUM value, the cluster cannot survive the failure of
any remaining voting member. If one of these computers shuts down, all
process activity in the cluster stops.
To prevent the disruption of cluster process activity, you can reduce the cluster quorum value as described in Table 10-6.
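
The check described above can be sketched in Python. This is a minimal illustration, not part of OpenVMS: the function name `survives_loss` and its arguments are hypothetical, and the values are assumed to be read by hand from the CL_VOTES, CL_QUORUM, and VOTES fields of the Show Cluster display.

```python
def survives_loss(cl_votes: int, cl_quorum: int, member_votes: int) -> bool:
    """Return True if the cluster keeps quorum after losing one member.

    cl_votes     -- cluster votes total (CL_VOTES field)
    cl_quorum    -- current cluster quorum value (CL_QUORUM field)
    member_votes -- VOTES value of the member assumed to fail
    """
    # Process activity continues only while the votes total is greater
    # than or equal to the quorum value.
    return cl_votes - member_votes >= cl_quorum


# CL_VOTES equal to CL_QUORUM: the loss of a one-vote member is checked.
print(survives_loss(cl_votes=2, cl_quorum=2, member_votes=1))
# One vote of headroom: the same loss is checked again.
print(survives_loss(cl_votes=3, cl_quorum=2, member_votes=1))
```

The sketch treats the quorum value as fixed, which matches the text: quorum is lowered only by the explicit actions described in Table 10-6.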
10.13 Cluster Performance
Sometimes performance issues involve monitoring and tuning applications
and the system as a whole. Tuning involves collecting and reporting on
system and network processes to improve performance. A number of tools
can help you collect information about an active system and its
applications.
The following table briefly describes the SHOW commands available with the OpenVMS operating system. Use the SHOW DEVICE commands and qualifiers shown in the table.
The SHOW CLUSTER command displays a variety of information about the OpenVMS Cluster system. The display output provides a view of the cluster as seen from a single node, rather than a complete view of the cluster.
Reference: The OpenVMS System Management Utilities Reference Manual contains complete
information about all the SHOW commands and the Show Cluster utility.
The following table describes using the OpenVMS Monitor utility to locate disk I/O bottlenecks. I/O bottlenecks can cause the OpenVMS Cluster system to appear to hang.
10.13.3 Using Compaq Availability Manager and DECamds
Compaq Availability Manager and DECamds are real-time monitoring, diagnostic, and correction tools used by system managers to improve the availability and throughput of a system. Availability Manager runs on OpenVMS Alpha or on a Windows node. DECamds runs on both OpenVMS VAX and OpenVMS Alpha and uses the DECwindows interface.
These products, which are included with the operating system, help system managers correct system resource utilization problems for CPU usage, low memory, lock contention, hung or runaway processes, I/O, disks, page files, and swap files.
Availability Manager enables you to monitor one or more OpenVMS nodes on an extended LAN from either an OpenVMS Alpha or a Windows node. Availability Manager collects system and process data from multiple OpenVMS nodes simultaneously. It analyzes the data and displays the output using a native Java GUI.
DECamds collects and analyzes data from multiple nodes (VAX and Alpha) simultaneously, directing all output to a centralized DECwindows display. DECamds helps you observe and troubleshoot availability problems, as follows:
Reference: For more information about Availability Manager, see the Availability Manager User's Guide and the Availability Manager web site, which you can access from the Compaq OpenVMS site:
For more information about DECamds, see the DECamds User's Guide.
It is important to monitor LAN activity on a regular basis. Using the SCA (Systems Communications Architecture) Control Program (SCACP), you can monitor LAN activity as well as set and show default ports, start and stop LAN devices, and assign priority values to channels.
Reference: For more information about SCACP, see the OpenVMS System Management Utilities Reference Manual: A-L.
Using NCP commands like the following, you can set up a convenient monitoring procedure to report activity for each 12-hour period. Note that DECnet event logging for event 0.2 (automatic line counters) must be enabled.
Reference: For detailed information on DECnet for OpenVMS event logging, refer to the DECnet for OpenVMS Network Management Utilities manual.
In these sample commands, BNA-0 is the line ID of the Ethernet line.
At every timer interval (in this case, 12 hours), DECnet will create an event that sends counter data to the DECnet event log. If you experience a performance degradation in your cluster, check the event log for increases in counter values that exceed normal variations for your cluster. If all computers show the same increase, there may be a general problem with your Ethernet configuration. If, on the other hand, only one computer shows a deviation from usual values, there is probably a problem with that computer or with its Ethernet interface device.
The following layered products can be used in conjunction with one of Compaq's LAN bridges to monitor the LAN traffic levels: RBMS, DECelms, DECmcc, and LAN Traffic Monitor (LTM).
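
The event-log comparison described in this section can be sketched in Python. The sketch assumes that the counter increase for each computer has already been extracted from the DECnet event log by hand or by a site-specific procedure. The function name `diagnose`, the `normal_variation` threshold, and the sample node names are all hypothetical.

```python
def diagnose(increases: dict[str, int], normal_variation: int) -> str:
    """Classify counter increases taken from the DECnet event log.

    increases        -- counter increase per computer for one timer interval
    normal_variation -- largest increase regarded as normal for the cluster
    """
    abnormal = sorted(n for n, v in increases.items() if v > normal_variation)
    if not abnormal:
        return "within normal variation"
    if len(abnormal) == len(increases):
        # Every computer shows an increase beyond normal variation.
        return "possible general Ethernet configuration problem"
    if len(abnormal) == 1:
        # Only one computer deviates from its usual values.
        return f"probable problem with {abnormal[0]} or its Ethernet interface"
    return "several computers deviate: " + ", ".join(abnormal)


print(diagnose({"NODEA": 5, "NODEB": 480, "NODEC": 7}, normal_variation=50))
```

What counts as normal variation is site specific, so the threshold is left as a parameter rather than given a default.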
Appendix A
Parameter | Description | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ALLOCLASS | Specifies a numeric value from 0 to 255 to be assigned as the disk allocation class for the computer. The default value is 0. | |||||||||||||||
CHECK_CLUSTER | Serves as a VAXCLUSTER parameter sanity check. When CHECK_CLUSTER is set to 1, SYSBOOT outputs a warning message and forces a conversational boot if it detects the VAXCLUSTER parameter is set to 0. | |||||||||||||||
CLUSTER_CREDITS |
Specifies the number of per-connection buffers a node allocates to
receiving VMS$VAXcluster communications.
If the SHOW CLUSTER command displays a high number of credit waits for the VMS$VAXcluster connection, you might consider increasing the value of CLUSTER_CREDITS on the other node. However, in large cluster configurations, setting this value unnecessarily high will consume a large quantity of nonpaged pool. Each receive buffer is at least SCSMAXMSG bytes in size but might be substantially larger depending on the underlying transport. It is not required that all nodes in the cluster have the same value for CLUSTER_CREDITS. For small or memory-constrained systems, the default value of CLUSTER_CREDITS should be adequate. |
|||||||||||||||
CWCREPRC_ENABLE | Controls whether an unprivileged user can create a process on another OpenVMS Cluster node. The default value of 1 allows an unprivileged user to create a detached process with the same UIC on another node. A value of 0 requires that a user have DETACH or CMKRNL privilege to create a process on another node. | |||||||||||||||
DISK_QUORUM | The physical device name, in ASCII, of an optional quorum disk. ASCII spaces indicate that no quorum disk is being used. DISK_QUORUM must be defined on one or more cluster computers capable of having a direct (not MSCP served) connection to the disk. These computers are called quorum disk watchers. The remaining computers (computers with a blank value for DISK_QUORUM) recognize the name defined by the first watcher computer with which they communicate. | |||||||||||||||
++DR_UNIT_BASE | Specifies the base value from which unit numbers for DR devices (StorageWorks RAID Array 200 Family logical RAID drives) are counted. DR_UNIT_BASE provides a way for unique RAID device numbers to be generated. DR devices are numbered starting with the value of DR_UNIT_BASE and then counting from there. For example, setting DR_UNIT_BASE to 10 will produce device names such as $1$DRA10, $1$DRA11, and so on. Setting DR_UNIT_BASE to appropriate, nonoverlapping values on all cluster members that share the same (nonzero) allocation class will ensure that no two RAID devices are given the same name. | |||||||||||||||
EXPECTED_VOTES |
Specifies a setting that is used to derive the initial quorum value.
This setting is the sum of all VOTES held by potential cluster members.
By default, the value is 1. The connection manager sets a quorum
value to a number that will prevent cluster partitioning (see
Section 2.3). To calculate quorum, the system uses the following
formula:
estimated quorum = (EXPECTED_VOTES + 2)/2 |
|||||||||||||||
LOCKDIRWT | Lock manager directory system weight. Determines the portion of lock manager directory to be handled by this system. The default value is adequate for most systems. | |||||||||||||||
+LRPSIZE |
For VAX computers running VMS Version 5.5--2 and earlier, the LRPSIZE
parameter specifies the size, in bytes, of the large request packets.
The actual physical memory consumed by a large request packet is
LRPSIZE plus overhead for buffer management. Normally, the default
value is adequate. The value of LRPSIZE affects the transfer size used
by VAX nodes on an FDDI ring.
FDDI supports transfers using large packets (up to 4468 bytes). PEDRIVER does not use large packets by default, but can take advantage of the larger packet sizes if you increase the LRPSIZE system parameter to 4474 or higher. PEDRIVER uses the full FDDI packet size if the LRPSIZE is set to 4474 or higher. However, only FDDI nodes connected to the same ring use large packets. Nodes connected to an Ethernet segment restrict packet size to that of an Ethernet packet (1498 bytes). |
|||||||||||||||
++MC_SERVICES_P0 (dynamic) |
Controls whether other MEMORY CHANNEL nodes in the cluster continue to
run if this node bugchecks or shuts down.
A value of 1 causes other nodes in the MEMORY CHANNEL cluster to fail with bugcheck code MC_FORCED_CRASH if this node bugchecks or shuts down. The default value is 0. A setting of 1 is intended only for debugging purposes; the parameter should otherwise be left at its default state. |
|||||||||||||||
++MC_SERVICES_P2 (static) |
Specifies whether to load the PMDRIVER (PMA0) MEMORY CHANNEL cluster
port driver. PMDRIVER is a new driver that serves as the MEMORY CHANNEL
cluster port driver. It works together with MCDRIVER (the MEMORY
CHANNEL device driver and device interface) to provide MEMORY CHANNEL
clustering. If PMDRIVER is not loaded, cluster connections will not be
made over the MEMORY CHANNEL interconnect.
The default for MC_SERVICES_P2 is 1. This default value causes PMDRIVER to be loaded when you boot the system. Compaq recommends that this value not be changed. This parameter value must be the same on all nodes connected by MEMORY CHANNEL. |
|||||||||||||||
++MC_SERVICES_P3 (dynamic) |
Specifies the maximum number of tags supported. The maximum value is
2048 and the minimum value is 100.
The default value is 800. Compaq recommends that this value not be changed. This parameter value must be the same on all nodes connected by MEMORY CHANNEL. |
|||||||||||||||
++MC_SERVICES_P4 (static) |
Specifies the maximum number of regions supported. The maximum value is
4096 and the minimum value is 100.
The default value is 200. Compaq recommends that this value not be changed. This parameter value must be the same on all nodes connected by MEMORY CHANNEL. |
|||||||||||||||
++MC_SERVICES_P6 (static) |
Specifies MEMORY CHANNEL message size, the body of an entry in a free
queue, or a work queue. The maximum value is 65536 and the minimum
value is 544. The default value is 992, which is suitable in all cases
except systems with highly constrained memory.
For such systems, you can reduce the memory consumption of MEMORY CHANNEL by slightly reducing the default value of 992. This value must always be equal to or greater than the result of the following calculation:
This parameter value must be the same on all nodes connected by MEMORY CHANNEL. |
|||||||||||||||
++MC_SERVICES_P7 (dynamic) |
Specifies whether to suppress or display messages about cluster
activities on this node. Can be set to a value of 0, 1, or 2. The
meanings of these values are:
The default value is 0. Compaq recommends that this value not be changed except for debugging MEMORY CHANNEL problems or adjusting the MC_SERVICES_P9 parameter. |
|||||||||||||||
++MC_SERVICES_P9 (static) |
Specifies the number of initial entries in a single channel's free
queue. The maximum value is 2048 and the minimum value is 10.
Note that MC_SERVICES_P9 is not a dynamic parameter; you must reboot the system after each change in order for the change to take effect. The default value is 150. Compaq recommends that this value not be changed. This parameter value must be the same on all nodes connected by MEMORY CHANNEL. |
|||||||||||||||
++MPDEV_ENABLE | Enables the formation of multipath sets when set to ON (1). If set to OFF (0), the formation of additional multipath sets is disabled. However, existing multipath sets remain in effect. The default is ON. | |||||||||||||||
++MPDEV_LCRETRIES | Controls the number of times the system retries locally connected paths before moving on to local unconnected paths or to an MSCP served path to the device. The valid range for retries is 1 through 256. The default is 1. | |||||||||||||||
++MPDEV_POLLER | Enables polling of the paths to multipath set members when set to ON (1). Polling allows early detection of errors on inactive paths. If a path becomes unavailable or returns to service, the system manager is notified with an OPCOM message. If set to OFF (0), multipath polling is disabled. The default is ON. | |||||||||||||||
++MPDEV_REMOTE | Enables MSCP served disks to become members of a multipath set when set to ON (1). If set to OFF (0), only local paths to a SCSI device will be used in the formation of additional multipath sets. However, existing remote members of multipath sets remain members of those sets. The default is OFF. For OpenVMS Version 7.2, the only valid setting is OFF. | |||||||||||||||
MSCP_BUFFER |
This buffer area is the space used by the server to transfer data
between client systems and local disks.
On VAX systems, MSCP_BUFFER specifies the number of pages to be allocated to the MSCP server's local buffer area. On Alpha systems, MSCP_BUFFER specifies the number of pagelets to be allocated to the MSCP server's local buffer area. |
|||||||||||||||
MSCP_CMD_TMO |
Specifies the time in seconds that the OpenVMS MSCP server uses to
detect MSCP command timeouts. The MSCP server must complete the command
within a built-in time of approximately 40 seconds plus the value of
the MSCP_CMD_TMO parameter.
An MSCP_CMD_TMO value of 0 is normally adequate. A value of 0 provides the same behavior as in previous releases of OpenVMS (which did not have an MSCP_CMD_TMO system parameter). A nonzero setting increases the amount of time before an MSCP command times out. If command timeout errors are being logged on client nodes, setting the parameter to a nonzero value on OpenVMS servers reduces the number of errors logged. Increasing the value of this parameter reduces the number of client MSCP command timeouts and increases the time it takes to detect faulty devices. If you need to decrease the number of command timeout errors, set an initial value of 60. If timeout errors continue to be logged, you can increase this value in increments of 20 seconds. |
|||||||||||||||
MSCP_CREDITS | Specifies the number of outstanding I/O requests that can be active from one client system. | |||||||||||||||
MSCP_LOAD | Controls whether the MSCP server is loaded. Specify 1 to load the server, and use the default CPU load rating. A value greater than 1 loads the server and uses this value as a constant load rating. By default, the value is set to 0 and the server is not loaded. | |||||||||||||||
MSCP_SERVE_ALL |
Controls the serving of disks. The settings take effect when the system
boots. You cannot change the settings when the system is running.
Starting with OpenVMS Version 7.2, the serving types are implemented as a bit mask. To specify the type of serving your system will perform, locate the type you want in the following table and specify its value. For some systems, you may want to specify two serving types, such as serving the system disk and serving locally attached disks. To specify such a combination, add the values of each type, and specify the sum.
In a mixed-version cluster that includes any systems running OpenVMS Version 7.1-x or earlier, serving all available disks is restricted to serving all disks except those whose allocation class does not match the system's node allocation class (pre-Version 7.2 meaning). To specify this type of serving, use the value 9 (which sets bit 0 and bit 3).
The following table describes the serving type controlled by each bit and its decimal value.
Although the serving types are now implemented as a bit mask, the values of 0, 1, and 2, specified by bit 0 and bit 1, retain their original meanings:
If the MSCP_LOAD system parameter is 0, MSCP_SERVE_ALL is ignored. For more information about this system parameter, see Section 6.3.1. |
|||||||||||||||
NISCS_CONV_BOOT | During booting as an OpenVMS Cluster satellite, specifies whether conversational bootstraps are enabled on the computer. The default value of 0 specifies that conversational bootstraps are disabled. A value of 1 enables conversational bootstraps. | |||||||||||||||
NISCS_LAN_OVRHD | Starting with OpenVMS Version 7.3, this parameter is obsolete. This parameter was formerly provided to reserve space in a LAN packet for encryption fields applied by external encryption devices. PEDRIVER now automatically determines the maximum packet size a LAN path can deliver, including any packet-size reductions required by external encryption devices. | |||||||||||||||
NISCS_LOAD_PEA0 |
Specifies whether the port driver (PEDRIVER) is to be loaded to enable
cluster communications over the local area network (LAN). The default
value of 0 specifies that the driver is not loaded. A value of 1
specifies that the driver is loaded.
Caution: If the NISCS_LOAD_PEA0 parameter is set to 1, the VAXCLUSTER system parameter must be set to 2. This ensures coordinated access to shared resources in the OpenVMS Cluster and prevents accidental data corruption. |
|||||||||||||||
NISCS_MAX_PKTSZ |
Specifies an upper limit, in bytes, on the size of the user data area
in the largest packet sent by NISCA on any local area network (LAN).
The NISCS_MAX_PKTSZ parameter allows the system manager to change the packet size used for cluster communications on network communication paths. PEDRIVER automatically allocates memory to support the largest packet size that is usable by any virtual circuit connected to the system up to the limit set by this parameter. Its default values are different for OpenVMS Alpha and OpenVMS VAX.
PEDRIVER uses NISCS_MAX_PKTSZ to compute the maximum amount of data to transmit in any LAN packet as follows:
LAN packet size <= (LAN header (padded Ethernet format) + NISCS_MAX_PKTSZ + NISCS checksum (only if data checking is enabled) + LAN CRC or FCS)
The actual packet size automatically used by PEDRIVER might be smaller than the NISCS_MAX_PKTSZ limit for either of the following reasons:
The actual memory allocation includes the required data structure overhead used by PEDRIVER and the LAN drivers, in addition to the actual LAN packet size. The following table shows the minimum NISCS_MAX_PKTSZ value required to use the maximum packet size supported by LAN types.
|
|||||||||||||||
PASTDGBUF |
Specifies the number of datagram receive buffers to queue initially for
the cluster port driver's configuration poller. The initial value is
expanded during system operation, if needed.
MEMORY CHANNEL devices ignore this parameter. |
|||||||||||||||
QDSKINTERVAL |
Specifies, in seconds, the disk quorum polling interval. The maximum is
32767, the minimum is 1, and the default is 3. Lower values trade
increased overhead cost for greater responsiveness.
This parameter should be set to the same value on each cluster computer. |
|||||||||||||||
QDSKVOTES | Specifies the number of votes contributed to the cluster votes total by a quorum disk. The maximum is 127, the minimum is 0, and the default is 1. This parameter is used only when DISK_QUORUM is defined. | |||||||||||||||
RECNXINTERVAL |
Specifies, in seconds, the interval during which the connection manager
attempts to reconnect a broken connection to another computer. If a new
connection cannot be established during this period, the connection is
declared irrevocably broken, and either this computer or the other must
leave the cluster. This parameter trades faster response to certain
types of system failures for the ability to survive transient faults of
increasing duration.
This parameter should be set to the same value on each cluster computer. This parameter also affects the tolerance of the OpenVMS Cluster system for LAN bridge failures (see Section 3.4.7). |
|||||||||||||||
SCSBUFFCNT |
+On VAX systems, SCSBUFFCNT is the number of buffer descriptors
configured for all SCS devices. If no SCS device is configured on your
system, this parameter is ignored. Generally, each data transfer needs
a buffer descriptor; thus, the number of buffer descriptors limits the
number of possible simultaneous I/Os. Various performance monitors
report when a system is out of buffer descriptors for a given work
load, indicating that a larger value for SCSBUFFCNT is worth
considering.
Note: AUTOGEN provides feedback for this parameter on VAX systems only.
++On Alpha systems, the SCS buffers are allocated as needed, and SCSBUFFCNT is reserved for OpenVMS use only. |
|||||||||||||||
SCSCONNCNT |
The initial number of SCS connections that are configured for use by
all system applications, including the one used by Directory Service
Listen. The initial number will be expanded by the system if needed.
If no SCS ports are configured on your system, this parameter is ignored. The default value is adequate for all SCS hardware combinations. Note: AUTOGEN provides feedback for this parameter on VAX systems only. |
|||||||||||||||
SCSNODE 1 |
Specifies the name of the computer. This parameter is not dynamic.
Specify SCSNODE as a string of up to six characters. Enclose the string in quotation marks. If the computer is in an OpenVMS Cluster, specify a value that is unique within the cluster. Do not specify the null string. If the computer is running DECnet for OpenVMS, the value must be the same as the DECnet node name. |
|||||||||||||||
SCSRESPCNT |
SCSRESPCNT is the total number of response descriptor table entries
(RDTEs) configured for use by all system applications.
If no SCS or DSA port is configured on your system, this parameter is ignored. |
|||||||||||||||
SCSSYSTEMID 1 |
Specifies a number that identifies the computer. This parameter is not
dynamic. SCSSYSTEMID is the low-order 32 bits of the 48-bit system
identification number.
If the computer is in an OpenVMS Cluster, specify a value that is unique within the cluster. If the computer is running DECnet for OpenVMS, calculate the value
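from the DECnet address using the following formula:
SCSSYSTEMID = (DECnet area number * 1024) + DECnet node number
Example: If the DECnet address is 2.211, calculate the
value as follows:
SCSSYSTEMID = (2 * 1024) + 211 = 2259 |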
|||||||||||||||
SCSSYSTEMIDH | Specifies the high-order 16 bits of the 48-bit system identification number. This parameter must be set to 0. It is reserved by OpenVMS for future use. | |||||||||||||||
TAPE_ALLOCLASS | Specifies a numeric value from 0 to 255 to be assigned as the tape allocation class for tape devices connected to the computer. The default value is 0. | |||||||||||||||
TIMVCFAIL | Specifies the time required for a virtual circuit failure to be detected. Compaq recommends that you use the default value. Compaq further recommends that you decrease this value only in OpenVMS Cluster systems of three or fewer CPUs, use the same value on each computer in the cluster, and use dedicated LAN segments for cluster I/O. | |||||||||||||||
TMSCP_LOAD | Controls whether the TMSCP server is loaded. Specify a value of 1 to load the server and set all available TMSCP tapes served. By default, the value is set to 0, and the server is not loaded. | |||||||||||||||
TMSCP_SERVE_ALL |
Controls the serving of tapes. The settings take effect when the system
boots. You cannot change the settings when the system is running.
Starting with OpenVMS Version 7.2, the serving types are implemented as a bit mask. To specify the type of serving your system will perform, locate the type you want in the following table and specify its value. For some systems, you may want to specify two serving types, such as serving all tapes except those whose allocation class does not match. To specify such a combination, add the values of each type, and specify the sum.
In a mixed-version cluster that includes any systems running OpenVMS Version 7.1-x or earlier, serving all available tapes is restricted to serving all tapes except those whose allocation class does not match the system's allocation class (pre-Version 7.2 meaning). To specify this type of serving, use the value 9, which sets bit 0 and bit 3.
The following table describes the serving type controlled by each bit and its decimal value.
Although the serving types are now implemented as a bit mask, the values of 0, 1, and 2, specified by bit 0 and bit 1, retain their original meanings:
If the TMSCP_LOAD system parameter is 0, TMSCP_SERVE_ALL is ignored. |
|||||||||||||||
VAXCLUSTER |
Controls whether the computer should join or form a cluster. This
parameter accepts the following three values:
You should always set this parameter to 2 on computers intended to run in a cluster, to 0 on computers that boot from a UDA disk controller and are not intended to be part of a cluster, and to 1 (the default) otherwise. Caution: If the NISCS_LOAD_PEA0 system parameter is set to 1, the VAXCLUSTER parameter must be set to 2. This ensures coordinated access to shared resources in the OpenVMS Cluster system and prevents accidental data corruption. Data corruption may occur on shared resources if the NISCS_LOAD_PEA0 parameter is set to 1 and the VAXCLUSTER parameter is set to 0. |
|||||||||||||||
VOTES | Specifies the number of votes toward a quorum to be contributed by the computer. The default is 1. |
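
Several parameters in this appendix involve simple arithmetic, which can be sketched in Python. This is an illustration only; the function names are hypothetical. It assumes the quorum derivation commonly documented for OpenVMS Cluster systems, (EXPECTED_VOTES + 2)/2 with the fraction truncated, and the SCSSYSTEMID derivation (DECnet area number * 1024) + DECnet node number. The bit-mask helper relies only on the statement above that the value 9 sets bit 0 and bit 3.

```python
def estimated_quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES (assumed formula, truncated)."""
    return (expected_votes + 2) // 2


def scssystemid(area: int, node: int) -> int:
    """SCSSYSTEMID for the DECnet address area.node (assumed formula)."""
    return area * 1024 + node


def serve_all_value(bits: list[int]) -> int:
    """Decimal value of MSCP_SERVE_ALL or TMSCP_SERVE_ALL with these bits set."""
    value = 0
    for bit in bits:
        # Bit n contributes 2**n, so adding the values of the serving
        # types is the same as setting their bits.
        value |= 1 << bit
    return value


print(estimated_quorum(3))      # three one-vote members
print(scssystemid(2, 211))      # DECnet address 2.211
print(serve_all_value([0, 3]))  # the mixed-version setting described above
```

With these inputs the three calls give 2, 2259, and 9 respectively.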