OpenVMS Cluster Systems
10.7.1 Removing a Computer

If you want to shut down a computer that you expect will not rejoin the cluster for an extended period, use the REMOVE_NODE option. For example, a computer may be waiting for new hardware, or you may decide that you want to use a computer for standalone operation indefinitely. When you use the REMOVE_NODE option, the active quorum in the remainder of the cluster is adjusted downward to reflect the fact that the removed computer's votes no longer contribute to the quorum value. The shutdown procedure readjusts the quorum by issuing the SET CLUSTER/EXPECTED_VOTES command, which is subject to the usual constraints described in Section 10.12.
Note: The system manager is still responsible for
changing the EXPECTED_VOTES system parameter on the remaining OpenVMS
Cluster computers to reflect the new configuration.
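For example, if the removed computer contributed one vote and the remaining computers should now expect a total of three votes, you might make changes along the following lines on each remaining node; the vote counts shown are illustrative:

   $ ! Record the new value for future reboots by adding or updating
   $ ! this line in SYS$SPECIFIC:[SYSEXE]MODPARAMS.DAT:
   $ !     EXPECTED_VOTES = 3
   $ !
   $ ! Optionally adjust the running value immediately:
   $ SET CLUSTER/EXPECTED_VOTES=3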
When you choose the CLUSTER_SHUTDOWN option, the computer completes all shutdown activities up to the point where the computer would leave the cluster in a normal shutdown situation. At this point, the computer waits until all other nodes in the cluster have reached the same point. When all nodes have completed their shutdown activities, the entire cluster dissolves in one synchronized operation. The advantage of this approach is that individual nodes do not complete shutdown independently, and thus do not trigger state transitions or potentially leave the cluster without quorum.
When performing a CLUSTER_SHUTDOWN, you must specify this option on every OpenVMS Cluster computer. If any computer is not included, clusterwide shutdown cannot occur.
When you choose the REBOOT_CHECK option, the shutdown procedure checks for the existence of basic system files that are needed to reboot the computer successfully and notifies you if any files are missing. You should replace such files before proceeding. If all files are present, the following informational message appears:
Note: You can use the REBOOT_CHECK option separately
or in conjunction with either the REMOVE_NODE or the CLUSTER_SHUTDOWN
option. If you choose REBOOT_CHECK with one of the other options, you
must specify the options in the form of a comma-separated list.
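For example, at the shutdown procedure's options prompt you might enter a list such as the following; the prompt wording shown here is approximate and varies between OpenVMS versions:

   $ @SYS$SYSTEM:SHUTDOWN
      .
      .
      .
   Shutdown options (enter as a comma-separated list): REMOVE_NODE,REBOOT_CHECK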
Use the SAVE_FEEDBACK option to enable the AUTOGEN feedback operation.

Note: Select this option only when a computer has been running long enough to reflect your typical work load.
Reference: For detailed information about AUTOGEN
feedback, see the OpenVMS System Manager's Manual.
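For example, after a shutdown that used SAVE_FEEDBACK, you might later apply the saved feedback data with an AUTOGEN run such as the following; the phase names shown are commonly used ones, so check the AUTOGEN documentation for the combination appropriate to your site:

   $ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS FEEDBACK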
Whether your OpenVMS Cluster system uses a single common system disk or
multiple system disks, you should plan a strategy to manage dump files.
Dump-file management is especially important for large clusters with a single system disk. For example, on a 256 MB OpenVMS Alpha computer, AUTOGEN creates a dump file in excess of 500,000 blocks.

In the event of a software-detected system failure, each computer normally writes the contents of memory to a full dump file on its system disk for analysis. By default, this full dump file is the size of physical memory plus a small number of pages. If system disk space is limited (as is probably the case if a single system disk is used for a large cluster), you may want to specify that no dump file be created for satellites or that AUTOGEN create a selective dump file. The selective dump file is typically 30% to 60% of the size of a full dump file.

You can control dump-file size and creation for each computer by specifying appropriate values for the AUTOGEN symbols DUMPSTYLE and DUMPFILE in the computer's MODPARAMS.DAT file. Specify dump files as shown in Table 10-4.
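For example, MODPARAMS.DAT entries along the following lines request a selective dump or suppress the dump file entirely for a satellite; the values are illustrative, so use Table 10-4 to choose the values appropriate to your systems:

   ! In the satellite's MODPARAMS.DAT, request a selective dump ...
   DUMPSTYLE = 1
   ! ... or, alternatively, specify that no dump file be created:
   ! DUMPFILE = 0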
Caution: Although you can configure computers without dump files, the lack of a dump file can make it difficult or impossible to determine the cause of a system failure.

For example, you can use commands like the following to modify the system dump-file size on large-memory systems:
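This sketch uses the SYSGEN CREATE command; the per-node file specification and the 70,000-block size are illustrative, and the new file is not used until the system reboots:

   $ RUN SYS$SYSTEM:SYSGEN
   SYSGEN> CREATE SYS$SPECIFIC:[SYSEXE]SYSDUMP.DMP /SIZE=70000
   SYSGEN> EXIT
   $ ! Reboot so that the new dump file takes effect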
The dump-file size of 70,000 blocks is sufficient to cover about 32 MB of memory. This size is usually large enough to encompass the information needed to analyze a system failure.
After the system reboots, you can purge SYSDUMP.DMP.
Another option for saving dump-file space is to share a single dump file among multiple computers. This technique makes it possible to analyze isolated computer failures, but dumps are lost if multiple computers fail at the same time or if a second computer fails before you can analyze the first failure.

Because boot server failures have a greater impact on cluster operation than do failures of other computers, you should configure full dump files on boot servers to help ensure speedy analysis of problems.

VAX systems cannot share dump files with Alpha computers, and vice versa. However, you can share a single dump file among multiple Alpha computers, and another single dump file among multiple VAX computers. Follow these steps for each operating system:
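In outline, the approach is to create one dump file on the common system disk and then give each sharing computer an alias directory entry for it; the file names, size, and system root number n shown here are illustrative:

   $ RUN SYS$SYSTEM:SYSGEN
   SYSGEN> CREATE SYS$COMMON:[SYSEXE]SYSDUMP-COMMON.DMP /SIZE=70000
   SYSGEN> EXIT
   $ ! For each computer that is to share the file, create an alias entry
   $ ! named SYSDUMP.DMP in that computer's system root (SYSn):
   $ SET FILE SYS$COMMON:[SYSEXE]SYSDUMP-COMMON.DMP -
         /ENTER=SYS$SYSDEVICE:[SYSn.SYSEXE]SYSDUMP.DMP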
10.9 Maintaining the Integrity of OpenVMS Cluster Membership

Because multiple LAN and mixed-interconnect clusters can coexist on a single extended LAN, the operating system provides mechanisms to ensure the integrity of individual clusters and to prevent access to a cluster by an unauthorized computer. The following mechanisms are designed to ensure the integrity of the cluster:
The purpose of the cluster group number and password is to prevent accidental access to the cluster by an unauthorized computer. Under normal conditions, the system manager specifies the cluster group number and password either during installation or when running CLUSTER_CONFIG.COM (see Example 8-11) to convert a standalone computer to run in an OpenVMS Cluster system. OpenVMS Cluster systems use these mechanisms to protect the integrity of the cluster in order to prevent problems that could otherwise occur under circumstances like the following:
Reference: These mechanisms are discussed in
Section 10.9.1 and Section 8.2.1, respectively.
The cluster authorization file, SYS$COMMON:[SYSEXE]CLUSTER_AUTHORIZE.DAT, contains the cluster group number and (in scrambled form) the cluster password. The CLUSTER_AUTHORIZE.DAT file is accessible only to users with the SYSPRV privilege. Under normal conditions, you need not alter records in the CLUSTER_AUTHORIZE.DAT file interactively. However, if you suspect a security breach, you may want to change the cluster password. In that case, you use the SYSMAN utility to make the change. To change the cluster password, follow these instructions:
10.9.2 Example

Example 10-4 illustrates the use of the SYSMAN utility to change the cluster password.
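In outline, the SYSMAN session uses commands along the following lines; the password value is a placeholder, and the change to CLUSTER_AUTHORIZE.DAT does not take effect until the next time the entire cluster is booted:

   $ RUN SYS$SYSTEM:SYSMAN
   SYSMAN> SET ENVIRONMENT/CLUSTER
   SYSMAN> SET PROFILE/PRIVILEGES=SYSPRV
   SYSMAN> CONFIGURATION SET CLUSTER_AUTHORIZATION/PASSWORD=newpassword
   SYSMAN> EXIT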
10.10 Adjusting Maximum Packet Size for LAN Configurations
You can adjust the maximum packet size for LAN configurations with the
NISCS_MAX_PKTSZ system parameter.
Starting with OpenVMS Version 7.3, the operating system (PEdriver) automatically detects the maximum packet size of all the virtual circuits to which the system is connected. If the maximum packet size of the system's interconnects is smaller than the default packet-size setting, PEdriver automatically reduces the default packet size.
For earlier versions of OpenVMS (VAX Version 6.0 to Version 7.2; Alpha
Version 1.5 to Version 7.2-1), the NISCS_MAX_PKTSZ parameter should be
set to 1498 for Ethernet clusters and to 4468 for FDDI clusters.
To obtain this parameter's current, default, minimum, and maximum values, issue the following command:
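One possibility, using SYSGEN:

   $ RUN SYS$SYSTEM:SYSGEN
   SYSGEN> SHOW NISCS_MAX_PKTSZ
   SYSGEN> EXIT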
You can use the NISCS_MAX_PKTSZ parameter to reduce packet size, which in turn can reduce memory consumption. However, reducing packet size can also increase CPU utilization for block data transfers, because more packets will be required to transfer a given amount of data. Lock message packets are smaller than the minimum value, so the NISCS_MAX_PKTSZ setting will not affect locking performance. You can also use NISCS_MAX_PKTSZ to force use of a common packet size on all LAN paths by bounding the packet size to that of the LAN path with the smallest packet size. Using a common packet size can avoid VC closure due to packet size reduction when failing down to a slower, smaller packet size network.
If a memory-constrained system, such as a workstation, has adapters to
a network path with large-size packets, such as FDDI or Gigabit
Ethernet with jumbo packets, then you may want to conserve memory by
reducing the value of the NISCS_MAX_PKTSZ parameter.
If you decide to change the value of the NISCS_MAX_PKTSZ parameter,
edit the SYS$SPECIFIC:[SYSEXE]MODPARAMS.DAT file to permit AUTOGEN to
factor the changed packet size into its calculations.
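For example, to limit the packet size to the standard Ethernet value, you might add an entry such as the following and then run AUTOGEN; the value and the AUTOGEN phases shown are illustrative:

   $ ! Add to SYS$SPECIFIC:[SYSEXE]MODPARAMS.DAT:
   $ !     NISCS_MAX_PKTSZ = 1498
   $ !
   $ ! Run AUTOGEN through SETPARAMS and reboot so the new value takes effect:
   $ @SYS$UPDATE:AUTOGEN GETDATA REBOOT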
On Alpha systems, process quota default values in SYSUAF.DAT are often
higher than the SYSUAF.DAT defaults on VAX systems. How, then, do you
choose values for processes that could run on Alpha systems or on VAX
systems in an OpenVMS Cluster? Understanding how a process is assigned
quotas when the process is created in a dual-architecture OpenVMS
Cluster configuration will help you manage this task.
The quotas to be used by a new process are determined by the OpenVMS LOGINOUT software. LOGINOUT works the same on OpenVMS Alpha and OpenVMS VAX systems. When a user logs in and a process is started, LOGINOUT uses the larger of the following two values for each quota: the value defined for the account in the SYSUAF.DAT file, or the value of the corresponding PQL_M* system parameter on the host node.
Example: LOGINOUT compares the value of the account's
ASTLM process limit (as defined in the common SYSUAF.DAT) with the
value of the PQL_MASTLM system parameter on the host Alpha system or on
the host VAX system in the OpenVMS Cluster.
The letter M in PQL_M means minimum. The PQL_Mquota system parameters set a minimum value for the quotas. In the Current and Default columns of the following edited SYSMAN display, note how the current value of each PQL_Mquota parameter exceeds its system-defined default value in most cases. Note that the following display is Alpha specific. A similar SYSMAN display on a VAX system would show "Pages" in the Unit column instead of "Pagelets".
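A display of this kind can be produced with SYSMAN; the following sketch assumes the /PQL qualifier, which selects the process quota limit parameters:

   $ RUN SYS$SYSTEM:SYSMAN
   SYSMAN> PARAMETERS USE CURRENT
   SYSMAN> PARAMETERS SHOW /PQL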
In this display, the values for many PQL_Mquota parameters
increased from the defaults to their current values. Typically, this
happens over time when AUTOGEN feedback is run periodically on your
system. The PQL_Mquota values also can change, of course, when
you modify the values in MODPARAMS.DAT or in SYSMAN. As you consider
the use of a common SYSUAF.DAT in an OpenVMS Cluster with both VAX and
Alpha computers, keep the dynamic nature of the PQL_Mquota
parameters in mind.
The following table summarizes common SYSUAF.DAT scenarios and probable results on VAX and Alpha computers in an OpenVMS Cluster system.
You might decide to experiment with the higher process-quota values that usually are associated with an OpenVMS Alpha system's SYSUAF.DAT as you determine values for a common SYSUAF.DAT in an OpenVMS Cluster environment. The higher Alpha-level process quotas might be appropriate for processes created on host VAX nodes in the OpenVMS Cluster if the VAX systems have large available memory resources. You can determine the values that are appropriate for processes on your VAX and Alpha systems by experimentation and modification over time. Factors in your decisions about appropriate limit and quota values for each process will include the following:
10.12 Restoring Cluster Quorum

During the life of an OpenVMS Cluster system, computers join and leave the cluster. For example, you may need to add more computers to the cluster to extend the cluster's processing capabilities, or a computer may shut down because of a hardware or fatal software error. The connection management software coordinates these cluster transitions and controls cluster operation. When a computer shuts down, the remaining computers, with the help of the connection manager, reconfigure the cluster, excluding the computer that shut down. The cluster can survive the failure of the computer and continue process operations as long as the cluster votes total is greater than the cluster quorum value. If the cluster votes total falls below the cluster quorum value, the cluster suspends the execution of all processes.
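For example, in a cluster of three voting members with one vote each, EXPECTED_VOTES is 3 and quorum is computed as (EXPECTED_VOTES + 2) / 2 = 2, using integer division. Such a cluster continues to run if one voting member shuts down, because the two remaining votes still equal quorum, but it suspends process activity if a second voting member is lost, because one vote is below the quorum value of 2.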