You should define a cluster alias name for the OpenVMS Cluster to ensure that remote access will be successful when at least one OpenVMS Cluster member is available to process the client program's requests.
The cluster alias acts as a single network node identifier for an OpenVMS Cluster system. Computers in the cluster can use the alias for communications with other computers in a DECnet network. Note that it is possible for nodes running DECnet for OpenVMS to have a unique and separate cluster alias from nodes running DECnet--Plus. In addition, clusters running DECnet--Plus can have one cluster alias for VAX, one for Alpha, and another for both.
Note: A single cluster alias can include nodes running either DECnet for OpenVMS or DECnet--Plus, but not both. Also, an OpenVMS Cluster running both DECnet for OpenVMS and DECnet--Plus requires multiple system disks (one for each).
Reference: See Chapter 4 for more information about setting up and using a cluster alias in an OpenVMS Cluster system.
Once your cluster is up and running, you can implement routine, site-specific maintenance operations---for example, backing up disks or adding user accounts, performing software upgrades and installations, running AUTOGEN with the feedback option on a regular basis, and monitoring the system for performance.
You should also maintain records of current configuration data, especially any changes to hardware or software components. If you are managing a cluster that includes satellite nodes, it is important to monitor LAN activity.
From time to time, conditions may occur that require the following special maintenance operations:
10.1 Backing Up Data and Files
As a part of the regular system management procedure, you should copy operating system files, application software files, and associated files to an alternate device using the OpenVMS Backup utility.
Some backup operations are performed the same way in an OpenVMS Cluster as on a standalone OpenVMS system; for example, an incremental backup of a disk while it is in use, or the backup of a nonshared disk.
Backup tools for use in a cluster include those listed in Table 10-1.
Tool | Usage |
---|---|
Online backup | Use from a running system to back up disks and files while the system remains in operation. Caution: Files open for writing at the time of the backup procedure may not be backed up correctly. |
Menu-driven | If you have access to the OpenVMS Alpha distribution CD-ROM, back up your system using the menu system provided on that disc. This menu system, which is displayed automatically when you boot the CD-ROM, allows you to perform backup and restore operations on the system disk. Reference: For more detailed information about using the menu-driven procedure, see the OpenVMS Upgrade and Installation Manual and the HP OpenVMS System Manager's Manual. |
Plan to perform the backup process regularly, according to a schedule that is consistent with application and user needs. This may require creative scheduling so that you can coordinate backups with times when user and application system requirements are low.
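For illustration, a routine online backup of a mounted user disk to tape might look like the following; the device names, save-set name, and label are placeholders for your own:

$ MOUNT/FOREIGN MUA0:                            ! Tape drive that receives the save set
$ BACKUP/RECORD/VERIFY DUA1:[*...]*.*;* MUA0:USERS.BCK/LABEL=USRBCK
$ DISMOUNT MUA0: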
Reference: See the HP OpenVMS System Management Utilities Reference Manual: A--L for complete
information about the OpenVMS Backup utility.
10.2 Updating the OpenVMS Operating System
When updating the OpenVMS operating system, follow the steps in Table 10-2.
Step | Action |
---|---|
1 | Back up the system disk. |
2 | Perform the update procedure once for each system disk. |
3 | Install any mandatory updates. |
4 | Run AUTOGEN on each node that boots from that system disk. |
5 | Run the user environment test package (UETP) to test the installation. |
6 | Use the OpenVMS Backup utility to make a copy of the new system volume. |
Reference: See the appropriate OpenVMS upgrade and
installation manual for complete instructions.
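As a rough sketch of steps 4 through 6 on a single node, assuming the node boots from the system disk that was just updated (the scratch device DKA100: is illustrative), the commands might look like this:

$ @SYS$UPDATE:AUTOGEN GETDATA REBOOT FEEDBACK    ! Step 4: retune parameters and reboot
$ ! After the reboot, log in to the SYSTEST account and run UETP (step 5):
$ @SYS$TEST:UETP
$ ! Step 6: copy the new system volume; make this copy while the disk is not
$ ! in active use (for example, from the CD-ROM menu system) if possible:
$ MOUNT/FOREIGN DKA100:
$ BACKUP/IMAGE/VERIFY SYS$SYSDEVICE: DKA100: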
10.2.1 Rolling Upgrades
The OpenVMS operating system allows an OpenVMS Cluster system running on multiple system disks to continue to provide service while the system software is being upgraded. This process is called a rolling upgrade because each node is upgraded and rebooted in turn, until all the nodes have been upgraded.
If you must first migrate your system from running on one system disk to running on two or more system disks, follow these steps:
Step | Action |
---|---|
1 | Follow the procedures in Section 8.5 to create a duplicate disk. |
2 | Follow the instructions in Section 5.8 for information about coordinating system files. |
These sections help you add a system disk and prepare a common user
environment on multiple system disks to make the shared system files
such as the queue database, rightslists, proxies, mail, and other files
available across the OpenVMS Cluster system.
10.3 LAN Network Failure Analysis
The OpenVMS operating system provides a sample program to help you analyze OpenVMS Cluster network failures on the LAN. You can edit and use the SYS$EXAMPLES:LAVC$FAILURE_ANALYSIS.MAR program to detect and isolate failed network components. Using the network failure analysis program can help reduce the time required to detect and isolate a failed network component, thereby providing a significant increase in cluster availability.
Reference: For a description of the network failure
analysis program, refer to Appendix D.
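The general flow, assuming you build the edited program with the MACRO-32 compiler and the linker, might look like the following outline; Appendix D gives the exact edit points and build steps, so treat these commands only as a sketch:

$ COPY SYS$EXAMPLES:LAVC$FAILURE_ANALYSIS.MAR SYS$MANAGER:
$ ! Edit the copy so its data tables describe your LAN adapters, bridges,
$ ! and cable segments, then assemble, link, and run the program:
$ MACRO SYS$MANAGER:LAVC$FAILURE_ANALYSIS.MAR
$ LINK LAVC$FAILURE_ANALYSIS
$ RUN LAVC$FAILURE_ANALYSIS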
10.4 Recording Configuration Data
To maintain an OpenVMS Cluster system effectively, you must keep accurate records about the current status of all hardware and software components and about any changes made to those components. Changes to cluster components can have a significant effect on the operation of the entire cluster. If a failure occurs, you may need to consult your records to aid problem diagnosis.
Maintaining current records for your configuration is necessary both
for routine operations and for eventual troubleshooting activities.
10.4.1 Record Information
At a minimum, your configuration records should include the following information:
The first time you execute CLUSTER_CONFIG.COM to add a satellite, the procedure creates the file NETNODE_UPDATE.COM in the boot server's SYS$SPECIFIC:[SYSMGR] directory. (For a common-environment cluster, you must rename this file to the SYS$COMMON:[SYSMGR] directory, as described in Section 5.8.2.) This file, which is updated each time you add or remove a satellite or change its Ethernet hardware address, contains all essential network configuration data for the satellite.
If an unexpected condition at your site causes configuration data to be lost, you can use NETNODE_UPDATE.COM to restore it. You can also read the file when you need to obtain data about individual satellites. Note that you may want to edit the file occasionally to remove obsolete entries.
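Because NETNODE_UPDATE.COM is an ordinary command procedure that feeds its DEFINE commands to NCP, restoring the satellite definitions can be as simple as executing it; the directory shown assumes the file has been moved to the common root as described in Section 5.8.2:

$ @SYS$COMMON:[SYSMGR]NETNODE_UPDATE.COM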
Example 10-1 shows the contents of the file after satellites EUROPA and GANYMD have been added to the cluster.
Example 10-1 Sample NETNODE_UPDATE.COM File

$ RUN SYS$SYSTEM:NCP
define node EUROPA address 2.21
define node EUROPA hardware address 08-00-2B-03-51-75
define node EUROPA load assist agent sys$share:niscs_laa.exe
define node EUROPA load assist parameter $1$DGA11:<SYS10.>
define node EUROPA tertiary loader sys$system:tertiary_vmb.exe
define node GANYMD address 2.22
define node GANYMD hardware address 08-00-2B-03-58-14
define node GANYMD load assist agent sys$share:niscs_laa.exe
define node GANYMD load assist parameter $1$DGA11:<SYS11.>
define node GANYMD tertiary loader sys$system:tertiary_vmb.exe
Reference: See the DECnet--Plus documentation for
equivalent NCL command information.
10.5 Controlling OPCOM Messages
When a satellite joins the cluster, the Operator Communications Manager (OPCOM) has the following default states:
Table 10-3 shows how to define the following system logical names in the command procedure SYS$MANAGER:SYLOGICALS.COM to override the OPCOM default states.
System Logical Name | Function |
---|---|
OPC$OPA0_ENABLE | If defined to be true, OPA0: is enabled as an operator console. If defined to be false, OPA0: is not enabled as an operator console. DCL considers any string beginning with T or Y or any odd integer to be true; all other values are false. |
OPC$OPA0_CLASSES | Defines the operator classes to be enabled on OPA0:. The logical name can be a search list of the allowed classes, a comma-separated list, or a combination of the two. For example: $ DEFINE/SYSTEM OPC$OPA0_CLASSES CENTRAL,DISKS,TAPE. You can define OPC$OPA0_CLASSES even if OPC$OPA0_ENABLE is not defined. In this case, the classes are used for any operator consoles that are enabled, but the default is used to determine whether to enable the operator console. |
OPC$LOGFILE_ENABLE | If defined to be true, an operator log file is opened. If defined to be false, no log file is opened. |
OPC$LOGFILE_CLASSES | Defines the operator classes to be enabled for the log file. The logical name can be a search list of the allowed classes, a comma-separated list, or a combination of the two. You can define this system logical even when the OPC$LOGFILE_ENABLE system logical is not defined. In this case, the classes are used for any log files that are open, but the default is used to determine whether to open the log file. |
OPC$LOGFILE_NAME | Supplies information that is used in conjunction with the default name SYS$MANAGER:OPERATOR.LOG to define the name of the log file. If the log file is directed to a disk other than the system disk, you should include commands to mount that disk in the SYLOGICALS.COM command procedure. |
The following example shows how to use the OPC$OPA0_CLASSES system logical to define the operator classes to be enabled; this command prevents SECURITY class messages from being displayed on OPA0:.
$ DEFINE/SYSTEM OPC$OPA0_CLASSES CENTRAL,PRINTER,TAPES,DISKS,DEVICES, -
_$ CARDS,NETWORK,CLUSTER,LICENSE,OPER1,OPER2,OPER3,OPER4,OPER5, -
_$ OPER6,OPER7,OPER8,OPER9,OPER10,OPER11,OPER12
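Similarly, a sketch of SYLOGICALS.COM entries that open an operator log file on a nonsystem disk might look like the following; the disk and directory names are illustrative, and, as Table 10-3 notes, that disk must also be mounted in SYLOGICALS.COM:

$ DEFINE/SYSTEM OPC$LOGFILE_ENABLE TRUE
$ DEFINE/SYSTEM OPC$LOGFILE_CLASSES CENTRAL,SECURITY,DISKS
$ DEFINE/SYSTEM OPC$LOGFILE_NAME DISK$LOGS:[OPERATOR]OPERATOR.LOG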
In large clusters, state transitions (computers joining or leaving the cluster) generate many multiline OPCOM messages on a boot server's console device. You can avoid such messages by including the DCL command REPLY/DISABLE=CLUSTER in the appropriate site-specific startup command file or by entering the command interactively from the system manager's account.
10.6 Shutting Down a Cluster
The SHUTDOWN command of the SYSMAN utility provides five options for shutting down OpenVMS Cluster computers:

* NONE (the default)
* REMOVE_NODE
* CLUSTER_SHUTDOWN
* REBOOT_CHECK
* SAVE_FEEDBACK

These options are described in the following sections.
10.6.1 The NONE Option
If you select the default SHUTDOWN option NONE, the shutdown procedure performs the normal operations for shutting down a standalone computer. If you want to shut down a computer that you expect will rejoin the cluster shortly, you can specify the default option NONE. In that case, cluster quorum is not adjusted because the operating system assumes that the computer will soon rejoin the cluster.
In response to the "Shutdown options [NONE]:" prompt, you can
specify the DISABLE_AUTOSTART=n option, where n is
the number of minutes before autostart queues are disabled in the
shutdown sequence. For more information about this option, see
Section 7.13.
10.6.2 The REMOVE_NODE Option
If you want to shut down a computer that you expect will not rejoin the cluster for an extended period, use the REMOVE_NODE option. For example, a computer may be waiting for new hardware, or you may decide that you want to use a computer for standalone operation indefinitely.
When you use the REMOVE_NODE option, the active quorum in the remainder of the cluster is adjusted downward to reflect the fact that the removed computer's votes no longer contribute to the quorum value. The shutdown procedure readjusts the quorum by issuing the SET CLUSTER/EXPECTED_VOTES command, which is subject to the usual constraints described in Section 10.11.
Note: The system manager is still responsible for
changing the EXPECTED_VOTES system parameter on the remaining OpenVMS
Cluster computers to reflect the new configuration.
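For example, if the removed computer contributed one vote, you might update MODPARAMS.DAT on each remaining member so that the change survives the next AUTOGEN run (the value shown is illustrative):

EXPECTED_VOTES = 3    ! Was 4 before the node was removed from the cluster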
10.6.3 The CLUSTER_SHUTDOWN Option
When you choose the CLUSTER_SHUTDOWN option, the computer completes all shutdown activities up to the point where the computer would leave the cluster in a normal shutdown situation. At this point, the computer waits until all other nodes in the cluster have reached the same point. When all nodes have completed their shutdown activities, the entire cluster dissolves in one synchronized operation. The advantage of this approach is that individual nodes do not complete shutdown independently, and thus do not trigger state transitions or potentially leave the cluster without quorum.
When performing a CLUSTER_SHUTDOWN you must specify this option on
every OpenVMS Cluster computer. If any computer is not included,
clusterwide shutdown cannot occur.
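One way to request the option on every member in a single step is through SYSMAN, assuming a command along the following lines (the timing and reason text are illustrative):

$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> SHUTDOWN NODE/CLUSTER_SHUTDOWN/MINUTES_TO_SHUTDOWN=5 -
_SYSMAN> /REASON="Clusterwide maintenance shutdown"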
10.6.4 The REBOOT_CHECK Option
When you choose the REBOOT_CHECK option, the shutdown procedure checks for the existence of basic system files that are needed to reboot the computer successfully and notifies you if any files are missing. You should replace such files before proceeding. If all files are present, the following informational message appears:
%SHUTDOWN-I-CHECKOK, Basic reboot consistency check completed.
Note: You can use the REBOOT_CHECK option separately
or in conjunction with either the REMOVE_NODE or the CLUSTER_SHUTDOWN
option. If you choose REBOOT_CHECK with one of the other options, you
must specify the options in the form of a comma-separated list.
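For example, at the "Shutdown options [NONE]:" prompt described in Section 10.6.1, you might enter both options together:

Shutdown options [NONE]: REMOVE_NODE,REBOOT_CHECK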
10.6.5 The SAVE_FEEDBACK Option
Use the SAVE_FEEDBACK option to enable the AUTOGEN feedback operation.
Note: Select this option only when a computer has been running long enough to reflect your typical work load.
Reference: For detailed information about AUTOGEN
feedback, see the HP OpenVMS System Manager's Manual.
10.6.6 Shutting Down TCP/IP
Where clusters use IP as the interconnect, shutting down the TCP/IP connection results in loss of connection between the node and the other members of the cluster. As a result, the cluster hangs because quorum is lost, leading to a CLUEXIT crash. Therefore, ensure that all software applications are closed before shutting down TCP/IP.
Shut down TCP/IP as shown:
$ @SYS$MANAGER:TCPIP$CONFIG

        Checking TCP/IP Services for OpenVMS configuration database files.

        HP TCP/IP Services for OpenVMS Configuration Menu

        Configuration options:

                 1  -  Core environment
                 2  -  Client components
                 3  -  Server components
                 4  -  Optional components

                 5  -  Shutdown HP TCP/IP Services for OpenVMS
                 6  -  Startup HP TCP/IP Services for OpenVMS
                 7  -  Run tests

                 A  -  Configure options 1 - 4
                [E] -  Exit configuration procedure

Enter configuration option: 5
Begin Shutdown...

TCPIP$SHUTDOWN has detected the presence of IPCI configuration file:

        SYS$SYSROOT:[SYSEXE]TCPIP$CLUSTER.DAT;

If you are using TCP/IP as your only cluster communication channel,
then stopping TCP/IP will cause this system to CLUEXIT.
Remote systems may also CLUEXIT.

Non-interactive. Continuing with TCP/IP shutdown ...
10.7 Dump Files
Whether your OpenVMS Cluster system uses a single common system disk or multiple system disks, you should plan a strategy to manage dump files.
10.7.1 Controlling Size and Creation
Dump-file management is especially important for large clusters with a single system disk. For example, on an OpenVMS Alpha computer with 1 GB of memory, AUTOGEN creates a dump file in excess of 350,000 blocks.
In the event of a software-detected system failure, each computer normally writes the contents of memory as a compressed selective dump file on its system disk for analysis. AUTOGEN calculates the size of the file based on the size of physical memory and the number of processes. If system disk space is limited (as is probably the case if a single system disk is used for a large cluster), you may want to specify that no dump file be created for satellites.
You can control dump-file size and creation for each computer by specifying appropriate values for the AUTOGEN symbols DUMPSTYLE and DUMPFILE in the computer's MODPARAMS.DAT file. For example, specify dump files as shown in Table 10-4.
Value Specified | Result |
---|---|
DUMPSTYLE = 9 | Compressed selective dump file created (default) |
DUMPFILE = 0 | No dump file created |
DUMPFILE = n | Dump file of size n created |
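For instance, to prevent AUTOGEN from creating a dump file on a satellite, the satellite's MODPARAMS.DAT might contain the single line:

DUMPFILE = 0    ! Do not create a dump file on this satellite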
Refer to the HP OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems for more information on dump files and Dump Off System Disk (DOSD).
Caution: Although you can configure computers without dump files, the lack of a dump file can make it difficult or impossible to determine the cause of a system failure.
The recommended method for controlling dump file size and location is using AUTOGEN and MODPARAMS.DAT. However, if necessary, the SYSGEN utility can be used explicitly. The following example shows the use of SYSGEN to modify the system dump-file size on large-memory systems:
$ MCR SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SET DUMPSTYLE 9
SYSGEN> WRITE CURRENT
SYSGEN> CREATE SYS$SYSTEM:SYSDUMP.DMP/SIZE=350000
SYSGEN> EXIT
$ @SHUTDOWN
The dump-file size of 350,000 blocks is sufficient to cover about 1 GB of memory. This size is usually large enough to encompass the information needed to analyze a system failure.
After the system reboots, you can purge SYSDUMP.DMP.
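For example, to remove the superseded version left by the SYSGEN CREATE command:

$ PURGE SYS$SYSTEM:SYSDUMP.DMP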