Satellite nodes can be set up to reboot automatically when recovering from system failures or power failures.
Reboot behavior varies from system to system. Many systems provide a console variable that allows you to specify which device to boot from by default. However, some systems have predefined boot "sniffers" that automatically detect a bootable device. The following table describes the rebooting conditions.
AUTOGEN includes a mechanism called feedback. This mechanism examines data collected during normal system operations, and it adjusts system parameters on the basis of the collected data whenever you run AUTOGEN with the feedback option. For example, the system records each instance of a disk server waiting for buffer space to process a disk request. Based on this information, AUTOGEN can size the disk server's buffer pool automatically to ensure that sufficient space is allocated.
Execute SYS$UPDATE:AUTOGEN.COM manually as described in the
HP OpenVMS System Manager's Manual.
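When running AUTOGEN manually, the feedback option is requested on the command line. The following is a hedged illustration (the phase names follow the AUTOGEN documentation; the exact phases you run depend on whether you want an immediate reboot):

```
$ ! Collect feedback data from the running system, recalculate
$ ! parameters, and apply them without rebooting. Review
$ ! SYS$SYSTEM:AGEN$PARAMS.REPORT afterward for the changes made.
$ @SYS$UPDATE:AUTOGEN SAVPARAMS SETPARAMS FEEDBACK
```

To reboot immediately with the new values, specify REBOOT as the end phase instead of SETPARAMS.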
8.7.1 Advantages
To ensure that computers are configured adequately when they first join the cluster, you can run AUTOGEN with feedback automatically as part of the initial boot sequence. Although this step adds an additional reboot before the computer can be used, the computer's performance can be substantially improved.
HP strongly recommends that you use the feedback option. Without feedback, it is difficult for AUTOGEN to anticipate patterns of resource usage, particularly in complex configurations. Factors such as the number of computers and disks in the cluster and the types of applications being run require adjustment of system parameters for optimal performance.
HP also recommends using AUTOGEN with feedback rather than the SYSGEN utility to modify system parameters, because AUTOGEN:
When a computer is first added to an OpenVMS Cluster, system parameters that control the computer's system resources are normally adjusted in several steps, as follows:
Because the first AUTOGEN operation (initiated by either
CLUSTER_CONFIG_LAN.COM or CLUSTER_CONFIG.COM) is performed both in the
minimum environment and without feedback, a newly added computer may be
inadequately configured to run in the OpenVMS Cluster environment. For
this reason, you might want to implement additional configuration
measures like those described in Section 8.7.3 and Section 8.7.4.
8.7.3 Obtaining Reasonable Feedback
When a computer first boots into an OpenVMS Cluster, much of the computer's resource utilization is determined by the current OpenVMS Cluster configuration. Factors such as the number of computers, the number of disk servers, and the number of disks available or mounted establish a fixed minimum resource requirement. Because this minimum does not change with continued use of the computer, feedback information about the required resources is immediately valid.
Other feedback information, however, such as that influenced by normal user activity, is not immediately available, because the only "user" has been the system startup process. If AUTOGEN were run with feedback at this point, some system values might be set too low.
By running a simulated user load at the end of the first production boot, you can ensure that AUTOGEN has reasonable feedback information. The User Environment Test Package (UETP) supplied with your operating system contains a test that simulates such a load. You can run this test (the UETP LOAD phase) as part of the initial production boot, and then run AUTOGEN with feedback before a user is allowed to log in.
To implement this technique, you can create a command file like that in
step 1 of the procedure in Section 8.7.4, and submit the file to the
computer's local batch queue from the cluster common SYSTARTUP
procedure. Your command file conditionally runs the UETP LOAD phase and
then reboots the computer with AUTOGEN feedback.
8.7.4 Creating a Command File to Run AUTOGEN
As shown in the following sample file, UETP lets you specify a typical user load to be run on the computer when it first joins the cluster. The UETP run generates data that AUTOGEN uses to set appropriate system parameter values for the computer when rebooting it with feedback. Note, however, that the default setting for the UETP user load assumes that the computer is used as a timesharing system. This calculation can produce system parameter values that might be excessive for a single-user workstation, especially if the workstation has large memory resources. Therefore, you might want to modify the default user load setting, as shown in the sample file.
Follow these steps:
$!
$! ***** SYS$COMMON:[SYSMGR]UETP_AUTOGEN.COM *****
$!
$! For initial boot only, run UETP LOAD phase and
$! reboot with AUTOGEN feedback.
$!
$ SET NOON
$ SET PROCESS/PRIVILEGES=ALL
$!
$! Run UETP to simulate a user load for a satellite
$! with 8 simultaneously active user processes. For a
$! CI connected computer, allow UETP to calculate the load.
$!
$ LOADS = "8"
$ IF F$GETDVI("PAA0:","EXISTS") THEN LOADS = ""
$ @UETP LOAD 1 'loads'
$!
$! Create a marker file to prevent resubmission of
$! UETP_AUTOGEN.COM at subsequent reboots.
$!
$ CREATE SYS$SPECIFIC:[SYSMGR]UETP_AUTOGEN.DONE
$!
$! Reboot with AUTOGEN to set SYSGEN values.
$!
$ @SYS$UPDATE:AUTOGEN SAVPARAMS REBOOT FEEDBACK
$!
$ EXIT
$!
$ NODE = F$GETSYI("NODE")
$ IF F$SEARCH ("SYS$SPECIFIC:[SYSMGR]UETP_AUTOGEN.DONE") .EQS. ""
$ THEN
$     SUBMIT /NOPRINT /NOTIFY /USERNAME=SYSTEST -
_$    /QUEUE='NODE'_BATCH SYS$MANAGER:UETP_AUTOGEN
$ WAIT_FOR_UETP:
$     WRITE SYS$OUTPUT "Waiting for UETP and AUTOGEN... ''F$TIME()'"
$     WAIT 00:05:00.00      ! Wait 5 minutes
$     GOTO WAIT_FOR_UETP
$ ENDIF
$!
When you boot the computer, it runs UETP_AUTOGEN.COM to simulate the user load you have specified, and it then reboots with AUTOGEN feedback to set appropriate system parameter values.
This chapter provides guidelines for building OpenVMS Cluster systems that include many computers (approximately 20 or more) and describes procedures that you might find helpful. (Refer to the Software Product Description (SPD) for OpenVMS Cluster Software for configuration limitations.) Typically, such OpenVMS Cluster systems include a large number of satellites.
Note that the recommendations in this chapter also can prove beneficial in some clusters with fewer than 20 computers. Areas of discussion include:
When building a new large cluster, you must be prepared to run AUTOGEN and reboot the cluster several times during the installation. The parameters that AUTOGEN sets for the first computers added to the cluster will probably be inadequate when additional computers are added. Readjustment of parameters is critical for boot and disk servers.
One solution to this problem is to run the UETP_AUTOGEN.COM command procedure (described in Section 8.7.4) to reboot computers at regular intervals as new computers or storage interconnects are added. For example, each time there is a 10% increase in the number of computers, storage, or interconnects, you should run UETP_AUTOGEN.COM. For best results, the last time you run the procedure should be as close as possible to the final OpenVMS Cluster environment.
To set up a new, large OpenVMS Cluster, follow these steps:
Step | Task |
---|---|
1 | Configure boot and disk servers using the CLUSTER_CONFIG_LAN.COM or the CLUSTER_CONFIG.COM command procedure (described in Chapter 8). |
2 | Install all layered products and site-specific applications required for the OpenVMS Cluster environment, or as many as possible. |
3 | Prepare the cluster startup procedures so that they are as close as possible to those that will be used in the final OpenVMS Cluster environment. |
4 | Add a small number of satellites (perhaps two or three) using the cluster configuration command procedure. |
5 | Reboot the cluster to verify that the startup procedures work as expected. |
6 | After you have verified that startup procedures work, run UETP_AUTOGEN.COM on every computer's local batch queue to reboot the cluster again and to set initial production environment values. When the cluster has rebooted, all computers should have reasonable parameter settings. However, check the settings to be sure. |
7 | Add additional satellites to double their number. Then rerun UETP_AUTOGEN on each computer's local batch queue to reboot the cluster, and set values appropriately to accommodate the newly added satellites. |
8 | Repeat the previous step until all satellites have been added. |
9 | When all satellites have been added, run UETP_AUTOGEN a final time on each computer's local batch queue to reboot the cluster and to set new values for the production environment. |
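The per-node submissions in steps 6 through 9 can be driven from a single SYSMAN session rather than by logging in to each node. The following is a hedged sketch: the node names are placeholders, and SYS$BATCH is assumed to be defined on each node as its local batch queue.

```
$ ! Submit UETP_AUTOGEN.COM on a set of satellites from one session.
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/NODE=(SAT01,SAT02,SAT03)  ! placeholder names
SYSMAN> DO SUBMIT/NOPRINT/USERNAME=SYSTEST/QUEUE=SYS$BATCH -
_SYSMAN> SYS$MANAGER:UETP_AUTOGEN
SYSMAN> EXIT
```

Because the DO command executes on each node in the environment, the SUBMIT runs against each node's own queue definition.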
For best performance, do not run UETP_AUTOGEN on every computer simultaneously, because the procedure simulates a user load that is probably more demanding than that for the final production environment. A better method is to run UETP_AUTOGEN on several satellites (those with the least recently adjusted parameters) while adding new computers. This technique increases efficiency because little is gained when a satellite reruns AUTOGEN shortly after joining the cluster.
For example, if the entire cluster is rebooted after 30 satellites have
been added, few adjustments are made to system parameter values for the
28th satellite added, because only two satellites have joined the
cluster since that satellite ran UETP_AUTOGEN as part of its initial
configuration.
9.2 General Booting Considerations
Two general booting considerations, concurrent booting and minimizing
boot time, are described in this section.
9.2.1 Concurrent Booting
Concurrent booting occurs after a power failure or site failure, when all of the nodes are rebooted simultaneously. The simultaneous reboots place a significant I/O load on the interconnects and generate network activity from the SCS traffic required for synchronization. All satellites wait for a boot server to reload the operating system; as soon as a boot server becomes available, the satellites begin to boot in parallel, which can substantially lengthen the elapsed time before users can log in.
9.2.2 Minimizing Boot Time
A large cluster must be carefully configured so that there is sufficient capacity to boot the desired number of nodes in the desired amount of time. For example, 96 satellites rebooting at once can create an I/O bottleneck that stretches the OpenVMS Cluster reboot time into hours. The following list provides a few methods to minimize boot times.
OpenVMS Cluster systems can use the TCP/IP stack to communicate with other nodes in the cluster and to carry SCS traffic. Before a node can use TCP/IP for cluster communication, it must be configured to do so; for details on how to configure a node to use OpenVMS Cluster over IP, see Section 8.2.3.1. After this feature is enabled, the TCP/IP stack is loaded early in the boot sequence: the OpenVMS executive has been modified to load the TCP/IP execlets early in boot so that the node can exchange SCS messages with the existing nodes of the cluster. This feature also relies on configuration files that are loaded during boot, so ensure that these files are generated correctly during configuration. The following are some considerations for booting.
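One way to confirm that a node is set up for cluster communication over IP is to check the NISCS_USE_UDP system parameter and the presence of the boot-time configuration file. This sketch assumes the file name used by recent OpenVMS versions; verify it against your own configuration:

```
$ ! NISCS_USE_UDP = 1 enables IP cluster communication
$ ! (set via MODPARAMS.DAT and AUTOGEN, not by writing it directly).
$ MCR SYSGEN SHOW NISCS_USE_UDP
$ ! Confirm that the boot-time IP cluster configuration file exists.
$ DIRECTORY SYS$SYSTEM:PE$IP_CONFIG.DAT
```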
OpenVMS Cluster satellite nodes use a single LAN adapter for the initial stages of booting. If a satellite is configured with multiple LAN adapters, the system manager can specify with the console BOOT command which adapter to use for the initial stages of booting. Once the system is running, the OpenVMS Cluster uses all available LAN adapters. This flexibility allows you to work around broken adapters or network problems.
For Alpha and Integrity cluster satellites, the network boot device cannot be a prospective member of a LAN Failover Set. For example, if you create a LAN Failover Set, LLA consisting of EWA and EWB, to be active when the system boots, you cannot boot the system as a satellite over the LAN devices EWA or EWB.
The procedures and utilities for configuring and booting satellite
nodes vary between Integrity servers and Alpha systems.
9.3.1 Differences between Alpha and Integrity server Satellites
Table 9-1 lists the differences between Alpha and Integrity server satellites.
Feature | Alpha | Integrity servers |
---|---|---|
Boot Protocol | MOP | PXE(BOOTP/DHCP/TFTP) |
Crash Dumps | May crash to remote system disk or to local disk via Dump Off the System Disk (DOSD) | Requires DOSD. Crashing to the remote disk is not possible. |
Error Log Buffers | Always written to the remote system disk | Error log buffers are written to the same disk as DOSD |
File protections | No different than standard system disk | Requires that all loadable execlets are W:RE (the default case) and that certain files have ACL access via the VMS$SATELLITE_ ACCESS identifier |
Complete the items in the following table (Table 9-2) before proceeding with satellite booting.
Step | Action |
---|---|
1 | Configure disk server LAN adapters. Because disk-serving activity in an OpenVMS Cluster system can generate a substantial amount of I/O traffic on the LAN, boot and disk servers should use the highest-bandwidth LAN adapters in the cluster. The servers can also use multiple LAN adapters in a single system to distribute the load across the LAN adapters. The following list suggests ways to provide sufficient network bandwidth: |
2 | If the MOP server node and system-disk server node are not already configured as cluster members, follow the directions in Section 8.4 for using the cluster configuration command procedure to configure each of the Alpha nodes. Include multiple boot and disk servers to enhance availability and distribute I/O traffic over several cluster nodes. |
3 | Configure additional memory for disk serving. |
4 | Run the cluster configuration procedure on the Alpha node for each satellite you want to boot into the OpenVMS Cluster. |
To boot a satellite, enter the following command:
>>> BOOT LAN-adapter-device-name
In this example, LAN-adapter-device-name can be any valid LAN adapter name, for example, EZA0 or XQB0.
If you need to perform a conversational boot, use the command shown. At the Alpha system console prompt (>>>), enter:
>>> b -flags 0,1 eza0
In this example, -flags stands for the flags command-line qualifier, which takes two values:
The argument eza0 is the LAN adapter to be used for booting.
Finally, notice that a load file is not specified in this boot command line. For satellite booting, the load file is part of the node description in the DECnet or LANCP database.
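For a LANCP-booted satellite, the node description referred to above is created with the LANCP utility. The following is a hedged sketch; the node name, hardware address, and system root are placeholders, and the qualifiers shown are the commonly documented ones for Alpha satellites:

```
$ MCR LANCP
LANCP> DEFINE NODE SAT01 /ADDRESS=08-00-2B-12-34-56 -
_LANCP> /ROOT=$1$DGA1:<SYS10.> /FILE=APB.EXE -
_LANCP> /BOOT_TYPE=ALPHA_SATELLITE
LANCP> LIST NODE SAT01          ! verify the permanent-database entry
LANCP> EXIT
```

DEFINE NODE writes the permanent LANCP database; a matching SET NODE command updates the volatile database so the change takes effect without restarting LANCP.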