Guidelines for OpenVMS Cluster Configurations



7.9.1 Configuring Additional Cluster Nodes to Boot with a Shared FC Disk (Integrity servers Only)

To configure additional nodes to boot from a shared FC disk in an OpenVMS Cluster system, HP requires that you run the OpenVMS Integrity servers Boot Manager (BOOT_OPTIONS.COM).

After you have enabled clustering on a single or standalone system, you can add Integrity server nodes that boot from the shared FC disk, as follows:

  1. Boot the HP OpenVMS Installation Disk on the target node.
  2. From the OpenVMS Installation Menu, choose Option 7 "Execute DCL commands and procedures."
  3. Follow the instructions in Section 7.7.3. Make sure that you set the correct system root when asked to enter the OpenVMS boot flags.

    Note

    The OpenVMS Integrity servers Boot Manager (BOOT_OPTIONS.COM) utility requires the shared FC disk to be mounted. If the shared FC disk is not mounted cluster-wide, the utility will try to mount the disk with a /NOWRITE option. If the shared FC disk is already mounted cluster-wide, user intervention is required. For more information on this utility, refer to the OpenVMS System Manager's Manual, Volume 1: Essentials.

7.9.2 Online Reconfiguration

The FC interconnect can be reconfigured while the hosts are running OpenVMS. This includes the ability to:

OpenVMS does not automatically detect most FC reconfigurations. You must use the following procedure to safely perform an FC reconfiguration, and to ensure that OpenVMS has adjusted its internal data structures to match the new state:

  1. Dismount all disks that are involved in the reconfiguration.
  2. Perform the reconfiguration.
  3. Enter the following commands on each host that is connected to the Fibre Channel:


    SYSMAN> IO SCSI_PATH_VERIFY 
    SYSMAN> IO AUTOCONFIGURE 
    

The purpose of the SCSI_PATH_VERIFY command is to check each FC path in the system's IO database to determine whether the attached device has been changed. If a device change is detected, then the FC path is disconnected in the IO database. This allows the path to be reconfigured for a new device by using the IO AUTOCONFIGURE command.
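The verify-then-autoconfigure sequence can be pictured with a small Python model. This is only a conceptual sketch, not OpenVMS code: the names PathEntry, scsi_path_verify, and autoconfigure, the path labels, and the WWID strings are all hypothetical, and the real I/O database is considerably more involved.

```python
# Toy model of the verify-then-autoconfigure sequence; not OpenVMS code.
# All names and identifiers here are hypothetical.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PathEntry:
    recorded_wwid: Optional[str]  # device identity stored in the I/O database
    connected: bool = True


def scsi_path_verify(io_db: Dict[str, PathEntry], fabric: Dict[str, str]) -> None:
    """Disconnect every path whose attached device no longer matches the record."""
    for path, entry in io_db.items():
        if entry.connected and fabric.get(path) != entry.recorded_wwid:
            entry.connected = False
            entry.recorded_wwid = None


def autoconfigure(io_db: Dict[str, PathEntry], fabric: Dict[str, str]) -> None:
    """Configure disconnected or unknown paths for the device that is now present."""
    for path, wwid in fabric.items():
        entry = io_db.setdefault(path, PathEntry(recorded_wwid=None, connected=False))
        if not entry.connected:
            entry.recorded_wwid = wwid
            entry.connected = True


io_db = {"path_a": PathEntry("wwid-1"), "path_b": PathEntry("wwid-2")}
fabric = {"path_a": "wwid-1", "path_b": "wwid-9"}  # device behind path_b was replaced

scsi_path_verify(io_db, fabric)
stale_after_verify = [p for p, e in io_db.items() if not e.connected]
autoconfigure(io_db, fabric)
```

In this model, a path that still holds a stale record is skipped by the autoconfigure step, which is why the verify step must run first.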

Note

In the current release, the SCSI_PATH_VERIFY command operates only on FC disk devices. It does not operate on generic FC devices, such as the HSG command console LUN (CCL). (Generic FC devices have names such as $1$GGAnnnnn.) This means that once the CCL of an HSG has been configured by OpenVMS with a particular device identifier, its device identifier should not be changed.

7.9.3 HSG Host Connection Table and Devices Not Configured

When a Fibre Channel host bus adapter is connected (through a Fibre Channel switch) to an HSG controller, the HSG controller creates an entry in the HSG connection table. There is a separate connection for each host bus adapter, and for each HSG port to which the adapter is connected. (Refer to the HSG CLI command SHOW CONNECTIONS for more information.)

Once an HSG connection exists, you can modify its parameters by using commands that are described in the HSG Array Controller ACS Configuration and CLI Reference Guide. Since a connection can be modified, the HSG does not delete connection information from the table when a host bus adapter is disconnected. Instead, when the user is done with a connection, the user must explicitly delete the connection using a CLI command.

The HSG controller supports a limited number of connections: ACS V8.5 allows a maximum of 64 connections and ACS V8.4 allows a maximum of 32 connections. The connection limit is the same for both single- and dual-redundant controllers. Once the maximum number of connections is reached, then new connections will not be made. When this happens, OpenVMS will not configure disk devices, or certain paths to disk devices, on the HSG.

The solution to this problem is to delete old connections that are no longer needed. However, if your Fibre Channel fabric is large and the number of active connections exceeds the HSG limit, then you must reconfigure the fabric or use FC switch zoning to "hide" some adapters from some HSG ports to reduce the number of connections.
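When planning a fabric, the connection count can be estimated before any hardware is cabled. The following Python sketch assumes, as described above, one connection per pairing of a host bus adapter with an HSG port that it can reach. The function names, adapter labels, and port labels are illustrative only, and stale entries left by disconnected adapters are not counted.

```python
# Sketch: estimate HSG connection-table usage for a planned fabric.
# The limits come from the text above; all names are illustrative.
from typing import Dict, Set

ACS_CONNECTION_LIMIT = {"V8.5": 64, "V8.4": 32}


def connection_count(visible_ports: Dict[str, Set[str]]) -> int:
    """One connection per (host bus adapter, HSG port) pair that can see each other."""
    return sum(len(ports) for ports in visible_ports.values())


def fits_in_table(visible_ports: Dict[str, Set[str]], acs_version: str) -> bool:
    """True if the planned connections stay within the limit for this ACS version."""
    return connection_count(visible_ports) <= ACS_CONNECTION_LIMIT[acs_version]


# Hypothetical fabric: 20 adapters, each able to reach four HSG ports.
unzoned = {f"adapter{i}": {"p1", "p2", "p3", "p4"} for i in range(20)}

# Same fabric with switch zoning that hides two ports from every adapter.
zoned = {f"adapter{i}": {"p1", "p2"} for i in range(20)}

unzoned_total = connection_count(unzoned)
zoned_total = connection_count(zoned)
```

In this hypothetical case the unzoned fabric needs 80 connections, which exceeds the ACS V8.5 limit, while the zoned fabric needs 40.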

7.10 Using Interrupt Coalescing for I/O Performance Gains (Alpha Only)

Starting with OpenVMS Alpha Version 7.3-1, interrupt coalescing is supported for the KGPSA host adapters and is off by default. Interrupt coalescing can improve performance in environments with high I/O workloads by enabling the adapter to reduce the number of interrupts seen by a host. This feature is implemented in the KGPSA firmware.

You can read and modify the current settings for interrupt coalescing by means of the Fibre Channel Control Program (FC$CP). You must have the CMKRNL privilege to use FC$CP.

If you specify a response count and a delay time (in milliseconds) with FC$CP, the adapter defers interrupting the host until that number of responses is available or until that amount of time has passed, whichever occurs first.
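The "whichever occurs first" rule can be illustrated with a short Python simulation. This is a sketch of the rule as stated above, not of the KGPSA firmware. It assumes that the delay timer starts when the first response of a batch arrives. The function name and the sample completion times are hypothetical.

```python
# Sketch of the coalescing rule: interrupt after response_count responses
# or after delay_ms, whichever occurs first. Not the KGPSA firmware logic.
from typing import List, Optional


def interrupt_times(completions_ms: List[float], response_count: int,
                    delay_ms: float) -> List[float]:
    """Return the times at which the modeled adapter interrupts the host.

    Assumption: the delay timer starts when the first response of a batch arrives.
    """
    interrupts: List[float] = []
    pending = 0
    deadline: Optional[float] = None
    for t in sorted(completions_ms):
        # The timer ran out before this response arrived: flush the earlier batch.
        if pending and t > deadline:
            interrupts.append(deadline)
            pending = 0
        if pending == 0:
            deadline = t + delay_ms
        pending += 1
        # Enough responses have accumulated: interrupt immediately.
        if pending >= response_count:
            interrupts.append(t)
            pending = 0
    if pending:
        interrupts.append(deadline)
    return interrupts


# Eight responses arrive close together, then a single response much later.
busy_then_idle = interrupt_times(
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 5.0], response_count=8, delay_ms=1.0)

# Two isolated responses: each one waits for the full delay.
sparse = interrupt_times([0.0, 3.0], response_count=8, delay_ms=1.0)
```

In the busy case, eight completions share one interrupt. In the sparse case, each completion is held for the full delay, which is the latency cost that an application doing synchronous I/O would see.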

Interrupt coalescing may cause a performance degradation to an application that does synchronous I/O. If no other I/O is going through a given KGPSA, the latency for single writes is an average of 900 microseconds longer with interrupt coalescing enabled (or higher depending on the selected response interval).

Interrupt coalescing is set on a per-KGPSA basis. You should have an average of at least 2000 I/Os per second through a given KGPSA before enabling interrupt coalescing.

The format of the command is:

RUN SYS$ETC:FC$CP FGx enable-value [delay][response-count]

In this format:

OpenVMS recommends the following settings for the FC$CP command:


$  RUN SYS$ETC:FC$CP FGx 2 1 8 

7.11 Using Fast Path in Your Configuration

Fast Path support was introduced for Fibre Channel in OpenVMS Alpha Version 7.3 and is enabled by default. It is designed for use in a symmetric multiprocessor system (SMP). When Fast Path is enabled, the I/O completion processing can occur on all the processors in the SMP system instead of only on the primary CPU. Fast Path substantially increases the potential I/O throughput on an SMP system, and helps to prevent the primary CPU from becoming saturated.

Fast Path also supports the following features, which include additional optimizations, preallocation of resources, and an optimized code path for mainline code:

You can manage Fast Path programmatically using Fast Path system services. You can also manage Fast Path with DCL commands and by using the system parameters FAST_PATH and FAST_PATH_PORTS. For more information about using Fast Path, see the HP OpenVMS I/O User's Reference Manual.

7.12 FIBRE_SCAN Utility for Displaying Device Information

FIBRE_SCAN.EXE displays information about all storage devices attached to Fibre Channel on the system; both configured and nonconfigured devices are included. The displayed information includes such data as the Fibre Channel target and LUN values, the vendor and product ID, device type, port and device worldwide identifiers (WWIDs), serial number, firmware revision level, and port login state. While the program primarily describes disk and tape devices, some limited information is also displayed for controller and other generic ($n$GGAn) devices.

FIBRE_SCAN can be invoked in two modes:


$ MCR SYS$ETC:FIBRE_SCAN        ! Scans all ports on the Fibre Channel. 
$ MCR SYS$ETC:FIBRE_SCAN  PGx ! Scans only port x on the Fibre Channel. 

FIBRE_SCAN requires the CMKRNL and LOG_IO privileges.

To capture the FIBRE_SCAN output in a file, use a command such as the following before invoking FIBRE_SCAN:


$ DEFINE/USER SYS$OUTPUT xxx.log 

FIBRE_SCAN is a display-only utility. It cannot load device drivers or otherwise configure devices on the Fibre Channel. To configure devices, use the SYSMAN IO AUTOCONFIGURE command.

The System Dump Analyzer (SDA) command, FC PERFORMANCE, was introduced in OpenVMS Version 8.2-1. FC PERFORMANCE is used to display I/O performance characteristics of DGA devices.

FC PERFORMANCE is also available in Version 8.2 of OpenVMS Alpha and OpenVMS Integrity servers. Furthermore, this command is available for OpenVMS Alpha Version 7.3-2 in the FIBRE_SCSI_V400 Fibre Channel patch kit. The Fibre Channel drivers of these supported versions keep a performance array for every configured disk.

You can use this SDA command to display I/O performance characteristics of a named DGA device or of all DGA devices that are configured on the system. If you omit the device name, then the performance data of all DGA disks with any nonzero data is displayed, regardless of whether the devices are currently mounted. FC PERFORMANCE arrays keep counts of I/O latency across LUNs, I/O size, and I/O direction (read/write).

I/O latency is measured from the time a request is queued to the host FC adapter until the completion interrupt occurs. Modern disk drives have an average latency in the range of 5-10 ms. Caches located in disk controllers (HSG/EVA/MSA/XP) and in the physical disk drives can occasionally produce significantly lower access times.

By default, the FC PERFORMANCE command uses the /SYSTIME qualifier to measure latency. The /SYSTIME qualifier uses EXE$GQ_SYSTIME, which is updated every millisecond. If I/O completes in less than 1 ms, it appears to have completed in zero time. When /SYSTIME is used, I/O operations that complete in less than 1 ms are shown in the display in the <2us column, where us represents microsecond.

To achieve greater accuracy, you can use the /RSCC qualifier, which uses the System Cycle Counter timer. The command qualifiers, including the timers, are described in Table 7-2.

Table 7-2 FC PERFORMANCE Command Qualifiers
Qualifier Description
[device-name] Device whose performance characteristics are to be displayed. You can specify only one DGA device name. If you omit the name, then the performance data for every DGA device configured on the system is displayed, provided the array contains nonzero data. This includes the performance data for DGA devices that are not currently mounted.
[/RSCC|/SYSTIME] Two time qualifiers are available. The /RSCC qualifier uses a PAL call, Read System Cycle Counter, to get a CPU cycle counter. This is highly accurate but incurs the cost, in time, of two expensive PAL calls per I/O operation. To display time resolution below 1 ms, use this qualifier. The /SYSTIME qualifier uses the OpenVMS system time, which is updated every millisecond; it is the default.
[/COMPRESS] Suppresses the screen display of columns that contain only zeroes.
[/CSV] Causes output to be written to a comma-separated values (CSV) file, which can be read into a Microsoft® Excel spreadsheet or can be graphed.
[/CLEAR] Clears the performance array. This qualifier is useful when you are testing I/O performance on your system. Before starting a new test, you can clear the performance array for the device whose performance you are measuring and then immediately dump the contents of its performance array when the test is completed.

The following example shows the write and read output for device $1$DGA4321 resulting from the use of the SDA command FC PERFORMANCE with the /COMPRESS qualifier.

In the display, LBC (first column heading) represents logical block count. Notice that the LBCs in the rows are powers of 2. A particular row contains all LBC counts up to the count that begins the next row; for example, the LBC 8 row shows the count for all I/Os with LBC 8 through LBC 15. I/Os with an LBC greater than 256 are not shown in the matrix, but they are included in the "total blocks" value at the beginning of the matrix.

The counts in each column represent the number of I/Os that completed in less than the time shown in the column header. For example, the I/Os in the <2ms column took less than 2 ms but more than 1 ms to complete. Similarly, the I/Os in the <4ms column took less than 4 ms but more than 2 ms to complete. The one exception is the <2us column when you use the /SYSTIME qualifier: all I/Os that complete in less than a millisecond are included in the <2us count.

The column headings <2us, <2ms, <4ms, and so on are shown in this display. If there are no values for some of the headings, those columns are not displayed because the /COMPRESS qualifier was used. If the /RSCC qualifier had been used instead of the default /SYSTIME qualifier, additional headings for <4us, <8us, <16us, and <256us would be displayed.
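The row and column rules described above can be restated as a short Python sketch. It models only the default /SYSTIME display, with latencies given in whole milliseconds. The function names are hypothetical. Two details are assumptions: a latency that falls exactly on a boundary (for example, exactly 2 ms) is placed in the higher column, and latencies of 512 ms or more are reported with a placeholder label because the text above does not define the last column precisely.

```python
# Sketch of the matrix layout rules for the default /SYSTIME display.
# Function names are illustrative; boundary handling is an assumption.
from typing import Optional


def lbc_row(lbc: int) -> Optional[int]:
    """Return the row (a power of 2) that counts an I/O of `lbc` blocks.

    Returns None for LBC values above 256, which are not shown in the matrix.
    """
    if lbc < 1:
        raise ValueError("logical block count must be at least 1")
    if lbc > 256:
        return None
    return 1 << (lbc.bit_length() - 1)  # largest power of 2 not exceeding lbc


def systime_column(latency_ms: int) -> str:
    """Return the column heading for a latency measured in whole milliseconds."""
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if latency_ms < 1:
        return "<2us"  # with /SYSTIME, every sub-millisecond I/O lands here
    upper = 2
    while upper <= latency_ms:
        upper *= 2
    if upper > 512:
        return ">=512ms"  # placeholder: last column is not modeled here
    return f"<{upper}ms"
```

For example, lbc_row(15) returns 8, matching the statement that the LBC 8 row covers LBC 8 through LBC 15, and systime_column(3) returns "<4ms".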


SDA> FC PERFORMANCE $1$DGA4321/COMPRESS 
 
Fibre Channel Disk Performance Data 
---------------------------------- 
 
$1$dga4321 (write) 
 
Using EXE$GQ_SYSTIME to calculate the I/O time 
 
    accumulated write time = 2907297312us 
 
    writes = 266709 
 
    total blocks = 1432966 
 
 
I/O rate is less than 1 mb/sec 
 
 
LBC  <2us  <2ms  <4ms  <8ms <16ms <32ms <64ms <128ms <256ms <512ms <1s 
 
===  ===== ====  ====  ==== ===== ===== ===== ====   =====  ===== ==== 
 
  1   46106 20630 12396 13605 13856 15334 14675 8101    777     8    -   145488 
 
  2      52    21     8     9     5     5     6    1      2     -    -      109 
 
  4   40310 13166  3241  3545  3423  3116  2351  977     88     -    -    70217 
 
  8    2213  1355   360   264   205   225   164   82      5     -    -     4873 
 
 16   16202  6897  3283  3553  3184  2863  2323 1012    108     -    1    39426 
 
 32     678   310    36    39    47    44    33   27      6     -    -     1220 
 
 64     105    97    18    26    41    43    42   24      7     -    -      403 
 
128     592  3642   555    60    43    31    23    9      2     -    -     4957 
 
256       -     9     7     -     -     -     -    -      -     -    -       16 
 
 
 
     106258 46127 19904  21101 20804 21661 19617 10233  995     8    1   266709 
    
 
Fibre Channel Disk Performance Data 
 
---------------------------------- 
 
$1$dga4321 (read) 
 
Using EXE$GQ_SYSTIME to calculate the I/O time 
 
    accumulated read time = 1241806687us 
 
    reads = 358490 
 
    total blocks = 1110830 
 
 
I/O rate is less than 1 mb/sec 
 


LBC  <2us  <2ms  <4ms  <8ms  <16ms <32ms <64ms <128ms <256ms <512ms <2s 
=== ===== =====  ===== ===== ===== =====  ===== ===== ====== ====== ==== 
 
  1  46620 12755  6587  7767  3758  2643   1133   198     5    -    -    81466 
 
  2    574   134    66   158    82    20     21     4     1    -    -     1060 
 
  4 162060 35896 20059 18677 15851 11298   5527  1300    25    2    1   270696 
 
  8    355    79    46    97    59    36     28    10     -    -    -      710 
 
 16    241   103    32   150    77    24     13     1     -    -    -      641 
 
 32    916   355    76   302   316    61     25    10     -    -    -     2061 
 
 64    725   380    64   248   140    17     10     3     -    -    -     1587 
 
128     13    22    13    36    21     6      -     -     -    -    -      111 
 
256     10    41    28    15    49    13      2     -     -    -    -      158 
 
  
    211514 49765 26971 27450 20353 14118   6759  1526    31    2    1   358490 
 
SDA> 
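The figures in the write display can be cross-checked with a few lines of Python. The sketch below copies the write rows and summary values from the display above. It treats a hyphen as zero and the unlabeled last field of each row as the row total; both are readings of the display rather than documented behavior. The function and variable names are illustrative.

```python
# Sketch: cross-check the write matrix from the display above.
# Row data and summary values are copied from the display; names are illustrative.
from typing import List, Tuple


def parse_row(line: str) -> Tuple[int, List[int], int]:
    """Split one matrix row into (LBC, per-column counts, row total); '-' means zero."""
    values = [0 if field == "-" else int(field) for field in line.split()]
    return values[0], values[1:-1], values[-1]


write_rows = [
    "  1   46106 20630 12396 13605 13856 15334 14675 8101    777     8    -   145488",
    "  2      52    21     8     9     5     5     6    1      2     -    -      109",
    "  4   40310 13166  3241  3545  3423  3116  2351  977     88     -    -    70217",
    "  8    2213  1355   360   264   205   225   164   82      5     -    -     4873",
    " 16   16202  6897  3283  3553  3184  2863  2323 1012    108     -    1    39426",
    " 32     678   310    36    39    47    44    33   27      6     -    -     1220",
    " 64     105    97    18    26    41    43    42   24      7     -    -      403",
    "128     592  3642   555    60    43    31    23    9      2     -    -     4957",
    "256       -     9     7     -     -     -     -    -      -     -    -       16",
]

parsed = [parse_row(row) for row in write_rows]

# Each row's counts should add up to the row total in the last field.
rows_consistent = all(sum(counts) == total for _, counts, total in parsed)

# The row totals should add up to the "writes" value in the summary.
writes = 266709
grand_total = sum(total for _, _, total in parsed)

# Average latency per write, from the accumulated write time in microseconds.
accumulated_write_time_us = 2907297312
average_write_us = accumulated_write_time_us / writes
```

For this display, every row is consistent, the row totals add up to the 266709 writes reported in the summary, and the average works out to roughly 10.9 ms per write.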


Chapter 8
Configuring OpenVMS Clusters for Availability

Availability is the percentage of time that a computing system provides application service. By taking advantage of OpenVMS Cluster features, you can configure your OpenVMS Cluster system for various levels of availability, including disaster tolerance.

This chapter provides strategies and sample optimal configurations for building a highly available OpenVMS Cluster system. You can use these strategies and examples to help you make choices and tradeoffs that enable you to meet your availability requirements.
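Because availability is defined as a percentage of time, it can be converted into an annual downtime figure when you compare requirements. The following Python sketch shows the arithmetic. The function names and the sample figures are illustrative, and the year is taken as 365 days.

```python
# Sketch: convert between availability (percent) and hours of downtime per year.
# Names and sample values are illustrative; a 365-day year is assumed.
HOURS_PER_YEAR = 365 * 24  # 8760


def availability_percent(uptime_hours: float, total_hours: float) -> float:
    """Percentage of time that the system provided application service."""
    return 100.0 * uptime_hours / total_hours


def downtime_hours_per_year(availability_pct: float) -> float:
    """Hours per year that the service is unavailable at a given availability."""
    return HOURS_PER_YEAR * (1.0 - availability_pct / 100.0)


# A system that was down for 8.76 hours in a year.
measured = availability_percent(HOURS_PER_YEAR - 8.76, HOURS_PER_YEAR)

# Downtime implied by a 99.9 percent availability target.
allowed = downtime_hours_per_year(99.9)
```

In this example, 8.76 hours of downtime in a year corresponds to 99.9 percent availability.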

8.1 Availability Requirements

You can configure OpenVMS Cluster systems for different levels of availability, depending on your requirements. Most organizations fall into one of the broad (and sometimes overlapping) categories shown in Table 8-1.

Table 8-1 Availability Requirements
Availability Requirements Description
Conventional For business functions that can wait with little or no effect while a system or application is unavailable.
24 x 365 For business functions that require uninterrupted computing services, either during essential time periods or during most hours of the day throughout the year. Minimal down time is acceptable.
Disaster tolerant For business functions with stringent availability requirements. These businesses need to be immune to disasters like earthquakes, floods, and power failures.

8.2 How OpenVMS Clusters Provide Availability

OpenVMS Cluster systems offer the following features that provide increased availability:

8.2.1 Shared Access to Storage

In an OpenVMS Cluster environment, users and applications on multiple systems can transparently share storage devices and files. When you shut down one system, users can continue to access shared files and devices. You can share storage devices in two ways:

8.2.2 Component Redundancy

OpenVMS Cluster systems allow for redundancy of many components, including:

With redundant components, if one component fails, another is available to users and applications.

8.2.3 Failover Mechanisms

OpenVMS Cluster systems provide failover mechanisms that enable recovery from a failure in part of the OpenVMS Cluster. Table 8-2 lists these mechanisms and the levels of recovery that they provide.

Table 8-2 Failover Mechanisms

Mechanism: DECnet-Plus cluster alias
What Happens if a Failure Occurs: If a node fails, OpenVMS Cluster software automatically distributes new incoming connections among other participating nodes.
Type of Recovery: Manual. Users who were logged in to the failed node can reconnect to a remaining node. Automatic for appropriately coded applications. Such applications can reinstate a connection to the cluster alias node name, and the connection is directed to one of the remaining nodes.

Mechanism: I/O paths
What Happens if a Failure Occurs: With redundant paths to storage devices, if one path fails, OpenVMS Cluster software fails over to a working path, if one exists.
Type of Recovery: Transparent, provided another working path is available.

Mechanism: Interconnect
What Happens if a Failure Occurs: With redundant or mixed interconnects, OpenVMS Cluster software uses the fastest working path to connect to other OpenVMS Cluster members. If an interconnect path fails, OpenVMS Cluster software fails over to a working path, if one exists.
Type of Recovery: Transparent.

Mechanism: Boot and disk servers
What Happens if a Failure Occurs: If you configure at least two nodes as boot and disk servers, satellites can continue to boot and use disks if one of the servers shuts down or fails. Failure of a boot server does not affect nodes that have already booted, provided they have an alternate path to access MSCP served disks.
Type of Recovery: Automatic.

Mechanism: Terminal servers and LAT software
What Happens if a Failure Occurs: Attach terminals and printers to terminal servers. If a node fails, the LAT software automatically connects to one of the remaining nodes. In addition, if a user process is disconnected from a LAT terminal session, when the user attempts to reconnect to a LAT session, LAT software can automatically reconnect the user to the disconnected session.
Type of Recovery: Manual. Terminal users who were logged in to the failed node must log in to a remaining node and restart the application.

Mechanism: Generic batch and print queues
What Happens if a Failure Occurs: You can set up generic queues to feed jobs to execution queues (where processing occurs) on more than one node. If one node fails, the generic queue can continue to submit jobs to execution queues on remaining nodes. In addition, batch jobs submitted using the /RESTART qualifier are automatically restarted on one of the remaining nodes.
Type of Recovery: Transparent for jobs waiting to be dispatched. Automatic or manual for jobs executing on the failed node.

Mechanism: Autostart batch and print queues
What Happens if a Failure Occurs: For maximum availability, you can set up execution queues as autostart queues with a failover list. When a node fails, an autostart execution queue and its jobs automatically fail over to the next logical node in the failover list and continue processing on another node. Autostart queues are especially useful for print queues directed to printers that are attached to terminal servers.
Type of Recovery: Transparent.

Reference: For more information about cluster aliases, generic queues, and autostart queues, refer to the HP OpenVMS Cluster Systems manual.

