Guidelines for OpenVMS Cluster Configurations
 
 
9.3.2 Six-Satellite OpenVMS Cluster with Two Boot Nodes
Figure 9-5 shows six satellites and two boot servers connected by 
Ethernet. Boot server 1 and boot server 2 perform MSCP server dynamic 
load balancing: they arbitrate and share the workload between them, and 
if one node stops functioning, the other takes over. MSCP dynamic load 
balancing requires shared access to storage.
 
Figure 9-5 Six-Satellite LAN OpenVMS Cluster with Two Boot 
Nodes
  
 
The advantages and disadvantages of the configuration shown in 
Figure 9-5 include:
 
Advantages
 
 
  - The MSCP server is enabled for adding satellites and allows access 
  to more storage.
  
 - Two boot servers perform MSCP dynamic load balancing.
  
Disadvantage
 
 
  - The Ethernet is a potential bottleneck and a single point of 
  failure.
  
If the LAN in Figure 9-5 became an OpenVMS Cluster bottleneck, this 
could lead to a configuration like the one shown in Figure 9-6.
9.3.3 Twelve-Satellite LAN OpenVMS Cluster with Two LAN Segments  
 
Figure 9-6 shows 12 satellites and 2 boot servers connected by two 
Ethernet segments. These two Ethernet segments are also joined by a LAN 
bridge. Because each satellite has dual paths to storage, this 
configuration also features MSCP dynamic load balancing.
 
Figure 9-6 Twelve-Satellite OpenVMS Cluster with Two LAN 
Segments
  
 
The advantages and disadvantages of the configuration shown in 
Figure 9-6 include:
 
Advantages
 
 
  - The MSCP server is enabled for adding satellites and allows access 
  to more storage.
  
 - Two boot servers perform MSCP dynamic load balancing. From the 
   perspective of a satellite on the Ethernet LAN, the dual paths to the 
   Alpha and Integrity server nodes create the advantage of MSCP load 
   balancing.
   
 - Two LAN segments provide twice the amount of LAN capacity.
  
Disadvantages
 
 
  - This OpenVMS Cluster configuration is limited by the number of 
  satellites that it can support.
  
 - The single HSG controller is a potential bottleneck and a single 
  point of failure.
  
If the OpenVMS Cluster in Figure 9-6 needed to grow beyond its 
current limits, this could lead to a configuration like the one shown 
in Figure 9-7.
9.3.4 Forty-Five Satellite OpenVMS Cluster with Intersite Link
 
Figure 9-7 shows a large, 51-node OpenVMS Cluster that includes 45 
satellite nodes. The three boot servers, Integrity server 1, Integrity 
server 2, and Integrity server 3, share three disks: a common disk, a 
page and swap disk, and a system disk. The intersite link is connected 
to routers and has three LAN segments attached. Each segment has 15 
workstation satellites as well as its own boot node.
 
Figure 9-7 Forty-Five Satellite OpenVMS Cluster with Intersite 
Link
  
 
The advantages and disadvantages of the configuration shown in 
Figure 9-7 include:
 
Advantages
 
 
  - Decreased boot time, especially for an OpenVMS Cluster with such a 
   high node count. 
   Reference: For information about booting an OpenVMS Cluster like the 
   one in Figure 9-7, see Section 10.2.4.
   
  - The MSCP server is enabled for satellites to access more storage.
  
 - Each boot server has its own page and swap disk, which reduces I/O 
  activity on the system disks.
  
  - All of the environment files for the entire OpenVMS Cluster are on 
   the common disk. This frees the satellite boot servers to serve only 
   root information to the satellites. 
   Reference: For more information about common disks and page and swap 
   disks, see Section 10.2.
  
Disadvantages
 
 
  - The satellite boot servers on the Ethernet LAN segments can boot 
  satellites only on their own segments.
  
9.3.5 High-Powered Workstation OpenVMS Cluster (1995 Technology)
Figure 9-8 shows an OpenVMS Cluster configuration that provides high 
performance and high availability on the FDDI ring.
 
Figure 9-8 High-Powered Workstation Server Configuration 
1995
  
 
In Figure 9-8, several Alpha workstations, each with its own system 
disk, are connected to the FDDI ring. Putting Alpha workstations on the 
FDDI provides high performance because each workstation has direct 
access to its system disk. In addition, the FDDI bandwidth is higher 
than that of the Ethernet. Because Alpha workstations have FDDI 
adapters, putting these workstations on an FDDI is a useful alternative 
for critical workstation requirements. FDDI is 10 times faster than 
Ethernet, and Alpha workstations have processing capacity that can take 
advantage of FDDI's speed. (The speed of Fast Ethernet matches that of 
FDDI, and Gigabit Ethernet is 10 times faster than Fast Ethernet and 
FDDI.)
9.3.6 High-Powered Workstation OpenVMS Cluster (2004  Technology)
 
Figure 9-9 shows an OpenVMS Cluster configuration that provides high 
performance and high availability using Gigabit Ethernet for the LAN 
and Fibre Channel for storage.
 
Figure 9-9 High-Powered Workstation Server Configuration 
2004
  
 
In Figure 9-9, several Alpha workstations, each with its own system 
disk, are connected to the Gigabit Ethernet LAN. Putting Alpha 
workstations on the Gigabit Ethernet LAN provides high performance 
because each workstation has direct access to its system disk. In 
addition, the Gigabit Ethernet bandwidth is 10 times higher than that 
of the FDDI. Alpha workstations have processing capacity that can take 
advantage of Gigabit Ethernet's speed.
9.3.7 Guidelines for OpenVMS Clusters with Satellites
 
The following are guidelines for setting up an OpenVMS Cluster with 
satellites:
 
  - Extra memory is required for satellites of large LAN configurations 
  because each node must maintain a connection to every other node.
  
 - Configure the network to eliminate bottlenecks (that is, allocate 
   sufficient bandwidth within the network cloud and on server 
   connections).
  
 - Maximize resources with MSCP dynamic load balancing, as shown in 
  Figure 9-5 and Figure 9-6.
  
 - Keep the number of nodes that require MSCP serving to a minimum for 
   good performance. 
   Reference: See Section 9.5.1 for more information about MSCP overhead.
   
 - To save time, ensure that the booting sequence is efficient, 
   particularly when the OpenVMS Cluster is large or has multiple 
   segments. See Section 10.2.4 for more information about how to reduce 
   LAN and system disk activity and how to boot separate groups of nodes 
   in sequence.
  
 - Use multiple LAN adapters per host, and connect to independent LAN 
  paths. This enables simultaneous two-way communication between nodes 
  and allows traffic to multiple nodes to be spread over the available 
  LANs. In addition, multiple LAN adapters increase failover capabilities.
  
9.3.8 Extended LAN Configuration Guidelines
You can use bridges and switches between LAN segments to form an 
extended LAN. This can increase availability, distance, and aggregate 
bandwidth as compared with a single LAN. However, an extended LAN can 
increase delay and can reduce bandwidth on some paths. Factors such as 
packet loss, queuing delays, and packet size can also affect network 
performance. Table 9-3 provides guidelines for ensuring adequate 
LAN performance when dealing with such factors.  
 
  Table 9-3 Extended LAN Configuration Guidelines

  Propagation delay
    The amount of time it takes a packet to traverse the LAN depends on 
    the distance it travels and the number of times it is relayed from 
    one link to another through a switch or bridge. If responsiveness is 
    critical, you must control these factors.
    For high-performance applications, limit the number of switches 
    between nodes to two. For situations in which high performance is 
    not required, you can use up to seven switches or bridges between 
    nodes.

  Queuing delay
    Queuing occurs when the instantaneous arrival rate at switches or 
    bridges and host adapters exceeds the service rate. You can control 
    queuing by:
    - Reducing the number of switches or bridges between nodes that 
      communicate frequently.
    - Using only high-performance switches or bridges and adapters.
    - Reducing traffic bursts in the LAN. In some cases, for example, 
      you can tune applications by combining small I/Os so that a single 
      packet is produced rather than a burst of small ones.
    - Reducing LAN segment and host processor utilization levels by 
      using faster processors and faster LANs, and by using switches or 
      bridges for traffic isolation.

  Packet loss
    Packets that are not delivered by the LAN require retransmission, 
    which wastes system and network resources, increases delay, and 
    reduces bandwidth. Bridges and adapters discard packets when they 
    become congested. You can reduce packet loss by controlling queuing, 
    as previously described.
    Packets are also discarded when they become damaged in transit. You 
    can control this problem by observing LAN hardware configuration 
    rules, removing sources of electrical interference, and ensuring 
    that all hardware is operating correctly.
    The retransmission timeout rate, which is a symptom of packet loss, 
    must be less than 1 timeout in 1000 transmissions for OpenVMS 
    Cluster traffic from one node to another. LAN paths that are used 
    for high-performance applications should have a significantly lower 
    rate. Monitor the occurrence of retransmission timeouts in the 
    OpenVMS Cluster.
    Reference: For information about monitoring the occurrence of 
    retransmission timeouts, see HP OpenVMS Cluster Systems.

  Switch or bridge recovery delay
    Choose switches or bridges with fast self-test time and adjust them 
    for fast automatic reconfiguration. This includes adjusting spanning 
    tree parameters to match network requirements.
    Reference: Refer to HP OpenVMS Cluster Systems for more information 
    about LAN bridge failover.

  Bandwidth
    All LAN paths used for OpenVMS Cluster communication must operate 
    with a nominal bandwidth of at least 10 Mb/s. The average LAN 
    segment utilization should not exceed 60% for any 10-second 
    interval.
    For Gigabit Ethernet and 10 Gigabit Ethernet configurations, enable 
    jumbo frames where possible.

  Traffic isolation
    Use switches or bridges to isolate and localize the traffic between 
    nodes that communicate with each other frequently. For example, use 
    switches or bridges to separate the OpenVMS Cluster from the rest of 
    the LAN and to separate nodes within an OpenVMS Cluster that 
    communicate frequently from the rest of the OpenVMS Cluster.
    Provide independent paths through the LAN between critical systems 
    that have multiple adapters.

  Packet size
    Ensure that the LAN path supports a data field of at least 4474 
    bytes end to end. For Gigabit Ethernet devices using jumbo frames, 
    set NISCS_MAX_PKTSZ to 8192 bytes.
    Some failures cause traffic to switch from a LAN path that supports 
    a large packet size to a path that supports only smaller packets. It 
    is possible to implement automatic detection and recovery from these 
    kinds of failures.
 
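For illustration, the packet size guideline in Table 9-3 might be 
applied with a MODPARAMS.DAT entry such as the following. This is a 
sketch only: it assumes Gigabit Ethernet with jumbo frames enabled on 
every adapter and switch in the path, and any change must be processed 
by AUTOGEN.

   ! MODPARAMS.DAT fragment (sketch): support a large SCS data field
   ! Requires jumbo frames enabled end to end on the LAN path
   NISCS_MAX_PKTSZ = 8192

After editing MODPARAMS.DAT, run AUTOGEN (for example, 
$ @SYS$UPDATE:AUTOGEN GETDATA REBOOT) so that the new value takes 
effect.
 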
9.3.9 System Parameters for OpenVMS Clusters
In an OpenVMS Cluster with satellites and servers, specific system 
parameters can help you manage your OpenVMS Cluster more efficiently. 
Table 9-4 gives suggested values for these system parameters.  
 
  Table 9-4 OpenVMS Cluster System Parameters

  LOCKDIRWT
    Satellites: 0
    Servers: 1-4. The setting of LOCKDIRWT influences a node's 
    willingness to serve as a resource directory node and may also be 
    used to determine mastership of resource trees. In general, a 
    setting greater than 1 is determined after careful examination of a 
    cluster node's specific workload and application mix and is beyond 
    the scope of this document.

  SHADOW_MAX_COPY
    Satellites: 0
    Servers: 4; a significantly higher setting may be appropriate for 
    your environment.

  MSCP_LOAD
    Satellites: 0
    Servers: 1

  NPAGEDYN
    Satellites: Higher than for a standalone node
    Servers: Higher than for a satellite node

  PAGEDYN
    Satellites: Higher than for a standalone node
    Servers: Higher than for a satellite node

  VOTES
    Satellites: 0
    Servers: 1

  EXPECTED_VOTES
    Satellites: Sum of OpenVMS Cluster votes
    Servers: Sum of OpenVMS Cluster votes

  RECNXINTERVAL (1)
    Satellites: Equal on all nodes
    Servers: Equal on all nodes

(1) Correlate with bridge timers and LAN utilization.
 
 
Reference: For more information about these 
parameters, see HP OpenVMS Cluster Systems and HP Volume Shadowing for OpenVMS.
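 
As an illustration only, the values in Table 9-4 might translate into 
MODPARAMS.DAT entries such as the following. The vote counts, pool 
increments, and reconnection interval shown are placeholders for a 
hypothetical three-server cluster; derive actual values with AUTOGEN 
and the referenced manuals.

   ! MODPARAMS.DAT on a satellite (sketch based on Table 9-4)
   LOCKDIRWT = 0
   SHADOW_MAX_COPY = 0
   MSCP_LOAD = 0
   VOTES = 0
   EXPECTED_VOTES = 3       ! sum of votes in this example cluster
   RECNXINTERVAL = 20       ! same value on every node; correlate with
                            ! bridge timers and LAN utilization

   ! MODPARAMS.DAT on a boot server (sketch based on Table 9-4)
   LOCKDIRWT = 1
   SHADOW_MAX_COPY = 4
   MSCP_LOAD = 1
   VOTES = 1
   EXPECTED_VOTES = 3
   RECNXINTERVAL = 20
   ADD_NPAGEDYN = 500000    ! example increase to nonpaged pool (bytes)
   ADD_PAGEDYN = 500000     ! example increase to paged pool (bytes)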
 
9.4 Scalability in a Cluster over IP
Cluster over IP allows a maximum of 96 nodes, with support for storage, 
to be connected across geographical locations. IP cluster communication 
can replace an extended LAN configuration: routers take the place of 
the LAN switches and bridges, overcoming the disadvantages of those LAN 
components. The routers can connect two or more logical subnets, which 
do not necessarily map one-to-one to the physical interfaces of the 
router.
9.4.1 Multiple node IP based Cluster System
 
Figure 9-10 shows an IP based cluster system with multiple nodes 
connected to it. The nodes can be located in different geographical 
locations, thus enabling disaster tolerance and high availability.
 
Figure 9-10 Multiple node IP based Cluster System
  
 
Advantages
 
 
  - Cluster communication over IP supports 10 Gigabit Ethernet, which 
   provides a throughput of 10 Gb/s.
   
 - Easy to configure.
   
 - All nodes can access the other nodes and can have shared direct 
   access to storage.
  
9.4.2 Guidelines for Configuring IP based Cluster
The following are guidelines for setting up a cluster using IP cluster 
communication (a configuration sketch follows this list):
 
  - Requires the IP unicast address for remote node discovery.
   
 - Requires an IP multicast address, which is administratively scoped 
   and is computed dynamically using the cluster group number. See 
   HP OpenVMS Cluster Systems for information on cluster 
   configuration.
   
 - Requires the IP address of the local machine along with the network 
   mask address.
   
 - Requires the local LAN adapter on which the IP address will be 
   configured and used for SCS.
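 
These items are normally supplied when you configure the node with 
CLUSTER_CONFIG_LAN.COM, which records them in configuration files and 
system parameters. The following sketch shows the general shape of such 
a configuration; the addresses, port, and TTL are placeholders, and the 
exact file format is described in HP OpenVMS Cluster Systems.

   ! MODPARAMS.DAT fragment: enable cluster communication over IP
   NISCS_USE_UDP = 1

   ! SYS$SYSTEM:PE$IP_CONFIG.DAT (sketch with placeholder values)
   ! Administratively scoped multicast address derived from the
   ! cluster group number
   multicast_address=239.242.7.193
   ttl=32
   udp_port=49152
   ! Remote node IP unicast address used for node discovery
   unicast=10.10.1.20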
  
9.5 Scaling for I/Os
The ability to scale I/Os is an important factor in the growth of your 
OpenVMS Cluster. Adding more components to your OpenVMS Cluster 
requires high I/O throughput so that additional components do not 
create bottlenecks and decrease the performance of the entire OpenVMS 
Cluster. Many factors can affect I/O throughput:
 
  - Direct access or MSCP served access to storage
  
 - Settings of the MSCP_BUFFER and MSCP_CREDITS system parameters
  
 - File system technologies, such as Files-11
  
 - Disk technologies, such as magnetic disks, solid-state disks, and 
  DECram
  
 - Read/write ratio
  
 - I/O size
  
 - Caches and cache "hit" rate
  
 - "Hot file" management
  
 - RAID striping and host-based striping
  
 - Volume shadowing
  
These factors can affect I/O scalability either singly or in 
combination. The following sections explain these factors and suggest 
ways to maximize I/O throughput and scalability without having to 
change your application.
 
Additional factors that affect I/O throughput are types of 
interconnects and types of storage subsystems.
 
Reference: For more information about interconnects, 
see Chapter 4. For more information about types of storage 
subsystems, see Chapter 5. For more information about MSCP_BUFFER 
and MSCP_CREDITS, see HP OpenVMS Cluster Systems.
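 
For illustration, a node that performs heavy MSCP serving might carry 
MODPARAMS.DAT entries such as the following; the values are 
placeholders, and appropriate settings depend on your OpenVMS version 
and workload (see HP OpenVMS Cluster Systems).

   ! MODPARAMS.DAT fragment (illustrative values only)
   MSCP_BUFFER = 2048       ! buffer space available to the MSCP server
   MSCP_CREDITS = 128       ! outstanding I/O requests allowed per client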
9.5.1 MSCP Served Access to Storage
 
MSCP server capability provides a major benefit to OpenVMS Clusters: it 
enables communication between nodes and storage that are not directly 
connected to each other. However, MSCP served I/O does incur overhead. 
Figure 9-11 is a simplification of how packets require extra handling 
by the serving system.
 
Figure 9-11 Comparison of Direct and MSCP Served Access
  
 
In Figure 9-11, an MSCP served packet requires an extra 
"stop" at another system before reaching its destination. 
When the MSCP served packet reaches the system associated with the 
target storage, the packet is handled as if for direct access.
 
In an OpenVMS Cluster that requires a large amount of MSCP serving, I/O 
performance is not as efficient and scalability is decreased. The total 
I/O throughput is approximately 20% less when I/O is MSCP served than 
when it has direct access. Design your configuration so that a few 
large nodes are serving many satellites rather than satellites serving 
their local storage to the entire OpenVMS Cluster.
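 
For example, that recommendation might translate into MODPARAMS.DAT 
settings such as the following sketch; the MSCP_SERVE_ALL value and the 
division of roles are assumptions to be validated against your 
configuration and AUTOGEN output.

   ! Large boot/storage server: serve its available disks to the cluster
   MSCP_LOAD = 1            ! load the MSCP server at boot
   MSCP_SERVE_ALL = 1       ! serve available disks to other members

   ! Satellite: do not serve local storage to the rest of the cluster
   MSCP_LOAD = 0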
9.5.2 Disk Technologies
 
In recent years, the ability of CPUs to process information has far 
outstripped the ability of I/O subsystems to feed processors with data. 
The result is an increasing percentage of processor time spent waiting 
for I/O operations to complete.
 
Solid-state disks (SSDs), DECram, and RAID level 0 bridge this gap 
between processing speed and magnetic-disk access speed. Performance of 
magnetic disks is limited by seek and rotational latencies, while SSDs 
and DECram use memory, which provides nearly instant access.
 
RAID level 0 is the technique of spreading (or "striping") a 
single file across several disk volumes. The objective is to reduce or 
eliminate a bottleneck at a single disk by partitioning heavily 
accessed files into stripe sets and storing them on multiple devices. 
This technique increases parallelism across many disks for a single I/O.
 
Table 9-5 summarizes disk technologies and their features.
 
 
  Table 9-5 Disk Technology Summary

  Magnetic disk
    Slowest access time.
    Inexpensive.
    Available on multiple interconnects.

  Solid-state disk
    Fastest access of any I/O subsystem device.
    Highest throughput for write-intensive files.
    Available on multiple interconnects.

  DECram
    Highest throughput for small to medium I/O requests.
    Volatile storage; appropriate for temporary read-only files.
    Available on any Alpha or VAX system.

  RAID level 0
    Available on HSD, HSJ, and HSG controllers.
Note: Shared, direct access to a solid-state disk or 
to DECram is the fastest alternative for scaling I/Os.
9.5.3 Read/Write Ratio
 
The read/write ratio of your applications is a key factor in scaling 
I/O to shadow sets. MSCP writes to a shadow set are duplicated on the 
interconnect.
 
Therefore, an application that has 100% (100/0) read activity may 
benefit from volume shadowing because shadowing causes multiple paths 
to be used for the I/O activity. An application with a 50/50 ratio will 
cause more interconnect utilization because write activity requires 
that an I/O be sent to each shadow member. Delays may be caused by the 
time required to complete the slowest I/O.
 
To determine I/O read/write ratios, use the DCL command MONITOR IO.
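 
For example (a sketch; the interval and item selection are 
illustrative), the following commands display system-wide I/O rates and 
per-disk operation rates as a starting point for analyzing the I/O load:

   $ MONITOR IO/INTERVAL=10
   $ MONITOR DISK/ITEM=OPERATION_RATE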
9.5.4 I/O Size
 
Each I/O packet incurs processor and memory overhead, so grouping I/Os 
together in one packet decreases overhead for all I/O activity. You can 
achieve higher throughput if your application is designed to use bigger 
packets. Smaller packets incur greater overhead.
9.5.5 Caches
 
Caching is the technique of storing recently or frequently used data in 
an area where it can be accessed more easily---in memory, in a 
controller, or in a disk. Caching complements solid-state disks, 
DECram, and RAID. Applications automatically benefit from the 
advantages of caching without any special coding. Caching reduces 
current and potential I/O bottlenecks within OpenVMS Cluster systems by 
reducing the number of I/Os between components.
 
Table 9-6 describes the three types of caching.  
 
  Table 9-6 Types of Caching

  Host based
    Cache that is resident in the host system's memory and services I/Os 
    from the host.

  Controller based
    Cache that is resident in the storage controller and services data 
    for all hosts.

  Disk
    Cache that is resident in a disk.
Host-based disk caching provides different benefits from 
controller-based and disk-based caching. In host-based disk caching, 
the cache itself is not shareable among nodes. Controller-based and 
disk-based caching are shareable because they are located in the 
controller or disk, either of which is shareable.
 
  