10.5.1 Two-Node MEMORY CHANNEL Cluster
In Figure 10-11, two nodes are connected by a MEMORY CHANNEL
interconnect, a LAN (Ethernet, FDDI, or ATM) interconnect, and a Fibre
Channel interconnect.
Figure 10-11 Two-Node MEMORY CHANNEL OpenVMS Cluster
The advantages and disadvantages of the configuration shown in
Figure 10-11 include:
Advantages
- Both nodes have shared, direct access to all storage.
- The Ethernet/FDDI/ATM interconnect enables failover if the MEMORY
CHANNEL interconnect fails.
- The limit of two MEMORY CHANNEL nodes means that no hub is
required; one PCI adapter serves as a virtual hub.
Disadvantages
- The amount of storage that is directly accessible to all nodes is
limited.
- A single SCSI interconnect or HSZ controller can become a single
point of failure.
If the OpenVMS Cluster in Figure 10-11 required more processing power
and better redundancy, this could lead to a configuration like the one
shown in Figure 10-12.
10.5.2 Three-Node MEMORY CHANNEL Cluster
In Figure 10-12, three nodes are connected by a high-speed MEMORY
CHANNEL interconnect, as well as by a LAN (Ethernet, FDDI, or ATM)
interconnect. These nodes also have shared, direct access to storage
through the Fibre Channel interconnect.
Figure 10-12 Three-Node MEMORY CHANNEL OpenVMS Cluster
The advantages and disadvantages of the configuration shown in
Figure 10-12 include:
Advantages
- All nodes have shared, direct access to storage.
- The Ethernet/FDDI/ATM interconnect enables failover if the MEMORY
CHANNEL interconnect fails.
- The addition of a MEMORY CHANNEL hub increases the limit on the
number of nodes to a total of four.
Disadvantage
- The amount of storage that is directly accessible to all nodes is
limited.
If the configuration in Figure 10-12 required more storage, this could
lead to a configuration like the one shown in Figure 10-13.
10.5.3 Four-Node MEMORY CHANNEL OpenVMS Cluster
In Figure 10-13, each node is connected by a MEMORY CHANNEL interconnect
as well as by a CI interconnect.
Figure 10-13 MEMORY CHANNEL Cluster with a CI Cluster
The advantages and disadvantages of the configuration shown in
Figure 10-13 include:
Advantages
- All nodes have shared, direct access to all of the storage.
- This configuration has more than double the storage and processing
capacity of the configuration shown in Figure 10-12.
- If the MEMORY CHANNEL interconnect fails, the CI can take over
internode communication.
- The CIPCA adapters on two of the nodes enable the addition of Alpha
systems to a CI cluster that formerly comprised VAX (CIXCD-based)
systems.
- Multiple CIs between processors and storage provide twice the
performance of one path. Bandwidth further increases because MEMORY
CHANNEL offloads internode traffic from the CI, enabling the CI to be
devoted only to storage traffic. This improves the performance of the
entire cluster.
- Volume shadowed, dual-ported disks increase data availability.
Disadvantage
- This configuration is complex and requires the care of an
experienced system manager.
10.6 Scalability in SCSI OpenVMS Clusters
SCSI-based OpenVMS Clusters allow commodity-priced storage devices to
be used directly in OpenVMS Clusters. Using a SCSI interconnect in an
OpenVMS Cluster offers you variations in distance, price, and
performance capacity. This SCSI clustering capability is an ideal
starting point when configuring a low-end, affordable cluster solution.
SCSI clusters can range from desktop to deskside to departmental and
larger configurations.
Note the following general limitations when using the SCSI interconnect:
- Because the SCSI interconnect handles only storage traffic, it
must always be paired with another interconnect for node-to-node
traffic. In the figures shown in this section, MEMORY CHANNEL is the
alternate interconnect; but CI, DSSI, Ethernet, and FDDI could also be
used.
- Total SCSI cable lengths must take into account each system's
internal cable length. For example, an AlphaServer 1000 rackmount uses
1.6 m of internal cable to connect the internal adapter to the external
connector. Two AlphaServer 1000s joined by a 2 m SCSI cable would
therefore use 1.6 m within each system, for a total SCSI bus length of
5.2 m (see the sketch following this list).
Reference: For more information about internal
SCSI cable lengths as well as highly detailed information about
clustering SCSI devices, see Appendix A.
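The cable-length arithmetic above is easy to automate when several
systems and external cables share one bus. The following Python sketch
is illustrative only (the function name and the example lists are not
part of OpenVMS); it totals the internal and external cable lengths and
checks the result against the 25-m fast-wide differential limit cited
later in this section.

    # Illustrative sketch: total SCSI bus length is the sum of the external
    # cable segments plus each system's internal cabling (for example,
    # 1.6 m for an AlphaServer 1000 rackmount, as described above).

    def total_scsi_bus_length(internal_lengths_m, external_cables_m):
        """Return the total SCSI bus length in meters."""
        return sum(internal_lengths_m) + sum(external_cables_m)

    # Example from the text: two AlphaServer 1000s (1.6 m internal each)
    # joined by a 2 m external SCSI cable.
    total = total_scsi_bus_length([1.6, 1.6], [2.0])
    print(f"Total SCSI bus length: {total:.1f} m")   # 5.2 m

    # Check against the 25-m fast-wide differential (FWD) limit used in
    # the configurations of Section 10.6; other SCSI variants differ.
    assert total <= 25.0, "Bus exceeds the 25-m FWD SCSI limit"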
The figures in this section show a progression from a two-node SCSI
configuration with modest storage to a four-node SCSI hub configuration
with maximum storage and further expansion capability.
10.6.1 Two-Node Fast-Wide SCSI Cluster
In Figure 10-14, two nodes are connected by a 25-m, fast-wide
differential (FWD) SCSI bus, with a MEMORY CHANNEL interconnect (or any
other interconnect) for internode traffic. The BA356 storage cabinet
contains a power
supply, a DWZZB single-ended to differential converter, and six disk
drives. This configuration can have either narrow or wide disks.
Figure 10-14 Two-Node Fast-Wide SCSI Cluster
The advantages and disadvantages of the configuration shown in
Figure 10-14 include:
Advantages
- Low-cost SCSI storage is shared by two nodes.
- With the BA356 cabinet, you can use a narrow (8 bit) or wide (16
bit) SCSI bus.
- The DWZZB converts single-ended signals to differential.
- The fast-wide SCSI interconnect provides 20 MB/s performance.
- MEMORY CHANNEL handles internode traffic.
- The differential SCSI bus can be 25 m.
Disadvantage
- Somewhat limited storage capability.
If the configuration in Figure 10-14 required even more storage, this
could lead to a configuration like the one shown in Figure 10-15.
10.6.2 Two-Node Fast-Wide SCSI Cluster with HSZ Storage
In Figure 10-15, two nodes are connected by a 25-m, fast-wide
differential (FWD) SCSI bus, with a MEMORY CHANNEL interconnect (or any
other interconnect) for internode traffic. Multiple storage shelves are
within the HSZ controller.
Figure 10-15 Two-Node Fast-Wide SCSI Cluster with HSZ
Storage
The advantages and disadvantages of the configuration shown in
Figure 10-15 include:
Advantages
- Costs slightly more than the configuration shown in Figure 10-14,
but offers significantly more storage. (The HSZ controller enables you
to add more storage.)
- Cache in the HSZ, which also provides RAID 0, 1, and 5
technologies. The HSZ is a differential device; no converter is needed.
- MEMORY CHANNEL handles internode traffic.
- The FWD bus provides 20 MB/s throughput.
- Includes a 25 m differential SCSI bus.
Disadvantage
- This configuration is more expensive than the one shown in
Figure 10-14.
10.6.3 Three-Node Fast-Wide SCSI Cluster
In Figure 10-16, three nodes are connected by two 25-m, fast-wide
differential (FWD) SCSI interconnects. Multiple storage shelves are
contained in
each HSZ controller, and more storage is contained in the BA356 at the
top of the figure.
Figure 10-16 Three-Node Fast-Wide SCSI Cluster
The advantages and disadvantages of the configuration shown in
Figure 10-16 include:
Advantages
- Combines the advantages of the configurations shown in
Figure 10-14 and Figure 10-15:
- Significant (25 m) bus distance and scalability.
- Includes cache in the HSZ, which also provides RAID 0, 1, and 5
technologies. The HSZ contains multiple storage shelves.
- FWD bus provides 20 MB/s throughput.
- With the BA356 cabinet, you can use narrow (8 bit) or wide (16 bit)
SCSI bus.
Disadvantage
- This configuration is more expensive than those shown in previous
figures.
10.6.4 Four-Node Ultra SCSI Hub Configuration
Figure 10-17 shows four nodes connected by a SCSI hub. The SCSI hub
obtains power and cooling from the storage cabinet, such as the BA356.
The SCSI hub does not connect to the SCSI bus of the storage cabinet.
Figure 10-17 Four-Node Ultra SCSI Hub Configuration
The advantages and disadvantages of the configuration shown in
Figure 10-17 include:
Advantages
- Provides significantly more bus distance and scalability than the
configuration shown in Figure 10-15.
- The SCSI hub provides fair arbitration on the SCSI bus. This
provides more uniform, predictable system behavior. Four CPUs are
allowed only when fair arbitration is enabled.
- Up to two dual HSZ controllers can be daisy-chained to the storage
port of the hub.
- Two power supplies in the BA356 (one for backup).
- Cache in the HSZs, which also provides RAID 0, 1, and 5
technologies.
- Ultra SCSI bus provides 40 MB/s throughput.
Disadvantages
- You cannot add CPUs to this configuration by daisy-chaining a SCSI
interconnect from a CPU or HSZ to another CPU.
- This configuration is more expensive than those shown in
Figure 10-14 and Figure 10-15.
- Only HSZ storage can be connected. You cannot attach a storage
shelf with disk drives directly to the SCSI hub.
10.7 Scalability in OpenVMS Clusters with Satellites
The number of satellites in an OpenVMS Cluster and the amount of
storage that is MSCP served determine the quantity and capacity of the
servers required.
Satellites are systems that do not have direct access to a system disk
and other OpenVMS Cluster storage. Satellites are usually workstations,
but they can be any OpenVMS Cluster node that is served storage by
other nodes in the OpenVMS Cluster.
Each Ethernet LAN segment should have only 10 to 20 satellite nodes
attached. Figure 10-18, Figure 10-19, Figure 10-20, and
Figure 10-21 show a progression from a 6-satellite LAN to a
45-satellite LAN.
10.7.1 Six-Satellite OpenVMS Cluster
In Figure 10-18, six satellites and a boot server are connected by
Ethernet.
Figure 10-18 Six-Satellite LAN OpenVMS Cluster
The advantages and disadvantages of the configuration shown in
Figure 10-18 include:
Advantages
- The MSCP server is enabled, which allows satellites to be added and
to access more storage.
- With one system disk, system management is relatively simple.
Reference: For information about managing system
disks, see Section 11.2.
Disadvantage
- The Ethernet is a potential bottleneck and a single point of
failure.
If the boot server in Figure 10-18 became a bottleneck, a
configuration like the one shown in Figure 10-19 would be required.
10.7.2 Six-Satellite OpenVMS Cluster with Two Boot Nodes
Figure 10-19 shows six satellites and two boot servers connected by
Ethernet. Boot server 1 and boot server 2 perform MSCP server dynamic
load balancing: they arbitrate and share the workload between them,
and if one node stops functioning, the other takes over. MSCP dynamic
load balancing requires shared access to storage.
Figure 10-19 Six-Satellite LAN OpenVMS Cluster with Two Boot
Nodes
The advantages and disadvantages of the configuration shown in
Figure 10-19 include:
Advantages
- The MSCP server is enabled, which allows satellites to be added and
to access more storage.
- Two boot servers perform MSCP dynamic load balancing.
Disadvantage
- The Ethernet is a potential bottleneck and a single point of
failure.
If the LAN in Figure 10-19 became an OpenVMS Cluster bottleneck, this
could lead to a configuration like the one shown in Figure 10-20.
10.7.3 Twelve-Satellite LAN OpenVMS Cluster with Two LAN Segments
Figure 10-20 shows 12 satellites and 2 boot servers connected by two
Ethernet segments. These two Ethernet segments are also joined by a LAN
bridge. Because each satellite has dual paths to storage, this
configuration also features MSCP dynamic load balancing.
Figure 10-20 Twelve-Satellite OpenVMS Cluster with Two LAN
Segments
The advantages and disadvantages of the configuration shown in
Figure 10-20 include:
Advantages
- The MSCP server is enabled, which allows satellites to be added and
to access more storage.
- Two boot servers perform MSCP dynamic load balancing. From the
perspective of a satellite on the Ethernet LAN, the dual paths to the
Alpha and VAX nodes provide the benefit of MSCP load balancing.
- Two LAN segments provide twice the amount of LAN capacity.
Disadvantages
- This OpenVMS Cluster configuration is limited by the number of
satellites that it can support.
- The single HSJ controller is a potential bottleneck and a single
point of failure.
If the OpenVMS Cluster in Figure 10-20 needed to grow beyond its
current limits, this could lead to a configuration like the one shown
in Figure 10-21.
10.7.4 Forty-Five Satellite OpenVMS Cluster with FDDI Ring
Figure 10-21 shows a large, 51-node OpenVMS Cluster that includes 45
satellite nodes. The three boot servers, Alpha 1, Alpha 2, and Alpha 3,
share three disks: a common disk, a page and swap disk, and a system
disk. The FDDI ring has three LAN segments attached. Each segment has
15 workstation satellites as well as its own boot node.
Figure 10-21 Forty-Five Satellite OpenVMS Cluster with FDDI
Ring
The advantages and disadvantages of the configuration shown in
Figure 10-21 include:
Advantages
- Decreased boot time, especially for an OpenVMS Cluster with such a
high node count.
Reference: For information about booting an OpenVMS Cluster like the
one in Figure 10-21, see Section 11.2.4.
- The MSCP server is enabled so that satellites can access more storage.
- Each boot server has its own page and swap disk, which reduces I/O
activity on the system disks.
- All of the environment files for the entire OpenVMS Cluster are on
the common disk. This frees the satellite boot servers to serve only
root information to the satellites.
Reference: For
more information about common disks and page and swap disks, see
Section 11.2.
- The FDDI ring provides 10 times the capacity of one Ethernet
interconnect.
Disadvantage
- The satellite boot servers on the Ethernet LAN segments can boot
satellites only on their own segments.
10.7.5 High-Powered Workstation OpenVMS Cluster
Figure 10-22 shows an OpenVMS Cluster configuration that provides high
performance and high availability on the FDDI ring.
Figure 10-22 High-Powered Workstation Server
Configuration
In Figure 10-22, several Alpha workstations, each with its own system
disk, are connected to the FDDI ring. Putting Alpha workstations on the
FDDI provides high performance because each workstation has direct
access to its system disk. In addition, the FDDI bandwidth is higher
than that of the Ethernet. Because Alpha workstations have FDDI
adapters, putting these workstations on an FDDI is a useful alternative
for critical workstation requirements. FDDI is 10 times faster than
Ethernet, and Alpha workstations have processing capacity that can take
advantage of FDDI's speed.
10.7.6 Guidelines for OpenVMS Clusters with Satellites
The following are guidelines for setting up an OpenVMS Cluster with
satellites:
- Extra memory is required for satellites in large LAN configurations
because each node must maintain a connection to every other node (see
the sketch following this list).
- Place only 10 to 20 satellites on each LAN segment.
- Maximize resources with MSCP dynamic load balancing, as shown in
Figure 10-19 and Figure 10-20.
- Keep the number of nodes that require MSCP serving minimal for good
performance.
Reference: See Section 10.8.1 for more
information about MSCP overhead.
- To save time, ensure that the booting sequence is efficient,
particularly when the OpenVMS Cluster is large or has multiple
segments. See Section 11.2.4 for more information about how to reduce
LAN and system disk activity and how to boot separate groups of nodes
in sequence.
- Use two or more LAN adapters per host (up to four adapters are
supported for OpenVMS Cluster communications), and connect to
independent LAN paths. This enables simultaneous two-way communication
between nodes and allows traffic to multiple nodes to be spread over
the available LANs.
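To make the first guideline concrete, the following Python sketch
(illustrative only; it is not an OpenVMS utility and uses no
OpenVMS-specific values) shows how the number of connections each node
maintains, and the cluster-wide total, grow with node count, which is
why satellite memory requirements rise in large LAN configurations.

    # Illustrative sketch: every cluster member maintains a connection to
    # every other member, so per-node connections grow linearly with node
    # count and the cluster-wide total grows roughly quadratically.

    def connection_counts(node_count):
        per_node = node_count - 1                   # one node's connections
        cluster_total = node_count * (node_count - 1) // 2
        return per_node, cluster_total

    for nodes in (8, 20, 51):                       # 51 matches Figure 10-21
        per_node, total = connection_counts(nodes)
        print(f"{nodes:3d} nodes: {per_node:3d} connections per node, "
              f"{total:5d} cluster-wide")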
10.7.7 Extended LAN Configuration Guidelines
You can use bridges between LAN segments to form an extended LAN
(ELAN). This can increase availability, distance, and aggregate
bandwidth as compared with a single LAN. However, an ELAN can increase
delay and can reduce bandwidth on some paths. Factors such as packet
loss, queuing delays, and packet size can also affect ELAN performance.
Table 10-3 provides guidelines for ensuring adequate LAN performance
when dealing with such factors.
Table 10-3 ELAN Configuration Guidelines
Factor: Propagation delay
Guidelines: The amount of time it takes a packet to traverse the ELAN
depends on the distance it travels and the number of times it is
relayed from one link to another by a bridge or a station on the FDDI
ring. If responsiveness is critical, you must control these factors.
When FDDI is used for OpenVMS Cluster communications, the ring latency
when the FDDI ring is idle should not exceed 400 microseconds. FDDI
packets travel at 5.085 microseconds/km, and each station causes an
approximate 1-microsecond delay between receiving and transmitting. You
can calculate FDDI latency by using the following formula:
Latency = (distance in km) * (5.085 microseconds/km) + (number of
stations) * (1 microsecond/station)
For high-performance applications, limit the number of bridges between
nodes to two. For situations in which high performance is not required,
you can use up to seven bridges between nodes.
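The latency formula above is straightforward to evaluate. The following
Python sketch is illustrative only (the ring distance and station count
in the example are hypothetical); it applies the formula and flags a
ring that exceeds the 400-microsecond guideline.

    # Illustrative sketch of the FDDI latency formula given above:
    # latency = (distance in km) * 5.085 microseconds/km
    #           + (number of stations) * 1 microsecond/station

    def fddi_ring_latency_us(distance_km, station_count):
        """Estimated idle-ring latency in microseconds."""
        return distance_km * 5.085 + station_count * 1.0

    # Hypothetical ring: 40 km of fiber and 25 stations.
    latency = fddi_ring_latency_us(40, 25)
    print(f"Estimated idle-ring latency: {latency:.1f} microseconds")

    if latency > 400:
        print("Ring latency exceeds the 400-microsecond guideline")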
Factor: Queuing delay
Guidelines: Queuing occurs when the instantaneous arrival rate at
bridges and host adapters exceeds the service rate. You can control
queuing by:
- Reducing the number of bridges between nodes that communicate
frequently.
- Using only high-performance bridges and adapters.
- Reducing traffic bursts in the LAN. In some cases, for example, you
can tune applications by combining small I/Os so that a single packet
is produced rather than a burst of small ones.
- Reducing LAN segment and host processor utilization levels by using
faster processors and faster LANs, and by using bridges for traffic
isolation.
Factor: Packet loss
Guidelines: Packets that are not delivered by the ELAN require
retransmission, which wastes network resources, increases delay, and
reduces bandwidth. Bridges and adapters discard packets when they
become congested. You can reduce packet loss by controlling queuing, as
previously described.
Packets are also discarded when they are damaged in transit. You can
control this problem by observing LAN hardware configuration rules,
removing sources of electrical interference, and ensuring that all
hardware is operating correctly.
Packet loss can also be reduced by using VMS Version 5.5-2 or later,
which has PEDRIVER congestion control.
The retransmission timeout rate, which is a symptom of packet loss,
must be less than 1 timeout in 1000 transmissions for OpenVMS Cluster
traffic from one node to another. ELAN paths that are used for
high-performance applications should have a significantly lower rate.
Monitor the occurrence of retransmission timeouts in the OpenVMS
Cluster.
Reference: For information about monitoring the occurrence of
retransmission timeouts, see OpenVMS Cluster Systems.
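One way to apply the 1-in-1000 threshold to monitored counters is
sketched below in Python (illustrative only; the counter values are
hypothetical and would come from whatever monitoring procedure you use,
as described in OpenVMS Cluster Systems).

    # Illustrative sketch: compare an observed retransmission timeout rate
    # with the ceiling of 1 timeout per 1000 transmissions for OpenVMS
    # Cluster traffic from one node to another.

    def timeout_rate_acceptable(timeouts, transmissions, ceiling=1 / 1000):
        """Return True if the observed timeout rate is below the ceiling."""
        if transmissions == 0:
            return True              # nothing transmitted, nothing to flag
        return (timeouts / transmissions) < ceiling

    # Hypothetical counters gathered from cluster monitoring.
    print(timeout_rate_acceptable(timeouts=1, transmissions=5000))   # True
    print(timeout_rate_acceptable(timeouts=7, transmissions=5000))   # False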
Factor: Bridge recovery delay
Guidelines: Choose bridges with a fast self-test time, and adjust
bridges for fast automatic reconfiguration.
Reference: Refer to OpenVMS Cluster Systems for more information about
LAN bridge failover.
Factor: Bandwidth
Guidelines: All LAN paths used for OpenVMS Cluster communication must
operate with a nominal bandwidth of at least 10 Mb/s. The average LAN
segment utilization should not exceed 60% for any 10-second interval.
Use FDDI exclusively on the communication paths that have the highest
performance requirements. Do not put an Ethernet LAN segment between
two FDDI segments; FDDI bandwidth is significantly greater, and the
Ethernet LAN will become a bottleneck. This strategy is especially
ineffective if a server on one FDDI must serve clients on another FDDI
with an Ethernet LAN between them. A more appropriate strategy is to
put the server on an FDDI and the clients on an Ethernet LAN, as
Figure 10-21 shows.
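As a rough way to apply the 60% utilization guideline, the Python
sketch below (illustrative only; it assumes per-second utilization
samples are available from your LAN monitoring tool) checks every
10-second window of samples against the limit.

    # Illustrative sketch: verify that average LAN segment utilization
    # stays at or below 60% over every 10-second interval, given one
    # utilization percentage sample per second.

    def utilization_within_limit(samples_pct, window_s=10, limit_pct=60.0):
        for start in range(0, max(len(samples_pct) - window_s + 1, 1)):
            window = samples_pct[start:start + window_s]
            if window and sum(window) / len(window) > limit_pct:
                return False
        return True

    # Hypothetical per-second utilization readings for one LAN segment.
    readings = [35, 40, 42, 55, 61, 58, 44, 39, 37, 41, 36, 38]
    print(utilization_within_limit(readings))   # True for these samples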
Factor: Traffic isolation
Guidelines: Use bridges to isolate and localize the traffic between
nodes that communicate with each other frequently. For example, use
bridges to separate the OpenVMS Cluster from the rest of the ELAN, and
to separate nodes within an OpenVMS Cluster that communicate frequently
from the rest of the OpenVMS Cluster.
Provide independent paths through the ELAN between critical systems
that have multiple adapters.
Factor: Packet size
Guidelines: You can adjust the NISCS_MAX_PKTSZ system parameter to use
the full FDDI packet size. Ensure that the ELAN path supports a data
field of at least 4474 bytes end to end.
Some failures cause traffic to switch from an ELAN path that supports
4474-byte packets to a path that supports only smaller packets. It is
possible to implement automatic detection and recovery from these kinds
of failures. This capability requires that the ELAN set the value of
the priority field in the FDDI frame-control byte to zero when the
packet is delivered on the destination FDDI link. Ethernet-to-FDDI
bridges that conform to the IEEE 802.1 bridge specification provide
this capability.