HP OpenVMS Cluster Systems



F.2 Addressing LAN Communication Problems

This section describes LAN communication problems and how to address them.

F.2.1 Symptoms

Communication trouble in OpenVMS Cluster systems may be indicated by a variety of symptoms.

Before you initiate complex diagnostic procedures, do not overlook the obvious. Always make sure the hardware is configured and connected properly and that the network is started. Also, make sure system parameters are set correctly on all nodes in the OpenVMS Cluster.
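As a quick first pass, you can confirm from DCL that the LAN adapters are present and that the basic cluster parameters are enabled. The following is a minimal sketch; the EW device prefix is an assumption, so substitute the device names your hardware actually uses:

    $ SHOW DEVICE EW
    $ MC SCACP
    SCACP> SHOW LAN
    SCACP> EXIT
    $ MC SYSGEN
    SYSGEN> SHOW VAXCLUSTER
    SYSGEN> SHOW NISCS_LOAD_PEA0
    SYSGEN> EXIT

NISCS_LOAD_PEA0 must be 1 for PEDRIVER to be loaded at boot time.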

F.2.2 Traffic Control

Keep in mind that an OpenVMS Cluster system generates substantially heavier traffic than most other LAN protocols. In many cases, cluster behavior problems that appear to be related to the network might actually be caused by software, hardware, or user errors; a large amount of traffic does not necessarily indicate a problem with the OpenVMS Cluster network. The amount of traffic generated depends on how users use the system and on how the OpenVMS Cluster is configured with additional interconnects (such as DSSI and CI).

If the amount of traffic generated by the OpenVMS Cluster exceeds the expected or desired levels, you might be able to reduce the level of traffic by changing how the cluster is used or configured, for example, by distributing load across additional interconnects.
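Before tuning anything, it helps to quantify the SCS traffic the cluster is actually generating. A minimal sketch using the MONITOR utility:

    $ MONITOR SCS/INTERVAL=5

The SCS class shows per-node message and datagram rates, which you can compare against your expectations before and after any configuration change.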

F.2.3 Excessive Packet Losses on LAN Paths

Prior to OpenVMS Version 7.3, an SCS virtual circuit closure was the first indication that a LAN path had become unusable. Beginning with OpenVMS Version 7.3, whenever the last usable LAN path is losing packets at an excessive rate, PEDRIVER displays the following console message:


%PEA0, Excessive packet losses on LAN path from local-device-name 
 to device-name on REMOTE NODE node-name 

This message is displayed when PEDRIVER recently had to perform an excessively high rate of packet retransmissions on the LAN path consisting of the local device, the intervening network, and the device on the remote node. The message indicates that the LAN path has degraded and is approaching, or has reached, the point where reliable communications with the remote node are no longer possible. It is likely that the virtual circuit to the remote node will close if the losses continue. Furthermore, continued operation with high LAN packet losses can result in significant loss in performance because of the communication delays resulting from the packet loss detection timeouts and packet retransmission.

The corrective steps to take are:

  1. Check the local and remote LAN device error counts to see whether a problem exists on the devices. Issue the following commands on each node (or cluster-wide from one node, as sketched after this list):


    $ SHOW DEVICE local-device-name 
    $ MC SCACP 
    SCACP> SHOW LAN device-name 
    $ MC LANCP 
    LANCP> SHOW DEVICE device-name/COUNTERS 
    

  2. If device error counts on the local devices are within normal bounds, contact your network administrators to request that they diagnose the LAN path between the devices.
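Because step 1 must be repeated on every cluster member, it can be convenient to issue the counter displays cluster-wide from a single node with SYSMAN. This is a sketch only; EWA0 is a placeholder for your actual LAN device name:

    $ MC SYSMAN
    SYSMAN> SET ENVIRONMENT/CLUSTER
    SYSMAN> DO MC LANCP SHOW DEVICE EWA0/COUNTERS
    SYSMAN> EXIT

Look for error counters that continue to grow over time; a device whose error counts keep climbing is a candidate for the network diagnosis in step 2.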

F.2.4 Preliminary Network Diagnosis

If the symptoms and preliminary diagnosis indicate that you might have a network problem, start troubleshooting LAN communication failures with the step-by-step procedures described in Appendix C. Appendix C helps you diagnose and solve common Ethernet and FDDI LAN communication failures during several stages of OpenVMS Cluster activity.

The procedures in Appendix C require that you verify a number of parameters during the diagnostic process. Because system parameter settings play a key role in effective OpenVMS Cluster communications, Section F.2.6 describes several system parameters that are especially important to the timing of LAN bridges, disk failover, and channel availability.

F.2.5 Tracing Intermittent Errors

Because PEDRIVER communication is based on channels, LAN network problems typically show up at the channel level.

Diagnosing failures at this level becomes more complex because the errors are usually intermittent. Moreover, even though PEDRIVER is aware when a channel is unavailable and performs error recovery based on this information, it does not provide notification when a channel failure occurs; PEDRIVER provides notification only for virtual circuit failures.

However, the Local Area OpenVMS Cluster Network Failure Analysis Program (LAVC$FAILURE_ANALYSIS), available in SYS$EXAMPLES, can help you use PEDRIVER information about channel status. The LAVC$FAILURE_ANALYSIS program (documented in Appendix D) analyzes long-term channel outages, such as hard failures in LAN network components that occur during run time.

This program uses tables in which you describe your LAN hardware configuration. During a channel failure, PEDRIVER uses the hardware configuration represented in the table to isolate which component might be causing the failure. PEDRIVER reports the suspected component through an OPCOM display. You can then isolate the LAN component for repair or replacement.
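At a high level, deploying the program follows the pattern sketched below. This is only an outline of the procedure that Appendix D documents in full, and it assumes a VAX system; on Alpha, the MACRO command typically requires the /MIGRATION qualifier.

    $ COPY SYS$EXAMPLES:LAVC$FAILURE_ANALYSIS.MAR []
    $ ! Edit the .MAR file so that its tables describe your
    $ ! LAN components and their connections
    $ MACRO LAVC$FAILURE_ANALYSIS.MAR
    $ LINK LAVC$FAILURE_ANALYSIS
    $ RUN LAVC$FAILURE_ANALYSIS

Running the image loads your configuration description into PEDRIVER, which then performs the analysis and reports suspect components through OPCOM, as described above.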

Reference: Section F.8 addresses the kinds of problems you might find in the NISCA protocol and provides methods for diagnosing and solving them.

F.2.6 Checking System Parameters

Table F-4 describes several system parameters relevant to the recovery and failover time limits for LANs in an OpenVMS Cluster.

Table F-4 System Parameters for Timing

Parameter: RECNXINTERVAL
Use: Defines the amount of time to wait before removing a node from the OpenVMS Cluster after detection of a virtual circuit failure, which could result from a LAN bridge failure. If your network uses multiple paths and you want the OpenVMS Cluster to survive failover between LAN bridges, make sure the value of RECNXINTERVAL is greater than the time it takes to fail over those paths.

Reference: The formula for calculating this parameter is discussed in Section 3.2.10.

Parameter: MVTIMEOUT
Use: Defines the amount of time the OpenVMS operating system tries to recover a path to a disk before returning failure messages to the application. It is relevant when an OpenVMS Cluster configuration is set up to serve disks over Ethernet or FDDI. MVTIMEOUT is similar to RECNXINTERVAL, except that RECNXINTERVAL applies to CPU-to-CPU communication and MVTIMEOUT applies to CPU-to-disk communication.

Parameter: SHADOW_MBR_TIMEOUT
Use: Defines the amount of time that Volume Shadowing for OpenVMS tries to recover from a transient disk error on a single member of a multiple-member shadow set. SHADOW_MBR_TIMEOUT differs from MVTIMEOUT in that it quickly removes a failing member from the shadow set; the remaining members can then recover more rapidly.

Note: The TIMVCFAIL system parameter, which optimizes the amount of time needed to detect a communication failure, is not recommended for use with LAN communications. This parameter is intended for CI and DSSI connections. For Ethernet and FDDI, PEDRIVER's own listen timeout of 8 to 9 seconds usually detects a failure sooner than TIMVCFAIL does.
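To review these timing parameters (and TIMVCFAIL) on a node, you can use SYSGEN; a minimal sketch:

    $ MC SYSGEN
    SYSGEN> SHOW RECNXINTERVAL
    SYSGEN> SHOW MVTIMEOUT
    SYSGEN> SHOW SHADOW_MBR_TIMEOUT
    SYSGEN> SHOW TIMVCFAIL
    SYSGEN> EXIT

As noted in Section F.2.1, these values should be set correctly on all nodes in the OpenVMS Cluster.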

F.2.7 Channel Timeouts

Channel timeouts are detected by PEDRIVER as described in Table F-5.

Table F-5 Channel Timeout Detection

PEDRIVER Action: Listens for HELLO datagram messages, which are sent over channels at least once every 3 seconds.
Comments: Every node in the OpenVMS Cluster multicasts HELLO datagram messages on each LAN adapter to notify other nodes that it is still functioning. Receiving nodes know that the network connection is still good.

PEDRIVER Action: Closes a channel when HELLO datagrams or sequenced messages have not been received for a period of 8 to 9 seconds.
Comments: Because HELLO datagram messages are transmitted at least once every 3 seconds, PEDRIVER times out a channel only if at least two HELLO datagram messages are lost and there is no sequenced-message traffic.

PEDRIVER Action: Closes a virtual circuit when no channels are available, or when the packet size of the only available channels is insufficient.
Comments: The virtual circuit is not closed if any other channels to the node are available, except when the packet sizes of the available channels are smaller than that of the channel being used for the virtual circuit. For example, if a channel fails over from FDDI to Ethernet, PEDRIVER may close the virtual circuit and then reopen it after negotiating the smaller packet size that Ethernet segmentation requires.

PEDRIVER Action: Does not report errors when a channel is closed.
Comments: OPCOM "Connection loss" errors or SYSAP messages are not sent to users or other system applications until after the virtual circuit shuts down. This fact is significant, especially if there are multiple paths to a node and a LAN hardware failure or IP network issue occurs. In this case, you might not receive an error message; PEDRIVER continues to use the virtual circuit over another available channel.

PEDRIVER Action: Reestablishes a virtual circuit when a channel becomes available again.
Comments: PEDRIVER reopens a channel when HELLO datagram messages are received again.
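You can watch channel status from the running system with SCACP, which reflects the HELLO-based state transitions described above. This is a minimal sketch; node-name is a placeholder, and the exact SHOW CHANNEL output varies by OpenVMS version:

    $ MC SCACP
    SCACP> SHOW CHANNEL node-name
    SCACP> EXIT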

F.3 Using SDA to Monitor LAN or IP Communications

This section describes how to use SDA to monitor LAN or IP communications.

F.3.1 Isolating Problem Areas

If your system shows symptoms of intermittent failures during run time, you need to determine whether there is a network problem or whether the symptoms are caused by some other activity in the system.

Generally, you can diagnose problems in the NISCA protocol or the network using the OpenVMS System Dump Analyzer utility (SDA). SDA is an effective tool for isolating problems on specific nodes running in the OpenVMS Cluster system.

Reference: The following sections describe the use of some SDA commands and qualifiers. You should also refer to the HP OpenVMS System Analysis Tools Manual or the OpenVMS VAX System Dump Analyzer Utility Manual for complete information about SDA for your system.

F.3.2 SDA Command SHOW PORT

The SDA command SHOW PORT provides relevant information that is useful in troubleshooting PEDRIVER and LAN adapters in particular. Begin by entering the SHOW PORT command, which causes SDA to define cluster symbols. Example F-1 illustrates how the SHOW PORT command provides a summary of OpenVMS Cluster data structures.

Example F-1 SDA Command SHOW PORT Display

$ ANALYZE/SYSTEM
SDA> SHOW PORT
VAXcluster data structures 
-------------------------- 
 
                  --- PDT Summary Page --- 
 
 PDT Address          Type         Device          Driver Name 
 -----------          ----         -------         ----------- 
 
  80C3DBA0             pa          PAA0            PADRIVER 
  80C6F7A0             pe          PEA0            PEDRIVER

F.3.3 Monitoring Virtual Circuits

To examine information about the virtual circuit (VC) that carries messages between the local node (where you are running SDA) and another remote node, enter the SDA command SHOW PORT/VC=VC_remote-node-name. Example F-2 shows how to examine information about the virtual circuit running between a local node and the remote node, NODE11.

Example F-2 SDA Command SHOW PORT/VC Display

SDA> SHOW PORT/VC=VC_NODE11
VAXcluster data structures 
-------------------------- 
                 --- Virtual Circuit (VC) 98625380 --- 
Remote System Name:  NODE11  (0:VAX)     Remote SCSSYSTEMID:  19583 
Local System ID:  217 (D9)              Status: 0005 open,path 
------ Transmit -------  ----- VC Closures -----  (7)--- Congestion Control ---- 
Msg Xmt(1)      46193196  SeqMsg TMO            0  Pipe Quota/Slo/Max(8) 31/ 7/31 
  Unsequence          3  CC DFQ Empty          0  Pipe Quota Reached(9)   213481 
  Sequence     41973703  Topology Change(5)     0  Xmt C/T(10)              0/1984 
  ReXmt(2)       128/106  NPAGEDYN Low(6)        0  RndTrp uS(11)        18540+7764 
  Lone ACK      4219362                           UnAcked Msgs                0 
Bytes Xmt     137312089                           CMD Queue Len/Max        0/21 
------- Receive -------  - Messages  Discarded -  ----- Channel Selection ----- 
Msg Rcv(3)      47612604  No Xmt Chan           0  Preferred Channel    9867F400 
  Unsequence          3  Rcv Short Msg         0  Delay Time           FAAD63E0 
  Sequence     37877271  Illegal Seq Msg       0  Buffer Size              1424 
  ReRcv(4)         13987  Bad Checksum          0  Channel Count              18 
  Lone ACK      9721030  TR DFQ Empty          0  Channel Selections      32138 
  Cache             314  TR MFQ Empty          0  Protocol                1.3.0 
  Ill ACK             0  CC MFQ Empty          0  Open(12) 8-FEB-1994 17:00:05.12 
Bytes Rcv    3821742649  Cache Miss            0  Cls(13) 17-NOV-1858 00:00:00.00 

The SHOW PORT/VC=VC_remote-node-name command displays a number of performance statistics about the virtual circuit for the target node. The display groups the statistics into general categories that summarize such things as packet transmissions to the remote node, packets received from the remote node, and congestion control behavior. The statistics most useful for problem isolation are called out in Example F-2 and described in Table F-6.

Note: The counters shown in Example F-2 are stored in fixed-size fields and are automatically reset to 0 when a field reaches its maximum value (or when the system is rebooted). Because fields have different maximum sizes and growth rates, the field counters are likely to reset at different times. Thus, for a system that has been running for a long time, some field values may seem illogical and appear to contradict others.

Table F-6 SHOW PORT/VC Display
Field Description
(1) Msg Xmt (messages transmitted) Shows the total number of packets transmitted over the virtual circuit to the remote node, including both sequenced and unsequenced (channel control) messages, and lone acknowledgments. (All application data is carried in sequenced messages.) The counters for sequenced messages and lone acknowledgments grow more quickly than most other fields.
(2) ReXmt (retransmission) Indicates the number of retransmissions and retransmission-related timeouts for the virtual circuit.
  • The rightmost number (106) in the ReXmt field indicates the number of times a timeout occurred. A timeout indicates one of the following problems:
    • The remote system NODE11 did not receive the sequenced message sent by UPNVMS.
    • The sequenced message arrived but was delayed in transit to NODE11.
    • The local system UPNVMS did not receive the acknowledgment to the message sent to remote node NODE11.
    • The acknowledgment arrived but was delayed in transit from NODE11.

    Congestion either in the network or at one of the nodes can cause the following problems:

    • Congestion in the network can result in delayed or lost packets. Network hardware problems can also result in lost packets.
    • Congestion in UPNVMS or NODE11 can result either in packet delay because of queuing in the adapter or in packet discard because of insufficient buffer space.
  • The leftmost number (128) indicates the number of packets actually retransmitted. For example, if the network loses two packets at the same time, one timeout is counted but two packets are retransmitted. A retransmission occurs when the local node does not receive an acknowledgment for a transmitted packet within a predetermined timeout interval.

    Although you should expect to see a certain number of retransmissions (especially in heavily loaded networks), an excessive number of retransmissions wastes network bandwidth and indicates excessive load or intermittent hardware failure. If the leftmost value in the ReXmt field is greater than about 0.01% to 0.05% of the total number of transmitted messages shown in the Msg Xmt field, the OpenVMS Cluster system is probably experiencing excessive network problems or local loss from congestion. In Example F-2, 0.05% of the 46,193,196 transmitted messages is roughly 23,000, so the 128 retransmissions shown are well within normal bounds.

(3) Msg Rcv (messages received) Indicates the total number of messages received by local node UPNVMS over this virtual circuit. The values for sequenced messages and lone acknowledgments usually increase at a rapid rate.
(4) ReRcv (rereceive) Displays the number of packets received redundantly by this system. A remote system may retransmit packets even though the local node has already successfully received them. This happens when the cumulative delay of the packet and its acknowledgment is longer than the estimated round-trip time being used as a timeout value by the remote node. Therefore, the remote node retransmits the packet even though it is unnecessary.

Underestimation of the round-trip delay by the remote node is not directly harmful, but the retransmission and subsequent congestion-control behavior on the remote node have a detrimental effect on data throughput. Large numbers indicate frequent bursts of congestion in the network or adapters leading to excessive delays. If the value in the ReRcv field is greater than approximately 0.01% to 0.05% of the total messages received, there may be a problem with congestion or network delays.

(5) Topology Change Indicates the number of times PEDRIVER has performed a failover from FDDI to Ethernet, which necessitated closing and reopening the virtual circuit. In Example F-2, there have been no failovers. However, if the field indicates a number of failovers, a problem may exist on the FDDI ring.
(6) NPAGEDYN (nonpaged dynamic pool) Displays the number of times the virtual circuit was closed because of a pool allocation failure on the local node. If this value is nonzero, you probably need to increase the value of the NPAGEDYN system parameter on the local node.
(7) Congestion Control Displays information about the virtual circuit that is used to control the pipe quota (the number of messages that can be sent to the remote node, that is, put into the "pipe," before receiving an acknowledgment) and the retransmission timeout. PEDRIVER varies the pipe quota and the timeout value to control the amount of network congestion.
(8) Pipe Quota/Slo/Max Indicates the current thresholds governing the pipe quota.
  • The leftmost number (31) is the current value of the pipe quota (transmit window). After a timeout, the pipe quota is reset to 1 to decrease congestion and is allowed to increase quickly as acknowledgments are received.
  • The middle number (7) is the slow-growth threshold (the size at which the rate of increase is slowed) to avoid congestion on the network again.
  • The rightmost number (31) is the maximum value currently allowed for the VC based on channel limitations.

Reference: See Appendix G for PEDRIVER congestion control and channel selection information.

(9) Pipe Quota Reached Indicates the number of times the entire transmit window was full. If this number is small compared with the number of sequenced messages transmitted, the local node is not sending large bursts of data to the remote node.
(10) Xmt C/T (transmission count/target) Shows both the number of successful transmissions since the last time the pipe quota was increased and the target value at which the pipe quota is allowed to increase. In the example, the count is 0 because the pipe quota is already at its maximum value (31), so successful transmissions are not being counted.
(11) RndTrp uS (round trip in microseconds) Displays values that are used to calculate the retransmission timeout in microseconds. The leftmost number (18540) is the average round-trip time, and the rightmost number (7764) is the average variation in round-trip time. In the example, the values indicate that the round trip is about 19 milliseconds plus or minus about 8 milliseconds.

VC round-trip time values depend on the delayed-ACK (ACK holdoff) delay, which is 100 ms, and on the amount of network traffic.

If there is sufficient cluster traffic, the receive window at the remote node fills and the ACK is delivered sooner. If the cluster is idle, the ACK may be held off for up to 100 ms. Hence, in an idle cluster with little traffic, the VC round-trip delay value is normally high; as traffic increases, it drops.

Deviation/Variance: Whenever a new ACK delay is measured, it is compared with the current estimate of the ACK delay. The difference is a measure of the error in the delay estimate (delayError). This delayError is applied as a correction to update the current estimate of the ACK delay.

To prevent a single bad measurement from distorting the estimate, the correction applied for any one measurement is limited to a fraction of the delayError.

The average of the absolute value of the delayError is used as the estimate of the delay's variance. (A sketch of this style of estimator appears after Table F-6.)

(12), (13) Open and Cls Displays open (Open) and closed (Cls) timestamps for the last significant changes in the virtual circuit. The repeated loss of one or more virtual circuits over a short period of time (fewer than 10 minutes) indicates network problems. If you are analyzing a crash dump, check whether the crash-dump time corresponds to the timestamp for virtual circuit closure (Cls), as sketched below.
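The ACK-delay estimator described in item (11) is, in effect, an exponentially weighted moving average. The following DCL fragment is an illustration only: the symbol names are invented, and the smoothing fractions (1/8 for the estimate, 1/4 for the variance) are borrowed from classic TCP round-trip estimation, not taken from PEDRIVER.

    $ ! Illustrative sketch; constants and symbol names are assumptions
    $ estimate = 18540                        ! current ACK-delay estimate (microseconds)
    $ variance = 7764                         ! current estimate of delay variation
    $ measured = 26000                        ! hypothetical new ACK-delay measurement
    $ delayError = measured - estimate        ! error in the current estimate
    $ estimate = estimate + delayError / 8    ! correction limited to a fraction
    $ IF delayError .LT. 0 THEN delayError = -delayError
    $ variance = variance + (delayError - variance) / 4
    $ SHOW SYMBOL estimate                    ! ESTIMATE = 19472
    $ SHOW SYMBOL variance                    ! VARIANCE = 7688

For the crash-dump check described for the Cls timestamp, a sketch follows; SYS$SYSTEM:SYSDUMP.DMP is the conventional default dump file and may differ on your system:

    $ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
    SDA> SHOW PORT
    SDA> SHOW PORT/VC=VC_NODE11

Enter SHOW PORT first so that SDA defines the cluster symbols, then compare the Cls timestamp in the display with the time of the crash.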

