OpenVMS Cluster Systems
F.7.7 Transport (TR) Header
The transport (TR) header is used to pass SCS datagrams and sequenced
messages between cluster nodes. The important fields for network
troubleshooting are the TR datagram flags, message acknowledgment, and
sequence numbers.
Note that because the CC and TR headers occupy the same space, a TR/CC
flag identifies the type of message being transmitted over the channel.
Figure F-10 shows the portions of the TR header that are needed for
network troubleshooting, and Table F-11 describes these fields.
Figure F-10 TR Header
Note: The TR header shown in Figure F-10 is used
when both nodes are running Version 1.4 or later of the NISCA protocol.
If one or both nodes are running Version 1.3 or an earlier version of
the protocol, then both nodes will use the message acknowledgment and
sequence number fields in place of the extended message acknowledgment
and extended sequence number fields, respectively.
Table F-11 Fields in the TR Header
Field |
Description |
Datagram flags (bits <7:0>)
|
Provide additional information about the transport datagram.
Bit | Abbreviated Datagram Type | Expanded Datagram Type | Function
0 | DATA | Packet data | Contains data to be delivered to the upper levels of software.
1 | SEQ | Sequence flag | Set to 1 if this is a sequenced message and the sequence number is valid.
2 | Reserved | | Set to 0.
3 | ACK | Acknowledgment | Set to 1 if the acknowledgment field is valid.
4 | RSVP | Reply flag | Set when an ACK datagram is needed immediately.
5 | REXMT | Retransmission | Set for all retransmissions of a sequenced message.
6 | Reserved | | Set to 0.
7 | TR/CC flag | Transport flag | Set to 0; indicates a TR datagram.
|
Message acknowledgment
|
An increasing value that specifies the last sequenced message segment
received by the local node. All messages prior to this value are also
acknowledged. This field is used when one or both nodes are running
Version 1.3 or earlier of the NISCA protocol.
|
Extended message acknowledgment
|
An increasing value that specifies the last sequenced message segment
received by the local node. All messages prior to this value are also
acknowledged. This field is used when both nodes are running Version
1.4 or later of the NISCA protocol.
|
Sequence number
|
An increasing value that specifies the order of datagram transmission
from the local node. This number is used to provide guaranteed delivery
of this sequenced message segment to the remote node. This field is
used when one or both nodes are running Version 1.3 or earlier of the
NISCA protocol.
|
Extended sequence number
|
An increasing value that specifies the order of datagram transmission
from the local node. This number is used to provide guaranteed delivery
of this sequenced message segment to the remote node. This field is
used when both nodes are running Version 1.4 or later of the NISCA
protocol.
|
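The datagram flags described in Table F-11 can be decoded from a single byte taken from a captured packet. The following Python fragment is a minimal sketch, not part of any OpenVMS or analyzer software. The function name decode_tr_flags and the keys of the returned dictionary are hypothetical; the bit positions are the ones listed in the table.

```python
# Bit positions are taken from Table F-11; names are the table's abbreviations.
TR_FLAG_BITS = {
    0: "DATA",
    1: "SEQ",
    3: "ACK",
    4: "RSVP",
    5: "REXMT",
}
TR_CC_BIT = 7                 # 0 = TR datagram, 1 = CC datagram
RESERVED_MASK = 0b0100_0100   # bits 2 and 6 are reserved and should be 0


def decode_tr_flags(flags: int) -> dict:
    """Decode a datagram flags byte (bits <7:0>) into named fields.

    The dictionary layout is an illustrative assumption, not a
    documented structure.
    """
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in one byte")
    return {
        # True when the TR/CC flag is clear, that is, a TR datagram.
        "is_tr": ((flags >> TR_CC_BIT) & 1) == 0,
        # Names of the flags that are set, in ascending bit order.
        "set": [name for bit, name in sorted(TR_FLAG_BITS.items())
                if (flags >> bit) & 1],
        # True when both reserved bits are 0.
        "reserved_ok": (flags & RESERVED_MASK) == 0,
    }
```

For example, a flags byte of 0b00100011 describes a TR datagram with the DATA, SEQ, and REXMT flags set, which is the pattern of a retransmitted sequenced message.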
F.8 Using a LAN Protocol Analysis Program
Some failures, such as packet loss resulting from congestion,
intermittent network interruptions of less than 20 seconds, problems
with backup bridges, and intermittent performance problems, can be
difficult to diagnose. Intermittent failures may require the use of a
LAN analysis tool to isolate and troubleshoot the NISCA protocol levels
described in Section F.1.
When you evaluate the network analysis tools currently available, look
for the capabilities described in the following sections.
F.8.1 Single or Multiple LAN Segments
Whether you need to troubleshoot problems on a single LAN segment or on
multiple LAN segments, a LAN analyzer should help you isolate specific
patterns of data. Choose a LAN analyzer that can isolate data matching
unique patterns that you define. You should be able to define data
patterns located in the data regions following the LAN header
(described in Section F.7.2). In order to troubleshoot the NISCA
protocol properly, a LAN analyzer should be able to match multiple data
patterns simultaneously.
To troubleshoot single or multiple LAN segments, you must at a minimum
define and isolate transmitted and retransmitted data in the TR header
(see Section F.7.7). Additionally, for effective network
troubleshooting across multiple LAN segments, a LAN analysis tool
should include the following functions:
- A distributed enable function that allows you to
synchronize multiple LAN analyzers that are set up at different
locations so that they can capture information about the same event as
it travels through the LAN configuration
- A distributed combination trigger function that
automatically triggers multiple LAN analyzers at different locations so
that they can capture information about the same event
The purpose of distributed enable and distributed combination trigger
functions is to capture packets as they travel across multiple LAN
segments. The implementation of these functions discussed in the
following sections uses multicast messages to reach all LAN segments of
the extended LAN in the system configuration. By providing the ability
to synchronize several LAN analyzers at different locations across
multiple LAN segments, the distributed enable and combination trigger
functions allow you to troubleshoot LAN configurations that span
multiple sites over several miles.
F.8.2 Multiple LAN Segments
To troubleshoot multiple LAN segments, LAN analyzers must be able to
capture the multicast packets and dynamically enable the trigger
function of the LAN analyzer, as follows:
Step |
Action |
1
|
Start capturing the data according to the rules specific to your LAN
analyzer. Compaq recommends that only one LAN analyzer transmit a
distributed enable multicast packet on the LAN. The packet must be
transmitted according to the media access-control rules.
|
2
|
Wait for the distributed enable multicast packet. When the packet is
received, enable the distributed combination trigger function. Prior to
receiving the distributed enable packet, all LAN analyzers must be able
to ignore the trigger condition. This feature is required in order to
set up multiple LAN analyzers capable of capturing the same event. Note
that the LAN analyzer transmitting the distributed enable should not
wait to receive it.
|
3
|
Wait for an explicit (user-defined) trigger event or a distributed
trigger packet. When the LAN analyzer receives either of these
triggers, the LAN analyzer should stop the data capture.
Prior to receiving either trigger, the LAN analyzer should continue
to capture the requested data. This feature is required in order to
allow multiple LAN analyzers to capture the same event.
|
4
|
Once triggered, the LAN analyzer completes the distributed trigger
function to stop the other LAN analyzers from capturing data related to
the event that has already occurred.
|
The HP 4972A LAN Protocol Analyzer, available from the Hewlett-Packard
Company, is one example of a network failure analysis tool that
provides the required functions described in this section.
Reference: Section F.10 provides examples that use
the HP 4972A LAN Protocol Analyzer.
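The four steps in this section amount to a small state machine that each LAN analyzer follows. The following Python fragment is a toy model of that sequence, offered only to make the ordering concrete. The class name, state names, and event strings (ENABLE, TRIGGER, LOCAL_TRIGGER) are hypothetical and do not correspond to any analyzer's command set. The model also assumes that only the analyzer that sees the explicit trigger transmits the distributed trigger.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_ENABLE = auto()  # capturing; trigger conditions are ignored
    ARMED = auto()        # capturing; trigger conditions are honored
    STOPPED = auto()      # data capture has stopped


class AnalyzerModel:
    """Toy model of one LAN analyzer following steps 1 through 4."""

    def __init__(self, sends_enable: bool = False):
        # The analyzer that transmits the distributed enable packet
        # does not wait to receive it (see step 2).
        self.state = State.ARMED if sends_enable else State.WAIT_ENABLE
        self.sent = ["ENABLE"] if sends_enable else []

    def on_event(self, event: str) -> None:
        if self.state is State.WAIT_ENABLE:
            # Step 2: any trigger seen before the enable is ignored.
            if event == "ENABLE":
                self.state = State.ARMED
        elif self.state is State.ARMED:
            if event == "LOCAL_TRIGGER":
                # Steps 3 and 4: stop, then tell the other analyzers.
                self.state = State.STOPPED
                self.sent.append("TRIGGER")
            elif event == "TRIGGER":
                # Step 3: a distributed trigger from another analyzer.
                self.state = State.STOPPED
```

In this model, an analyzer created with sends_enable=True starts in the ARMED state, while every other analyzer ignores trigger events until it has seen ENABLE.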
F.9 Data Isolation Techniques
The following sections describe the types of data you should isolate
when you use a LAN analysis tool to capture OpenVMS Cluster data
between nodes and LAN adapters.
F.9.1 All OpenVMS Cluster Traffic
To isolate all OpenVMS Cluster traffic on a specific LAN segment,
capture all the packets whose LAN header contains the protocol type
60--07.
Reference: See also Section F.7.2 for a description of
the LAN headers.
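If captured packets are available as raw bytes, the protocol type test can be expressed in a few lines. The following Python fragment is a sketch under two assumptions: the frame is in Ethernet format with no additional tag between the addresses and the type field, and the type field begins at byte 13 (offset 12), as listed in Table F-13. The function name is hypothetical.

```python
NISCA_PROTOCOL_TYPE = bytes([0x60, 0x07])
TYPE_OFFSET = 12  # byte 13 in the 1-based numbering of Table F-13


def is_cluster_frame(frame: bytes) -> bool:
    """Return True if the LAN header carries protocol type 60--07."""
    # Slicing a short frame yields fewer than 2 bytes, so the
    # comparison is simply False rather than an error.
    return frame[TYPE_OFFSET:TYPE_OFFSET + 2] == NISCA_PROTOCOL_TYPE
```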
F.9.2 Specific OpenVMS Cluster Traffic
To isolate OpenVMS Cluster traffic for a specific cluster on a specific
LAN segment, capture packets in which:
- The LAN header contains the protocol type 60--07.
- The DX header contains the cluster group number specific to that
OpenVMS Cluster.
Reference: See Sections F.7.2 and
F.7.5 for descriptions of the LAN and DX headers.
F.9.3 Virtual Circuit (Node-to-Node) Traffic
To isolate virtual circuit traffic between a specific pair of nodes,
capture packets in which the LAN header contains:
- The protocol type 60--07
- The destination SCS address
- The source SCS address
You can further isolate virtual circuit traffic between a specific pair
of nodes to a specific LAN segment by capturing the following
additional information from the DX header:
- The cluster group code specific to that OpenVMS Cluster
- The destination SCS transport address
- The source SCS transport address
Reference: See Sections F.7.2 and
F.7.5 for LAN and DX header information.
F.9.4 Channel (LAN Adapter-to-LAN Adapter) Traffic
To isolate channel information, capture all packet information on every
channel between LAN adapters. The DX header contains information useful
for diagnosing heavy communication traffic between a pair of LAN
adapters. Capture packets in which the LAN header contains:
- The destination LAN adapter address
- The source LAN adapter address
Because nodes can use multiple LAN adapters, specifying the source and
destination LAN addresses may not capture all of the traffic for the
node. Therefore, you must specify a channel as the source LAN address
and the destination LAN address in order to isolate traffic on a
specific channel.
Reference: See Section F.7.2 for information about the
LAN header.
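Because a channel is identified by a pair of LAN adapter addresses, a capture can be narrowed to one channel by comparing both addresses in each LAN header. The following Python fragment is a sketch that assumes Ethernet-format frames, with the destination address in bytes 1 through 6 and the source address in bytes 7 through 12 (see Table F-13). The function names and the both_directions option are hypothetical conveniences.

```python
def lan_addresses(frame: bytes):
    """Return (destination, source) LAN addresses from a LAN header."""
    if len(frame) < 12:
        raise ValueError("frame too short for a LAN header")
    return frame[0:6], frame[6:12]


def on_channel(frame: bytes, local: bytes, remote: bytes,
               both_directions: bool = True) -> bool:
    """Return True if the frame travels between the two adapters.

    With both_directions=False, only frames sent from `local` to
    `remote` match.
    """
    dst, src = lan_addresses(frame)
    if (src, dst) == (local, remote):
        return True
    return both_directions and (src, dst) == (remote, local)
```

Matching on the address pair rather than on a single address keeps traffic from a node's other LAN adapters out of the capture.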
F.9.5 Channel Control Traffic
To isolate channel control traffic, capture packets in which:
- The LAN header contains the protocol type 60--07.
- In the CC header datagram flags byte, the TR/CC flag (bit <7>)
is set to 1.
Reference: See Sections F.7.2 and
F.7.6 for a description of the LAN and CC headers.
F.9.6 Transport Data
To isolate transport data, capture packets in which:
- The LAN header contains the protocol type 60--07.
- In the TR header datagram flags byte, the TR/CC flag (bit <7>)
is set to 0.
Reference: See Sections F.7.2 and
F.7.7 for a description of the LAN and TR headers.
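The tests in Sections F.9.5 and F.9.6 differ only in the value of the TR/CC flag, so a single check can sort captured packets into channel control and transport traffic. The following Python fragment is a sketch that assumes the flags byte is byte 31 of the packet (offset 30), the position Table F-13 gives for the TR flags, and that the CC flags byte occupies the same position because the CC and TR headers share the same space. The function name and return values are hypothetical.

```python
from typing import Optional

TYPE_OFFSET = 12    # byte 13 in Table F-13
FLAGS_OFFSET = 30   # byte 31 in Table F-13 (assumed shared by CC and TR)
TR_CC_MASK = 0x80   # TR/CC flag, bit <7>


def classify(frame: bytes) -> Optional[str]:
    """Return "CC", "TR", or None if this is not a cluster packet."""
    if len(frame) <= FLAGS_OFFSET:
        return None
    if frame[TYPE_OFFSET:TYPE_OFFSET + 2] != b"\x60\x07":
        return None
    return "CC" if frame[FLAGS_OFFSET] & TR_CC_MASK else "TR"
```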
F.10 Setting Up an HP 4972A LAN Protocol Analyzer
The HP 4972A LAN Protocol Analyzer, available from the Hewlett-Packard
Company, is highlighted here because it meets all of the requirements
listed in Section F.8. However, the HP 4972A LAN Protocol Analyzer is
merely representative of the type of product useful for LAN network
troubleshooting.
Note: Use of this particular product as an example
here should not be construed as a specific purchase requirement or
endorsement.
This section provides some examples of how to set up the HP 4972A LAN
Protocol Analyzer to troubleshoot the local area OpenVMS Cluster system
protocol for channel formation and retransmission problems.
F.10.1 Analyzing Channel Formation Problems
If you have a LAN protocol analyzer, you can set up filters to capture
data related to the channel control header (described in Section F.7.6).
You can trigger the LAN analyzer by using the following datagram fields:
- Protocol type set to 60--07 hexadecimal
- Correct cluster group number
- TR/CC flag set to 1
Then look for the HELLO, CCSTART, VERF, and VACK datagrams in the
captured data. The CCSTART, VERF, VACK, and SOLICIT_SRV datagrams
should have the AUTHORIZE bit (bit <4>) set in the CC flags byte.
Additionally, these messages should contain the scrambled cluster
password (nonzero authorization field). You can find the scrambled
cluster password and the cluster group number in the first four
longwords of the SYS$SYSTEM:CLUSTER_AUTHORIZE.DAT file.
Reference: See Sections F.9.3 through
F.9.5 for additional data isolation techniques.
F.10.2 Analyzing Retransmission Problems
Using a LAN analyzer, you can trace datagrams as they travel across an
OpenVMS Cluster system, as described in Table F-12.
Table F-12 Tracing Datagrams
Step |
Action |
1
|
Trigger the analyzer using the following datagram fields:
- Protocol type set to 60--07
- Correct cluster group number
- TR/CC flag set to 0
- REXMT flag set to 1
|
2
|
Use the distributed enable function to allow the same event to be
captured by several LAN analyzers at different locations. The LAN
analyzers should start the data capture, wait for the distributed
enable message, and then wait for the explicit trigger event or the
distributed trigger message. Once triggered, the analyzer should
complete the distributed trigger function to stop the other LAN
analyzers from capturing data.
|
3
|
Once all the data is captured, locate the sequence number (for nodes
running the NISCA protocol Version 1.3 or earlier) or the extended
sequence number (for nodes running the NISCA protocol Version 1.4 or
later) for the datagram being retransmitted (the datagram with the
REXMT flag set). Then, search through the previously captured data for
another datagram between the same two nodes (not necessarily the same
LAN adapters) with the following characteristics:
- Protocol type set to 60--07
- Same DX header as the datagram with the REXMT flag set
- TR/CC flag set to 0
- REXMT flag set to 0
- Same sequence number or extended sequence number as the datagram
with the REXMT flag set
|
4
|
The following techniques provide a way of searching for the problem's
origin.
IF... |
THEN... |
The datagram appears to be corrupt
|
Use the LAN analyzer to search in the direction of the source node for
the cause of the corruption.
|
The datagram appears to be correct
|
Search in the direction of the destination node to ensure that the
datagram gets to its destination.
|
The datagram arrives successfully at its LAN segment destination
|
Look for a TR packet from the destination node containing the sequence
number (for nodes running the NISCA protocol Version 1.3 or earlier) or
the extended sequence number (for nodes running the NISCA protocol
Version 1.4 or later) in the message acknowledgment or extended message
acknowledgment field. ACK datagrams have the following fields set:
- Protocol type set to 60--07
- Same DX header as the datagram with the REXMT flag set
- TR/CC flag set to 0
- ACK flag set to 1
|
The acknowledgment was not sent, or a significant delay occurred
between the reception of the message and the transmission of the
acknowledgment
|
Look for a problem with the destination node and LAN adapter. Then
follow the ACK packet through the network.
|
The ACK arrives back at the node that sent the retransmission packet
|
Either of the following conditions may exist:
- The retransmitting node is having trouble receiving LAN data.
- The round-trip delay of the original datagram exceeded the
estimated timeout value.
You can verify the second possibility by using SDA and looking at
the ReRcv field of the virtual circuit display of the system receiving
the retransmitted datagram.
Reference: See Example F-2 for an example of this
type of SDA display.
|
|
Reference: See Appendix G for more information
about congestion control and PEDRIVER message retransmission.
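The search described in step 3 of Table F-12 can be automated once the captured datagrams have been decoded. The following Python fragment is a sketch of that search only. The Datagram record and its field names are hypothetical, the src_node and dst_node fields stand in for the "same DX header" test, and the seq field holds whichever of the sequence number or extended sequence number applies.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

REXMT = 1 << 5   # retransmission flag, bit <5>
TR_CC = 1 << 7   # TR/CC flag, bit <7>; 0 indicates a TR datagram


@dataclass(frozen=True)
class Datagram:
    src_node: str   # stand-in for the source fields of the DX header
    dst_node: str   # stand-in for the destination fields of the DX header
    flags: int      # datagram flags byte
    seq: int        # sequence number or extended sequence number


def find_original(capture: Sequence[Datagram], rexmt_pos: int) -> Optional[int]:
    """Return the position of the original transmission, or None.

    Searches backward from the retransmitted datagram for a TR datagram
    between the same nodes with the same sequence number and REXMT clear.
    """
    rexmt = capture[rexmt_pos]
    if not rexmt.flags & REXMT:
        raise ValueError("datagram at rexmt_pos does not have REXMT set")
    for pos in range(rexmt_pos - 1, -1, -1):
        d = capture[pos]
        same_path = (d.src_node, d.dst_node) == (rexmt.src_node, rexmt.dst_node)
        if same_path and d.seq == rexmt.seq and not d.flags & (REXMT | TR_CC):
            return pos
    return None
```

A result of None suggests that the original datagram never reached the segment where the capture was taken, which points the search back toward the source node.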
F.11 Filters
This section describes:
- How to use the HP 4972A LAN Protocol Analyzer filters to isolate
packets that have been retransmitted or that are specific to a
particular OpenVMS Cluster.
- How to enable the distributed enable and trigger functions.
F.11.1 Capturing All LAN Retransmissions for a Specific OpenVMS Cluster
Use the values shown in Table F-13 to set up a filter, named
LAVc_TR_ReXMT, for all of the LAN retransmissions for a specific
cluster. Fill in the value for the local area OpenVMS Cluster group
code (nn--nn) to isolate a specific OpenVMS Cluster on the
LAN.
Table F-13 Capturing Retransmissions on the LAN
Byte Number | Field | Value
1 | DESTINATION | xx--xx--xx--xx--xx--xx
7 | SOURCE | xx--xx--xx--xx--xx--xx
13 | TYPE | 60--07
23 | LAVC_GROUP_CODE | nn--nn
31 | TR FLAGS | 0x1xxxxx (base 2)
33 | ACKING MESSAGE | xx--xx
35 | SENDING MESSAGE | xx--xx
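The TR FLAGS value in Table F-13 is a binary pattern in which x marks a bit that is not compared. Such a pattern reduces to a mask and a value, which is how the same filter could be applied to packets captured by other means. The following Python fragment is a sketch of that idea, not a description of how the HP 4972A implements its filters. The function names are hypothetical, the byte offsets are the 1-based byte numbers from Table F-13 minus 1, and the group code is compared as the two raw bytes seen on the LAN.

```python
from typing import Tuple

TYPE_OFFSET, GROUP_OFFSET, FLAGS_OFFSET = 12, 22, 30  # bytes 13, 23, 31


def compile_bit_pattern(pattern: str) -> Tuple[int, int]:
    """Turn a pattern such as "0x1xxxxx" into a (mask, value) pair."""
    mask = value = 0
    for ch in pattern:
        mask <<= 1
        value <<= 1
        if ch in "01":
            mask |= 1          # this bit is compared
            value |= int(ch)   # against this required value
        elif ch not in "xX":
            raise ValueError(f"unexpected pattern character {ch!r}")
    return mask, value


def lavc_tr_rexmt(frame: bytes, group_code: bytes) -> bool:
    """Return True for a retransmitted TR datagram of the given cluster."""
    mask, value = compile_bit_pattern("0x1xxxxx")  # TR/CC = 0, REXMT = 1
    return (
        len(frame) > FLAGS_OFFSET
        and frame[TYPE_OFFSET:TYPE_OFFSET + 2] == b"\x60\x07"
        and frame[GROUP_OFFSET:GROUP_OFFSET + 2] == group_code
        and (frame[FLAGS_OFFSET] & mask) == value
    )
```

The pattern 0x1xxxxx compiles to a mask of 0xA0 and a value of 0x20, so only bits 7 and 5 of the flags byte are examined.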
F.11.2 Capturing All LAN Packets for a Specific OpenVMS Cluster
Use the values shown in Table F-14 to filter all of the LAN packets
for a specific cluster. Fill in the value for the OpenVMS Cluster group
code (nn--nn) to isolate a specific OpenVMS Cluster on the
LAN. The filter is named LAVc_all.
|