HP OpenVMS Cluster Systems



F.10 Data Isolation Techniques

The following sections describe the types of data you should isolate when you use a LAN analysis tool to capture OpenVMS Cluster data between nodes and LAN adapters.

F.10.1 All OpenVMS Cluster Traffic

To isolate all OpenVMS Cluster traffic on a specific LAN segment, capture all the packets whose LAN header contains the protocol type 60--07.

Reference: See also Section F.8.2 for a description of the LAN headers.
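The protocol type test can be sketched in Python as follows. This is a minimal illustration, assuming each captured frame is available as a raw `bytes` object that begins with the destination address, so that the protocol type occupies bytes 13 and 14 (the layout used in the filter tables in Section F.12). The function name `is_cluster_frame` is illustrative and is not part of any analyzer's interface.

```python
NISCA_PROTOCOL_TYPE = bytes([0x60, 0x07])  # protocol type 60--07

def is_cluster_frame(frame: bytes) -> bool:
    """Return True if the frame's protocol type field is 60--07."""
    # Assumed layout: bytes 1-6 destination, 7-12 source, 13-14 type
    # (1-based), which is offsets 12 and 13 in a Python bytes object.
    return len(frame) >= 14 and frame[12:14] == NISCA_PROTOCOL_TYPE
```

A capture can then be reduced with `[f for f in frames if is_cluster_frame(f)]`.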

F.10.2 Specific OpenVMS Cluster Traffic

To isolate OpenVMS Cluster traffic for a specific cluster on a specific LAN segment, capture packets in which:

Reference: See Sections F.8.2 and F.8.4 for descriptions of the LAN and DX headers.

F.10.3 Virtual Circuit (Node-to-Node) Traffic

To isolate virtual circuit traffic between a specific pair of nodes, capture packets in which the LAN header contains:

You can further isolate virtual circuit traffic between a specific pair of nodes to a specific LAN segment by capturing the following additional information from the DX header:

Reference: See Sections F.8.2 and F.8.4 for LAN and DX header information.

F.10.4 Channel (LAN Adapter--to--LAN Adapter) Traffic

To isolate channel information, capture all packet information on every channel between LAN adapters. The DX header contains information useful for diagnosing heavy communication traffic between a pair of LAN adapters. Capture packets in which the LAN header contains:

Because nodes can use multiple LAN adapters, specifying a single pair of source and destination LAN addresses may not capture all of the traffic for the node. Therefore, to isolate traffic on a specific channel, you must specify both the source LAN address and the destination LAN address that identify that channel.

Reference: See Section F.8.2 for information about the LAN header.

F.10.5 Channel Control Traffic

To isolate channel control traffic, capture packets in which:

Reference: See Sections F.8.2 and F.8.5 for a description of the LAN and CC headers.

F.10.6 Transport Data

To isolate transport data, capture packets in which:

Reference: See Sections F.8.2 and F.8.6 for a description of the LAN and TR headers.

F.11 Setting Up an HP 4972A LAN Protocol Analyzer

The HP 4972A LAN Protocol Analyzer, available from the Hewlett-Packard Company, is highlighted here because it meets all of the requirements listed in Section F.9. However, the HP 4972A LAN Protocol Analyzer is merely representative of the type of product useful for LAN network troubleshooting.

Note: Use of this particular product as an example here should not be construed as a specific purchase requirement or endorsement.

This section provides some examples of how to set up the HP 4972A LAN Protocol Analyzer to troubleshoot the local area OpenVMS Cluster system protocol for channel formation and retransmission problems.

F.11.1 Analyzing Channel Formation Problems

If you have a LAN protocol analyzer, you can set up filters to capture data related to the channel control header (described in Section F.8.5).

You can trigger the LAN analyzer by using the following datagram fields:

Then look for the HELLO, CCSTART, VERF, and VACK datagrams in the captured data. The CCSTART, VERF, VACK, and SOLICIT_SRV datagrams should have the AUTHORIZE bit (bit <4>) set in the CC flags byte. Additionally, these messages should contain the scrambled cluster password (a nonzero authorization field). You can find the scrambled cluster password and the cluster group number in the first four longwords of the SYS$SYSTEM:CLUSTER_AUTHORIZE.DAT file.

Reference: See Sections F.10.3 through F.10.5 for additional data isolation techniques.
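The two checks described above (AUTHORIZE bit set, authorization field nonzero) can be sketched as follows. This is an illustration only: it assumes the CC flags byte and the authorization field have already been extracted from the datagram according to the CC header layout in Section F.8.5, and the function and parameter names are hypothetical.

```python
AUTHORIZE_BIT = 1 << 4  # bit <4> of the CC flags byte

def has_authorization(cc_flags: int, authorization_field: bytes) -> bool:
    """Check that AUTHORIZE is set and the authorization field is nonzero."""
    # any() is True when at least one byte of the field is nonzero.
    return bool(cc_flags & AUTHORIZE_BIT) and any(authorization_field)
```

A CCSTART, VERF, VACK, or SOLICIT_SRV datagram for which this check fails is a candidate for closer inspection.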

F.11.2 Analyzing Retransmission Problems

Using a LAN analyzer, you can trace datagrams as they travel across an OpenVMS Cluster system, as described in Table F-12.

Table F-12 Tracing Datagrams
Step Action
1 Trigger the analyzer using the following datagram fields:
  • Protocol type set to 60--07
  • Correct cluster group number
  • TR/CC flag set to 0
  • REXMT flag set to 1
2 Use the distributed enable function to allow the same event to be captured by several LAN analyzers at different locations. The LAN analyzers should start the data capture, wait for the distributed enable message, and then wait for the explicit trigger event or the distributed trigger message. Once triggered, the analyzer should complete the distributed trigger function to stop the other LAN analyzers from capturing data.
3 Once all the data is captured, locate the sequence number (for nodes running the NISCA protocol Version 1.3 or earlier) or the extended sequence number (for nodes running the NISCA protocol Version 1.4 or later) for the datagram being retransmitted (the datagram with the REXMT flag set). Then, search through the previously captured data for another datagram between the same two nodes (not necessarily the same LAN adapters) with the following characteristics:
  • Protocol type set to 60--07
  • Same DX header as the datagram with the REXMT flag set
  • TR/CC flag set to 0
  • REXMT flag set to 0
  • Same sequence number or extended sequence number as the datagram with the REXMT flag set
4 The following techniques provide a way of searching for the problem's origin.
IF the datagram appears to be corrupt, THEN use the LAN analyzer to search in the direction of the source node for the cause of the corruption.
IF the datagram appears to be correct, THEN search in the direction of the destination node to ensure that the datagram gets to its destination.
IF the datagram arrives successfully at its LAN segment destination, THEN look for a TR packet from the destination node containing the sequence number (for nodes running the NISCA protocol Version 1.3 or earlier) or the extended sequence number (for nodes running the NISCA protocol Version 1.4 or later) in the message acknowledgment or extended message acknowledgment field. ACK datagrams have the following fields set:
  • Protocol type set to 60--07
  • Same DX header as the datagram with the REXMT flag set
  • TR/CC flag set to 0
  • ACK flag set to 1
IF the acknowledgment was not sent, or a significant delay occurred between the reception of the message and the transmission of the acknowledgment, THEN look for a problem with the destination node and LAN adapter. Then follow the ACK packet through the network.
IF the ACK arrives back at the node that sent the retransmission packet, THEN either of the following conditions may exist:
  • The retransmitting node is having trouble receiving LAN data.
  • The round-trip delay of the original datagram exceeded the estimated timeout value.

You can verify the second possibility by using SDA and looking at the ReRcv field of the virtual circuit display of the system receiving the retransmitted datagram.

Reference: See Example F-2 for an example of this type of SDA display.

Reference: See Appendix G for more information about congestion control and PEDRIVER message retransmission.
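The search in step 3 of Table F-12 can be sketched as a backward scan over decoded capture records. This is a minimal illustration under stated assumptions: the `Datagram` record and its field names are hypothetical, the capture is assumed to contain only frames with protocol type 60--07 in arrival order, and decoding of the headers is assumed to have been done elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Datagram:
    """Hypothetical decoded capture record; field names are illustrative."""
    dx_header: bytes   # raw DX header bytes
    tr_cc_flag: int    # 0 for the datagrams traced in Table F-12
    rexmt: int         # REXMT flag
    sequence: int      # sequence number or extended sequence number

def find_original(capture: List[Datagram], rexmt_index: int) -> Optional[int]:
    """Search backward from a retransmission for the earlier transmission."""
    target = capture[rexmt_index]
    for i in range(rexmt_index - 1, -1, -1):
        d = capture[i]
        # Same DX header, TR/CC flag 0, REXMT flag 0, same sequence number.
        if (d.dx_header == target.dx_header and d.tr_cc_flag == 0
                and d.rexmt == 0 and d.sequence == target.sequence):
            return i
    return None
```

The function returns the index of the matching datagram, or `None` if the original transmission is not in the capture, which can itself indicate that the datagram was lost before it reached the monitored segment.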

F.12 Filters

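This section describes the filters that capture LAN retransmissions and all LAN packets for a specific OpenVMS Cluster, and the filters for the distributed enable and distributed trigger packets.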

F.12.1 Capturing All LAN Retransmissions for a Specific OpenVMS Cluster

Use the values shown in Table F-13 to set up a filter, named LAVc_TR_ReXMT, for all of the LAN retransmissions for a specific cluster. Fill in the value for the local area OpenVMS Cluster group code (nn--nn) to isolate a specific OpenVMS Cluster on the LAN.

Table F-13 Capturing Retransmissions on the LAN
Byte Number Field Value
1 DESTINATION xx--xx--xx--xx--xx--xx
7 SOURCE xx--xx--xx--xx--xx--xx
13 TYPE 60--07
23 LAVC_GROUP_CODE nn--nn
31 TR FLAGS 0x1xxxxx (base 2)
33 ACKING MESSAGE xx--xx
35 SENDING MESSAGE xx--xx
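The values in Table F-13 can be expressed as a software filter along the following lines. This is a sketch, not analyzer code. It assumes that byte numbers in the table are 1-based offsets into the raw frame, that the base 2 pattern for TR FLAGS is written with the most significant bit first (so the fixed bits are bit 7 = 0 and bit 5 = 1), and that the caller supplies the two group code bytes in the order in which they appear in the frame. The function name is illustrative.

```python
REXMT_MASK = 0b10100000   # the two fixed positions in 0x1xxxxx
REXMT_VALUE = 0b00100000  # leading 0, one don't-care bit, then 1

def matches_rexmt_filter(frame: bytes, group_code: bytes) -> bool:
    """Apply the Table F-13 values to one raw frame."""
    if len(frame) < 31 or len(group_code) != 2:
        return False
    return (frame[12:14] == b"\x60\x07"                   # TYPE, byte 13
            and frame[22:24] == group_code                # LAVC_GROUP_CODE, byte 23
            and (frame[30] & REXMT_MASK) == REXMT_VALUE)  # TR FLAGS, byte 31
```

The mask-and-compare form mirrors the table: positions marked `x` are excluded from the mask, so only the fixed bits are tested.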

F.12.2 Capturing All LAN Packets for a Specific OpenVMS Cluster

Use the values shown in Table F-14 to filter all of the LAN packets for a specific cluster. Fill in the value for OpenVMS Cluster group code (nn--nn) to isolate a specific OpenVMS Cluster on the LAN. The filter is named LAVc_all.

Table F-14 Capturing All LAN Packets (LAVc_all)
Byte Number Field Value
1 DESTINATION xx--xx--xx--xx--xx--xx
7 SOURCE xx--xx--xx--xx--xx--xx
13 TYPE 60--07
23 LAVC_GROUP_CODE nn--nn
33 ACKING MESSAGE xx--xx
35 SENDING MESSAGE xx--xx

F.12.3 Setting Up the Distributed Enable Filter

Use the values shown in Table F-15 to set up a filter, named Distrib_Enable, for the distributed enable packet received event. Use this filter to troubleshoot multiple LAN segments.

Table F-15 Setting Up a Distributed Enable Filter (Distrib_Enable)
Byte Number Field Value ASCII
1 DESTINATION 01--4C--41--56--63--45 .LAVcE
7 SOURCE xx--xx--xx--xx--xx--xx  
13 TYPE 60--07 `.
15 TEXT xx  

F.12.4 Setting Up the Distributed Trigger Filter

Use the values shown in Table F-16 to set up a filter, named Distrib_Trigger, for the distributed trigger packet received event. Use this filter to troubleshoot multiple LAN segments.

Table F-16 Setting Up the Distributed Trigger Filter (Distrib_Trigger)
Byte Number Field Value ASCII
1 DESTINATION 01--4C--41--56--63--54 .LAVcT
7 SOURCE xx--xx--xx--xx--xx--xx  
13 TYPE 60--07 `.
15 TEXT xx  
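The Distrib_Enable and Distrib_Trigger filters in Tables F-15 and F-16 differ only in the last byte of the destination address. A minimal sketch of the same test in Python, assuming raw frames as `bytes` objects that begin with the destination address (the function name is illustrative):

```python
DISTRIB_ENABLE_DEST = bytes.fromhex("014C41566345")   # .LAVcE
DISTRIB_TRIGGER_DEST = bytes.fromhex("014C41566354")  # .LAVcT

def classify_distributed(frame: bytes):
    """Return 'enable', 'trigger', or None for one raw frame."""
    # Both filters require protocol type 60--07 at byte 13.
    if len(frame) < 14 or frame[12:14] != b"\x60\x07":
        return None
    if frame[0:6] == DISTRIB_ENABLE_DEST:
        return "enable"
    if frame[0:6] == DISTRIB_TRIGGER_DEST:
        return "trigger"
    return None
```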

F.13 Messages

This section describes how to set up the distributed enable and distributed trigger messages.

F.13.1 Distributed Enable Message

Table F-17 shows how to define the distributed enable message (Distrib_Enable) by creating a new message. You must replace the source address (nn nn nn nn nn nn) with the LAN address of the LAN analyzer.

Table F-17 Setting Up the Distributed Enable Message (Distrib_Enable)
Field Byte Number Value ASCII
Destination 1 01 4C 41 56 63 45 .LAVcE
Source 7 nn nn nn nn nn nn  
Protocol 13 60 07 `.
Text 15 44 69 73 74 72 69 62 75 74 65 Distribute
  25 64 20 65 6E 61 62 6C 65 20 66 d enable f
  35 6F 72 20 74 72 6F 75 62 6C 65 or trouble
  45 73 68 6F 6F 74 69 6E 67 20 74 shooting t
  55 68 65 20 4C 6F 63 61 6C 20 41 he Local A
  65 72 65 61 20 56 4D 53 63 6C 75 rea VMSclu
  75 73 74 65 72 20 50 72 6F 74 6F ster Proto
  85 63 6F 6C 3A 20 4E 49 53 43 41 col: NISCA

F.13.2 Distributed Trigger Message

Table F-18 shows how to define the distributed trigger message (Distrib_Trigger) by creating a new message. You must replace the source address (nn nn nn nn nn nn) with the LAN address of the LAN analyzer.

Table F-18 Setting Up the Distributed Trigger Message (Distrib_Trigger)
Field Byte Number Value ASCII
Destination 1 01 4C 41 56 63 54 .LAVcT
Source 7 nn nn nn nn nn nn  
Protocol 13 60 07 `.
Text 15 44 69 73 74 72 69 62 75 74 65 Distribute
  25 64 20 74 72 69 67 67 65 72 20 d trigger
  35 66 6F 72 20 74 72 6F 75 62 6C for troubl
  45 65 73 68 6F 6F 74 69 6E 67 20 eshooting
  55 74 68 65 20 4C 6F 63 61 6C 20 the Local
  65 41 72 65 61 20 56 4D 53 63 6C Area VMScl
  75 75 73 74 65 72 20 50 72 6F 74 uster Prot
  85 6F 63 6F 6C 3A 20 4E 49 53 43 ocol: NISC
  95 41 A
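The byte values in Tables F-17 and F-18 are the ASCII encoding of the message text, preceded by the destination, source, and protocol fields. The following sketch assembles the same byte sequences so that they can be checked against the tables. The function name and the source address are placeholders (substitute the LAN address of the LAN analyzer), and the sketch only builds the bytes; it does not transmit anything or add padding or a frame check sequence.

```python
def build_message(dest_hex: str, source: bytes, kind: str) -> bytes:
    """Assemble destination, source, protocol type, and ASCII text."""
    if len(source) != 6:
        raise ValueError("source must be a 6-byte LAN address")
    text = ("Distributed %s for troubleshooting the Local Area "
            "VMScluster Protocol: NISCA" % kind).encode("ascii")
    return bytes.fromhex(dest_hex) + source + b"\x60\x07" + text

analyzer = bytes.fromhex("020000000001")  # placeholder source address
enable = build_message("014C41566345", analyzer, "enable")
trigger = build_message("014C41566354", analyzer, "trigger")

# Print the text portion ten bytes per row, as in the tables.
for offset in range(14, len(trigger), 10):
    row = trigger[offset:offset + 10]
    print(offset + 1, row.hex(" ").upper(), row.decode("ascii"))
```

The first number on each printed row is the 1-based byte number, matching the Byte Number column.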

F.14 Programs That Capture Retransmission Errors

You can program the HP 4972A LAN Protocol Analyzer, as shown in the following source code, to capture retransmission errors. The starter program initiates the capture across all of the LAN analyzers. Only one LAN analyzer should run a copy of the starter program. Other LAN analyzers should run either the partner program or the scribe program. Use the partner program when the initial location of the error is unknown and all analyzers should cooperate in detecting the error. Use the scribe program to trigger on a specific LAN segment as well as to capture data from other LAN segments.

F.14.1 Starter Program

The starter program first sends the distributed enable signal to the other LAN analyzers. Next, this program captures all of the LAN traffic, and terminates either when this LAN analyzer detects a retransmitted packet or when it receives the distributed trigger sent from another LAN analyzer running the partner program.

The starter program shown in the following example is used to initiate data capture on multiple LAN segments using multiple LAN analyzers. The goal is to capture the data during the same time interval on all of the LAN segments so that the reason for the retransmission can be located.


Store: frames matching LAVc_all
         or Distrib_Enable
         or Distrib_Trigger
       ending with LAVc_TR_ReXMT
         or Distrib_Trigger
 
Log file: not used 
 
Block 1:   Enable_the_other_analyzers 
     Send message Distrib_Enable 
       and then 
     Go to block 2 
 
Block 2:   Wait_for_the_event 
     When frame matches LAVc_TR_ReXMT then go to block 3 
 
Block 3:   Send the distributed trigger 
     Mark frame 
       and then 
     Send message Distrib_Trigger 
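The control flow of the starter program can be approximated in Python for reasoning about its behavior. This is a rough simulation, not analyzer code: `is_rexmt` and `is_trigger` stand in for the LAVc_TR_ReXMT and Distrib_Trigger filters, `send` stands in for the analyzer's transmit function, and the input is assumed to contain only frames that already match the Store filters. All of these names are hypothetical.

```python
def run_starter(frames, is_rexmt, is_trigger, send):
    """Simulate the starter blocks over an iterable of captured frames.

    Returns the stored frames and the index of the marked frame, or
    None if storage ended without a local retransmission match.
    """
    stored = []
    send("Distrib_Enable")            # Block 1: enable the other analyzers
    for frame in frames:              # Block 2: wait for the event
        stored.append(frame)
        if is_rexmt(frame):           # Block 3: mark frame, send trigger
            send("Distrib_Trigger")
            return stored, len(stored) - 1
        if is_trigger(frame):         # storage ends on a received trigger
            return stored, None
    return stored, None
```

For example, `run_starter(frames, matches, is_trig, sent.append)` with a list `sent` records which messages the starter would have transmitted.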

