HP OpenVMS Cluster Systems



F.3.4 Monitoring PEDRIVER for LAN Devices

The SDA command PE LAN_DEVICE is useful for displaying PEDRIVER LAN device data. Each LAN device is a local LAN device on the system being used for NISCACP communications.


SDA> PE LAN_DEVICE 

In the following example, PE LAN_DEVICE displays the LAN device summary of I64MOZ.

Example F-3 SDA Command PE LAN_DEVICE

SDA> PE LAN_DEVICE 
 
PE$SDA Extension on I64MOZ (HP rx4640  (1.50GHz/6.0MB)) at 21-NOV-2008 15:43:12.53 
---------------------------------------------------------------------------------- 
 
I64MOZ Device Summary 21-NOV-2008 15:43:12.53: 
 
         Device  Line Buffer  MgtBuf  Load    Mgt        Current       Total    Errors & 
  Device  Type  Speed  Size  SizeCap  Class Priority   LAN Address     Bytes     Events  Status 
  ------  ----  -----  ----  -------  ----- --------   -----------     -----     ------  ------ 
   LCL             0   1426       0      0      0  00-00-00-00-00-00    31126556       0  Run Online Local Restart 
   EIA           100   1426       0   1000      0  00-30-6E-5D-97-AE     5086238       2  Run Online Restart 
   EIB          1000   1426       0   1000      0  00-30-6E-5D-97-AF           0  229120  Run Online Restart 
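
The summary rows above lend themselves to simple scripted checks. The following Python fragment is a minimal sketch, not an HP-supplied tool: it assumes the display has been captured as text, and the names `parse_device_rows`, `flag_devices`, and the `event_limit` threshold are hypothetical. The threshold in particular is an arbitrary illustration, not a documented limit.

```python
import re

# Data rows copied from Example F-3 (the Device Type column is blank there).
SAMPLE_ROWS = """\
   LCL             0   1426       0      0      0  00-00-00-00-00-00    31126556       0  Run Online Local Restart
   EIA           100   1426       0   1000      0  00-30-6E-5D-97-AE     5086238       2  Run Online Restart
   EIB          1000   1426       0   1000      0  00-30-6E-5D-97-AF           0  229120  Run Online Restart
"""

# Anchor each row on its LAN address so a blank Device Type does not shift fields.
ROW = re.compile(
    r"^\s*(?P<device>\S+)\s+(?P<middle>.*?)\s+"
    r"(?P<address>(?:[0-9A-F]{2}-){5}[0-9A-F]{2})\s+"
    r"(?P<total_bytes>\d+)\s+(?P<events>\d+)\s+(?P<status>.*\S)\s*$"
)


def parse_device_rows(text):
    """Return one dict per device row found in a PE LAN_DEVICE summary."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, ruler, or blank line
        # The five numeric columns before the address: line speed, buffer
        # size, management buffer size cap, load class, management priority.
        numbers = match.group("middle").split()[-5:]
        if len(numbers) != 5 or not all(n.isdigit() for n in numbers):
            continue
        speed, buf_size, mgt_buf_cap, load_class, mgt_priority = map(int, numbers)
        rows.append({
            "device": match.group("device"),
            "line_speed": speed,
            "buffer_size": buf_size,
            "mgt_buf_size_cap": mgt_buf_cap,
            "load_class": load_class,
            "mgt_priority": mgt_priority,
            "address": match.group("address"),
            "total_bytes": int(match.group("total_bytes")),
            "events": int(match.group("events")),
            "status": match.group("status").split(),
        })
    return rows


def flag_devices(rows, event_limit):
    """List devices that are not Online or exceed the caller's event limit."""
    return [
        r["device"] for r in rows
        if "Online" not in r["status"] or r["events"] > event_limit
    ]


rows = parse_device_rows(SAMPLE_ROWS)
print(flag_devices(rows, event_limit=1000))
```

With the sample rows, only EIB exceeds the illustrative limit of 1000 errors and events, so it is the only device the sketch reports.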
 

F.3.5 Monitoring PEDRIVER Buses for LAN Devices

The SDA command SHOW PORT/BUS=BUS_LAN-device is useful for displaying the PEDRIVER representation of a LAN adapter. To PEDRIVER, a bus is the logical representation of the LAN adapter. (To list the names and addresses of buses, enter the SDA command SHOW PORT/ADDR=PE_PDT and then press the Return key twice.) Example F-4 shows a display for the LAN adapter named EXA.

Example F-4 SDA Command SHOW PORT/BUS Display

SDA> SHOW PORT/BUS=BUS_EXA
VAXcluster data structures 
-------------------------- 
--- BUS: 817E02C0  (EXA)  Device: EX_DEMNA  LAN Address: AA-00-04-00-64-4F --- 
                                   LAN Hardware Address: 08-00-2B-2C-20-B5 
Status: 00000803 run,online(1),restart 
------- Transmit ------  ------- Receive -------  ---- Structure Addresses --- 
Msg Xmt        20290620  Msg Rcv        67321527  PORT Address        817E1140 
  Mcast Msgs    1318437    Mcast Msgs   39773666  VCIB Addr           817E0478 
  Mcast Bytes 168759936    Mcast Bytes 159660184  HELLO Message Addr  817E0508 
Bytes Xmt    2821823510  Bytes Rcv    3313602089  BYE Message Addr    817E0698 
Outstand I/Os         0  Buffer Size        1424  Delete BUS Rtn Adr  80C6DA46 
Xmt Errors(2)      15896  Rcv Ring Size        31 
Last Xmt Error 0000005C         Time of Last Xmt Error(3)21-JAN-1994 15:33:38.96 
--- Receive Errors ----  ------ BUS Timer ------  ----- Datalink Events ------ 
TR Mcast Rcv          0  Handshake TMO  80C6F070  Last  7-DEC-1992 17:15:42.18 
Rcv Bad SCSID         0  Listen TMO     80C6F074  Last Event          00001202 
Rcv Short Msg         0  HELLO timer           3  Port Usable                1 
Fail CH Alloc         0  HELLO Xmt err(4)    1623  Port Unusable              0 
Fail VC Alloc         0                           Address Change             1 
Wrong PORT            0                           Port Restart Fail          0 
 
Field Description
(1) Status: The Status line should always display a status of "online" to indicate that PEDRIVER can access its LAN adapter.
(2) Xmt Errors (transmission errors): Indicates the number of times PEDRIVER has been unable to transmit a packet using this LAN adapter.
(3) Time of Last Xmt Error: You can compare the time shown in this field with the Open and Cls times shown in the VC display in Example F-2 to determine whether the time of the LAN adapter failure is close to the time of a virtual circuit failure.

Note: Transmission errors at the LAN adapter bus level cause a virtual circuit breakage.

(4) HELLO Xmt err (HELLO transmission error): Indicates how many times a message transmission failure has "dropped" a PEDRIVER HELLO datagram message. (The Channel Control [CC] level description in Section F.1 briefly describes the purpose of HELLO datagram messages.) If many HELLO transmission errors occur, PEDRIVER on other nodes probably is timing out a channel, which could eventually result in closure of the virtual circuit.

The 1623 HELLO transmission failures shown in Example F-4 contributed to the high number of transmission errors (15896). Note that it is impossible to have a low number of transmission errors and a high number of HELLO transmission errors.
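
The relationship between the two counters can be checked with a few lines of arithmetic. The following Python fragment is a sketch only. The function name `hello_error_share` is hypothetical, and it assumes, as the paragraph above implies, that every HELLO transmission error is also counted in Xmt Errors.

```python
def hello_error_share(xmt_errors, hello_xmt_errors):
    """Return the fraction of transmit errors that were HELLO datagrams.

    Assumes HELLO Xmt err is a subset of Xmt Errors, so a larger HELLO
    count means the two values were not read from the same display.
    """
    if hello_xmt_errors > xmt_errors:
        raise ValueError("HELLO Xmt err cannot exceed Xmt Errors")
    if xmt_errors == 0:
        return 0.0
    return hello_xmt_errors / xmt_errors


# Values from Example F-4.
share = hello_error_share(xmt_errors=15896, hello_xmt_errors=1623)
print(f"{share:.1%} of transmit errors were HELLO datagrams")
```

For the values in Example F-4, the share is roughly one tenth of all transmission errors.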

F.3.6 Monitoring LAN Adapters

Use the SDA command SHOW LAN/COUNTERS to display information about the LAN adapters as maintained by the LAN device driver. The command shows counters for all protocols, not just the PEDRIVER (SCA) related counters. Example F-5 shows a sample display from the SHOW LAN/COUNTERS command.

Example F-5 SDA Command SHOW LAN/COUNTERS Display

$ ANALYZE/SYSTEM
SDA> SHOW LAN/COUNTERS
 
LAN Data Structures 
------------------- 
             -- EXA Counters Information 22-JAN-1994 11:21:19 -- 
 
Seconds since zeroed         3953329    Station failures                   0 
Octets received          13962888501    Octets sent              11978817384 
PDUs received              121899287    PDUs sent                   76872280 
Mcast octets received     7494809802    Mcast octets sent          183142023 
Mcast PDUs received         58046934    Mcast PDUs sent              1658028 
Unrec indiv dest PDUs              0    PDUs sent, deferred          4608431 
Unrec mcast dest PDUs              0    PDUs sent, one coll          3099649 
Data overruns                      2    PDUs sent, mul coll          2439257 
Unavail station buffs(1)            0    Excessive collisions(2)          5059 
Unavail user buffers               0    Carrier check failure              0 
Frame check errors               483    Short circuit failure              0 
Alignment errors               10215    Open circuit failure               0 
Frames too long                  142    Transmits too long                 0 
Rcv data length error              0    Late collisions                14931 
802E PDUs received             28546    Coll detect chk fail               0 
802 PDUs received                  0    Send data length err               0 
Eth PDUs received          122691742    Frame size errors                  0 
 
LAN Data Structures 
------------------- 
        -- EXA Internal Counters Information 22-JAN-1994 11:22:28 -- 
 
Internal counters address   80C58257    Internal counters size            24 
Number of ports                    0    Global page transmits              0 
No work transmits            3303771    SVAPTE/BOFF transmits              0 
Bad PTE transmits                  0    Buffer_Adr transmits               0 
 
Fatal error count                  0    RDL errors                         0 
Transmit timeouts                  0    Last fatal error                None 
Restart failures                   0    Prev fatal error                None 
Power failures                     0    Last error CSR              00000000 
Hardware errors                    0    Fatal error code                None 
Control timeouts                   0    Prev fatal error                None 
 
Loopback sent                      0    Loopback failures                  0 
System ID sent                     0    System ID failures                 0 
ReqCounters sent                   0    ReqCounters failures               0 
 
      -- EXA1 60-07 (SCA) Counters Information 22-JAN-1994 11:22:31 -- 
 
Last receive(3)       22-JAN 11:22:31    Last transmit(3)    22-JAN 11:22:31 
Octets received           7616615830    Octets sent               2828248622 
PDUs received               67375315    PDUs sent                   20331888 
Mcast octets received              0    Mcast octets sent                  0 
Mcast PDUs received                0    Mcast PDUs sent                    0 
Unavail user buffer                0    Last start attempt              None 
Last start done       7-DEC 17:12:29    Last start failed               None 
   .
   .
   .

The SHOW LAN/COUNTERS display usually includes device counter information about several LAN adapters. However, for purposes of example, only one device is shown in Example F-5.
Field Description
(1) Unavail station buffs (unavailable station buffers): Records the number of times that fixed station buffers in the LAN driver were unavailable for incoming packets. The node receiving a message can lose packets when the node does not have enough LAN station buffers. (LAN buffers are used by a number of consumers other than PEDRIVER, such as DECnet, TCP/IP, and LAT.) Packet loss because of insufficient LAN station buffers is a symptom of either LAN adapter congestion or the system's inability to reuse the existing buffers fast enough.
(2) Excessive collisions: Indicates the number of unsuccessful attempts to transmit messages on the adapter. This problem is often caused by:
  • A LAN loading problem resulting from heavy traffic (70% to 80% utilization) on the specific LAN segment.
  • A component called a screamer. A screamer is an adapter whose protocol does not adhere to Ethernet or FDDI hardware protocols. A screamer does not wait for permission to transmit packets on the adapter, thereby causing collision errors to register in this field.

If a significant number of transmissions with multiple collisions have occurred, then OpenVMS Cluster performance is degraded. You might be able to improve performance either by removing some nodes from the LAN segment or by adding another LAN segment to the cluster. The overall goal is to reduce traffic on the existing LAN segment, thereby making more bandwidth available to the OpenVMS Cluster system.

(3) Last receive and Last transmit: The difference between the times shown in the Last receive and Last transmit fields should not be large. At a minimum, the timestamps in these fields should reflect that HELLO datagram messages are being sent across channels every 3 seconds. Large time differences might indicate:
  • A hardware failure
  • That the LAN driver does not see the NISCA protocol as being active on a specific LAN adapter
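
The counter checks described for callouts (2) and (3) can be expressed as simple arithmetic. The following Python fragment is a sketch under stated assumptions: the function names are hypothetical, the year must be supplied by the caller because the Last receive and Last transmit fields omit it, and the collision figures are ratios to PDUs sent rather than documented thresholds.

```python
from datetime import datetime

HELLO_INTERVAL_SECONDS = 3  # HELLO datagrams are sent every 3 seconds


def collision_rates(pdus_sent, one_coll, mul_coll, excessive_coll):
    """Express each collision counter as a ratio to PDUs sent."""
    if pdus_sent <= 0:
        raise ValueError("pdus_sent must be positive")
    return {
        "one_collision": one_coll / pdus_sent,
        "multiple_collisions": mul_coll / pdus_sent,
        "excessive_collisions": excessive_coll / pdus_sent,
    }


def last_activity_gap_seconds(last_receive, last_transmit, year):
    """Return the absolute gap between two 'DD-MMM HH:MM:SS' timestamps.

    Assumes both timestamps fall in the given year and that month
    abbreviations are English.
    """
    fmt = "%d-%b %H:%M:%S %Y"
    received = datetime.strptime(f"{last_receive.strip()} {year}", fmt)
    transmitted = datetime.strptime(f"{last_transmit.strip()} {year}", fmt)
    return abs((received - transmitted).total_seconds())


# Values from Example F-5 (EXA device counters and EXA1 SCA counters).
rates = collision_rates(pdus_sent=76872280, one_coll=3099649,
                        mul_coll=2439257, excessive_coll=5059)
gap = last_activity_gap_seconds("22-JAN 11:22:31", "22-JAN 11:22:31", 1994)

for name, rate in rates.items():
    print(f"{name}: {rate:.4%}")
print("gap within HELLO interval:", gap <= HELLO_INTERVAL_SECONDS)
```

In Example F-5 the two timestamps are identical, so the gap is zero and well within the HELLO interval.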

F.3.7 Monitoring PEDRIVER Buses for IP Interfaces

The SDA command SHOW PORT/BUS=BUS_IP_interface is useful for displaying the PEDRIVER representation of an IP interface. To PEDRIVER, a bus is the logical representation of the IP interface. (To list the names and addresses of buses, enter the SDA command SHOW PORT/ADDR=PE_PDT and then press the Return key twice.) The following example shows a display for the IP interface named IE0.

Example F-6 SDA Command SHOW PORT/BUS=BUS_IP_interface

$ ANALYZE/SYSTEM
SDA> SHOW PORT/BUS=886C0010
 
VMScluster data structures 
-------------------------- 
--- BUS: 886C0010  (IE0)  Device: IP  IP Address:  16.138.182.6 (1)
Status: 00004203 run,online,xmt_chaining_disabled (2)
------- Transmit ------  ------- Receive -------  ---- Structure Addresses --- 
Msg Xmt      2345987277 (3)  Msg Rcv      2452130165 (4)  PORT Address        8850B9B8 
  Mcast Msgs          0    Mcast Msgs          0  VCIB Addr           886C02A0 
  Mcast Bytes         0    Mcast Bytes         0  HELLO Message Addr  886C02A0 
Bytes Xmt    3055474713  Bytes Rcv    3545255112  BYE Message Addr    886C05CC 
Outstand I/Os         0  Buffer Size        1394  Delete BUS Rtn Adr  90AA2EC8 
Xmt Errors (5)          0  Rcv Ring Size         0 
 
--- Receive Errors ----  ------ BUS Timer ------  ----- Datalink Events ------ 
TR Mcast Rcv          0  Handshake TMO  00000000  Last 22-SEP-2008 12:20:50.06 
Rcv Bad SCSID         0  Listen TMO     00000000  Last Event          00004002 
Rcv Short Msgs        0  HELLO timer           6  Port Usable                1 
Fail CH Alloc         0  HELLO Xmt err         0  Port Unusable              0 
Fail VC Alloc         0                           Address Change             0 
Wrong PORT            0                           Port Restart Fail          0 
 
Field Description
(1) IP Address: Displays the IP address of the interface.
(2) Status: The Status line should always display a status of "online" to indicate that PEDRIVER can access its IP interface.
(3) Msg Xmt (messages transmitted): Shows the total number of packets transmitted over the virtual circuit to the remote node. It also provides the multicast messages (Mcast Msgs) and multicast bytes (Mcast Bytes) transmitted.
(4) Msg Rcv (messages received): Shows the total number of packets received over the virtual circuit from the remote node. It also provides the multicast messages (Mcast Msgs) and multicast bytes (Mcast Bytes) received.
(5) Xmt Errors (transmission errors): Indicates the number of times PEDRIVER has been unable to transmit a packet using this IP interface.
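
The "online" check on the Status line can be automated when the display is captured as text. The following Python fragment is a minimal sketch. The names `parse_bus_status` and `is_online` are hypothetical, and the sketch reads only the flag names printed after the hexadecimal value without attempting to decode individual bits.

```python
import re

# "Status:", an 8-digit hexadecimal mask, then a comma-separated flag list.
STATUS_LINE = re.compile(
    r"^Status:\s+(?P<mask>[0-9A-Fa-f]{8})\s+(?P<flags>\S+)"
)


def parse_bus_status(line):
    """Return (mask, flags) from a SHOW PORT/BUS Status line."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a Status line: {line!r}")
    # Drop documentation callouts such as "online(1)" from the flag names.
    flags = [re.sub(r"\(\d+\)$", "", f) for f in match.group("flags").split(",")]
    return int(match.group("mask"), 16), flags


def is_online(line):
    """True when the Status line lists the 'online' flag."""
    _, flags = parse_bus_status(line)
    return "online" in flags


# Status lines as printed in Example F-6 and Example F-4.
print(parse_bus_status("Status: 00004203 run,online,xmt_chaining_disabled (2)"))
print(is_online("Status: 00000803 run,online(1),restart"))
```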

F.3.8 Monitoring PEDRIVER Channels for IP Interfaces

The SDA command SHOW PORT/CHANNEL=Channel_IP_interface is useful for displaying the PEDRIVER representation of an IP interface. To PEDRIVER, a channel is the logical communication path between two IP interfaces located on different nodes. (To list the names and addresses of the channels created, enter the SDA command SHOW SYMBOL CH_* and then press the Return key.) The following example shows a display for the IP interface named IE0.

Example F-7 SDA Command SHOW PORT/CHANNEL Display

$ ANALYZE/SYSTEM
SDA>  show port/channel=CH_OOTY_IE0_WE0
VMScluster data structures 
-------------------------- 
 -- PEDRIVER Channel (CH:886C5A40) for Virtual Circuit (VC:88161A80) OOTY   -- 
State: 0004 open                Status: 6F path,open,xchndis,rmhwavld,tight,fast 
                                ECS Status: Tight,Fast 
BUS: 886BC010 (IE0)  Lcl Device: IP    Lcl IP Address: 16.138.182.6 (1)
Rmt BUS Name:  WE0   Rmt Device: IP    Rmt IP Address: 15.146.235.10 (2)
Rmt Seq #: 0004  Open:  4-OCT-2008 00:18:58.94  Close:  4-OCT-2008 00:18:24.53 
 
- Transmit Counters ---  - Receive Counters ----  - Channel Characteristics -- 
Bytes Xmt     745486312  Bytes Rcv    2638847244  Protocol Version       1.6.0 
Msg Xmt        63803681  Msg Rcv       126279729  Supported Services  00000000 
  Ctrl Msgs         569    Ctrl Msgs         565  Local CH Sequence #     0003 
  Ctrl Bytes      63220    Ctrl Bytes      62804  Average RTT (usec)    5780.8 
                           Mcast Msgs     106871  Buffer Size: 
                           Mcast Bytes  11114584    Current               1394 
- Errors ---------------------------------------    Remote                1394 
Listen TMO            2  Short CC Msgs         0    Local                 1394 
TR ReXmt            605  Incompat Chan         0    Negotiated            1394 
DL Xmt Errors         0  No MSCP Srvr          0  Priority                   0 
CC HS TMO             0  Disk Not Srvd         0  Hops                       2 
Bad Authorize         0  Old Rmt Seq#          0  Load Class               100 
Bad ECO               0                           Rmt TR Rcv Cache Size     64 
Bad Multicast         0                           Rmt DL Rcv Buffers         8 
                                                  Losses                     0 
- Miscellaneous -------  - Buf Size Probing-----  - Delay Probing ------------ 
Prv Lstn Timer        5  SP Schd Timeout       6  DP Schd Timeouts           0 
Next ECS Chan  886C5A40  SP Starts             1  DP Starts                  0 
                         SP Complete           1  DP Complete                0 
- Management ----------  SP HS TMO             0  DP HS TMO                  1 
Mgt Priority          0  HS Remaining Retries  4 
Mgt Hops              0  Last Probe Size    1395 
Mgt Max Buf Siz    8110 
 
Field Description
(1) Lcl IP Address (Local IP Address): Displays the IP address of the local interface.
(2) Rmt IP Address (Remote IP Address): Displays the IP address of the remote interface.
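
Two values in the channel display are easy to derive by script: how long after its last close the channel reopened, and how TR ReXmt compares with Msg Xmt. The following Python fragment is a sketch only. The function names are hypothetical, it assumes that the Open and Close fields hold the most recent open and close times, and it treats the ratio of TR ReXmt to Msg Xmt as a rough indicator rather than a documented metric.

```python
from datetime import datetime


def parse_vms_time(stamp):
    """Parse an absolute time such as ' 4-OCT-2008 00:18:58.94'.

    Assumes English month abbreviations.
    """
    return datetime.strptime(stamp.strip(), "%d-%b-%Y %H:%M:%S.%f")


def seconds_open_after_close(open_stamp, close_stamp):
    """Positive when the channel was opened after its last recorded close."""
    delta = parse_vms_time(open_stamp) - parse_vms_time(close_stamp)
    return delta.total_seconds()


def retransmit_ratio(tr_rexmt, msg_xmt):
    """Ratio of TR ReXmt to Msg Xmt; 0.0 when nothing was transmitted."""
    return tr_rexmt / msg_xmt if msg_xmt else 0.0


# Values from Example F-7.
gap = seconds_open_after_close("4-OCT-2008 00:18:58.94",
                               "4-OCT-2008 00:18:24.53")
ratio = retransmit_ratio(tr_rexmt=605, msg_xmt=63803681)
print(f"opened {gap:.2f} s after the last close")
print(f"retransmit ratio: {ratio:.6%}")
```

A positive gap, as in Example F-7, is consistent with the "open" state shown on the State line.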

F.4 Using SCACP to Monitor Cluster Communications

The SCA Control Program (SCACP) utility is designed to monitor and manage cluster communications. It is derived from the Systems Communications Architecture (SCA), which defines the communications mechanisms that allow nodes in an OpenVMS Cluster system to cooperate.

SCA does the following:

To invoke SCACP, enter the following command at the DCL prompt:


$ RUN SYS$SYSTEM:SCACP 

SCACP displays the following prompt, at which you can enter SCACP commands using the standard rules of DCL syntax:


SCACP> 

For more information about SCACP, see the HP OpenVMS System Management Utilities Reference Manual.

F.5 Troubleshooting NISCA Communications

F.5.1 Areas of Trouble

Sections F.6 and F.7 describe two likely areas of trouble for LAN networks: channel formation and retransmission. The discussions of these two problems often include references to the use of a LAN analyzer tool to isolate information in the NISCA protocol.

Reference: As you read about how to diagnose NISCA problems, you may also find it helpful to refer to Section F.8, which describes the NISCA protocol packet, and Section F.9, which describes how to choose and use a LAN network failure analyzer.

F.6 Channel Formation

Channel-formation problems occur when two nodes cannot communicate properly between LAN adapters.

F.6.1 How Channels Are Formed

Table F-7 provides a step-by-step description of channel formation.

Table F-7 Channel Formation
Step Action
1 Channels are formed when a node sends a HELLO datagram from its LAN adapter to a LAN adapter on another cluster node. If this is a new remote LAN adapter address, or if the corresponding channel is closed, the remote node receiving the HELLO datagram sends a CCSTART datagram to the originating node after a delay of up to 2 seconds.
2 Upon receiving a CCSTART datagram, the originating node verifies the cluster password and, if the password is correct, the node responds with a VERF datagram and waits for up to 5 seconds for the remote node to send a VACK datagram. (VERF, VACK, CCSTART, and HELLO datagrams are described in Section F.8.5.)
3 Upon receiving a VERF datagram, the remote node verifies the cluster password; if the password is correct, the node responds with a VACK datagram and marks the channel as open. (See Figure F-3.)
4 WHEN the local node does not receive the VACK datagram within 5 seconds, THEN the channel state goes back to closed and the handshake timeout counter is incremented.
  WHEN the local node receives the VACK datagram within 5 seconds and the cluster password is correct, THEN the channel is opened.
5 Once a channel has been formed, it is maintained (kept open) by the regular multicast of HELLO datagram messages. Each node multicasts a HELLO datagram message at least once every 3.0 seconds over each LAN adapter. Either of the nodes sharing a channel closes the channel with a listen timeout if it does not receive a HELLO datagram or a sequence message from the other node within 8 to 9 seconds. If you receive a "Port closed virtual circuit" message, it indicates a channel was formed but there is a problem receiving traffic on time. When this happens, look for HELLO datagram messages getting lost.
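
The timing in step 5 can be illustrated with a short calculation. The following Python fragment is a sketch that uses only the intervals stated in Table F-7. The function names and the three state labels are hypothetical, and the sketch assumes HELLO datagrams arrive at exactly the 3.0-second interval.

```python
import math

HELLO_INTERVAL_SECONDS = 3.0               # HELLO multicast interval (step 5)
LISTEN_TIMEOUT_RANGE_SECONDS = (8.0, 9.0)  # listen timeout window (step 5)


def hellos_expected_within(window_seconds, interval=HELLO_INTERVAL_SECONDS):
    """Number of HELLO datagrams expected in a window of the given length."""
    return math.floor(window_seconds / interval)


def listen_state(seconds_since_last_packet):
    """Classify a channel by the time since the last HELLO or sequenced message."""
    low, high = LISTEN_TIMEOUT_RANGE_SECONDS
    if seconds_since_last_packet < low:
        return "open"
    if seconds_since_last_packet <= high:
        return "listen timeout possible"
    return "listen timeout"


low, high = LISTEN_TIMEOUT_RANGE_SECONDS
print("HELLOs lost before timeout:",
      hellos_expected_within(low), "to", hellos_expected_within(high))
print(listen_state(6.5))
```

Under these assumptions, a channel closes only after two to three consecutive HELLO datagrams fail to arrive, which is why a listen timeout points to sustained packet loss rather than a single dropped datagram.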

Figure F-3 shows a message exchange during a successful channel-formation handshake.

Figure F-3 Channel-Formation Handshake
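
The handshake in Table F-7 can also be traced in code. The following Python fragment is a simplified model, not PEDRIVER's implementation: the `Node` class and `form_channel` function are hypothetical, password verification is reduced to a string comparison, and the delay before CCSTART (up to 2 seconds) is not simulated.

```python
VACK_WAIT_SECONDS = 5.0  # local node waits up to 5 seconds for VACK (step 2)


class Node:
    """Hypothetical stand-in for one cluster node's view of a channel."""

    def __init__(self, name, cluster_password):
        self.name = name
        self.cluster_password = cluster_password
        self.channel_state = "closed"
        self.handshake_timeouts = 0


def form_channel(local, remote, vack_delay_seconds=0.0):
    """Walk one handshake and return the (sender, datagram) trace."""
    # Step 1: local sends HELLO; remote answers with CCSTART.
    trace = [(local.name, "HELLO"), (remote.name, "CCSTART")]

    # Step 2: local verifies the cluster password before sending VERF.
    if local.cluster_password != remote.cluster_password:
        return trace
    trace.append((local.name, "VERF"))

    # Step 3: remote verifies the password, sends VACK, and marks the
    # channel open. The comparison is symmetric in this model, so it always
    # passes once step 2 has passed; it is kept to mirror the table.
    if remote.cluster_password != local.cluster_password:
        return trace
    trace.append((remote.name, "VACK"))
    remote.channel_state = "open"

    # Step 4: local opens the channel only if VACK arrived within 5 seconds.
    if vack_delay_seconds <= VACK_WAIT_SECONDS:
        local.channel_state = "open"
    else:
        local.channel_state = "closed"
        local.handshake_timeouts += 1
    return trace


node_a = Node("A", "secret")
node_b = Node("B", "secret")
for sender, datagram in form_channel(node_a, node_b):
    print(f"{sender} sends {datagram}")
print(node_a.channel_state, node_b.channel_state)
```

Passing a `vack_delay_seconds` value greater than 5 exercises the timeout branch of step 4, which leaves the local channel closed and increments the handshake timeout counter.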


