DECamds User's Guide
3.14.1 Notes About the Display
Following are notes about the display of data in the window:
- The window does not follow highlighting conventions: virtual
circuit lines are displayed normally and are left-aligned; SysApp lines
are dimmed and are indented by a column.
- You cannot filter out any data.
- The data items in the window are sorted on an "as-found" basis. You
cannot change sort criteria at this time.
- DECamds signals the LOSTVC event when a virtual circuit between two
nodes has been lost. This loss might be due either to a cluster node
crashing or to cluster problems that caused the virtual circuit to
close.
LOSTVC, <node> lost virtual circuit (<string>) to node <node>
- You can change collection intervals.
3.15 NISCA Summary Window
The Network Interconnect System Communication Architecture (NISCA) is
the transport protocol responsible for carrying messages such as disk
I/Os and lock messages across Ethernet and FDDI LANs to other nodes in
the cluster. More detailed information about the protocol is in the
OpenVMS Cluster Systems manual.
The NISCA Summary window shown in Figure 3-17 displays detailed
information about the LAN (Ethernet or FDDI) connection between two
nodes. DECamds displays one window per virtual circuit provided the
virtual circuit is running over a PEA0: device.
The purpose of this window is to view statistics in real time and to
troubleshoot problems found in the NISCA protocol. The window is
intended primarily as an aid to diagnosing LAN-related problems. The
OpenVMS Cluster Systems manual describes the parameters shown in this window and
tells how to use them to diagnose LAN-related cluster problems.
The window provides the same information as the OpenVMS System Dump
Analyzer (SDA) command SHOW PORTS/VC=VC_nodex. (VC refers to virtual
circuit; nodex is a node in the cluster. The system defines VC_nodex
after a SHOW PORTS command is issued from SDA.)
Figure 3-17 NISCA Summary Window
To open an NISCA Summary window, do one of the following:
- In the SCA Summary window, click MB3 on a row with the PEA0:
Virtual Circuit. Choose View SysApps from the pop-up menu, click MB3 on
a SysApps node, and choose Display NISCA. The system displays the NISCA
Summary window.
Note: If the Display NISCA option is dimmed, the NISCA protocol is not
running for that system application.
- Double-click MB1 on a row with a PEA0: virtual circuit to display an
expanded list below the node name, and then double-click MB1 on a
SysApps node to display the NISCA Summary window.
3.15.1 Data Displayed
Panels in the NISCA Summary window contain the data described in the
following tables.
Table 3-17 lists data items displayed in the Transmit Panel, which
contains data packet transmission information.
Table 3-17 Data Items in the Transmit Panel
Data Item | Description
Packets | Number of packets transmitted through the virtual circuit to the remote node, including both sequenced and unsequenced (channel control) messages, and lone acknowledgments.
Unsequenced (DG) | Count and rate of unsequenced datagram packets transmitted.
Sequenced | Count and rate of sequenced packets transmitted. Sequenced messages are used for application data.
Lone ACK | Count and rate of lone acknowledgments transmitted.
ReXmt Count | Number of packets retransmitted. Retransmission occurs when the local node does not receive an acknowledgment for a transmitted packet within a predetermined timeout interval.
ReXmt Timeout | Number of retransmission timeouts that have occurred.
ReXmt Ratio | Ratio of the ReXmt Count (current and past) to the number of sequenced messages sent (current and past).
Bytes | Count and rate of bytes transmitted through the virtual circuit.
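As a rough illustration of how the ReXmt Ratio relates to the other
counters in the Transmit Panel, the following Python sketch divides a
retransmit count by a sequenced-message count. The function name and
the sample values are hypothetical. DECamds computes and displays the
ratio itself, and its exact rounding is not described in this guide.

```python
def rexmt_ratio(rexmt_count: int, sequenced_sent: int) -> float:
    """Return retransmitted packets per sequenced message sent.

    Both arguments are cumulative (current and past) counters, as in
    the ReXmt Ratio description. Returns 0.0 when nothing has been sent,
    which is a choice made for this sketch only.
    """
    if sequenced_sent <= 0:
        return 0.0
    return rexmt_count / sequenced_sent


# Hypothetical counter values, not taken from a real system.
print(rexmt_ratio(25, 10_000))
```

A ratio that climbs over time suggests that a growing share of
sequenced messages had to be resent.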
Table 3-18 describes data items displayed in the Receive Panel,
which contains data packet reception information.
Table 3-18 Data Items in the Receive Panel
Data Item | Description
Packets | Number of packets received through the virtual circuit from the remote node, including both sequenced and unsequenced (channel control) messages, and lone acknowledgments.
Unsequenced (DG) | Count and rate of unsequenced datagram packets received.
Sequenced | Count and rate of sequenced packets received. Sequenced messages are used for application data.
Lone ACK | Count and rate of lone acknowledgments received.
Duplicate | Number of redundant packets received by this system.
Out of Order | Number of packets received out of order by this system.
Illegal Ack | Number of illegal acknowledgments received.
Bytes | Count and rate of bytes received through the virtual circuit.
Table 3-19 describes data items displayed in the Congestion Control
Panel, which contains transmit congestion control information.
The values in the panel list the number of messages that can be sent to
the remote node before receiving an acknowledgment and the
retransmission timeout.
Table 3-19 Data Items in the Congestion Control Panel
Data Item | Description
Transmit Window Current | Current value of the pipe quota (transmit window). After a timeout, the pipe quota is reset to 1 to decrease congestion and is allowed to increase quickly as acknowledgments are received.
Transmit Window Grow | The slow growth threshold: size at which the rate of increase is slowed to avoid congestion on the network again.
Transmit Window Max | Maximum value of pipe quota currently allowed for the virtual circuit based on channel limitations.
Transmit Window Reached | Number of times the entire transmit window was full. If this number is small as compared with the number of sequenced messages transmitted, the local node is not sending large bursts of data to the remote node.
Roundtrip uSec | Average roundtrip time for a packet to be sent and acknowledged. The value is displayed in microseconds.
Roundtrip Deviation uSec | Average deviation of the roundtrip time. The value is displayed in microseconds.
Retransmit Timeout uSec | Value used to determine packet retransmission timeout. If a packet does not receive either an acknowledging or a responding packet, the packet is assumed to be lost and will be resent.
UnAcked Messages | Number of unacknowledged messages.
CMD Queue Length | Current length of all command queues.
CMD Queue Max | Maximum number of commands in queues so far.
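The relationship among the three Transmit Window values can be pictured
with a toy model. The Python sketch below is an illustration only: the
class name, the growth steps (one per acknowledgment below the grow
threshold, one per `slow_acks` acknowledgments at or above it), and the
sample numbers are assumptions made for this example, not the algorithm
PEDRIVER actually uses.

```python
class TransmitWindow:
    """Toy model of the pipe quota (transmit window).

    Assumed behavior, for illustration only: fast growth of one per
    acknowledgment below the grow threshold, slow growth of one per
    `slow_acks` acknowledgments at or above it, capped at `maximum`.
    """

    def __init__(self, grow: int, maximum: int, slow_acks: int = 4):
        self.current = 1              # Transmit Window Current
        self.grow = grow              # Transmit Window Grow
        self.maximum = maximum        # Transmit Window Max
        self.slow_acks = slow_acks    # hypothetical slow-growth pacing
        self._acks_since_growth = 0

    def on_timeout(self) -> None:
        # After a timeout, the pipe quota is reset to 1.
        self.current = 1
        self._acks_since_growth = 0

    def on_ack(self) -> None:
        if self.current >= self.maximum:
            return                    # never exceed the maximum
        if self.current < self.grow:
            self.current += 1         # fast growth below the threshold
            return
        self._acks_since_growth += 1  # slow growth above the threshold
        if self._acks_since_growth >= self.slow_acks:
            self.current += 1
            self._acks_since_growth = 0


window = TransmitWindow(grow=4, maximum=8)
for _ in range(5):
    window.on_ack()
print(window.current)   # window after five acknowledgments
window.on_timeout()
print(window.current)   # window after a retransmission timeout
```

In this model the window climbs quickly to the grow threshold, creeps
toward the maximum afterward, and drops back to 1 on any timeout.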
Table 3-20 describes data items displayed in the Channel Selection
Panel, which contains channel selection information.
Table 3-20 Data Items in the Channel Selection Panel
Data Item | Description
Buffer Size | Maximum PPC data buffer size for this virtual circuit.
Channel Count | Number of channels connected to this virtual circuit.
Channel Selections | Number of channel selections performed.
Protocol | NISCA Protocol version.
Local Device | Name of the local device that the channel uses to send and receive packets.
Local LAN Address | Address of the local LAN device that performs sends and receives.
Remote Device | Name of the remote device that the channel uses to send and receive packets.
Remote LAN Address | Address of the remote LAN device performing the sends and receives.
Table 3-21 describes data items displayed in the VC Closures panel,
which contains information about the number of times a virtual circuit
has closed for a particular reason.
Table 3-21 Data Items in the VC Closures Panel
Data Item | Description
SeqMsg TMO | Number of sequenced message transmit timeouts.
CC DFQ Empty | Number of times the channel control DFQ was empty.
Topology Change | Number of times PEDRIVER performed a failover from FDDI to Ethernet, necessitating the closing and reopening of the virtual circuit.
NPAGEDYN Low | Number of times the virtual circuit was lost because of a pool allocation failure on the local node.
Table 3-22 lists data items displayed in the Packets Discarded
Panel, which contains information about the number of times packets
were discarded for a particular reason.
Table 3-22 Data Items in the Packets Discarded Panel
Data Item | Description
No Xmt Chan | Number of times there was no transmit channel.
Ill Seq Msg | Number of times an illegal sequenced message was received.
TR DFQ Empty | Number of times the Transmit DFQ was empty.
CC MFQ Empty | Number of times the channel control MFQ was empty.
Rcv Short Msg | Number of times a short transport message was received.
Bad Checksum | Number of times there was a checksum failure.
TR MFQ Empty | Number of times the Transmit MFQ was empty.
Cache Miss | Number of messages that could not be placed in the cache.
3.15.2 Notes About the Display
Following are notes about the display of data in the window:
- No highlighting conventions are used in the NISCA Summary window.
- You cannot sort or filter the data displayed in this window.
- You can change collection intervals.
Chapter 4 Performing Fixes
You can perform fixes to resolve resource availability
problems and improve system availability.
This chapter covers the following topics:
- Understanding fixes
- Performing fixes
- Typical fix examples
Caution: Performing certain actions to fix a problem can have serious
repercussions on a system, including possibly causing a system failure.
Therefore, only experienced system managers should perform fixes.
4.1 Understanding Fixes
When DECamds detects a resource availability problem, it analyzes the
problem and proposes one or more fixes to improve the situation. Most
fixes correspond to an OpenVMS system service call.
The following fixes are available from DECamds:
Fix Category | Possible Fixes | System Service Call
Memory usage fixes | Adjust working set; Purge working set | $ADJWSL; $PURGWS
Process fixes | Delete a process; Exit an image | $DELPRC; $FORCEX
Adjust Process Quota Limit fix | Change AST, BIO, DIO, ENQ, FIL, PRC, and TQE process quota limits | None
Process state fixes | Resume a process; Suspend a process | $RESUME; $SUSPND
Process priority fixes | Lower or raise a process priority | $SETPRI
Quorum fix | Adjust cluster quorum | None
System fix | Crash node | None
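For scripts or notes that need to refer to these pairings, the table
can be restated as a simple lookup. The Python sketch below only
restates the table. The dictionary and function names are hypothetical
and are not part of DECamds, which provides no such programming
interface in this guide.

```python
from typing import Optional

# Fix names and system services exactly as listed in the table above.
# None means the table lists no corresponding system service.
FIX_TO_SERVICE = {
    "Adjust working set": "$ADJWSL",
    "Purge working set": "$PURGWS",
    "Delete a process": "$DELPRC",
    "Exit an image": "$FORCEX",
    "Adjust process quota limit": None,
    "Resume a process": "$RESUME",
    "Suspend a process": "$SUSPND",
    "Lower or raise a process priority": "$SETPRI",
    "Adjust cluster quorum": None,
    "Crash node": None,
}


def service_for(fix: str) -> Optional[str]:
    """Return the system service listed for a fix, or None if none is listed."""
    return FIX_TO_SERVICE[fix]


print(service_for("Purge working set"))
```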
Before you perform a fix, you should understand the following
information:
- Fixes are optional.
- You must have write access to perform a fix. (See Section 1.3 for
more information about DECamds security.)
- You cannot undo many fixes. (After using the crash node fix, for
example, the node must be rebooted.)
- The exit image, delete process, and suspend process fixes should
not be applied to system processes. Doing so can require rebooting the
node.
- Whenever you exit an image, you cannot return to that image.
- Processes that have exceeded their job or process quota cannot be
deleted.
- DECamds ignores fixes applied to the SWAPPER process.
4.2 Performing Fixes
Standard OpenVMS privileges restrict write access of users. When you
run the Data Analyzer, you must have the CMKRNL privilege to send a
write (fix) instruction to a node with a problem.
To initiate a fix, perform one of the following actions:
- From any of the data windows, double-click on a process, and then
choose an action from the Fix menu.
- Click MB3 on an event, and choose Fix from the menu.
DECamds displays a dialog box listing the fixes you can perform for the
selected event. The recommended choice is highlighted. When you click
on OK or Apply, DECamds performs one of the following actions:
- If the event you selected is not specific to a certain process,
DECamds automatically performs the fix. Some fixes are performed
automatically when "(automatic)" is displayed next to the
selection.
- If the event is specific to a process, DECamds displays another
dialog box in which you can specify the fix parameters. For example,
for the Adjust Working Set Size fix, you specify a new working set size
for the process.
DECamds performs the highlighted fix as long as the event still exists.
If the event you are fixing has changed, the dialog box disappears when
you click on OK, Apply, or Cancel, and the fix is not performed.
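The decision flow described above can be summarized in a short Python
sketch. It is a reading aid under stated assumptions: the `Event` class,
the `apply_fix` function, the returned strings, and the sample event
names are all hypothetical and do not correspond to any DECamds
programming interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    name: str                          # hypothetical event name
    process_id: Optional[int] = None   # None: not specific to a process
    still_exists: bool = True          # False: the event has changed


def apply_fix(event: Event, fix: str, parameters: Optional[dict] = None) -> str:
    """Mirror the OK/Apply behavior described in the text."""
    if not event.still_exists:
        # The dialog box disappears and the fix is not performed.
        return "not performed: event changed"
    if event.process_id is None:
        # Not process specific: the fix is performed automatically.
        return f"performed {fix} automatically"
    if parameters is None:
        # Process specific: a second dialog box asks for fix parameters.
        return f"prompt for {fix} parameters"
    return f"performed {fix} on process {event.process_id}"


node_event = Event("EXAMPLE_NODE_EVENT")
process_event = Event("EXAMPLE_PROCESS_EVENT", process_id=0x2040011A)

print(apply_fix(node_event, "Adjust Quorum"))
print(apply_fix(process_event, "Adjust Working Set"))
print(apply_fix(process_event, "Adjust Working Set", {"size": 2048}))
```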
Table 4-1 summarizes all fixes alphabetically and specifies the
windows from which they are available.
Table 4-1 Summary of DECamds Fixes
Problem to be Solved | Fix | Available From | Effects
Process has reached its quota limit and has entered RWAIT state | Adjust Process Quota Limit | Single Process Summary, Event Log | Process receives greater limit.
Cluster hung | Adjust Quorum | Node Summary, Cluster Transition/Overview Summary | Quorum for cluster is adjusted.
Working set too high or low | Adjust Working Set | Memory Summary, Single Process Summary, Event Log | Removes unused pages from working set; page faulting might occur.
Runaway process | Change Process Priority | CPU Summary, Single Process Summary, Event Log | Priority stays at selected setting.
Node resource hanging cluster | Crash Node | System Overview, Node Summary, Single Lock Summary | Node crashes with operator-requested shutdown.
Process looping, intruder | Delete Process | Any process window | Process no longer exists.
Process loops endlessly in same PC range | Exit Image | Any process window | Exit from current image.
Node or process low on memory | Purge Working Set | Event Log, Memory Summary, Single Process Summary | Frees memory; page faulting might occur.
Process previously suspended | Resume Process | Event Log, Memory Summary, CPU Summary, Process I/O Summary, Single Process Summary | Process resumes from the point at which it was suspended.
Runaway process, unwelcome intruder | Suspend Process | Event Log, Memory Summary, CPU Summary, Process I/O Summary, Single Process Summary | Process gets no CPU time.
The following sections provide reference information about each DECamds
fix.
4.2.1 Adjust Quorum Fix
When you perform the Adjust Quorum fix, DECamds displays a dialog box
similar to the one shown in Figure 4-1.
Figure 4-1 FIX Adjust Quorum Dialog Box
The Adjust Quorum fix forces the node to recalculate the quorum value.
This fix is the equivalent of the Interrupt Priority C (IPC) mechanism
used at system consoles for the same purpose. The fix forces the
adjustment for the entire cluster so that each node in the cluster will
have the same new quorum value.
The Adjust Quorum fix is useful when the number of votes in a cluster
falls below the quorum set for that cluster. This fix allows you to
readjust the quorum so that it corresponds to the current number of
votes in the cluster.
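A small worked example may make the effect of the fix clearer. The
Python sketch below assumes the usual OpenVMS Cluster quorum rule,
(votes + 2) / 2 rounded down. That rule comes from the OpenVMS Cluster
documentation rather than from this guide, so treat it as an
assumption. The function names and the three-node cluster are
hypothetical.

```python
def quorum_for(votes: int) -> int:
    """Quorum for a given vote total, assuming (votes + 2) // 2."""
    return (votes + 2) // 2


def is_hung(current_votes: int, quorum: int) -> bool:
    """Cluster activity is blocked while votes are below quorum."""
    return current_votes < quorum


# Hypothetical cluster: three nodes with one vote each.
quorum = quorum_for(3)
votes_after_failures = 1   # two voting nodes have left the cluster
print(quorum, is_hung(votes_after_failures, quorum))

# The Adjust Quorum fix recalculates quorum from the votes now present.
quorum = quorum_for(votes_after_failures)
print(quorum, is_hung(votes_after_failures, quorum))
```

In this example the remaining node cannot proceed until quorum is
lowered to match the votes that are still present.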
4.2.2 Adjust Process Quota Limit
When you perform the Adjust Process Quota Limit fix, DECamds displays a
dialog box similar to the one shown in Figure 4-2.
Figure 4-2 FIX Adjust Process Quota Limit Dialog Box
If a process is waiting for a resource, you can use the Adjust Process
Quota Limit fix to increase the resource limit so that the process can
continue. The increased limit is in effect only for the life of the
process, however; any new process is assigned the quota set in the
user authorization file (UAF).
To use this fix, select the resource and then use the slide bar to
change the current setting. Finally, select one of the following:
- OK --- to apply the fix and exit the window
- Apply --- to apply the fix and not exit the window (so that you can
continue to make changes)
- Cancel --- to exit the window without performing the fix
4.2.3 Adjust Working Set Fix
When you perform the Adjust Working Set fix, DECamds displays a dialog
box similar to the one shown in Figure 4-3.
Figure 4-3 FIX Adjust Working Set Size Dialog Box
Adjusting the working set can give needed memory to other processes
that are page faulting. In your adjustment, try to bring the working
set size closer to the actual count being used by processes that are
not page faulting.
Caution: If the automatic working set adjustment is enabled for the
system, a fix to Adjust Working Set Size will disable the automatic
adjustment for the process.