OpenVMS Cluster Systems
C.3 Satellite Fails to Boot
To boot successfully, a satellite must communicate with a MOP server
over the LAN. You can use DECnet event logging to verify this
communication. Proceed as follows:
Step |
Action |
1
|
Log in as system manager on the MOP server.
|
2
|
If event logging for management-layer events is not already enabled,
enter the following NCP commands to enable it:
NCP> SET LOGGING MONITOR EVENT 0.*
NCP> SET LOGGING MONITOR STATE ON
|
3
|
Enter the following DCL command to enable the terminal to receive
DECnet messages reporting downline load events:
$ REPLY/ENABLE=NETWORK
|
4
|
Boot the satellite. If the satellite and the MOP server can communicate
and all boot parameters are correctly set, messages like the following
are displayed at the MOP server's terminal:
DECnet event 0.3, automatic line service
From node 2.4 (URANUS), 15-JAN-1994 09:42:15.12
Circuit QNA-0, Load, Requested, Node = 2.42 (OBERON)
File = SYS$SYSDEVICE:<SYS10.>, Operating system
Ethernet address = 08-00-2B-07-AC-03
DECnet event 0.3, automatic line service
From node 2.4 (URANUS), 15-JAN-1994 09:42:16.76
Circuit QNA-0, Load, Successful, Node = 2.44 (ARIEL)
File = SYS$SYSDEVICE:<SYS11.>, Operating system
Ethernet address = 08-00-2B-07-AC-13
WHEN... |
THEN... |
The satellite cannot communicate with the MOP server (VAX or Alpha).
|
No message for that satellite appears. There may be a problem with a
LAN cable connection or adapter service.
|
The satellite's data in the DECnet database is incorrectly specified
(for example, if the hardware address is incorrect).
|
A message like the following displays the correct address and indicates
that a load was requested:
DECnet event 0.7, aborted service request
From node 2.4 (URANUS) 15-JAN-1994
Circuit QNA-0, Line open error
Ethernet address=08-00-2B-03-29-99
Note the absence of the node name, node address, and system root.
|
|
Sections C.3.2 through C.3.5 provide more information about
troubleshooting satellite boots. Many of their steps ask you to verify
that system parameters are set correctly.
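When many satellites boot at once, it can help to sort captured event text offline. The following is a minimal Python sketch, not part of DECnet or OpenVMS, that assumes the event text has been saved to a string in the layout shown in the examples above. The names summarize_events and unknown_satellites are hypothetical. An event that carries an Ethernet address but no node name corresponds to the "aborted service request" case, where the MOP server could not match the address to an entry in its database.

```python
import re

# Summary line of a downline-load event, for example:
#   Circuit QNA-0, Load, Requested, Node = 2.42 (OBERON)
LOAD_RE = re.compile(
    r"Load,\s*(?P<status>\w+),\s*Node\s*=\s*(?P<addr>[\d.]+)\s*\((?P<name>\w+)\)"
)
# Matches "Ethernet address = 08-00-2B-07-AC-03", with or without spaces.
ETHER_RE = re.compile(
    r"Ethernet address\s*=\s*((?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2})"
)


def summarize_events(log_text):
    """Group log lines into events and summarize each one as a dict."""
    blocks = []
    for line in log_text.splitlines():
        if line.lstrip().startswith("DECnet event "):
            blocks.append([line])        # a header line starts a new event
        elif blocks:
            blocks[-1].append(line)      # continuation of the current event
    events = []
    for block in blocks:
        body = "\n".join(block)
        load = LOAD_RE.search(body)
        mac = ETHER_RE.search(body)
        events.append({
            "header": block[0].strip(),
            "status": load.group("status") if load else None,
            "node": load.group("name") if load else None,
            "mac": mac.group(1).upper() if mac else None,
        })
    return events


def unknown_satellites(events):
    """Addresses that requested service but matched no node entry."""
    return [e["mac"] for e in events if e["node"] is None and e["mac"]]
```

Passing the captured text to summarize_events and then to unknown_satellites yields the hardware addresses to compare against the DECnet database entries for the satellites.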
C.3.1 Displaying Connection Messages
To enable the display of connection messages during a conversational
boot, perform the following steps:
Step |
Action |
1
|
Enable conversational booting by setting the satellite's
NISCS_CONV_BOOT system parameter to 1. On Alpha systems, update the
ALPHAVMSSYS.PAR file, and on VAX systems update the VAXVMSSYS.PAR file
in the system root on the disk server.
|
2
|
Perform a conversational boot.
++On Alpha systems, enter the following command at the console:
>>> b -flags 0,1
|
|
+On VAX systems, set bit <0> in register R5. For example, on a
VAXstation 3100 system, enter the following command on the console:
>>> B/1
|
3
|
Observe connection messages.
Display connection messages during a satellite boot to determine
which system in a large cluster is serving the system disk to a cluster
satellite during the boot process. If booting problems occur, you can
use this display to help isolate the problem with the system that is
currently serving the system disk. Then, if your server system has
multiple LAN adapters, you can isolate specific LAN adapters.
|
4
|
Isolate LAN adapters.
Isolate a LAN adapter by methodically rebooting with only one
adapter connected. That is, disconnect all but one of the LAN adapters
on the server system and reboot the satellite. If the satellite boots
through the connected adapter, that adapter is working; repeat the
procedure using a different LAN adapter. Continue these steps until you
have located the bad adapter.
|
+VAX specific
++Alpha specific
Reference: See also Appendix C for help with
troubleshooting satellite booting problems.
C.3.2 General OpenVMS Cluster Satellite-Boot Troubleshooting
If a satellite fails to boot, use the steps outlined in this section to
diagnose and correct problems in OpenVMS Cluster systems.
Step |
Action |
1
|
Verify that the boot device is available. This check is particularly
important for clusters in which satellites boot from multiple system
disks.
|
2
|
Verify that the DECnet network is up and running.
|
3
|
Check the cluster group code and password. The cluster group code and
password are set using the CLUSTER_CONFIG.COM procedure.
|
4
|
Verify that you have installed the correct OpenVMS Alpha and OpenVMS
VAX licenses.
|
5
|
Verify system parameter values on each satellite node, as follows:
VAXCLUSTER = 2
NISCS_LOAD_PEA0 = 1
NISCS_LAN_OVRHD = 0
NISCS_MAX_PKTSZ = 1498 (see footnote 1)
SCSNODE is the name of the computer.
SCSSYSTEMID is a number that identifies the computer.
VOTES = 0
The SCS parameter values are set differently depending on your
system configuration.
Reference: Appendix A describes how to set these
SCS parameters.
To check system parameter values on a satellite node that cannot
boot, invoke the SYSGEN utility on a running system in the OpenVMS
Cluster that has access to the satellite node's local root. (Note that
you must invoke the SYSGEN utility from a node that is running the same
type of operating system. For example, to troubleshoot an Alpha
satellite node, you must run the SYSGEN utility on an Alpha system.)
Check system parameters as follows:
Step |
Action |
A
|
Find the local root of the satellite node on the system disk. The
following example is from an Alpha system running DECnet for OpenVMS:
$ MCR NCP SHOW NODE HOME CHARACTERISTICS
Node Volatile Characteristics as of 10-JAN-1994 09:32:56
Remote node = 63.333 (HOME)
Hardware address = 08-00-2B-30-96-86
Load file = APB.EXE
Load Assist Agent = SYS$SHARE:NISCS_LAA.EXE
Load Assist Parameter = ALPHA$SYSD:[SYS17.]
The local root in this example is ALPHA$SYSD:[SYS17.].
Reference: Refer to the DECnet-Plus documentation for
equivalent information using NCL commands.
|
B
|
Enter the SHOW LOGICAL command at the system prompt to translate the
logical name for ALPHA$SYSD.
$ SHO LOG ALPHA$SYSD
"ALPHA$SYSD" = "$69$DUA121:" (LNM$SYSTEM_TABLE)
|
C
|
Invoke the SYSGEN utility on the system from which you can access the
satellite's local disk. (This example invokes the SYSGEN utility on an
Alpha system using the Alpha parameter file ALPHAVMSSYS.PAR. The SYSGEN
utility on VAX systems differs in that it uses the VAX parameter file
VAXVMSSYS.PAR.) The following example shows how to enter the SYSGEN
command USE with the system parameter file on the local root for the
satellite node and then enter the SHOW command to query the parameters
in question.
$ MCR SYSGEN
SYSGEN> USE $69$DUA121:[SYS17.SYSEXE]ALPHAVMSSYS.PAR
SYSGEN> SHOW VOTES
Parameter Name    Current  Default  Min.  Max.  Unit   Dynamic
--------------    -------  -------  ----  ----  ----   -------
VOTES                   0        1     0   127  Votes
SYSGEN> EXIT
|
|
Footnote 1: For Ethernet adapters, the value of NISCS_MAX_PKTSZ is
1498. For FDDI adapters, the value is 4468.
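The parameter checks in step 5 can be recorded and compared offline. The following Python sketch assumes you have transcribed the values reported by SYSGEN SHOW into a dictionary; the function name check_satellite_params and the lan_type argument are hypothetical. SCSNODE and SCSSYSTEMID are site-specific, so the sketch does not check them.

```python
# Expected values from step 5 of this section. NISCS_MAX_PKTSZ depends on
# the LAN type (footnote 1): 1498 for Ethernet, 4468 for FDDI.
EXPECTED = {
    "VAXCLUSTER": 2,
    "NISCS_LOAD_PEA0": 1,
    "NISCS_LAN_OVRHD": 0,
    "VOTES": 0,
}
MAX_PKTSZ_BY_LAN = {"ethernet": 1498, "fddi": 4468}


def check_satellite_params(current, lan_type="ethernet"):
    """Return a list of (name, current_value, expected_value) mismatches."""
    expected = dict(EXPECTED)
    expected["NISCS_MAX_PKTSZ"] = MAX_PKTSZ_BY_LAN[lan_type]
    problems = []
    for name, want in sorted(expected.items()):
        have = current.get(name)  # None means the value was not recorded
        if have != want:
            problems.append((name, have, want))
    return problems
```

An empty list means every recorded value matches the values listed in step 5. Any returned tuple names a parameter to correct in the satellite's parameter file.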
C.3.3 MOP Server Troubleshooting
To diagnose and correct problems for MOP servers, follow the steps
outlined in this section.
Step |
Action |
1
|
Perform the steps outlined in Section C.3.2.
|
2
|
Verify the NCP circuit state is on and the service is enabled. Enter
the following commands to run the NCP utility and check the NCP circuit
state.
$ MCR NCP
NCP> SHOW CIRCUIT ISA-0 CHARACTERISTICS
Circuit Volatile Characteristics as of 12-JAN-1994 10:08:30
Circuit = ISA-0
State = on
Service = enabled
Designated router = 63.1021
Cost = 10
Maximum routers allowed = 33
Router priority = 64
Hello timer = 15
Type = Ethernet
Adjacent node = 63.1021
Listen timer = 45
|
3
|
If service is not enabled, you can enter NCP commands like the
following to enable it:
NCP> SET CIRCUIT circuit-id STATE OFF
NCP> DEFINE CIRCUIT circuit-id SERVICE ENABLED
NCP> SET CIRCUIT circuit-id SERVICE ENABLED STATE ON
The DEFINE command updates the permanent database and ensures that
service is enabled the next time you start the network. Note that
DECnet traffic is interrupted while the circuit is off.
|
4
|
Verify that the load assist parameter points to the system disk and the
system root for the satellite.
|
5
|
Verify that the satellite's system disk is mounted on the MOP server
node.
|
6
|
++On Alpha systems, verify that the load file is APB.EXE.
|
7
|
For MOP booting, the satellite node's parameter file (ALPHAVMSSYS.PAR
for Alpha computers and VAXVMSSYS.PAR for VAX computers) must be
located in the [SYSEXE] directory of the satellite system root.
|
8
|
Ensure that the file CLUSTER_AUTHORIZE.DAT is located in the
[SYSCOMMON.SYSEXE] directory of the satellite system root.
|
++Alpha specific
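The circuit check in step 2 reads two fields from the NCP display. As a minimal sketch, assuming the display has been captured as text in the "Name = value" layout shown in that step, the following Python functions extract the fields and test them. The names parse_ncp_display and circuit_ready_for_mop are hypothetical and are not part of NCP.

```python
def parse_ncp_display(text):
    """Parse 'Name = value' lines from an NCP SHOW display into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # the title line and blank lines contain no '='
            fields[name.strip().lower()] = value.strip()
    return fields


def circuit_ready_for_mop(fields):
    """True when the circuit state is on and service is enabled."""
    state_on = fields.get("state", "").lower() == "on"
    service_enabled = fields.get("service", "").lower() == "enabled"
    return state_on and service_enabled
```

Field names are lowercased so that differences in capitalization between displays do not affect the lookup.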
C.3.4 Disk Server Troubleshooting
To diagnose and correct problems for disk servers, follow the steps
outlined in this section.
Step |
Action |
1
|
Perform the steps in Section C.3.2.
|
2
|
For each satellite node, verify the following system parameter values:
MSCP_LOAD = 1
MSCP_SERVE_ALL = 1
|
3
|
The disk servers for the system disk must be connected directly to the
disk.
|
C.3.5 Satellite Booting Troubleshooting
To diagnose and correct problems for satellite booting, follow the
steps outlined in this section.
Step |
Action |
1
|
Perform the steps in Sections C.3.2, C.3.3, and
C.3.4.
|
2
|
For each satellite node, verify that the VOTES system parameter is set
to 0.
|
3
|
++On Alpha systems, verify the DECnet network database on the MOP
servers by running the NCP utility and entering the following commands
to display node characteristics. The following example displays
information about an Alpha node named UTAH:
$ MCR NCP
NCP> SHOW NODE UTAH CHARACTERISTICS
Node Volatile Characteristics as of 15-JAN-1994 10:28:09
Remote node = 63.227 (UTAH)
Hardware address = 08-00-2B-2C-CE-E3
Load file = APB.EXE
Load Assist Agent = SYS$SHARE:NISCS_LAA.EXE
Load Assist Parameter = $69$DUA100:[SYS17.]
The load file must be APB.EXE. In addition, when booting Alpha
nodes, for each LAN adapter specified on the boot command line, the
load assist parameter must point to the same system disk and root
number.
|
4
|
+On VAX systems, verify the DECnet network database on the MOP servers
by running the NCP utility and entering the following commands to
display node characteristics. The following example displays
information about a VAX node named ARIEL:
$ MCR NCP
NCP> SHOW NODE ARIEL CHARACTERISTICS
Node Volatile Characteristics as of 15-JAN-1994 13:15:28
Remote node = 2.41 (ARIEL)
Hardware address = 08-00-2B-03-27-95
Tertiary loader = SYS$SYSTEM:TERTIARY_VMB.EXE
Load Assist Agent = SYS$SHARE:NISCS_LAA.EXE
Load Assist Parameter = DISK$VAXVMSRL5:<SYS12.>
Note that on VAX nodes, the tertiary loader is
SYS$SYSTEM:TERTIARY_VMB.EXE.
|
5
|
On Alpha and VAX systems, verify the following information in the NCP
display:
Step |
Action |
A
|
Verify the DECnet address for the node.
|
B
|
Verify the load assist agent is SYS$SHARE:NISCS_LAA.EXE.
|
C
|
Verify the load assist parameter points to the satellite system disk
and correct root.
|
D
|
Verify that the hardware address matches the satellite's Ethernet
address. At the satellite's console prompt, use the information shown
in Table 8-3 to obtain the satellite's current LAN hardware address.
Compare the hardware address values displayed by NCP and at the
satellite's console. The values should be identical and should also
match the value shown in the SYS$MANAGER:NETNODE_UPDATE.COM file. If
the values do not match, you must make appropriate adjustments. For
example, if you have recently replaced the satellite's LAN adapter, you
must execute the CLUSTER_CONFIG.COM CHANGE function to update the network
database and NETNODE_UPDATE.COM on the appropriate MOP server.
|
|
6
|
Perform a conversational boot to determine more precisely why the
satellite is having trouble booting. The conversational boot procedure
displays messages that can help you solve network booting problems. The
messages provide information about the state of the network and the
communications process between the satellite and the system disk server.
Reference: Section C.3.6 describes booting messages
for Alpha systems.
|
+VAX specific
++Alpha specific
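Step 5D compares the hardware address from three places: the NCP display, the satellite's console, and NETNODE_UPDATE.COM. The following Python sketch normalizes each value before comparing them. It assumes, as an illustration only, that the values may have been copied with different separators or letter case; the function names are hypothetical.

```python
import re


def normalize_lan_address(text):
    """Return a 48-bit LAN address as uppercase hyphen-separated pairs."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", text).upper()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit LAN address: {text!r}")
    return "-".join(digits[i:i + 2] for i in range(0, 12, 2))


def addresses_agree(*addresses):
    """True when every supplied address normalizes to the same value."""
    return len({normalize_lan_address(a) for a in addresses}) == 1
```

If addresses_agree returns False for the three values, the network database needs to be updated as described in step 5D.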
C.3.6 Alpha Booting Messages (Alpha Only)
On Alpha systems, the messages are displayed as shown in Table C-2.
Table C-2 Alpha Booting Messages (Alpha Only)
Message |
Comments |
%VMScluster-I-MOPSERVER, MOP server for downline load was node UTAH |
This message displays the name of the system providing the DECnet MOP
downline load. This message acknowledges that control was properly
transferred from the console performing the MOP load to the image that
was loaded.
|
If this message is not displayed, either the MOP load failed or the
wrong file was MOP downline loaded.
|
%VMScluster-I-BUSONLINE, LAN adapter is now running 08-00-2B-2C-CE-E3 |
This message displays the LAN address of the Ethernet or FDDI adapter
specified in the boot command. Multiple lines can be displayed if
multiple LAN devices were specified in the boot command line. The
booting satellite can now attempt to locate the system disk by sending
a message to the cluster multicast address.
|
If this message is not displayed, the LAN adapter is not initialized
properly. Check the physical network connection. For FDDI, the adapter
must be on the ring.
|
%VMScluster-I-VOLUNTEER, System disk service volunteered by node EUROPA AA-00-04-00-4C-FD |
This message displays the name of a system claiming to serve the
satellite system disk. This system has responded to the multicast
message sent by the booting satellite to locate the servers of the
system disk.
|
If this message is not displayed, one or more of the following
situations may be causing the problem:
- The network path between the satellite and the boot server either
is broken or is filtering the local area OpenVMS Cluster multicast
messages.
- The system disk is not being served.
- The CLUSTER_AUTHORIZE.DAT file on the system disk does not match
the other cluster members.
|
%VMScluster-I-CREATECH, Creating channel to node EUROPA 08-00-2B-2C-CE-E2 08-00-2B-12-AE-A2 |
This message displays the LAN address of the local LAN adapter (first
address) and of the remote LAN adapter (second address) that form a
communications path through the network. These adapters can be used to
support a NISCA virtual circuit for booting. Multiple messages can be
displayed if either multiple LAN adapters were specified on the boot
command line or the system serving the system disk has multiple LAN
adapters.
|
If you do not see as many of these messages as you expect, there may be
network problems related to the LAN adapters whose addresses are not
displayed. Use the Local Area OpenVMS Cluster Network Failure Analysis
Program for better troubleshooting (see Section D.5).
|
%VMScluster-I-OPENVC, Opening virtual circuit to node EUROPA |
This message displays the name of a system that has established a
NISCA virtual circuit to be used for communications during the boot
process. Booting uses this virtual circuit to connect to the remote
MSCP server.
|
|
%VMScluster-I-MSCPCONN, Connected to a MSCP server for the system disk, node EUROPA |
This message displays the name of a system that is actually serving the
satellite system disk.
|
If this message is not displayed, the system that claimed to serve the
system disk could not serve the disk. Check the OpenVMS Cluster
configuration.
|
%VMScluster-W-SHUTDOWNCH, Shutting down channel to node EUROPA 08-00-2B-2C-CE-E3 08-00-2B-12-AE-A2 |
This message displays the LAN address of the local LAN adapter (first
address) and of the remote LAN adapter (second address) that have just
lost communications. Depending on the type of failure, multiple
messages may be displayed if either the booting system or the system
serving the system disk has multiple LAN adapters.
|
|
%VMScluster-W-CLOSEVC, Closing virtual circuit to node EUROPA |
This message indicates that NISCA communications have failed to the
system whose name is displayed.
|
|
%VMScluster-I-RETRY, Attempting to reconnect to a system disk server |
This message indicates that an attempt will be made to locate another
system serving the system disk. The LAN adapters will be reinitialized
and all communications will be restarted.
|
|
%VMScluster-W-PROTOCOL_TIMEOUT, NISCA protocol timeout |
Either the booting node has lost connections to the remote system or
the remote system is no longer responding to requests made by the
booting system. In either case, the booting system has declared a
failure and will reestablish communications to a boot server.
|
|
|
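One way to use Table C-2 is to find the first expected message that is missing from the console output. The following Python sketch assumes the console text has been captured to a string and that a successful boot shows the informational messages in the order the table lists them. The hints paraphrase the table; the names EXPECTED and first_missing_message are hypothetical.

```python
import re

# Informational messages in the order Table C-2 lists them, each with the
# table's hint for the case where the message is not displayed. The table
# gives no such hint for OPENVC, so that entry is generic.
EXPECTED = [
    ("MOPSERVER", "The MOP load failed or the wrong file was downline loaded."),
    ("BUSONLINE", "The LAN adapter is not initialized properly. Check the "
                  "physical network connection."),
    ("VOLUNTEER", "The network path may be broken or filtering cluster "
                  "multicast messages, the system disk may not be served, "
                  "or CLUSTER_AUTHORIZE.DAT may not match."),
    ("CREATECH", "There may be network problems related to the LAN adapters "
                 "whose addresses are not displayed."),
    ("OPENVC", "No specific hint is given in Table C-2."),
    ("MSCPCONN", "The system that claimed to serve the system disk could not "
                 "serve it. Check the OpenVMS Cluster configuration."),
]

# Captures the identifier from, for example, "%VMScluster-I-MOPSERVER, ...".
MESSAGE_RE = re.compile(r"%VMScluster-[IW]-(\w+),")


def first_missing_message(console_log):
    """Return (identifier, hint) for the first absent message, or None."""
    seen = set(MESSAGE_RE.findall(console_log))
    for ident, hint in EXPECTED:
        if ident not in seen:
            return ident, hint
    return None
```

A result of None means all six informational messages were found. The warning messages (SHUTDOWNCH, CLOSEVC, PROTOCOL_TIMEOUT) are matched by the same pattern but are not part of the expected sequence.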