Ask the Wizard Questions
Trace a lock in cluster
The Question is:
How do I find out who has locks mastered in a clustered environment? I
would like to find the PID for the user who has that particular record
locked.
The Answer is:
DECamds can often be useful here, or you can do it by hand:
Finding the lock master isn't so hard, but finding the lock name for a
particular record will be! If you can make a process attempt to read
the locked record with the RAB$V_WAT flag (or whatever the equivalent
in your favorite language is), that process will be queued against the
lock and you can determine its name with the SDA SHOW PROCESS/LOCK
command. Once there, the following article explains how to trace the
holder of the lock:
[OpenVMS] How To Trace A Hung Lock Request On A Clustered System
COPYRIGHT (c) 1988, 1993 by Digital Equipment Corporation.
ALL RIGHTS RESERVED. No distribution except as provided under contract.
Copyright (c) Digital Equipment Corporation 1988, 1994. All rights reserved
PRODUCT: VMScluster Software for OpenVMS AXP
VAXcluster Software for OpenVMS VAX
COMPONENT: Lock Manager
SOURCE: Digital Customer Support Center
OVERVIEW:
This article explains how to use the SDA Utility to trace hung lock
requests on systems in a VAXcluster environment. If you have a single
node environment, you may want to reference another article in the
database on how to trace a hung lock request on a non-clustered
system, as the procedure is simplified on a single node.
BACKGROUND:
A process may hang in LEF or RWAST waiting to complete a lock request
(ENQ / ENQW). That lock request may be blocked by another process
currently holding the lock with an incompatible mode.
The following procedure illustrates how to locate the process holding
the lock with an incompatible mode. When the process is found you can
either delete the competing process, or work on isolating the
coordination problem and rewriting the programs to utilize other
synchronization techniques, such as Blocking ASTs.
More information on locks can be found in the "VAX/VMS System Services
Reference Manual" and the "VAX/VMS Internals And Data Structures"
Manual.
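To make "incompatible mode" concrete, the following sketch encodes the six lock modes (NL, CR, CW, PR, PW, EX) and their standard compatibility table as documented for the $ENQ system service. It is written in Python purely for illustration; the names COMPATIBLE and is_compatible are hypothetical and are not part of any OpenVMS interface.

```python
# Lock-mode compatibility as documented for the OpenVMS lock manager.
# For each requested mode: the set of already-granted modes it can share
# the resource with. COMPATIBLE and is_compatible are illustrative names.
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def is_compatible(requested: str, granted: str) -> bool:
    """Return True if a request in mode `requested` can be granted while
    another lock on the same resource is held in mode `granted`."""
    return granted in COMPATIBLE[requested]
```

In the example traced in this article, both the waiting request and the granted lock are EX, so is_compatible("EX", "EX") is False and the request stays queued.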
PROCEDURE SUMMARY:
The following steps and example should guide you through the isolation
of the process that is blocking a lock request. A summary of the steps
appears first:
o Identify the hung process from SDA. [Steps 1, 2, 3]
o Locate the hung process's Lock Block. [Step 4]
o From the Lock Block, locate the Resource Block. [Step 5]
  To do this, check whether the lock is a:
    'Process copy': go to Step 6
    'Local copy': go to Step 7
o A 'Process copy' lock means the Master Resource Block is located
  on another node and must be viewed from that node. [Step 6]
    a. Record the 'remote' Lock ID.
    b. Identify and log into the node in question.
    c. Display the Lock Block on that node and verify it is the
       one in question.
o Use the Lock ID to show the Master Resource Block. [Step 7]
o From the Resource Block, locate the Blocking Lock. [Step 8]
  If the Blocking Lock is a:
    'Master copy of lock': go to Step 9
    'Local copy': go to Step 10
o 'Master copy of lock' indicates that this lock is actually on
  another node. Identify the remote Lock ID and the node, log into
  that node, and display and verify the lock information there. [Step 9]
o Display the Lock ID on the node the Blocking Lock is on and verify
  the lock Resource Name and Mode. From the Blocking Lock, locate the
  blocking process. [Step 10]
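The branching in the summary above depends entirely on the last line of each lock display. As a rough sketch (in Python, for illustration only), that line can be classified as follows. The function name classify_lock_copy and the regular expression are assumptions based on the display text shown in this article's examples; other VMS versions may word the line differently.

```python
import re

# Trailing line of an SDA lock display, as worded in this article's examples.
_COPY_LINE = re.compile(
    r"(Process|Master) copy of lock ([0-9A-Fa-f]+) on system ([0-9A-Fa-f]+)"
)


def classify_lock_copy(last_line: str):
    """Return (kind, remote_lock_id, csid).

    kind is 'process', 'master' or 'local'; the other two fields are None
    for a local copy. This is a hypothetical helper, not an SDA feature.
    """
    match = _COPY_LINE.search(last_line)
    if match:
        return (match.group(1).lower(),
                match.group(2).upper(),
                match.group(3).upper())
    if "local copy" in last_line.lower():
        return ("local", None, None)
    raise ValueError("unrecognized lock display line: %r" % last_line)
```

A 'process' result on the waiting lock corresponds to Step 6, a 'master' result on the blocking lock corresponds to Step 9, and 'local' means the current node already has the information needed.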
DETAILED PROCEDURE:
1) Identify which process is hung waiting for a lock request. In
this example, assume that process M_MORREN, in system SALES, is
hung waiting for a lock request.
2) Invoke SDA on the node with the hung process and locate that
process's INDEX number:
Sales$ SET PROCESS/PRIVILEGE=CMKRNL    ! needed privilege to run SDA
Sales$ ANALYZE/SYSTEM                  ! invoke SDA to look around
VAX/VMS System analyzer
SDA> SHOW SUMMARY ! print summary output
     Current process summary
     -----------------------
     Extended  Indx Process name    Username    State   Pri PCB      PHD
     -- PID -- ---- --------------- ----------- ------- --- -------- -------
     20400101  0001 SWAPPER                     HIB      16 80197F98 80197E0
     20400106  0006 ERRFMT          SYSTEM      HIB       7 803F6440 80BE4E0
     20400107  0007 CACHE_SERVER    SYSTEM      HIB      16 803FCF50 80D2E80
        .  .  .
     20400666  0066 BRUCE_1         B_ARMER     HIB       4 80467150 81631E0
+--->20400767  0067 M_MORREN        M_MORREN    LEF       7 80479710 81B5860
|    20400768  0068 SMITH           SMITH       LEF       4 80479EA0 81C3420
|
+--- This is the hung process in this example; its process index,
     from the second column, is 0067.
If you use the command SHOW SUMMARY/IMAGE, you also see the name
of the image the process is executing.
3) Set your process index to the blocked process in SDA and VERIFY
you have the correct process - either by process name or by what
image the process is running (as shown in SHOW PROC/CHANNEL).
SDA> SHOW PROCESS/CHANNEL/IND=67    ! set up process index and view image
Process index: 0067 Name: M_MORREN Extended PID: 20400767
-------------------------------------------------------------
Process active channels
-----------------------
     Channel  Window    Status  Device/file accessed
     -------  ------    ------  --------------------
       0010   00000000          GREAT$DUA10:
+--->  0020   8080E920          GREAT$DUA10:[M_MORREN]LOCK_C.EXE
|      0030   808C7420          GREAT$DUS100:[SYSE.SYSCOMMON.SYSL
|      0040   00000000          RTA2:
|      0050   00000000          RTA2:
|      0060   808C7AE0          GREAT$DUS100:[SYSE.SYSCOMMON.SYSL
|
+--- The image the process is executing is usually one of the
     first few channels. In this example, it is running the
     image LOCK_C.EXE.
4) Look at what locks that process currently has outstanding. This
is done with the "SHOW PROC/LOCK" command. The locks a process
has are ordered such that the locks the process is waiting for
are near the end of the lock list - so you may have to go through
many locks to get to the locks that are blocked by another
process.
Any lock that says 'Granted at' you can ignore as that lock
request has already been completed. If no locks are waiting then
the process may not be waiting on a lock or the lock it was
waiting on has since been granted.
A blocked lock will say either 'Waiting for' or 'Converting
to'. In this example, the lock is 'Waiting for' a new ENQW
request.
NOTE:
This example is from a pre-V5.4 OpenVMS VAX system.
The lock display on post-V5.4 OpenVMS VAX systems has
been modified, and the status text (e.g., 'Waiting for'
or 'Converting to') is located on its own individual
line in the middle of the display.
SDA> SHOW PROCESS/LOCK
Process index: 0067 Name: M_MORREN Extended PID: 20400767
-------------------------------------------------------------
Lock data:
Lock id: 005107E3 PID: 00070067 Flags:
Par. id: 00000000 Waiting for EX     <--- the blocked request
Sublocks: 0
LKB: 80867F00
Resource: 414A5F45 4C505041 APPLE_JA Status: ASYNC
Length 30 20202020 20204B43 CK
User mode 20202020 20202020
System 00002020 20202020 ..
Process copy of lock 01CD0145 on system 00010003     <--- see Step 5
5) You have now identified the Blocked Lock. The next step is to
identify the Resource Block for that lock. The resource block
could exist on this node or another node in the cluster. To tell
where the Master Resource Block is located, look at the text at the
end of the lock displayed in Step 4.
If the text says 'Process copy of lock xxx on system yyy' then
the Master Resource Block is located on another system and you
must go to that system to get more information. If this is the
case, go to Step 6.
If the text says 'Local copy', then this is the system with the
lock and you can use this Lock Id on this node for Step 7.
6) 'Process copy of lock xxx on system yyy' indicates the lock
exists on another node in the cluster. Take note of the
following fields:
a. Lock Id on the remote node - in this example it is 01CD0145
b. System Id - in this example it is 00010003
c. Resource Name - in this example it is APPLE_JACK
To identify the node the lock is on with the System Id you can
enter "SHOW CLUSTER" and examine both the Node name and the CSID
number:
SDA> SHOW CLUSTER
VAXcluster data structures
--------------------------
--- VAXcluster Summary ---
Quorum Votes Quorum Disk Votes Status Summary
------ ----- ----------------- --------------
2 3 6553 quorum
--- CSB list ---
Address   Node   CSID      Votes  State  Status
-------   ----   ----      -----  -----  ------
807DFF40  FRANK  00010007    0    open   member,qf_noaccess
8071E640  SAM    00010001    1    open   member,qf_noaccess
8071D9F0  HAL    00010003    1    local  member,qf_same        <--- CSID 00010003
8071E700  SALES  00010002    1    open   member,qf_noaccess
In this example HAL is node 00010003 and is thus the node with the
Master Resource Block.
NOTE: To go from a CSID to the cluster member display in SDA, enter
SDA> SHOW CLUSTER/CSID=csid
where csid is the System Id taken from the lock display. This saves
time looking for the correct node on a large cluster. You can also
abbreviate the CSID to just the low order word index (similar to how
SET and SHOW PROCESS use the index of the PID).
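If the SHOW CLUSTER output has been captured to a file, matching the System Id against the CSB list can also be done mechanically. The following is a minimal sketch in Python, assuming each CSB line has the address, node name, and CSID as its first three whitespace-separated fields, as in the example above. The function name csid_to_node is hypothetical.

```python
def csid_to_node(csb_lines, csid):
    """Return the node name whose CSID equals `csid`, or None if not found.

    csb_lines: lines from the '--- CSB list ---' part of SHOW CLUSTER.
    Header and separator lines are skipped because their third field is
    not a hexadecimal number.
    """
    wanted = int(csid, 16)
    for line in csb_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            value = int(fields[2], 16)
        except ValueError:
            continue  # header, dashes, or annotation text
        if value == wanted:
            return fields[1]
    return None
```

With the four CSB lines from this example, csid_to_node(lines, "00010003") yields HAL.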
You must now log onto that node and enter SDA to examine the Lock
on the node with the Master Resource Block.
Hal$ SET PROCESS/PRIVILEGE=CMKRNL
Hal$ ANALYZE/SYSTEM
VAX/VMS System analyzer
SDA> SHOW LOCK 1CD0145 ! check the lock out
Lock database
-------------
Lock id: 01CD0145 PID: 00000000 Flags:
Par. id: 00000000 Waiting for EX
Sublocks: 0
LKB: 80F34200
Resource: 414A5F45 4C505041 APPLE_JA Status: ASYNC MSTCPY
Length 30 20202020 20204B43 CK
User mode 20202020 20202020
Group 022 00002020 20202020 ..
Master copy of lock 005107E3 on system 00010002
You should now verify that you have the correct lock:
a. Lock Id is 01CD0145, which matches
b. Resource Name is APPLE_JACK, which matches
c. Lock Mode is still 'Waiting for EX', which matches
d. The last line, 'Master copy of lock', shows the correct Lock Id
and System Id for the node you just came from (005107E3 on
system 00010002).
Now that you know you have the correct lock you can continue with
Step 7.
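The Resource Name compared above appears in the display as a hex dump that reads right to left, with the ASCII rendering truncated on the first line. As an illustration of how the full name can be recovered from the dump, here is a small Python sketch. It assumes each row shows two little-endian longwords with the lower-addressed one on the right, and that 'Length' is a decimal byte count; both assumptions agree with the 'APPLE_JA' and 'CK' text shown beside the dump. The function name decode_resource_name is hypothetical.

```python
def decode_resource_name(rows, length):
    """Rebuild the resource name from the hex rows of an SDA lock display.

    rows:   list of (left_hex, right_hex) longword pairs, exactly as
            printed, e.g. ("414A5F45", "4C505041").
    length: value of the 'Length' field (number of valid bytes).
    """
    data = bytearray()
    for left, right in rows:
        # The right-hand longword comes first in memory, and each
        # longword is stored least significant byte first.
        for longword in (right, left):
            data += int(longword, 16).to_bytes(4, "little")
    return data[:length].decode("ascii")


rows = [
    ("414A5F45", "4C505041"),
    ("20202020", "20204B43"),
    ("20202020", "20202020"),
    ("00002020", "20202020"),
]
name = decode_resource_name(rows, 30)
```

For the rows in this example, name is APPLE_JACK padded with blanks to 30 characters.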
7) The next step is to print out the Master Resource Block and
identify the Blocking Lock. To do this enter "SHOW
RESOURCE/LOCK=lockid" from SDA, taking the lockid from the
'Lock id:' field in the lock display.
Once the resource block is displayed, the blocking lock will be
found in the 'Granted Queue' and will have a lock mode
incompatible with the lock mode we are requesting. In the
following example the lockid is taken from the display in
Step 6, 01CD0145.
SDA> SHOW RESOURCE/LOCK=01CD0145
Resource database
-----------------
Address of RSB: 80BF2810 Group grant mode: EX
Parent RSB: 00000000 Conversion grant mode: EX
Sub-RSB count: 0 BLKAST count: 0
Value block: 00000000 00000000 00000000 00000000 Seq. #: 00000000
Resource: 414A5F45 4C505041 APPLE_JA
Length 30 20202020 20204B43 CK CSID: 00000000
User mode 20202020 20202020 Directory entry
Group 022 00002020 20202020 ..
Granted queue (Lock ID / Gr mode):
1---> 05210A82 EX          <--- This is the blocking lock request;
                                its 'Lock Id' is 05210A82.
Conversion queue (Lock ID / Gr/Rq mode):
2---> 032500B4 NL/EX
Waiting queue (Lock ID / Rq mode):
3---> 01CD0145 EX
NOTE 1: This is the lock that is blocking our request. This
lock is granted at EXclusive access and there is another
lock also waiting to get access to the resource (Note 2).
NOTE 2: Another process doing a conversion request from NL
(null) to EX (exclusive) is also blocked.
NOTE 3: This is our lock request from node SALES in the 'Waiting
Queue'.
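When a resource has many locks queued, reading the three queues by eye gets tedious. The sketch below (Python, illustration only) pulls the lock IDs and modes out of a captured SHOW RESOURCE display. It assumes the queue headings and the 'lock-id mode' entry layout shown in this example; the names parse_queues, _QUEUE_HEADER and _ENTRY are hypothetical, and real displays on other versions may differ.

```python
import re

_QUEUE_HEADER = re.compile(r"(Granted|Conversion|Waiting) queue", re.IGNORECASE)
# An entry is an 8-digit hex lock ID followed by a mode such as EX or NL/EX.
_ENTRY = re.compile(r"\b([0-9A-Fa-f]{8})\s+([A-Z]{2}(?:/[A-Z]{2})?)\b")


def parse_queues(display_lines):
    """Return {'granted': [...], 'conversion': [...], 'waiting': [...]},
    each a list of (lock_id, mode) tuples in display order."""
    queues = {"granted": [], "conversion": [], "waiting": []}
    current = None
    for line in display_lines:
        header = _QUEUE_HEADER.search(line)
        if header:
            current = header.group(1).lower()
            continue
        if current is None:
            continue  # still in the resource header part of the display
        entry = _ENTRY.search(line)
        if entry:
            queues[current].append((entry.group(1).upper(), entry.group(2)))
    return queues
```

Applied to the display above, the granted queue holds 05210A82 at EX, which is the lock to examine in the next step.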
8) The Blocking Lock has been identified; now you must see whether the
process owning that lock is on this node or another node in the
cluster. Display the lock information using the SDA SHOW LOCK
command:
SDA> SHOW LOCK 5210A82
Lock database
-------------
Lock id: 05210A82 PID: 00000000 Flags:
Par. id: 00000000 Granted at EX
Sublocks: 0
LKB: 80FC7CE0
Resource: 414A5F45 4C505041 APPLE_JA Status: MSTCPY
Length 30 20202020 20204B43 CK
User mode 20202020 20202020
Group 022 00002020 20202020 ..
Master copy of lock 028605D7 on system 00010001
If the last line of text says it is a 'Local Copy' then you are
on the correct node to get the process owning the lock and can go
directly to Step 10.
If the last line of text says it is a 'Master copy of lock' then
the lock exists on another node and you must get onto that node
to locate the blocking process. If this is the case, go to Step 9.
9) 'Master copy of lock xxx on system yyy' indicates that the
blocking lock exists on another node in the cluster. Take note
of the following fields:
a. Lock Id on the remote node, in this example 28605D7
b. System Id of the remote node, in this case 00010001
c. Resource Name, in this case APPLE_JACK.
To identify the node the lock is on with the System Id you can
enter "SHOW CLUSTER" and examine both the Node name and the CSID
number.
SDA> SHOW CLUSTER
VAXcluster data structures
--------------------------
--- VAXcluster Summary ---
Quorum Votes Quorum Disk Votes Status Summary
------ ----- ----------------- --------------
2 3 6553 quorum
--- CSB list ---
Address   Node   CSID      Votes  State  Status
-------   ----   ----      -----  -----  ------
807DFF40  FRANK  00010007    0    open   member,qf_noaccess
8071E640  SAM    00010001    1    open   member,qf_noaccess    <--- CSID 00010001
8071D9F0  HAL    00010003    1    local  member,qf_same
8071E700  SALES  00010002    1    open   member,qf_noaccess
The node holding the lock in this case is SAM, as its CSID
(System Id) matches that of the lock.
You must now log onto that node and enter SDA to examine the
blocking lock there.
Sam$ SET PROCESS/PRIVILEGE=CMKRNL
Sam$ ANALYZE/SYSTEM
VAX/VMS System analyzer
SDA> SHOW LOCK 28605D7 ! check the lock out
Process index: 005C Name: M_MORREN Extended PID: 202002DC
-------------------------------------------------------------
Lock data:
Lock id: 028605D7 PID: 0005005C Flags:
Par. id: 00000000 Granted at EX
Sublocks: 0
LKB: 808748C0
Resource: 414A5F45 4C505041 APPLE_JA Status:
Length 30 20202020 20204B43 CK
User mode 20202020 20202020
System 00002020 20202020 ..
Process copy of lock 05210A82 on system 00010003
You should now verify that you have the correct lock:
a. Lock Id is 028605D7, which matches
b. Resource Name is APPLE_JACK, which matches
c. Lock Mode is 'Granted at EX'; this is the blocking lock
d. The last line has 'Process copy of lock' and shows the
Lock Id and System Id from the 'Mastering' node.
Now that you have the correct lock, you can continue to Step 10.
10) To see the process holding the incompatible lock, use the PID
field as a process index on the system where the process exists.
In this example, we use the Lock Block from Step 9, with the
Lock Id of 028605D7 and PID of 0005005C. Only the lower four hex
digits of the PID, 005C, are used as the index.
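The index extraction is a simple mask of the low-order 16 bits. A one-function Python sketch, for illustration, is given here. Note that it applies to the 'PID:' field of the lock display and not to the Extended PID; in this article's example the Extended PID 20400767 belongs to index 0067, which is not its low four digits. The function name process_index is hypothetical.

```python
def process_index(lock_pid_field: str) -> str:
    """Return the process index (4 hex digits) from the 'PID:' field of
    an SDA lock display by keeping only the low-order 16 bits."""
    return format(int(lock_pid_field, 16) & 0xFFFF, "04X")
```

For the PID 0005005C this gives 005C, the value passed to /INDEX in the command that follows.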
Hopefully either the process name holding the lock, or the image
it is executing, will give some clue as to why the process has
kept the lock. In this case LOCK_A.EXE has taken out an
EXclusive lock on the resource APPLE_JACK and is now waiting for
terminal input.
SDA> SHOW PROCESS/CHANNEL/INDEX=5C
Process index: 005C Name: M_MORREN Extended PID: 202002DC
-------------------------------------------------------------
Process active channels
-----------------------
     Channel  Window    Status  Device/file accessed
     -------  ------    ------  --------------------
       0010   00000000          GREAT$DUA10:
       0020   80879000          GREAT$DUA10:[M_MORREN]LOCK_A.EX
       0030   8081B5E0          GREAT$DUS100:[SYSE.SYSCOMMON.SY
       0040   00000000  Busy    RTA1:
       0050   00000000          RTA1:
       0060   8081BFA0          GREAT$DUS100:[SYSE.SYSCOMMON.SY