Hello guys,
I have a two-node cluster of DS20e systems running Tru64 5.1A
PK3, with Oracle Parallel Server running on it.
One of our testers was testing the cluster while both nodes
were up and running fine. He powered off node1, but instead
of node2 taking over, it crashed badly with a dump.
I was able to reproduce the problem.
Can anyone shed some light on it?
I will summarize.
Here are the messages before the dump:
rm_state_change: mchan0 slot 0 offline
rm_lrail_remove_node: logical_rail 0 hubslot 0
MEMORY CHANNEL API - node 0 has left the cluster
MEMORY CHANNEL API - cleaning up after node 0
CNX MGR: communication error detected for node 1
CNX MGR: delay 1 secs 0 usecs
CNX QDISK: Cluster transition, releasing claim to 1 quorum disk vote.
CNX MGR: quorum lost, suspending cluster operations.
kch: suspending activity
dlm: suspending lock activity
CNX MGR: Reconfig operation complete
CNX MGR: membership configuration index: 5 (3 additions, 2 removals)
ics_mct: Node 1 is now down
CNX MGR: Node sdp1 1 incarn 0xb54bb csid 0x10001 has been removed from the cluster
CLSM Rebuild: starting...
dlm: resuming lock activity
kch: resuming activity
CNX QDISK: Successfully claimed quorum disk, adding 1 vote.
CNX MGR: quorum (re)gained, (re)starting cluster operations.
clua: reconfiguring for member 1 down
CLSM Rebuild: initiated
CLSM Rebuild: completed
CLSM Rebuild: done.
Recovering filesystem mounted at / to this node (member id 2)
Recovery to this node (member id 2) complete for filesystem mounted at /
Recovering filesystem mounted at /backup_vol to this node (member id 2)
Recovery to this node (member id 2) complete for filesystem mounted at /backup_vol
Recovering filesystem mounted at /archive to this node (member id 2)
Recovery to this node (member id 2) complete for filesystem mounted at /archive
Recovering filesystem mounted at /var to this node (member id 2)
Recovery to this node (member id 2) complete for filesystem mounted at /var
Recovering filesystem mounted at /oracle to this node (member id 2)
Recovery to this node (member id 2) complete for filesystem mounted at /oracle
Recovering filesystem mounted at /usr to this node (member id 2)
Recovery to this node (member id 2) complete for filesystem mounted at /usr
Jan 28 23:30:50 sdp2 vmunix: lsm:volio: Subdisk dsk14-01 block 16: Uncorrectable read error
Recovering filesystem mounted at /oracle to this node (member id 2)
Jan 28 23:30:53 CAAD[1049213]: Attempting to start `cluster_lockd` on member `sdp2`
28-Jan-2003 23:30:46 [700] SCSI event
28-Jan-2003 23:30:46 [700] SCSI event
28-Jan-2003 23:30:47 [700] SCSI event
trap: invalid memory read access from kernel mode
faulting virtual address: 0x0000000000000060
pc of faulting instruction: 0xfffffc00006b1d60
ra contents at time of fault: 0xfffffc00006b1ba0
sp contents at time of fault: 0xfffffe05446f7160
panic (cpu 1): kernel memory fault
dump device name: dsk3-06, num: 0x130004d, off: 0x3dd3e20, len: 0x5ed9cd
LSM attempting to dump to device at major 19 minor 77
dump device name: dsk10b-01, num: 0x13000b3, off: 0x0, len: 0x5ed9cd
LSM attempting to dump to device at major 19 minor 179
DUMP: blocks available: 12432282
DUMP: blocks wanted: 298194 (partial compressed dump) [OKAY]
DUMP: Device     Disk Blocks Available
DUMP: ------     ---------------------
DUMP: 0x130004d  4666201 - 6216138 (of 6216139) [primary swap]
DUMP.prom: Open: dev 0x51000c6, block 0: SCSI 1 7 0 3 300 0 0
DUMP: Writing header... [1024 bytes at dev 0x130004d, block 71047147]
DUMP: Writing data........................... [27MB]
DUMP: Writing header... [1024 bytes at dev 0x130004d, block 71047147]
DUMP: crash dump complete.
bit not set in any intr_enable reg
halted CPU 1
halted CPU 0
=====
Jay Nash
E-mail:mgn321_at_yahoo.com
Received on Thu Jan 30 2003 - 14:50:29 NZDT