Hi Managers,
I've been observing "strange" things happening with device files
lately under v5.1.
Our setup: Tru64 UNIX v5.1 + TruCluster v5.1 + Patch Kit 0002
Intermittently, one node or another loses access to a device; a reboot
clears the problem. For example, running disklabel against an affected
disk gives the message:
disklabel: dsk57: No such device or address
I've seen this with both disks (from an HSG80) and SCSI tapes
(from a DLT892).
I experienced this again today when I added two new disks. The
procedure I followed:
- Created stripe sets on HSG80
- Initialised stripes
- Added units
- On node 2, did a 'hwmgr -scan scsi'
    The disks were assigned as dsk57 and dsk58
- On node 1, did a 'hwmgr -scan scsi'
    I could see the LUNs, but the device names were 'unknown'
- On node 1, did a 'dsfmgr -K'
    I could now see the devices as dsk57 dsk58
- On node 2 did a 'disklabel -wr dsk57'
    Received the "No such device or address" message
- On node 2 did a 'disklabel -wr dsk58'
    Worked fine.
- On node 1 did a 'disklabel -wr dsk57'
    Worked fine.
- On node 1 did a 'disklabel -wr dsk58'
    Worked fine.
- On node 1 and 2, then tried a 'disklabel -r dsk57'
    Worked fine from 1, but got the "No such device or address" message on 2.
- On node 1 and 2, then tried a 'disklabel -r dsk58'
    Worked fine from both.
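The read checks in the last two steps are easy to script so they can be
repeated on each node. This is only a sketch: the function name check_disks
is made up for illustration, and it relies on nothing beyond the
'disklabel -r' call used above.

```shell
#!/bin/sh
# Sketch: run 'disklabel -r' against each named disk and report which
# ones this node can read. check_disks is a hypothetical helper name.
check_disks() {
    failed=0
    for d in "$@"; do
        if disklabel -r "$d" >/dev/null 2>&1; then
            echo "$d: ok"
        else
            echo "$d: FAILED"
            failed=1
        fi
    done
    # Non-zero return if any disk could not be read from this node.
    return $failed
}

# Example (run on each cluster member in turn):
#   check_disks dsk57 dsk58
```

The function returns non-zero if any disk fails, so it could be dropped into
a larger check script without parsing its output.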
From node 2, I tried doing a truss on the 'disklabel -r dsk57' command,
and the "interesting" part of the output came up with:
# truss disklabel -r dsk57
[ Output truncated ... ]
stat("dsk57c", 0x0000000140009DE0) = 0
open("dsk57c", O_RDONLY, 043777733540) Err#6 No such device or address
disklabewrite(2, " d i s k l a b e", 8) = 8
l: write(2, " l : ", 3) = 3
getuid() = 0 [ 0 ]
getuid() = 0 [ 0 ]
getgid() = 1 [ 1 ]
getgroups(32, 0x000000011FFF9140) = 6
open("/usr/lib/nls/msg/C/libc.cat", O_RDONLY, 00) Err#2 No such file or directory
getuid() = 0 [ 0 ]
open("/usr/share/.msg_conv-C", O_RDONLY, 01777777777760002723350) Err#2 No such file or directory
dsk57write(2, " d s k 5 7", 5) = 5
: write(2, " : ", 2) = 2
No such device or addresswrite(2, " N o s u c h d e v i".., 25) = 25
write(2, "\n", 1) = 1
sigprocmask(SIG_BLOCK, 0xFFFFF137, 0x00000000) = 0
_exit(4)
Now, the files are there, checking from node 1:
# file /dev/rdisk/dsk57c
/dev/rdisk/dsk57c: character special (19/1109) SCSI #1 "HSG80" disk #2 (SCSI ID #0) (SCSI LUN #33)
but from node 2:
# file /dev/rdisk/dsk57c
/dev/rdisk/dsk57c: character special (19/1109)
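The difference between the two outputs is that node 2 prints only the
major/minor pair, with no controller description after it. A small sketch
for spotting that automatically follows. The name has_hw_desc is a
hypothetical helper, and treating a missing description as a sign that the
node cannot open the device is my assumption from this one case, not
documented behaviour.

```shell
#!/bin/sh
# Sketch: succeed if a line of 'file' output carries a hardware
# description after the "(major/minor)" pair, fail if it stops there.
# has_hw_desc is a hypothetical name; the pattern is based only on the
# two outputs shown above.
has_hw_desc() {
    case "$1" in
        *"character special ("*")"?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   has_hw_desc "$(file /dev/rdisk/dsk57c)" || echo "dsk57c: no hardware info"
```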
The only fix I know of is a reboot, but my downtime window isn't for a
couple of days.
Has anyone else experienced this, or have any ideas why it is occurring?
Thanks in advance,
gunther
Received on Fri Jun 08 2001 - 03:27:27 NZST