Device file strangeness in Tru64/TruCluster v5.1

From: Gunther Feuereisen <gunther_at_gfh.com.au>
Date: Fri, 08 Jun 2001 13:24:50 +1000

Hi Managers,

I've been observing "strange" things happening with device files
lately under v5.1.

Our setup: t64 v5.1 + TruCluster v5.1 + Patch Kit 0002

Intermittently, we find that one node or the other no longer has
access to a device. After a reboot, everything is fine.

For example, if you try to run disklabel on a disk, you get the message:

disklabel: dsk57: No such device or address

I've been experiencing this with both disks (from an HSG80) and SCSI tapes
(from a DLT892).

I experienced this again today when I added two new disks. The
procedure I followed:

- Created stripe sets on HSG80

- Initialised stripes

- Added units

- On node 2, did a 'hwmgr -scan scsi'
  The disks were assigned as dsk57 and dsk58

- On node 1, did a 'hwmgr -scan scsi'
  I could see the LUNs, but the device names were 'unknown'

- On node 1, did a 'dsfmgr -K'
  I could now see the devices as dsk57 dsk58

- On node 2 did a 'disklabel -wr dsk57'
  Received the "No such device or address" message

- On node 2 did a 'disklabel -wr dsk58'
  Worked fine.

- On node 1 did a 'disklabel -wr dsk57'
  Worked fine.

- On node 1 did a 'disklabel -wr dsk58'
  Worked fine.

- On nodes 1 and 2, then tried a 'disklabel -r dsk57'
  Worked fine from 1, but got the "No such device or address" message
  on 2.

- On nodes 1 and 2, then tried a 'disklabel -r dsk58'
  Worked fine from both.

From node 2, I ran truss on the 'disklabel -r dsk57' command, and the
"interesting" part of the output was:

# truss disklabel -r dsk57
[ Output truncated ... ]
stat("dsk57c", 0x0000000140009DE0) = 0
open("dsk57c", O_RDONLY, 043777733540) Err#6 No such device or address
disklabewrite(2, " d i s k l a b e", 8) = 8
l: write(2, " l : ", 3) = 3
getuid() = 0 [ 0 ]
getuid() = 0 [ 0 ]
getgid() = 1 [ 1 ]
getgroups(32, 0x000000011FFF9140) = 6
open("/usr/lib/nls/msg/C/libc.cat", O_RDONLY, 00) Err#2 No such file or directory
getuid() = 0 [ 0 ]
open("/usr/share/.msg_conv-C", O_RDONLY, 01777777777760002723350) Err#2 No such file or directory
dsk57write(2, " d s k 5 7", 5) = 5
: write(2, " : ", 2) = 2
No such device or addresswrite(2, " N o s u c h d e v i".., 25) = 25

write(2, "\n", 1) = 1
sigprocmask(SIG_BLOCK, 0xFFFFF137, 0x00000000) = 0
_exit(4)
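The key detail in that trace is that stat() on dsk57c returns 0, so the special file exists, but open() then fails with Err#6, which is ENXIO ("No such device or address"). That pattern suggests node 2's kernel has nothing attached behind that device number, rather than a missing /dev entry. The sketch below repeats the same stat()-then-open() sequence so both nodes can be checked the same way. It assumes a Python interpreter is available on the cluster members (not part of the original setup), and probe_device is a hypothetical helper name.

```python
import errno
import os


def probe_device(path):
    """Repeat the stat()-then-open() sequence seen in the truss output.

    Returns "missing" if the path does not exist, "ok" if it opens
    read-only, or the errno name (for example "ENXIO") if open() fails.
    """
    try:
        os.stat(path)  # succeeds as long as the /dev entry itself exists
    except FileNotFoundError:
        return "missing"
    try:
        # open() is where the kernel must resolve the device behind the entry
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # errno.errorcode maps numeric codes (ENXIO is 6 here) to their names
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"
```

Running probe_device("/dev/rdisk/dsk57c") on each node gives a quick side-by-side comparison; going by the truss output, node 2 would be expected to report ENXIO. Opening a tape device node may rewind the tape on close, so it is safer to point this at disk entries.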

The device files are there, though. Checking from node 1:

# file /dev/rdisk/dsk57c
/dev/rdisk/dsk57c: character special (19/1109) SCSI #1 "HSG80" disk #2 (SCSI ID #0) (SCSI LUN #33)

but from node 2:

# file /dev/rdisk/dsk57c
/dev/rdisk/dsk57c: character special (19/1109)
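On node 2 the `file` output still shows the same major/minor pair (19/1109) but none of the SCSI details, which fits the truss result: the special file looks intact, but node 2 cannot resolve the device behind it. The numbers can also be read directly for comparison across members. This is a minimal sketch, again assuming Python on each node; device_numbers and split_rdev are illustrative names, and os.major()/os.minor() decode st_rdev using the host platform's own dev_t layout.

```python
import os
import stat


def split_rdev(rdev):
    """Split a raw dev_t into (major, minor) using the host's macros."""
    return os.major(rdev), os.minor(rdev)


def device_numbers(path):
    """Return (kind, major, minor) for a device special file, else None."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "character special"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block special"
    else:
        return None  # regular file, directory, symlink target, etc.
    major, minor = split_rdev(st.st_rdev)
    return kind, major, minor
```

If device_numbers("/dev/rdisk/dsk57c") returns the same tuple on both nodes, the difference most likely lies in each node's kernel view of the hardware rather than in the /dev entry itself.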

All I'm left with is a reboot, but my downtime window isn't for a couple
of days.

Has anyone else experienced this, or does anyone have an idea why it is occurring?

Thanks in advance,
gunther
Received on Fri Jun 08 2001 - 03:27:27 NZST
