Hello DEC Unix experts,
We have a problem with an application that blocks forever in the
UNINTERRUPTIBLE state inside an asynchronous I/O system call
(aio_read/aio_write) on a raw LSM volume of type GEN while this mirrored
volume is in the SYNC state.
The volume consists of two mirrored RZ28 disks, and it went into the SYNC
state because of a previous system failure.
The problem occurs on a DEC 3000/600 running DEC OSF/1 V3.2A (Rev 17) with
LSM V1.2.
$ volprint -t -g sysdg
DG NAME         GROUP-ID
DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   PUBPATH
V  NAME         USETYPE      KSTATE   STATE    LENGTH   READPOL   PREFPLEX
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    ST-WIDTH MODE
SD NAME         PLEX         PLOFFS   DISKOFFS LENGTH   DISK-NAME DEVICE

dg sysdg        799613465.1087.edcsys1
dm sysdg01      rz16         sliced   512      4109952  /dev/rrz16g
dm sysdg02      rz24         sliced   512      4109952  /dev/rrz24g
sd sysdg01-01   pl-01        0        0        524288   sysdg01   rz16
sd sysdg01-02   pl-01        524288   524288   524288   sysdg01   rz16
sd sysdg01-03   pl-01        1048576  1048576  524288   sysdg01   rz16
sd sysdg01-04   pl-01        1572864  1572864  524288   sysdg01   rz16
sd sysdg01-05   pl-01        2097152  2097152  524288   sysdg01   rz16
sd sysdg01-06   pl-03        0        2621440  524288   sysdg01   rz16
sd sysdg01-07   pl-05        0        3145728  61440    sysdg01   rz16
sd sysdg01-08   pl-07        0        3207168  409600   sysdg01   rz16
sd sysdg01-09   pl-09        0        3616768  493184   sysdg01   rz16
sd sysdg02-01   pl-02        0        0        524288   sysdg02   rz24
pl pl-01        vol01        ENABLED  ACTIVE   2621440  CONCAT    -        RW
pl pl-02        vol01        ENABLED  ACTIVE   2621440  CONCAT    -        RW
pl pl-03        vol02        ENABLED  ACTIVE   524288   CONCAT    -        RW
pl pl-04        vol02        ENABLED  ACTIVE   524288   CONCAT    -        RW
pl pl-05        vol03        ENABLED  ACTIVE   61440    CONCAT    -        RW
pl pl-06        vol03        ENABLED  ACTIVE   61440    CONCAT    -        RW
pl pl-07        edc_data     ENABLED  ACTIVE   409600   CONCAT    -        RW
pl pl-08        edc_data     ENABLED  TEMP     409600   CONCAT    -        WO
pl pl-09        edc_log      ENABLED  ACTIVE   493184   CONCAT    -        RW
pl pl-10        edc_log      ENABLED  TEMP     493184   CONCAT    -        WO
v  edc_data     fsgen        ENABLED  ACTIVE   409600   SELECT    -
v  edc_log      gen          ENABLED  ACTIVE   493184   SELECT    -
v  vol01        fsgen        ENABLED  ACTIVE   2621440  SELECT    -
v  vol02        fsgen        ENABLED  ACTIVE   524288   SELECT    -
v  vol03        gen          ENABLED  SYNC     61440    SELECT    -
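As a rough aid for reading the listing above, here is a minimal Python
sketch that picks out the volume and plex records whose STATE is not ACTIVE.
The names LISTING and unusual_records are made up for this illustration, and
the field positions are assumed from the header lines in the listing rather
than taken from the LSM documentation. Only the records for vol03, edc_data
and edc_log are copied in.

```python
# Sketch only: field positions are assumed from the header lines of the
# volprint listing in this post, not from the LSM documentation.
LISTING = """\
pl pl-05 vol03 ENABLED ACTIVE 61440 CONCAT - RW
pl pl-06 vol03 ENABLED ACTIVE 61440 CONCAT - RW
pl pl-07 edc_data ENABLED ACTIVE 409600 CONCAT - RW
pl pl-08 edc_data ENABLED TEMP 409600 CONCAT - WO
pl pl-09 edc_log ENABLED ACTIVE 493184 CONCAT - RW
pl pl-10 edc_log ENABLED TEMP 493184 CONCAT - WO
v edc_data fsgen ENABLED ACTIVE 409600 SELECT -
v edc_log gen ENABLED ACTIVE 493184 SELECT -
v vol03 gen ENABLED SYNC 61440 SELECT -
"""


def unusual_records(listing):
    """Return (type, name, state) for v and pl records not in state ACTIVE."""
    found = []
    for line in listing.splitlines():
        fields = line.split()
        # Only lowercase "v" and "pl" records are checked; the uppercase
        # header lines and the dg/dm/sd records are skipped.
        if len(fields) < 5 or fields[0] not in ("v", "pl"):
            continue
        rec_type, name, state = fields[0], fields[1], fields[4]
        if state != "ACTIVE":
            found.append((rec_type, name, state))
    return found


for record in unusual_records(LISTING):
    print(*record)
```

On these records the sketch reports vol03 in state SYNC, plus the plexes
pl-08 and pl-10 in state TEMP.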
Any access to volume vol03 causes the application to block inside the
system call.
Using kdbx -k /vmunix /dev/mem, we get the following trace for the blocked
process:
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1860, 0xfffffc0
00470f00]
1 mpsleep(0xfffffc00005767f0, 0x18, 0xffffffff00000800, 0x0, 0xffffffff831dc
48) ["../../../../src/kernel/bsd/kern_synch.c":434, 0xfffffc000043cc74]
2 event_wait(0x1400ce1b8, 0x1400ce1f0, 0x0, 0xffffffff83193680, 0x0)
["../../../../src/kernel/kern/event.c":134, 0xfffffc000046c280]
3 biowait(bp = 0xffffffff831dcc60) ["../../../../src/kernel/vfs/vfs_bio.c":1
10, 0xfffffc000044f618]
4 volstrategy0(bp = 0xffffffff8319f6c0)
["../../../../src/kernel/vxvm/vol/vol.c":1710, 0xfffffc00005727ec]
5 volstrategy(bp = 0x41242000)
["../../../../src/kernel/vxvm/vol/vol.c":1392, 0xfffffc00005722dc]
6 aio_transfer(0xffffffff831b6210, 0xffffffff926678c8, 0xffffffff926678b8,
0xfffffc00004ed13c, 0x0) ["../../../../src/kernel/bsd/kern_aio.c":1763, 0xfffffc0
0023c490]
7 syscall(0x38000, 0x1, 0x1, 0x21, 0x200000c)
["../../../../src/kernel/arch/alpha/syscall_trap.c":515, 0xfffffc00004eb814]
8 _Xsyscall(0x8, 0x120327ac4, 0x1400cf320, 0x40620820, 0x1)
["../../../../src/kernel/arch/alpha/locore.s":1086, 0xfffffc00004dbf84]
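Read from frame 8 up to frame 0, the trace gives the path from the system
call entry down to the sleep. The following minimal Python sketch extracts
that call chain. The names TRACE, FRAME and call_chain are made up for this
illustration, and only the first line of each frame is copied in, exactly as
wrapped in this post, since only the frame number and function name are used.

```python
import re

# First line of each frame, copied as wrapped in this post. The wrapped
# continuation lines are left out because they are not needed here.
TRACE = '''\
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1860, 0xfffffc0
1 mpsleep(0xfffffc00005767f0, 0x18, 0xffffffff00000800, 0x0, 0xffffffff831dc
2 event_wait(0x1400ce1b8, 0x1400ce1f0, 0x0, 0xffffffff83193680, 0x0) ["../..
3 biowait(bp = 0xffffffff831dcc60) ["../../../../src/kernel/vfs/vfs_bio.c":1
4 volstrategy0(bp = 0xffffffff8319f6c0) ["../../../../src/kernel/vxvm/vol/vo
5 volstrategy(bp = 0x41242000) ["../../../../src/kernel/vxvm/vol/vol.c":1392
6 aio_transfer(0xffffffff831b6210, 0xffffffff926678c8, 0xffffffff926678b8, 0
7 syscall(0x38000, 0x1, 0x1, 0x21, 0x200000c) ["../../../../src/kernel/arch/
8 _Xsyscall(0x8, 0x120327ac4, 0x1400cf320, 0x40620820, 0x1) ["../../../../sr
'''

# A frame line is: optional ">" marker, frame number, function name, "(".
FRAME = re.compile(r"^>?\s*(\d+)\s+(\w+)\(")


def call_chain(trace):
    """Return function names ordered from outermost caller to innermost."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    # The highest frame number is the outermost caller.
    return [name for _, name in sorted(frames, reverse=True)]


print(" -> ".join(call_chain(TRACE)))
```

The chain runs from _Xsyscall and syscall through aio_transfer, volstrategy
and volstrategy0 into biowait, event_wait, mpsleep and thread_block. As far
as we can tell from this, the wait in biowait happens inside the aio system
call itself rather than after it has returned.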
We reported the problem to DEC Customer Support in May 1995, but because the
application is the Sybase dataserver, DEC and Sybase have been playing
ping-pong with it ever since. Does anybody have any idea what is going on, or
whether a workaround exists?
Thanks, Sylvain Gagnon
CAE Electronics Ltd.
gagnons_at_cae.ca
Received on Mon Aug 14 1995 - 17:40:15 NZST