Hardware: DEC Alpha Server 4100
Operating System: Digital Unix
Operating System Version: 4.0d
Hi,
We are testing a mission-critical system that runs on a DEC Alpha Server
4100. We have noticed a problem where our applications block while
performing disk I/O for up to 5 seconds when "sync" runs. Sync runs every
30 seconds, kicked off by the "update" daemon.
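For anyone unfamiliar with the mechanism, the behavior described above amounts to roughly the following loop. This is only a sketch of what I understand "update" to do, not its actual source; the function name periodic_sync and the rounds parameter are made up for illustration (the real daemon loops forever).

```c
#include <unistd.h>

/* Hypothetical stand-in for the "update" daemon's main loop.
 * Calls sync() once per round and sleeps interval_sec between rounds.
 * The `rounds` limit exists only so that this sketch terminates.
 * Returns the number of sync() calls made. */
int periodic_sync(unsigned interval_sec, int rounds)
{
    int done = 0;
    for (int i = 0; i < rounds; i++) {
        sync();                  /* flush request covers ALL dirty buffers, system-wide */
        done++;
        if (i + 1 < rounds)
            sleep(interval_sec); /* 30 on our system */
    }
    return done;
}
```

The point of the sketch is that sync() takes no file or file system argument, so every file with dirty pages is affected at the same moment.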
Each of our applications does asynchronous disk I/O using UFS. Each
application writes to files that no other process accesses. We expect some
blocking of our applications, but it seems as though our applications
sometimes block for the entire time that sync is running. I would have
expected our applications to be blocked for only a fraction of the
time sync is running. Below we have 2 tracebacks produced using LADEBUG on
core files our applications generated when this problem was encountered.
What I'd like to understand is how locks are obtained when "sync" is run. I
suspect we wouldn't have this problem if sync obtained locks on a per-file
basis and held them only long enough to force that file's dirty pages to
disk.
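To make the per-file idea concrete, here is a minimal user-space sketch of the behavior I was hoping for, using fsync() on a single descriptor. I do not know whether sync is implemented anything like this inside the kernel; the function name append_and_fsync is hypothetical and the code is not taken from our applications. EINTR handling is left out for brevity.

```c
#include <fcntl.h>
#include <unistd.h>

/* Append len bytes to the file at path, then force only that file's
 * dirty data to disk with fsync(). Other files are not touched.
 * Returns 0 on success, -1 on any failure. */
int append_and_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);   /* may write fewer bytes than asked */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    int rc = fsync(fd);                   /* per-file flush, unlike sync() */
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

With a per-file flush like this, a writer to one file would only wait for that file's pages, which is the behavior I expected from sync.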
Any insight you can provide on how sync works would be appreciated.
Below are 2 tracebacks from where our applications were blocked when sync
was running.
Traceback #1
> Welcome to the Ladebug Debugger Version 4.0-43
> ------------------
> object file name: /usr/users/atmp/performance/informals_0504/pla.exe.reza
> core file name: opfdp02.pla.core.first_core
> Core file produced from executable pla.exe.reza3
> Thread terminated at PC 0x3ff800d41d8 by signal IOT
> >0 0x3ff800d41d8 in __write(0x38f6c37404857, 0x14003aeb8,
>    0x3633303030303030, 0x1400f3237, 0x11fffe1e0, 0x140081f20)
>    DebugInformationStrippedFromFile70
> #1 0x3ff801361e0 in __write_nc(0x0, 0x1d, 0x14091d608, 0x146d,
>    0x38f6c37404857, 0x38f6c37404858)
>    DebugInformationStrippedFromFile638
> #2 0x3ff800e47f4 in _xflsbuf(0x3ff80187640, 0x3ffc00803b8,
>    0x12b3120, 0x0, 0x1d010a00000000, 0x3ff0000000000000)
>    DebugInformationStrippedFromFile82
> #3 0x3ff8018763c in __fflush_unlocked(0x1, 0x102, 0x13,
>    0x11fffe3c0, 0x100000000, 0x0)
>    DebugInformationStrippedFromFile82
> #4 0x3ff8017fbe8 in __fseek_unlocked(0x3ff800ec970, 0x3ffc00803b8,
>    0x12b3120, 0x14010c320, 0x1, 0x13)
>    DebugInformationStrippedFromFile173
> #5 0x3ff800ec96c in fseek(0x1201cb4c0, 0x1408fa500, 0x12b3120,
>    0x100000000, 0x1408fa480, 0x10)
>    DebugInformationStrippedFromFile173
> #6 0x1201cb4bc in SeekToEnd(0x1408fa480, 0x10, 0x1201dda20,
>    0x1409160c0, 0x1201c6da0, 0x1409160c0)
>    DebugInformationStrippedFromFile2094
> #7 0x3ff801361e0 in __write_nc(0x1201c6da0, 0x1409160c0, 0x1,
>    0x12bebf0, 0x12b3120, 0x1201c64a0)
>    DebugInformationStrippedFromFile638
> #8 0x1201dda1c in write(0x1, 0x12bebf0, 0x12b3120, 0x1201c64a0,
>    0x1201c6520, 0x1409160c0)
>    DebugInformationStrippedFromFile2123
> #9 0x1201c651c in ins() DebugInformationStrippedFromFile2091
>
Traceback #2
> Core file produced from executable cod.exe.tracebac
> Thread terminated at PC 0x3ff800d41d8 by signal IOT
> >0 0x3ff800d41d8 in __write(0x140034ba0, 0x140097800, 0x100000000,
>    0x3ff0000000000000, 0x1, 0x1)
>    DebugInformationStrippedFromFile70
> #1 0x3ff801361e0 in __write_nc(0x1201767b0, 0x10000001e,
>    0x1400c4a08, 0x15fa, 0x120179bf4, 0x0)
>    DebugInformationStrippedFromFile638
> #2 0x3ff800e47f4 in _xflsbuf(0x3ff80187640, 0x3ffc00803f0,
>    0xbc1480, 0x0, 0x1e010a00000000, 0x4)
>    DebugInformationStrippedFromFile82
> #3 0x3ff8018763c in __fflush_unlocked(0x7, 0x102, 0x10,
>    0x11fffe490, 0x0, 0x820f1902000a0001)
>    DebugInformationStrippedFromFile82
> #4 0x3ff8017fbe8 in __fseek_unlocked(0x3ff800ec970, 0x3ffc00803f0,
>    0xbc1480, 0x14033ed60, 0x7, 0x10)
>    DebugInformationStrippedFromFile173
> #5 0x3ff800ec96c in fseek(0x1201b8a00, 0x1400aa6c0, 0xbc1480,
>    0x100000000, 0x1400aa640, 0x10)
>    DebugInformationStrippedFromFile173
> #6 0x1201b89fc in operator >>(0x1400aa640, 0x10, 0x1201c7100,
>    0x1400b1980, 0x1201b42e0, 0x1400b1980)
>    DebugInformationStrippedFromFile2084
> #7 0x3ff801361e0 in __write_nc(0x1201b42e0, 0x1400b1980, 0x1,
>    0xbc960c, 0xbc1480, 0x1201b39e0)
>    DebugInformationStrippedFromFile638
> #8 0x1201c70fc in getFreeSlot(0xbc1480, 0x1201b39e0, 0x1201b3a60,
>    0x1400b1980, 0x1, 0x0)
>    DebugInformationStrippedFromFile2108
> #9 0x1201b3a5c in moveItLeft() DebugInformationStrippedFromFile2081
>
>
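In both tracebacks the blocked call chain is the same: fseek() flushes the stream's pending output (__fflush_unlocked, _xflsbuf) and the thread ends up waiting in __write. One way to measure how long such a call stalls is sketched below. This is an illustration only, not necessarily how our applications detect the condition; the name timed_seek_to_end is hypothetical.

```c
#include <stdio.h>
#include <sys/time.h>

/* Seek to the end of a stdio stream and report how long the call took.
 * If the stream has buffered output pending, fseek() flushes it first,
 * so the elapsed time includes any time spent blocked in write().
 * Returns elapsed wall-clock seconds, or -1.0 on failure. */
double timed_seek_to_end(FILE *fp)
{
    struct timeval t0, t1;

    if (gettimeofday(&t0, NULL) != 0)
        return -1.0;
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1.0;
    if (gettimeofday(&t1, NULL) != 0)
        return -1.0;

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_usec - t0.tv_usec) / 1e6;
}
```

A caller could compare the returned value against a threshold and log it, or call abort() to get a core file at the moment of the stall.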
Thanks,
Alan Moshel
alan.moshel_at_lmco.com
Alan Moshel
Software Architect - SkyLine Air Traffic Control
Lockheed Martin Air Traffic Management, Rockville, MD.
office: 870/2D12 phone: (301) 640-3109 fax: (301) 640-2391
Received on Fri Jul 16 1999 - 19:29:45 NZST