<BACKGROUND>
I work for a team developing and administering a distributed application
(DEC CXX 6 + Tuxedo 6.4 + Oracle 7.3 + Digital Unix 4.0b/c/d). The
application is now installed on about twenty Alphas, from 1000 to 8200,
and works :-). Or at least worked :-(.
</BACKGROUND>
<PROBLEM>
Recently we found that on two machines (both Digital Unix 4.0b) the
application mysteriously crashes. The crash is fully repeatable - we know
how to reproduce it. Analyzing the core file we found something like:
(gdb) where
#0 0x3ff800e7a30 in symlink ()
#1 0x3ff80197920 in tis_lock_global ()
(gdb) info f
Stack level 0, frame at 0x11fffe8b0:
pc = 0x3ff800e7a30 in symlink; saved pc 0x3ff80197920
called by frame at 0x12001e8b0
Arglist at 0x11fffe880, args:
Locals at 0x11fffe8b0, Previous frame's sp is 0x11fffe8b0
(gdb) x/2s 0x11fffe880
0x11fffe880: "\210\004\bĀ˙\003"
0x11fffe887: ""
(The gdb output, apart from the addresses, is the same for every core we
get.)
It seems to me that something goes wrong during execution of the symlink
function called from tis_lock_global. Neither function is called by our
code (they must be called by Tuxedo or Oracle).
We are fairly sure that the problem is not caused by our application: it
ran correctly and unchanged on these machines for some time, and the
problems occurred only after the last reboot. There were some
administrative changes (we are not able to check them fully, but patches
OSFPAT00013601410 and OSFPAT00050800410 were installed recently).
We also have similar machines where the application works correctly,
even on the "killing" data (the input that triggers the crash).
I would appreciate any ideas:
- what else can I do to diagnose the problem (I installed a debug version
of my application and caught a core)
- why would the symlink function be called with such strange parameters
(and what for)
- maybe there is some well-known problem with the OS version I use
Thanks in advance.
Marcin Kasperski
</PROBLEM>
Received on Wed Sep 16 1998 - 17:07:31 NZST