A couple of weeks ago I sent out a query on some mysterious "simultaneous
attempts to allocate walkid" messages I was getting while running my
application. Dr. Tom Blinn put me in touch with Randy Lowell, the developer
in charge of the OSF/1 loader. Here's what he had to say:

>I did recently receive a QAR on the problem you
>discussed in your mail to Dr. Blinn. I responded
>to the QAR and asked the CSC to have you retest
>with the most recent loader patch. There are a
>lot of patches to the V2.1 loader for problems
>that were identified and fixed in V3.0 or V3.2.
>
>The walkid problem has shown up before. The
>"walkid" is a pseudo lock that the loader "allocates"
>before walking the dependency graph of a process.
>It is used to mark nodes in the dependency graph
>to prevent infinite recursion along circular paths.
>The walkid is "freed" when the graph walk is
>complete.
>
>The error message you're seeing is printed when
>the loader attempts to allocate a second walkid
>before freeing the first one.
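
To make the walkid idea concrete, here is a minimal sketch in C. It is
not the OSF/1 loader's code: struct module, walkid_alloc(),
walkid_free(), and walk() are hypothetical names invented for this
illustration, and the single "in progress" flag is only an assumption
about how a pseudo lock of this kind might be tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_DEPS 4

/* Hypothetical node in a dependency graph (not the real loader's type). */
struct module {
    const char *name;
    struct module *deps[MAX_DEPS];
    size_t ndeps;
    unsigned long seen_walkid;   /* id of the last walk that visited us */
};

static unsigned long next_walkid = 1;   /* 0 means "never visited" */
static int walk_in_progress = 0;        /* the pseudo lock */

/* "Allocate" a walkid. Returns 0 and warns if one is still outstanding. */
static unsigned long walkid_alloc(void)
{
    if (walk_in_progress) {
        fprintf(stderr, "simultaneous attempts to allocate walkid\n");
        return 0;
    }
    walk_in_progress = 1;
    return next_walkid++;
}

/* "Free" the walkid once the graph walk is complete. */
static void walkid_free(void)
{
    walk_in_progress = 0;
}

/* Depth-first walk. Nodes already marked with this walkid are skipped,
 * so circular dependencies terminate. Returns the number of nodes
 * visited for the first time during this walk. */
static size_t walk(struct module *m, unsigned long id)
{
    size_t count = 1;

    assert(m != NULL);
    if (m->seen_walkid == id)
        return 0;                /* already seen on this walk */
    m->seen_walkid = id;
    for (size_t i = 0; i < m->ndeps; i++)
        count += walk(m->deps[i], id);
    return count;
}
```

With two modules that depend on each other, walk() visits each one once
and returns 2 instead of recursing forever. Calling walkid_alloc() a
second time before walkid_free() returns 0 and prints the warning.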
>
>In a single-threaded application this shouldn't
>be possible, but all of the previous reports of
>this failure were for single-threaded applications.
>In those cases, I determined that the loader was
>being interrupted by a signal while walking the
>dependency graph. The signal invoked a signal
>handler which, in turn, called exit(), which calls
>back into the loader to run termination routines.
>Before running termination routines, the loader
>allocates a walkid, resulting in the warning message.
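
One common way to avoid this kind of re-entry is to keep exit() out of
the signal handler altogether: the handler only records that the signal
arrived, and the main flow of the program calls exit() later from
normal context. The sketch below assumes a POSIX system; it is a
general pattern rather than something Randy prescribed, and
on_signal(), install_handler(), and should_stop() are hypothetical
names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the main program. */
static volatile sig_atomic_t stop_requested = 0;

/* Handler does nothing but set a flag: no exit(), no loader calls. */
static void on_signal(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Install on_signal() for the given signal. Returns 0 on success. */
static int install_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* The main loop checks this and, if true, calls exit() itself, at a
 * point where the program is not inside some other library routine. */
static int should_stop(void)
{
    return stop_requested != 0;
}
```

A program would call install_handler(SIGINT) at startup and test
should_stop() at a convenient point in its main loop, so termination
routines never run on top of an interrupted graph walk.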
>
>A multi-threaded application has more of an
>opportunity to trigger this warning. There are
>recursive locks protecting loader calls from
>simultaneous access by multiple threads. In V2.1
>these locks are in loader routines accessed
>through libc_r.so. If, by some accident of linking
>or symbol-resolution, you have a multi-threaded
>application which resolves its loader calls from
>libc.so instead of libc_r.so the locks would be
>compromised and "simultaneous walkid" errors
>could easily result.
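
For readers unfamiliar with recursive locks, here is a rough sketch
using POSIX threads. It assumes a system that supports
PTHREAD_MUTEX_RECURSIVE and is not how libc_r.so is actually written;
loader_lock, load_module(), and resolve_symbol() are hypothetical
names. The point is that one loader entry point can call another while
already holding the lock, while a second thread would still be kept
out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t loader_lock;
static int lock_depth = 0;   /* nesting level, only touched under the lock */

/* Create the lock as a recursive mutex. Returns 0 on success. */
static int loader_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&loader_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Inner routine: takes the lock itself, so it is safe to call directly.
 * Returns the nesting depth observed while the lock was held. */
static int resolve_symbol(void)
{
    int depth;

    pthread_mutex_lock(&loader_lock);
    depth = ++lock_depth;
    --lock_depth;
    pthread_mutex_unlock(&loader_lock);
    return depth;
}

/* Outer routine: holds the lock and calls resolve_symbol(), which locks
 * again. With a non-recursive mutex this second lock would deadlock. */
static int load_module(void)
{
    int depth;

    pthread_mutex_lock(&loader_lock);
    ++lock_depth;
    depth = resolve_symbol();
    --lock_depth;
    pthread_mutex_unlock(&loader_lock);
    return depth;
}
```

After loader_lock_init(), load_module() returns 2 (the lock was taken
twice by the same thread) and a direct resolve_symbol() returns 1. If
the same routines were resolved from a library built without the lock,
nothing would stop two threads from walking the graph at once.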
>
>In addition to improperly ordered dependencies,
>lazy-text symbol resolution could cause multiple-thread
>access to loader routines. We've guarded against this
>by loading multiple-thread routines with immediate
>binding forced on. This is done, once again, through
>libc_r.so. In this case a libc_r.so initialization routine
>executes a loader call which forces immediate binding.
>If this call, for some as yet undetermined reason,
>failed to resolve all lazy text symbols, it could
>explain the loader warnings.

After much additional searching, we discovered that my problem was caused
by a non-reentrant version of dlopen in libc_r.so. I got a patched library,
and the errors went away. Randy is looking into making the patched library
available through customer support.

Thanks again to Randy and to Dr. Blinn.

- Brad
Received on Thu Feb 09 1995 - 13:46:35 NZDT