Hello OSF managers,
I have a fairly complex socket problem with OSF/1. This is my last port of
call before I give up: has anyone seen or heard of anything similar?
As a bit of background, I have a large system, typically running 100 or so
processes, that communicate internally via message queues and externally via
Internet sockets. The problem processes are the ones responsible for talking
to the outside world: they have to poll both the message queues and the
sockets, so they tend to waste a lot of CPU (and there can be up to 40 of
them). The solution was to replace the message queues with UNIX datagram
sockets, so that each process could block in the select system call on all
of its descriptors instead of polling.
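A minimal sketch of the select-based loop described above, assuming a POSIX-style socket API. It is not the poster's actual code: `wait_for_input` and `demo_select_wakeup` are hypothetical names, and to keep the example runnable without a network, a second UNIX datagram socketpair stands in for the external Internet socket.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Hypothetical helper: block in select() until the internal UNIX datagram
 * socket or the external socket is readable, or the timeout expires.
 * Returns a bitmask (1 = internal ready, 2 = external ready), 0 on
 * timeout, or -1 on error. */
static int wait_for_input(int internal_fd, int external_fd, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int maxfd = internal_fd > external_fd ? internal_fd : external_fd;
    int n, mask = 0;

    for (;;) {
        /* The fd_set and timeout must be rebuilt before every call,
         * because select() may modify both. */
        FD_ZERO(&readfds);
        FD_SET(internal_fd, &readfds);
        FD_SET(external_fd, &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        n = select(maxfd + 1, &readfds, NULL, NULL, &tv);
        if (n >= 0 || errno != EINTR)
            break;              /* retry only if interrupted by a signal */
    }
    if (n < 0)
        return -1;
    if (FD_ISSET(internal_fd, &readfds))
        mask |= 1;
    if (FD_ISSET(external_fd, &readfds))
        mask |= 2;
    return mask;
}

/* Demo: one socketpair replaces the message queue; a second socketpair
 * stands in for the Internet socket (assumption, to avoid networking). */
static int demo_select_wakeup(void)
{
    int internal[2], external[2];
    int mask;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, internal) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, external) < 0) {
        close(internal[0]);
        close(internal[1]);
        return -1;
    }
    /* Another process would normally send this; here we send it ourselves. */
    if (send(internal[1], "msg", 3, 0) != 3)
        mask = -1;
    else
        mask = wait_for_input(internal[0], external[0], 1);

    close(internal[0]);
    close(internal[1]);
    close(external[0]);
    close(external[1]);
    return mask;
}
```

Only the internal channel has data queued, so the demo should report the internal descriptor alone as ready; the process sleeps in the kernel instead of spinning between the two sources.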
On OSF/1 version 3.0 this worked fine. Unfortunately, on OSF/1 version 3.2
and above I have no end of problems, of three kinds:
1) The select call becomes very unreliable and starts returning even though
none of the file descriptors it is monitoring has changed state. I also
implemented a version of the system using the poll system call, and it
exhibited exactly the same problem.
2) Extraneous data appears on the datagram socket. Every time I start up the
system I see five bytes, FE FF FF FF FF, on the datagram socket. I know
nothing has written to it, so I do not know how these five bytes got there.
3) When I shut down the child processes, the parent locks up and goes into an
uninterruptible state. You cannot attach dbx to it or kill -9 it; you have to
shut down the entire system.
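This does not explain problem 1, but one common defensive pattern (an assumption, not something from the original post) is to treat select readiness as a hint: put each socket into non-blocking mode and treat EAGAIN/EWOULDBLOCK from recv as a spurious wakeup, so a false report costs one system call instead of blocking the process. `set_nonblocking`, `read_if_ready` and `demo_spurious_guard` are hypothetical helpers.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Put fd into non-blocking mode so that a read after a spurious
 * select() wakeup cannot block the whole process. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read one datagram if one is really queued.
 * Returns the byte count (>= 0), -2 if nothing was queued (spurious
 * wakeup), or -1 on a real error. */
static ssize_t read_if_ready(int fd, char *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;  /* select() said readable, but nothing was there */
    return n;
}

/* Demo: act on a readiness report with or without data actually queued. */
static int demo_spurious_guard(int send_first)
{
    int sv[2];
    char buf[64];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    if (set_nonblocking(sv[0]) < 0 ||
        (send_first && send(sv[1], "msg", 3, 0) != 3)) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    n = read_if_ready(sv[0], buf, sizeof buf);
    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

With nothing sent, the read reports a spurious wakeup instead of hanging; with a 3-byte datagram queued, it returns 3.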
All of the above problems are reproducible every time. However, small test
systems I have written using all of the above socket functionality exhibit
none of them.
Can anyone help?
Richard
Received on Wed Mar 27 1996 - 16:45:41 NZST