TCP socket errors - Gbit Ethernet - V4.0F

From: Maarten Litmaath <litmaath_at_fnal.gov>
Date: Tue, 23 Jan 2001 17:58:55 -0600

Hi,
we have two AlphaServers 4000 on V4.0F which besides 100 Mbps Ethernet
also have DEGPA-SA Gbps Ethernet cards (driver rev. 1.0.12, firmware rev.
11.3.2). We use them to send data over TCP to Linux nodes with 100 Mbps
Ethernet interfaces. Usually this works fine, but occasionally (at most about 1 % of the time)
when the TCP connection has been successfully established (verified on
both ends) the first write() of 256 kB will return a short count, which
is always 136 kB. The other side, however, did not receive anything.
When the program on the AlphaServer then tries to write the remaining
120 kB, it gets an EPIPE (Broken pipe). The other side, however, did
not close the socket and just sits in a recv() waiting for data.
When the program on the AlphaServer exits, OSF1 does not send a FIN
packet to the other side, presumably because it thinks the other side
already broke the connection.

After scrutinizing the program code on both ends for many days, we think
there must be a bug in some piece of the OS, most likely in the Gbit
driver code.

When data is being transferred via the Gbit interface, quite often we
observe messages like the following in /var/adm/messages:

Jan 19 04:29:33 xxx vmunix: NFS2 server yyy not responding still trying
Jan 19 04:29:37 xxx vmunix: NFS2 server yyy ok

Note that the time difference is only a few seconds.

Do these problems ring a bell? Have they been addressed in V5.1/V4.0G?
Thanks in advance for any tips. I will summarize helpful responses.
Best regards,
                Maarten <litmaath_at_fnal.gov> - Fermilab Computing Division
Received on Wed Jan 24 2001 - 00:00:07 NZDT
