This is sort of a long post. I wanted to put in as much information as I
could. Please bear with me.
I have a problem with my DU 4.0D patch kit #2 machine (AlphaServer 1000A).
Transfers from the machine to another (on campus or off) will pause for
about one minute in the middle of the transfer. Transfers to the DU 4.0D
machine do not have this problem. The problem is not restricted to ftp
transfers; it has also been seen when downloading a large image over the
web.
The 4.0D machine has 192M of memory. It does not seem to have a problem with
swap, I/O or CPU that I can detect, although the free memory (from vmstat)
is sometimes low (<500), and not necessarily during a transfer. The system is
used mostly for web serving. Users do have interactive accounts. The
network interface is FDDI. The machine does have a 10/100 Ethernet, but it
is not used (ifconfig tu0 down). I do not see any errors in
/var/adm/messages, syslog or uerf for FDDI.
For example:
Machine A - DU 4.0D
Machine B - Sun Solaris 2.6
Log in to B and use ftp to get a file from A:
  41082880 bytes received in 2.6e+02 seconds (153.57 Kbytes/s)
Put the same file to A:
  41082880 bytes sent in 94 seconds (425.30 Kbytes/s)
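The two ftp summaries can be compared with a little arithmetic. The following is a minimal Python sketch, not something taken from either machine. It assumes ftp counts 1 Kbyte as 1024 bytes, and it assumes a single pause of about 60 seconds as described above; the names `elapsed_seconds` and `ASSUMED_PAUSE_S` are made up for illustration.

```python
# Figures copied from the two ftp summary lines above.
FILE_BYTES = 41082880
GET_RATE_KB = 153.57   # A -> B, the slow direction
PUT_RATE_KB = 425.30   # B -> A

# Assumption: one pause of about 60 seconds, as described in the text.
ASSUMED_PAUSE_S = 60.0


def elapsed_seconds(nbytes, kbytes_per_sec):
    """Back out elapsed time from ftp's reported rate (assumes 1 Kbyte = 1024 bytes)."""
    return nbytes / 1024.0 / kbytes_per_sec


get_s = elapsed_seconds(FILE_BYTES, GET_RATE_KB)
put_s = elapsed_seconds(FILE_BYTES, PUT_RATE_KB)

# Rate the get would have reached if the assumed pause had not happened.
rate_without_pause = FILE_BYTES / 1024.0 / (get_s - ASSUMED_PAUSE_S)

print(f"get took about {get_s:.1f} s, put about {put_s:.1f} s")
print(f"get is {get_s / put_s:.2f}x slower than put")
print(f"get rate with one {ASSUMED_PAUSE_S:.0f} s pause removed: "
      f"{rate_without_pause:.1f} Kbytes/s")
```

Under these assumptions, removing one 60-second pause still leaves the get well below the put rate, so it may be worth checking a full trace for more than one stall.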
When I run tcpdump on machine A during the ftp get, I can see the one-minute
hang. I am not certain whether A is waiting for B or the other way around,
and I am unsure how to interpret the tcpdump output. I was expecting to see
strictly alternating packets: A>B, then B>A, and so on.
09:37:36.018067 A.20 > B.39194: . 1620601:1622061(1460) ack 1 win 32850 [tos 0x8]
09:37:36.018067 A.20 > B.39194: . 1622061:1623521(1460) ack 1 win 32850 [tos 0x8]
09:37:36.018067 A.20 > B.39194: . 1623521:1624981(1460) ack 1 win 32850 [tos 0x8]
09:37:36.044462 B.39194 > A.20: . ack 1603081 win 24820 (DF)
09:37:36.044462 A.20 > B.39194: . 1624981:1626441(1460) ack 1 win 32850 [tos 0x8]
09:37:36.044462 A.20 > B.39194: . 1626441:1627901(1460) ack 1 win 32850 [tos 0x8]
09:37:36.055215 B.39194 > A.20: . ack 1604541 win 24820 (DF)
09:37:36.055215 A.20 > B.39194: . 1627901:1629361(1460) ack 1 win 32850 [tos 0x8]
09:38:40.642783 A.20 > B.39194: . 1604541:1606001(1460) ack 1 win 32850 [tos 0x8]   *** 1:04 lapse
09:38:40.642783 A.20 > B.39194: . 1606001:1607461(1460) ack 1 win 32850 [tos 0x8]
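A note on reading the trace: TCP lets the sender put several segments on the wire before an acknowledgement comes back, so runs of A>B lines with only an occasional B>A ack are normal. What stands out is that the first segment after the lapse starts at 1604541, which is the last value B acknowledged and is lower than the 1629361 that A had already sent. That pattern is consistent with A retransmitting after a timeout, though this excerpt alone does not show why. The Python sketch below looks for that pattern in a saved trace. It is only an illustration: `find_stalls`, the regular expression, and the 5-second threshold are assumptions, and it expects each tcpdump record on a single line in the format shown above.

```python
import re
from datetime import datetime

# Sample records copied from the trace above, one record per line.
TRACE = """\
09:37:36.044462 A.20 > B.39194: . 1626441:1627901(1460) ack 1 win 32850 [tos 0x8]
09:37:36.055215 B.39194 > A.20: . ack 1604541 win 24820 (DF)
09:37:36.055215 A.20 > B.39194: . 1627901:1629361(1460) ack 1 win 32850 [tos 0x8]
09:38:40.642783 A.20 > B.39194: . 1604541:1606001(1460) ack 1 win 32850 [tos 0x8]
09:38:40.642783 A.20 > B.39194: . 1606001:1607461(1460) ack 1 win 32850 [tos 0x8]
"""

# timestamp, source, destination, flags, optional "start:end(len)", optional "ack N"
LINE_RE = re.compile(
    r"^(?P<ts>\d\d:\d\d:\d\d\.\d+) (?P<src>\S+) > (?P<dst>\S+): \S+"
    r"(?: (?P<start>\d+):(?P<end>\d+)\(\d+\))?"
    r"(?: ack (?P<ack>\d+))?"
)


def find_stalls(trace, sender, gap_threshold=5.0):
    """Report data segments from `sender` that follow a long silent gap."""
    events = []
    prev_time = None
    highest_sent = 0   # highest sequence number the sender has transmitted so far
    last_ack = None    # most recent cumulative ack seen from the other side
    for line in trace.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        # Time of day only, so a trace that crosses midnight is not handled.
        now = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
        gap = (now - prev_time).total_seconds() if prev_time is not None else 0.0
        prev_time = now
        if m.group("src") == sender and m.group("start"):
            start, end = int(m.group("start")), int(m.group("end"))
            if gap >= gap_threshold:
                events.append({
                    "gap_seconds": gap,
                    "resumed_at_seq": start,
                    "highest_sent_before": highest_sent,
                    "last_ack_before": last_ack,
                    # Restarting below what was already sent means a resend.
                    "retransmission": start < highest_sent,
                })
            highest_sent = max(highest_sent, end)
        elif m.group("src") != sender and m.group("ack"):
            last_ack = int(m.group("ack"))
    return events


for event in find_stalls(TRACE, sender="A.20"):
    print(event)
```

Running the same check over a complete capture would show whether every stall resumes at the last acknowledged sequence number, which would point toward lost segments or acks rather than an idle application.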
I have been trying to tune the system (using suggestions from the web
server tuning guide, sys_check, and this list) to alleviate this problem to
no avail. Current settings in /etc/sysconfigtab are:
proc:
    max-proc-per-user = 512
    max-threads-per-user = 4096
    maxusers = 2048
    max-per-proc-data-size = 10737418240
    max-per-proc-address-space = 10737418240
vm:
    ubc-maxpercent = 80
    vm-page-free-min = 50
    vm-page-free-target = 400
    vm-mapentries = 20000
    vm-ubcseqpercent = 20
    vm-ubcseqstartpercent = 50
    vm-vpagemax = 131072
    vm-maxvas = 10737418240
inet:
    tcbhashsize = 16384
    pmtu_enabled = 0
    tcp_rexmit_interval_min = 128
    tcp_sendspace = 65536
    tcp_recvspace = 65536
socket:
    somaxconn = 65535
    sominconn = 65535
vfs:
    bufcache = 1
    name-cache-hash-size = 512
advfs:
    AdvfsCacheMaxPercent = 8
    AdvfsMaxDevQLen = 16
Below is part of the netstat -s output. Is this a lot of retransmitted
packets? For FDDI, netstat -i shows 7 input errors and no output errors or
collisions.
tcp:
    1957856 packets sent
        1581188 data packets (1013666096 bytes)
        12173 data packets (9676578 bytes) retransmitted
        173193 ack-only packets (119733 delayed)
        0 URG only packets
        523 window probe packets
        166348 window update packets
        24441 control packets
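To put the retransmission counters in proportion, here is a minimal Python sketch that turns them into percentages. The numbers are copied from the netstat -s output above; the helper name `percent` is made up for illustration. Keep in mind that these counters are cumulative since boot and cover every TCP connection on the machine, not only the slow transfers.

```python
# Counters copied from the netstat -s output above.
DATA_PACKETS = 1581188
DATA_BYTES = 1013666096
REXMIT_PACKETS = 12173
REXMIT_BYTES = 9676578


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


packet_pct = percent(REXMIT_PACKETS, DATA_PACKETS)
byte_pct = percent(REXMIT_BYTES, DATA_BYTES)

print(f"retransmitted data packets: {packet_pct:.2f}% of data packets")
print(f"retransmitted data bytes:   {byte_pct:.2f}% of data bytes")
```

Both figures come out a little under one percent. Whether that is high depends on the path, so a more telling test may be to note the counters before and after a single slow transfer and compare the difference.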
At this point I am not sure whether the problem is with the system or the
network. Most people I have talked to think that it is not a problem with
the network. The last thing I tried was setting tcp_sendspace and
tcp_recvspace to 65536, as suggested by Compaq support. All suggestions
are very much welcome and needed.
Ellen Davis
Ellen.Davis_at_uc.edu
Received on Wed Feb 10 1999 - 17:13:00 NZDT