Super Slow Network

From: Jonathan Williams <jonathw_at_shubertorg.com>
Date: Wed, 08 Aug 2001 09:36:14 -0400

Ok...a little background first:
Each of our ES40 servers has four network cards: two gigabit cards and two 10/100 cards. They are configured with NetRAIN, so only two cards are active at any given time (one gigabit and one 10/100). After talking with Compaq, we decided to set the NetRAIN configuration in the /etc/inet.local file rather than in rc.config (something about 5.0a not supporting the gigabit cards with NetRAIN in rc.config, but it works just fine with inet.local). Our inet.local file contains these lines:
/sbin/ifconfig alt0 down
/sbin/ifconfig alt1 down
/sbin/ifconfig nr0 10.0.25.36 netmask 255.255.255.0 add alt0,alt1
/sbin/ifconfig nr1 10.0.21.36 netmask 255.255.255.0 add ee0,ee1

This is the configuration we have been using on all our servers for quite a while now. We even have this setup on our new 5.1 box, just so it's the same as the others, and it has been working great.
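Since the same inet.local lines are meant to be identical in layout across servers, a quick summary of each NetRAIN set can make comparisons easier. The following is only a minimal sketch: the function name netrain_sets is made up for illustration, and it assumes the file uses the same "ifconfig nrN <address> ... add <members>" layout shown above.

```sh
#!/bin/sh
# Sketch: summarize NetRAIN sets from inet.local-style text on stdin.
# netrain_sets is a hypothetical helper name, not a system command.
netrain_sets() {
    awk '
        # Only look at ifconfig lines that configure an nrN interface.
        $1 ~ /ifconfig$/ && $2 ~ /^nr[0-9]+$/ {
            members = "(none)"
            # The word after "add" is the comma-separated member list.
            for (i = 3; i < NF; i++)
                if ($i == "add") members = $(i + 1)
            print $2, $3, members
        }
    '
}

# Usage with the lines from this post:
netrain_sets <<'EOF'
/sbin/ifconfig alt0 down
/sbin/ifconfig alt1 down
/sbin/ifconfig nr0 10.0.25.36 netmask 255.255.255.0 add alt0,alt1
/sbin/ifconfig nr1 10.0.21.36 netmask 255.255.255.0 add ee0,ee1
EOF
```

On a live system the same function could be fed the real file, for example with netrain_sets < /etc/inet.local.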
Now for the problem:
Yesterday our network guys were making some changes to our production network and wanted to do a test: putting the 5.1 box on a separate VLAN, mostly to keep the production and development environments separated by routers and switches and whatever else they use. On my end, all I had to do was change the IP address of the machine and the default gateway. I made my changes, they made theirs, and everything seemed fine, until we tried connecting to any of the production databases. This failed miserably.
We use Informix, and we were unable to use any of the databases. I know this isn't an Informix group, but perhaps there is a guru or two out there anyway (and Informix isn't our only problem, so don't stop reading yet). In Informix, we can bring a database online (oninit -v), but we can't run any applications against the database (dbaccess, onmonitor, etc.) when connecting through a "soc" connection. If we change the INFORMIXSERVER variable to point to the "shm" connection instead, it works great. Any suggestions why we can't use a soc?
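For context, a shared-memory connection stays on the local machine, while a socket connection depends on the host name and service name in the Informix sqlhosts file (normally $INFORMIXDIR/etc/sqlhosts). The sketch below lists the socket entries so those names can be checked against the machine's current address. The function name soc_entries and the sample server and service names are invented for illustration; only the host name "imperial" comes from the netstat output later in this post.

```sh
#!/bin/sh
# Sketch: print the socket-type entries from sqlhosts-format text on stdin.
# Expected columns: dbservername nettype hostname servicename [options]
# soc_entries is a hypothetical helper name.
soc_entries() {
    awk '
        # Skip comments, blank lines, and lines with too few columns.
        $1 ~ /^#/ || NF < 4 { next }
        # Socket nettypes contain "soc" (for example onsoctcp).
        $2 ~ /soc/ { print $1, $3, $4 }
    '
}

# Usage with made-up entries (not the real file from this system):
soc_entries <<'EOF'
# dbservername  nettype   hostname  servicename
prod_shm        onipcshm  imperial  prod_shm
prod_soc        onsoctcp  imperial  prod_svc
EOF
```

Each line printed is a server name followed by the host and service it relies on; if either no longer resolves the way it did before the address change, that would be one place to look.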
Anyway, we thought the network changes were somehow to blame, so we went back to our original configuration, and the network guys put the network back EXACTLY how it was. That didn't fix the problems with Informix, and then we started to notice a different problem: network slowness. Any file transferred to or from this system moves at an extremely slow rate, about 10 minutes PER megabyte (yes, megabyte). I can't figure out what the problem is, since all the settings are back how they were.
I did notice two odd things that probably mean a lot to somebody out there. The first is the output of netstat -nr: it shows all the routes, and they are correct, but where it usually just says "Route Tree for Protocol Family 2", it now also says "Route Tree for Protocol Family 26" right above it. There are no routes listed under this heading, and I'm not sure why it is there in the first place (none of the other systems have this). The second is the output of "netstat -I nr1", which shows the network statistics for the device:
Name  Mtu   Network  Address            Ipkts   Ierrs  Opkts   Oerrs  Coll
nr1   1500  <Link>   2:50:8b:b9:e9:76   917698  0      802077  1318   5374
nr1   1500  DLI      none               917698  0      802077  1318   5374
nr1   1500  10.0.21  imperial           917698  0      802078  1318   5374

I'm pretty sure that the outbound packet errors and the collisions are a bad thing. But for the life of me I can't tell why this is happening.
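To put those counters in proportion, they can be expressed as a percentage of output packets. This is a rough sketch only: iface_rates is a made-up function name, and it assumes the column order shown above, with Opkts, Oerrs, and Coll as the last three fields of the <Link> row. Collisions are generally only counted when a port is running half-duplex, so the speed and duplex settings on both ends of the link may be worth comparing.

```sh
#!/bin/sh
# Sketch: express Oerrs and Coll as a percentage of Opkts.
# Reads "netstat -I <ifname>" style output on stdin.
# iface_rates is a hypothetical helper name.
iface_rates() {
    awk '
        # Use only the first <Link> row; the other rows repeat the counters.
        $3 == "<Link>" && !seen++ {
            opkts = $(NF - 2) + 0
            oerrs = $(NF - 1) + 0
            coll  = $NF + 0
            if (opkts > 0)
                printf "%s: oerrs %.2f%% coll %.2f%% of %d output packets\n",
                       $1, 100 * oerrs / opkts, 100 * coll / opkts, opkts
        }
    '
}

# Usage with the figures from this post:
iface_rates <<'EOF'
Name  Mtu   Network  Address            Ipkts   Ierrs  Opkts   Oerrs  Coll
nr1   1500  <Link>   2:50:8b:b9:e9:76   917698  0      802077  1318   5374
EOF
# prints: nr1: oerrs 0.16% coll 0.67% of 802077 output packets
```

Because the counters are cumulative, running the same check twice a few minutes apart shows whether the errors are still accumulating or date from the period when the network was being changed.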

Sorry for such a long post, but I wanted to be as detailed as possible. Thanks for any info at all...

Jonathan Williams
UNIX Systems Administrator
The Shubert Organization, Inc.
Received on Wed Aug 08 2001 - 13:37:17 NZST
