Gb interface not "RUNNING"

From: Stu Fuess <fuess_at_fnal.gov>
Date: Tue, 20 Feb 2001 15:04:56 -0600

We're doing some heavy load testing on a Gb Ethernet (DEGPA-SA)
interface on a GS80 (Tru64 5.1 PK2, TruCluster 5.1, alt driver
V2.0.1). As part of the testing we're using tcpdump, so we have
the interface in promiscuous, copyall mode (pfconfig +p +c).
The tests are mostly one-direction tcp file transfers.

For two nights in a row the test has halted because the interface
appears to be "hung". The symptoms are:

  Error messages like this one in daemon.log (I've added the line
  wrap and replaced the real subnet with a.b.c):

    Feb 20 02:59:03 mynode gated[524835]: task_send_packet:
    task RIP.0.0.0.0+520 socket 12 length 24 flags MSG_DONTROUTE(4)
    to a.b.c.255+520: Network is down

  a ping to a node on the static route served by this interface
  gives:

    ping: sendto: Network is down
    ping: wrote (remote node) 64 chars, ret=-1

  and ifconfig on the interface gives:

    alt0: flags=c27<UP,BROADCAST,DEBUG,NOTRAILERS,MULTICAST,SIMPLEX>
    inet a.b.c.d netmask ffffff00 broadcast a.b.c.255 ipmtu 1500

    Note that the normal "RUNNING" flag is now missing!
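    For anyone scripting a check for this state, here is a minimal
    sketch that decodes the hex word after "flags=". It assumes the
    traditional BSD <net/if.h> bit values for UP through ALLMULTI
    (RUNNING = 0x40). The MULTICAST = 0x400 and SIMPLEX = 0x800
    positions are inferred from how this ifconfig rendered "c27",
    not taken from the Tru64 headers. The helper names are
    hypothetical.

```python
# Bit values: UP..ALLMULTI follow traditional BSD <net/if.h>.
# MULTICAST and SIMPLEX are ASSUMED from the ifconfig output above
# (c27 -> ...,MULTICAST,SIMPLEX), not checked against Tru64 headers.
IFF_BITS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0004: "DEBUG",
    0x0008: "LOOPBACK",
    0x0010: "POINTOPOINT",
    0x0020: "NOTRAILERS",
    0x0040: "RUNNING",
    0x0080: "NOARP",
    0x0100: "PROMISC",
    0x0200: "ALLMULTI",
    0x0400: "MULTICAST",
    0x0800: "SIMPLEX",
}
IFF_RUNNING = 0x0040


def decode_flags(flags_hex):
    """Return the names of the bits set in an ifconfig 'flags=' hex word."""
    value = int(flags_hex, 16)
    return [name for bit, name in sorted(IFF_BITS.items()) if value & bit]


def is_running(flags_hex):
    """True if the RUNNING bit (0x40) is set in the flags word."""
    return bool(int(flags_hex, 16) & IFF_RUNNING)


print(decode_flags("c27"))  # ['UP', 'BROADCAST', 'DEBUG', 'NOTRAILERS', 'MULTICAST', 'SIMPLEX']
print(is_running("c27"))    # False
print(is_running("c67"))    # True: c27 with the RUNNING bit added
```

    Under these assumptions, the same interface with RUNNING set would
    report flags=c67. A test harness could poll for that value and stop
    as soon as the bit drops.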

The interface remains in the static routing table unless it is taken
"DOWN", in which case the cluster software seems to redirect the
route. A quick ifconfig DOWN then UP doesn't change the state.
The only known cure for the non-RUNNING state is to reboot
the node. We haven't found any of the "usual" resource problems (full
file systems, exhausted swap, etc.). The machine seems perfectly fine
except for this one interface.

My suspicion is that some kernel or driver resource has been used
up, particularly in the promiscuous, copyall mode.

Has anyone seen this before? Any ideas on the underlying cause?
Any ideas on how to recover short of rebooting?

Thanks for any help!

Stu Fuess
Fermi National Accelerator Laboratory
fuess_at_fnal.gov
Received on Tue Feb 20 2001 - 21:06:17 NZDT
