----

* "Receive failures should not lead to a panic. Nevertheless, talk with the switch administrators. They can check the port your server is attached to. Maybe the port runs at half duplex!?"

* "Start by checking the cable, and reset the netstat counters with the netstat -z command. Check the errors on the switch, and check for errors in the binary.errlog to make sure it's not the card ;) Check the messages file."

* "Looks like a bug in the "tu" (network interface) driver. The return address is pointing to an instruction in the interrupt service routine, and that has apparently called a routine that helps service an "input" (receive) packet, but there is apparently a bug, which probably has resulted in register s1 containing an invalid value. If you look at the value that's the faulting virtual address, it looks just like a valid kernel address that's shifted 4 bits to the right -- compare the bit pattern to, say, the PC or the RA or for that matter the stack pointer. So, some code has apparently managed to put this trashed address into register s1 (if you had a register dump from the crash I am pretty sure that's what you'd find in s1), and the trick is to figure out what code path got you to this point with that invalid value in that register, because the kernel isn't allowed to read from that bad address. That's what's causing the panic. Since you say that the interface in question gets a lot of errors, I'd not be surprised if the bug is in an error handling code path. If you can fix the cause of the errors, then perhaps the bug will not be seen any more on your system with your current software. This assumes there is actually a problem with the interface. Some number of errors on Ethernet are common, and the software is supposed to deal with them. On the other hand, any "tulip" hardware is getting long in the tooth, and there may be a problem with cables or with some other part of the network to which this particular interface is connected. Reseating cables and perhaps replacing the interface hardware if it's not built in might resolve the problem. Or it might not, if it's some other piece of gear sending bad data, for example. There may be an updated version of the "tu.mod" file that's compatible with your patch level, or there may be a later patch kit that has a module that fixes the bug. If there is not, you will need to get what little is left of the HP support team to figure out the bug and give you a patched module. I doubt you will make much progress debugging this without sources and a way to reproduce the problem."

* "I'd start with the obvious things like cables, switch port negotiation etc. However, since the system is actually crashing I'm guessing you've got some marginal hardware. I'd be looking to swap out the card if possible. Any chance of simply disabling it as a first step? Anything in the binary errorlog analysers? If the hardware is on its way out then DECevent, Compaq Analyse or whatever it's called these days might have a record of what's going wrong."

Judith Reed
Service delivery manager
Navisite, Inc.
125 Elwood Davis Rd.
Syracuse, NY 13212
315-453-2912 x5835
www.navisite.com

Received on Wed Aug 16 2006 - 12:45:37 NZST
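
For anyone wanting to repeat the "shifted 4 bits to the right" comparison from the third reply against their own crash data, here is a minimal sketch in Python. The addresses are hypothetical placeholders, since the actual faulting virtual address and register values are not reproduced in this summary, and the function name `looks_right_shifted` and its `slack_bits` parameter are my own illustration, not part of any Tru64 tool. It only compares high-order bit patterns; it does not identify the code path that trashed the register.

```python
# Sketch only: all addresses below are made-up placeholders, not values
# taken from the crash discussed in this thread.
MASK64 = (1 << 64) - 1


def looks_right_shifted(fault_va: int, reference: int,
                        shift: int = 4, slack_bits: int = 24) -> bool:
    """Return True if fault_va resembles `reference` shifted right by `shift` bits.

    `reference` is any known-good kernel address (PC, RA, or stack pointer).
    The low `slack_bits` bits are ignored, because the trashed pointer need
    only come from the same address region, not from the same address.
    """
    shifted = (reference & MASK64) >> shift
    return (fault_va >> slack_bits) == (shifted >> slack_bits)


# Hypothetical return address and faulting virtual address.
ra = 0xFFFFFC00005A1B2C
fault_va = 0x0FFFFFC00007E3D4

print(looks_right_shifted(fault_va, ra))  # shifted pattern matches
print(looks_right_shifted(ra, ra))        # an intact kernel address does not
```

If the first check comes back true for the real values from the crash, that supports the reading that a valid kernel pointer was shifted before being loaded into the register, rather than the register holding random garbage.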