SUMMARY: Client booting problem

From: Norris, Patrick <Patrick.Norris_at_trw.com>
Date: Tue, 25 Sep 2001 14:04:02 -0700

This is on the late side, but the summary may be of value to someone else someday.

I received two responses (Thank you both!):

First, Serguei Patchkovskii [patchkov_at_ucalgary.ca], who had seen this with corrupt firmware, which prompted me to recheck my settings. He also suggested running tcpdump, which helped narrow down the issue and solve a later, related problem.

Second, Dr. Blinn, who explained that it looked like a bug in tftp and gave some additional details and background, which I will paste below.

In the end, Dr. Blinn's info led us to find a workaround that takes care of the problem for us; we have adjusted the rate for the host task that was blocking the socket such that the incoming bootp request makes it through successfully.
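For anyone wanting to try the same kind of workaround, the idea of slowing a busy host task can be sketched roughly as below. This is only an illustration in Python, not our application code: the function name run_paced, its parameters, and the 0.5 second period in the usage note are all assumptions made up for the example.

```python
import time


def run_paced(task, period_s, iterations, sleep=time.sleep, clock=time.monotonic):
    """Call task() at most once per period_s seconds (hypothetical helper).

    The sleep between calls is what gives other services on the host
    (for example the tftp daemon) a chance to be serviced.
    """
    next_start = clock()
    for _ in range(iterations):
        task()
        next_start += period_s
        delay = next_start - clock()
        if delay > 0:
            # Ahead of schedule: idle until the next slot.
            sleep(delay)
        else:
            # The task overran its slot: restart the schedule from now
            # instead of catching up with a burst of back-to-back calls.
            next_start = clock()
```

Calling run_paced(send_update, 0.5, 100), where send_update stands in for whatever the host task does per cycle, would run it 100 times at no more than two calls per second. The right period depends entirely on the application and the load that triggers the problem.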

Once again, thank you for the help.
-Patrick Norris
 patrick.norris_at_trw.com

---Dr. Blinn's response----
> > Have you seen this before, or do you maybe have an idea what it is?
>
> Yes, I believe it's a bug in the tftp daemon on the server system.
>
> > Is it possible that we're tying up sockets on the host that
> > are needed for the client to do the network boot, and thus
> > the bootp request isn't serviced?
>
> Unlikely.
>
> > Is it possible that it's something as straightforward as
> > tftp timing out on the bootp, only requiring some adjustment
> > of settings?
>
> Yes. There is a timeout loop in the tftp daemon where under some
> circumstances, it can become convinced that it is no longer able
> to get packets to the client system, and it effectively "hangs up"
> the link, and there is NO simple workaround. Apparently, in your
> system's case, the load of the application (either on the system
> as a whole or on the network interface) is sufficient to trigger
> the logic in the tftp daemon that leads to the problem, and once
> it starts, you must, as you note, restart the boot procedure on
> the client system.
>
> > Any help would be greatly appreciated.
> >
> > -Patrick Norris
> > patrick.norris_at_trw.com
>
> I can't offer you much more help. I've seen this many times. It
> appears to be a subtle race interaction between the server and the
> client. I would argue that the client (the console firmware) isn't
> very robust, and that the server has never been debugged. There
> is no patch for this, and as far as I know, the bug has existed for
> a LONG time, and it's extremely unlikely to ever get fixed, because
> no one wants to agree to support DMS long term, and it mostly is a
> problem for DMS clients.
>
> Tom
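
A rough way to check whether the tftp daemon on the host still answers while the application is running is to send it a single read request by hand and see what comes back. The sketch below is not from Dr. Blinn's reply; it is a minimal Python illustration that assumes the standard TFTP packet layout (RFC 1350) and the usual server port 69. The function names and the 5 second timeout are made up for the example.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ERROR = 1, 3, 5  # TFTP opcodes from RFC 1350


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def parse_reply(packet):
    """Classify the first reply packet from the server."""
    if len(packet) < 4:
        return ("SHORT", None, None)
    opcode, value = struct.unpack("!HH", packet[:4])
    if opcode == OP_DATA:
        # value is the block number; the rest is file data
        return ("DATA", value, len(packet) - 4)
    if opcode == OP_ERROR:
        # value is the error code; the rest is a NUL-terminated message
        msg = packet[4:].split(b"\0", 1)[0].decode("ascii", "replace")
        return ("ERROR", value, msg)
    return ("OTHER", opcode, None)


def probe(server, filename, timeout_s=5.0):
    """Send one read request and report the first reply, or a timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout_s)
        s.sendto(build_rrq(filename), (server, 69))
        try:
            packet, _ = s.recvfrom(4 + 512)
        except socket.timeout:
            return ("TIMEOUT", None, None)
        return parse_reply(packet)
```

Running probe() against the host with the client's kernel path, once with the application stopped and once with it running, would show whether the first data block arrives in both cases. It only exercises the first packet of the transfer, so it cannot reproduce the mid-transfer hang itself; tcpdump remains the better tool for that.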




> -----Original Message-----
> From: Norris (Non-TRW), Patrick
> Sent: Wednesday, August 29, 2001 1:44 PM
> To: tru64-unix-managers_at_ornl.gov
> Subject: Client booting problem
>
>
> Managers:
>
> I'm working with a pair of Alphas running 4.0D - one machine
> as the host and the other as the dms client, and we're
> experiencing a conflict of sorts.
>
> The problem shows up when I've got an Alpha running as a host
> and running our application code. (The application does some
> communications through sockets with applications that run on
> the client computers - the full environment has multiple
> clients, but the problem is reproducible with just the pair)
>
> While the application is running, if I try to boot the client
> Alpha, it never makes it up all the way. The following is a
> capture of the console while this is going on:
>
> CPU 0 booting
>
> (boot ewa0.0.0.6.0 -flags a)
>
> Trying BOOTP boot.
>
> Broadcasting BOOTP Request...
> Received BOOTP Packet File Name is: /clients/dlnetlcf/.vmunix
> local inet address: 192.45.xx.xxy
> remote inet address: 192.45.xx.xxx
> TFTP Read File Name: /clients/dlnetlcf/.vmunix
> netmask = 255.255.255.0
> Server is on same subnet as client.
> ...Block FFFFFFF2 is not in any zone
> ........Block FFFFFFF2 is not in any zone
> ........Block FFFFFFF2 is not in any zone
> .......Block FFFFFFF2 is not in any zone
> ........Block FFFFFFF2 is not in any zone
> ........Block FFFFFFF2 is not in any zone
> ........Block FFFFFFF2 is not in any zone
>
> And it continues, but won't boot until the application
> running on the host is killed.
>
> Have you seen this before, or do you maybe have an idea what it is?
>
> Is it possible that we're tying up sockets on the host that
> are needed for the client to do the network boot, and thus
> the bootp request isn't serviced?
>
> Is it possible that it's something as straightforward as tftp
> timing out on the bootp, only requiring some adjustment of settings?
>
> Any help would be greatly appreciated.
>
> -Patrick Norris
> patrick.norris_at_trw.com
>
Received on Tue Sep 25 2001 - 21:04:59 NZST
