DMU clients hang without explanation

From: Roger Ruber <RUBER_at_tsl.uu.se>
Date: Fri, 03 Aug 2001 09:09:43 +0100 (MET)

                                                          Uppsala, 3-AUG-2001

    Hi,

    We have a Digital UNIX cluster of some 30 nodes. One node is the
    DMU master; the other nodes boot from it. Some of the nodes are used
    for CPU-intensive calculations and I/O work, and these hang about
    once a month without leaving any trace in the error logs
    (uerf, /var/adm/messages, /var/adm/syslog.dated). The other nodes
    show no problems whatsoever.

    I suspect that the hangs might be due to network access between the
    client nodes and the DMU master node. This may be due to the CPU- and
    I/O-intensive jobs running on these nodes. Does anybody know whether
    this is a correct guess and, if so, what would be the best way to
    improve the situation?

    The machines in question are:
    DMU master: DPWS 600au, running Digital UNIX 4.0E
    DMU client 1: DPWS 600au, 4.0E
               2: DS10 4.0F
               3: DS10 4.0F
               4: XP1000 4.0F
    The DMU master and clients 1, 2 and 3 are connected to the same Cisco
    XL3548 switch with full-duplex 100 Mb/s connections. The other DMU
    clients are connected to similar Cisco switches via a Gigabit backbone.
    The remaining nodes are DEC 3000/300 and AlphaStation 200 machines
    running Digital UNIX 4.0E and 4.0F. We have no problems with these
    machines, only with clients 1-4.

    Thank you for your kind help,

                                   Roger Ruber.

  **************************************************************************
  * Roger Ruber, ruber_at_tsl.uu.se *
  * The Svedberg Laboratory, P.O. Box 533, S-75121 Uppsala, Sweden *
  * +46 - 18 - 471 3109 (telephone) (facsimile) +46 - 18 - 471 3833 *
  **************************************************************************
Received on Fri Aug 03 2001 - 07:10:39 NZST
