Low swap freezes cluster

From: Chris Loken <cloken_at_cita.utoronto.ca>
Date: Thu, 20 Sep 2001 00:27:14 -0400

I've got 4 ES40s and a GS320 (V5.1) configured as a TruCluster. Each
ES40 has a reasonable amount of memory (4GB) plus swap (4GB) on its
local disk. The GS has way more (64GB of each). Each machine has its own
IP address and connection to a switch, plus we have the Memory Channel
interconnects.

Occasionally, a user gets carried away and tries to allocate a huge
chunk of memory on an ES40. Problem is that the whole cluster freezes up
when a *single* ES40 is being hammered. All I know for sure is that the
affected ES40 complains about free swap being less than 10% and then I
can't ssh into *any* of the nodes until I reboot the problem machine.

Any ideas why the whole cluster seems to hang and how to avoid this?
Can't (easily) stop a user from allocating 10GB on a single machine but
why should this inhibit access to the others?

   Thanks,

       Chris
cloken_at_cita.utoronto.ca
Received on Thu Sep 20 2001 - 04:29:21 NZST
