I've got 4 ES40s and a GS320 (V5.1) configured as a TruCluster. Each
ES40 has a reasonable amount of memory (4GB) plus swap (4GB) on its
local disk. The GS has way more (64GB of each). Each machine has its own
IP address and connection to a switch, plus we have the Memory Channel
interconnects.
Occasionally, a user gets carried away and tries to allocate a huge
chunk of memory on an ES40. The problem is that the whole cluster freezes up
when a *single* ES40 is being hammered. All I know for sure is that the
affected ES40 complains about free swap being less than 10% and then I
can't ssh into *any* of the nodes until I reboot the problem machine.
Any ideas why the whole cluster seems to hang and how to avoid this?
I can't (easily) stop a user from allocating 10GB on a single machine, but
why should this block access to the others?
Thanks,
Chris
cloken_at_cita.utoronto.ca
Received on Thu Sep 20 2001 - 04:29:21 NZST