Dear All,
This weekend we had an air conditioning failure and the temperature in the 
machine room rose dramatically. As a result, envmond shut down our 
AlphaServer 4100 every 18 minutes, only for it to reboot each time. This 
carried on for about 27 hours until eventually (I presume) the hardware 
detected an error and refused to come up.
The machine was still extremely hot the following morning (about 12 hours 
later), so I'm not certain the power was turned off even then.
I am worried about the damage this might cause, and also about the heat 
being generated, which makes the problem worse for the other servers in the 
room. Is it possible to configure the system so that when a high 
temperature is detected the machine shuts down and powers itself off? In 
fact I don't mind if any kind of crash causes a power-off, because Tru64 
is so reliable that we never get crashes (touch wood!).
Our relevant variables are:
# /usr/sbin/envconfig -q
ENVMON_CONFIGURED = 1
ENVMON_GRACE_PERIOD = 
ENVMON_MONITOR_PERIOD = 
ENVMON_HIGH_THRESH = 40
ENVMON_USER_SCRIPT = 
$ /sbin/consvar -l |grep -i boot
auto_action = BOOT
boot_dev = rz25
bootdef_dev = rz25
booted_dev = rz25
boot_file = 
booted_file = 
boot_osflags = A
booted_osflags = A
boot_reset = OFF
Regards,
Bob Vickers
Dept of Computer Science, Royal Holloway, University of London
Received on Mon Jun 05 2006 - 09:38:42 NZST