Hello!
We had the cryptic message
set_pmap_memdsc_state: start 0x0 end 0x50da cl 0xffffffffffd52d90
while booting our ES40 Model 1 (EV67, 667 MHz) running Tru64 5.1-733 with SRM 5.8-43.
Our specialist for all weird cases (as always: Dr. Tom P. Blinn; many thanks!)
found out that this is a debug message the OS prints when it marks untested
memory pages used by the kernel as good. This may result in faults if the
trusted memory is actually damaged. The cause of the message was setting the
console variable 'memory_test' to 'partial' to save time while booting (we have
8 GB, so that saves about 2 minutes). The message does not appear on our old
ES40/EV6 running Tru64 V5.0, maybe because kernel debugging is not enabled
there. AFAIK kernel debugging can be switched off with the dxkerneltuner tool.
The intermittent processor corrected error checks have disappeared after
swapping memory boards between two machines. It seems that this problem was
caused by badly seated DIMMs or similar contact problems.
Here are the details provided by Dr. Blinn:
==================================================================================
These are the comments in the routine named set_pmap_memdsc_state():
/*
* set_pmap_memdsc_state
*
* This function is intended to deal with situations in which memory
* below 'vm_managed_pages' has not been tested, but may be in use.
* We want to mark these pages as good so that any future requests to test
* the page will return good and will not actually test the page. If we
* were to test one of these pages, we may overwrite whatever data the
* kernel had placed there.
*
* This function will not be necessary once the kernel and bootstrap code
* can handle the testing of ALL pages, and not have to wait until the cpusw[]
* array has been initialized.
*/
The message is coming out of this code:
#if 1
        printf("set_pmap_memdsc_state: start %P end %P cl %P\n",
               start, end, cl);
#endif
It looks to me like it was put there by a developer for debugging and just
never got removed.
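To get a feel for how much memory such a message covers, the line can be parsed with a few lines of C. This is only a rough sketch under stated assumptions: that 'start' and 'end' are page frame numbers with 'end' exclusive (the source shown above does not confirm this), and that pages are 8 KB as on Alpha. The names memdsc_msg, parse_memdsc_msg and memdsc_bytes are made up for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 8 KB pages, the page size used on Alpha systems. */
#define ASSUMED_PAGE_SIZE 8192ULL

/* Hypothetical container for the three values in the console message. */
struct memdsc_msg {
    unsigned long long start; /* assumed: first page frame number */
    unsigned long long end;   /* assumed: page frame number past the last page */
    unsigned long long cl;    /* third value, printed as a 64-bit address */
};

/* Parse one console line. Returns 1 if all three values were read, else 0. */
int parse_memdsc_msg(const char *line, struct memdsc_msg *out)
{
    assert(line != NULL && out != NULL);
    /* %llx accepts an optional 0x prefix, as printed in the message. */
    return sscanf(line, "set_pmap_memdsc_state: start %llx end %llx cl %llx",
                  &out->start, &out->end, &out->cl) == 3;
}

/* Size of the range in bytes, under the assumptions stated above. */
unsigned long long memdsc_bytes(const struct memdsc_msg *m)
{
    return (m->end - m->start) * ASSUMED_PAGE_SIZE;
}
```

If those assumptions hold, the message quoted at the top (start 0x0, end 0x50da) would cover 20698 pages, or 169558016 bytes, which is roughly 162 MB marked good without testing.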
It also looks like under some circumstances you may be running kernel code
out of untested memory, and if that's really what's happening and the memory
is flaky, you'll probably see panics.
If you omit "kernel debugging" you won't tie up memory reading the kernel's
symbol table into memory.
For what it's worth, the relevant routine seems to have been in the kernel
since about 1996 with no record of any changes.
If your console firmware supports forcing a full memory test (instead of
a partial test) before boot, I'd recommend you use that.
Tom
--
Dr. Udo Grabowski email: udo.grabowski_at_imk.fzk.de
Institut f. Meteorologie und Klimaforschung II, Forschungszentrum Karlsruhe
Postfach 3640, D-76021 Karlsruhe, Germany Tel: (+49) 7247 82-6026
http://www.fzk.de/imk/imk2/ame/grabowski/ Fax: " -6141
Received on Tue Feb 13 2001 - 10:53:50 NZDT