In my original posting I described a situation that brought my
system to a deadlock from which the only way out was switching the system
off. I was worried about how to protect the system from such things in the
future.
Thanks to:
Peter Chapin <pchapin_at_twilight.vtc.vsc.edu>
Alan Rollow <alan_at_nabeth.cxo.dec.com>
Steve VanDevender <stevev_at_hexadecimal.uoregon.edu>
whose advice will help me to avoid such trouble in the future.
Peter Chapin invokes the ulimit command for ordinary users in /etc/profile to
limit data and stack segment sizes as well as CPU time and core file size.
It does prevent an ordinary user from accidentally running a program that
tries to acquire all system resources.
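Since Peter's limits are soft limits, here is a minimal sketch of how a soft limit behaves. The function name soft_limit_demo and the values 64 and 128 are only illustrative, and the sketch assumes the hard limit on open files is at least 128:

```shell
# Sketch: a soft limit can be lowered and then raised again by the same
# user, as long as it stays at or below the hard limit.
# Everything runs in a subshell so the calling shell is not affected.
soft_limit_demo() {
    (
        ulimit -Sn 64           # lower the soft limit on open files
        first=$(ulimit -Sn)     # read the soft limit back
        ulimit -Sn 128          # an ordinary user may raise it again
        second=$(ulimit -Sn)
        echo "$first $second"
    )
}

soft_limit_demo
```

This is why such limits guard against accidents rather than against a determined user.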
Alan Rollow thinks that the only cure on such an old and limited system
is to get more physical memory or to limit the amount of virtual memory that
can be used, and that changing the process limits for data size may also
have an effect.
Having looked through the documentation on subsystem tuning and the relevant
manuals, I have concluded that I should tune the following
system parameters:
vm:
vm-maxvas = ...
proc:
max-per-proc-address-space = ...
per-proc-address-space = ...
max-per-proc-data-size = ...
per-proc-data-size = ...
This can be done with the "sysconfig" facility, which maintains the kernel
subsystem configuration. As it turned out, max-per-proc-address-space,
per-proc-address-space, and max-per-proc-data-size are each currently set to
1073741824, i.e. 1 Gbyte, whereas the swap space is only 160 Mbyte and
memory is 56936 Kbyte (64 Mbyte minus the memory occupied by the kernel).
Swap and memory together are almost 5 times smaller than these limits (the
values have been this way from the very beginning, in our FIS).
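The comparison can be checked with a little shell arithmetic. This is only a sketch using the figures quoted above, and it assumes that "almost 5 times" refers to swap space and memory taken together. The variable names are illustrative:

```shell
# Figures from the summary, all in Kbytes.
limit_kb=1048576    # 1073741824 bytes = 1 Gbyte per-process limit
swap_kb=163840      # 160 Mbyte swap partition
mem_kb=56936        # memory left after the kernel

# Total backing store available to processes.
backing_kb=$((swap_kb + mem_kb))

# Integer ratio scaled by 100 to keep two decimal places.
ratio_x100=$((limit_kb * 100 / backing_kb))

echo "swap + memory: ${backing_kb} Kbyte"
printf 'limit / (swap + memory): %d.%02d\n' \
    $((ratio_x100 / 100)) $((ratio_x100 % 100))
```

A single process is thus allowed an address space several times larger than everything the system can actually back.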
Steve VanDevender suggests that we use lazy swapping mode instead of
immediate mode.
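According to Steve's answer, the mode is selected by the /sbin/swapdefault symbolic link. A small sketch for checking which mode the link implies follows. The function name swap_mode and its optional path argument are illustrative additions, not part of the system:

```shell
# Report the swap allocation mode implied by the swapdefault link.
# The optional argument exists only so the check can be tried on
# another path; by default /sbin/swapdefault is examined.
swap_mode() {
    link=${1:-/sbin/swapdefault}
    if [ -h "$link" ]; then
        echo "immediate"    # link present: swap is reserved up front
    else
        echo "lazy"         # link absent: swap is allocated on page-out
    fi
}

swap_mode
```

As Steve notes, a change to the link takes effect only after a reboot.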
The full answers and my original mail are beneath my signature.
Thanks again,
Irene
*************************************************************************
* *
* Irene A. Shilikhina e-mail: irene_at_alpha.iae.nsk.su *
* System administrator, *
* Institute of Automation & Electrometry, *
* Siberian Branch of Russian Academy of Sciences, *
* Novosibirsk, Russia *
* http://www.iae.nsk.su/~irene *
*************************************************************************
* * *
* The road to hell is paved with * Every cloud has a silver lining. *
* good intentions. * *
* * *
*************************************************************************
From pchapin_at_twilight.vtc.vsc.edu Wed Dec 2 15:56:12 1998
Date: Tue, 1 Dec 1998 06:41:17 -0500 (EST)
From: Peter Chapin <pchapin_at_twilight.vtc.vsc.edu>
To: "Irene A. Shilikhina" <irene_at_alpha.iae.nsk.su>
Subject: Re: System paralysed due to "Unable to obtain requested swap space"
On Tue, 1 Dec 1998, Irene A. Shilikhina wrote:
> - why didn't the system refuse the first task having created troubles with
> swap space (immediate mode!) or didn't drop any process itself in such
> situation? According to time stamps, the problem arose in starting the
> first program;
I can't answer this one.
> - how can I avoid it in the future?
Avoiding "denial of service" attacks -- which this essentially was (even
though the user was not malicious) -- is generally difficult. On my system
I invoke ulimits on ordinary users in /etc/profile like so:
# Set up process limits. These soft limits can be increased by the
# user. They are here only to prevent run-away programs from
# killing the system.
ulimit -Sd 20480 # 20 MByte data segment size.
ulimit -Ss 20480 # 20 MByte stack segment size.
ulimit -St 3600 # 1 Hour of CPU time.
ulimit -Sc 2048 # 1 MByte core file size.
ulimit -Sn 1024 # 1024 open files.
This, of course, only works for users who are using a shell that
understands the ulimit command (such as ksh or bash). Also it does not
provide any real security since the user can change these limits. However,
it does prevent an ordinary user from accidentally running a program that
tries to acquire all system resources. Since my users are students
learning to program, I deem this method to be a useful safeguard. In fact,
it has often caused buggy student programs to be killed long before the
system has suffered. I think this method has been quite effective for us
at this site.
Note: This technique does not stop a user from running a large number of
small programs. It won't prevent a malicious attack -- just a few types of
accidents.
Peter
pchapin_at_twilight.vtc.vsc.edu
http://twilight.vtc.vsc.edu/~pchapin/
*****************************************************************************
From alan_at_nabeth.cxo.dec.com Wed Dec 2 15:55:49 1998
Date: Tue, 1 Dec 1998 07:58:47 -0700
From: "Alan Rollow - Dr. File System's Home for Wayward Inodes."
<alan_at_nabeth.cxo.dec.com>
To: irene_at_alpha.iae.nsk.su
Subject: Re: System paralysed due to "Unable to obtain requested swap space"
It is more likely that the system is slow because you don't have
enough physical memory to support 160 MB of virtual memory that
processes are trying to use. The reservation of page and swap
space doesn't require any disk I/O. The use of that page/swap
space does. The noise the disk is making could easily be a
heavy I/O load trying to page and swap.
The 10% and related messages suggest that there are processes
trying to use even more virtual memory and failing. If they
were successful, the performance problem would be even worse.
You can use vmstat(1) to look at the paging performance counters
to see how much paging is going on. Look at the number of
page-outs and swap-ins and swap-outs.
The task related message probably indicates that something is
trying to start lots of processes and failing because the
proc table or thread table isn't large enough. More processes
or threads will try to use even more virtual memory, which
will probably increase the frequency of running out of page
and swap space.
The only cure for paging performance problems on such an old
and limited system is to get more physical memory or limit
the amount of virtual memory that can be used. On systems
that can support multiple I/O adapters, there might be some
small benefit to splitting the paging I/O across multiple
busses and devices. But, it is always better to not page,
than to page more quickly. To limit the amount of virtual
memory that can be used, you'd want to lower the amount of page
and swap space.
Changing the process limits for data size may also have an
effect. Some programs see how much virtual memory they can
use and then simply use that much. A more limited data size
would force such programs to be smaller.
*****************************************************************************
From stevev_at_hexadecimal.uoregon.edu Wed Dec 2 15:56:21 1998
Date: Tue, 1 Dec 1998 10:50:04 -0800 (PST)
From: Steve VanDevender <stevev_at_hexadecimal.uoregon.edu>
To: "Irene A. Shilikhina" <irene_at_alpha.iae.nsk.su>
Subject: System paralysed due to "Unable to obtain requested swap space"
If you have a very small swap partition, you should use "lazy"
swapping instead of "conservative" swapping. "Conservative"
swapping reserves swap space for the full virtual address space
of each process, even if the process does not swap pages out.
"Lazy" swapping only allocates swap space for pages that are
swapped out, and also tends to perform better on a busy machine.
You can remove or rename the /sbin/swapdefault symbolic link,
then reboot, to change from conservative to lazy swap mode.
******************************************************************************
On Tue, 1 Dec 1998, Irene A. Shilikhina wrote:
>
> This morning when I came to my Alpha a bitter surprise was waiting for me...
> At once, I noticed a particular sound produced by a disk.
> All the dxconsole window was full of such messages:
>
> Dec 1 09:01:10 alpha vmunix: swap space below 10 percent free
> Dec 1 09:01:11 alpha vmunix: Unable to obtain requested swap space
>
> (I have to tell at once that we have an immediate swap mode since our
> only swap partition is 160 Mbytes).
>
> I wasn't able to get a prompt in my DECterm but at last managed to log in
> from an alphanumeric terminal, take a look at the processes requiring the
> system resources most and execute "su" (though it was awful!). Nevertheless,
> I FAILED in trying to kill any of these processes having got nothing but
> these messages plus another one:
>
> Dec 1 09:17:11 alpha vmunix: fork/procdup: task_create failed. Code: 0x6
>
> I struggled against the situation for around 20 mins but to no avail...
> Nothing remained for me (in my understanding) but a drastic step - trying
> sync (without success either) and ... pressing Off switch...
>
> For THIS TIME, the things have settled without losses though I'm conscious of
> all possible damage as result of such an act, while I'd like to keep out of
> danger.
>
> My analysis showed the user whose activity was the direct source of the
> trouble - the time of the first system messages is coincident with starting
> (last night!) a number of copies of a computation task requiring great
> resources. (I've already had a conversation with him, instructing him to be
> polite with respect to the system...) Well, what worries me much more
> is what I can do to avoid such things. For the moment, we cannot afford
> additional swap partition. So, I have two questions:
>
> - why didn't the system refuse the first task having created troubles with
> swap space (immediate mode!) or didn't drop any process itself in such
> situation? According to time stamps, the problem arose in starting the
> first program;
> - how can I avoid it in the future?
>
> Additional information:
> DU 3.2c and DEC 2000 model 300 (yes, they are old, I know).
>
> Thanks,
> Irene
>
> P.S. Of course, I know the meaning of all these messages, and it's not the first
> time that we have run short of swap space, but the question is
> why it came to a deadlock.
Received on Wed Dec 02 1998 - 12:28:10 NZDT