From: Martin.Dusek@pregis.cz
Sent: Friday, September 17, 1999 12:09 AM
To: Hibberd, Jeremy
Subject: RE: System freeze - memory bottleneck

Hi Jeremy

We have a similar configuration: Tru64 V4.0D, TruCluster v1.5 with
two AS4100 5/600 with 4GB memory, SAP R/3 3.0F, Oracle 7.3.4 and
later 8.0.5. We had serious problems with lack of memory; the system
very often "hung". There were two main problems:
First, everything nearly hung while generating huge files such as core
       dumps, or when adding a datafile to a tablespace. However, the
       system doesn't fully freeze in these situations, so we haven't
       solved this.

Second, the default UNIX kernel memory parameters are unusable with
       large memory usage. Main reasons:
       1. the system begins to page when the number of free pages drops
          below vm-page-free-target (default 128), but this paging is
          very "slow" (not aggressive) and cannot keep pace with memory
          demands, so free pages immediately drop below vm-page-free-min
       2. below vm-page-free-min (default 20) paging becomes very
          aggressive, but it is too late. Free memory often reaches
          vm-page-free-reserved
       3. below vm-page-free-reserved (default 10) only privileged tasks
          can get memory until some is freed
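A minimal Python sketch of the three paging regimes in items 1-3, using the default thresholds quoted there (all counts are in pages). It only classifies a free-page count. It is not how the Tru64 kernel is implemented. The function name paging_regime is hypothetical, and reading "under" as "strictly below" is an assumption.

```python
# Default thresholds quoted in items 1-3 (units: pages).
DEFAULT_PAGING = {
    "vm-page-free-target": 128,
    "vm-page-free-min": 20,
    "vm-page-free-reserved": 10,
}


def paging_regime(free_pages: int, params: dict = DEFAULT_PAGING) -> str:
    """Hypothetical helper: which paging behaviour items 1-3 describe.

    Assumes "under" means strictly below the threshold.
    """
    if free_pages < params["vm-page-free-reserved"]:
        return "reserved"    # item 3: only privileged tasks get memory
    if free_pages < params["vm-page-free-min"]:
        return "aggressive"  # item 2: aggressive paging, often too late
    if free_pages < params["vm-page-free-target"]:
        return "slow"        # item 1: gentle paging that may fall behind
    return "none"            # at or above the target: no paging pressure


if __name__ == "__main__":
    for free in (512, 100, 15, 5):
        print(free, paging_regime(free))
```

With the defaults only 108 pages separate "slow" from "aggressive" paging. Item 1 says a busy system uses up that gap almost at once.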

Another problem is swapping: you should never allow SAP work processes
to be swapped out, because it terribly increases response time (in
transaction ST03 you can see wait times of hundreds of milliseconds).
You should set the memory parameters so that free memory is obtained by
paging and not by swapping. Swapping is acceptable only for tasks which
are launched in the morning but then sleep all the time. This is not
the case for Oracle processes or SAP work processes:
            dw.sapPP0_DVEBMGS02 pf=/usr/sap/PP0/SY...
            oraclePP0 (DESCRIPTI...

So, the remaining causes of memory problems:

       4. further, when free pages drop below vm-page-free-swap (default
          74), "soft" swapping begins: the system swaps out all tasks
          that have been idle for 30 seconds or more. As mentioned
          above, this is not the worst problem. But:
       5. when free pages drop below vm-page-free-optimal (default 74,
          the same as vm-page-free-swap; don't ask me why, it makes no
          sense) hard swapping begins, so your SAP work processes do
          nothing but move back and forth between memory and the swap
          disks. Typically you can see hundreds or thousands of
          page-outs in vmstat.
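Items 4 and 5 can be sketched the same way. The Python below reports which swap mechanisms the message says are active. It also computes the band of free-page counts in which idle-task ("soft") swapping runs but hard swapping does not. The helper names are hypothetical, and the strict "below" comparisons are an assumption. With the defaults that band is empty, which is why the author objects to both thresholds being 74.

```python
DEFAULT_SWAP = {"vm-page-free-swap": 74, "vm-page-free-optimal": 74}


def swap_activity(free_pages: int, params: dict = DEFAULT_SWAP) -> list:
    """Hypothetical helper: swap mechanisms items 4-5 say are running."""
    active = []
    if free_pages < params["vm-page-free-swap"]:
        active.append("soft")  # item 4: swap out tasks idle >= 30 s
    if free_pages < params["vm-page-free-optimal"]:
        active.append("hard")  # item 5: active tasks are swapped too
    return active


def soft_only_band(params: dict) -> int:
    """How many free-page counts give soft swapping without hard swapping."""
    return max(0, params["vm-page-free-swap"] - params["vm-page-free-optimal"])


if __name__ == "__main__":
    print(swap_activity(73))             # ['soft', 'hard']: both at once
    print(soft_only_band(DEFAULT_SWAP))  # 0
    print(soft_only_band({"vm-page-free-swap": 128,
                          "vm-page-free-optimal": 74}))  # 54
```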

Solution? Increase vm-page-free-target so that the VM system has enough
time to free inactive pages. Increase vm-page-free-min so that the
system has enough time to free memory by aggressive paging and never
(NEVER!!) goes below vm-page-free-reserved. Increase vm-page-free-swap
but set it much lower than vm-page-free-target, so that the system
first pages, then swaps. Set vm-page-free-swap higher than
vm-page-free-optimal so that hard swapping "never" begins but soft
swapping is still possible (long-idle tasks can be swapped out). It is
up to you whether you allow idle-task swapping or not. With my
parameters swapping never (or nearly never) begins, because
vm-page-free-min=192 is above vm-page-free-swap=128, so aggressive
paging starts first. You can change this.
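The rules in the "Solution?" paragraph are orderings between thresholds, so they can be checked mechanically. This minimal Python sketch assumes vm-page-free-reserved stays at its default of 10, because the author's stanza file does not set it. The message does not quantify "much lower", so the sketch only checks "lower". The function name check_vm_thresholds is hypothetical.

```python
def check_vm_thresholds(p: dict) -> list:
    """Return the orderings from the "Solution?" paragraph that p violates."""
    rules = [
        ("target > min", p["vm-page-free-target"] > p["vm-page-free-min"]),
        ("min > reserved", p["vm-page-free-min"] > p["vm-page-free-reserved"]),
        ("swap < target", p["vm-page-free-swap"] < p["vm-page-free-target"]),
        ("swap > optimal", p["vm-page-free-swap"] > p["vm-page-free-optimal"]),
        # The author's reason swapping "nearly never" begins: aggressive
        # paging (below min) starts before the swap threshold is reached.
        ("min > swap", p["vm-page-free-min"] > p["vm-page-free-swap"]),
    ]
    return [name for name, ok in rules if not ok]


DEFAULTS = {
    "vm-page-free-target": 128, "vm-page-free-min": 20,
    "vm-page-free-reserved": 10, "vm-page-free-swap": 74,
    "vm-page-free-optimal": 74,
}
# Values from the author's stanza file; reserved assumed left at default.
TUNED = {**DEFAULTS, "vm-page-free-target": 512,
         "vm-page-free-min": 192, "vm-page-free-swap": 128}

if __name__ == "__main__":
    print("defaults:", check_vm_thresholds(DEFAULTS))
    print("tuned:", check_vm_thresholds(TUNED))
```

The defaults break the last two rules. The author's values satisfy all five.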

How to do it?
Create the file /etc/sapr3.stanza.vm with these contents:

vm:
        vm-page-prewrite-target=384
        vm-page-free-target=512
        vm-page-free-min=192
        vm-page-free-swap=128
        vm-page-free-optimal=74
        vm-page-free-hardswap=2048

cd /etc
sysconfigdb -m -f ./sapr3.stanza.vm vm

Then I relinked the kernel with doconfig, but I'm not sure that is
necessary. Perhaps rebooting the system would be enough.

Don't be afraid to increase all the parameters much more. Mr. Kejzlar
from Pilsen University has set the limits in the thousands and it works.
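Converting the thresholds from pages to bytes shows why raising them further costs little. This small Python calculation assumes the 8 KB page size of the Alpha processors in the AS4100 and the 4GB of memory given at the top of this message. The page counts come from the stanza file above.

```python
PAGE_SIZE = 8 * 1024        # bytes per page (assumed: 8 KB Alpha pages)
PHYS_MEM = 4 * 1024 ** 3    # 4GB, as in the configuration described above

STANZA = {
    "vm-page-prewrite-target": 384,
    "vm-page-free-target": 512,
    "vm-page-free-min": 192,
    "vm-page-free-swap": 128,
    "vm-page-free-optimal": 74,
    "vm-page-free-hardswap": 2048,
}


def pages_to_mib(pages: int) -> float:
    """Convert a page count to MiB."""
    return pages * PAGE_SIZE / 1024 ** 2


def percent_of_memory(pages: int) -> float:
    """Share of physical memory that a page count represents."""
    return 100.0 * pages * PAGE_SIZE / PHYS_MEM


if __name__ == "__main__":
    for name, pages in STANZA.items():
        print(f"{name:26} {pages:5d} pages = {pages_to_mib(pages):6.2f} MiB"
              f" ({percent_of_memory(pages):.3f}% of RAM)")
```

Under these assumptions, vm-page-free-target=512 is 4 MiB, about 0.1% of 4GB. Even 2048 pages is only 16 MiB.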

You can find a good manual at:
http://www.unix.digital.com/faqs/publications/base_doc/DOCUMENTATION/V40D_HTML/V40D_HTML/AQ0R3FTE/CHVMXXXX.HTM


Regards,
Martin

_______________________________________________
Martin Dusek             martin.dusek@pregis.cz
IT BC department manager    tel: +420-428-359571
PREGIS a.s., Smetanova 45       +420-602-437235
46621 Jablonec nad Nisou   fax: +420-428-317844
Czech Republic

============================================================================

From: Partin, Kevin S [Kevin.Partin@SW.Boeing.com]
Sent: Friday, September 17, 1999 6:37 AM
To: Hibberd, Jeremy
Subject: RE: System freeze - memory bottleneck

I have seen this problem on a PW433au running 4.0D. What is happening is
that all of the swap space is being used up. Once the machine runs out of
swap space, it will 'freeze' until the process causing the problem finishes
or crashes. The 'freeze' is the machine being unable to get any free memory
pages. You are probably running with strict swap allocation. In reality you
are probably not using any, or much, swap space, but the strict allocation
policy reserves swap space for every allocation, so all of it can be
allocated even though nothing is swapping.
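A toy model shows the difference between strict and deferred swap allocation. In strict mode this Python sketch reserves swap space when memory is allocated, so allocations start failing once the reservations reach the total, even though no page has been written to swap. In deferred mode an allocation consumes no swap up front. The class SwapPool and its methods are hypothetical and are not a Tru64 interface.

```python
class SwapPool:
    """Toy model of swap accounting; not a Tru64 interface."""

    def __init__(self, total_pages: int, strict: bool) -> None:
        self.total = total_pages
        self.strict = strict
        self.reserved = 0  # pages promised to allocations (strict mode only)

    def allocate(self, pages: int) -> bool:
        """Request virtual memory; False means the request is refused."""
        if self.strict:
            if self.reserved + pages > self.total:
                return False  # refused although nothing has been paged out
            self.reserved += pages
        return True  # deferred mode: swap is only consumed when paging


if __name__ == "__main__":
    for strict in (True, False):
        pool = SwapPool(total_pages=1000, strict=strict)
        results = [pool.allocate(600), pool.allocate(600)]
        print("strict" if strict else "deferred", results)
```

In strict mode the second 600-page request is refused, while deferred mode accepts both.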

In my situation, it was a very badly behaved X Window application, with a
huge memory requirement, that caused my system to freeze.

Kevin
------------------------------------------
Kevin S. Partin
The Boeing Company
13100 Space Center Blvd.
Mail Code: JHOU-2230
Houston, TX 77059
Phone: 281-244-4088
Pager: 713-549-0713
Facsimile: 281-244-4984
Email: mailto:kevin.s.partin@boeing.com

============================================================================

One really needs to take a look at your sysconfigtab settings.  If you are
using defaults, I would guess that the large report uses up all the
UBC.  This forces out all the other processes.  As Oracle is using shared
memory segments they all get swapped out and nothing can run.  It's amazing
that you can't even get a login shell, though.  It definitely sounds as if
the UBC is thrashing.  Take a look at your UBC settings and the section on
memory tuning in the system administration guide at
http://www.unix.digital.com. You might want to monitor the UBC with dbx -k
/vmunix /dev/mem from a high-priority root shell.  That should help you get
output while the problem is occurring.


Marco
Marco Luchini
Unix support
Acco-UK

============================================================================

From: Davis, Alan [Davis@Tessco.Com]
Sent: Friday, September 17, 1999 6:23 AM
To: Hibberd, Jeremy
Subject: RE: System freeze - memory bottleneck

Without more performance info it's hard to tell, but my first guess is that
you should look at the vm parameter ubc-maxpercent and related parameters.

This type of freeze can be seen when a single process takes the majority of
physical RAM for I/O buffering.

By reducing ubc-maxpercent you limit the amount of RAM stolen by I/O buffers
and keep enough memory to run the other processes.
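The effect of ubc-maxpercent is simple arithmetic. It caps the UBC at a percentage of physical memory, and the rest cannot be taken for I/O buffering. This minimal Python sketch assumes a 4GB machine, as in the first reply. The percentages are examples only; check your own setting (for example with sysconfig -q vm) rather than assume a default. The helper names are hypothetical, and the model ignores memory the kernel itself wires down.

```python
def ubc_cap_mib(phys_mib: float, ubc_maxpercent: int) -> float:
    """Largest amount of memory the UBC may occupy, in MiB."""
    if not 0 <= ubc_maxpercent <= 100:
        raise ValueError("ubc-maxpercent is a percentage")
    return phys_mib * ubc_maxpercent / 100


def protected_from_ubc_mib(phys_mib: float, ubc_maxpercent: int) -> float:
    """Memory the UBC cannot claim, left for processes (simple model)."""
    return phys_mib - ubc_cap_mib(phys_mib, ubc_maxpercent)


if __name__ == "__main__":
    phys = 4096  # 4GB, in MiB
    for pct in (100, 70, 50):
        print(f"ubc-maxpercent={pct:3d}: UBC up to "
              f"{ubc_cap_mib(phys, pct):.0f} MiB, "
              f"{protected_from_ubc_mib(phys, pct):.0f} MiB kept for the rest")
```

In this simple model, a setting of 100 protects nothing from a single I/O-heavy process. That matches the freeze described above.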

Alan Davis

============================================================================

You don't have enough physical memory for your workload when the large report
is running.  Once it starts running and hogging all available physical memory
all the pages for everything else in user space quickly get paged out, and in
particular all the presently inactive stuff gets paged out.  Once that happens
it's extremely unlikely that anything will get paged in enough to actually run
for long until the hog goes away.

You need to tune the system to limit the working set of this huge application
or you need to get more physical memory.  Buying more memory may turn out to
be both easier in the short run and cheaper in the long run; attempting to
tune a multi-user system to deliver good performance when it has to also work
with very demanding applications is a thankless and hopeless task.

Tom
 
 Dr. Thomas P. Blinn + UNIX Software Group + Compaq Computer Corporation
  110 Spit Brook Road, MS ZKO3-2/W17   Nashua, New Hampshire 03062-2698
   Technology Partnership Engineering           Phone:  (603) 884-0646
    Internet: tpb@zk3.dec.com           Digital's Easynet: alpha::tpb
     ACM Member: tpblinn@acm.org         PC@Home: tom@felines.mv.net

======================================================================

From: alan@nabeth.cxo.dec.com
Sent: Friday, September 17, 1999 10:09 AM
To: Hibberd, Jeremy
Subject: Re: System freeze - memory bottleneck 


	The report generation is probably doing a lot of sequential
	I/O, which causes the file system to go into read-ahead
	mode.  This quickly fills the buffer cache with data from
	the file being read, and flushes most everything else out.
	Dirty data will be written out, which may cause a short burst of I/O.
	Since the buffer cache uses the same memory as the processes,
	the kernel will try to free up more memory for the cache,
	which causes the high paging load.
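A toy cache can illustrate the read-ahead effect. For simplicity, this Python sketch assumes a single pool of pages shared by process pages and file-cache pages and managed in pure LRU order. The real Tru64 VM and UBC policies are more involved. The sketch counts how many of a process's working-set pages survive a sequential read of a file. The function name is hypothetical.

```python
from collections import OrderedDict


def working_set_left_after_scan(cache_pages: int, working_set: int,
                                scan_pages: int) -> int:
    """Process pages still cached after a sequential file read (toy LRU)."""
    cache = OrderedDict()

    def touch(key) -> None:
        if key in cache:
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used page

    for page in range(working_set):        # processes touch their pages
        touch(("proc", page))
    for block in range(scan_pages):        # then the report reads the file
        touch(("file", block))
    return sum(1 for kind, _ in cache if kind == "proc")


if __name__ == "__main__":
    for scan in (100, 500, 1000):
        print(scan, working_set_left_after_scan(1000, 800, scan))
```

A scan as large as the cache leaves none of the working set behind. Every page then has to be paged back in before the processes can run again.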

	Part of the freeze is probably related to the high I/O load.
	If you're using a small number of medium-speed devices for
	the page/swap space, it can take a long time to page out a
	few GB of memory.  If the Oracle data space is pageable,
	it is probably a good candidate for paging out.  Paging it
	back in when the report is through with the file(s) will be
	another long I/O load.
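A rough estimate of how long it takes to page a few GB. The throughput figure is an assumption: paging I/O is often far from sequential, so real rates may be lower. The helper name is hypothetical. The point is that the same volume has to move twice, once out and once back in.

```python
def paging_seconds(megabytes: float, mb_per_s_per_device: float,
                   devices: int) -> float:
    """Time to move `megabytes` across `devices` swap disks in parallel."""
    if devices < 1 or mb_per_s_per_device <= 0:
        raise ValueError("need at least one device with positive throughput")
    return megabytes / (mb_per_s_per_device * devices)


if __name__ == "__main__":
    # Assumed: 2 GB paged over 2 disks, each sustaining 4 MB/s of paging I/O.
    out = paging_seconds(2048, 4, 2)
    print(f"page out: {out:.0f} s, out and back in: {2 * out:.0f} s")
```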

	You might try running sys_check to see if it can offer any
	advice on how to tune the virtual memory subsystem.  The
	V4.0D documentation should have a tuning guide that can
	help explain what the suggestions are doing.