/*
** COPYRIGHT (c) 1998 BY COMPAQ COMPUTER CORPORATION ALL RIGHTS RESERVED.
**
** THIS SOFTWARE IS FURNISHED UNDER A LICENSE AND MAY BE USED AND COPIED
** ONLY  IN  ACCORDANCE  OF  THE  TERMS  OF  SUCH  LICENSE  AND WITH THE
** INCLUSION OF THE ABOVE COPYRIGHT NOTICE. THIS SOFTWARE OR  ANY  OTHER
** COPIES THEREOF MAY NOT BE PROVIDED OR OTHERWISE MADE AVAILABLE TO ANY
** OTHER PERSON.  NO TITLE TO AND  OWNERSHIP OF THE  SOFTWARE IS  HEREBY
** TRANSFERRED.
**
** THE INFORMATION IN THIS SOFTWARE IS  SUBJECT TO CHANGE WITHOUT NOTICE
** AND  SHOULD  NOT  BE  CONSTRUED  AS A COMMITMENT BY COMPAQ COMPUTER
** CORPORATION.
**
** COMPAQ ASSUMES NO RESPONSIBILITY FOR THE USE  OR  RELIABILITY OF ITS
** SOFTWARE ON EQUIPMENT WHICH IS NOT SUPPLIED BY COMPAQ OR DIGITAL.
**
**=====================================================================
** WARNING - This example is provided for instructional and demo
**           purposes only.  The resulting program should not be
**           run on systems which make use of soft-affinity
**           features of OpenVMS, or while running applications
**           which are tuned for precise processor configurations.
**           We are continuing to explore enhancements such as this
**           program which will be refined and integrated into
**           future releases of OpenVMS.
**=====================================================================
**
** GCU$BALANCER.C - OpenVMS Galaxy CPU Load Balancer.
**
** This is an example of a privileged application which dynamically
** reassigns CPU resources among instances in an OpenVMS Galaxy.  The
** program must be run on each participating instance.  Each image
** creates, or maps to, a small shared memory section and periodically
** posts the depth of that instance's COM queues.  Based upon running
** averages of this data, each instance determines the most and least
** busy instances.  If the imbalance persists for a specified duration,
** the least busy instance that has available secondary processors
** reassigns one of its processors to the most busy instance, thereby
** balancing processor usage across the OpenVMS Galaxy.  The program
** provides command line arguments to allow tuning of the load
** balancing algorithm.  The program is admittedly light on error
** handling.
**
** This program uses the following OpenVMS Galaxy system services:
**
**      SYS$CPU_TRANSITION   - CPU reassignment
**      SYS$CRMPSC_GDZRO_64  - Shared memory creation
**      SYS$SET_SYSTEM_EVENT - OpenVMS Galaxy event notification
**      SYS$*_GALAXY_LOCK_*  - OpenVMS Galaxy locking
**
** Since OpenVMS Galaxy resources are always reassigned via a "push"
** model, where only the owner instance can release its resources,
** one copy of this process must run on each instance in the OpenVMS
** Galaxy.
**
** ENVIRONMENT: OpenVMS V7.2 Multiple-instance Galaxy.
**
** REQUIRED PRIVILEGES:  CMKRNL required to count CPU queues
**                       SHMEM  required to map shared memory
**
** BUILD/COPY INSTRUCTIONS:
**
** Compile and link the example program as described below, or copy the
** precompiled image found in SYS$EXAMPLES:GCU$BALANCER.EXE to
** SYS$COMMON:[SYSEXE]GCU$BALANCER.EXE
**
** If your OpenVMS Galaxy instances utilize individual system disks,
** you will need to do the above for each instance.
**
** If you change the example program, compile and link it as follows:
**
**   $ CC GCU$BALANCER.C+SYS$LIBRARY:SYS$LIB_C/LIBRARY
**   $ LINK/SYSEXE GCU$BALANCER
**
** STARTUP OPTIONS:
**
** You must establish a DCL command for this program.  We have provided a
** sample command table file for this purpose. To install the new
** command, do the following:
**
**    $ SET COMMAND/TABLE=SYS$LIBRARY:DCLTABLES -
**      /OUT=SYS$COMMON:[SYSLIB]DCLTABLES GCU$BALANCER.CLD
**
** This command inserts the new command definition into DCLTABLES.EXE
** in your common system directory.  The new command tables will take
** effect when the system is rebooted.  If you would like to avoid a
** reboot, do the following:
**
**    $ INSTALL REPLACE SYS$COMMON:[SYSLIB]DCLTABLES.EXE
**
** After this command, you will need to log out, then log back in to
** use the command from any active processes.  Alternatively, if you
** would like to avoid logging out, do the following from each process
** you would like to run the balancer from:
**
**    $ SET COMMAND GCU$BALANCER.CLD
**
** Once your command has been established, you may use the various
** command line parameters to control the balancer algorithm.
**
**    $ CONFIGURE BALANCER{/STATISTICS} x y time
**
** Where: "x" is the number of load samples to take.
**        "y" is the number of queued processes required to trigger
**            resource reassignment.
**        "time" is the delta time between load sampling.
**
** The /STATISTICS qualifier causes the program to display a
** continuous status line.  This is useful for tuning the parameters.
** This output is not visible if the balancer is run detached, as is
** the case if it is invoked via the GCU.  It is intended to be used
** only when the balancer is invoked directly from DCL in a DECterm
** window.
**
** For example: $ CONFIG BAL 3 1 00:00:05.00
**
**        Starts the balancer which samples the system load every
**        5 seconds.  After 3 samples, if the instance has one or
**        more processes in the COM queue, a resource (CPU)
**        reassignment will occur, giving this instance another CPU.
**
** GCU STARTUP:
**
** The GCU provides a menu item for launching SYS$SYSTEM:GCU$BALANCER.EXE
** and a dialog for altering the balancer algorithm.  These features will
** only work if the balancer image is properly installed as described
** in the following paragraphs.
**
** To use the GCU-resident balancer startup option, you must:
**
** 1) Compile, link, or copy the balancer image as described previously.
** 2) Invoke the GCU with the command $ CONFIGURE GALAXY.  You may need
**    to set your DECwindows display to a suitably configured
**    workstation or PC.
** 3) Select the "CPU Balancer" entry from the "Galaxy" menu.
** 4) Select appropriate values for your system.  This may take some
**    testing.  By default, the values are set aggressively so that
**    the balancer action can be readily observed.  If your system is
**    very heavily loaded, you will need to increase the values
**    accordingly to avoid excessive resource reassignment.  The GCU
**    does not currently save these values, so you may want to write
**    them down once you are satisfied.
** 5) Select the instance(s) you wish to have participate, then select
**    the "Start" function, then press OK.  The GCU should launch the
**    process GCU$BALANCER on all selected instances.  You may want to
**    verify these processes have been started.
**
** SHUTDOWN WARNING:
**
** In an OpenVMS Galaxy, no process may have shared memory mapped on an
** instance when it leaves the Galaxy, as during a shutdown. Because of
** this, SYS$MANAGER:SYSHUTDWN.COM must be modified to stop the process
** if the GCU$BALANCER program is run from a SYSTEM UIC.  Processes in the
** SYSTEM UIC group are not terminated by SHUTDOWN.COM when shutting down
** or rebooting OpenVMS. If a process still has shared memory mapped when
** an instance leaves the Galaxy, the instance will crash with a
** GLXSHUTSHMEM bugcheck.
**
** To make this work, SYS$MANAGER:SYSHUTDWN.COM must stop the process as
** shown in the example below.  Alternatively, the process can be run
** under a suitably privileged, non-SYSTEM UIC.
**
** SYSHUTDWN.COM EXAMPLE - Paste into SYS$MANAGER:SYSHUTDWN.COM
**
**    $!
**    $! If the GCU$BALANCER image is running, stop it to release shmem.
**    $!
**    $ procctx = f$context("process",ctx,"prcnam","GCU$BALANCER","eql")
**    $ procid  = f$pid(ctx)
**    $ if procid .NES. "" then $ stop/id='procid'
**
** Note, you could also just do a "$ STOP GCU$BALANCER" statement.
**
** OUTPUTS:
**
**    If the logical name GCU$BALANCER_VERIFY is defined, the program
**    notifies the SYSTEM account when CPUs are reassigned.  If the
**    /STATISTICS qualifier is specified, a status line is continually
**    displayed, but only when run directly from the command line.
**
** REVISION HISTORY:
**
** 02-Dec-1998 Greatly improved instructions.
** 03-Nov-1998 Improved instructions.
** 24-Sep-1998 Initial code example and integration with GCU.
*/
#include <brkdef>
#include <builtins>
#include <cstdef>
#include <descrip>
#include <glockdef>
#include <ints>
#include <pdscdef>
#include <psldef>
#include <secdef>
#include <ssdef>
#include <starlet>
#include <stdio>
#include <stdlib>
#include <string>
#include <syidef>
#include <sysevtdef>
#include <vadef>
#include <vms_macros>
#include <cpudef>
#include <iosbdef.h>
#include <efndef.h>
/* For CLI */
#include <cli$routines.h>
#include <chfdef.h>
#include <climsgdef.h>
#define HEARTBEAT_RESTART     0 /* Flags for synchronization            */
#define HEARTBEAT_ALIVE       1
#define HEARTBEAT_TRANSPLANT  2
#define GLOCK_TIMEOUT    100000 /* Sanity check, max time holding gLock */
#define _failed(x) (!( (x) & 1) )
$DESCRIPTOR(system_dsc, "SYSTEM");       /* Brkthru account name   */
$DESCRIPTOR(gblsec_dsc, "GCU$BALANCER"); /* Global section name    */
struct  SYI_ITEM_LIST {              /* $GETSYI item list format */
  short buflen,item;
  void *buffer,*length;
};
/* System information and an item list to use with $GETSYI */
static unsigned long total_cpus;
static uint64   partition_id;
static long     max_instances = 32;
iosb            g_iosb;
struct SYI_ITEM_LIST syi_itemlist[3] = {
     {sizeof (long), SYI$_ACTIVECPU_CNT,&total_cpus,  0},
     {sizeof (long), SYI$_PARTITION_ID, &partition_id,0},
     {0,0,0,0}};
extern uint32 *SCH$AQ_COMH;          /* Scheduler COM queue address */
unsigned long PAGESIZE;              /* Alpha page size             */
uint64        glock_table_handle;    /* Galaxy lock table handle    */
/*
** Shared Memory layout (64-bit words):
** ====================================
** 0  to  n-1:  Busy count, where 100 = 1 process in a CPU queue
** n  to  2n-1: Heartbeat (status) for each instance
** 2n to  3n-1: Current CPU count on each instance
** 3n to  4n-1: Galaxy lock handles for modifying heartbeats
**
** where n = max_instances (one 64-bit word per instance).
**
** We assume the entire table (easily) fits in two Alpha pages.
*/
/* Shared memory pointers must be declared volatile */
volatile uint64  gs_va = 0;          /* Shmem section address     */
volatile uint64  gs_length = 0;      /* Shmem section length      */
volatile uint64 *gLocks;             /* Pointers to gLock handles */
volatile uint64 *busycnt,*heartbeat,*cpucount;
/*********************************************************************/
/* FUNCTION init_lock_tables - Map to the Galaxy locking table and   */
/* create locks if needed. Place the lock handles in a shared memory */
/* region, so all processes can access the locks.                    */
/*                                                                   */
/* ENVIRONMENT: Requires SHMEM and CMKRNL to create tables.          */
/* INPUTS:      None.                                                */
/* OUTPUTS:     Any errors from lock table creation.                 */
/*********************************************************************/
int init_lock_tables (void)
{
    int status,i;
    unsigned long sanity;
    uint64 handle;
    unsigned int min_size, max_size;
    /* Lock table names are 15-byte padded values, unique across a Galaxy. */
    char table_name[] = "GCU_BAL_GLOCK  ";
    /* Lock names are 15-byte padded values, but need not be unique. */
    char lock_name[] = "GCU_BAL_LOCK   ";
    /* Get the size of a Galaxy lock */
    status = sys$get_galaxy_lock_size(&min_size,&max_size);
    if (_failed(status) ) return (status);
    /*
    ** Create or map to a process space Galaxy lock table. We assume
    ** one page is enough to hold the locks. This will work for up
    ** to 128 instances.
    */
    status = sys$create_galaxy_lock_table(table_name,PSL$C_USER,
                PAGESIZE,GLCKTBL$C_PROCESS,0,min_size,&glock_table_handle);
    if (_failed(status) ) return (status);
    /*
    ** Success case 1: SS$_CREATED
    ** We created the table, so populate it with locks and
    ** write the handles to shared memory so the other partitions
    ** can access them. Only one instance can receive SS$_CREATED
    ** for a given lock table; all other mappers will get SS$_NORMAL.
    */
    if (status == SS$_CREATED)
    {
      printf ("%%GCU$BALANCER-I-CRELOCK, Creating G-locks\n");
      for (i=0; i<max_instances; i++)
/* ... [text missing from this copy of the source] ... */
   ...->pdsc$q_entry[0];
   sub_addr[1] = sub_addr[0] + PAGESIZE;
   if (__PAL_PROBER( (void *)sub_addr[0],sizeof(int),PSL$C_USER) != 0)
        sub_addr[1] = sub_addr[0];
   status = sys$lkwset(sub_addr,locked_code,PSL$C_USER);
   if (_failed(status) ) exit(status);
}
/*********************************************************************/
/* FUNCTION reassign_a_cpu - Reassign a single CPU to another        */
/* instance.                                                         */
/*                                                                   */
/* ENVIRONMENT: Requires CMKRNL privilege.                           */
/* INPUTS:      most_busy_id: partition ID of destination.           */
/* OUTPUTS:     None.                                                */
/*                                                                   */
/* Donate one CPU at a time - then wait for the remote instance to   */
/* reset its heartbeat and recalculate its load.                     */
/*********************************************************************/
void reassign_a_cpu(int most_busy_id)
{
  int status,i;
  static char op_msg[255];
  static char iname_msg[1];
  $DESCRIPTOR(op_dsc,op_msg);
  $DESCRIPTOR(iname_dsc,"");
  iname_dsc.dsc$w_length = 0;
  /* Update CPU info */
  status = sys$getsyiw(EFN$C_ENF,0,0,&syi_itemlist, &g_iosb,0,0);
  if (_failed(status) ) exit(status);
  /* Don't attempt reassignment if we are down to one CPU */
  if (total_cpus > 1)
  {
    /* Flag the recipient so no one else donates to it meanwhile */
    status = sys$acquire_galaxy_lock(gLocks[most_busy_id],GLOCK_TIMEOUT,0);
    if (_failed(status) ) exit(status);
    heartbeat[most_busy_id] = HEARTBEAT_TRANSPLANT;
    status = sys$release_galaxy_lock(gLocks[most_busy_id]);
    if (_failed(status) ) exit(status);
    status = sys$cpu_transitionw(CST$K_CPU_MIGRATE,CST$K_ANY_CPU,0,
                                 most_busy_id,0,0,0,0,0,0);
    if (status & 1)
    {
      if (getenv ("GCU$BALANCER_VERIFY") )
      {
        sprintf(op_msg,
                "\n\n*****GCU$BALANCER: Reassigned a CPU to instance %d\n",
                most_busy_id);
        op_dsc.dsc$w_length = strlen(op_msg);
        sys$brkthru(0,&op_dsc,&system_dsc,BRK$C_USERNAME,0,0,0,0,0,0,0);
      }
      update_cpucount(0);  /* Update the CPU count after donating one */
    }
  }
}
/********************************************************************/
/* IMAGE ENTRY - MAIN                                               */
/*                                                                  */
/* ENVIRONMENT: OpenVMS Galaxy                                      */
/* INPUTS:      None.                                               */
/* OUTPUTS:     None.                                               */
/********************************************************************/
int main(int argc, char **argv)
{
   int           show_stats = 0;
   long          busy = 0,most_busy,nprocs; /* busy is 0 until first averaged */
   int64         delta;
   unsigned long status,i,j,k,system_cpus,instances;
   unsigned long arglst         = 0;
   uint64        version_id[2]  = {0,1};
   uint64        region_id      = VA$C_P0;
   uint64        most_busy_id,cpu_hndl = 0;
/* Static descriptors for storing parameters.  Must match CLD defs */
  $DESCRIPTOR(p1_desc,"P1");
  $DESCRIPTOR(p2_desc,"P2");
  $DESCRIPTOR(p3_desc,"P3");
  $DESCRIPTOR(p4_desc,"P4");
  $DESCRIPTOR(stat_desc,"STATISTICS");
/* Dynamic descriptors for retrieving parameter values */
  struct dsc$descriptor_d samp_desc = {0,DSC$K_DTYPE_T,DSC$K_CLASS_D,0};
  struct dsc$descriptor_d proc_desc = {0,DSC$K_DTYPE_T,DSC$K_CLASS_D,0};
  struct dsc$descriptor_d time_desc = {0,DSC$K_DTYPE_T,DSC$K_CLASS_D,0};
  struct SYI_ITEM_LIST syi_pagesize_list[3] = {
    {sizeof (long), SYI$_PAGE_SIZE      , &PAGESIZE    ,0},
    {sizeof (long), SYI$_GLX_MAX_MEMBERS,&max_instances,0},
    {0,0,0,0}};
/*
** num_samples and time_desc determine how often the balancer should
** check to see if any other instance needs more CPUs. num_samples
** determines the number of samples used to calculate the running
** average, and time_desc determines the amount of time between
** samples.
**
** For example, a time_desc of 30 seconds and a num_samples of 20 means
** that a running average over the last 10 minutes (20 samples * 30 secs)
** is used to balance CPUs.
**
** load_tolerance is the minimum load difference which triggers a CPU
** migration. 100 is equal to 1 process in the computable CPU queue.
*/
   int num_samples;     /* Number of samples in running average      */
   int load_tolerance;  /* Minimum load diff to trigger reassignment */
/* Parse the CLI */
                                                /* CONFIGURE VERB */
   status      = CLI$PRESENT(&p1_desc);         /* BALANCER       */
   if (status != CLI$_PRESENT) exit(status);
   status      = CLI$PRESENT(&p2_desc);         /* SAMPLES        */
   if (status != CLI$_PRESENT) exit(status);
   status      = CLI$PRESENT(&p3_desc);         /* PROCESSES      */
   if (status != CLI$_PRESENT) exit(status);
   status      = CLI$PRESENT(&p4_desc);         /* TIME           */
   if (status != CLI$_PRESENT) exit(status);
   status     = CLI$GET_VALUE(&p2_desc,&samp_desc);
   if (_failed(status) ) exit(status);
   status     = CLI$GET_VALUE(&p3_desc,&proc_desc);
   if (_failed(status) ) exit(status);
   status     = CLI$GET_VALUE(&p4_desc,&time_desc);
   if (_failed(status) ) exit(status);
   status     = CLI$PRESENT(&stat_desc);
   show_stats = (status == CLI$_PRESENT) ? 1 : 0;
   num_samples = atoi(samp_desc.dsc$a_pointer);
   if (num_samples <= 0) num_samples = 3;
   load_tolerance = (100 * (atoi(proc_desc.dsc$a_pointer) ) );
   if (load_tolerance <= 0) load_tolerance = 100;
   if (show_stats)
     printf("Args: Samples: %d, Processes: %d, Time: %s\n",
        num_samples,load_tolerance/100,time_desc.dsc$a_pointer);
   lockdown();                  /* Lock down the cpu_q subroutine */
   /* Get the page size and max members for this system */
   status = sys$getsyiw(EFN$C_ENF,0,0,&syi_pagesize_list,&g_iosb,0,0);
   if (_failed(status) ) return (status);
   if (max_instances == 0) max_instances = 1;
   /* Get our partition ID and initial CPU info */
   status = sys$getsyiw(EFN$C_ENF,0,0,&syi_itemlist,&g_iosb,0,0);
   if (_failed(status) ) return (status);
   /* Map two pages of shared memory */
   status = sys$crmpsc_gdzro_64(&gblsec_dsc,version_id,0,PAGESIZE+PAGESIZE,
            &region_id,0,PSL$C_USER,(SEC$M_EXPREG|SEC$M_SYSGBL|SEC$M_SHMGS),
            &gs_va,&gs_length);
   if (_failed(status) ) exit(status);
   /* Initialize the pointers into shared memory */
   busycnt   = (uint64 *) gs_va;
   heartbeat = (uint64 *) gs_va     + max_instances;
   cpucount  = (uint64 *) heartbeat + max_instances;
   gLocks    = (uint64 *) cpucount  + max_instances;
   cpucount[partition_id] = total_cpus;
   /* Create or map the Galaxy lock table */
   status = init_lock_tables();
   if (_failed(status) ) exit(status);
   /* Initialize delta time for sleeping */
   status = sys$bintim(&time_desc,&delta);
   if (_failed(status) ) exit(status);
   /*
   ** Register for CPU migration events. Whenever a CPU is added to
   ** our active set, the routine "update_cpucount" will fire.
   */
   status = sys$set_system_event(SYSEVT$C_ADD_ACTIVE_CPU,
              update_cpucount,0,0,SYSEVT$M_REPEAT_NOTIFY,&cpu_hndl);
   if (_failed(status) ) exit(status);
   /* Force everyone to resync before we do anything */
   for (j=0; j<max_instances; j++)
   {
     status = sys$acquire_galaxy_lock(gLocks[j],GLOCK_TIMEOUT,0);
     if (_failed(status) ) exit(status);
     heartbeat[j] = HEARTBEAT_RESTART;
     status = sys$release_galaxy_lock (gLocks[j]);
     if (_failed(status) ) exit(status);
   }
   printf("%%GCU$BALANCER-S-INIT, CPU balancer initialized.\n\n");
   /*** Main loop ***/
   do
   {
     /* Calculate a running average and update it */
     nprocs = sys$cmkrnl(cpu_q,&arglst) * 100;
     /* Check out our state... */
     switch (heartbeat[partition_id])
     {
       case HEARTBEAT_RESTART: /* Mark ourself for reinitialization. */
       {
         update_cpucount(0);
         status = sys$acquire_galaxy_lock(gLocks[partition_id],
                                          GLOCK_TIMEOUT,0);
         if (_failed(status) ) exit(status);
         heartbeat[partition_id] = HEARTBEAT_ALIVE;
         status = sys$release_galaxy_lock(gLocks[partition_id]);
         if (_failed(status) ) exit(status);
         break;
       }
       case HEARTBEAT_ALIVE: /* Update running average and continue. */
       {
         busy = (busycnt[partition_id]*(num_samples-1)+nprocs)/num_samples;
         busycnt[partition_id] = busy;
         break;
       }
       case HEARTBEAT_TRANSPLANT:  /* Waiting for a new CPU to arrive. */
       {
         /*
         ** Someone just either reset us, or gave us a CPU and put a wait
         ** on further donations.  Reassure the Galaxy that we're alive,
         ** and calculate a new busy count.
         */
         busycnt[partition_id] = nprocs;
         status = sys$acquire_galaxy_lock(gLocks[partition_id],
                                          GLOCK_TIMEOUT,0);
         if (_failed(status) ) exit(status);
         heartbeat[partition_id] = HEARTBEAT_ALIVE;
         status = sys$release_galaxy_lock(gLocks[partition_id]);
         if (_failed(status) ) exit(status);
         break;
       }
       default:         /* This should never happen. */
       {
         exit(0);
         break;
       }
     }
     /* Determine the most_busy instance. */
     for (most_busy_id=most_busy=i=0; i<max_instances; i++)
     {
       if (busycnt[i] > most_busy)
       {
         most_busy_id = (uint64) i;
         most_busy    = busycnt[i];
       }
     }
     if (show_stats)
       printf("Current Load: %3Ld, Busiest Instance: %Ld, Queue Depth: %4d\r",
              busycnt[partition_id],most_busy_id,(nprocs/100) );
     /* If someone needs a CPU and we have an extra, donate it. */
     if ( (most_busy > busy + load_tolerance) &&
          (cpucount[partition_id] > 1) &&
          (heartbeat[most_busy_id] != HEARTBEAT_TRANSPLANT) &&
          (most_busy_id != partition_id) )
     {
       reassign_a_cpu(most_busy_id);
     }
     /* Hibernate for a while and do it all again. */
     status = sys$schdwk(0,0,&delta,0);
     if (_failed(status) ) exit(status);
     status = sys$hiber();
     if (_failed(status) ) exit(status);
   } while (1);
   return (1);
}