HP OpenVMS Systems: Ask the Wizard
The Question is:

Batch jobs periodically 'stop'. No error messages in logs, no console messages. (And yessir, I've poked around the FAQs for a while.)

I have an AlphaServer 4100, VMS 7.1-2, 2.5 GB memory, disk shadowing in use. On a very unpredictable schedule, batch jobs running DEC BASIC .EXE images against Prolog 3 indexed files simply... stop. No indications of problems are detectable. Upon restart, the jobs will run to completion normally, and will run to completion without modification for many more cycles. The system has been checked for disk errors, I/O contention, process quotas, and a number of other 'obvious' problems. No luck. This problem has persisted for the better part of a two-year period. ANY assistance will be heartily appreciated.

Additional info: the software is 'off the shelf' third-party stuff that runs quite normally at other installations; the support engineers from this third-party provider have tried a number of fixes - no good. I have moved files from one disk to another (an attempt to reduce head contention) - no good. I have implemented a schedule of 'file rebuilds' using ANALYZE/RMS and CONVERT/FDL on all files that are involved - no good.

"Hopefully awaiting a blow from the magic stick." Thanks in advance.

The Answer is:

That the application software runs at one site has relatively little bearing on whether or not the application will run at another (and different) site -- site-specific latent application problems and site-specific coding dependencies are surprisingly common within application code. For some of the typical programming bugs that can lead to unpredictable behaviour, please see topics (1661) and (2681).

As a suggestion, establish a signal handler within the application images, and code the handler to report details of any errors.

Compare the PQL parameter settings for the default process quotas. Check the default mailbox quota parameters. Check the disk fragmentation levels.
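The PQL and mailbox quota checks above can be made from SYSGEN. What follows is a minimal sketch, assuming a suitably privileged account; DEFMBXBUFQUO and DEFMBXMXMSG are taken here to be the relevant default mailbox quota parameters, so confirm the names against the documentation for your OpenVMS version. Running the same procedure at a site where the application behaves gives something to compare against.

```dcl
$ ! Sketch only: display the active PQL_* process quota parameters
$ ! (defaults and minimums) and the default mailbox quota parameters.
$ ! Nothing is modified; SYSGEN is used purely for display here.
$ RUN SYS$SYSTEM:SYSGEN
USE ACTIVE
SHOW /PQL
SHOW DEFMBXBUFQUO
SHOW DEFMBXMXMSG
EXIT
```

The values shown are system-wide defaults and minimums; the quotas actually in effect for the batch jobs also depend on the account under which the jobs run.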
Check the OpenVMS system error log for any RMS bugchecks. Ensure you have all current mandatory ECOs for OpenVMS applied. Check the auditing logs for any unexpected use of WORLD privilege, and for unexpected use of the $forcex or $delprc system services. The $delprc call is used by the DCL command STOP/ID. (You may well have to enable these audits.) If there is privileged-mode code involved, consider setting the parameter BUGCHECKFATAL to cause non-fatal system bugchecks to be elevated to fatal OpenVMS system bugchecks -- rather than simply having the process terminate, a non-fatal bugcheck will then cause the OpenVMS system to crash (and to write a dumpfile).
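The auditing and BUGCHECKFATAL suggestions above might be approached as sketched below. This is not a definitive procedure: it assumes the SECURITY privilege, assumes the PROCESS audit class with its DELPRC and FORCEX keywords is available on your version, assumes the audit journal is in its default location, and assumes BUGCHECKFATAL is a dynamic parameter (if it is not, WRITE CURRENT and a reboot would be needed instead). Check each command against HELP SET AUDIT and the SYSGEN documentation before use.

```dcl
$ ! Show which audit and alarm classes are currently enabled.
$ SHOW AUDIT
$ !
$ ! Assumption: audit use of the $DELPRC and $FORCEX services, and any
$ ! successful or failed use of WORLD privilege.
$ SET AUDIT /AUDIT /ENABLE=PROCESS=(DELPRC,FORCEX)
$ SET AUDIT /AUDIT /ENABLE=PRIVILEGE=(SUCCESS:WORLD,FAILURE:WORLD)
$ !
$ ! After the next unexplained stop, review the matching audit records.
$ ! Assumption: the journal is in its default location.
$ ANALYZE /AUDIT /EVENT_TYPE=(PROCESS,PRIVILEGE) -
        SYS$MANAGER:SECURITY.AUDIT$JOURNAL
$ !
$ ! Only if privileged-mode code is involved: turn non-fatal bugchecks
$ ! into system crashes. This WILL crash the system on such a bugcheck.
$ RUN SYS$SYSTEM:SYSGEN
USE ACTIVE
SHOW BUGCHECKFATAL
SET BUGCHECKFATAL 1
WRITE ACTIVE
EXIT
```

Enabling these audit classes can add noticeably to the audit journal on a busy system, so it may be worth disabling them again once the cause of the stopped jobs has been found.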