DIGITAL Parallel Software Environment

Release Notes: HPF, PVM, and MPI

Part Number: AA-Q62ME-TE
September 1997

This document contains information about DIGITAL's High Performance Fortran (HPF), Parallel Virtual Machine (PVM), and Message Passing Interface (MPI) software.

Revision/Update Information: September, 1997

Operating System Versions:
  HPF Support: Digital UNIX V4.0 and higher
  PVM: Digital UNIX V4.0a and higher
  MPI: Digital UNIX V4.0a and higher

Software Versions:
  DEC Fortran 90 Version 5.0
  DIGITAL Parallel Software Environment Version 1.4
  DIGITAL PVM Version 1.4
  DIGITAL MPI Version 1.4

Digital Equipment Corporation
Maynard, Massachusetts

First Printed Edition, March 1995
Revision, September 1997

Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.

Possession, use, or copying of the software described in this publication is authorized only pursuant to a valid written license from Digital or an authorized sublicensor.

(c) Digital Equipment Corporation 1997. All Rights Reserved.

Digital believes the information in this publication is accurate as of its publication date; such information is subject to change without notice. Digital is not responsible for any inadvertent errors.

Digital conducts its business in a manner that conserves the environment and protects the safety and health of its employees, customers, and the community.

The following are trademarks of Digital Equipment Corporation: AdvantageCluster, Alpha AXP, AXP, Bookreader, DEC, DEC Fortran, DECconnector, DECmcc, DECnet, DECserver, DECstation, DECsystem, DECsupport, DECwindows, DELNI, DEMPR, Digital, GIGAswitch, POLYCENTER, ThinWire, TURBOchannel, TruCluster, ULTRIX, VAX, VAX DOCUMENT, VAX FORTRAN, VMS, and the DIGITAL logo.

Network File System and NFS are registered trademarks of Sun Microsystems, Inc. Motif is a registered trademark of Open Software Foundation, Inc., licensed by Digital; Open Software Foundation, OSF and OSF/1 are registered trademarks of Open Software Foundation, Inc. PostScript is a registered trademark of Adobe Systems, Inc. UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Ltd. X Window System is a trademark of Massachusetts Institute of Technology. All other trademarks and registered trademarks are the property of their respective holders.

This document is available on CD-ROM. This document was prepared using VAX DOCUMENT Version 2.1.

Contents

1 PVM Release Notes
  1.1 Prerequisite Software
  1.2 Supported Platforms
  1.3 Installation
    1.3.1 PVM Software Now Distributed with PSE
    1.3.2 MEMORY CHANNEL[TM] Patch Required
  1.4 Applications Must be Re-Linked for Version 1.4
  1.5 Version for Shared Memory Only
  1.6 Documentation (PostScript and HTML)
  1.7 Known Problems
    1.7.1 Large Messages
    1.7.2 exec Routines
    1.7.3 Exhausting Virtual Memory Resources
  1.8 Miscellaneous
    1.8.1 pvm_exit Blocks to Wait for an Available Receiver
    1.8.2 -pthread Compile-Line Argument Needed
    1.8.3 PVM Environment Variable Defaults
  1.9 Problems, Suggestions or Comments

2 MPI Release Notes
  2.1 Prerequisite Software
  2.2 Supported Platforms
  2.3 Compatibility
  2.4 Installation
    2.4.1 MPI Software Now Distributed with PSE
    2.4.2 MEMORY CHANNEL[TM] Patch Required
  2.5 Documentation (PostScript and HTML)
  2.6 New in Version 1.4
    2.6.1 Command-Line Options Changed
    2.6.2 stdin is Now Available
    2.6.3 Minor Bug Fixes
  2.7 Known Problems
    2.7.1 MPI_REQUEST_FREE
    2.7.2 MPI_CANCEL
    2.7.3 exec Routines
    2.7.4 Exhausting Virtual Memory Resources
    2.7.5 Error Message: "No MEMORY CHANNEL installed"
  2.8 Miscellaneous
    2.8.1 User-Level Access for MEMORY CHANNEL[TM]
    2.8.2 Shared Memory Segment Limit
    2.8.3 -ump_bufs Requires a Multiple of 32
  2.9 Problems, Suggestions or Comments

3 PSE Release Notes (HPF Support only)
  3.1 Re-Compile Existing Programs
  3.2 Updated Fortran Run-Time Library Required on All Nodes
  3.3 Overview
  3.4 Installation
    3.4.1 Sites Currently Using PSE
    3.4.2 Dataless Environments
    3.4.3 Ladebug Binaries
  3.5 Reporting Problems
  3.6 Software Versions
  3.7 High Performance Fortran Support
  3.8 PSE System Software Subset
    3.8.1 New and Changed Features for Version 1.3
    3.8.2 Features that First Appeared in Version 1.2
    3.8.3 Features That First Appeared in Version 1.1
    3.8.4 Known Problems
    3.8.5 Restrictions
  3.9 Parallel Programming Environment Subset
    3.9.1 New Features
    3.9.2 Known Problems
      3.9.2.1 Debugger
      3.9.2.2 Profiler
    3.9.3 Restrictions - Debugger
  3.10 PSE Network Kernel Binaries Subset
  3.11 PSE Documentation
    3.11.1 HPF and PSE Manual
    3.11.2 HPF Tutorial
    3.11.3 Reference Pages

4 HPF Compiler Release Notes
  4.1 Overview
  4.2 Re-Compile Existing Programs
  4.3 Updated Fortran Run-Time Library Required on All Nodes
  4.4 Optimization
    4.4.1 The -fast Compile-Time Option
    4.4.2 Non-Parallel Execution of Code and Data Mapping Removal
    4.4.3 INDEPENDENT DO Loops
      4.4.3.1 INDEPENDENT DO Loops Currently Parallelized
      4.4.3.2 INDEPENDENT DO Loops Containing Procedure Calls
    4.4.4 Nearest-Neighbor Optimization
  4.5 Unsupported Features
    4.5.1 Command-Line Options Not Compatible with the -wsf Option
    4.5.2 HPF_LOCAL Routines
    4.5.3 Non-Resident PURE Functions
    4.5.4 Nonadvancing I/O on stdin and stdout
    4.5.5 WHERE and Nested FORALL
  4.6 New Features
    4.6.1 SHADOW Directive Now Supported
    4.6.2 Pointers Now Handled in Parallel
    4.6.3 SHADOW Directive Required for Nearest-Neighbor POINTER or TARGET Arrays
    4.6.4 Descriptive Mapping Directives are Now Obsolescent
    4.6.5 New Support for HPF Local Library Routines GLOBAL_LBOUND and GLOBAL_UBOUND
    4.6.6 REDUCTION Clause in INDEPENDENT Directives
    4.6.7 HPF_SERIAL Restriction Lifted for Procedures Called from INDEPENDENT DO Loops
  4.7 Problems Fixed in This Version
  4.8 Obsolete Features Deleted
    4.8.1 GLOBAL_TO_PHYSICAL and GLOBAL_LBOUNDS are Deleted
  4.9 Known Problems
    4.9.1 Pointer Assignment Inside FORALL Unreliable
    4.9.2 ASSOCIATED Intrinsic is Unreliable
    4.9.3 Widths Given with the SHADOW Directive Agree with Automatically Generated Widths
    4.9.4 Using EOSHIFT for Nearest Neighbor Calculations
    4.9.5 "Variable used before its value has been defined" Warning
    4.9.6 GRADE_UP and GRADE_DOWN Are Not Stable Sorts
    4.9.7 Restrictions on Routines Compiled with -nowsf_main
  4.10 Miscellaneous
    4.10.1 What To Do When Encountering Unexpected Program Behavior
      4.10.1.1 Segmentation Faults
      4.10.1.2 Programs that Hang
      4.10.1.3 Programs with Zero Sized Arrays
    4.10.2 Stack and Data Space Usage
    4.10.3 Non-"-wsf" Main Programs
    4.10.4 Use the Extended Form of HPF_ALIGNMENT
    4.10.5 RAN and SECNDS Are Not PURE
    4.10.6 RANDOM_NUMBER Intrinsic is Serialized
    4.10.7 EXTRINSIC(SCALAR) Changed to EXTRINSIC(HPF_SERIAL)
    4.10.8 Mask Expressions Referencing Multiple FORALL Indices
  4.11 Example Programs

5 Comments, Problems, and Help
  5.1 Sending Digital Your Comments on This Product
  5.2 Getting Help from DIGITAL
  5.3 Readers Comments Form - Documentation
Tables
  1-1 Default Values for PVM Environment Variables

1 PVM Release Notes

1.1 Prerequisite Software

DIGITAL PVM Version 1.4 requires DIGITAL UNIX Version 4.0a or higher. In addition, if MEMORY CHANNEL[TM] support is needed, the following are required:

o DIGITAL TruCluster software Version 1.4 (or TruCluster MEMORY CHANNEL[TM] Software)
o A MEMORY CHANNEL[TM] patch (see Section 2.4.2)

1.2 Supported Platforms

DIGITAL PVM is a DIGITAL proprietary implementation of PVM for Alpha systems running DIGITAL UNIX. Both stand-alone SMP systems and MEMORY CHANNEL[TM] clusters are supported.

1.3 Installation

1.3.1 PVM Software Now Distributed with PSE

For convenience, DIGITAL PVM software is now distributed together with the DIGITAL Parallel Software Environment (PSE). For installation instructions, refer to the PSE installation guide.

DIGITAL PVM is a completely separate facility from PSE support for HPF. These facilities do not interoperate in any way. PSE support for HPF does not need to be installed in order to use DIGITAL PVM.

1.3.2 MEMORY CHANNEL[TM] Patch Required

DIGITAL PVM Version 1.4 requires Patch ID TCR141-013, or its successor, to be installed on all MEMORY CHANNEL[TM] cluster members. To ensure that you are installing the latest patch, send mail to pvm@ilo.dec.com. To obtain patches, use your regular DIGITAL support channel. If you have any questions or concerns, send mail to pvm@ilo.dec.com.

1.4 Applications Must be Re-Linked for Version 1.4

Programs previously linked with the PVM130 archive library must be re-linked to operate within the PVM140 environment.

1.5 Version for Shared Memory Only

The DIGITAL PVM installation provides two versions of PVM: a version for shared memory only, and a full version for both shared memory and MEMORY CHANNEL[TM]. Both versions are installed, but the installation script configures links to activate only one of them, based on whether TruCluster software is detected at install time. Scripts are provided to modify this configuration after installation.

For More Information:
o See the DIGITAL PVM User Guide

1.6 Documentation (PostScript and HTML)

The DIGITAL PVM User Guide can be found in PostScript format at /usr/opt/PVM140/pvm_guide.ps. It can also be found in HTML format on the Consolidated Layered Products Documentation CD-ROM.

1.7 Known Problems

The following problems with DIGITAL PVM were known at the time of release:

1.7.1 Large Messages

When a message larger than the channel size is sent, a second thread is started. The channel remains busy until the message is received, so subsequent sends on a busy channel are only queued to be sent. This can impact performance, because the queue is processed only from inside PVM function calls. The effect can be ameliorated by using larger PVM_MC_CHAN_SIZE and/or PVM_SM_CHAN_SIZE parameters.

1.7.2 exec Routines

A process that has called a PVM routine may fail if it calls execl, execv, execle, execve, execlp, or execvp; the exec routine returns EWOULDBLOCK. The problem can be avoided if the process calls fork and calls the exec routine in the child process.

1.7.3 Exhausting Virtual Memory Resources

When a process that has called a PVM routine forks, the child process sometimes loses some virtual memory resources. If an application uses multiple generations of processes (the parent makes a PVM call and then forks a child, which makes a PVM call and then forks, and so on), the application may run out of vm-mapentries. The number of vm-mapentries available to an application can be changed using the sysconfig(8) command.
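For example, here is a sketch of how the current limit might be examined and raised with sysconfig. The subsystem name (vm) and the new value are assumptions for illustration only; check your system's documentation for the attribute's actual subsystem and an appropriate value:

  # sysconfig -q vm vm-mapentries
  # sysconfig -r vm vm-mapentries=400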
1.8 Miscellaneous

1.8.1 pvm_exit Blocks to Wait for an Available Receiver

A PVM task can send messages before a receiver task is available to read those messages. Such messages are queued in the sender task. If the sender task calls pvm_exit before a receiver task is available to read a message, the pvm_exit call blocks until a receiver task becomes available.

1.8.2 -pthread Compile-Line Argument Needed

The -pthread argument must be included on the compile line.

1.8.3 PVM Environment Variable Defaults

The default values of the DIGITAL PVM environment variables are described in Table 1-1.

Table 1-1 Default Values for PVM Environment Variables

  Environment Variable    Default Value
  PVM_BUF_SIZE            256000
  PVM_MC_CHAN_SIZE        204800
  PVM_SM_CHAN_SIZE        204800
  PVM_NUM_BUFS            100

Note: All the "size" parameters should be multiples of 1024.
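For example, a hypothetical csh session that enlarges the buffer and channel sizes before running an application; the values are illustrative and, as the note above requires, each is a multiple of 1024:

  % setenv PVM_BUF_SIZE 512000
  % setenv PVM_MC_CHAN_SIZE 409600
  % setenv PVM_SM_CHAN_SIZE 409600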
1.9 Problems, Suggestions or Comments

Any problems, suggestions, or comments should be addressed to pvm@ilo.dec.com.

2 MPI Release Notes

2.1 Prerequisite Software

DIGITAL MPI Version 1.4 requires DIGITAL UNIX Version 4.0a or higher. The Fortran run-time libraries are also required. In addition, if MEMORY CHANNEL[TM] support is needed, the following are required:

o DIGITAL TruCluster software Version 1.4 (or TruCluster MEMORY CHANNEL[TM] Software)
o A MEMORY CHANNEL[TM] patch (see Section 2.4.2)

2.2 Supported Platforms

DIGITAL MPI is a DIGITAL proprietary implementation of MPI for Alpha systems running DIGITAL UNIX. Both stand-alone SMP systems and MEMORY CHANNEL[TM] clusters are supported.

2.3 Compatibility

DIGITAL MPI is compatible with version 1.0.12 of MPICH from Argonne National Labs, and runs over shared memory and MEMORY CHANNEL[TM]. Applications built using DIGITAL MPI cannot communicate via MPI with applications built using MPICH.

2.4 Installation

2.4.1 MPI Software Now Distributed with PSE

For convenience, DIGITAL MPI software is now distributed together with the DIGITAL Parallel Software Environment (PSE). For installation instructions, refer to the PSE installation guide.

DIGITAL MPI is a completely separate facility from PSE support for HPF. These facilities do not interoperate in any way. PSE support for HPF does not need to be installed in order to use DIGITAL MPI.

2.4.2 MEMORY CHANNEL[TM] Patch Required

DIGITAL MPI Version 1.4 requires Patch ID TCR141-013, or its successor, to be installed on all MEMORY CHANNEL[TM] cluster members. To ensure that you are installing the latest patch, send mail to mpi@ilo.dec.com. To obtain patches, use your regular DIGITAL support channel. If you have any questions or concerns, send mail to mpi@ilo.dec.com.

2.5 Documentation (PostScript and HTML)

The DIGITAL MPI User Guide can be found in PostScript format at /usr/opt/MPI140/mpi_guide.ps. It can also be found in HTML format on the Consolidated Layered Products Documentation CD-ROM.

There is a sample MPI program, with instructions, in /usr/examples/mpi. Reference pages for all of the MPI routines are included in this distribution and can be accessed with the man command.

2.6 New in Version 1.4

The following are new in MPI Version 1.4.

2.6.1 Command-Line Options Changed

In DIGITAL MPI Version 1.4, there are a few differences in the run-time command-line options:

o The option -nh is deleted
o The option -pf is deleted
o The effect of the option -ump_key has changed

For More Information:
o See the DIGITAL MPI User Guide.

2.6.2 stdin is Now Available

In Version 1.3, stdin was closed on all processes. It is now available to the process with rank 0 in MPI_COMM_WORLD.

2.6.3 Minor Bug Fixes

A number of small bugs in MPI behavior have been fixed. Applications that did not expose these bugs should not see any difference.

2.7 Known Problems

These are the known restrictions in DIGITAL MPI Version 1.4:

2.7.1 MPI_REQUEST_FREE

MPI_REQUEST_FREE does not work properly when the request is not completed (the MPICH portable code has the same problem).

2.7.2 MPI_CANCEL

MPI_CANCEL does not work (the MPICH portable code does not implement it).

2.7.3 exec Routines

A process that has called MPI_INIT may fail if it calls execl, execv, execle, execve, execlp, or execvp; the exec routine returns EWOULDBLOCK. This problem can be avoided if the process calls fork and calls the exec routine in the child process.

2.7.4 Exhausting Virtual Memory Resources

When a process that has called MPI_INIT forks, the child process sometimes loses some virtual memory resources. If an application uses multiple generations of processes (the parent makes MPI calls and then forks a child, which makes MPI calls and then forks, and so on), the application may run out of vm-mapentries. The number of vm-mapentries available to an application may be changed using the sysconfig(8) command.

2.7.5 Error Message: "No MEMORY CHANNEL installed"

A call to MPI_INIT that fails with an error reporting "no MEMORY CHANNEL installed" usually indicates that the MEMORY CHANNEL[TM] patch has not been installed.

For More Information:
o See Section 2.4.2

2.8 Miscellaneous

2.8.1 User-Level Access for MEMORY CHANNEL[TM]

If you are using MEMORY CHANNEL[TM], ensure that each machine in the cluster that you intend to use has been initialized for user-level access. You can check this by searching for a process called imc_mapper on each machine. For example:

  # ps a | grep imc_mapper | grep -v grep
    PID  TTY S       TIME CMD
    657 ttyp2 U   0:00.01 /usr/sbin/imc_mapper

If this process does not exist on any host that you intend to use as part of the cluster, execute the following command (as root) on each such host:

  # /usr/sbin/imc_init

This only needs to be executed once, and it remains in effect until the machine is next shut down.

2.8.2 Shared Memory Segment Limit

DIGITAL MPI uses shared memory for communication within a single host. The default system-wide maximum shared-memory segment a process can allocate is 4 MB. For programs with a large number of processes, this may need to be increased. This is done by editing the /etc/sysconfigtab file and adding (or modifying) the following entry:

  ipc:
      shm-max = size-in-bytes

For this change to take effect, a reboot is necessary.
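For example, a sketch of an /etc/sysconfigtab entry that raises the limit to 64 MB; the value is illustrative, so choose a size appropriate to your application mix:

  ipc:
      shm-max = 67108864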
2.8.3 -ump_bufs Requires a Multiple of 32

If you specify a buffer size using the -ump_bufs option of the dmpirun command, you must specify a size that is a multiple of 32. Note that you are wasting memory resources if the size you specify is not a multiple of 8192.

2.9 Problems, Suggestions or Comments

Any problems, suggestions, or comments should be addressed to mpi@ilo.dec.com.

3 PSE Release Notes (HPF Support only)

This chapter contains the release notes for DIGITAL Parallel Software Environment (PSE) HPF support. The release notes for the HPF components of the DIGITAL Fortran compiler are contained in Chapter 4.

The information in this chapter does not apply to PVM or MPI. For PVM, see Chapter 1. For MPI, see Chapter 2.

This software version requires DIGITAL UNIX Version 4.0 and higher. DIGITAL TruCluster or MemoryChannel device driver software is required to use MemoryChannel as a system interconnect.

3.1 Re-Compile Existing Programs

The current versions of the HPF run-time library and of DIGITAL Fortran work only with each other; neither is compatible with earlier versions of the other. Newly compiled programs will not run correctly with the old HPF RTL, and previously compiled programs will not run correctly with the new HPF RTL.

The current version of the HPF run-time library is the PSEHPF140 subset of Version 1.4 of the DIGITAL Parallel Software Environment. The current version of DIGITAL Fortran is 5.0.

This means that programs compiled with compiler versions earlier than V5.0 will no longer run correctly. Re-linking is not sufficient; they must be recompiled and relinked.

3.2 Updated Fortran Run-Time Library Required on All Nodes

Version 374 or higher of the Fortran run-time library (Fortran RTL) must be installed on every node in your PSE cluster, not just the nodes on which you use the compiler, unless you always link using the -non_shared option. This version of the Fortran RTL contains fixes for a number of problems in earlier versions, particularly in non-advancing I/O.

The Fortran RTL can be installed using the procedure described in the Digital Fortran Installation Guide for Digital UNIX, installing only the Fortran RTL subset. You do not need a Fortran PAK to install or use the Fortran RTL.

3.3 Overview

PSE supports the development and running of distributed parallel applications on clusters of one or more Alpha systems running the DIGITAL UNIX operating system. High Performance Fortran (HPF) is the language supported today, but PSE's run-time support is generic and could be used to support other languages in the future.

HPF applications compiled with DIGITAL Fortran make use of PSE's run-time libraries and services to run in parallel. Application startup, data exchange between processes, and I/O operations are handled transparently by PSE.
Components include:

o System software to manage application execution
o PSE cluster configuration and monitoring tools
o A parallel profiler
o Parallel debuggers (dbx in n windows / Ladebug in n windows)

Note: The PSE software and its environment are not required for either the creation or compilation of HPF programs. HPF programs can be compiled and debugged in scalar mode (on a single processor), whether or not that processor is a member of a PSE cluster. This mode of compilation produces a nonparallel application. However, certain aspects of parallel program profiling, and any execution of a parallel program, do require the PSE environment.

3.4 Installation

3.4.1 Sites Currently Using PSE

Running different versions of PSE within the same cluster is not supported. Sites that are upgrading to Version 1.4 of PSE must delete older versions of PSE on all cluster members and install Version 1.4. This includes Field Test versions of PSE 1.4.

Note: Deleting any of the PSE subsets will affect any currently running HPF applications. Make sure these applications terminate before proceeding.

If you are upgrading to Version 1.4 from a previous version of PSE, you can complete your installation by following these steps:

o Delete the existing software on all PSE cluster members, as follows:

  1. Log in as superuser (login name root).

  2. Make sure you are at the root directory (/) by entering the following command:

     # cd /

  3. Enter the following form of the setld command:

     # setld -i | grep PSE

  4. Look for the word "installed" in the listing produced. You can delete any subset names displayed in response to the preceding command. Deletion is accomplished with the following command:

     # setld -d PSEHPFnnn PSEPPEnnn PSEWSFnnn PSEKBINxxxnnn PSEMANnnn

     In this command, xxx is the UNIX version number, and nnn is the PSE version number. Use 080 for Version 0.8, 110 for Version 1.1, and so on. For example, to delete the PSE Version 1.0 subsets for DIGITAL UNIX Version 3.0, type the following command:

     % setld -d PSEHPF100 PSEPPE100 PSEWSF100 PSEKBIN300100 PSEMAN100

  5. Repeat steps 1-4 for each host that is a PSE cluster member.

o Install the new PSE software, as described in the installation guide and in the DIGITAL High Performance Fortran HPF and PSE Manual.

o As superuser (login name root), type the following command:

  # pseconfig add existingfarm

  where existingfarm is the name of the pre-existing PSE cluster.

o [Optional] The format of the PSE_PREF_COMM environment variable changed beginning with Version 1.1. To allow support of PSE features such as automatic selection of messaging medium, heterogeneous networks, and MemoryChannel support, use the psedbedit command to delete any PSE_PREF_COMM definitions for PSE versions earlier than 1.1.

Note: If you are using the PSE_PREF_COMM environment variable, command-line switch, or database entry, make sure its value is compatible with PSE Version 1.1 or higher. Beginning with Version 1.1, PSE_PREF_COMM must be a string that lists, in descending order of preference, the communication technologies that should be used.
The possible values, in the default preference order, are "shm" for shared memory, "mc" for memory channel, "atm" for asynchronous transfer mode, "fddi" for fiber distributed data interface, and "ethernet" for Ethernet. For example:

  % my_program -pref_comm mc,fddi

The DIGITAL High Performance Fortran HPF and PSE Manual contains a detailed description of the installation procedure.

For More Information:
o On setld, see the DIGITAL High Performance Fortran HPF and PSE Manual.
o On psedbedit, see the DIGITAL High Performance Fortran HPF and PSE Manual.
o On PSE_PREF_COMM, see the DIGITAL High Performance Fortran HPF and PSE Manual.

3.4.2 Dataless Environments

The PSE subsets can be installed in the DIGITAL UNIX dataless environment, but only through the dmu utility; pse-remote-install cannot be used for this purpose. Once the PSE subsets are installed in the dataless environment, it is recommended that all clients registered for this environment be deleted and re-added. You must then issue the following command on each client:

  # pseconfig add clustername

where clustername is:

o "psefarm" for basic (non-database) PSE clusters
o the pathname of the database file for file-based PSE clusters
o a PSE cluster name for DNS-based PSE clusters

3.4.3 Ladebug Binaries

The PSE software kit includes separate binaries of the PSE debugger (Ladebug in n windows) for various versions of DIGITAL UNIX. When the software is installed, all the different Ladebug binaries are unpacked into the /usr/opt/PSE140/ppe/usr/lib/wsf/ directory. Then a link is made from /usr/library/wsf/ladebug to the most recent operating-system-specific Ladebug for the resident operating system.

3.5 Reporting Problems

For detailed information on how to report problems, send comments, or get help from Customer Service Support, refer to Chapter 5.

3.6 Software Versions

The setld subsets relevant to HPF support are:

o PSEHPF140 - High Performance Fortran Support
o PSEPPE140 - Parallel Programming Environment (HPF)
o PSEWSF140 - PSE System Software (HPF)
o PSEMAN140 - Parallel Software Environment Manual Pages (HPF)

3.7 High Performance Fortran Support

PSE Version 1.4 is compatible only with DIGITAL Fortran Version 5.0, and vice versa. In addition, successful execution of parallel HPF programs using PSE requires a recent version of the Fortran RTL (fortrtl_374 or higher) on every PSE peer in the cluster.

For More Information:
o On installing the correct version of the Fortran RTL throughout the PSE cluster, see Section 4.3.

3.8 PSE System Software Subset

This section describes the new features of, and known problems with, the PSE system software subset.

3.8.1 New and Changed Features for Version 1.3

The PSE Network Kernel Binaries subset (support for UDP_prime) is no longer provided.

3.8.2 Features that First Appeared in Version 1.2

The following features first appeared in PSE Version 1.2:

o The HPF RTL (PSEHPF subset) was updated.
o The PSE system software subset was enhanced to improve bandwidth when using MemoryChannel.
3.8.3 Features That First Appeared in Version 1.1

The following features first appeared in PSE Version 1.1:

o Support for a basic PSE cluster, without a user-maintained database. This allows PSE to be installed and applications up and running within minutes. Hosts on the network that are running PSE automatically detect each other and form a single-partition PSE cluster. A basic PSE cluster is the default at install time.

  The pseconfig, lspart, pspart, and psemon utilities have been enhanced to support basic PSE clusters. pseconfig now supports three new command arguments:

  - addbasic, to configure a basic PSE cluster
  - delbasic, to de-configure a basic PSE cluster
  - modbasic, to modify a basic PSE cluster

  For additional functionality, a user-maintained database can optionally be added later by using the psedbedit utility.

o Support for a mix of communications types within an application. For example, an application running on a cluster of SMPs connected with MemoryChannel or FDDI will now, by default, use shared memory to communicate between peer processes within an SMP, and FDDI or MemoryChannel to communicate between machines. Multiple peers that are run on a single processor (-virtual) now use shared memory to communicate.

o MemoryChannel support. PSE now supports DIGITAL's low-latency, high-bandwidth MemoryChannel interconnect. The steps required to use MemoryChannel with HPF and PSE are:

  - Install and configure the MemoryChannel hardware
  - Install and configure the TruCluster software (send mail to pse@hpc.pko.dec.com for assistance)
  - Install and configure PSE V1.1 (or higher) on each machine in your TruCluster
  - Build your application using DIGITAL Fortran V4.0 or higher. If you will be linking -non_shared, linking must be done on a machine with TruCluster (see Section 3.8.5).

o Asynchronous Transfer Mode (ATM) support. PSE has been tested using ATM as the interconnect between members of a cluster. Install and configure your ATM network as described in the DIGITAL UNIX and DIGITAL ATM hardware documentation.

o New values for PSE_PREF_COMM. If you are using the PSE_PREF_COMM environment variable, database entry, or the -pref_comm command-line switch, you should update its value to be compatible with PSE Version 1.1 or higher. Beginning with Version 1.1, PSE_PREF_COMM should be a string that lists, in descending order of preference, the communication technologies that should be used. The possible values, in the default preference order, are "shm" for shared memory, "mc" for MemoryChannel, "atm" for asynchronous transfer mode, "fddi" for fiber distributed data interface, and "ethernet" for Ethernet. For example:

  % myprogram -pref_comm mc,fddi

o New command-line switches when running an application:

  - -mc_no_errck - Do not check for MemoryChannel errors on each send and receive of a message (the default is to check for errors). Because the error rate of the MemoryChannel hardware is so low, this mode of operation is provided to eliminate the error-checking overhead.

  - -connections - Display a connection map that shows the interconnect technology and protocol used for communications between peers.

  - -pref_comm a,b,c - Specify the preferred communications technologies that the application should use. The argument options are shm, mc, atm, fddi, and ethernet.
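For example, a hypothetical invocation that displays the connection map and restricts an application to shared memory and FDDI (myprogram stands for any PSE application executable):

  % myprogram -connections -pref_comm shm,fddi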
o Starting with V4.0 of DIGITAL UNIX, a user can set the environment variable _RLD_ARGS to -log_stderr to have all loader errors reported on standard error. This lets you see loader-related errors that occur on other machines in your PSE cluster when running an application.

o Foreground and background color specifications for psemon in the /usr/lib/X11/app-defaults/PSEmon resource file have been commented out. Your window manager's default foreground and background colors are used instead.

o The farm menu interface of psemon has been enhanced to support displaying basic, file-based, and DNS-based PSE clusters. For more information, refer to the psemon(1) manpage.

o Improved support for file-based (non-DNS) PSE clusters. The pspart, lspart, pseconfig, psedbedit, and psemon commands now work with file-based PSE clusters if the PSE_FARM environment variable is set to the full pathname of the PSE cluster database file. The following is an example of a file-based PSE cluster definition database. For more information, see the HPF and PSE Manual.

  #
  # This is a very basic template for a file based (tree) cluster
  # containing two nodes: ash and birch in a single partition
  # called "compute"
  #
  # A portnumber is assigned, e.g. 7298
  # Though the PSE_PREF_COMM definition is not required as
  # we've enumerated the default, the system administrator should
  # be able to eliminate or reorder the comm options as desired
  # by the user community.
  #
  # The system manager can modify /etc/csh.login to
  # include:
  #
  # setenv PSE_FARM //
  #
  # User can change the PSE_FARM environment variable or override it
  # by using the -farm command line switch when running an application.
  #
  # Partition name or       Token                    Value
  # configuration data
  #
  compute                   PSE_MEMBERS              ash birch
  compute                   PSE_PRIORITY_LIMIT       19
  configuration_data        PSE_LOADSERVERS          ash
  configuration_data        PSE_DEFAULT_PARTITION    compute
  configuration_data        PSE_PARTITIONS           compute
  configuration_data        PSE_PREF_COMM            shm mc atm fddi ethernet
  configuration_data        PSE_SERVICE              7298
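For example, a hypothetical csh session that points the PSE utilities at such a database file (the pathname is illustrative) and then lists the partitions:

  % setenv PSE_FARM /usr/local/pse/compute.db
  % lspart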
3.8.4 Known Problems

The following known problems have been identified:

o Status value returned by the C shell. The exit status values returned by the C shell (csh> echo $status) are incorrect when a program exits because of the receipt of a signal (for example, a segmentation fault). Signal-related exit values should be reported as the signal value + 128. Currently, signal-related exit values are calculated by summing the signal value with 128 and interpreting the result as a signed 8-bit value. For example, if a segmentation fault occurs, an exit value of -117 is returned: the signal value of 11 is summed with 128, and the result (139) is interpreted as an 8-bit signed number.

o .rhosts. A frequently reported problem for PSE users is the message "Permission denied. Startup of io_manager on host (foo.com) failed!" when running an application. To avoid this, check that the hostnames of all machines from which applications will be run are listed in the .rhosts file in your home directory.

o psedbedit. As far as the Domain Name System (DNS) is concerned, a PSE cluster name is case insensitive; for example, mycluster.univ.edu is equivalent to MyCluster.univ.edu. However, the PSE cluster DNS database file and the PSE cluster IP service port entry in /etc/services are case sensitive. psedbedit does not handle PSE cluster name case insensitivity correctly.

  Consider the following situation. A new PSE cluster named MyCluster was created. At a later time, psedbedit was run to edit MyCluster, but this time mycluster was specified as the PSE cluster name. psedbedit properly loaded the MyCluster database from DNS, because the PSE cluster name is case insensitive. But because the specified PSE cluster name was mycluster, it failed to recognize the case insensitivity when checking /etc/services and when checking for the existence of the /etc/namedb/MyCluster.db file.

  The workaround for this problem is to always use lowercase letters for PSE cluster names, or to always specify the PSE cluster name with the same capitalization as when it was created. The original capitalization can be seen by listing the /etc/namedb directory.

o psemon. psemon might fail to allocate a color when other applications currently running have already allocated all of the shared colormap entries. When this happens, psemon displays an error dialog box and changes its color model to monochrome. There is currently no way of changing the color model back to color when the other applications deallocate the colormap entries. The workaround is to exit psemon and restart it.

  If you define a default window foreground color that is very light, the PSE cluster members' nodenames might not be readable, because their foreground color is the default window foreground that you have defined. The workaround is to copy /usr/lib/X11/app-defaults/PSEMon to your home directory and add the following entry:

  Tk.f3*Foreground: newcolor

  where newcolor is a color that gives better contrast with psemon's background color.

o pse-remote-install. pse-remote-install does not currently perform subset dependency reordering of the subsets given as its arguments. For example, consider the two subsets PSEHPF110 and PSEWSF110, where PSEWSF110 depends on PSEHPF110. On deletion, for example:

  pse-remote-install -d . . . PSEHPF110 PSEWSF110

  pse-remote-install refuses to delete the PSEHPF110 subset, even though PSEWSF110 is specified as another subset to be deleted. Similarly, on loading:

  pse-remote-install -l . . . PSEWSF140 PSEHPF140

  pse-remote-install refuses to install PSEWSF140 because its dependency, PSEHPF140, is not yet installed, even though PSEHPF140 is specified as another subset to be installed. This problem will be fixed in a future release. The current workaround is to specify the subset names in the proper dependency order.

  pse-remote-install cannot be used to load or delete subsets in a DIGITAL UNIX dataless environment. Use the dmu(8) utility to load, and the setld(8) utility to delete.

o io_manager. Startup failures due to lack of system resources (low swap space, out of processes, and so forth) are not reported correctly to the controlling process. This results from not attaching to /dev/tty. As a consequence, users may experience application exits with a non-zero exit status but without any visible error explanation.

o farmd. When farmd is started by hand, it should be run as a background process:

  % /usr/sbin/farmd -farm -xyz &

  An alternative to starting farmd by hand is to use lspart -jobslots, which triggers inetd to start farmd.
3.8.5 Restrictions

The following restrictions apply:

o Use of the fork() and exec() system services is not supported in PSE Version 1.4.

o Applications linked with the -non_shared option must have the MemoryChannel library libimc.a available on the local machine at link time in order to run on a machine with MemoryChannel. When this library is not available on the machine doing the linking, there are four workarounds:

  - Run the application specifying -c with a list of arguments that excludes "mc". For example:

    % myprogram -c shm,tcp

    This example uses TCP (possibly over the MemoryChannel hardware!) for messaging between machines.

  - Link -call_shared instead of -non_shared.

  - Link on a machine that has libimc.a available.

  - Arrange for libimc.a to be available on the non-MemoryChannel machine doing the linking.

3.9 Parallel Programming Environment Subset

This section describes the new features of, and known problems with, the Parallel Programming Environment subset.

3.9.1 New Features

None.

3.9.2 Known Problems

This section describes the known problems with the Parallel Programming Environment subset.

3.9.2.1 Debugger

dbx Version 3.11.6 is shipped with PSE as an alternative to Ladebug. dbx Version 3.11.10 is shipped with V4.0A of Digital UNIX.

3.9.2.2 Profiler

The following are the known problems with the pprof Profile Analysis Tool:

o Reduction and nearest-neighbor communications are not profiled.
o The interval profiling report (-comm_variables) currently displays variable names in the form of compiler-generated "mangled" names, which may not be the same as the original names in the program.

3.9.3 Restrictions - Debugger

The following restrictions apply to the debugger:

o The modified version of dbx supplied with PSE requires the DEC OSF/1 Developers' Extensions (OSF-DEV) license.
o In order to use the modified version of Ladebug that is provided with PSE, all PSE cluster members must meet the pre-install requirements for Ladebug as stated in the Ladebug cover letter. If the Ladebug pre-install requirements are not met, either improper operation will occur or the modified dbx might be used instead.

3.10 PSE Network Kernel Binaries Subset

The PSE Network Kernel Binaries subset (support for UDP_prime) is no longer provided, beginning with PSE Version 1.3.

3.11 PSE Documentation

The documentation and man pages have been updated for this release.

3.11.1 HPF and PSE Manual

The HPF and PSE Manual is now available in HTML format on the Consolidated Layered Products Documentation CD-ROM.

3.11.2 HPF Tutorial

The HPF Tutorial is a series of textbook-style chapters explaining the development of example programs in the HPF language. These chapters include basic explanations of HPF programming for Fortran programmers who are still learning the HPF language, along with scattered tips about DIGITAL's implementation of HPF that will be useful even to experienced HPF programmers. The presentation is geared specifically toward using DIGITAL's implementation of HPF, and refers to (included) usable source code of working HPF programs.

The HPF Tutorial is included in the HPF and PSE Manual. It is also included in the PSE kit as the file /usr/opt/PSE140/docs/pse_tutor.ps (PostScript format).
3.11.3 Reference Pages

The reference pages are available online using the man command. Reference pages with the .3hpf extension are supplied by both the PSEMAN140 and the PSESHPF107 software subsets, in two directories: /usr/opt/PSE140/man/usr/share/man/man3 and /usr/lib/cmplrs/hpfrtl_107, respectively. These reference pages are identical. The symbolic links point to whichever subset directory was installed most recently. If you want to force these symbolic links to point to a specific subset directory, run one of the following commands:

For /usr/opt/PSE140/man/usr/share/man/man3:

  % setld -c PSEMAN140 relink

For /usr/lib/cmplrs/hpfrtl_107:

  % setld -c PSESHPF107 select

When one of the subsets is deleted, the subset deletion script does the following:

o PSEMAN140 deletion:
  - Removes the /usr/man/man3/*.3hpf symbolic links
  - If /usr/lib/cmplrs/hpfrtl exists (indicating that the HPF Scalar Libraries subset is installed), restores the symbolic links to point to /usr/lib/cmplrs/hpfrtl/*.3hpf

o PSESHPF107 deletion:
  - Removes the /usr/man/man3/*.3hpf symbolic links
  - If the PSEMAN140 subset is installed, restores the symbolic links to point to /usr/opt/PSE140/man/usr/share/man/man3/*.3hpf
  - If a PSESHPFnnn subset is installed, where nnn is the next highest version, restores the symbolic links to that version. This overwrites the previously restored PSEMAN140 symbolic links.

4 HPF Compiler Release Notes

This chapter contains the release notes for the High Performance Fortran (HPF) components of the DIGITAL Fortran compiler, Version 5.0. PSE software is required for parallel execution of HPF programs. The release notes for the DIGITAL Parallel Software Environment (PSE) are contained in Chapter 3.

Any last-minute changes are documented in the online release notes, found in /usr/opt/PSE140/docs. The PostScript version is pse140_relnotes.ps and the text version is pse140_relnotes.txt.

This software version requires DIGITAL UNIX Version 4.0 and higher.

4.1 Overview

This chapter contains the release notes for using the DIGITAL Fortran compiler Version 5.0 with PSE.

4.2 Re-Compile Existing Programs

The current versions of the HPF run-time library and of DIGITAL Fortran work only with each other; neither is compatible with earlier versions of the other. Newly compiled programs will not run correctly with the old HPF RTL, and previously compiled programs will not run correctly with the new HPF RTL.

The current version of the HPF run-time library is the PSEHPF140 subset of Version 1.4 of the DIGITAL Parallel Software Environment. The current version of DIGITAL Fortran is 5.0.

This means that programs compiled with compiler versions earlier than V5.0 will no longer run correctly. Re-linking is not sufficient; they must be recompiled and relinked.
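For example, a hypothetical recompile and relink of an HPF source file for parallel execution; the file name and processor count are illustrative, and -wsf is the parallel compilation option discussed throughout these notes:

  % f90 -wsf 8 myprog.f90 -o myprog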
4.3 Updated Fortran Run-Time Library Required on All Nodes

Version 374 or higher of the Fortran run-time library (Fortran RTL) must be installed on every node in your PSE cluster, not just the nodes on which you use the compiler, unless you always link using the -non_shared option. This version of the Fortran RTL contains fixes for a number of problems in earlier versions, particularly in non-advancing I/O.

The Fortran RTL can be installed using the procedure described in the Digital Fortran Installation Guide for Digital UNIX, installing only the Fortran RTL subset. You do not need a Fortran PAK to install or use the Fortran RTL.

4.4 Optimization

This section contains release notes relevant to increasing code performance. For more detail, refer to Chapter 7 of the DIGITAL High Performance Fortran HPF and PSE Manual.

4.4.1 The -fast Compile-Time Option

To get optimal performance from the compiler, use the -fast option if possible. You may use -fast (or -assume nozsize) if your program does not reference any zero-sized arrays or array sections. If neither of these options is selected, the compiler is required to insert a series of checks to guard against irregularities (such as division by zero) that zero-sized data objects can cause in the generated code. Depending upon the particular application, these checks can have a noticeable (or even major) effect on performance.

The -fast and -assume nozsize compile-time options may not be used in a program where lines containing any zero-sized arrays or array sections are executed. If any line containing zero-sized arrays is executed in a program compiled with either of these options, incorrect program results occur.

If you suspect that an array or array section named on a certain program line may be zero-sized, you can insert a run-time check that prevents execution of that line whenever the array or array section is zero-sized. An array or array section is zero-sized if the difference between the UBOUND and LBOUND of any of its dimensions is less than zero. The following expresses this check in Fortran 90:

  if (ANY((UBOUND(A) - LBOUND(A)) < 0))

If you mask out the execution of all occurrences of zero-sized arrays or array sections using run-time checks such as this, you may compile the program with the -fast or -assume nozsize compiler options.

4.4.2 Non-Parallel Execution of Code and Data Mapping Removal

The following constructs are not handled in parallel:

o Reductions with a non-constant DIM argument
o CSHIFT, EOSHIFT, and SPREAD with a non-constant DIM argument
o Some array constructors
o PACK, UNPACK, and RESHAPE
o xxx_PREFIX, xxx_SUFFIX, GRADE_UP, and GRADE_DOWN
o All I/O operations: in the current implementation of DIGITAL Fortran, I/O is serialized through a single processor (see Chapter 7 of the DIGITAL High Performance Fortran HPF and PSE Manual for details)
o Date and time intrinsics, including DATE_AND_TIME, SYSTEM_CLOCK, DATE, IDATE, TIME, and SECNDS
o Random number intrinsics, including RANDOM_NUMBER, RANDOM_SEED, and RAN

If an expression contains a non-parallel construct, the entire statement containing the expression is executed in a non-parallel fashion. The use of such constructs can degrade performance. DIGITAL recommends avoiding these constructs in the computationally intensive kernel of a routine or program.
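As a minimal illustration of the first item in the list above, a SUM reduction with a compile-time-constant DIM argument is eligible for parallel execution, while the same reduction with a run-time DIM value causes the whole statement to execute serially (names are illustrative):

      REAL A(100,100), S1(100), S2(100)
      INTEGER K
      A = 1.0
      READ *, K            ! K is known only at run time
      S1 = SUM(A, DIM=1)   ! constant DIM: eligible for parallel execution
      S2 = SUM(A, DIM=K)   ! non-constant DIM: the statement is serialized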
4.4.3 INDEPENDENT DO Loops

A number of conditions must be met in order for the compiler to parallelize INDEPENDENT DO loops. These conditions are described in the following two sections.

4.4.3.1 INDEPENDENT DO Loops Currently Parallelized

Not all INDEPENDENT DO loops are currently parallelized. It is important to use the -show hpf or -show hpf_indep compile-time option, which gives a message whenever a loop marked INDEPENDENT is not parallelized.

Currently, a nest of INDEPENDENT DO loops is parallelized whenever the following conditions are met:

o Every array subscript either:
  - contains no references to INDEPENDENT DO loop variables, or
  - contains one reference to an INDEPENDENT DO loop variable, and the subscript expression is an affine function of that DO loop variable.

o At least one array reference references all the independent loops in a nest of independent loops.

o The loop nest either:
  - requires no inter-processor communication, or
  - can be made to require no inter-processor communication with compiler-generated copy-in/copy-out code around the loop nest.

o When INDEPENDENT DO loops are nested, the NEW keyword is used to assert that all loop variables (except the outer loop variable) are NEW. It is recommended that the outer DO loop variable be in the NEW list as well. (A sketch follows Section 4.4.3.2.)

o Any INDEPENDENT DO loops containing procedure calls meet the requirements listed in Section 4.4.3.2.

4.4.3.2 INDEPENDENT DO Loops Containing Procedure Calls

In order to execute in parallel, the entire body of an INDEPENDENT DO loop containing procedure calls must be encapsulated in an ON HOME RESIDENT region. Routines called from inside INDEPENDENT DO loops must be resident. "Resident" means that the procedure can execute on each processor without reading or writing any data that is not local to that processor. By encapsulating the entire body of the loop in an ON HOME RESIDENT region, the programmer promises the compiler that, if the iterations are distributed as requested, no inter-processor communication will be required, either in the body of the INDEPENDENT DO loop containing the procedure call or in any subroutine or function called (directly or indirectly) from the loop.

Unlike procedures called from inside FORALLs, procedures called from inside INDEPENDENT DO loops do not need to be PURE. However, DIGITAL's highly optimized send/receive synchronization paradigm (described in the Introduction of the DIGITAL High Performance Fortran HPF and PSE Manual) requires that no inter-processor communication occur as a result of the procedure call.

Here is an example of an INDEPENDENT DO loop containing an ON HOME RESIDENT directive and a procedure call:

!HPF$ INDEPENDENT
      DO i = 1, 10
!HPF$ ON HOME (B(i)), RESIDENT BEGIN
         A(i) = addone(B(i))
!HPF$ END ON
      END DO
      . . .
      CONTAINS
         FUNCTION addone(x)
            INTEGER, INTENT(IN) :: x
            INTEGER addone
            addone = x + 1
         END FUNCTION addone

The ON HOME RESIDENT region does not impose any syntactic restrictions. It is merely an assertion that inter-processor communication will not actually be required at run time.
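Returning to the NEW-keyword condition of Section 4.4.3.1, here is a minimal sketch (array names and bounds illustrative) of a nested pair of INDEPENDENT DO loops with the inner loop variable asserted NEW:

!HPF$ INDEPENDENT, NEW(j)
      DO i = 1, n
!HPF$    INDEPENDENT
         DO j = 1, m
            A(i,j) = B(i,j) + 1   ! subscripts are affine in i and j
         END DO
      END DO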
4.4.4 Nearest-Neighbor Optimization

The following conditions must be satisfied by an array assignment, FORALL statement, or INDEPENDENT DO loop in order to take advantage of the nearest-neighbor optimization:

o The relevant arrays in an INDEPENDENT DO loop must have shadow edges explicitly declared with the SHADOW directive.

o Relevant arrays with the POINTER or TARGET attribute must have shadow edges explicitly declared with the SHADOW directive.

o The arrays involved in the nearest-neighbor style assignment statements should not be PUBLIC module variables or variables accessed by USE association. However, if both the actual and all associated dummies are assigned a shadow-edge width with the SHADOW directive, this restriction is lifted.

o A value must be specified for the -wsf option on the command line.

o Some interprocessor communication must be necessary in the statement.

o Corresponding dimensions of an array must be distributed in the same way (though they can be offset using an ALIGN directive).

o If the -wsf flag's optional nn field is used to specify a maximum shadow-edge width, only constructs with a subscript difference less than or equal to the value specified by nn are recognized as nearest-neighbor. For example, the assignment statement FORALL (i=1:n) A(i) = B(i-3) has a subscript difference of 3. In a program compiled with the flag -nearest_neighbor 2, this assignment statement would not be eligible for the nearest-neighbor optimization.

o The left-hand side array must be distributed BLOCK in at least one dimension.

o The arrays must not have complex subscripts: no vector-valued subscripts, and any subscript containing a FORALL index must be an affine function of one FORALL index; further, that FORALL index must not be repeated in any other subscript of the same array reference.

o Subscript triplet strides must be known at compile time and be greater than 0.

o The arrays must be distributed BLOCK or serial (*) in each dimension.

Compile with the -show hpf or -show hpf_nearest switch to see which lines are treated as nearest-neighbor. Nearest-neighbor communications are not profiled by the pprof profiler (see Section 3.9.2.2).

For More Information:
o On profiling nearest-neighbor computations, see Section 3.9.2.2
o On using EOSHIFT for nearest-neighbor computations, see Section 4.9.4

4.5 Unsupported Features

This section lists unsupported features in this release of DIGITAL Fortran.

4.5.1 Command-Line Options Not Compatible with the -wsf Option

The following command-line options may not be used with the -wsf option:

o -feedback and -cord (these require the use of -p, which is not compatible with -wsf)
o -double_size 128
o -gen_feedback
o -p, -p1, -pg (use -pprof instead)
o -fpe1, -fpe2, -fpe3, -fpe4
o -om
o -mp

4.5.2 HPF_LOCAL Routines

The following procedures in the HPF Local Routine Library are not supported in the current release:

o ACTIVE_NUM_PROCS
o ACTIVE_PROCS_SHAPE
o HPF_MAP_ARRAY
o HPF_NUMBER_MAPPED
o LOCAL_BLKCNT
o LOCAL_LINDEX
o LOCAL_UINDEX

4.5.3 Non-Resident PURE Functions

PURE functions are handled properly only if they are resident. "Resident" means that the function can execute on each processor without reading or writing any data that is not local to that processor. Non-resident PURE functions are not handled; they will probably cause the executable to fail at run time if used in FORALLs.
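For illustration, a minimal sketch (names hypothetical) of a PURE function that is resident: it reads only its argument and writes only its result, so each processor can evaluate it without inter-processor communication:

      PURE FUNCTION scale2(x)
         REAL, INTENT(IN) :: x
         REAL scale2
         scale2 = 2.0 * x   ! touches only local data, so it is resident
      END FUNCTION scale2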
4.5 Unsupported Features

This section lists unsupported features in this release of DIGITAL
Fortran.

4.5.1 Command-Line Options Not Compatible with the -wsf Option

The following command-line options may not be used with the -wsf
option:

o  -feedback and -cord, since they require the use of -p, which is
   not compatible with -wsf

o  -double_size 128

o  -gen_feedback

o  -p, -p1, -pg (use -pprof instead)

o  -fpe1, -fpe2, -fpe3, -fpe4

o  -om

o  -mp

4.5.2 HPF_LOCAL Routines

The following procedures in the HPF Local Routine Library are not
supported in the current release:

o  ACTIVE_NUM_PROCS
o  ACTIVE_PROCS_SHAPE
o  HPF_MAP_ARRAY
o  HPF_NUMBER_MAPPED
o  LOCAL_BLKCNT
o  LOCAL_LINDEX
o  LOCAL_UINDEX

4.5.3 Non-Resident PURE Functions

PURE functions are handled properly only if they are resident.
"Resident" means that the function can execute on each processor
without reading or writing any data that is not local to that
processor. Non-resident PURE functions are not handled; they will
probably cause the executable to fail at run time if used in
FORALLs.

4.5.4 Nonadvancing I/O on stdin and stdout

Nonadvancing I/O does not work correctly on stdin and stdout. For
example, the following program is supposed to print the prompt
ending with the colon and keep the cursor on that line.
Unfortunately, the prompt does not appear until after the input is
entered.

      PROGRAM SIMPLE
      INTEGER STOCKPRICE
      WRITE (6,'(A)',ADVANCE='NO') 'Stock price1 : '
      READ (5, *) STOCKPRICE
      WRITE (6,200) 'The number you entered was ', STOCKPRICE
 200  FORMAT(A,I)
      END PROGRAM SIMPLE

The work-around for this bug is to insert a CLOSE statement after
the WRITE to stdout. This effectively flushes the buffer.

      PROGRAM SIMPLE
      INTEGER STOCKPRICE
      WRITE (6,'(A)',ADVANCE='NO') 'Stock price1 : '
      CLOSE (6)   ! Add CLOSE to get around the bug
      READ (5, *) STOCKPRICE
      WRITE (6,200) 'The number you entered was ', STOCKPRICE
 200  FORMAT(A,I)
      END PROGRAM SIMPLE

4.5.5 WHERE and Nested FORALL

The following statements are not currently supported:

o  WHERE statements inside FORALLs
o  FORALLs inside WHEREs
o  Nested FORALL statements

When nested DO loops are converted into FORALLs, nesting is
ordinarily not necessary. For example,

      DO x=1, 6
        DO y=1, 6
          A(x, y) = B(x) + C(y)
        END DO
      END DO

can be converted into

      FORALL (x=1:6, y=1:6) A(x, y) = B(x) + C(y)

In this example, both indices (x and y) can be defined in a single
FORALL statement that produces the same result as the nested DO
loops. In general, nested FORALLs are required only when the outer
index is used in the definition of the inner index. For example,
consider the following DO loop nest, which adds 3 to the elements in
the upper triangle of a 6 x 6 array:

      DO x=1, 6
        DO y=x, 6
          A(x, y) = A(x, y) + 3
        END DO
      END DO

In Fortran 90, this DO loop nest can be replaced with the following
nest of FORALL structures:

      FORALL (x=1:6)
        FORALL (y=x:6)
          A(x, y) = A(x, y) + 3
        END FORALL
      END FORALL

However, nested FORALL is not currently supported in parallel (that
is, with the -wsf option). A work-around is to use a single FORALL
with a mask expression:

      FORALL (x=1:6, y=1:6, y>=x .AND. y<=6)
        A(x, y) = A(x, y) + 3
      END FORALL

All three of these code fragments would convert a matrix like this:

      [ 8  8  8  8  8  8 ]
      [ 8  8  8  8  8  8 ]
      [ 8  8  8  8  8  8 ]
      [ 8  8  8  8  8  8 ]
      [ 8  8  8  8  8  8 ]
      [ 8  8  8  8  8  8 ]

into this matrix:

      [11 11 11 11 11 11 ]
      [ 8 11 11 11 11 11 ]
      [ 8  8 11 11 11 11 ]
      [ 8  8  8 11 11 11 ]
      [ 8  8  8  8 11 11 ]
      [ 8  8  8  8  8 11 ]

Using a mask introduces a minor inefficiency (the mask condition is
checked with each iteration), so you may want to replace this with
nested FORALLs when they become supported in a future release.

4.6 New Features

This section describes the new HPF features in this release of
DIGITAL Fortran.

4.6.1 SHADOW Directive Now Supported

The new SHADOW directive, as defined in Version 2.0 of the High
Performance Fortran Language Specification, is now supported. SHADOW
is now a separate HPF directive, rather than a keyword inside the
DISTRIBUTE directive.

4.6.2 Pointers Now Handled in Parallel

Mapped variables with the POINTER attribute are now handled in
parallel. This capability is an approved extension of the High
Performance Fortran Language Specification.
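For example, operations on a mapped pointer such as the following
are now executed in parallel. This is a minimal illustrative sketch,
not taken from the manual:

      REAL, POINTER :: P(:)
!HPF$ DISTRIBUTE (BLOCK) :: P

      ALLOCATE (P(1000))
      ! This array assignment is distributed across the processors
      ! that own blocks of P.
      P = 0.0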
4.6.3 SHADOW Directive Required for Nearest-Neighbor POINTER or TARGET Arrays

The compiler will not generate shadow edges automatically for arrays
with the POINTER or TARGET attributes. In order to be eligible for
the compiler's nearest-neighbor optimization, POINTER or TARGET
arrays must explicitly be given shadow edges using the SHADOW
directive. If pointer assignment is done, both the POINTER and the
TARGET must have the same mapping, including shadow edges.

For More Information:

o  On the conditions that must be satisfied for a statement to be
   eligible for the nearest-neighbor optimization, see Section 4.4.4
   of these Release Notes.

4.6.4 Descriptive Mapping Directives Are Now Obsolescent

In Version 1 of the HPF Language Specification, a special form of
the DISTRIBUTE and ALIGN directives was used in interfaces and
procedures when mapped arrays were passed to a procedure. Known as
descriptive mapping, it was specified by an asterisk (*) appearing
before the left parenthesis "(" in a DISTRIBUTE directive, or after
the WITH in an ALIGN directive. For example:

!HPF$ DISTRIBUTE R*(BLOCK, BLOCK)
!HPF$ ALIGN S WITH *R

Beginning with Version 2.0 of the High Performance Fortran Language
Specification (DIGITAL Fortran Version 5.0), the meaning of the
descriptive syntax has changed. Descriptive mapping is now a weak
assertion that the programmer believes that no data communication is
required at the procedure interface. If this assertion is wrong, the
data communication does in fact occur.

Although there is now no semantic difference between the descriptive
form and the ordinary prescriptive form, there is still some benefit
in using the descriptive form. DIGITAL Fortran generates
informational messages when a descriptive directive is specified if
the compiler is unable to confirm that there will in fact be no
communication. These messages can uncover subtle programming
mistakes that cause performance degradation.

Existing programs with descriptive mapping directives will continue
to compile and run with no modification and no performance penalty.

In the future, DIGITAL may provide a command-line option that
specifies that descriptive directives be treated as strong
assertions that data communication will not be necessary at the
procedure interface. This would allow the compiler to omit checking
whether the mappings of the actual and dummy agree, leading to
performance improvement in some cases.

4.6.5 New Support for HPF Local Library Routines GLOBAL_LBOUND and GLOBAL_UBOUND

The following HPF Local Library routines are now supported:

o  GLOBAL_LBOUND
o  GLOBAL_UBOUND

4.6.6 REDUCTION Clause in INDEPENDENT Directives

The REDUCTION clause in INDEPENDENT directives is now supported.
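For example, a summation such as the following sketch (the names and
bounds are illustrative, not from the manual) can now be marked
INDEPENDENT:

      REAL A(1000), S
!HPF$ DISTRIBUTE (BLOCK) :: A

      S = 0.0
      ! The REDUCTION clause asserts that S is used only to
      ! accumulate a result across iterations, so each processor can
      ! form a partial sum that is combined when the loop completes.
!HPF$ INDEPENDENT, REDUCTION (S)
      DO i = 1, 1000
        S = S + A(i)
      END DO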
4.6.7 HPF_SERIAL Restriction Lifted for Procedures Called from INDEPENDENT DO Loops

Previous versions required procedures called from inside INDEPENDENT
DO loops to be HPF_SERIAL in order to obtain parallel execution.
This restriction is now lifted.

For More Information:

o  On the requirements for parallel execution of INDEPENDENT DO
   loops containing procedure calls, see Section 4.4.3.2 of these
   Release Notes.

4.7 Problems Fixed in This Version

This section lists problems in previous versions that have been
fixed in this version.

o  Some bugs in implementing whole-structure references in I/O and
   assignment were fixed.

o  Aligning components of derived types is now supported.

o  The restriction that statements with scalar subscripts are not
   eligible for the nearest-neighbor optimization has been removed.
   Statements with scalar subscripts may now be eligible for the
   nearest-neighbor optimization if that array dimension is
   (effectively) mapped serially.

o  Nearest-neighbor assignments with derived types are now eligible
   for the nearest-neighbor optimization.

4.8 Obsolete Features Deleted

4.8.1 GLOBAL_TO_PHYSICAL and GLOBAL_LBOUNDS Are Deleted

The following obsolete HPF Local Library routines have been deleted:

o  GLOBAL_TO_PHYSICAL
o  GLOBAL_LBOUNDS

4.9 Known Problems

4.9.1 Pointer Assignment Inside FORALL Unreliable

In programs compiled with the -wsf option, pointer assignments
inside a FORALL do not work reliably. In many cases, incorrect
program results may occur.

4.9.2 ASSOCIATED Intrinsic Is Unreliable

In DIGITAL Fortran Version 5.0, the ASSOCIATED intrinsic sometimes
returns incorrect results in programs compiled with the -wsf
compile-time option. This problem did not occur in previous versions
of DIGITAL Fortran.

4.9.3 Widths Given with the SHADOW Directive Must Agree with Automatically Generated Widths

When compiler-determined shadow widths do not agree with the widths
given with the SHADOW directive, less efficient code is usually
generated. To avoid this problem, create a version of your program
without the SHADOW directive, and compile it with the -show hpf or
-show hpf_nearest option. The compiler will generate messages that
include the sizes of the compiler-determined shadow widths. Make
sure that any widths you specify with the SHADOW directive match the
compiler-generated widths.

4.9.4 Using EOSHIFT for Nearest-Neighbor Calculations

In the current compiler version, the compiler does not always
recognize nearest-neighbor calculations coded using EOSHIFT. For
maximum performance, use the -show hpf option at compile time to
find out whether the algorithm was recognized as nearest neighbor.
If the compiler does not recognize your algorithm, you may need to
re-code it in order to take advantage of the nearest-neighbor
optimization.

4.9.5 "Variable used before its value has been defined" Warning

The compiler may inappropriately issue a "Variable is used before
its value has been defined" warning. If the variable named in the
warning does not appear in your program (e.g., var$0354), you should
ignore the warning.

4.9.6 GRADE_UP and GRADE_DOWN Are Not Stable Sorts

In the current implementation, GRADE_UP and GRADE_DOWN are not
stable sorts.

4.9.7 Restrictions on Routines Compiled with -nowsf_main

The following are restrictions on dummy arguments to routines
compiled with the -nowsf_main compile-time option:

o  The dummy must not be assumed-size

o  The dummy must not be of type CHARACTER*(*)

o  The dummy must not have the POINTER attribute

o  %LOC must not be applied to distributed arguments

Failure to adhere to these restrictions may result in program
failure or incorrect program results.
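For example, the following sketch (with hypothetical names, not from
the manual) declares a dummy argument that satisfies these
restrictions:

      SUBROUTINE update(n, x)
        INTEGER, INTENT(IN) :: n
        ! An explicit-shape dummy is permitted. By contrast, an
        ! assumed-size dummy such as x(*), a CHARACTER*(*) dummy, or
        ! a dummy with the POINTER attribute would violate the
        ! restrictions listed above.
        REAL, INTENT(INOUT) :: x(n)
        x = x + 1.0
      END SUBROUTINE update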
4.10 Miscellaneous

This section contains miscellaneous release notes relevant to HPF.

4.10.1 What To Do When Encountering Unexpected Program Behavior

This section gives some guidelines about what to do when your
program displays unexpected behavior at run time. The two most
common problems are incorrect programs that either get a
segmentation fault or hang at run time.

Before attempting to debug parallel HPF programs, first verify that
the program runs correctly when compiled without the -wsf
command-line switch. When the problem occurs only with the -wsf
switch, the best way to debug is to execute the program with the
-debug command-line switch. In addition, programs with zero-sized
arrays that were compiled with -fast or -assume nozsize may behave
erratically or fail to execute.

4.10.1.1 Segmentation Faults

When a program gets a segmentation fault at run time, it can be
confusing because it may look as if the program executed, even
though no output is produced. The PSE does not always display an
error message when the return status of the executed program is
nonzero. In particular, if the program gets a segmentation fault, no
error message is displayed; the program just stops. In this example,
program "bad" gets a segmentation fault at run time:

# bad -peers 4

To see the execution status, type this csh command (other shells
require different commands):

# echo $status

A status of -117 indicates a segmentation fault (see Section 3.8.4).

Alternatively, you can run the program in the debugger. This is
better because it shows what went wrong on each peer. To do this,
use the -debug command-line switch:

# bad -peers 4 -debug

See Chapter 9 of the DIGITAL High Performance Fortran HPF and PSE
Manual for more information.

Note that some correct programs may get a segmentation fault at run
time due to lack of stack space and data space. See Section 4.10.2
for further details.

4.10.1.2 Programs that Hang

If your program hangs at run time, rerun it in the debugger. You can
type Ctrl/C in the debugger to get it to stop. Then look at the
stack frames to determine where and why the program is hanging.
Programs can hang for many reasons. Some of the more common reasons
are:

o  Incorrect or incorrectly spelled HPF directives

o  Incorrect usage of extrinsic routines

o  Templates not large enough

o  Incorrect interfaces

o  Missing interface blocks

o  Allocatables aligned incorrectly

o  Arrays aligned outside of template bounds

It is always best to compile, run, and debug the program without the
-wsf switch first to verify program correctness. Since it is easier
to debug scalar programs than parallel programs, this should always
be done first.

4.10.1.3 Programs with Zero-Sized Arrays

Programs with zero-sized arrays should not be compiled with the
-fast or the -assume nozsize command-line switches; see Chapter 8 in
the DIGITAL High Performance Fortran HPF and PSE Manual. If you
incorrectly compile this way, there are several different types of
behavior that might occur. The program might return an error status
of -122 or -177 or 64. It might also hang (or time out when the
-timeout switch is used). Try compiling the program without these
options and executing it to see if it then works correctly. If it
does, there is most likely a zero-sized array in the program.

4.10.2 Stack and Data Space Usage

Exceeding the available stack or data space on a processor can cause
the program execution to fail. The failure takes the form of a
segmentation violation, which results in an error status of -117
(see Section 3.8.4). This problem can often be corrected by
increasing the stack and data space sizes or by reducing the stack
and data requirements of the program.
The following csh commands increase the sizes of the stack and data
space up to the system limits (other shells require different
commands):

limit stacksize unlimited
limit datasize unlimited

If your system limits are not sufficient, contact your system
administrator and request that maxdsiz (the data space limit) and/or
maxssiz (the stack limit) be increased.

4.10.3 Non-"-wsf" Main Programs

The ability to call parallel HPF subprograms from non-parallel
(Fortran or non-Fortran) main programs is supported in this release.
For more information, see Chapter 6 of the DIGITAL High Performance
Fortran HPF and PSE Manual.

4.10.4 Use the Extended Form of HPF_ALIGNMENT

Due to an anomaly in the High Performance Fortran Language
Specification, the extended version of the HPF_ALIGNMENT library
routine (High Performance Fortran Language Specification V.2,
Section 12.2) is incompatible with the standard version (High
Performance Fortran Language Specification V.2, Section 7.7). In
particular, the DYNAMIC argument, which is valid only in the
extended version, is not the final argument in the argument list.
Because each compiler vendor must choose to implement only one
version of this library routine, programs that use this routine are
not portable from one compiler to another unless keywords are used
for each of the optional arguments. DIGITAL chooses to support the
extended version of this library routine.

4.10.5 RAN and SECNDS Are Not PURE

The intrinsic functions RAN and SECNDS are serialized (not executed
in parallel). As a result, they are not PURE functions and cannot be
used within a FORALL construct or statement.

4.10.6 RANDOM_NUMBER Intrinsic Is Serialized

As noted elsewhere in these release notes, the RANDOM_NUMBER
intrinsic is serialized. Use of a mapped array argument causes slow
execution. The intrinsic is serialized to ensure that it always
returns the same results, independent of the number of processors on
which the program executes or the mapping of the array. However, if
this is not critical to the correctness of a program, faster
execution occurs if the intrinsic is issued in an
EXTRINSIC(HPF_LOCAL) routine.

The following program shows these two ways of using RANDOM_NUMBER.
Note that the EXTRINSIC(HPF_LOCAL) routine method returns different
answers when the program is compiled with the -wsf option:

      real, dimension(4) :: a
!hpf$ distribute (block) :: a
      integer ssize, seed(1:10)

      interface
        EXTRINSIC(HPF_LOCAL) subroutine local_rand(x)
          real, dimension(:), intent(out) :: x
!hpf$     inherit :: x
!hpf$     distribute *(block) :: x
        end subroutine local_rand
      end interface

      call random_seed(size=ssize)
      print *, "Seed size=", ssize
      call random_seed(get=seed(1:ssize))
      call random_number(a)
      print *, a
      call random_seed(put=seed(1:ssize))
      call local_rand(a)
      print *, a
      end

      EXTRINSIC(HPF_LOCAL) subroutine local_rand(x)
        real, dimension(:), intent(out) :: x
!hpf$   inherit :: x
!hpf$   distribute *(block) :: x
        call random_number(x)
      end subroutine local_rand

4.10.7 EXTRINSIC(SCALAR) Changed to EXTRINSIC(HPF_SERIAL)

EXTRINSIC(SCALAR) was renamed to EXTRINSIC(HPF_SERIAL) to be
compatible with Versions 1.1 and later of the High Performance
Fortran Language Specification. EXTRINSIC(SCALAR) continues to be
supported in this release, but may not be supported in future
releases.
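For example, an interface that formerly used EXTRINSIC(SCALAR) can
simply be rewritten with the new keyword. This is an illustrative
sketch; the routine name is hypothetical:

      interface
        ! Formerly declared EXTRINSIC(SCALAR); HPF_SERIAL asserts
        ! that the routine executes serially on a single processor.
        EXTRINSIC(HPF_SERIAL) subroutine report(x)
          real, dimension(:), intent(in) :: x
        end subroutine report
      end interface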
4.10.8 Mask Expressions Referencing Multiple FORALL Indices

FORALL statements containing mask expressions that reference more
than seven FORALL indices do not work properly.

4.11 Example Programs

The /usr/examples/hpf directory contains example Fortran programs.
Most of these programs are referred to in the HPF Tutorial section
of the DIGITAL High Performance Fortran HPF and PSE Manual. Others
are there simply as examples of HPF code and PVM code. The provided
makefile can be used to compile all these programs.

o  heat_example.f90 solves a heat flow distribution problem. It is
   referred to by the Solving Nearest Neighbor Problems section of
   the DIGITAL High Performance Fortran HPF and PSE Manual.

o  io_example.f90 implements a network striped file. It is referred
   to by the Network Striped Files chapter of the DIGITAL High
   Performance Fortran HPF and PSE Manual. This program is a good
   example of how to use EXTRINSIC(HPF_LOCAL) routines.

o  lu.f90 implements an LU decomposition. It is referred to by the
   LU Decomposition chapter of the DIGITAL High Performance Fortran
   HPF and PSE Manual.

o  mandelbrot.f90 visualizes the Mandelbrot set. It is referred to
   by the HPF Tutorial. This program uses the PURE attribute and
   non-Fortran subprograms within an HPF program. Mandelbrot also
   requires these files: simpleX.h, simpleX.c, and dope.h. Read the
   README.mandelbrot file to see how to compile and execute
   Mandelbrot.

o  pi_example.f90 calculates pi using four different Fortran 90
   methods. This program contains a timing module that may be pulled
   out and used separately.

o  shallow.f90 is an optimized HPF version of the Shallow Water
   benchmark.

o  twin.f90 demonstrates DIGITAL Fortran's new non-wsf main program
   capability.

o  hpf_gexample.f is a Fortran program with explicit calls to PVM.
   It demonstrates some group and reduction operations in PVM. You
   must have PVM installed to run this program.

o  hpf.tcl is a Tk-based HPF data distribution learning aid. It
   illustrates the data distribution patterns represented by various
   data distributions, such as (BLOCK, *), (*, CYCLIC),
   (BLOCK, CYCLIC), etc.

o  fft.f90 performs a fast Fourier transform, achieving parallelism
   by means of EXTRINSIC(HPF_LOCAL) routines.

5
_________________________________________________________________

Comments, Problems, and Help

This chapter gives you detailed instructions on how to submit
comments on this product, report problems, and get help from the
Customer Support Center (CSC).

5.1 Sending Digital Your Comments on This Product

DIGITAL welcomes your comments on this product and on its
documentation. You can send comments to us in the following ways:

o  Internet electronic mail:

   -  HPF issues: pse@hpc.pko.dec.com
   -  PVM issues: pvm@ilo.dec.com
   -  MPI issues: mpi@ilo.dec.com

o  FAX: 508-493-3608 ATTN: PSE Team

o  A Reader's Comment Card sent to the address on the form

o  A letter sent to the following address:

   Digital Equipment Corporation
   High Performance Computing Group
   129 Parker Street PKO3-2/B12
   Maynard, Massachusetts 01754-2195
   USA

o  An online questionnaire form. Print or edit the questionnaire
   form provided near the end of these release notes. Send the form
   by Internet mail, FAX, or the postal service.
5.2 Getting Help from DIGITAL

If you have a customer support contract and have comments or
questions about DIGITAL Fortran software, you should contact
DIGITAL's Customer Support Center (CSC), preferably using electronic
means such as DSNlink. In the United States, customers can call the
CSC at (800) 354-9000.

5.3 Reader's Comments Form-Documentation

Use the following form as a template for sending comments about HPF
and PSE documentation. This form can be sent by Internet mail, FAX,
or postal service.

---------------------------------------------------------------------
Please complete this survey and send an online version (via
Internet) or a hardcopy version (via FAX or postal service) to:

Internet mail:  pse@hpc.pko.dec.com

FAX:            (508) 493-3608

Postal Service: Digital Equipment Corporation
                High Performance Computing Group Documentation
                PKO3-2/B12
                129 Parker Street
                Maynard, Massachusetts 01754-2195 USA

Manual Title: ______________________________________________________

Order Number: ______________________________________________________

Version:      ______________________________________________________

We welcome any comments on this manual or on any of the HPF or PSE
documentation. Your comments and suggestions help us improve the
quality of our publications.

1. If you found any errors, please list them:

   Page  Description
   ____  ____________________________________________________________
   ____  ____________________________________________________________
   ____  ____________________________________________________________

2. How can we improve the content, usability, or otherwise improve
   our documentation set?

   __________________________________________________________________
   __________________________________________________________________
   __________________________________________________________________
   __________________________________________________________________

Your Name/Title ____________________________________  Dept. ________

Company _____________________________________________  Date ________

Internet address or FAX number ______________________________________

Mailing address _____________________________________________________

___________________________________________  Phone _________________
---------------------------------------------------------------------