HP OpenVMS Systems: ask the wizard
The Question is:

Analysing exception information for HPARITH

We are using mathematical models written in Fortran on OpenVMS AXP. To be able to continue execution after an exception, we have implemented a condition handler using LIB$ESTABLISH and LIB$REVERT. In the condition handler we check the actual exception (e.g. SIGNAL_ARRAY(2) .EQ. SS$_HPARITH) and, if it is valid for continuation (in the routine where we established the handler), write a specific message via SYS$GETMSG and SYS$FAOL and then unwind the stack to the establishing routine: SYS$UNWIND( %REF(MECHANISM_ARRAY.CHF$IS_MCH_DEPTH) , ).

The message we got is as follows:

    %SYSTEM-F-HPARITH, high performance arithmetic trap, Imask=00000000, Fmask=00002000, summary=04, PC=000000000004CD80, PS=0000001B

Checking the summary bits, we found: "Division by Zero: An attempt was made to perform a floating divide operation with a divisor of 0."

Using the linker map file and the exception PC, we identified the routine causing the exception. Part of the linker map file:

    Psect Name  Module Name         Base      End       Length              Align   Attributes
    ----------  -----------         ----      ---       ------              -----   ----------
    $CODE$                          00030000  00069327  00039328 ( 234280.) OCTA 4  PIC,CON,REL,LCL,SHR,EXE,NOWRT,NOVEC,MOD
    ...
                SCM_ENERGY_BALANCE  0004C7E0  0004DDDB  000015FC (   5628.) OCTA 4

We now want to know the statement causing the exception (as given by the traceback handler) or the variable (by using the floating register write mask as shown below). Assuming 64-bit addresses, we can find the offset from the base address of the routine and then calculate a PC offset from the CODE base address, but this does not seem to be exact (because the compiler generates machine code unknown to us?).
We assume that the exception is caused by TETA = 0.0. Part of the list file:

    10931        TETA
    10932      >    = RHO*CP_TOT (ACT_STRIP, I_LEN, I_THICK)
    10933      >    + RHO*(H_AUSTENITE(ACT_STRIP, I_LEN, I_THICK)-H_FERRITE(ACT_STRIP, I_LEN, I_THICK))
    10934      >    * DP_DTEMP_TOT (ACT_STRIP, I_LEN, I_THICK)
    10936        TETA_L = LAMBDA(ACT_STRIP, I_LEN, I_THICK)/TETA
    10937        TETA_V = RHO*CP_TOT (ACT_STRIP, I_LEN, I_THICK)/TETA
    10938        TETA_P = RHO*( H_AUSTENITE (ACT_STRIP, I_LEN, I_THICK)
    10939      >        - H_FERRITE (ACT_STRIP, I_LEN, I_THICK))/TETA

If we then use the register addresses shown in the list file:

    Address       Type  Name
    **            R*4   TETA
    REG-00000023  R*4   TETA_L
    REG-00000024  R*4   TETA_V
    REG-00000026  R*4   TETA_P

we are NOT able to see a relation to the floating register write mask.

The question now is:

- Is the way we are trying to find the variable or statement causing the exception correct?
- Why are we not able to do the last step?
- Is there any other way to find the variable or statement causing the exception?

Thanks in advance,
Herbert

PS: At present we switch off LIB$ESTABLISH and LIB$REVERT via a mail message to get a traceback in case of an exception (but the program then ends).

The Answer is:

On Alpha, the processing of arithmetic exceptions is delayed for performance reasons, so the exception PC does not directly identify which instruction caused the exception. This is why additional information is captured. In this example, "Fmask=00002000" means that the divide instruction that incurred the exception wrote its result into register F13.

If you wish to track this manually, you should compile your code with the /LIST/MACHINE_CODE qualifiers to determine the actual sequence of instructions generated. You are already following the right procedure to find where in the code to look, but you now need to look at the instructions executed prior to the place where the exception was reported.
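As a rough illustration of the arithmetic involved, the following Python sketch (an offline helper, not something to run on the OpenVMS system itself) decodes the Fmask value and computes where the reported PC falls inside SCM_ENERGY_BALANCE. The constants are copied from the message and linker map in the question; the function names are made up for this example, and the assumption that bit n of the mask stands for register Fn is taken from the F13 reading given above.

```python
# Values copied from the HPARITH message and the linker map in the question.
FMASK = 0x00002000
EXCEPTION_PC = 0x0004CD80
MODULE_BASE = 0x0004C7E0  # base of SCM_ENERGY_BALANCE within $CODE$
MODULE_END = 0x0004DDDB   # end of SCM_ENERGY_BALANCE within $CODE$


def registers_in_mask(mask: int) -> list[int]:
    """Return the register numbers whose bits are set in a 32-bit write mask.

    Assumption: bit n of the mask corresponds to register n.
    """
    return [bit for bit in range(32) if mask & (1 << bit)]


def offset_in_module(pc: int, base: int, end: int) -> int:
    """Return the byte offset of pc from the module base address."""
    if not base <= pc <= end:
        raise ValueError("PC lies outside the module")
    return pc - base


if __name__ == "__main__":
    regs = ", ".join(f"F{n}" for n in registers_in_mask(FMASK))
    offset = offset_in_module(EXCEPTION_PC, MODULE_BASE, MODULE_END)
    print("Floating registers written:", regs)
    print("Offset of reported PC within module:", hex(offset))
```

The offset only marks where the trap was reported. Because the trap is delayed, the faulting divide lies at a smaller offset in the machine-code listing.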
The exception for a divide might be delayed quite a few cycles, depending on which Alpha model you have, so you might have to examine instructions for some distance prior to the exception. Look specifically for a "DIVx Fa,Fb,Fc" instruction where Fc is F13. When that instruction was executed, Fb contained zero. (It does indeed seem likely that Fb represents the variable TETA.)

If you need finer granularity on exceptions, particularly if you need to identify the specific failing instruction, the Alpha architecture requires you to use a construct known as a trap barrier (TRAPB) or an exception barrier (EXCB). On Alpha, floating-point traps can be delivered at any time up to the next TRAPB (or CALL_PAL, which implicitly includes TRAPB) operation, and thus the exception is usually only effectively identified within the program unit.

If you deem it necessary and appropriate, you can explicitly request the compiler option /SYNCHRONOUS_EXCEPTIONS, which causes the compiler to insert TRAPB instructions. The presence of the TRAPB instructions ensures that any arithmetic exception is delivered immediately after the instruction that caused it. Use of this technique will reduce the performance of your application program, however.

For details on traps, exceptions, and floating point, you will want to acquire the Alpha Architecture Reference Manual. (Copies of this manual and of hardware-related documentation are available for downloading; please see the OpenVMS FAQ for pointers.)
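The summary field of the HPARITH message (summary=04 in the question) is a bit mask, and a small offline helper can turn it into readable names. The Python sketch below is only an illustration: the function name is made up, and the bit assignments are given as I understand them from the Alpha exception summary layout. Only bit 2 (division by zero) is confirmed by the question itself, so check the others against the Alpha Architecture Reference Manual before relying on them.

```python
# Assumed bit assignments for the exception summary; only bit 2 is
# confirmed by the message analysed in the question.
SUMMARY_BITS = {
    0: "software completion",
    1: "invalid operation",
    2: "division by zero",
    3: "floating overflow",
    4: "floating underflow",
    5: "inexact result",
    6: "integer overflow",
}


def decode_summary(summary: int) -> list[str]:
    """Return the names of all conditions flagged in the summary value."""
    return [name for bit, name in SUMMARY_BITS.items() if summary & (1 << bit)]


if __name__ == "__main__":
    # summary=04 is the value from the HPARITH message in the question.
    print(decode_summary(0x04))
```

For summary=04 this reports division by zero, which agrees with the reading of the summary bits given in the question.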