Chapter 4: Enhancing Performance
This chapter discusses various ways to enhance the performance of translated
executables.
Enhancing performance by repeating translation
In many cases, you
can improve translated performance by retranslating the original SunOS executable. The
first time fpx translates an executable or a shared library, it attempts to find
all the entry points. But at run time, fpxr may encounter entry points that fpx
did not detect. When this occurs, fpxr interprets the newly discovered code and
records information about each new entry point it finds. If you set the environment
variable FPXR_GENERATE_FEEDBACK when you run an executable, fpxr writes the
information to a feedback file, called executable.hif by default. The performance
of the translated executable improves after you retranslate with the feedback file because
fpx knows where to find the previously undiscovered code.
Also, branches and subroutine calls whose targets may vary at run time need to call
into fpxr to locate their targets. Feedback can substantially improve the run-time
performance of executables that rely on such transfers. This is especially true of
programs that use X11 services and programs written in C++.
Most programs benefit somewhat from retranslating with feedback. If, upon program exit,
fpxr prints a message like this one, you can improve performance substantially:
    Total instructions emulated: nn
Run the program again
with the FPXR_GENERATE_FEEDBACK environment variable set, and follow the instructions in
the section Creating and using a feedback file.
If you set the
FPXR_GENERATE_FEEDBACK environment variable before running a translated executable, fpxr
may print a message like the following: nn lines written to .hif file...
If such a message
does occur at run time, you can improve performance by retranslating the file.
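As a minimal sketch, the two messages quoted above can be detected in captured fpxr output to decide whether another feedback pass is worthwhile. The helper names feedback_hints and should_retranslate are hypothetical, and the sketch assumes that nn is a decimal integer and that the output has already been captured as text.

```python
import re

# Message shapes quoted in this chapter; "nn" is assumed to be a decimal count.
EMULATED_RE = re.compile(r"Total instructions\s+emulated:\s*(\d+)")
HIF_LINES_RE = re.compile(r"(\d+) lines written to \.hif file")


def feedback_hints(fpxr_output: str) -> dict:
    """Return the counts found in captured fpxr output (hypothetical helper)."""
    hints = {}
    match = EMULATED_RE.search(fpxr_output)
    if match:
        hints["emulated"] = int(match.group(1))
    match = HIF_LINES_RE.search(fpxr_output)
    if match:
        hints["hif_lines"] = int(match.group(1))
    return hints


def should_retranslate(fpxr_output: str) -> bool:
    """True when either message appears in the captured output."""
    return bool(feedback_hints(fpxr_output))
```

If should_retranslate returns True for a run, retranslating with the feedback file is likely to help, as described above.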
Creating and using a feedback file
Figure 4-1
illustrates the process for creating a feedback file and retranslating the executable
until no untranslated code is detected at run time.
Figure 4-1 Using feedback files to improve performance
Correcting for nonfinite numbers and SIGFPEs
If the translated
executable aborts at run time with messages about floating-point exceptions (SIGFPEs), or
issues a message that says the executable was performing calculations with nonfinite
numbers, you have to retranslate the program with the -full_fp option in the fpx
command line. Nonfinite numbers are infinities, denormalized numbers, and NaNs
(not-a-numbers). Using the -full_fp option enables Alpha to handle the nonfinite
calculations so the results are completion safe and conform to the IEEE standard for
nonfinite numbers, IEEE/ANSI 754-1985.
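The three kinds of values that this chapter groups as nonfinite can be told apart with a short check. This is an illustrative sketch only; the function name classify is hypothetical, and it inspects host double-precision values rather than anything inside a translated executable.

```python
import math
import sys


def classify(x: float) -> str:
    """Classify a double using the categories named in this chapter."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "infinity"
    # A denormalized (subnormal) value is nonzero but smaller in magnitude
    # than the smallest normalized double, sys.float_info.min.
    if x != 0.0 and abs(x) < sys.float_info.min:
        return "denormalized"
    return "normal or zero"
```

For example, classify(float("inf")) returns "infinity" and classify(5e-324) returns "denormalized".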
Only use -full_fp
if you receive SIGFPEs at run time. The -full_fp option exacts a performance
penalty but allows the executable to complete correctly.
Note that in many
cases, the use of a NaN or a denormalized number indicates a bug in the original program,
usually the use of an uninitialized double-precision floating-point variable. If you can,
you should check the source program for such errors and correct them.
Figure 4-2
illustrates the process for correcting for floating-point exceptions (SIGFPEs) or other
problems caused by nonfinite numbers.
Figure 4-2 Correcting for nonfinite numbers and SIGFPEs
Refer to the figure and follow these steps:
1. The translated program aborts at run time, issues floating-point exceptions (SIGFPEs),
   or issues the following fpxr message: The translated executable attempted a
   floating-point calculation with a denormalized number...
   To correct for nonfinite numbers and SIGFPEs, proceed to the next step.
2. Translate the input executable with the -full_fp option. The resulting translated
   executable should work properly. If you still receive the SIGFPE errors, document the
   behavior and email your report to fpx-bug@amt.tay1.dec.com so that -full_fp can be
   fixed.
3. Run the executable on the Alpha system.
To improve
performance in other aspects of the executable, such as getting jump points corrected, you
should run the executable with FPXR_GENERATE_FEEDBACK and use the feedback process
described in the section Creating and using a feedback
file. Note that you must always have -full_fp set when you retranslate the
input executable.
Notes on nonfinite numbers
SPARC-based code
handles some computations using nonfinite numbers transparently. For example, the code may
generate or use NaNs, infinities, or denormalized numbers. This may happen inadvertently
during computations with uninitialized variables.
On the SPARC
architecture, such operations proceed without generating a SIGFPE either because the
hardware can handle them, or because they trap to operating system software that
transparently fixes up the results.
On the Alpha
architecture, attempts to use nonfinite numbers cause an exception to occur, as do
conditions that might produce such nonfinite numbers, such as overflow. These
floating-point traps are imprecise. That is, a trap may not be delivered until several
instructions after the faulting one was issued. The system can only recover from such
traps if the Alpha code is structured in such a way as to be "completion safe."
By default, fpx generates
code that is not completion safe, because of the performance penalty incurred otherwise.
If a program uses a nonfinite number or if it encounters an exceptional condition (such as
an overflow or a divide by zero), it generates a floating-point exception, and Digital
UNIX reports this with a SIGFPE signal. There is no way to recover from such an exception
because there is no reliable way to determine which instruction caused the exception on
Alpha.
Note that because
SPARC handles these computations transparently, you may not be aware that the program uses
nonfinite numbers in computations at all. Problems are invisible until the program fails
after translation due to SIGFPEs with imprecise fault PCs.
If an executable that
ran on SPARC fails with SIGFPE when translated to Alpha, retranslate the executable with
the fpx option -full_fp. With -full_fp enabled, all floating-point
code is generated to be completion safe. When the translated executable uses a nonfinite
number or encounters an exceptional condition, -full_fp enables the exception to be
handled by the Digital UNIX operating system, and the result is the expected value as
specified in the IEEE floating-point standard, IEEE/ANSI 754-1985. Performance is somewhat
slower, but you do not get traps.
In summary, if the executable is translated with -full_fp, then:
- All computations that use nonfinite numbers yield the results specified in
  IEEE/ANSI 754-1985, such as:
      2 + Inf = Inf
      Inf - Inf = NaN
- All computations that produce nonfinite numbers yield the results specified in
  IEEE/ANSI 754-1985.
For additional
information, see the ieee(1) reference page.
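The IEEE results listed in the summary can be reproduced on any host with IEEE double-precision arithmetic. The sketch below is only an illustration in Python. The first two results are the ones given in this chapter. The overflow and underflow cases are added assumptions chosen to show computations that produce nonfinite numbers, and they are not taken from this guide.

```python
import math
import sys

inf = float("inf")

# Computations that use nonfinite operands (results listed in this chapter).
sum_result = 2 + inf       # Inf
diff_result = inf - inf    # NaN

# Computations that produce nonfinite numbers (illustrative assumptions).
overflow_result = 1e308 * 10       # too large for a double, so Inf
underflow_result = 1e-308 / 1e10   # too small to normalize, so denormalized

is_denormalized = 0.0 < underflow_result < sys.float_info.min

print(sum_result, diff_result, overflow_result, is_denormalized)
```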
SPARC floating-point control register
The fpx command
does not precisely emulate the behavior of the SPARC floating-point control register. If a
program expects some of its operations to trap, and it sets the SPARC floating-point
control register to enable one or more traps, that behavior is ignored by the translated
program and the default result is generated instead. Thus, for example, PL/I programs that
use the "ON OVERFLOW" or "ON ZERODIVIDE" statements to catch
floating-point problems find that the on-units are never entered.
Correcting unaligned double-precision floating-point numbers
The fpx option
-F controls floating-point optimizations. If you receive an error message saying,
Unaligned access on ldt or Unaligned access on stt during run time, retranslate the
executable with the -F option and run the executable again.
The message indicates
that the original SPARC executable used double-precision floating-point numbers that are
not aligned on a natural boundary for the Alpha system. The operating system can correct
unaligned accesses at run time, but at a major performance penalty.
The -F option
generates a conservative sequence for loading and storing double-precision numbers that is
faster than allowing the operating system to fix the instruction, but is slower than the
single load or store opcode. If you do not use the -F option, then the Digital UNIX
system corrects each instance of an unaligned floating-point quadword instruction at run
time, which can be very slow.
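To make the alignment condition concrete, the following sketch checks whether an address is on a natural 8-byte boundary and reads a double as two 4-byte words instead of one 8-byte access. It only illustrates the general idea of replacing one wide access with narrower ones. It is not the sequence that fpx emits, and the function names are hypothetical.

```python
import struct


def is_naturally_aligned(address: int, size: int = 8) -> bool:
    """A double is naturally aligned when its address is a multiple of its size."""
    return address % size == 0


def load_double_as_two_words(buf: bytes, offset: int) -> float:
    """Read an 8-byte big-endian double as two 4-byte words and reassemble it."""
    high, low = struct.unpack_from(">II", buf, offset)
    bits = (high << 32) | low
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]


# A double stored at offset 4 is word-aligned but not naturally aligned.
buffer = b"\x00" * 4 + struct.pack(">d", 3.5)
value = load_double_as_two_words(buffer, 4)
```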
For information about
how floating-point numbers are represented in the Alpha architecture, refer to the Alpha
Architecture Reference Manual.
Writing a feedback file
If you can identify
hidden entry points or other useful information about the input executable, you can write
a feedback file manually to give fpx the information it is unable to discover
during a translation. Hidden entry points are never called directly through SPARC jump or
branch instructions. For example, code that is invoked only by signal handlers in stripped
executables is usually a hidden entry point.
Your feedback files
must follow the naming requirements described in Creating
and using a feedback file and the format shown in Format used in feedback files in this chapter.
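When writing a feedback file by hand, a small helper can keep the records consistent with the format described in Format used in feedback files. The sketch below is one possible approach under stated assumptions: the names format_record and format_feedback are hypothetical, offsets are written as lowercase hexadecimal as in the chapter's samples, and every value is quoted, which is an assumption based on the sample records.

```python
from typing import Optional


def format_record(offset: int, name: str, value: Optional[str] = None) -> str:
    """Build one property record: +<hex offset> <name> ["<value>"]."""
    record = f"+{offset:x} {name}"
    if value is not None:
        # Assumption: quote every value. The chapter requires quotation marks
        # only when a value contains spaces or special characters.
        record += f' "{value}"'
    return record


def format_feedback(records) -> str:
    """Join records one per line, preceded by a comment record."""
    lines = ["; hand-written feedback records"]
    lines += [format_record(*record) for record in records]
    return "\n".join(lines) + "\n"


text = format_feedback([
    (0x4177AC, "jalentry", "subr1"),
    (0x421018, "branchentry", "label2"),
])
```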
Format used in feedback files
The feedback file
consists of a series of property records in ASCII, one per line, each of which attaches a
property to an offset in the executable. For example:
    +419a84 jalr "+4177ac +3"
Table 4-1 describes
the components of a property record.
Table 4-1 Property record components
Component | Description | Example
offset | A plus sign (+) followed by a hexadecimal number representing the address of an instruction in the executable. | +419a84
property name | A name from Table 4-2. | jalr
property value (optional) | The format depends on the property name. | "+4177ac +3"
The conventions for specifying property records are:
- Use only one property record per line.
- Lines beginning with a semicolon (;) are comment records and may occur anywhere in the
  file. The translator ignores them.
- Case is significant in property names.
- For each line, use spaces to separate the values of the offset, property name, and
  property value.
- Symbol names and values that contain spaces or special characters must be enclosed in
  quotation marks (").
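The conventions above are enough to sketch a small reader for property records. This is an illustration rather than the translator's own parser. The names PropertyRecord and parse_feedback are hypothetical, and shlex is used here only as a convenient way to honor the quotation marks.

```python
import shlex
from typing import NamedTuple, Optional


class PropertyRecord(NamedTuple):
    offset: int            # address parsed from the "+hex" field
    name: str              # property name, kept case-sensitive
    value: Optional[str]   # property value with quotation marks removed


def parse_feedback(text: str) -> list:
    """Parse feedback-file text into a list of PropertyRecord tuples."""
    records = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue  # blank line or comment record
        fields = shlex.split(stripped)  # keeps a quoted value as one field
        if len(fields) not in (2, 3) or not fields[0].startswith("+"):
            raise ValueError(f"malformed property record: {line!r}")
        offset = int(fields[0][1:], 16)
        value = fields[2] if len(fields) == 3 else None
        records.append(PropertyRecord(offset, fields[1], value))
    return records
```

Parsing the record +419a84 jalr "+4177ac +3" with this sketch yields offset 0x419a84, name jalr, and value +4177ac +3.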
Example 4-1 is
an excerpt from a sample .hif file. Use the format shown if you are
writing a feedback file.
Example 4-1 Sample feedback file
    +419a84 jalr "+4177ac +3"
    +420acc jr "+421018 +75"
    +420acc jr "+437de4 +20"
    +4177ac jalentry "subr1"
    +421018 branchentry "label2"
Properties supported in feedback files
Table 4-2 describes
the properties supported in feedback files.
Table 4-2 Property names and values in feedback files
Property name | Property value | Interpretation
branchentry | symbolic name | Defines the offset as the target of a branch instruction.
dataentry | none | Defines the offset to be data; causes fpx to interpret at that location.
denorm | fn | Indicates that there is sometimes a denormalized number in SPARC floating register fn (where n is an even number between 0 and 30) at execution time. The offset can be any location inside the basic block where the problem occurs. The translator generates code to change any denormalized value in fn to a true zero. This is usually the correct action because denormalized numbers are not produced as the result of any floating-point computations on an Alpha system, and therefore usually result from coding errors.
full_fp | length | Enables the -full_fp behavior for the basic blocks containing the instructions at the SPARC address through address+length-1. Redundant if -full_fp is used at translation. Use this to avoid translating the entire program with -full_fp if you know exactly where the program uses or produces nonfinite numbers. However, be sure that you have found all of the places where the code can trap.
jalentry | symbolic name | Defines the offset as the target of a jmp instruction.
jalr | "+offset2 +count" | Defines the offset as the start of a basic block that ends with a jalr instruction that transferred to offset2 count times during the run. There may be multiple jalr properties for the same offset.
jr | "+offset2 +count" | Defines the offset as the start of a basic block that ends with a jr instruction that transferred to offset2 count times during the run. There may be multiple jr properties for the same offset.
sets | List of registers and condition codes in the form "%0 %1..." (note 1) | Specifies resources set by routine.
+sets | List of registers and condition codes in the form "%0 %1..." (note 1) | Adds resources to those specified by a sets property name for the same address.
to_alpha | rn | Indicates that the program expects that at location offset, register rn (where n is a number between 1 and 31) contains the address of an Alpha instruction in the program, but when actually running the translated program, rn contains the address of a SPARC instruction. This entry causes fpx to insert code just before offset that converts the contents of rn from a SPARC code address to an Alpha code address. This can happen when a program places the address of SPARC code into the return address location on the stack, then tries to transfer control to it.
to_sparc | rn | Indicates that the program expects that at location offset, register rn (where n is a number between 1 and 31) contains the address of a SPARC instruction in the program, but when actually running the translated program, rn contains the address of an Alpha instruction. This entry causes fpx to insert code just before offset that converts the contents of rn from an Alpha code address to a SPARC code address. This can happen when the code in the original program fetches a program counter value from the stack or from a system structure, and then tries to decode the SPARC instruction stream at that address, or tries to look up the address in a hard-coded table of locations.
uses | List of registers and condition codes in the form "%0 %1..." (note 1) | Specifies resources used by routine.
+uses | List of registers and condition codes in the form "%0 %1..." (note 1) | Adds resources to those specified by a uses property name for the same address.
-uses | List of registers and condition codes in the form "%0 %1..." (note 1) | Removes resources from those specified by a uses property name for the same address.
1 The SPARC registers are %g0 - %g7, %o0 - %o7, %l0 - %l7, %i0 - %i7, %f0 - %f31,
%psr, %y, and %fsr.
An .iif or .hif file can also reference ICC_NZ,
ICC_V, ICC_C, ICC, FCC, RETURN, and M.unk.
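A hand-written sets or uses value can be checked against the resource names listed above before retranslating. The sketch below assumes that the names are case-sensitive and separated by spaces. The names VALID_RESOURCES and unknown_resources are hypothetical.

```python
def _numbered(prefix: str, count: int) -> set:
    """Build names such as %g0 through %g7 for a register group."""
    return {f"%{prefix}{n}" for n in range(count)}


# Register names and other resources listed in this chapter.
VALID_RESOURCES = (
    _numbered("g", 8) | _numbered("o", 8) | _numbered("l", 8) | _numbered("i", 8)
    | _numbered("f", 32)
    | {"%psr", "%y", "%fsr"}
    | {"ICC_NZ", "ICC_V", "ICC_C", "ICC", "FCC", "RETURN", "M.unk"}
)


def unknown_resources(value: str) -> list:
    """Return the tokens in a sets/uses value that are not listed resources."""
    return [token for token in value.split() if token not in VALID_RESOURCES]
```

An empty result means that every token in the value is one of the listed resources.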
If you have
questions about FreePort Express, send email to fpx-info@scrugs.lkg.dec.com.