Creating the speed: background optimization
The background optimizer produces
high-speed native Alpha code from x86 code by using information that is gathered
into profiles by the runtime. The native Alpha code is subsequently made available to the
runtime and executed the next time the image is run. It is this coordinated process that
adds high performance to the transparency of execution, and truly distinguishes DIGITAL
FX!32.
Design goals
The operation and output of the background optimizer must be as transparent and robust as the
runtime environment. The user must never see the operation of the background optimizer and
it must always present code to the runtime that runs to correct completion. To ensure
transparency, the background optimizer design allows for no assumptions, no manual
initiation, and no user intervention in any question/answer cycle. Therefore, all
optimization must be based on unambiguous and unimpeachable criteria.
Coupled with the stringent need for transparent and flawless operation is a requirement for the
highest possible performance. It was an explicit goal for the background optimizer to
generate Alpha code that runs at 70% of the performance of Alpha code that is generated by
an Alpha compiler. The designers recognized that high performance required the
exploitation of the full range of optimization techniques that have been developed for
modern compilers.
Realization of the goals
The background optimizer guarantees transparent and robust operation by cooperating with the
runtime to ensure the coherency of the x86 machine state. A coherent x86
machine state means that x86 register assignments, call/return boundaries, the x86
stack, and exception condition processing reflect faithfully what would be observed on
actual x86 hardware.
Achieving the performance goals required that the designers exploit the full range of modern
compiler optimization techniques, and all of these techniques are predicated
on global optimization.
Perhaps the most serious flaw in previous binary translators was their limitation to the basic
block, or perhaps the extended basic block, as the fundamental unit of translation. All
modern optimizing compilers require global optimization techniques that directly conflict
with such a basic-block unit restriction. Therefore, removal of this restriction was the
fundamental performance requirement. The background optimizer successfully removes this
restriction by organizing carefully chosen groupings of basic blocks into significantly
larger units, called translation units. Conceptually, a translation unit approximates a
"routine" in a more traditional compiler. Thus, for the first time, the
translation unit technology allows the full exploitation of global optimization
techniques.
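The text does not describe how basic blocks are chosen for a translation unit, so the following is only a minimal sketch of the general idea, under stated assumptions: the profile is assumed to yield a map from each basic-block address to its observed branch and fall-through successors (call targets excluded), and a unit is taken to be everything reachable from one entry block. The function name `build_translation_unit` and the sample addresses are hypothetical.

```python
from collections import deque


def build_translation_unit(entry, successors):
    """Collect every basic block reachable from `entry`.

    `successors` is a hypothetical profile-derived map from a block address
    to the addresses it can branch or fall through to. Call targets are
    assumed to be left out, so each unit approximates a single routine.
    """
    unit = []
    seen = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        unit.append(block)
        for nxt in successors.get(block, ()):
            if nxt not in seen:  # visit each block once, even inside loops
                seen.add(nxt)
                work.append(nxt)
    return unit


# Hypothetical edges for one routine: a diamond-shaped branch that rejoins.
edges = {
    0x1000: [0x1010, 0x1020],
    0x1010: [0x1030],
    0x1020: [0x1030],
    0x1030: [],
}
unit = build_translation_unit(0x1000, edges)
```

Because the four blocks end up in one unit, an optimizer can reason about values that flow across the branch and the join, which a translator limited to single basic blocks cannot do.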
The background optimizer takes full advantage of the potential available from global
optimization. After initial setup, it uses techniques such as in-lining to prepare code
sequences for further optimization from techniques such as value propagation, common
subexpression elimination, and scheduling.
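To make two of the named techniques concrete, here is a small sketch of value propagation (shown as constant folding) and common subexpression elimination. It is not the optimizer's actual implementation: it works on a hypothetical three-address form, handles only straight-line code for brevity, and assumes each destination is assigned exactly once. All names (`optimize`, the tuple layout, `t1` and so on) are illustrative.

```python
def optimize(instrs):
    """Fold constants and remove repeated expressions in straight-line code.

    `instrs` is a list of (dest, op, a, b) tuples with op in
    {"const", "add", "mul"}. Each dest is assumed to be written once.
    """
    consts = {}     # dest -> known constant value
    available = {}  # (op, a, b) -> dest that already holds this expression
    alias = {}      # eliminated dest -> earlier dest holding the same value
    out = []
    for dest, op, a, b in instrs:
        if op == "const":
            consts[dest] = a
            out.append((dest, op, a, b))
            continue
        # Rewrite operands that were eliminated as duplicates.
        a = alias.get(a, a)
        b = alias.get(b, b)
        if a in consts and b in consts:
            # Value propagation: both inputs are known, so compute now.
            if op == "add":
                value = consts[a] + consts[b]
            else:
                value = consts[a] * consts[b]
            consts[dest] = value
            out.append((dest, "const", value, None))
            continue
        key = (op, a, b)
        if key in available:
            # Common subexpression: reuse the earlier result.
            alias[dest] = available[key]
            continue
        available[key] = dest
        out.append((dest, op, a, b))
    return out


code = [
    ("t1", "const", 4, None),
    ("t2", "const", 8, None),
    ("t3", "add", "t1", "t2"),  # both inputs constant
    ("t4", "add", "x", "t3"),   # "x" is a live-in value, unknown here
    ("t5", "add", "x", "t3"),   # repeats t4
    ("t6", "mul", "t4", "t5"),
]
optimized = optimize(code)
```

In this example `t3` is folded to a constant, `t5` disappears, and `t6` multiplies `t4` by itself. In-lining matters because it exposes longer sequences like this one to the later passes.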
By achieving its performance goals, the background optimizer is able to feed high-speed
native Alpha code to the runtime, providing the largest part of the overall performance of
DIGITAL FX!32.
User interface: the DIGITAL FX!32 manager
The
user can view status and provide management information to the server with the DIGITAL
FX!32 Manager. The Manager is available as an icon in the DIGITAL FX!32 program group.
For example, the user can specify a disk-space limit for optimized code and profiles. The
server then ensures that any specified limit is not exceeded by discarding old or
infrequently used optimized code and profiles. The user can protect important but
seldom-used code from being discarded through easy-to-use dialog boxes.
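The policy described above can be sketched as follows. The exact rule the server uses to pick victims is not given in the text, so this sketch assumes a simple least-recently-used order; the record fields (`name`, `size`, `last_used`, `protected`) and the function `enforce_limit` are hypothetical.

```python
def enforce_limit(entries, limit_bytes):
    """Discard the least recently used unprotected entries until the total
    size fits within `limit_bytes`.

    Returns the surviving entries and their total size. If only protected
    entries remain, the total may still exceed the limit.
    """
    total = sum(e["size"] for e in entries)
    kept = list(entries)
    for e in sorted(entries, key=lambda e: e["last_used"]):  # oldest first
        if total <= limit_bytes:
            break
        if e["protected"]:
            continue  # the user asked to keep this entry
        kept.remove(e)
        total -= e["size"]
    return kept, total


# Hypothetical store: sizes in bytes, last_used as a day counter.
store = [
    {"name": "editor.exe", "size": 400, "last_used": 10, "protected": False},
    {"name": "payroll.exe", "size": 300, "last_used": 1, "protected": True},
    {"name": "viewer.exe", "size": 200, "last_used": 5, "protected": False},
    {"name": "setup.exe", "size": 500, "last_used": 2, "protected": False},
]
kept, total = enforce_limit(store, 900)
kept_names = [e["name"] for e in kept]
```

Here `payroll.exe` is the oldest entry but survives because it is protected, while `setup.exe`, the oldest unprotected entry, is discarded to bring the total under the limit.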
The DIGITAL FX!32 Manager provides context-sensitive help as well as access to the
online help found in the DIGITAL FX!32 program group.
Conclusion
DIGITAL FX!32 provides fast and transparent execution of x86 Win32 applications on Windows
NT Alpha because, for the first time, binary translation is coordinated with runtime
emulation. Coordination between the runtime and the binary translator (the background
optimizer) is provided by the FX!32 server.
The runtime provides transparent execution because it contains an emulator that implements the
entire x86 user-mode instruction set and because it provides the complete x86
Win32 environment.
The performance of DIGITAL FX!32 comes from executing high-speed native Alpha code. The
majority of that code is produced by a positive feedback loop that exists between the
runtime and the background optimizer. The background optimizer provides high performance
because it uses global optimization techniques that previously were only available to
modern compilers; the background optimizer is the first translator that can use global
optimization.
It is the coordination between the background optimizer and the runtime that successfully
combines high performance with transparent execution and truly distinguishes DIGITAL
FX!32.