DIGITAL Semiconductor
  Updated: 26 September 1997

DIGITAL FX!32

White paper: How DIGITAL FX!32 works

DIGITAL FX!32 provides fast and transparent execution of 32-bit x86 applications on Windows NT 4.0 Alpha. FX!32 runs these applications at speeds comparable to high performance x86 platforms.

Before the introduction of DIGITAL FX!32, there were two technologies for running an application on an architecture other than the one for which it was originally compiled: emulation and binary translation. Each technology has an advantage, but also a drawback. Emulation is transparent and robust, but delivers only modest performance. Binary translation is fast, but not transparent. For the first time, DIGITAL FX!32 combines these technologies to provide both fast and transparent execution.

Because of its technical innovation, there are patents pending for the overall design of DIGITAL FX!32, as well as for several individual elements in the design. Further, BYTE Magazine signaled its enthusiasm by presenting DIGITAL FX!32 with its Technology of the Year award at the 1995 Fall Comdex.

This technical introduction describes how DIGITAL FX!32 combines the two technologies.

Components

DIGITAL FX!32 consists of three interoperating components. There is a run-time environment that provides the transparent execution, a binary translator (also called the background optimizer) that provides the high performance, and a server that coordinates them. Although DIGITAL FX!32 is transparent and does not require user intervention, it includes a graphical interface -- the DIGITAL FX!32 Manager -- for viewing status and providing management input.

Installing DIGITAL FX!32 establishes the transparent environment for running x86 Win32 applications. Code in the Windows NT operating system automatically invokes DIGITAL FX!32 when the user runs an x86 Win32 application.

Transparent execution: the runtime environment

The Windows NT operating system invokes the DIGITAL FX!32 run-time environment (called the "runtime") when the user runs an x86 Win32 application. The runtime provides transparent execution because it contains an emulator that implements the entire x86 user-mode instruction set, and because it provides the complete x86 Win32 environment.

When the application is first executed, DIGITAL FX!32 "knows" nothing about the application and runs it completely in the emulator. Successive runs replace more and more of the application's x86 instructions with native Alpha instructions. Eventually, little of the application runs in the emulator. Of course, the emulator must remain present to interpret those x86 instructions that, for whatever reason, cannot be translated.

The rest of the transparency is provided by full support for the Win32 environment, such as multiple threads and structured exception handling. Transparency is also provided by dynamic jacketing, which solves the multi-architecture problem. For example, the runtime fully supports the Microsoft OLE service architecture (OLE2), and supports it across both the Alpha and x86 architectures. The runtime jackets the interfaces to all OLE objects, allowing the interfaces to be called from either x86 or Alpha code. The caller of the OLE object does not need to know the object's architecture.

Alpha code provides performance -- The performance of DIGITAL FX!32 comes from executing high-speed native Alpha code. To secure high performance, the runtime transparently substitutes native Alpha code in place of x86 code whenever possible.

Taking advantage of API libraries -- The native Alpha API libraries are compiled from source code, using DIGITAL's world-class optimizing compilers. The contents of these libraries are semantically equivalent to, and significantly faster than, the corresponding API libraries for Windows NT on x86. DIGITAL FX!32 provides static jackets that allow the use of these native Alpha API libraries with x86 images. The jackets resolve the different calling conventions between the x86 image and the native Alpha API. Using these native Alpha APIs provides a real performance win.
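
The role of a static jacket can be illustrated with a small model. The sketch below is not the DIGITAL FX!32 implementation; it assumes the x86 caller uses the Win32 stdcall convention (arguments pushed on the stack right to left, integer result in EAX), and it leaves the return address out of the modeled stack. The names `X86State`, `make_static_jacket`, and `native_subtract` are hypothetical.

```python
from typing import Callable, List


class X86State:
    """Hypothetical model of the emulated x86 state that a jacket sees."""

    def __init__(self, stack: List[int]) -> None:
        self.stack = list(stack)  # last element is the top of the stack
        self.eax = 0              # x86 integer return-value register


def make_static_jacket(native_fn: Callable[..., int],
                       arg_count: int) -> Callable[[X86State], None]:
    """Wrap a native routine so that x86-style callers can reach it."""

    def jacket(state: X86State) -> None:
        # Arguments were pushed right to left, so the first one is on top.
        # Popping them also models the callee-cleans-stack rule of stdcall.
        args = [state.stack.pop() for _ in range(arg_count)]
        # Hand the arguments to the native routine as ordinary parameters,
        # then place the 32-bit result where x86 code expects to find it.
        state.eax = native_fn(*args) & 0xFFFFFFFF

    return jacket


def native_subtract(a: int, b: int) -> int:
    """Stand-in for a routine in a native API library (assumption)."""
    return a - b


state = X86State(stack=[2, 10])  # caller pushed b=2 first, then a=10
make_static_jacket(native_subtract, 2)(state)
```

Because the jacket is generated once per API routine from its argument count, the x86 image never needs to know that the callee is native code.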

Further, the dynamic jacketing that is part of the DIGITAL FX!32 OLE implementation allows direct use of native Alpha OLE objects, such as those supplied with Windows NT Alpha or those of any OLE-enabled native Alpha application. This too provides greatly enhanced performance.

Feedback loop from the background optimizer

The runtime obtains most of the native Alpha code it executes through a positive feedback loop between itself and the background optimizer.

Commercial applications typically consist of numerous executable files, called images. Some images are unique to the application and some are shared across different applications on the system. Each time the runtime loads an x86 image, it asks the server if optimized code exists for that image, to run in place of the slower x86 code. Optimized code is high-speed native Alpha code, produced by the background optimizer after the image has previously been run under DIGITAL FX!32.

After loading the optimized code, the runtime sets up tables in the emulator that correlate addresses between any x86 code and the optimized code. The runtime then initiates the emulator.
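
One way to picture the address correlation is as a lookup table keyed by x86 address. The following is a minimal sketch under that assumption, not the actual table layout; `TranslationTable`, `dispatch`, and the sample addresses are hypothetical.

```python
from typing import Callable, Dict, Optional


class TranslationTable:
    """Hypothetical map from x86 entry addresses to optimized routines."""

    def __init__(self) -> None:
        self._native: Dict[int, Callable[[], str]] = {}

    def register(self, x86_addr: int, routine: Callable[[], str]) -> None:
        """Record that optimized code exists for this x86 address."""
        self._native[x86_addr] = routine

    def lookup(self, x86_addr: int) -> Optional[Callable[[], str]]:
        """Return the optimized routine, or None if none is loaded."""
        return self._native.get(x86_addr)


def dispatch(table: TranslationTable, x86_addr: int) -> str:
    """Run optimized code when the table has it; otherwise interpret."""
    routine = table.lookup(x86_addr)
    if routine is not None:
        return routine()
    return f"interpreted x86 code at {x86_addr:#x}"


table = TranslationTable()
table.register(0x401000, lambda: "native code for 0x401000")
```

Here `dispatch(table, 0x401000)` takes the native path, while any address without an entry falls back to interpretation.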

Embedded within the emulator is an x86 interpreter, which starts executing the application. Through careful design and alignment with the Alpha architecture, the interpreter is both small and efficient. It is small enough to reside in high-speed cache memory, is optimized for the Alpha processor pipeline, and takes full advantage of the 64-bit Alpha processor registers.
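
For readers unfamiliar with interpreters, the basic fetch-decode-execute loop looks roughly like the sketch below. It is a toy in Python, not the hand-tuned Alpha code described above, and it handles only four one-byte x86 opcodes in their 32-bit-mode meaning (NOP, INC EAX, MOV EAX,imm32, RET). The function name `interpret_x86` is hypothetical.

```python
def interpret_x86(code: bytes, eax: int = 0) -> int:
    """Interpret a tiny subset of 32-bit x86 and return the final EAX."""
    pc = 0  # index of the next instruction byte
    while pc < len(code):
        opcode = code[pc]
        if opcode == 0x90:        # NOP
            pc += 1
        elif opcode == 0x40:      # INC EAX, wrapping at 32 bits
            eax = (eax + 1) & 0xFFFFFFFF
            pc += 1
        elif opcode == 0xB8:      # MOV EAX, imm32 (little-endian immediate)
            eax = int.from_bytes(code[pc + 1:pc + 5], "little")
            pc += 5
        elif opcode == 0xC3:      # RET ends this toy program
            return eax
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} not modeled")
    return eax


# MOV EAX, 42 ; INC EAX ; NOP ; RET
program = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0x40, 0x90, 0xC3])
result = interpret_x86(program)
```

A real interpreter must also model flags, memory, the remaining registers, and the full instruction set, none of which is attempted here.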

As it interprets unoptimized portions of x86 images, the runtime collects and saves execution profiles for subsequent use by the background optimizer. The performance of DIGITAL FX!32 is based on this cooperation between the runtime and the background optimizer. The coordination is provided by the server.
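
Profile collection can be sketched as a side effect of interpretation. This example assumes, purely for illustration, that a profile records how often each routine entry address was reached while interpreting; the contents of a real DIGITAL FX!32 profile are not specified here. `ExecutionProfile` and `run_with_profiling` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Set


class ExecutionProfile:
    """Hypothetical profile: counts of interpreted routine entries."""

    def __init__(self) -> None:
        self.call_targets: Counter = Counter()

    def record_call(self, target_addr: int) -> None:
        self.call_targets[target_addr] += 1

    def size(self) -> int:
        """Number of distinct routines seen by the interpreter."""
        return len(self.call_targets)


def run_with_profiling(calls: Iterable[int],
                       optimized: Set[int],
                       profile: ExecutionProfile) -> int:
    """Return how many calls were interpreted, recording each in the profile."""
    interpreted = 0
    for target in calls:
        if target in optimized:
            continue  # already runs as native code, so nothing to record
        profile.record_call(target)
        interpreted += 1
    return interpreted


profile = ExecutionProfile()
interpreted_calls = run_with_profiling(
    calls=[0x10, 0x20, 0x10, 0x30], optimized={0x20}, profile=profile)
```

Only the unoptimized routines (0x10 and 0x30 in this run) end up in the profile, which is what later tells the optimizer where work remains.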

Coordinating the process: the server

The server manages the DIGITAL FX!32 environment by coordinating the run-time environment and the background optimizer. The server acts according to DIGITAL FX!32 defaults or according to metrics that are specified in the DIGITAL FX!32 Manager. In response to those metrics, the server manages execution profiles and invokes the background optimizer.

When an x86 image is unloaded, the server looks for a new or enlarged profile. A new profile means that a previously unseen x86 image has been executed and may require optimization. An enlarged profile contains new information, indicating that the current translated image is incomplete. In either case, the server places the image and the corresponding profile on the work list for the background optimizer.
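
The unload-time decision described above can be sketched as follows. This is a simplified model, assuming that a profile can be summarized by a single size value; the class `Server`, the method `on_image_unloaded`, and the file name `app.exe` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class Server:
    """Hypothetical model of the server's work-list bookkeeping."""

    def __init__(self) -> None:
        self.known_profile_sizes: Dict[str, int] = {}
        self.work_list: Deque[Tuple[str, int]] = deque()

    def on_image_unloaded(self, image: str, profile_size: int) -> bool:
        """Queue the image for optimization if its profile is new or larger."""
        previous = self.known_profile_sizes.get(image)
        is_new = previous is None
        is_enlarged = previous is not None and profile_size > previous
        if is_new or is_enlarged:
            self.known_profile_sizes[image] = profile_size
            self.work_list.append((image, profile_size))
            return True
        return False  # profile unchanged: current translation is complete


server = Server()
first = server.on_image_unloaded("app.exe", 10)   # new profile
second = server.on_image_unloaded("app.exe", 10)  # unchanged profile
third = server.on_image_unloaded("app.exe", 14)   # enlarged profile
```

Only the first and third unloads add work for the background optimizer; the second leaves the work list untouched.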

This process is repeated each time the image is run until the size of its profile stabilizes, indicating that virtually all routines in the image are optimized. At this point, running the image executes high-performance Alpha Win32 code, rather than the slower x86 code. The image runs at its highest performance.
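
The convergence of this loop can be simulated in a few lines. The sketch assumes each run is described by the set of routines it touches, and that every routine profiled in one run has been optimized before the next run; both are simplifications, and `runs_until_stable` is a hypothetical name.

```python
from typing import Iterable, Optional, Set


def runs_until_stable(routines_touched_per_run: Iterable[Set[str]]) -> Optional[int]:
    """Return the 1-based run in which the profile first stops growing."""
    optimized: Set[str] = set()
    for run_number, touched in enumerate(routines_touched_per_run, start=1):
        # Routines not yet optimized are interpreted and enter the profile.
        newly_profiled = set(touched) - optimized
        if not newly_profiled:
            return run_number  # nothing was interpreted: profile is stable
        # The background optimizer translates them before the next run.
        optimized |= newly_profiled
    return None  # the profile was still growing in the last observed run


stable_at = runs_until_stable([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}])
```

In this example the third run touches only routines that are already optimized, so the profile stabilizes there.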

Creating the speed: background optimization

The background optimizer produces high-speed native Alpha code from x86 code by using information that is gathered into profiles by the runtime. The native Alpha code is subsequently made available to the runtime and executed the next time the image is run. It is this coordinated process that adds high performance to the transparency of execution, and truly distinguishes DIGITAL FX!32.

Design goals

The operation and output of the background optimizer must be as transparent and robust as the runtime environment. The user must never see the operation of the background optimizer and it must always present code to the runtime that runs to correct completion. To ensure transparency, the background optimizer design allows for no assumptions, no manual initiation, and no user intervention in any question/answer cycle. Therefore, all optimization must be based on unambiguous and unimpeachable criteria.

Coupled with the stringent need for transparent and flawless operation is a requirement for the highest possible performance. It was an explicit goal for the background optimizer to generate Alpha code that runs at 70% of the performance of Alpha code that is generated by an Alpha compiler. The designers recognized that high performance required the exploitation of the full range of optimization techniques that have been developed for modern compilers.

Realization of the goals

The background optimizer guarantees transparent and robust operation by cooperating with the runtime to ensure the coherency of the x86 machine state. A coherent x86 machine state means that x86 register assignments, call/return boundaries, the x86 stack, and exception condition processing reflect faithfully what would be observed on actual x86 hardware.

Achieving the performance goals required that the designers exploit the full range of modern compiler optimization techniques, and all of those techniques are predicated on global optimization.

Perhaps the most serious flaw in previous binary translators was their limitation to the basic block, or perhaps the extended basic block, as the fundamental unit of translation. Modern optimizing compilers rely on global optimization techniques that directly conflict with such a basic-block restriction. Therefore, removal of this restriction was the fundamental performance requirement. The background optimizer successfully removes this restriction by organizing carefully chosen groupings of basic blocks into significantly larger units, called translation units. Conceptually, a translation unit approximates a "routine" in a more traditional compiler. Thus, for the first time, the translation unit technology allows the full exploitation of global optimization techniques.
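
To make the idea of a translation unit concrete, the sketch below groups basic blocks by reachability from a profiled entry point. The selection criterion is an assumption made for illustration; the text says only that the groupings are "carefully chosen". The function `build_translation_unit` and the sample control-flow graph are hypothetical.

```python
from typing import Dict, List


def build_translation_unit(entry: int,
                           successors: Dict[int, List[int]]) -> List[int]:
    """Collect every basic block reachable from `entry` into one unit."""
    seen = set()
    pending = [entry]
    while pending:
        block = pending.pop()
        if block in seen:
            continue  # loops and joins reach the same block more than once
        seen.add(block)
        # Follow intra-routine control-flow edges only (calls are excluded
        # from `successors` in this model).
        pending.extend(successors.get(block, []))
    return sorted(seen)


# Block start address -> start addresses of its successor blocks.
cfg = {
    0x100: [0x110, 0x120],  # conditional branch
    0x110: [0x130],
    0x120: [0x130],
    0x130: [],              # return
    0x200: [0x210],         # a different routine
    0x210: [],
}
unit = build_translation_unit(0x100, cfg)
```

The resulting unit spans the branch and its join point, so an optimizer working on it can reason across block boundaries in a way a per-block translator cannot.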

The background optimizer takes full advantage of the potential available from global optimization. After initial setup, it uses techniques such as in-lining to prepare code sequences for further optimization from techniques such as value propagation, common subexpression elimination, and scheduling.
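
As an illustration of one of these techniques, here is a small common subexpression elimination pass. It is a generic textbook-style sketch, not the optimizer's algorithm, and it assumes a hypothetical intermediate form in which every instruction is a `(dest, op, a, b)` tuple and every name is assigned exactly once.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, str, str, str]  # (dest, op, operand_a, operand_b)


def eliminate_common_subexpressions(
        code: List[Instr]) -> Tuple[List[Instr], Dict[str, str]]:
    """Drop recomputations; return the new code and a rename map."""
    available: Dict[Tuple[str, str, str], str] = {}  # expression -> holder
    replaced: Dict[str, str] = {}                    # dropped name -> holder
    result: List[Instr] = []
    for dest, op, a, b in code:
        # Propagate earlier replacements into the operands first.
        a = replaced.get(a, a)
        b = replaced.get(b, b)
        key = (op, a, b)
        if key in available:
            # Same expression already computed: reuse its result.
            replaced[dest] = available[key]
        else:
            available[key] = dest
            result.append((dest, op, a, b))
    return result, replaced


code = [
    ("t1", "add", "x", "y"),
    ("t2", "add", "x", "y"),   # recomputes t1
    ("t3", "mul", "t2", "z"),
]
optimized, renames = eliminate_common_subexpressions(code)
```

The single-assignment assumption is what makes the pass safe without tracking redefinitions; the rename map lets a caller redirect any later use of an eliminated name.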

By achieving its performance goals, the background optimizer is able to feed high-speed native Alpha code to the runtime, providing the largest part of the overall performance of DIGITAL FX!32.

User interface: the DIGITAL FX!32 Manager

The user can view status and provide management information to the server with the DIGITAL FX!32 Manager. The Manager is available as an icon in the DIGITAL FX!32 program group.

For example, the user can specify a disk-space limit for optimized code and profiles. The server then ensures that any specified limit is not exceeded by discarding old or infrequently used optimized code and profiles. Through easy-to-use dialog boxes, the user can protect important but seldom-used code from being discarded.
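
A disk-space policy of this kind can be sketched as least-recently-used eviction that skips protected entries. The actual ordering the server applies is not documented here, so the policy, the `CacheEntry` fields, and the function `evict_to_limit` are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheEntry:
    """Hypothetical record for one image's optimized code and profile."""
    name: str
    size: int            # bytes on disk
    last_used: int       # larger means more recently used
    protected: bool = False


def evict_to_limit(entries: List[CacheEntry], limit_bytes: int) -> List[str]:
    """Return the names to discard, oldest first, to get under the limit."""
    total = sum(entry.size for entry in entries)
    discarded: List[str] = []
    for entry in sorted(entries, key=lambda e: e.last_used):
        if total <= limit_bytes:
            break
        if entry.protected:
            continue  # the user asked to keep this one
        total -= entry.size
        discarded.append(entry.name)
    return discarded


entries = [
    CacheEntry("a.exe", size=40, last_used=1, protected=True),
    CacheEntry("b.dll", size=30, last_used=2),
    CacheEntry("c.exe", size=50, last_used=3),
]
to_discard = evict_to_limit(entries, limit_bytes=90)
```

Here the oldest entry is protected, so the next-oldest unprotected entry is discarded instead. If the protected entries alone exceed the limit, this sketch simply stops once nothing else can be discarded.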

The DIGITAL FX!32 Manager provides context-sensitive help and access to the on-line help found in the DIGITAL FX!32 program group.

Conclusion

DIGITAL FX!32 provides fast and transparent execution of x86 Win32 applications on Windows NT Alpha because, for the first time, binary translation is coordinated with runtime emulation. Coordination between the runtime and the binary translator (the background optimizer) is provided by the FX!32 server.

The runtime provides transparent execution because it contains an emulator that implements the entire x86 user-mode instruction set and because it provides the complete x86 Win32 environment.

The performance of DIGITAL FX!32 comes from executing high-speed native Alpha code. The majority of that code is produced by a positive feedback loop that exists between the runtime and the background optimizer. The background optimizer provides high performance because it uses global optimization techniques that previously were available only to modern compilers; the background optimizer is the first translator that can use global optimization.

It is the coordination between the background optimizer and the runtime that successfully combines high performance with transparent execution and truly distinguishes DIGITAL FX!32.
