Introduction
Since 1994, Stromasys has been developing system emulators,
starting with PDP-11 emulation. Now known as “cross-platform virtualization,”
these system emulators provide a complete software model of a legacy hardware
system, running on a modern server. This technology allows users to keep
their original operating system and applications running unmodified,
while gaining performance improvements, increased storage capacity, and
increased reliability.
When we built the first system emulators, we had no idea
that the market would demand these solutions. Nor did we realize the
effort required to develop commercial-quality emulator products, particularly
the importance of a comprehensive product testing strategy. This article gives
insight into the development of the testing technologies and procedures that we
created to ensure reliable replacement products with superior performance.
Designing and Testing Virtual Systems
In principle, creating a software abstraction layer of a
hardware computer component is straightforward. A subsystem, for instance a CPU,
is an implementation of sequential logic. The functionality is well defined and
can be described in a high-level programming language. In modern CPU design,
creating a software model before creating the desired hardware is a common
practice, and we repeat this process for all the relevant subsystems.
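As an illustration of what such a software model looks like in practice, the fragment below sketches the core fetch-decode-execute loop of an emulated CPU in C++. It is a minimal, hypothetical example: the class, the register layout, and the two sample opcodes are invented for illustration and are not taken from an actual CHARON module.

    #include <cstdint>
    #include <vector>

    // Minimal sketch of an emulated CPU as sequential logic in software.
    // The opcodes and register set are purely illustrative.
    class EmulatedCpu {
    public:
        explicit EmulatedCpu(std::vector<uint8_t> memory) : mem(std::move(memory)) {}

        // Execute one instruction: fetch, decode, execute, update state.
        void step() {
            uint8_t opcode = mem[pc++];                  // fetch
            switch (opcode) {                            // decode
            case 0x01: regs[0] += mem[pc++]; break;      // hypothetical "add immediate to R0"
            case 0x02: pc = mem[pc]; break;              // hypothetical "jump absolute" (one-byte target for brevity)
            default:   raiseReservedOpcodeFault(opcode); // undefined opcode -> guest exception
            }
        }

    private:
        void raiseReservedOpcodeFault(uint8_t) { /* deliver the fault to the guest OS */ }

        std::vector<uint8_t> mem;    // emulated physical memory
        uint32_t pc = 0;             // program counter
        uint32_t regs[16] = {0};     // general-purpose registers
    };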
If the emulator architecture follows the hardware
architecture closely, the result is a library of emulated subsystem components,
each representing a hardware subsystem in the form of a code module. In the case
of the first generation PDP-11 emulator, we designed approximately 130 modules
written in C++, representing the various CPU modes, memory management, disk and
tape controllers, various serial and parallel I/O options, Ethernet controllers,
and so on.
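A hedged sketch of how such a module library could be organized is shown below; the interface and class names are our own illustrative assumptions, not the actual emulator code.

    #include <cstdint>
    #include <memory>
    #include <vector>

    // Each emulated hardware subsystem becomes a code module behind a common
    // interface, so a complete virtual system can be assembled from a library
    // of components. Names are illustrative only.
    class VirtualComponent {
    public:
        virtual ~VirtualComponent() = default;
        virtual void reset() = 0;                                    // power-on state
        virtual uint16_t readRegister(uint32_t address) = 0;         // guest reads a device register
        virtual void writeRegister(uint32_t address, uint16_t v) = 0;
    };

    // Concrete modules: disk controllers, tape controllers, serial lines, ...
    class DiskController : public VirtualComponent { /* ... */ };
    class SerialLine     : public VirtualComponent { /* ... */ };

    // A virtual system is an assembly of such components on an emulated bus.
    class VirtualSystem {
    public:
        void attach(std::unique_ptr<VirtualComponent> component) {
            components.push_back(std::move(component));
        }
    private:
        std::vector<std::unique_ptr<VirtualComponent>> components;
    };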
Given the resemblance between the original hardware and the
emulated components, the obvious choice for testing the emulator was using the
original hardware diagnostics. Having few other tools at our disposal at that
time, we pushed the component emulation to a level where the original hardware
diagnostics would even show the revision numbers of the hardware components that
were emulated. This unnecessary complexity resulted in a slow implementation
compounded by running the virtual system modules in synchronous mode to
avoid timing problems. However, for the PDP-11 and slow VAX models, this
approach got us a correctly working emulator with the existing diagnostics.
However, we cannot always use the existing hardware
diagnostics. An emulated tape drive, for instance, has a tape positioning time
that is close to zero. The hardware diagnostics find that hard to accept, and
even with a correct implementation, the software might malfunction. NOOPs put
into drivers to make them work correctly with the original hardware create
havoc with a faster virtual interface, just as they would with faster hardware.
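The fragment below illustrates the kind of timing assumption that breaks. It is a simplified, hypothetical example, not code from an actual driver; the device address and status bit are invented.

    #include <cstdint>

    // Hypothetical device status register and ready bit, for illustration only.
    volatile uint16_t* const TAPE_CSR = reinterpret_cast<volatile uint16_t*>(0x1F520);
    constexpr uint16_t READY = 0x0080;

    // A calibrated delay loop assumes a fixed CPU speed: on a faster virtual
    // (or real) CPU it expires far too early.
    void wait_by_delay_loop() {
        for (volatile int i = 0; i < 10000; ++i) {
            // roughly a millisecond on the original CPU, only microseconds on a fast host
        }
    }

    // Polling the device's status register keeps the driver independent of
    // CPU and device timing, and works equally well on a virtual interface.
    void wait_for_ready() {
        while ((*TAPE_CSR & READY) == 0) {
            // spin until the device reports ready
        }
    }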
In the development cycle of a virtual system, booting an operating system on it
is a milestone that indicates that a minimum of virtual hardware is working
correctly. Interestingly, most operating systems are not very useful by
themselves for debugging a virtual system. When we first booted VMS on our
MicroVAX emulator, later tests revealed that about 85% of the emulated VAX
instructions still had errors in some modes. A surprise was that VMS would not
boot without floating point instructions. Unfortunately, floating point instructions for
VAX are complex to implement correctly on an x86 architecture. Nevertheless, the
ability to run the original OS is an important step, since it greatly
facilitates the development and execution of custom diagnostic tools.
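To illustrate one source of the floating point complexity mentioned above, the hedged sketch below converts a VAX F_floating datum to an IEEE single-precision bit pattern: the exponent bias, the position of the hidden bit, the PDP-11 word order, and the reserved-operand encoding all differ from the x86/IEEE world. This is a simplified illustration (denormal results and rounding are only noted, not handled), not the actual CHARON implementation.

    #include <cstdint>

    // 'raw' is the 32-bit F_floating value as read from emulated VAX memory,
    // with the sign/exponent word in bits 15:0 (PDP-11 word order).
    // Returns the IEEE single-precision bit pattern; sets 'reserved_operand'
    // when the guest should see a reserved-operand fault instead of a number.
    uint32_t vax_f_to_ieee_single(uint32_t raw, bool& reserved_operand) {
        uint32_t sign = (raw >> 15) & 0x1;
        uint32_t exp  = (raw >> 7)  & 0xFF;                  // excess-128 exponent
        uint32_t frac = ((raw & 0x7F) << 16) | (raw >> 16);  // 23 fraction bits

        reserved_operand = false;
        if (exp == 0) {
            if (sign) reserved_operand = true;               // VAX reserved operand
            return 0;                                        // clean zero (VAX has no denormals)
        }
        if (exp < 3) {
            // 0.1f * 2^(exp-128) falls below the IEEE normal range; a real
            // implementation must produce a denormal here.
            return sign << 31;
        }
        // 0.1f * 2^(exp-128) == 1.f * 2^(exp-129), so the IEEE biased exponent is exp-2.
        return (sign << 31) | ((exp - 2) << 23) | frac;
    }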
Eventually, the complexity of our virtual system products drove
us to the development of custom diagnostic tools, and from there to a formal
test process. Before that, we found that implementing an emulator on more than one host
platform helps error discovery. From the same source code, errors typically
appear in different places, or only in one implementation, and are easier to
track down. For example, an industrial application on a virtual PDP-11 hosted
on Windows NT displayed its menu items correctly in white text on Alpha
hardware, but on an x86 platform some menu items appeared in red, because the
x86 C++ compiler generated an error that propagated into a rarely used FORTRAN call.
Introduction of Formal Tools
In the development of our CHARON-VAX emulators, we used a tool called AXE,
which we obtained from Compaq. AXE was the verification tool that the VAX
hardware development groups in Digital used to:
- Locate bugs in the hardware during development
- Verify that a design meets the requirements of DEC STD 032, the
functional requirements for a VAX hardware system.
AXE creates test cases composed of a VAX CPU instruction with operand
specifiers appropriate for that instruction. After execution, AXE compares the
relevant state with a “known good” result and reports the differences to the
user. The individual test cases are generated on the fly, creating billions of
unique sequences. These variations cover essentially all aspects of a VAX CPU
and memory management unit: opcode, opcode or operand placement with respect to
page boundaries, (stack) page protection, translation buffer validity, overlap
of string operands, page faulting, etc. AXE also
covers testing of exception handling, such as reserved opcodes, arithmetic
traps, addressing modes, and invalid translations.
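The skeleton below sketches the principle of such a test generator (not the AXE tool itself): generate a random instruction case, execute it on the virtual CPU and on a known-good reference, and report any difference in the resulting architectural state. All interfaces here are hypothetical hooks that a real tool would have to fill in.

    #include <cstdint>
    #include <iostream>
    #include <random>

    // Architectural state compared after each test case (simplified).
    struct CpuState {
        uint32_t regs[16];
        uint32_t psl;            // processor status longword
        bool operator==(const CpuState& other) const;
    };

    // One generated case: an opcode, its operand specifiers, and where in the
    // (possibly page-crossing) address space it is placed.
    struct TestCase {
        uint8_t  opcode;
        uint8_t  operand_specifiers[6];
        uint32_t placement;
    };

    // Hypothetical hooks: the generator, the known-good reference (hardware or
    // a golden model), and the emulator under test.
    TestCase random_case(std::mt19937& rng);
    CpuState run_on_reference(const TestCase& tc);
    CpuState run_on_virtual_cpu(const TestCase& tc);

    unsigned long long run_campaign(unsigned long long cases) {
        std::mt19937 rng(12345);                 // fixed seed so failures are reproducible
        unsigned long long failures = 0;
        for (unsigned long long i = 0; i < cases; ++i) {
            TestCase tc = random_case(rng);
            if (!(run_on_virtual_cpu(tc) == run_on_reference(tc))) {
                ++failures;                      // a real tool would log the full state difference
            }
        }
        std::cout << failures << " mismatching cases\n";
        return failures;
    }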
The final AXE test before Digital released VAX hardware was a comparison
against a reference database. The green light was given if ten million cases
of each user-mode instruction (in all access modes) and 5,000 cases of
each privileged-mode instruction passed without a single error. AXE has
been valuable in the development of our virtual VAX systems, and we base the
first phase of our virtual Alpha product readiness tests on a similar concept,
using proprietary tools. Such testing is very time-consuming. We passed our
virtual VAX 3100 AXE readiness test in the year 2002
in a certification sequence that lasted more than three weeks on a 500 MHz
Celeron processor.
For both “hardware” and “virtual hardware” developers, a good working CPU is
only one of the many components to integrate and test. Hardware developers have
the advantage that many components are standard parts with well-defined
functions, such as SCSI controllers, UARTs, disk drives, Ethernet controllers,
and so on. These components are either fully
tested by the supplier or can be tested individually while connected to a good
system. Virtual system developers do not have that luxury. To create a complete
hardware abstraction layer of a new system, we must emulate such components as
well, in order to create a portable product. As we learned over the years, this
makes the reuse of known good virtual components extremely important. If we make
the elements of a virtual component library usable on different virtual systems,
saving development time is far from the most important benefit. The repeated
integration testing of such components in multiple designs leads to a
significant increase in reliability, provided that strict code management,
re-utilization, and maintenance are enforced. After 10 years and more than 40
PDP-11, VAX, and Alpha models, the investment pays off in the form of a rich set
of reliable virtual components. As an example, a virtual SPARC workstation
prototype reuses 70% of the existing virtual VAX components, allowing faster
development with less testing effort.
The Current Virtual Alpha System Test Process
As we pushed the technology further, in particular to develop fast virtual
SMP systems (for instance, a virtual 16-CPU CHARON-AXP system), the processes we
used to ensure the reliability of a single-CPU virtual VAX were no longer
sufficient. We had to change the virtualization technology in several fundamental ways:
- High-speed virtual CPUs require complex virtualization algorithms to
obtain the desired performance, typically involving more than one host CPU
per virtual CPU. The drive for performance alone required years of research,
as we wanted to reach two goals at the same time: portability of the
acceleration code (for reuse with other architectures) and a clock rate
matching that of more modern CPUs.
- As virtual systems get more complex, it is no longer feasible to run their
components in synchronous mode. Interrupt latency suffers badly and
interferes with proper operation, and synchronous operation also prevents
effective use of multi-core host systems.
- An important constraint for virtual SMP systems is that their CPUs must
clock at exactly the same frequency. This is easy to implement in a hardware
system, but requires a very sophisticated synchronization mechanism in the
virtual environment; a minimal sketch of the idea follows this list. Testing
this is implicit, as an operating system like Tru64 shuts down any CPU whose
perceived clock rate differs even slightly from that of the primary CPU.
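As a minimal sketch of the synchronization constraint in the last point, the fragment below derives every virtual CPU's guest-visible cycle counter from a single shared virtual time base, so the counters cannot drift apart no matter how unevenly the host schedules the CPU threads. This illustrates the constraint; it is not the actual CHARON-AXP mechanism.

    #include <atomic>
    #include <cstdint>

    // One shared virtual time base for the whole virtual system, advanced by a
    // single pacing thread at the emulated clock rate.
    class VirtualTimeBase {
    public:
        void tick(uint64_t cycles) { now.fetch_add(cycles, std::memory_order_relaxed); }
        uint64_t read() const { return now.load(std::memory_order_relaxed); }
    private:
        std::atomic<uint64_t> now{0};
    };

    class VirtualCpu {
    public:
        explicit VirtualCpu(const VirtualTimeBase& tb) : timebase(tb) {}

        // What the guest sees when it reads its cycle counter: the same value
        // on every virtual CPU, regardless of how fast each host thread runs.
        uint64_t readCycleCounter() const { return timebase.read(); }
    private:
        const VirtualTimeBase& timebase;
    };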
An advanced virtual SMP system design requires three kinds of tests:
- Functionality tests to ensure that binary compatibility with the
hardware CPU equivalent is uncompromised. In particular, we must test
mathematical operations extensively to assure correct overflow and underflow
behavior in the floating point formats. For virtual Alpha systems, we developed
a suite of proprietary tests (similar to AXE) that we calibrated on a hardware
Alpha of the same generation as the virtual implementation. These tests include
verification that self-modifying code works correctly (there is a myth that
emulators cannot handle such code). We typically run such tests every night
when we are working on code optimization of the virtual CPU implementation.
- Performance tests to ensure that the ongoing CPU optimizations have the
desired effect. The optimization algorithms depend on binary code structures,
so an automated test procedure submits hundreds of binary Alpha code segments
representing many different types of user applications. A weighted average is
calculated, with a check that no type of use is significantly degraded; a
sketch of this kind of check follows the list. We aim for a small overall
improvement with every release.
- System integration and quality control tests, which have become the
predominant tests in terms of calendar time before every product release. As
we extend our virtual products with new components, for instance FDDI or
Fibre Channel support, we must scrutinize the impact on the rest of the
system in even more depth than for the original hardware, where the mutual
influence between individual components was limited. Furthermore, our
products can run
on many different host systems. Proper operation on a representative set of
host systems is an essential factor in quality verification.
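The check mentioned under the performance tests can be pictured with the small sketch below: per-segment timings of the candidate release are compared against the previous one, a weighted average is formed, and any workload type that slows down beyond a threshold is flagged. The weights and the 5% threshold are illustrative assumptions, not our actual release criteria.

    #include <iostream>
    #include <string>
    #include <vector>

    struct SegmentResult {
        std::string name;        // e.g. "string-heavy code", "floating point loop"
        double weight;           // relative importance of this workload type
        double old_time_s;       // previous release
        double new_time_s;       // candidate release
    };

    bool evaluate(const std::vector<SegmentResult>& results) {
        double weighted_speedup = 0.0, total_weight = 0.0;
        bool any_regression = false;
        for (const auto& r : results) {
            double speedup = r.old_time_s / r.new_time_s;   // >1.0 means faster
            weighted_speedup += r.weight * speedup;
            total_weight     += r.weight;
            if (speedup < 0.95) {                           // more than 5% slower
                std::cout << "regression in " << r.name << "\n";
                any_regression = true;
            }
        }
        std::cout << "weighted speedup: " << weighted_speedup / total_weight << "\n";
        // Accept only if nothing regressed and the weighted average improved.
        return !any_regression && weighted_speedup / total_weight >= 1.0;
    }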
Organizational Impact
During the development of larger VAX systems in 2004-2005, the time needed to
test our virtual systems started to limit our product release throughput, in
spite of having a dedicated full-time test engineer in addition to the usual
testing process in engineering. Formally splitting the testing into subsets
(used during development) and creating a formal “product readiness test” (PRT)
improved the situation somewhat. Products that pass a PRT are initially coded in
their license as field-test capable. The field test program aims at collecting
20 user-months of field test experience, for instance five users over four
months, although commercial pressure can reduce this. Field tests are a great way
to verify the quality of the documentation. During the field test period, minor
product problems are fixed and documentation is updated. This process is
iterative, and a typical product release goes through a PRT three times before
its official release as a product. The PRT for the Alpha platform contains more
than 200 individual tests, and takes about 2 to 3 weeks. The number of tests is
rising as we add functionality to
our products.
Engineering groups tend not to be the best judges of the readiness of their
products. In light of this, the solution was to create a separate Quality
Control group in our organization, also responsible for measuring and publishing
formal performance numbers and reviewing product documentation.
Conclusion
We have aimed to give some insight into the process of creating and testing a
reliable virtual computer system. Successful hardware emulation depends not only
on good design, but also on rigorous testing and quality control procedures. The
technologies we have developed and the understanding we have built of the
testing processes to create a reliable virtual system are universal, making
development of other virtual legacy systems easier. Our newest development
center in Shenzhen, China, is now
expanding to take these testing technologies beyond our historic focus on
DEC's PDP-11, VAX, and Alpha systems.
For More Information
To find out more about Stromasys and our CHARON-VAX and CHARON-AXP
virtualization software products, please visit www.stromasys.com.