Regarding AlphaStation 4/233 performance concerns

From: John, dtn 381-0378 19-Jan-1995 1151 <henning_at_i4get.enet.dec.com>
Date: Thu, 19 Jan 95 11:57:58 EST

A recent message ("SUMMARY: AlphaStation 4/233 vs. 3000/400") to this
mailing list has generated significant follow-on interest. There seems to
be an opportunity to add some information to the discussion.

Note: I asked Dave Sill (Alpha-OSF-managers maintainer) if it would be all
right to send the information to this mailing list. After some hesitation
(to be sure this wasn't going to be a flame war), Dave approved my doing so;
and asked me to mention that I had sought his approval first. To help
ensure that this is not a flame war, I have tried to address the concerns
raised in a way which is "non-personal", and have tried to keep this
note as factual as I can.

First off, let me admit that I am mystified by the reported results. If
Anthony Baxter says that he saw a "make stage 1 of gcc" take 1750 seconds on
a DEC 3000 and 1500 seconds on an AlphaStation 200 4/233, then that's what
he saw. I believe him. But these results are definitely a mystery, and
deserve further investigation. I've sent him a set of questions that
hopefully will help to resolve the mystery.

In the meantime, though, I'd like to address some of the performance
hypotheses and concerns which have already been sent to this mailing list.


1) It was suggested that perhaps the multiuser AIM benchmarks are missing
   from the AlphaStation Performance Flash because of "selective reporting".

   There are so many benchmarks to be run on so many speed variations of so
   many platforms with constantly improving compilers, etc., that we can't
   possibly run all benchmarks on all platforms. We run (and certify) the
   key benchmarks that we believe are most appropriate for each platform.
   Because the AlphaStation 200 4/233 is a workstation, not a server, the
   AIM server suite is not anywhere near the top of our list to run here.

   Digital has more complete AIM reports than any other vendor. AIM results
   require auditing which was not done prior to the AlphaStation 200 4/233
   announcement. We will run AIM III based on market demand.

   Note: I am a member of the CSG Performance Group within engineering
   at Digital. We are the group that authors the Performance Flashes.


2) It was suggested that the PCI bus may be slower than the TurboChannel
   bus. This is true, false, and probably irrelevant, all at the
   same time! Let me explain -

   The false part: the raw peak bandwidth of TurboChannel is 100 MB/sec; PCI
   (32-bit) is 132 MB/sec.

   The true part: Today's available PCI adapters tend not to make full use
   of this bandwidth, but you can expect to see them exploit more over time.
   (And future 64-bit PCI implementations will double the raw bandwidth on
   the PCI.)

   The irrelevant part: the peak speed of the TurboChannel bus or the PCI
   bus is relevant to an application only if you are attempting to crank
   through a significant fraction of peak, for example for a very complex
   multi-head graphics workload.

   But a typical disk-bound workload needs a much smaller fraction -- for
   example, 7.6% (computed as Fast SCSI's 10 MB/sec divided by PCI's peak of
   132). I would guess that "make stage 1 of gcc" is disk bound, not
   graphics bound.
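
   As a rough illustration of the arithmetic above, here is a minimal
   Python sketch. The peak figures are the ones quoted in this note; the
   helper name bus_fraction is made up for the example, and putting Fast
   SCSI next to TurboChannel's peak is only a hypothetical comparison, not
   a description of any shipping configuration.

```python
# Peak bandwidth figures quoted in the note above, in MB/sec.
TURBOCHANNEL_PEAK = 100
PCI32_PEAK = 132
FAST_SCSI_PEAK = 10


def bus_fraction(device_mb_per_sec, bus_mb_per_sec):
    """Return the share of a bus's peak bandwidth a device could consume.

    Hypothetical helper for illustration only.
    """
    return device_mb_per_sec / bus_mb_per_sec


# Fast SCSI against 32-bit PCI, as computed in the text.
pci_share = bus_fraction(FAST_SCSI_PEAK, PCI32_PEAK)

# The same device rate against TurboChannel, for comparison only.
tc_share = bus_fraction(FAST_SCSI_PEAK, TURBOCHANNEL_PEAK)

print(f"Fast SCSI on 32-bit PCI:   {pci_share:.1%}")
print(f"Fast SCSI on TurboChannel: {tc_share:.1%}")
```

   Either way the disk traffic is a small slice of what the bus can
   carry, which is why peak bus speed is unlikely to explain a disk-bound
   result.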


3) It was suggested that the disk controller on the AlphaStation 200 4/233
   is slower than the disk controller on the DEC 3000 Model 400.

   This assertion is not correct, assuming we are talking about the on-board
   disk controllers. The 3000/400 SCSI controller ran at only 5 MB/sec; the
   on-board controller in the AlphaStation 200 4/233 is the NCR 810 SCSI
   interface, which runs at 10 MB/sec.

   On the other hand, the disks themselves MIGHT be the issue here. If the
   workload "make stage 1 of gcc" is a disk-intensive workload (for compiler
   input, compiler output, paging/swapping, and/or temporary files) it could
   make a big difference if the workload was spread across more disks on the
   DEC 3000 Model 400 than on the AlphaStation 200 4/233.


4) It was suggested that an AlphaStation is an Alpha chip mated with
   a PC chassis and PC main memory system.

   It is true that the AlphaStations provide industry-standard PCI and ISA
   slots for hardware compatibility with thousands of cards from hundreds of
   vendors. But the memory and cache subsystem is custom designed for
   Alpha, using the 21071 chip set from Digital Semiconductor. The cache
   subsystem has a 128-bit bus, compared to the 64-bit bus found on Pentium
   PCs. And the 64-bit memory bus is wider than the 32-bit memory bus found
   on many Pentium PCs.


5) It was stated that the 3000/400 main memory bus is 256 bits wide, wider
   than the Digital 2100 and the DEC 7000, and the AlphaStation is only 64
   bits.

   This is true, but there are other important considerations. First, the
   DEC 3000 Model 400 has 256 bits of width AT THE DRAM PINS. Depending on
   the configuration, the DEC 7000 can have more than 1024 bits of bus width
   at the DRAM pins. Second, the servers cycle the bus faster than the
   workstations. Third, the servers provide higher aggregate bandwidth with
   multiple CPUs by allowing multiple transactions to be in flight at the
   same time. For example, the Digital Technical Journal (Vol. 4 Number 4)
   article on the DEC 7000 has a diagram to show how 3 transactions can
   overlap on the bus, for an aggregate bandwidth of 640 MB/sec.

   Finally, bandwidth to main memory is probably not the issue here. In the
   memory hierarchy, most workloads are most sensitive to the speed of the
   secondary cache and the size of the secondary cache. All of the above
   mentioned systems have a 128-bit interface to the secondary cache. And
   both the DEC 3000 Model 400 and the AlphaStation 200 4/233 have the same
   size secondary caches, 512KB.


6) Finally, it was suggested that "the DEC salescritter was feeding us
   marketing" when he said the new AlphaStations are faster than the 3000
   series.

   Well, we think our salescritter ^H^H^H^H^H^H^H^H^H^H^H^H esteemed
   colleague in the Sales department had it right, at least when comparing
   to older DEC 3000's such as the Model 400.

   On the other hand, it IS true that the top of the line DEC 3000 Models
   can outperform an AlphaStation 200 4/233; for example, if one looks at
   the DEC 3000 Model 700 the substantial cache advantage for the 700 (2MB
   vs 512KB) will usually mean more than the slight MHz advantage for the
   AlphaStation 200 4/233 (233 vs. 225).

   To try to give a real-life example comparing an AlphaStation with a DEC
   3000, here's a series of 5 benchmarks done with Ansys, a Mechanical CAD
   application. You will note that an AlphaStation 400 4/233 sometimes wins
   and sometimes loses to the DEC 3000 Model 800, which is what you might
   expect given that the 4/233 has a faster clock, but the 3000/800 has a
   bigger cache and (in this case) more disk and MUCH bigger memory.

   Overall they are roughly comparable for this application.

   * simply indicates the faster entry in each line.

                               DEC 3000      AlphaStation
                               Model 800     400 4/233
                               ---------     ------------
       Clock speed             200 MHz       233 MHz
       Bus Speed               100 MB/sec    132 MB/sec
       Secondary Cache         2MB           512KB
       Memory Size             1GB           96MB
       Disks                   2 x RZ28      1 x RZ28
       Benchmark LS1 Elapsed   64            58*
                     CPU       54*           55
       Benchmark LS2 Elapsed   302*          329
                     CPU       286           281*
       Benchmark LS3 Elapsed   903*          973
                     CPU       834           823*
       Benchmark LS4 Elapsed   1928*         2146
                     CPU       1887          1872*
       Benchmark LS5 Elapsed   1852*         2079
                     CPU       1800          1762*
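
   To make "roughly comparable" a little more concrete, the following
   Python sketch computes the ratio of the two columns for each line of
   the table. The numbers are copied from the table above; the names
   RESULTS and ratio are invented for this example, and the ratios are
   unitless, so nothing is assumed about the units of the timings.

```python
# (benchmark, measure, DEC 3000 Model 800, AlphaStation 400 4/233)
# Values copied from the table in this note.
RESULTS = [
    ("LS1", "Elapsed", 64, 58),
    ("LS1", "CPU", 54, 55),
    ("LS2", "Elapsed", 302, 329),
    ("LS2", "CPU", 286, 281),
    ("LS3", "Elapsed", 903, 973),
    ("LS3", "CPU", 834, 823),
    ("LS4", "Elapsed", 1928, 2146),
    ("LS4", "CPU", 1887, 1872),
    ("LS5", "Elapsed", 1852, 2079),
    ("LS5", "CPU", 1800, 1762),
]


def ratio(dec3000, alphastation):
    """AlphaStation time divided by DEC 3000 time.

    A value below 1.0 means the AlphaStation was the faster of the two.
    """
    return alphastation / dec3000


def alphastation_wins(measure):
    """Count the lines of the given measure where the AlphaStation was faster."""
    return sum(1 for _, m, dec, alpha in RESULTS if m == measure and alpha < dec)


for name, measure, dec, alpha in RESULTS:
    print(f"{name} {measure:<8} {ratio(dec, alpha):.2f}")

print("AlphaStation faster on elapsed time:", alphastation_wins("Elapsed"), "of 5")
print("AlphaStation faster on CPU time:    ", alphastation_wins("CPU"), "of 5")
```

   Looking at elapsed and CPU time separately is a deliberate choice:
   the two measures can point in different directions when one system
   has more memory and more disks than the other.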


So where does all this leave us on the original report? As I said at the
beginning, "it's a mystery". I hope to hear back from Mr. Baxter with
details of his workload and configuration, and once we resolve it, one or
the other of us will post an update. But my current guess (without a shred
of evidence yet) is that the workload was disk bound, and was spread less
optimally across the disk spindles on the AlphaStation than on the DEC 3000
Model 400.

    John L. Henning
    CSG Performance Group
    Digital Equipment Corporation
    henning_at_i4get.enet.dec.com
Received on Thu Jan 19 1995 - 12:04:35 NZDT
