A recent message ("SUMMARY: AlphaStation 4/233 vs. 3000/400") to this
mailing list has generated significant follow-on interest. There seems to
be an opportunity to add some information to the discussion.
Note: I asked Dave Sill (Alpha-OSF-managers maintainer) if it would be all
right to send the information to this mailing list. After some hesitation
(to be sure this wasn't going to be a flame war), Dave approved my doing so,
and asked me to mention that I had sought his approval first. To help
ensure that this is not a flame war, I have tried to address the concerns
raised in a way which is "non-personal", and have tried to keep this
response as factual as I can.
First off, let me admit that I am mystified by the reported results. If
Anthony Baxter says that he saw a "make stage 1 of gcc" take 1750 seconds on
a DEC 3000 and 1500 seconds on an AlphaStation 200 4/233, then that's what
he saw. I believe him. But these results are definitely a mystery, and
deserve further investigation. I've sent him a set of questions that
hopefully will help to resolve the mystery.
In the meantime, though, I'd like to address some of the performance
hypotheses and concerns which have already been sent to this mailing list.
1) It was suggested that perhaps the multiuser AIM benchmarks are missing
from the AlphaStation Performance Flash because of "selective reporting".
There are so many benchmarks to be run on so many speed variations of so
many platforms with constantly improving compilers, etc., that we can't
possibly run all benchmarks on all platforms. We run (and certify) the
key benchmarks that we believe are most appropriate for each platform.
Because the AlphaStation 200 4/233 is a workstation, not a server, the
AIM server suite is not anywhere near the top of our list to run here.
Digital has more complete AIM reports than any other vendor. AIM results
require auditing, which was not done prior to the AlphaStation 200 4/233
announcement. We will run AIM III based on market demand.
Note: I am a member of the CSG Performance Group within engineering
at Digital. We are the group that authors the Performance Flashes.
2) It was suggested that the PCI bus may be slower than the TurboChannel
bus. This is true, false, and probably irrelevant, all at the same
time! Let me explain:
The false part: the raw peak bandwidth of TurboChannel is 100 MB/sec; PCI
(32-bit) is 132 MB/sec.
The true part: Today's available PCI adapters tend not to make full use
of this bandwidth, but you can expect to see them exploit more over time.
(And future 64-bit PCI implementations will double the raw bandwidth on
the PCI.)
The irrelevant part: the peak speed of the TurboChannel bus or the PCI
bus is relevant to an application only if you are attempting to crank
through a significant fraction of peak, for example for a very complex
multi-head graphics workload.
But a typical disk-bound workload needs a much smaller fraction -- for
example, 7.6% (computed as Fast SCSI's 10 MB/sec divided by PCI's peak of
132). I would guess that "make stage 1 of gcc" is disk bound, not
graphics bound.
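For anyone who wants to redo the arithmetic in point 2, here is a minimal
sketch in Python. It uses only the peak figures quoted above; the names
(PEAK_MB_PER_SEC, bus_fraction) are made up for illustration, and the
64-bit PCI entry simply assumes "double the raw bandwidth" means 2 x 132.

```python
# Peak bus bandwidths quoted in point 2, in MB/sec.
PEAK_MB_PER_SEC = {
    "TurboChannel": 100.0,
    "PCI (32-bit)": 132.0,
    "PCI (64-bit, future)": 264.0,  # assumption: exactly 2 x 132
}

FAST_SCSI_MB_PER_SEC = 10.0  # Fast SCSI peak rate used in the text


def bus_fraction(device_mb_per_sec, bus_peak_mb_per_sec):
    """Return the share of peak bus bandwidth one device could consume."""
    return device_mb_per_sec / bus_peak_mb_per_sec


# Print the share of each bus that a single Fast SCSI channel could use.
for bus, peak in PEAK_MB_PER_SEC.items():
    share = bus_fraction(FAST_SCSI_MB_PER_SEC, peak)
    print(f"{bus:22s} {share:6.1%}")
```

These are peak numbers only; sustained rates on real adapters and disks
will be lower, so the fractions are upper bounds on what one SCSI channel
could ask of the bus.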
3) It was suggested that the disk controller on the AlphaStation 200 4/233
is slower than the disk controller on the DEC 3000 Model 400.
False, assuming we are talking about the on-board disk controllers. The
3000/400 SCSI controller ran at only 5 MB/sec; the on-board controller in
the AlphaStation 200 4/233 is the NCR 810 SCSI interface, which runs at
10 MB/sec.
On the other hand, the disks themselves MIGHT be the issue here. If the
workload "make stage 1 of gcc" is a disk-intensive workload (for compiler
input, compiler output, paging/swapping, and/or temporary files) it could
make a big difference if the workload was spread across more disks on the
DEC 3000 Model 400 than on the AlphaStation 200 4/233.
4) It was suggested that an AlphaStation is an Alpha chip mated with
a PC chassis and PC main memory system.
It is true that the AlphaStations provide industry-standard PCI and ISA
slots for hardware compatibility with thousands of cards from hundreds of
vendors. But the memory and cache subsystem is custom designed for
Alpha, using the 21071 chip set from Digital Semiconductor. The cache
subsystem has a 128-bit bus, compared to the 64-bit bus found on Pentium
PCs. And the 64-bit memory bus is wider than the 32-bit memory bus found
on many Pentium PCs.
5) It was stated that the 3000/400 main memory bus is 256 bits wide, wider
than the Digital 2100 and the DEC 7000, and that the AlphaStation's is only
64 bits.
This is true, but leaves out some important considerations. First, the
DEC 3000 Model 400 has 256 bits of width AT THE DRAM PINS. Depending on
the configuration, the DEC 7000 can have more than 1024 bits of bus width
at the DRAM pins. Second, the servers cycle the bus faster than the
workstations. Third, the servers provide higher aggregate bandwidth with
multiple CPUs by allowing multiple transactions to be in flight at the
same time. For example, the Digital Technical Journal (Vol. 4 Number 4)
article on the DEC 7000 has a diagram to show how 3 transactions can
overlap on the bus, for an aggregate bandwidth of 640 MB/sec.
Finally, memory bandwidth is probably not the issue here. In the memory
hierarchy, most workloads are most sensitive to the speed of the
secondary cache and the size of the secondary cache. All of the above
mentioned systems have a 128-bit interface to the secondary cache. And
both the DEC 3000 Model 400 and the AlphaStation 200 4/233 have the same
size secondary caches, 512KB.
6) Finally, it was suggested that "the DEC salescritter was feeding us
marketing" when he said the new AlphaStations are faster than the 3000
series.
Well, we think our salescritter ^H^H^H^H^H^H^H^H^H^H^H^H esteemed
colleague in the Sales department had it right, at least when comparing
to older DEC 3000s such as the Model 400.
It is true that the top of the line DEC 3000 Models may outperform an
AlphaStation 200 4/233; for example, if one looks at the DEC 3000 Model
700 the substantial cache advantage for the 700 (2MB vs 512KB) will
usually mean more than the slight MHz advantage for the AlphaStation 200
4/233 (233 vs. 225).
To try to give a real-life example, here's a series of 5 benchmarks done
with Ansys, a Mechanical CAD application. You will note that an
AlphaStation 400 4/233 sometimes wins and sometimes loses to the DEC 3000
Model 800, which is what you might expect given that the 4/233 has a
faster clock, but the 3000/800 has a bigger cache and (in this case) more
disk and MUCH bigger memory. Overall they are roughly comparable for
this application.
(* simply indicates the faster entry in each line.)

                            DEC 3000      AlphaStation
                            Model 800     400 4/233
                            ---------     ------------
Clock speed                 200 MHz       233 MHz
Bus Speed                   100 MB/sec    132 MB/sec
Secondary Cache             2MB           512KB
Memory Size                 1GB           96MB
Disks                       2 x RZ28      1 x RZ28

Benchmark LS1   Elapsed     64            58*
                CPU         54*           55
Benchmark LS2   Elapsed     302*          329
                CPU         286           281*
Benchmark LS3   Elapsed     903*          973
                CPU         834           823*
Benchmark LS4   Elapsed     1928*         2146
                CPU         1887          1872*
Benchmark LS5   Elapsed     1852*         2079
                CPU         1800          1762*
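
To make "roughly comparable" a little more concrete, the table can be
reduced to ratios. The following is only a sketch in Python: the numbers
are copied from the table above, while the names (RESULTS, ratios) are
invented for illustration. A ratio below 1.0 means the AlphaStation 400
4/233 was faster on that measure.

```python
# (elapsed, cpu) times copied from the table, in the table's own units.
RESULTS = {
    "LS1": {"3000/800": (64, 54), "400 4/233": (58, 55)},
    "LS2": {"3000/800": (302, 286), "400 4/233": (329, 281)},
    "LS3": {"3000/800": (903, 834), "400 4/233": (973, 823)},
    "LS4": {"3000/800": (1928, 1887), "400 4/233": (2146, 1872)},
    "LS5": {"3000/800": (1852, 1800), "400 4/233": (2079, 1762)},
}


def ratios(results):
    """Return {benchmark: (elapsed_ratio, cpu_ratio)}, AlphaStation / DEC 3000."""
    out = {}
    for name, row in results.items():
        elapsed_dec, cpu_dec = row["3000/800"]
        elapsed_as, cpu_as = row["400 4/233"]
        out[name] = (elapsed_as / elapsed_dec, cpu_as / cpu_dec)
    return out


for name, (elapsed, cpu) in ratios(RESULTS).items():
    print(f"{name}: elapsed x{elapsed:.2f}, CPU x{cpu:.2f}")
```

On these figures the CPU times stay within about 2% of each other on all
five runs, while the elapsed times differ by up to about 12%. That would
fit the point made above about the 3000/800 having more disk and much
more memory, though the table by itself does not prove the cause.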
So where does all this leave us on the original report? As I said at the
beginning, "it's a mystery". I hope to hear back from Mr. Baxter soon with
details of his workload and configuration, and once we resolve it, one or
the other of us will post an update. But my current guess (without a shred
of evidence yet) is that the workload was disk bound, and was spread less
optimally across the disk spindles on the AlphaStation than on the DEC 3000
Model 400.
John L. Henning
CSG Performance Group
Digital Equipment Corporation
henning_at_i4get.enet.dec.com
Received on Thu Jan 19 1995 - 09:03:01 NZDT