Hi,
I have a colleague who wants to spend about $50k on a compute machine. He
has developed a FORTRAN code that is numerically intensive and a few of us
have benchmarked a scaled down version of his code on our favorite machines.
I've run the benchmark on a 4/233 Alpha which we own. The Sun and IBM camps
have been able to get access to an Ultra and a 590, respectively, to run the
benchmark. I've asked our local DEC sales rep to arrange time on a 5/500
machine to run the benchmark, but so far zip! I've also tried emailing the
high-performance center at Maynard to get help in tuning the code. Zip, too!

Currently, the top performer is the IBM 590. It runs the code in 54 sec.
On an IBM 595, the code comes in at 25 sec, 2 sec slower than a Cray C90!
The Sun Ultra turns in 108 sec, while our 4/233 is at 185 sec. The
production code is expected to run for weeks; hence the need for a
compute machine. I've been working on tuning the code myself and was able
to shave about 30 sec off, i.e., the 4/233's best time is 156 sec.
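
To put the timings above in perspective, here is a minimal sketch in C that turns them into slowdown ratios. The 23 sec figure for the C90 is inferred from "2 sec slower", and the struct, array and function names are made up for illustration. The projection assumes the production code scales like the scaled-down benchmark, which may not hold.

```c
#include <stddef.h>

/* Benchmark times in seconds, as quoted in this post. */
struct bench {
    const char *machine;
    double seconds;
};

const struct bench results[] = {
    { "Cray C90",            23.0 },  /* inferred: the 595 is "2 sec slower" */
    { "IBM 595",             25.0 },
    { "IBM 590",             54.0 },
    { "Sun Ultra",          108.0 },
    { "Alpha 4/233, tuned", 156.0 },
    { "Alpha 4/233",        185.0 },
};

/* How many times longer the slower machine takes than the faster one. */
double slowdown(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}

/* Projected run length on the slower machine, ASSUMING the production
 * code scales the same way as the benchmark does. */
double projected_days(double days_on_fast,
                      double slow_seconds, double fast_seconds)
{
    return days_on_fast * slowdown(slow_seconds, fast_seconds);
}
```

For example, slowdown(156.0, 54.0) is about 2.9, so under that scaling assumption a job that takes a week on the 590 would take roughly 20 days on the tuned 4/233.
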

My time frame has pretty much run out. We need to decide next week. I'm
totally disgusted with the lack of help from DEC. I guess that's mainly
why I'm writing this: commiserating with fellow DEC victims and, at the same
time, hoping for a miracle. If anyone has advice on how to shake these
people up, I'd gladly listen.

Actually, there is a second question: The code uses sqrt and divide
intensively in a loop. The IBM 590's killer performance comes primarily
from a vector reciprocal sqrt function. Initially, its time was comparable
to the 4/233 at 186 sec, but the vrsqrt() function shaved the time down to 73
sec. Recoding the loop finally shaved it down to 54 sec. In addition, the
590 has 2 floating point units. I was wondering what the EV5 has: how many
floating point units, instruction issues, etc.? What kind of performance
improvement can I expect from an EV5 compared to an EV4? I've read in the
Alpha architecture book that division cannot be pipelined. What does
it mean when division cannot be pipelined? Does it mean no other floating
point operations can take place till division finishes, or that no division
operation may take place till the previous division finishes? Is this still
true for the EV5? Are there architectural improvements to the EV5 that
might make it competitive with the RS/6000? My colleague is interested only
in numerical performance for his code, even if it means rolling our own
vector reciprocal sqrt function in assembler. Of course, this is only worth
doing if the architecture supports it. Incidentally, I've tried the vsqrt
and vrecip functions from the DXML library. There is no improvement!
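
To make the "roll our own" idea concrete, here is a minimal sketch in C (not FORTRAN or Alpha assembler) of one common way to build a vector reciprocal sqrt without any divide or sqrt instruction: a rough bit-level first guess refined by Newton-Raphson steps that use only multiplies and subtracts. It is not IBM's vrsqrt() or anything from DXML. The name vrsqrt_sketch is hypothetical, the code assumes IEEE 754 64-bit doubles with positive, finite, normalized inputs, and the constant is the widely quoted double-precision "fast inverse square root" magic number. Whether this beats the hardware divide and sqrt on a given chip would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rough first guess for 1/sqrt(x), good to a few percent.
 * ASSUMES IEEE 754 binary64 and x positive, finite and normalized. */
static double rsqrt_guess(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);              /* reinterpret as integer */
    bits = UINT64_C(0x5FE6EB50C3B537A9) - (bits >> 1);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* y[i] = 1/sqrt(x[i]) for i = 0..n-1, with no divide or sqrt in the loop. */
void vrsqrt_sketch(size_t n, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double half_x = 0.5 * x[i];
        double r = rsqrt_guess(x[i]);
        /* Newton-Raphson step: r <- r * (1.5 - 0.5 * x * r * r).
         * Each step roughly squares the relative error, so four steps
         * from a guess that is a few percent off reach double precision. */
        for (int k = 0; k < 4; k++)
            r = r * (1.5 - half_x * r * r);
        y[i] = r;
    }
}
```

A quotient a/sqrt(b) can then be formed as a multiplied by the result for b, which takes both the divide and the sqrt out of the inner loop. Since the refinement steps for different elements are independent, they can in principle be interleaved to keep a pipelined multiplier busy.
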
Received on Wed Nov 13 1996 - 23:05:38 NZDT