how is your fp performance?

From: <K.McManus_at_greenwich.ac.uk>
Date: Thu, 29 May 1997 18:19:56 +0100 (BST)

I have a small f77 code that runs a direct, an indirect ordered
and an indirect disordered SAXPY. Together these give a good
picture of a system's floating-point performance and cache usage.
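
For anyone who does not have the f77 source to hand, the three access patterns can be sketched roughly as follows. This is an illustrative Python sketch, not the original saxpy.f: the function names are hypothetical, and it assumes the indirect variants address both vectors through the same index array, which the real code may not do.

```python
import random


def saxpy_direct(a, x, y):
    """Unit-stride SAXPY: y[i] = a*x[i] + y[i]."""
    for i in range(len(y)):
        y[i] = a * x[i] + y[i]


def saxpy_indirect(a, x, y, idx):
    """SAXPY addressed through an index array: y[idx[i]] = a*x[idx[i]] + y[idx[i]]."""
    for j in idx:
        y[j] = a * x[j] + y[j]


def make_indices(n, ordered, seed=0):
    """Return the index array 0..n-1, in order or randomly permuted.

    ordered=True  -> indirect ordered (same memory walk as the direct case,
                     plus the cost of loading the index)
    ordered=False -> indirect disordered (scattered accesses that defeat the cache)
    """
    idx = list(range(n))
    if not ordered:
        random.Random(seed).shuffle(idx)  # fixed seed keeps runs repeatable
    return idx
```

All three variants do the same arithmetic, so any difference in timing between them comes from the index load and the memory access pattern rather than from the floating-point work.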

The maximal MFLOP results look a bit like this...

Machine   Clock    CPU                MFLOP
Sparc20   100MHz   HyperSparc         25.7
Ultra1    140MHz   UltraSparc         94.7
Ultra1    170MHz   UltraSparc         110.7
Indy2     150MHz   MIPS R4400/R4010   breaks!
4100      466MHz   ev56               206.5/231.4
RS6000    ?                           where is DTIME/ETIME??

There are two results for the ev56 because the compiler is less
than sensible: changing the code outside the loops will push the
performance down by about 10%. The main problem is that both
results are poor. This is supposedly a quad-issue chip that can
perform two floating-point operations per clock, i.e. a peak of
932 MFLOP at 466MHz. I would be amazed if I came close to that
peak, but these results fall a disappointingly long way short of
what I would like to see, which is something about three times
faster than this.
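
The arithmetic behind those figures can be checked with a short sketch. It is written in Python purely for illustration; the helper names are hypothetical, and the MFLOP formula assumes the benchmark counts two floating-point operations (one multiply, one add) per SAXPY element, which may differ from how saxpy.f counts them.

```python
def peak_mflops(clock_mhz, flops_per_clock):
    """Theoretical peak: clock rate in MHz times flops issued per clock."""
    return clock_mhz * flops_per_clock


def measured_mflops(n, reps, seconds):
    """MFLOP rate of `reps` SAXPY sweeps over n elements.

    Assumes 2 flops per element (one multiply, one add).
    """
    return 2.0 * n * reps / seconds / 1.0e6


def fraction_of_peak(measured, peak):
    """Measured rate as a fraction of the theoretical peak."""
    return measured / peak


peak = peak_mflops(466, 2)  # 466 MHz ev56, two fp operations per clock
for result in (206.5, 231.4):
    pct = 100.0 * fraction_of_peak(result, peak)
    print(f"{result} MFLOP is {pct:.1f}% of the {peak} MFLOP peak")
```

On these numbers both ev56 results sit below a quarter of the nominal peak, and "three times faster" would still be short of it.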

Can anybody tell me either...

Why are these results so poor?
How can I get the promised performance?

If anybody wants to try the code I've left it at...
http://www.gre.ac.uk/~k.mcmanus/saxpy.f
All suggestions gratefully received

k.mcmanus_at_gre.ac.uk - http://www.gre.ac.uk/~k.mcmanus
-------------------------------------------------------------
Dr Kevin McManus ||
School of Computing & Math Science ||
The University of Greenwich ||
Wellington St. Woolwich ||Tel +44 (0)181 331 8719
London SE18 6PF UK ||Fax +44 (0)181 331 8665
Received on Thu May 29 1997 - 19:33:02 NZST
