I have a small f77 code that runs a direct, an indirect ordered
and an indirect disordered SAXPY, which gives a good picture of
a system's floating-point performance and cache usage.
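
For anyone not familiar with the three variants, here is a minimal sketch in C. It is not the f77 code linked in this post. The function names, the use of a shuffled permutation for the "disordered" case, and the count of 2 flops per element are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Direct SAXPY: y[i] = a*x[i] + y[i], unit-stride access. */
static void saxpy_direct(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Indirect SAXPY: same arithmetic, but every element is reached
 * through an index vector. Assumed form, not taken from saxpy.f. */
static void saxpy_indirect(size_t n, float a, const float *x, float *y,
                           const size_t *idx)
{
    for (size_t i = 0; i < n; i++)
        y[idx[i]] += a * x[idx[i]];
}

/* "Ordered" index vector: idx[i] = i, so the memory access pattern
 * matches the direct loop and only the indexing overhead is added. */
static void index_ordered(size_t n, size_t *idx)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
}

/* "Disordered" index vector: a Fisher-Yates shuffle of 0..n-1.
 * Using a permutation (an assumption) means each element is still
 * updated exactly once, but in a cache-unfriendly order. */
static void index_disordered(size_t n, size_t *idx, unsigned seed)
{
    index_ordered(n, idx);
    srand(seed);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = t;
    }
}

/* MFLOPS for `reps` sweeps over n elements, counting one multiply
 * and one add per element. */
static double mflops(size_t n, int reps, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)n * (double)reps / (seconds * 1.0e6);
}
```

To get a rate, time a number of sweeps of each variant with whatever timer the platform offers and pass the elapsed seconds to `mflops`. Varying `n` from cache-sized to much larger than cache is what would expose the cache behaviour.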
The maximal MFLOP results look a bit like this...
Machine  Clock   CPU               Max MFLOPS
Sparc20  100MHz  HyperSparc        25.7
Ultra1   140MHz  UltraSparc        94.7
Ultra1   170MHz  UltraSparc        110.7
Indy2    150MHz  MIPS R4400/R4010  breaks!
4100     466MHz  ev56              206.5/231.4
RS6000   ?                         where is DTIME/ETIME??
There are two results for the ev56 because the compiler is less
than sensible: changing code outside the loops pushes the
performance down by about 10%. The main problem is that the
results are less than good. This is supposedly a quad-issue chip
that can perform two floating-point operations per clock, i.e. a
maximum of 932 MFLOPS at 466MHz. OK, I would be amazed if I came
close to the maximum, but these results fall a disappointingly
long way short of what I would like to see, which is something
about three times faster than this.
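
Spelled out, the arithmetic behind those figures looks roughly like this. Reading "three times faster" as three times the better ev56 result is an assumption.

```latex
\begin{align*}
P_{\text{peak}} &= 466\ \text{MHz} \times 2\ \text{flops/clock} = 932\ \text{MFLOPS} \\
\frac{231.4}{932} &\approx 0.25, \qquad \frac{206.5}{932} \approx 0.22 \\
3 \times 231.4 &= 694.2\ \text{MFLOPS} \approx 0.74\, P_{\text{peak}}
\end{align*}
```

So the measured rates are about a quarter of the nominal peak, and the hoped-for figure would be about three quarters of it.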
Can anybody tell me either...
Why are these results so poor?
How can I get the promised performance?
If anybody wants to try the code I've left it at...
http://www.gre.ac.uk/~k.mcmanus/saxpy.f
All suggestions gratefully received
k.mcmanus_at_gre.ac.uk -
http://www.gre.ac.uk/~k.mcmanus
-------------------------------------------------------------
Dr Kevin McManus ||
School of Computing & Math Science ||
The University of Greenwich ||
Wellington St. Woolwich ||Tel +44 (0)181 331 8719
London SE18 6PF UK ||Fax +44 (0)181 331 8665
Received on Thu May 29 1997 - 19:33:02 NZST