I have a small f77 code that runs a direct, an indirect ordered
and an indirect disordered SAXPY, which together give a good picture of
a system's floating-point performance and cache usage.
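For anyone unfamiliar with the three variants, a minimal sketch follows. It is written in C rather than f77 and is not the code in saxpy.f: the function names, the way the index vector is built, and the count of two floating-point operations per element (one multiply, one add) are assumptions made for illustration only.

```c
#include <stddef.h>

/* Direct SAXPY: y[i] += a * x[i], with unit-stride access to x and y. */
void saxpy_direct(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Indirect SAXPY: x is read through an index vector.
   The access pattern is decided entirely by the contents of idx. */
void saxpy_indirect(size_t n, float a, const float *x, float *y,
                    const size_t *idx)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[idx[i]];
}

/* Ordered index vector: idx[i] = i, so memory is still walked in sequence. */
void fill_ordered(size_t n, size_t *idx)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
}

/* Disordered index vector (hypothetical choice): idx[i] = (i * stride) % n.
   This is a permutation of 0..n-1 only when stride and n share no common
   factor; it scatters the reads of x to defeat cache line reuse. */
void fill_disordered(size_t n, size_t *idx, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = (i * stride) % n;
}

/* MFLOP rate, assuming 2 floating-point operations per element
   and 'reps' passes over vectors of length n in 'seconds'. */
double mflops(size_t n, size_t reps, double seconds)
{
    return 2.0 * (double)n * (double)reps / seconds / 1.0e6;
}
```

Timing saxpy_direct against saxpy_indirect with an ordered index isolates the cost of the extra index load, while the disordered index adds the cost of cache misses on x.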
The maximal MFLOP results look a bit like this...
Machine    CPU                       Max MFLOP
Sparc20    100MHz HyperSparc         25.7
Ultra1     140MHz UltraSparc         94.7
Ultra1     170MHz UltraSparc         110.7
Indy2      150MHz MIPS R4400/R4010   breaks!
4100       466MHz ev56               206.5/231.4
RS6000     ?                         where is DTIME/ETIME??
The reason for the two results on the ev56 is that the compiler
is less than sensible: changing the code outside the loops
pushes the performance down by about 10%. The main problem here is
that the results are less than good. This is supposedly a quad
issue chip that can perform two floating-point operations per clock,
i.e. a maximum performance of 932 MFLOP. OK, I would be amazed if I
came close to the maximum, but these results fall a disappointingly
long way short of what I would like to see, which is basically
something about three times faster than this.
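The arithmetic behind those figures can be sketched as below. This is C for illustration only; the function names are hypothetical, and the only inputs are the clock rate, the two operations per clock, and the measured rates quoted above.

```c
/* Theoretical peak in MFLOP: clock rate in MHz times the number of
   floating-point operations that can be issued per clock. */
double peak_mflops(double clock_mhz, double flops_per_clock)
{
    return clock_mhz * flops_per_clock;
}

/* Fraction of the theoretical peak that a measured rate achieves. */
double fraction_of_peak(double measured_mflops, double peak)
{
    return measured_mflops / peak;
}
```

With the numbers above, peak_mflops(466, 2) gives 932, and the measured 206.5 and 231.4 MFLOP come to roughly 22% and 25% of that. Three times faster would be roughly 620 to 694 MFLOP, about two thirds to three quarters of peak.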
Can anybody tell me either...
Why are these results so poor?
How can I get the promised performance?
If anybody wants to try the code I've left it at...
http://www.gre.ac.uk/~k.mcmanus/saxpy.f
All suggestions gratefully received.
k.mcmanus_at_gre.ac.uk  -  http://www.gre.ac.uk/~k.mcmanus
-------------------------------------------------------------
Dr Kevin McManus                     ||
School of Computing & Math Science   ||
The University of Greenwich          ||
Wellington St.  Woolwich             ||Tel +44 (0)181 331 8719 
London SE18 6PF  UK                  ||Fax +44 (0)181 331 8665
Received on Thu May 29 1997 - 19:33:02 NZST