Hello, and thanks for all the responses.
The answer lies in loop unrolling à la KAP. Nick Hill at
Rutherford Labs has supplied me with a KAP-processed version of
the SAXPY code that achieves a fabulous 960 MFLOPS. I have left
this code for you all to try at:
http://www.gre.ac.uk/~k.mcmanus/saxpy.kap.f
and the original
http://www.gre.ac.uk/~k.mcmanus/saxpy.f
is still there for comparison.
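As a rough illustration of the transformation, here is a hand-written sketch in C of a plain SAXPY loop next to a 4-way unrolled version. It is not the actual KAP output (that is the Fortran file linked above). The function names `saxpy` and `saxpy_unroll4` and the unroll factor of 4 are hypothetical choices for illustration; KAP may choose a different factor and apply further transformations.

```c
#include <stddef.h>

/* Reference SAXPY: y[i] = a*x[i] + y[i], i.e. one multiply and one add
   (2 flops) per element. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Same computation unrolled by 4 (a hypothetical factor). The four
   independent multiply-adds per iteration give the processor work it can
   overlap, and the loop-control overhead is paid once per 4 elements. */
void saxpy_unroll4(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     = a * x[i]     + y[i];
        y[i + 1] = a * x[i + 1] + y[i + 1];
        y[i + 2] = a * x[i + 2] + y[i + 2];
        y[i + 3] = a * x[i + 3] + y[i + 3];
    }
    /* Clean-up loop for the n % 4 leftover elements. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

For example, with ten elements, x[i] = i, y[i] = 1 and a = 2, both routines leave y[9] = 19. The clean-up loop is what lets the unrolled version handle lengths that are not a multiple of 4.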
Interestingly, the KAP version doubles the FLOP rate on Sun
machines as well.
This raises some intriguing questions:
1. How can the FLOP rate be more than twice the 466 MHz
   clock rate?
2. Why can't the compiler manage this elementary transformation?
3. Is this a conspiracy to raise royalties for Kuck?
4. Has anybody compared this rather poor compiler performance
   with the impressive SGI v6 compiler?
5. Why did DEC not tell me, before I bought half a million
   bucks' worth of kit, that without KAP it would run like a
   three-legged dog?
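Question 1 can be made concrete with a little arithmetic. SAXPY does one multiply and one add per element, so the measured rate divided by the clock rate gives the flops completed per cycle. The second line rests on an assumption not stated in the post: that the DEC processor (assumed here to be an Alpha 21164-class chip) can complete at most one floating-point add and one floating-point multiply per cycle.

```latex
\frac{960\ \text{MFLOPS}}{466\ \text{MHz}} \approx 2.06\ \text{flops per cycle}

\text{assumed peak} = 2\ \tfrac{\text{flops}}{\text{cycle}} \times 466\ \text{MHz} = 932\ \text{MFLOPS}
```

Under that assumption the reported 960 MFLOPS would sit slightly above the theoretical ceiling, so the benchmark's timing method or flop count may be worth rechecking.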
Answers by email, please, to
k.mcmanus_at_gre.ac.uk
http://www.gre.ac.uk/~k.mcmanus
-------------------------------------------------------------
Dr Kevin McManus ||
School of Computing & Math Science ||
The University of Greenwich ||
Wellington St. Woolwich ||Tel +44 (0)181 331 8719
London SE18 6PF UK ||Fax +44 (0)181 331 8665
Received on Fri May 30 1997 - 18:09:24 NZST