ATLAS on PIII
Hi,
Antoine forwarded some mail he'd exchanged with you regarding the poor
performance you were getting on a PIII using ATLAS to the ATLAS
mailing list, atlas@cs.utk.edu. Essentially, it looked like you
were getting 68% of peak, rather than the expected 72% (off-chip L2)
or 76% (on-chip L2). I think I may have an idea what is going wrong.
The version of ATLAS on netlib does not know about PIII's, and this is
bad news for the PIII's which have an on-chip cache 1/2 the size
of the PII's (which is what the present release of ATLAS thinks a PIII is).
So what you want to do is tell ATLAS to reexamine its Level 2 cache
blocking, which is controlled by CacheEdge.
To do this, go to your ATLAS/tune/blas/gemm/<arch>, and issue:
make xdfindCE
./xdfindCE
This program should spit out a bunch of output, ending in something like:
>Best CE=160KB, mflop=396.04
It's saying the best CacheEdge setting for my machine is 160KB; my guess
is yours will say 160 or something close to that. Edit your
ATLAS/include/<arch>/atlas_cacheedge.h, and you'll probably see something
like:
#ifndef ATLAS_CACHEEDGE_H
#define ATLAS_CACHEEDGE_H
#define CacheEdge 262144
#endif
Change this to:
#ifndef ATLAS_CACHEEDGE_H
#define ATLAS_CACHEEDGE_H
#define CacheEdge 163840
#endif
Notice this is with my setting of 160*1024 for CacheEdge. Now recompile
all needed files by going to ATLAS/bin/<arch>, and issuing:
make xdmmtst xsmmtst xcmmtst xzmmtst
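For reference, the value that goes into atlas_cacheedge.h is just the KB
figure from the last line of xdfindCE output multiplied by 1024. A rough
shell sketch of that conversion, assuming the last line has the shape shown
earlier; ce_bytes_from_line is a made-up helper name, not part of ATLAS:

```shell
# Hypothetical helper (not shipped with ATLAS): turn the final xdfindCE
# line into a CacheEdge value in bytes.
ce_bytes_from_line() {
    # Expects a line shaped like ">Best CE=160KB, mflop=396.04"
    ce_kb=$(printf '%s\n' "$1" | sed -n 's/.*CE=\([0-9][0-9]*\)KB.*/\1/p')
    # Fail if no "CE=<digits>KB" field was found
    [ -n "$ce_kb" ] || return 1
    echo $((ce_kb * 1024))
}

ce_bytes_from_line ">Best CE=160KB, mflop=396.04"   # prints 163840
```

The printed number is what you would put after "#define CacheEdge".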
Then, let's see if you have the predicted 600 Mflops now:
./xdmmtst -F 500
Send this output, or any questions, to atlas@cs.utk.edu.
Cheers,
Clint
> From Matthias Pester <m.pester@mathematik.tu-chemnitz.de>
> We are testing a 528 node Linux-Cluster with Pentium III-800 MHz,
> 512 MB RAM each, and 100-Mbit-FastEthernet with high-performance
> switches.
> So I took the opportunity to run xdmmtst, where you guessed
> 600 Mflops. I saw
> 543,5 Mflops for N= 500,
> 536,2 Mflops for N=1000,
> 529,8 Mflops for N=1500,
> 519,0 Mflops for N=2000,
> (always a speedup 11 against the simple BLAS version)
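
To relate the Mflops figures quoted above to the "percent of peak" numbers
at the top of this mail, here is a small sketch. It rests on an assumption
that is not stated anywhere in the mail: that peak is one floating point
operation per cycle, so an 800 MHz PIII would peak at 800 Mflops.
pct_of_peak is a made-up helper name, and the decimal commas in the quoted
figures are written as decimal points:

```shell
# Assumption (not from the mail): peak = 1 flop per cycle, so an
# 800 MHz PIII peaks at 800 Mflops.
pct_of_peak() {
    # $1 = measured Mflops, $2 = assumed peak Mflops
    # LC_ALL=C keeps "." as the decimal point regardless of locale
    LC_ALL=C awk -v m="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * m / p }'
}

# The four xdmmtst results quoted above (N=500, 1000, 1500, 2000)
for mf in 543.5 536.2 529.8 519.0; do
    echo "$mf Mflops -> $(pct_of_peak "$mf" 800)% of assumed peak"
done
```

Under that assumption, the N=500 result comes out near the 68% mentioned
above, and the hoped-for 600 Mflops would be 75%.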