Re: ATLAS FAQ
Camm,
>Hi Clint! Sorry I've been out of touch recently. Did you get those
>files?
Yep, I got the files; I finally had the sysadmins set your directory
so I could read it (since you pointed me at files in there, I was pretty
sure that was OK with you).
As might be expected, there were some peculiarities. First, the best
performance was given by ATL_dgemm_SSE_1x1xkb.c, not ATL_dgemm_SSE_1x4.c.
Also, ATL_dgemm_SSE_1x1xkb.c didn't work for cleanup (it got the wrong
answer for sizes that are not a multiple of NB). Using an NB of 80 (and
keeping N a multiple of 80), I was able to build a complete dgemm getting
a little over 2.1 Gflop on torc19. However, an NB that large used up too
much memory, causing swapping very early, so I dropped back to NB=56, but
didn't build the complete gemm there, since cleanup wasn't working.
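
For anyone not familiar with what "cleanup" means here, a minimal sketch
in plain C follows. It is not the ATLAS code and not the SSE kernels
mentioned above; the function names (blocked_dgemm, naive_dgemm,
blocked_matches_reference) and the square row-major layout are assumptions
made for illustration. The point is only that the edge blocks are smaller
than NB whenever N is not a multiple of NB, and a kernel that assumes full
NB-sized blocks gets those wrong.

```c
#include <stddef.h>
#include <stdlib.h>

#define NB 56  /* blocking factor; 56 is the value mentioned in the message */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference: C += A*B for n x n row-major matrices, no blocking. */
void naive_dgemm(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Blocked: C += A*B, walking NB x NB blocks. The min_sz() calls are the
 * "cleanup": they shrink the last block in each dimension when n is not
 * a multiple of NB. */
void blocked_dgemm(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += NB) {
        size_t ib = min_sz(NB, n - ii);
        for (size_t jj = 0; jj < n; jj += NB) {
            size_t jb = min_sz(NB, n - jj);
            for (size_t kk = 0; kk < n; kk += NB) {
                size_t kb = min_sz(NB, n - kk);
                for (size_t i = ii; i < ii + ib; i++)
                    for (size_t j = jj; j < jj + jb; j++) {
                        double sum = C[i * n + j];
                        for (size_t k = kk; k < kk + kb; k++)
                            sum += A[i * n + k] * B[k * n + j];
                        C[i * n + j] = sum;
                    }
            }
        }
    }
}

/* Returns 1 if blocked and naive results agree exactly, 0 if they differ,
 * -1 on allocation failure. Inputs are small integers, so every product
 * and sum is exactly representable in a double. */
int blocked_matches_reference(size_t n)
{
    double *A = malloc(n * n * sizeof *A);
    double *B = malloc(n * n * sizeof *B);
    double *C1 = calloc(n * n, sizeof *C1);
    double *C2 = calloc(n * n, sizeof *C2);
    int ok = -1;

    if (A && B && C1 && C2) {
        for (size_t i = 0; i < n * n; i++) {
            A[i] = (double)(i % 7) - 3.0;
            B[i] = (double)(i % 5) - 2.0;
        }
        naive_dgemm(n, A, B, C1);
        blocked_dgemm(n, A, B, C2);
        ok = 1;
        for (size_t i = 0; i < n * n; i++)
            if (C1[i] != C2[i]) { ok = 0; break; }
    }
    free(A); free(B); free(C1); free(C2);
    return ok;
}
```

Calling blocked_matches_reference(112) exercises only full blocks, while
blocked_matches_reference(100) forces a partial 44-wide block in every
dimension, which is the case that a kernel without working cleanup would
get wrong.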
The interesting thing is that if the 2.1Gflop holds up (as I think it
will), the P4 will overtake the Athlon on the double precision flops/$ . . .
Cheers,
Clint