Re: developer release 3.1.2
Greetings!
I've finished the complex Level 2 SSE routines for ATLAS. They can be found at
http://master.debian.org/~camm/atlas_complex_level2.tgz
All seems to work well, with the exception of the inlining issue I
mentioned earlier. I now have a workaround that has been successfully
tested with gcc. The info pages for gcc describe several conditions
under which the compiler will not inline a function, one of which is
the presence of a nested function declaration. I've defined such a
dummy declaration via a cpp macro called NO_INLINE, and used it in the
functions I don't want inlined. If the compiler isn't gcc, NO_INLINE
is empty.
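
The NO_INLINE idea described above can be sketched roughly as follows. This is a minimal illustration, not the macro from the tarball: the macro body, the dummy name no_inline_dummy_, and the routines axpy_kernel and axpy_demo are all hypothetical. It assumes the dummy is a do-nothing nested function (a GNU C extension), and it does not check whether a given gcc release actually refuses to inline the enclosing function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expansion of NO_INLINE: a do-nothing nested function,
 * which is a GNU C extension. Clang also defines __GNUC__ but rejects
 * nested functions, so it is excluded here; for any other compiler the
 * macro expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__cplusplus)
#define NO_INLINE void no_inline_dummy_(void) { }
#else
#define NO_INLINE
#endif

/* Hypothetical kernel that should stay out of line:
 * y[i] += alpha * x[i] for n single-precision elements. */
static void axpy_kernel(size_t n, float alpha, const float *x, float *y)
{
    NO_INLINE
    size_t i;

    assert(x != NULL && y != NULL);
    for (i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Hypothetical caller; without NO_INLINE the compiler would be free to
 * inline axpy_kernel here. */
void axpy_demo(size_t n, float alpha, const float *x, float *y)
{
    axpy_kernel(n, alpha, x, y);
}
```

The macro goes at the top of the function body, so it sits among the declarations and costs nothing at run time. With warnings enabled, gcc may complain about the unused nested function or, under -pedantic, about nested functions in general.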
This whole procedure may not be necessary if I rebuild the whole tree
with the same compilation settings, but I haven't tested that yet. I
also have no way of testing other compilers. Any other suggestions
are most appreciated.
Rough timings on a PIII 450 MHz:
           SSE           Standard ATLAS
cgemvT:    400 MFLOPS    160 MFLOPS
cgemvN:    380 MFLOPS    190 MFLOPS
cger:      200 MFLOPS    100 MFLOPS
Take care,
R Clint Whaley <rwhaley@cs.utk.edu> writes:
> Guys,
>
> The new developer release is out, and available from the usual site. This
> one includes Camm's SSE-enabled SGER and SGEMV, plus various upgrades
> in config and mvsearch to support it. Also, fixes the reported errors
> in Level 1 C blas, and linking problems with Level 2 packed BLAS.
>
> Camm, let me know if what I've done with your stuff is OK or not.
>
> Thanks,
> Clint
>
>
--
Camm Maguire camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah