Re: updated P4 timings
Greetings! Looks like the new binutils work. Just a straightforward
(dumb) port of the SSE1 sgemm to an SSE2 dgemm gives the following on
torc19 as a function of nb:
4 483.36
8 937.42
12 1214.86
16 1573.87
20 1752.28
24 1857.47
28 1933.42
32 1895.4
36 1752.19
40 2061.65
44 1903.98
48 2061.43
52 2019.29
56 2158.68
60 2062.08
64 2160.4
68 1891.99
72 2158.44
76 2109.31
80 2159.82
84 1959.44
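
A quick way to read the table above is to pick out the best nb and the block sizes that fall below both of their neighbours. The following is only a sketch: the numbers are copied from the table, the message does not state the units of the second column, and the helper names (parse, rows, dips) are made up for illustration.

```python
# Timings copied from the table above: "nb value" per line.
# The units of the second column are not stated in the message.
TIMINGS = """\
4 483.36
8 937.42
12 1214.86
16 1573.87
20 1752.28
24 1857.47
28 1933.42
32 1895.4
36 1752.19
40 2061.65
44 1903.98
48 2061.43
52 2019.29
56 2158.68
60 2062.08
64 2160.4
68 1891.99
72 2158.44
76 2109.31
80 2159.82
84 1959.44
"""


def parse(text):
    """Turn the two-column table into a list of (nb, value) pairs."""
    rows = []
    for line in text.splitlines():
        nb, value = line.split()
        rows.append((int(nb), float(value)))
    return rows


rows = parse(TIMINGS)

# Block size with the highest reported figure.
best_nb, best_rate = max(rows, key=lambda r: r[1])

# Block sizes whose figure is lower than both neighbouring entries.
dips = [
    rows[i][0]
    for i in range(1, len(rows) - 1)
    if rows[i][1] < rows[i - 1][1] and rows[i][1] < rows[i + 1][1]
]

print("best nb:", best_nb, "value:", best_rate)
print("local dips at nb =", dips)
```

On this data the dips all land on block sizes that are multiples of 4 but not of 8, from nb = 36 upward. The script only reports that pattern; it says nothing about the cause.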
Take care,
R Clint Whaley <rwhaley@cs.utk.edu> writes:
> Camm,
>
> >Greetings! Just an update on the P4 SSE2. Downloaded the intel specs
> >today, and it seems as though all the instructions are the same with
> >the trailing 's' replaced by a 'd', i.e. addps -> addpd, etc. Anyway,
> >it seems as though the assembler has not yet caught up with this:
> >Guess we have to wait for a new assembler update.
>
> First, thanks for scoping this out. Second, I went to the binutils directory
> on www.gnu.org, and found some comments indicating the newest stuff has
> support for SSE2. However, I couldn't figure out much more than that. So,
> I grabbed last night's snapshot, and installed it on torc19. If you put
> /home/rwhaley/local/P4/bin as the first entry in your path, I think gcc
> will use the one I installed. Can you see if that guy will compile your
> routine? If not, maybe you can post a very simple SSE2 file, so we can
> iterate until we get an assembler and/or gcc that can handle the SSE2
> stuff, without having to have you test each one . . .
>
> Thanks,
> Clint
>
>
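
The mnemonic renaming described in the quoted message (addps -> addpd) can be sketched as a small lookup. This is an illustration only, not the script used for the port: the function name to_double and the exclusion list are assumptions, and the list is not exhaustive. It covers a few SSE1 instructions that have no double-precision counterpart in SSE2.

```python
# Hypothetical helper: map an SSE1 single-precision mnemonic to its
# SSE2 double-precision name by replacing the trailing 's' with 'd'.

# Assumed, non-exhaustive list of SSE1 instructions with no
# double-precision equivalent in SSE2.
NO_DOUBLE_FORM = {"rcpps", "rcpss", "rsqrtps", "rsqrtss", "movhlps", "movlhps"}


def to_double(mnemonic):
    """Return the SSE2 double-precision mnemonic, or None if there is none.

    Mnemonics that do not end in 'ps' (packed single) or 'ss'
    (scalar single) are returned unchanged.
    """
    if mnemonic in NO_DOUBLE_FORM:
        return None
    if mnemonic.endswith(("ps", "ss")):
        return mnemonic[:-1] + "d"
    return mnemonic


for name in ["addps", "mulps", "movaps", "addss", "rcpps", "movl"]:
    print(name, "->", to_double(name))
```

Renaming is only part of a port. A 128-bit register holds two doubles rather than four floats, so loop counts, address offsets and shuffle immediates in the kernel have to be adjusted by hand.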
--
Camm Maguire camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah