efficient summing of vector.
Hi Camm, master in the ways of Intel assembly.
You once wrote that you shaved an instruction off the way I sum an SSE
register. I use a sequence like this to sum the four floats in #reg, using
xmm7 as scratch, and it seems like a clumsy way to do it. How can it be
done in 4 instructions?
__asm__ __volatile__ ("movhlps " #reg ", %%xmm7\n"\
"addps " #reg ", %%xmm7\n"\
"movaps %%xmm7, " #reg "\n"\
"shufps $1, " #reg ", %%xmm7\n"\
"addss %%xmm7, " #reg "\n"\
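
For reference, here is a minimal sketch of the same five-step reduction written with SSE intrinsics instead of inline assembly, in case that makes the data flow easier to follow. The function name hsum_ps is made up for illustration, and a compiler is free to emit different instructions than the hand-written sequence above.

```c
#include <xmmintrin.h>

/* hsum_ps is a hypothetical name: it returns v0 + v1 + v2 + v3
   for v = {v0, v1, v2, v3}, mirroring the asm sequence step by step. */
static float hsum_ps(__m128 v)
{
    __m128 t = _mm_movehl_ps(v, v);  /* movhlps: t = {v2, v3, v2, v3}        */
    t = _mm_add_ps(t, v);            /* addps:   t = {v0+v2, v1+v3, ., .}    */
    v = t;                           /* movaps:  keep a copy of the partials */
    t = _mm_shuffle_ps(t, t, 1);     /* shufps $1: lane 0 of t = v1+v3       */
    v = _mm_add_ss(v, t);            /* addss:   lane 0 = (v0+v2) + (v1+v3)  */
    return _mm_cvtss_f32(v);         /* extract lane 0 as a scalar           */
}
```

Calling hsum_ps(_mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f)) should give 10.0f, since only lane 0 of the result is meaningful and the upper lanes hold leftovers.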
Hope you can help me,
Cheers,
Peter