error in M cleanup
Camm,
The good news is that using your new SSE2 stuff I'm now getting a complete
DGEMM (not just mmcase) of roughly 2 Gflop. The bad news is that it still
doesn't always get the right answer. In particular, there appears to be
an error in the M cleanup: for any i such that M = 2 + 4i (that is,
M mod 4 = 2), it produces the wrong answer. Here are some examples that
make the tester fail:
>> make mmutstcase mmrout=../CASES/ATL_gemm_SSE.c mb=0 nb=56 M=2 N=56 K=56
>> make mmutstcase mmrout=../CASES/ATL_gemm_SSE.c mb=0 nb=56 M=10 N=56 K=56
This looks like an error in the cleanup of a loop unrolled by 4, but I
don't know for certain. Can you confirm it's an error, and not just
something I'm doing wrong?
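
To illustrate the kind of check involved, here is a minimal C sketch of a
matrix multiply whose M loop is unrolled by 4, followed by a cleanup loop for
the M mod 4 leftover rows, compared against a naive reference. This is not the
code in ATL_gemm_SSE.c and not the ATLAS tester; the names gemm_ref,
gemm_unroll4 and count_mismatches are hypothetical, and column-major storage
with no padding is assumed.

```c
#include <stdlib.h>

/* Naive reference: C = A * B, column-major.
 * A is M x K, B is K x N, C is M x N (hypothetical helper). */
void gemm_ref(int M, int N, int K,
              const double *A, const double *B, double *C)
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++) {
            double sum = 0.0;
            for (int k = 0; k < K; k++)
                sum += A[i + k * M] * B[k + j * K];
            C[i + j * M] = sum;
        }
}

/* Same product with the M loop unrolled by 4 plus a cleanup loop. */
void gemm_unroll4(int M, int N, int K,
                  const double *A, const double *B, double *C)
{
    int M4 = M - (M % 4);          /* largest multiple of 4 not above M */

    for (int j = 0; j < N; j++) {
        int i;
        for (i = 0; i < M4; i += 4) {      /* four rows per pass */
            double c0 = 0.0, c1 = 0.0, c2 = 0.0, c3 = 0.0;
            for (int k = 0; k < K; k++) {
                const double *a = A + i + k * M;
                double b = B[k + j * K];
                c0 += a[0] * b;
                c1 += a[1] * b;
                c2 += a[2] * b;
                c3 += a[3] * b;
            }
            C[i + j * M]     = c0;
            C[i + 1 + j * M] = c1;
            C[i + 2 + j * M] = c2;
            C[i + 3 + j * M] = c3;
        }
        for (; i < M; i++) {               /* cleanup: 1, 2 or 3 rows */
            double sum = 0.0;
            for (int k = 0; k < K; k++)
                sum += A[i + k * M] * B[k + j * K];
            C[i + j * M] = sum;
        }
    }
}

/* Returns the number of entries where the two results differ,
 * or -1 if memory allocation fails. */
int count_mismatches(int M, int N, int K)
{
    double *A    = malloc((size_t)M * K * sizeof *A);
    double *B    = malloc((size_t)K * N * sizeof *B);
    double *Cref = malloc((size_t)M * N * sizeof *Cref);
    double *Ctst = malloc((size_t)M * N * sizeof *Ctst);
    int bad = -1;

    if (A && B && Cref && Ctst) {
        /* Small integer values keep every sum exact in double. */
        for (int t = 0; t < M * K; t++) A[t] = (double)(t % 7 - 3);
        for (int t = 0; t < K * N; t++) B[t] = (double)(t % 5 - 2);

        gemm_ref(M, N, K, A, B, Cref);
        gemm_unroll4(M, N, K, A, B, Ctst);

        bad = 0;
        for (int t = 0; t < M * N; t++)
            if (Cref[t] != Ctst[t])
                bad++;
    }
    free(A); free(B); free(Cref); free(Ctst);
    return bad;
}
```

Calling count_mismatches(M, 56, 56) for M = 1 through 12 exercises every
remainder of M mod 4; a cleanup that mishandles the two-leftover-row case
would show nonzero counts only at M = 2, 6 and 10, which is the pattern
described above.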
To give some good news with all this, I include timings below comparing the
new SSE2 DGEMM versus the x86 FPU implementation.
Thanks,
Clint
            100    200    300    400    500    600    700    800    900   1000
         ====== ====== ====== ====== ====== ====== ====== ====== ====== ======
P4 x86   1025.6 1194.0 1181.2 1238.7 1209.7 1234.3 1247.3 1264.2 1276.8 1242.2
P4 SSE2  1351.4 1837.0 1944.0 1828.6 1851.9 1878.3 1960.0 1932.1 1944.0 2000.0

           1200   1400   1600   1800   2000   2200   2400   2600   2800   3000
         ====== ====== ====== ====== ====== ====== ====== ====== ====== ======
P4 x86   1256.7 1250.1 1254.5 1262.3 1261.8 1258.6 1261.3 1261.7 1262.0 1260.5
P4 SSE2  1986.2 1974.1 1974.0 1970.3 1990.0 1999.6 1991.9 1991.6 2002.0 1974.4