single precision results: holy GFLOP, batman!
Here are some results for single precision. Frankly, the numbers are
awe-inspiring. The idea of getting a 3 Gflop/s LU on a PC kind of makes
my head spin. It makes you rethink how big a problem needs to be
before you parallelize it: this baby could solve a 10K sLU in something like
3 to 4 minutes . . .
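For anyone who wants to check the arithmetic behind the 10K estimate, here is a minimal sketch. It assumes the usual leading-order LU flop count of 2N^3/3 and a sustained rate of about 3000 MFLOPS (roughly the P4 sLU figure at N=3000); the function names are just for illustration.

```python
def lu_flops(n: int) -> float:
    """Leading-order flop count for LU of an n x n matrix (assumed: 2n^3/3)."""
    return (2.0 / 3.0) * n**3


def lu_seconds(n: int, mflops: float) -> float:
    """Estimated time in seconds at a sustained rate given in MFLOPS."""
    return lu_flops(n) / (mflops * 1.0e6)


n = 10_000
rate = 3000.0  # assumed sustained MFLOPS, not a measured 10K result
secs = lu_seconds(n, rate)
print(f"N={n}: about {secs:.0f} s ({secs / 60:.1f} min)")
```

Under those assumptions the factorization comes out at a bit under four minutes, and somewhat less if the rate keeps climbing with N the way it does in the timings.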
I compare the 1.5GHz P4 using SSE with our 1GHz Athlon using 3DNow!. This
is not a fair comparison, in that SSE is IEEE compliant, and 3DNow! is not
(i.e., even if the Athlon won for performance, I'd still recommend the P4
since it gets the right answer as well). Also, remember that the Athlon is
running on SDRAM, which is not the best memory it could be using.
All that said, the amazing thing is that, for all but the smallest problems,
the P4 is *more* than 1.5 times faster than the Athlon on sMM and sLU
(1.5 being the ratio of the clock speeds).
Wow,
Clint
ATH : 1GHz Athlon, SDRAM       $1269
P4  : 1.5GHz Pentium 4, Rambus $2109

MFLOPS by problem size N:

N            100     200     300     400     500     600     700     800     900    1000
          ======  ======  ======  ======  ======  ======  ======  ======  ======  ======
ATH sMM   1860.5  2162.2  2160.0  2133.3  2205.9  2160.0  2286.7  2275.6  2278.1  2298.9
P4  sMM   2500.0  3674.1  3240.0  3584.0  3571.4  3600.0  3811.1  3657.1  3645.0  3703.7
ATH sLU    556.0   824.1   983.3  1151.0  1223.7  1348.3  1384.4  1420.9  1471.5  1480.4
P4  sLU    606.5  1153.8  1529.5  1703.5  1808.9  2054.6  2284.2  2200.1  2428.0  2467.3

N           1200    1400    1600    1800    2000    2200    2400    2600    2800    3000
          ======  ======  ======  ======  ======  ======  ======  ======  ======  ======
ATH sMM   2304.0  2325.4  2320.7  2328.1  2332.4  2330.0  2315.6  2349.7  2351.6  2364.3
P4  sMM   3676.6  3683.2  3673.5  3679.5  3661.3  3665.4  3671.7  3657.9  3670.9  3658.5
ATH sLU   1599.0  1618.0  1727.5  1719.6  1819.6  1787.5  1887.9  1865.3  1961.2  1904.3
P4  sLU   2616.5  2688.8  2785.1  2922.1  2913.3  2994.2  3040.6  2074.5  3093.2  3118.8

             GEMM    SYMM    SYRK   SYR2K    TRMM    TRSM
           ======  ======  ======  ======  ======  ======
ATH s500   2173.9  2272.7  1789.3  2381.0  2000.0  1351.4
P4  s500   3571.4  3125.0  2636.8  3333.3  2941.2  2500.0
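
To compare the speedup against the clock ratio, the timings for N = 1200 to 3000 can be run through a short script. This is only a sketch: the numbers are copied from the sMM and sLU rows above, and the variable and function names are made up for illustration.

```python
# MFLOPS for N = 1200 .. 3000, copied from the sMM / sLU rows of the timings
sizes = [1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800, 3000]
ath = {
    "sMM": [2304.0, 2325.4, 2320.7, 2328.1, 2332.4,
            2330.0, 2315.6, 2349.7, 2351.6, 2364.3],
    "sLU": [1599.0, 1618.0, 1727.5, 1719.6, 1819.6,
            1787.5, 1887.9, 1865.3, 1961.2, 1904.3],
}
p4 = {
    "sMM": [3676.6, 3683.2, 3673.5, 3679.5, 3661.3,
            3665.4, 3671.7, 3657.9, 3670.9, 3658.5],
    "sLU": [2616.5, 2688.8, 2785.1, 2922.1, 2913.3,
            2994.2, 3040.6, 2074.5, 3093.2, 3118.8],
}
CLOCK_RATIO = 1.5 / 1.0  # 1.5GHz P4 versus 1GHz Athlon


def speedups(fast, slow):
    """Element-wise ratio of two equal-length lists of MFLOPS figures."""
    return [f / s for f, s in zip(fast, slow)]


for kernel in ("sMM", "sLU"):
    for n, r in zip(sizes, speedups(p4[kernel], ath[kernel])):
        flag = "" if r > CLOCK_RATIO else "  <-- below clock ratio"
        print(f"{kernel} N={n:4d}  P4/ATH = {r:.2f}{flag}")
```

On these numbers every sMM ratio and nine of the ten sLU ratios exceed the clock ratio. The one exception is the sLU entry at N=2600, where the P4 figure (2074.5) is well below its neighbours.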