Timing roundup (P4, IA64, Athlon)
Guys,
I include below some interesting timings, where I compare the following three
systems, all running Linux and using a default ATLAS 3.2.0 install:
ATH : 1GHz Athlon,       SDRAM            $1269
P4  : 1.5GHz Pentium 4,  Rambus           $2109
IA64: 666MHz Itanium,    no idea on mem   ?????
So, the first thing to note is that the Athlon is using the old memory
type (SDRAM, not the newer DDR SDRAM, or whatever the hell it is), and the
P4 is using Rambus. I have no idea what the Itanium has. The prices above
are not what we paid for the machines (I have no idea what that was); they're
what Gateway tells me those machines cost with 256MB of memory.
All the numbers here are using the P4's normal FPU. This machine will
need SSE2 to really shine (that will pump its theoretical peak to 2*MHz).
However, the normal FPU is what you get just using gcc on Linux, so it's what
Linux people will be getting for a while, as well as MSVC++ people (Intel has
a compiler for Windows that apparently generates SSE2 code automatically,
which is what MKL is apparently already using to get dmatmul MFLOPS above the
clock rate in MHz).
So, the good news is that the P4 looks a lot like a PIII at the greater
clock speed, even when using the normal FPU (I had heard rumors that the
P4 FPU was crippled), since you get roughly 72% of peak with dgemm (the
exact number the PII gets; PIIIs typically get more like 76%). Here are
some peak numbers (extracted from the detailed timings below):
              Theo  dMatmul      dLU   dMM %   dLU %
       MHz    peak  (MFLOP)  (MFLOP)  of MHz  of MHz
      ====    ====  =======  =======  ======  ======
ATH : 1000    2000   1192.6   1003.1   119.3   100.3
P4  : 1500    1500   1073.9    986.1    71.6    65.7
IA64:  666    2664   1866.3   1336.0   280.2   200.6
       Theoretical  dMatmul     dLU   dLU %
      peak (Mflop)   % peak  % peak  of dMM
      ============  =======  ======  ======
ATH :         2000     59.6    50.2    84.1
P4  :         1500     71.6    65.7    91.8
IA64:         2664     70.1    50.2    71.6
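For anyone who wants to re-derive the percentage columns, here is a minimal Python sketch. It assumes theoretical peak is simply clock rate times flops per cycle, with the flops-per-cycle figures (2, 1 and 4) backed out of the table rather than taken from any vendor documentation. The names SYSTEMS, percent and summarize are made up for illustration.

```python
# Peak rates in MFLOPS and clock in MHz, copied from the tables above.
# flops_per_cycle is an assumption backed out of (theoretical peak / MHz);
# it is not taken from vendor documentation.
SYSTEMS = {
    "ATH":  {"mhz": 1000, "flops_per_cycle": 2, "dmm": 1192.6, "dlu": 1003.1},
    "P4":   {"mhz": 1500, "flops_per_cycle": 1, "dmm": 1073.9, "dlu": 986.1},
    "IA64": {"mhz": 666,  "flops_per_cycle": 4, "dmm": 1866.3, "dlu": 1336.0},
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def summarize(system):
    """Derive the table columns for one system (hypothetical helper)."""
    peak = system["mhz"] * system["flops_per_cycle"]
    return {
        "theo_peak": peak,
        "dmm_pct_mhz": percent(system["dmm"], system["mhz"]),
        "dlu_pct_mhz": percent(system["dlu"], system["mhz"]),
        "dmm_pct_peak": percent(system["dmm"], peak),
        "dlu_pct_peak": percent(system["dlu"], peak),
        "dlu_pct_dmm": percent(system["dlu"], system["dmm"]),
    }


if __name__ == "__main__":
    for name, system in SYSTEMS.items():
        row = summarize(system)
        print(f"{name:5s}", " ".join(f"{k}={v:.1f}" for k, v in row.items()))
```

Setting the P4's flops_per_cycle to 2 in this sketch gives the 2*MHz peak that SSE2 is supposed to bring, which would roughly halve its percent-of-peak figures at the measured rates.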
OK, so peak performance-wise (where N=3000 is the largest problem I timed:
both Athlon and IA64 LU numbers were still getting better, as you would
expect by looking at their LU % of MM numbers), without SSE2, it looks like
the P4 will need to be clocked about 1.66 times faster than an Athlon to
maintain the same GEMM peak, and about 1.53 times faster to maintain the same
LU peak. Since the LU peak should be perked up quite a bit by faster memory,
it may look more like the MM numbers soon. So, under these conditions, Athlon
is the fp king of the two. Athlon is far and away the flops/$ champion, and
as far as I know, this is true of any machine on the market.
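The 1.66 and 1.53 figures fall straight out of the per-MHz numbers. Here is a rough Python sketch of that arithmetic, plus flops/$ using the Gateway prices quoted earlier. It assumes performance scales linearly with clock rate, which is optimistic once memory becomes the bottleneck. The dictionaries and function names are hypothetical, for illustration only.

```python
# Clock (MHz), peak dgemm/LU rates (MFLOPS) and Gateway price ($),
# all copied from the numbers quoted in this message.
ATH = {"mhz": 1000, "dmm": 1192.6, "dlu": 1003.1, "price": 1269}
P4 = {"mhz": 1500, "dmm": 1073.9, "dlu": 986.1, "price": 2109}


def clock_ratio_to_match(reference, other, kernel):
    """How many times faster `other` must be clocked than `reference`
    to reach the same peak on `kernel`.

    Assumes MFLOPS scale linearly with clock rate (an idealization).
    """
    per_mhz_reference = reference[kernel] / reference["mhz"]
    per_mhz_other = other[kernel] / other["mhz"]
    return per_mhz_reference / per_mhz_other


def mflops_per_dollar(machine, kernel):
    """Peak MFLOPS on `kernel` per dollar of list price."""
    return machine[kernel] / machine["price"]


if __name__ == "__main__":
    for kernel in ("dmm", "dlu"):
        ratio = clock_ratio_to_match(ATH, P4, kernel)
        print(f"{kernel}: P4 needs {ratio:.2f}x the Athlon clock")
        print(f"{kernel}: ATH {mflops_per_dollar(ATH, kernel):.2f} MFLOPS/$, "
              f"P4 {mflops_per_dollar(P4, kernel):.2f} MFLOPS/$")
```

The Itanium is left out of the flops/$ part because no price is available for it.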
Anyway, the full timings are given below. You'll see that the P4 does well
early (probably due to superior memory), with the IA64 doing really poorly
for small problems (memory again).
Cheers,
Clint
N           100    200    300    400    500    600    700    800    900   1000
          =====  =====  =====  =====  =====  =====  =====  =====  =====  =====
ATH  dMM  909.1 1010.5 1080.0 1163.6 1087.0 1136.8 1143.3 1190.7 1205.0 1156.1
P4   dMM  952.4 1010.5 1080.0  984.6 1041.7 1080.0 1055.4 1077.9 1088.1 1075.3
IA64 dMM  866.3 1247.9 1472.4 1566.6 1570.6 1708.0 1645.1 1730.3 1710.2 1741.5
ATH  dLU  477.4  611.8  695.0  709.8  780.1  777.4  815.8  793.1  823.0  865.2
P4   dLU  435.8  611.8  718.2  788.6  805.2  821.8  878.5  874.4  882.9  888.2
IA64 dLU  241.2  419.4  554.3  652.8  754.2  800.4  832.4  873.0  926.0  937.0
N          1200   1400   1600   1800   2000   2200   2400   2600   2800   3000
          =====  =====  =====  =====  =====  =====  =====  =====  =====  =====
ATH  dMM 1183.6 1172.6 1175.3 1192.6 1179.9 1175.3 1189.7 1191.2 1190.1 1187.3
P4   dMM 1066.7 1067.7 1066.7 1065.3 1071.0 1073.9 1073.3 1072.7 1073.4  852.3
IA64 dMM 1789.2 1809.9 1820.4 1858.1 1840.1 1823.3 1832.5 1810.5 1862.2 1866.3
ATH  dLU  878.8  923.4  925.2  943.2  950.3  965.5  974.9  983.5  994.6 1003.1
P4   dLU  906.5  932.8  937.9  950.2  955.4  965.5  969.8  977.8  975.4  986.1
IA64 dLU  990.7 1047.1 1077.9 1149.2 1179.4 1208.7 1240.7 1272.7 1305.7 1336.0
            N    GEMM    SYMM    SYRK   SYR2K    TRMM    TRSM
        =====   =====   =====   =====   =====   =====   =====
ATH-1     500  1136.4  1000.0   835.0  1087.0   961.5   961.5
P4-1.5    500  1041.7   961.5   835.0  1000.0   892.9  1041.7
IA64-666  500  1610.1  1201.9  1462.9  1462.9  1082.5   816.6