MKL5.0 vs. ATLAS 3.2 vs. ATLAS 3.3 on the P4
Guys,
Here are some timings comparing MKL5.0, ATLAS 3.2.1, and the ATLAS developer
release 3.3.0. All timings are on our 1.5GHz P4 (256K L2). Note that the
developer release requires an experimental "as" to assemble the new SSE2
instructions; in the 5 minutes I spent on it, I was not able to get this
working under cygwin/Windows 2000. Again, I'm timing MKL under Win2K 'cause
the Linux version of MKL is under NDA. So, the MKL5.0 and ATLAS 3.2.1 timings
were obtained under Win2K, while the ATLAS 3.3.0 timings were taken under
Linux **on the same machine**.
As with the PIII, MKL5.0 seg faults on the 500x500 HERK and HER2K (the z500
row below), which is why there are no timings for those two cases.
The quickest summary I could give would be: just use ATLAS.
The main difference between ATLAS 3.2 and 3.3 is, of course, support for
SSE2 using Camm and Peter's excellent kernels. I have not done full timings;
I settled for what I had time for, so MKL may be better on routines or sizes
I did not time, but I think its P4 support is just too preliminary for that
to be likely . . .
Cheers,
Clint
*******************************************************************************
*                             1.5GHz P4, 256K L2                              *
*******************************************************************************
M50 : MKL5.0, Win2K
A32 : ATLAS 3.2.1, Win2K
A33 : ATLAS 3.3.0, Linux

           1200    1400    1600    1800    2000    2200    2400    2600    2800    3000
         ======  ======  ======  ======  ======  ======  ======  ======  ======  ======
M50 dLU   676.4   681.2   677.9   685.7   690.5   690.0   691.7   694.6   695.3   698.1
A32 dLU  1045.7  1073.6  1077.1  1108.8  1109.1  1124.8  1130.2  1138.9  1137.9  1161.6
A33 dLU  1514.8  1562.7  1568.6  1619.3  1645.5  1677.6  1690.4  1720.1  1715.2  1722.1

M50 sLU  1741.7  1790.7  1840.5  1883.8  1861.5  1915.3  1999.8  1995.9  1977.1  1994.4
A32 sLU  2449.5  2571.5  2699.7  2812.1  2878.7  2977.9  3036.6  3094.0  3142.3  3191.8
A33 sLU  2449.5  2504.6  2624.4  2756.3  2851.0  2944.5  3060.8  3090.8  3126.2  3173.8

            100     200     300     400     500     600     700     800     900    1000
         ======  ======  ======  ======  ======  ======  ======  ======  ======  ======
M50 sLU   556.9  1121.7  1346.6  1245.2  1468.4  1513.9  1522.8  1696.6  1728.1  1748.5
A32 sLU   527.6   917.8  1134.0  1520.9  1560.2  1917.6  1903.5  1994.2  2207.3  2220.6
A33 sLU   514.1   917.8  1134.0  1521.0  1664.2  1917.6  1903.5  2131.3  2207.3  2220.6

M50 dLU   384.8   531.3   567.0   606.6   622.5   639.2   650.8   642.2   664.3   672.2
A32 dLU   425.7   673.0   766.8   815.8   734.2   924.9   993.1   974.3   988.9  1023.3
A33 dLU   435.8   696.2   936.8  1120.7  1134.7  1150.6  1269.0  1311.6  1387.4  1448.2

M50 cLU   769.8   998.1  1024.4  1033.4  1037.6  1086.1  1100.1  1098.8  1114.9  1109.3
A32 cLU   554.9   912.6  1239.8  1543.0  1665.4  1856.9  2077.6  2162.7  2310.6  2377.9
A33 cLU   631.0  1064.7  1438.2  1623.9  1850.5  2055.9  2229.7  2313.0  2369.7  2445.6

M50 zLU   696.2   848.3   859.5   919.1   897.8   898.0   895.4   896.6   928.4   917.9
A32 zLU   504.8   709.8   826.6   897.4   945.0   992.5  1026.0  1040.2  1071.8  1077.5
A33 zLU   438.9   734.3   980.6  1136.7  1189.6  1338.7  1451.1  1436.5  1518.1  1532.0

M50 sMM  2500.0  2380.0  2454.5  3200.0  3125.0  2880.0  2982.6  3200.0  3095.5  2980.6
A32 sMM  2142.9  2880.0  3200.0  3605.6  4166.7  3570.2  3591.6  3923.4  3940.5  3913.9
A33 sMM  2631.6  2917.6  3240.0  4266.7  3846.2  3600.0  4035.3  4096.0  3940.5  3921.6

M50 dMM   952.4  1361.7  1350.0  1600.0  1562.5  1728.0  1591.6  1762.5  1672.0  1752.8
A32 dMM   952.4  1066.7  1148.9  1163.6  1184.8  1196.7  1222.8  1217.6  1245.1  1240.7
A33 dMM  1515.2  1600.0  1675.9  1920.0  2000.0  1878.3  1854.1  1969.2  1997.3  1941.7

M50 cMM    19.4  1113.0  1136.8  1216.2  1189.1  1190.1  1191.5  1202.9  1198.3  1199.4
A32 cMM   112.0  2115.7  2700.0  3938.5  4000.0  3918.4  3859.4  4129.0  4072.6  3994.0
A33 cMM   666.7  3200.0  3085.7  3938.5  4000.0  3840.0  3920.0  4137.4  4050.0  4060.9

M50 zMM  1000.0  1010.5   981.8  1064.4  1039.5  1071.3  1045.7  1067.8  1051.2  1052.5
A32 zMM  1047.1  1010.5  1200.0  1187.9  1161.4  1223.8  1212.0  1217.2  1208.5  1199.4
A33 zMM  1538.5  1920.0  1963.6  1897.3  1923.1  2057.1  2017.6  2133.3  2046.3  2088.8

                    HEMM    HERK   HER2K
            GEMM    SYMM    SYRK   SYR2K    TRMM    TRSM
          ======  ======  ======  ======  ======  ======
M50 s500  3125.0  1785.7  1926.9  1243.8  2500.0  1785.7
A32 s500  3571.4  3571.4  2783.3  3571.4  3125.0  2272.7
A33 s500  3846.2  3571.4  2890.4  3571.4  3125.0  2381.0

M50 d500  1087.0   803.9  1138.6   889.7  1126.1   690.6
A32 d500  1000.0  1136.4   963.5  1136.4  1041.7  1041.7
A33 d500  1923.1  1923.1  1565.6  1785.7  1666.7  1470.6

M50 c500  1062.7  1040.6  1000.0   891.3  1019.3  1085.7
A32 c500  2849.0  3436.4  2947.1  3846.2  3128.1  1467.7
A33 c500  4000.0  3703.7  2783.3  3783.3  3336.7  2085.4

M50 z500  1019.4   960.6   **SEG FAULT**   925.1   892.2
A32 z500  1203.4  1189.1   961.6  1189.1  1040.5  1062.6
A33 z500  1851.9  1785.7  1565.6  1923.1  1725.9  1352.7
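
For anyone who wants ratios rather than raw rates, here is a small Python
sketch (not part of the timing runs; the variable and function names are
made up for illustration) that divides one library's dLU row by another's
for the 1200-3000 sizes. The numbers are copied straight from the dLU rows
above.

```python
# Illustrative only: ratio of dLU rates between libraries, large sizes.
# Data copied from the M50/A32/A33 dLU rows for sizes 1200 ... 3000.
sizes   = [1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800, 3000]
mkl50   = [676.4, 681.2, 677.9, 685.7, 690.5, 690.0, 691.7, 694.6, 695.3, 698.1]
atlas32 = [1045.7, 1073.6, 1077.1, 1108.8, 1109.1, 1124.8, 1130.2, 1138.9, 1137.9, 1161.6]
atlas33 = [1514.8, 1562.7, 1568.6, 1619.3, 1645.5, 1677.6, 1690.4, 1720.1, 1715.2, 1722.1]


def ratios(new, base):
    """Element-wise new/base, rounded to two decimals (hypothetical helper)."""
    return [round(n / b, 2) for n, b in zip(new, base)]


a33_vs_mkl = ratios(atlas33, mkl50)    # ATLAS 3.3.0 relative to MKL5.0
a33_vs_a32 = ratios(atlas33, atlas32)  # ATLAS 3.3.0 relative to ATLAS 3.2.1

for n, r_mkl, r_a32 in zip(sizes, a33_vs_mkl, a33_vs_a32):
    print(f"{n:5d}  A33/M50 = {r_mkl:.2f}  A33/A32 = {r_a32:.2f}")
```

Swapping in any other group of three rows (same sizes) gives the
corresponding ratios for that routine and precision.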