UltraSparc kernel results
Guys,
I have finally gotten the new kernel/cleanup stuff working such that
if you hold its hand supportively enough, it'll complete a build for
you. I include timings below on an Ultra-2, 200MHz, comparing
Sunperf, the last release of ATLAS (Atl), and ATLAS + the kernel submitted
by Viet Nguyen & Peter Strazdins (Atl+USK). The good news is we get about 90%
of vendor performance for double complex, and we modestly beat the vendor for
double. We still run only around 80-85% of vendor speed for single
precision (the submitted code doesn't help single).
That's the good news. The bad news is I got access to an Ultra-5/10,
Sun's PCI-based low-end UltraSparc, and the submitted kernels don't
seem to do very well on those machines; ATLAS's generated code is
as good as the kernel there, and both get *completely* waxed by
Sunperf. My guess is the motherboard can have such an effect
because the UltraSparc II has an off-chip cache, and on the PCI-based
board the code behaves really differently . . . Anyway, I'll have to
investigate this further, maybe I just messed up the build . . .
Cheers
Clint
LIB         N  DGEMM  DSYMM  SYR2K  DSYRK  DTRMM  DTRSM
=======  ====  =====  =====  =====  =====  =====  =====
Sunperf   100  271.3  157.7  163.3  163.3  101.4  190.2
Atl       100
Atl+USK   100  255.5  239.7  158.8  158.8  152.8  210.8
Sunperf   500  277.8  238.1  287.4  266.5  164.5  235.8
Atl       500  245.1  235.8  238.1  235.8  227.3  240.4
Atl+USK   500  297.6  287.4  297.6  245.6  271.7  260.4
Sunperf  1000  288.2  222.2  269.9  253.8  159.2  248.8
Atl      1000  248.4  246.3  245.7  235.8  238.1  245.7
Atl+USK  1000  294.1  284.9  280.9  252.8  266.7  232.6

LIB         N  ZGEMM  ZSYMM  SYR2K  ZSYRK  ZTRMM  ZTRSM  ZHERK  HER2K
=======  ====  =====  =====  =====  =====  =====  =====  =====  =====
Sunperf   500  300.3  292.4  288.2  219.7  281.2  263.4  273.8  286.5
Atl       500  251.9  245.7  247.5  213.7  233.6  205.8  214.6  233.6
Atl+USK   500  289.0  297.6  284.9  216.9  247.8  224.4  218.8  287.4
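
To put numbers on the "about 90%" and "modestly beat the vendor" remarks, here is a small Python sketch that divides the Atl+USK figures by the Sunperf figures. The values are copied from the N=500 double complex rows and the N=1000 double rows of the summary tables; the helper names `ratio_to_vendor` and `report` are made up for illustration and are not part of ATLAS.

```python
# Routine labels and figures copied from the summary tables in this post.
Z_ROUTINES = ["ZGEMM", "ZSYMM", "SYR2K", "ZSYRK", "ZTRMM", "ZTRSM", "ZHERK", "HER2K"]
Z_SUNPERF_500 = [300.3, 292.4, 288.2, 219.7, 281.2, 263.4, 273.8, 286.5]
Z_ATL_USK_500 = [289.0, 297.6, 284.9, 216.9, 247.8, 224.4, 218.8, 287.4]

D_ROUTINES = ["DGEMM", "DSYMM", "SYR2K", "DSYRK", "DTRMM", "DTRSM"]
D_SUNPERF_1000 = [288.2, 222.2, 269.9, 253.8, 159.2, 248.8]
D_ATL_USK_1000 = [294.1, 284.9, 280.9, 252.8, 266.7, 232.6]


def ratio_to_vendor(routines, ours, vendor):
    """Return {routine: ours / vendor}. Hypothetical helper, not an ATLAS API."""
    return {name: o / v for name, o, v in zip(routines, ours, vendor)}


def report(title, ratios):
    """Print each routine's speed as a percentage of Sunperf, plus a plain mean."""
    print(title)
    for name, r in ratios.items():
        print(f"  {name:6s} {100.0 * r:6.1f}% of Sunperf")
    mean = sum(ratios.values()) / len(ratios)
    print(f"  mean   {100.0 * mean:6.1f}% of Sunperf")


z_ratios = ratio_to_vendor(Z_ROUTINES, Z_ATL_USK_500, Z_SUNPERF_500)
d_ratios = ratio_to_vendor(D_ROUTINES, D_ATL_USK_1000, D_SUNPERF_1000)
report("double complex, N=500", z_ratios)
report("double, N=1000", d_ratios)
```

A plain mean is only a rough summary: the per-routine ratios vary quite a bit (compare ZHERK with ZSYMM), so the individual lines are more informative than the average.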
******************************************************************************
gemm timings with:
0: sunperf
1: generated atlas
2: atlas with UltraSparc kernel
DGEMM:
TEST  TA  TB     M     N     K  alpha   beta    Time  Mflop
====  ==  ==  ====  ====  ====  =====  =====  ======  =====
   0   N   N   100   100   100    1.0    1.0    0.01  263.2
   1   N   N   100   100   100    1.0    1.0    0.01  198.9
   2   N   N   100   100   100    1.0    1.0    0.01  239.7
   0   N   N   200   200   200    1.0    1.0    0.06  250.7
   1   N   N   200   200   200    1.0    1.0    0.07  231.7
   2   N   N   200   200   200    1.0    1.0    0.06  262.5
   0   N   N   300   300   300    1.0    1.0    0.19  276.9
   1   N   N   300   300   300    1.0    1.0    0.24  226.6
   2   N   N   300   300   300    1.0    1.0    0.20  276.9
   0   N   N   400   400   400    1.0    1.0    0.49  261.2
   1   N   N   400   400   400    1.0    1.0    0.55  230.6
   2   N   N   400   400   400    1.0    1.0    0.44  294.3
   0   N   N   500   500   500    1.0    1.0    0.98  255.1
   1   N   N   500   500   500    1.0    1.0    1.07  233.6
   2   N   N   500   500   500    1.0    1.0    0.86  290.7
   0   N   N   600   600   600    1.0    1.0    1.63  265.0
   1   N   N   600   600   600    1.0    1.0    1.86  232.3
   2   N   N   600   600   600    1.0    1.0    1.51  286.1
   0   N   N   700   700   700    1.0    1.0    2.59  264.9
   1   N   N   700   700   700    1.0    1.0    2.88  238.2
   2   N   N   700   700   700    1.0    1.0    2.55  269.0
   0   N   N   800   800   800    1.0    1.0    3.81  268.8
   1   N   N   800   800   800    1.0    1.0    4.24  241.5
   2   N   N   800   800   800    1.0    1.0    3.56  287.6
   0   N   N   900   900   900    1.0    1.0    5.53  263.7
   1   N   N   900   900   900    1.0    1.0    6.49  224.7
   2   N   N   900   900   900    1.0    1.0    5.06  288.1
   0   N   N  1000  1000  1000    1.0    1.0    7.59  263.5
   1   N   N  1000  1000  1000    1.0    1.0    8.72  229.4
   2   N   N  1000  1000  1000    1.0    1.0    7.24  276.2
ZGEMM:
TEST  TA  TB     M     N     K         alpha          beta    Time  Mflop
====  ==  ==  ====  ====  ====  =====  =====  =====  =====  ======  =====
   0   N   N   100   100   100    1.0    0.0    1.0    0.0    0.03  266.7
   1   N   N   100   100   100    1.0    0.0    1.0    0.0    0.04  227.8
   2   N   N   100   100   100    1.0    0.0    1.0    0.0    0.03  266.7
   0   N   N   200   200   200    1.0    0.0    1.0    0.0    0.22  290.9
   1   N   N   200   200   200    1.0    0.0    1.0    0.0    0.27  240.6
   2   N   N   200   200   200    1.0    0.0    1.0    0.0    0.23  278.3
   0   N   N   300   300   300    1.0    0.0    1.0    0.0    0.70  308.6
   1   N   N   300   300   300    1.0    0.0    1.0    0.0    0.91  237.4
   2   N   N   300   300   300    1.0    0.0    1.0    0.0    0.85  254.1
   0   N   N   400   400   400    1.0    0.0    1.0    0.0    1.69  303.0
   1   N   N   400   400   400    1.0    0.0    1.0    0.0    2.15  238.1
   2   N   N   400   400   400    1.0    0.0    1.0    0.0    1.85  276.8
   0   N   N   500   500   500    1.0    0.0    1.0    0.0    3.18  314.5
   1   N   N   500   500   500    1.0    0.0    1.0    0.0    4.25  235.3
   2   N   N   500   500   500    1.0    0.0    1.0    0.0    3.69  271.0
   0   N   N   600   600   600    1.0    0.0    1.0    0.0    5.49  314.8
   1   N   N   600   600   600    1.0    0.0    1.0    0.0    6.85  252.3
   2   N   N   600   600   600    1.0    0.0    1.0    0.0    6.29  274.7
   0   N   N   700   700   700    1.0    0.0    1.0    0.0    8.69  315.8
   1   N   N   700   700   700    1.0    0.0    1.0    0.0   11.95  229.6
   2   N   N   700   700   700    1.0    0.0    1.0    0.0    9.97  275.2
   0   N   N   800   800   800    1.0    0.0    1.0    0.0   13.11  312.4
   1   N   N   800   800   800    1.0    0.0    1.0    0.0   17.06  240.1
   2   N   N   800   800   800    1.0    0.0    1.0    0.0   14.88  275.3
   0   N   N   900   900   900    1.0    0.0    1.0    0.0   19.89  293.2
   1   N   N   900   900   900    1.0    0.0    1.0    0.0   24.13  241.7
   2   N   N   900   900   900    1.0    0.0    1.0    0.0   20.82  280.1
   0   N   N  1000  1000  1000    1.0    0.0    1.0    0.0   26.64  300.3
   1   N   N  1000  1000  1000    1.0    0.0    1.0    0.0   32.23  248.2
   2   N   N  1000  1000  1000    1.0    0.0    1.0    0.0   28.69  278.8
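
For anyone wanting to sanity-check the Mflop column against the Time column, here is a minimal Python sketch. It assumes the usual GEMM flop-count convention of 2*M*N*K for real and 8*M*N*K for complex arithmetic; that convention is not stated in this post, but it is consistent with the N=1000 rows in the tables. The function name `gemm_mflops` is hypothetical and not taken from the ATLAS timer.

```python
def gemm_mflops(m, n, k, seconds, complex_arith=False):
    """Estimate MFLOPS for one GEMM call from its dimensions and elapsed time.

    Assumed convention (not stated in the post):
      real GEMM    : 2*M*N*K flops (one multiply + one add per term)
      complex GEMM : 8*M*N*K flops (a complex multiply-add is 4 mul + 4 add)
    """
    flops_per_term = 8 if complex_arith else 2
    return flops_per_term * m * n * k / seconds / 1e6


# Sunperf rows (TEST 0) at M = N = K = 1000, times copied from the tables.
dgemm = gemm_mflops(1000, 1000, 1000, 7.59)
zgemm = gemm_mflops(1000, 1000, 1000, 26.64, complex_arith=True)
print(f"DGEMM: {dgemm:.1f} Mflop")
print(f"ZGEMM: {zgemm:.1f} Mflop")
```

With these inputs the sketch gives 263.5 and 300.3, matching the Mflop values printed for those two rows. The small-N rows cannot be reproduced this way because their Time values are rounded to two decimals (0.01 s at N=100), so the printed Mflop figures there presumably come from unrounded timings.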