SSE numbers
Guys,
I've just finished incorporating Camm's newest submissions, and the time
seemed ripe to finally compare against Doug's submitted full emmerald
SGEMM. All numbers are for my 500MHz Coppermine PIII laptop, using
the ATLAS timers (which flush the cache, in case my numbers seem low to you).
The short version is that Doug's code is a little better for some smaller
problem sizes, because of the kernel approach's cleanup code, but the kernel
code is the clear winner for moderate or large problems. Kernel SGEMM peaks
at about 920 Mflop (46% of SSE peak, 184% of x87 peak), and CGEMM is about
the same. Emmerald SGEMM appears to peak around 880. Single precision LU
peaks (for the problem sizes I ran) around 665 for the kernel approach, and
611 for emmerald (yow! LU timings exceeding x87 peak).
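The percentages above can be checked with a quick ratio. This is only a sketch: it assumes theoretical peaks of 4 single-precision flops per cycle for SSE and 1 per cycle for x87 on the 500MHz part, figures inferred from the quoted percentages rather than measured. The name `percent_of_peak` is made up for illustration.

```python
# Sketch: check the quoted percent-of-peak figures.
# Assumption: 4 flops/cycle (SSE) and 1 flop/cycle (x87); inferred, not measured.
CLOCK_MHZ = 500

def percent_of_peak(mflops, flops_per_cycle, clock_mhz=CLOCK_MHZ):
    """Return an achieved Mflop rate as a percentage of theoretical peak."""
    peak_mflops = clock_mhz * flops_per_cycle
    return 100.0 * mflops / peak_mflops

sse_pct = percent_of_peak(920, 4)     # kernel SGEMM against SSE peak
x87_pct = percent_of_peak(920, 1)     # kernel SGEMM against x87 peak
lu_x87_pct = percent_of_peak(665, 1)  # kernel LU against x87 peak

print(f"SGEMM: {sse_pct:.0f}% of SSE peak, {x87_pct:.0f}% of x87 peak")
print(f"SLU:   {lu_x87_pct:.0f}% of x87 peak")
```

Under those assumptions the LU figure comes out above 100% of x87 peak, which is the point of the "yow!" remark.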
All in all, I think the kernel option seems to be quite adequate for
performance here. A nice feature of this approach is that we can mix and
match on the build. The present code uses Camm's kernel, Camm's N-cleanup
code, and Peter's M-cleanup code. It further uses Camm's K-cleanup for
K = 4 or 8, Peter's for K = 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64,
and the generated code for the rest.
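The K-cleanup mix described above amounts to a lookup on K. The sketch below only mirrors the list in this message; `k_cleanup_source` and the string labels are hypothetical names for illustration, not ATLAS symbols, and the real selection happens in the build rather than in code like this.

```python
# Sketch of the K-cleanup selection described above.
# All names here are hypothetical; they are not part of ATLAS.
CAMM_K = {4, 8}
PETER_K = set(range(16, 65, 4))  # 16, 20, 24, ..., 64

def k_cleanup_source(k):
    """Return whose K-cleanup code would be used for a given K."""
    if k in CAMM_K:
        return "Camm"
    if k in PETER_K:
        return "Peter"
    return "generated"

for k in (4, 8, 12, 16, 17, 64):
    print(k, k_cleanup_source(k))
```

Any K not in either set, such as 12 or 17, falls through to the generated code.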
Peter's still cranking on the cleanup, so it may actually wind up improved
before release, which would help with the small problem sizes...
Cheers,
Clint
GHBlas  : Greg Henry's BLAS (some old version, may be better now)
Peter   : Timings I sent a while back, using Peter's kernels only
emmeral : SGEMM built from Doug's emmerald full-gemm
New     : New mixed kernels described above
NOTE: nb=64 for SGEMM, 56 for CGEMM

                 100   200   300   400   500   600   700   800   900  1000
               ===== ===== ===== ===== ===== ===== ===== ===== ===== =====
GHBlas  SGEMM  400.0 417.4 405.0 412.9 416.7 419.4 426.1 428.5 426.3 428.3
Peter   SGEMM  500.0 662.1 736.4 800.0 757.6 815.1 826.5 853.3 857.6 840.3
emmeral SGEMM  714.3 784.3 741.2 783.7 862.1 800.0 797.7 867.8 796.7 873.4
New     SGEMM  597.0 769.2 804.3 872.7 833.3 900.0 902.6 906.2 917.0 909.1
emmeral STRMM  303.0 519.5 540.0 609.5 625.0 635.3 591.4 664.9 662.7 671.1
New     STRMM  330.6 625.0 663.2 738.5 721.2 800.0 762.2 800.0 810.0 819.7
emmeral SLU    223.6 331.0 413.0 462.9 526.6 532.7 525.1 563.7 599.5 611.2
New     SLU    207.4 354.2 413.0 489.5 520.0 567.7 571.1 614.4 630.6 640.5
emmeral SLLt   135.7 241.1 308.4 380.2 426.4 492.1 487.6 488.5 523.5 542.8
New     SLLt   134.3 235.4 318.0 383.8 439.9 481.2 498.2 540.0 566.1 580.6
New     CGEMM   56.9 817.0 864.0 867.8 862.1 886.2 890.9 898.2 891.7 894.9
New     CLU    246.1 381.7 468.9 544.2 559.8 605.9 643.8 662.4 691.5 696.0

                 128   256   384   512   640   768   896  1024  1152  1280
               ===== ===== ===== ===== ===== ===== ===== ===== ===== =====
emmeral SGEMM  751.8 820.2 617.7 745.7 680.9 748.7 672.3 732.9 693.3 733.3
New     SGEMM  830.1 838.9 894.0 958.7 919.8 906.0 910.5 910.5 926.6 919.8
emmeral SLU    255.0 368.3 440.9 438.0 523.7 482.7 584.3 470.6 595.7 547.9
New     SLU    268.7 368.3 476.4 490.9 569.2 580.2 622.3 533.8 661.4 665.4

                 112   224   336   448   560   672   784   896  1008  1120
               ===== ===== ===== ===== ===== ===== ===== ===== ===== =====
New     CGEMM  969.7 899.2 892.5 899.2 912.3 909.3 909.2 912.0 916.5 916.0