Re: ATLAS FAQ
R Clint Whaley <rwhaley@cs.utk.edu> writes:
> Camm,
>
> >Hi Clint! Sorry I've been out of touch recently. Did you get those
> >files?
>
> Yep, I got the files; I finally had the sysadmins set your directory
> so I could read it (since you pointed me at files in there, I was pretty
> sure that was OK with you).
>
Of course! Glad to see it got done.
> As might be expected, there were some peculiarities. First, the best
> performance was given by ATL_dgemm_SSE_1x1xkb.c, not ATL_dgemm_SSE_1x4.c.
OK, this won't persist. There are still a few different variants of
the instruction ordering/pipelining that I'm testing. The two files
differ in this respect.
> Also, ATL_dgemm_SSE_1x1xkb.c didn't work for cleanup (got wrong answer
> for non-multiple of NB). Using an NB of 80 (and keeping N a multiple of
Can't reproduce this one. Can you give me the command that fails?
I've included my runs below.
> 80), I was able to build a complete dgemm getting a little over 2.1Gflop
> on torc19. However, that large of an NB used up too much memory, causing
> swapping very early, so I dropped back to NB=56, but didn't build the
> complete gemm there, since cleanup wasn't rolling.
>
> The interesting thing is that if the 2.1Gflop holds up (as I think it
> will), the P4 will overtake the Athlon on the double precision flops/$ . . .
>
Great! Yes, the double precision code may not yet be nearly as close to
optimal as the single precision code appears to be.
> Cheers,
> Clint
>
>
>
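For anyone following along, the property the runs that follow check can be sketched in plain C. This is only an illustration under my own assumptions, not the SSE kernel: K is fixed at compile time (kb=56), M and N are runtime values that are not multiples of the block size (m=57, n=58), A is used transposed and B untransposed (ta=t, tb=n), and lda=ldb=56. The names kernel_1x1xkb, reference_mm and max_diff_for_case are hypothetical and do not come from ATLAS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define KB 56 /* K fixed at compile time, as with kb=56 in the runs */

/* Illustrative stand-in for a 1x1xKB kernel: each C(i,j) is one length-KB
 * dot product. A is KB x M and B is KB x N, both column-major with
 * lda = ldb = KB, so this computes C = A^T * B + beta * C. */
static void kernel_1x1xkb(int M, int N, const double *A, const double *B,
                          double beta, double *C, int ldc)
{
    assert(ldc >= M);
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < M; i++) {
            const double *a = A + (size_t)i * KB;
            const double *b = B + (size_t)j * KB;
            double dot = 0.0;
            for (int k = 0; k < KB; k++)
                dot += a[k] * b[k];
            C[i + (size_t)j * ldc] = beta * C[i + (size_t)j * ldc] + dot;
        }
    }
}

/* Reference with a different loop order (k outermost). */
static void reference_mm(int M, int N, const double *A, const double *B,
                         double beta, double *C, int ldc)
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            C[i + (size_t)j * ldc] *= beta;
    for (int k = 0; k < KB; k++)
        for (int j = 0; j < N; j++)
            for (int i = 0; i < M; i++)
                C[i + (size_t)j * ldc] +=
                    A[k + (size_t)i * KB] * B[k + (size_t)j * KB];
}

/* Runs both versions on small-integer data (so the arithmetic is exact)
 * and returns the largest absolute difference between the results. */
static double max_diff_for_case(int M, int N)
{
    size_t na = (size_t)KB * M, nb = (size_t)KB * N, nc = (size_t)M * N;
    double *A = malloc(na * sizeof *A), *B = malloc(nb * sizeof *B);
    double *C1 = malloc(nc * sizeof *C1), *C2 = malloc(nc * sizeof *C2);
    double worst = 0.0;
    assert(A && B && C1 && C2);
    for (size_t p = 0; p < na; p++) A[p] = (double)(p % 7) - 3.0;
    for (size_t p = 0; p < nb; p++) B[p] = (double)(p % 5) - 2.0;
    for (size_t p = 0; p < nc; p++) C1[p] = C2[p] = (double)(p % 3);
    kernel_1x1xkb(M, N, A, B, 1.0, C1, M);
    reference_mm(M, N, A, B, 1.0, C2, M);
    for (size_t p = 0; p < nc; p++) {
        double d = C1[p] - C2[p];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    free(A); free(B); free(C1); free(C2);
    return worst;
}
```

Calling max_diff_for_case(57, 58) from a small main corresponds to the m=57 n=58 case: if the kernel mishandled runtime M or N, the difference would be nonzero.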
[camm@torc19 Linux]$ make mmutstcase pre=d kb=56 m=57 n=58 mb=0 nb=0 mmrout=../CASES/ATL_dgemm_SSE_1x1xkb.c
rm -f dmm.[o,c]
./xemit_mm -p d -b 1 -M 0 -N 0 -K 56 -R -3 \
-lda 56 -ldb 56 -ldc 0 > dmm.c
pre=d, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=1
cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> dmm.c
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c dmm.c
In file included from dmm.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
make mmtstcase0 pre=d ta=t tb=n muladd=1 lat=4 loopO=JIK M=0 N=0 K=56 mb=0 nb=0 kb=56 mu=4 nu=4 ku=1 lda=56 ldb=56 ldc=0 lda2=56 ldb2=56 ldc2=0 csA=1 csB=1 csC=1 alpha=1 beta=1 moves="-DMoveA -DMoveB" cleanup=0 MCC=/usr/bin/gcc MMFLAGS="-fomit-frame-pointer -O" mmobjs=dmm.o \
mmlib="/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm"
make[1]: Entering directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
rm -f dmmtst.o
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -DdREAL -DtranAt -DtranBn \
-DMULADD=1 -DLAT=4 -DJIK \
-DMB0=0 -DNB0=0 -DKB0=56 \
-DMB=0 -DNB=0 -DKB=56 \
-DKU=1 -DNU=4 -DMU=4 \
-DLDA=56 -DLDB=56 -DLDC=0 \
-DcsA=1 -DcsB=1 -DcsC=1 \
-DALPHA=1 -DBETA=1 -DMoveA -DMoveB \
-DCLEANUP=0 \
-o dmmtst.o -c ../mmtst.c
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -o xdmmtst dmmtst.o dmm.o \
/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm
/scratch/camm/ATLAS/bin/Linux/ATLrun.sh /scratch/camm/ATLAS/tune/blas/gemm/Linux xdmmtst
PASSED TEST
make[1]: Leaving directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
[camm@torc19 Linux]$ make mmutstcase pre=d kb=56 k=56 m=57 n=58 mb=0 nb=0 mmrout=../CASES/ATL_dgemm_SSE_1x1xkb.c
rm -f dmm.[o,c]
./xemit_mm -p d -b 1 -M 0 -N 0 -K 56 -R -3 \
-lda 56 -ldb 56 -ldc 0 > dmm.c
pre=d, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=1
cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> dmm.c
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c dmm.c
In file included from dmm.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
make mmtstcase0 pre=d ta=t tb=n muladd=1 lat=4 loopO=JIK M=0 N=0 K=56 mb=0 nb=0 kb=56 mu=4 nu=4 ku=1 lda=56 ldb=56 ldc=0 lda2=56 ldb2=56 ldc2=0 csA=1 csB=1 csC=1 alpha=1 beta=1 moves="-DMoveA -DMoveB" cleanup=0 MCC=/usr/bin/gcc MMFLAGS="-fomit-frame-pointer -O" mmobjs=dmm.o \
mmlib="/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm"
make[1]: Entering directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
rm -f dmmtst.o
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -DdREAL -DtranAt -DtranBn \
-DMULADD=1 -DLAT=4 -DJIK \
-DMB0=0 -DNB0=0 -DKB0=56 \
-DMB=0 -DNB=0 -DKB=56 \
-DKU=1 -DNU=4 -DMU=4 \
-DLDA=56 -DLDB=56 -DLDC=0 \
-DcsA=1 -DcsB=1 -DcsC=1 \
-DALPHA=1 -DBETA=1 -DMoveA -DMoveB \
-DCLEANUP=0 \
-o dmmtst.o -c ../mmtst.c
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -o xdmmtst dmmtst.o dmm.o \
/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm
/scratch/camm/ATLAS/bin/Linux/ATLrun.sh /scratch/camm/ATLAS/tune/blas/gemm/Linux xdmmtst
PASSED TEST
make[1]: Leaving directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
[camm@torc19 Linux]$ make mmutstcase pre=z kb=56 k=56 m=57 n=58 mb=0 nb=0 mmrout=../CASES/ATL_dgemm_SSE_1x1xkb.c
rm -f zmm.[o,c]
./xemit_mm -p z -b 1 -M 0 -N 0 -K 56 -R -3 \
-lda 56 -ldb 56 -ldc 0 > zmm.c
pre=z, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=1
cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> zmm.c
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c zmm.c
In file included from zmm.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
make mmtstcase0 pre=z ta=t tb=n muladd=1 lat=4 loopO=JIK M=0 N=0 K=56 mb=0 nb=0 kb=56 mu=4 nu=4 ku=1 lda=56 ldb=56 ldc=0 lda2=56 ldb2=56 ldc2=0 csA=1 csB=1 csC=1 alpha=1 beta=1 moves="-DMoveA -DMoveB" cleanup=0 MCC=/usr/bin/gcc MMFLAGS="-fomit-frame-pointer -O" mmobjs=zmm.o \
mmlib="/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm"
make[1]: Entering directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
rm -f zmmtst.o
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -DzREAL -DtranAt -DtranBn \
-DMULADD=1 -DLAT=4 -DJIK \
-DMB0=0 -DNB0=0 -DKB0=56 \
-DMB=0 -DNB=0 -DKB=56 \
-DKU=1 -DNU=4 -DMU=4 \
-DLDA=56 -DLDB=56 -DLDC=0 \
-DcsA=1 -DcsB=1 -DcsC=1 \
-DALPHA=1 -DBETA=1 -DMoveA -DMoveB \
-DCLEANUP=0 \
-o zmmtst.o -c ../mmtst.c
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -o xzmmtst zmmtst.o zmm.o \
/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm
zmmtst.o: In function `mmtst':
zmmtst.o(.text+0xa6e): undefined reference to `ATL_zJIK0x0x56TN56x56x0_a1_bX'
zmmtst.o(.text+0xadf): undefined reference to `ATL_zJIK0x0x56TN56x56x0_a1_bX'
collect2: ld returned 1 exit status
make[1]: *** [mmtstcase0] Error 1
make[1]: Leaving directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
make: *** [mmutstcase] Error 2
[camm@torc19 Linux]$ make cmmutstcase pre=z kb=56 k=56 m=57 n=58 mb=0 nb=0 mmrout=../CASES/ATL_dgemm_SSE_1x1xkb.c
rm -f zmm.c
make BuildCobjs pre=z mb=0 nb=0 kb=56 \
lda=56 ldb=56 ldc=0 mmrout=../CASES/ATL_dgemm_SSE_1x1xkb.c
make[1]: Entering directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
rm -f zmm_b[1,0,X].[o,c]
./xemit_mm -p z -b 0 -M 0 -N 0 -K 56 -R -3 \
-lda 56 -ldb 56 -ldc 0 > zmm_b0.c
pre=z, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=0
cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> zmm_b0.c
./xemit_mm -p z -b 1 -M 0 -N 0 -K 56 -R -3 \
-lda 56 -ldb 56 -ldc 0 > zmm_b1.c
pre=z, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=1
cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> zmm_b1.c
./xemit_mm -p z -b 8 -M 0 -N 0 -K 56 -R -3 \
-lda 56 -ldb 56 -ldc 0 > zmm_bX.c
pre=z, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=8
cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> zmm_bX.c
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c zmm_b1.c
In file included from zmm_b1.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c zmm_b0.c
In file included from zmm_b0.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c zmm_bX.c
In file included from zmm_bX.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
make[1]: Leaving directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
make mmtstcase0 pre=z ta=t tb=n muladd=1 lat=4 loopO=JIK M=0 N=0 K=56 mb=0 nb=0 kb=56 mu=4 nu=4 ku=1 lda=56 ldb=56 ldc=0 lda2=56 ldb2=56 ldc2=0 csA=1 csB=1 csC=1 alpha=1 beta=1 moves="-DMoveA -DMoveB" cleanup=0 MCC=/usr/bin/gcc MMFLAGS="-fomit-frame-pointer -O" csC=2 \
mmobjs="zmm_b0.o zmm_b1.o zmm_bX.o" \
mmlib="/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm"
make[1]: Entering directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
rm -f zmmtst.o
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -DzREAL -DtranAt -DtranBn \
-DMULADD=1 -DLAT=4 -DJIK \
-DMB0=0 -DNB0=0 -DKB0=56 \
-DMB=0 -DNB=0 -DKB=56 \
-DKU=1 -DNU=4 -DMU=4 \
-DLDA=56 -DLDB=56 -DLDC=0 \
-DcsA=1 -DcsB=1 -DcsC=2 \
-DALPHA=1 -DBETA=1 -DMoveA -DMoveB \
-DCLEANUP=0 \
-o zmmtst.o -c ../mmtst.c
/usr/bin/gcc -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -o xzmmtst zmmtst.o zmm_b0.o zmm_b1.o zmm_bX.o \
/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm
/scratch/camm/ATLAS/bin/Linux/ATLrun.sh /scratch/camm/ATLAS/tune/blas/gemm/Linux xzmmtst
PASSED TEST
make[1]: Leaving directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
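A note on the pre=z runs above: the mmutstcase link fails on a `_bX` symbol after emitting only `-b 1`, while cmmutstcase emits three beta variants (`-b 0`, `-b 1`, and `-b 8` into zmm_bX.c) and tests with csC=2. One way to see why a complex test would want a general-beta real kernel is sketched below. It assumes the complex product is assembled from four real products on interleaved C storage, with beta = -1 supplying the subtraction in the real part. That is my reading, not a statement of how the ATLAS driver is actually written, and real_kernel, complex_mm and complex_mismatches are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define KB 4 /* small fixed K, for illustration only */

/* Real kernel: C(i,j) = beta*C(i,j) + sum_k A(k,i)*B(k,j).
 * Consecutive C elements are csC doubles apart; ldc counts elements. */
static void real_kernel(int M, int N, const double *A, const double *B,
                        double beta, double *C, int ldc, int csC)
{
    assert(ldc >= M && csC >= 1);
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++) {
            double *c = C + ((size_t)i + (size_t)j * ldc) * csC;
            double dot = 0.0;
            for (int k = 0; k < KB; k++)
                dot += A[k + i * KB] * B[k + j * KB];
            *c = beta * *c + dot;
        }
}

/* C += A^T * B for complex data. C is interleaved (re, im), hence csC = 2;
 * A and B are given as separate real and imaginary blocks. */
static void complex_mm(int M, int N, const double *rA, const double *iA,
                       const double *rB, const double *iB, double *C, int ldc)
{
    double *rC = C, *iC = C + 1;
    real_kernel(M, N, iA, iB, -1.0, rC, ldc, 2); /* rC = -rC + iA*iB        */
    real_kernel(M, N, rA, rB, -1.0, rC, ldc, 2); /* rC = rC + rA*rB - iA*iB */
    real_kernel(M, N, rA, iB, 1.0, iC, ldc, 2);  /* iC = iC + rA*iB         */
    real_kernel(M, N, iA, rB, 1.0, iC, ldc, 2);  /* iC = iC + iA*rB         */
}

/* Compares complex_mm against the direct formula on small-integer data
 * (exact arithmetic) and returns the number of mismatching entries. */
static int complex_mismatches(void)
{
    enum { M = 3, N = 2 };
    double rA[KB * M], iA[KB * M], rB[KB * N], iB[KB * N];
    double C[2 * M * N], ref[2 * M * N];
    int bad = 0;
    for (int p = 0; p < KB * M; p++) {
        rA[p] = (double)(p % 5) - 2.0;
        iA[p] = (double)(p % 3) - 1.0;
    }
    for (int p = 0; p < KB * N; p++) {
        rB[p] = (double)(p % 4) - 1.0;
        iB[p] = (double)(p % 7) - 3.0;
    }
    for (int p = 0; p < 2 * M * N; p++)
        C[p] = ref[p] = (double)(p % 6);
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            for (int k = 0; k < KB; k++) {
                double ar = rA[k + i * KB], ai = iA[k + i * KB];
                double br = rB[k + j * KB], bi = iB[k + j * KB];
                ref[2 * (i + j * M)]     += ar * br - ai * bi;
                ref[2 * (i + j * M) + 1] += ar * bi + ai * br;
            }
    complex_mm(M, N, rA, iA, rB, iB, C, M);
    for (int p = 0; p < 2 * M * N; p++)
        if (C[p] != ref[p])
            bad++;
    return bad;
}
```

In this sketch the two calls with beta = -1 are the ones that would need a general-beta build of the kernel; a kernel compiled only for beta = 1 could not serve them.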
--
Camm Maguire camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah