
Re: ATLAS FAQ



R Clint Whaley <rwhaley@cs.utk.edu> writes:

> Camm,
> 
> >Hi Clint!  Sorry I've been out of touch recently.  Did you get those
> >files?   
> 
> Yep, I got the files; I finally had the sysadmins set your directory
> so I could read it (since you pointed me at files in there, I was pretty
> sure that was OK with you).
> 

Of course!  Glad to see it got done.

> As might be expected, there were some peculiarities.  First, the best 
> performance was given by ATL_dgemm_SSE_1x1xkb.c, not ATL_dgemm_SSE_1x4.c.

OK, this won't persist.  There are still a few different variants of
the instruction ordering/pipelining that I'm testing.  The two files
differ in this respect.

> Also, ATL_dgemm_SSE_1x1xkb.c didn't work for cleanup (got wrong answer
> for non-multiple of NB).  Using an NB of 80 (and keeping N a multiple of

Can't reproduce this one.  Can you give me the command that fails?
I've included my runs below.

> 80), I was able to build a complete dgemm getting a little over 2.1Gflop
> on torc19.  However, that large of an NB used up too much memory, causing
> swapping very early, so I dropped back to NB=56, but didn't build the
> complete gemm there, since cleanup wasn't rolling.
> 
> The interesting thing is that if the 2.1Gflop holds up (as I think it
> will), the P4 will overtake the Athlon on the double precision flops/$ . . .
> 

Great!  Yes, in double precision we may not yet be nearly as close to
optimal as we appear to be in single precision.

> Cheers,
> Clint
> 
> 
> 
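"Cleanup" here means the partial blocks left over when M or N is not a multiple of the blocking factor NB. Below is a minimal sketch of that split. It assumes column-major storage, computes only C += A*B, and uses a plain-C loop nest. It is not ATLAS's kernel or its cleanup code, and the names `mm_kernel`, `blocked_gemm`, `naive_gemm` and `max_abs_diff` are hypothetical. The sketch shows which blocks are full NB x NB blocks and which fall to cleanup. It also compares the blocked result against an unblocked triple loop, roughly the kind of check a tester makes.

```c
#include <assert.h>
#include <stddef.h>

/* Generic (hypothetical) kernel: C(0:mb,0:nb) += A(0:mb,0:K) * B(0:K,0:nb).
 * All matrices are column-major with leading dimensions lda, ldb, ldc. */
static void mm_kernel(int mb, int nb, int K,
                      const double *A, int lda,
                      const double *B, int ldb,
                      double *C, int ldc)
{
    for (int j = 0; j < nb; j++)
        for (int i = 0; i < mb; i++) {
            double sum = 0.0;
            for (int k = 0; k < K; k++)
                sum += A[i + (size_t)k * lda] * B[k + (size_t)j * ldb];
            C[i + (size_t)j * ldc] += sum;
        }
}

/* C += A*B with A M x K, B K x N, C M x N, blocked by NB in M and N.
 * Every block is NB x NB except those on the last row/column panel when
 * M % NB or N % NB is nonzero: those are the cleanup blocks. A tuned
 * library would typically use a specialized kernel for full blocks and
 * separate code for cleanup; here one generic loop nest serves both. */
void blocked_gemm(int M, int N, int K, int NB,
                  const double *A, int lda,
                  const double *B, int ldb,
                  double *C, int ldc)
{
    assert(NB > 0);
    for (int j = 0; j < N; j += NB) {
        int nb = (N - j < NB) ? N - j : NB;      /* < NB only on the last column panel */
        for (int i = 0; i < M; i += NB) {
            int mb = (M - i < NB) ? M - i : NB;  /* < NB only on the last row panel */
            mm_kernel(mb, nb, K, A + i, lda, B + (size_t)j * ldb, ldb,
                      C + i + (size_t)j * ldc, ldc);
        }
    }
}

/* Unblocked reference: C += A*B. */
void naive_gemm(int M, int N, int K,
                const double *A, int lda,
                const double *B, int ldb,
                double *C, int ldc)
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            for (int k = 0; k < K; k++)
                C[i + (size_t)j * ldc] += A[i + (size_t)k * lda] * B[k + (size_t)j * ldb];
}

/* Largest elementwise |x[p] - y[p]| over n entries. */
double max_abs_diff(int n, const double *x, const double *y)
{
    double worst = 0.0;
    for (int p = 0; p < n; p++) {
        double d = x[p] - y[p];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

With the sizes from the runs below (m=57, n=58, K=56) and NB=56, the loop produces one full 56x56 block plus three cleanup blocks: 1x56 along the bottom, 56x2 along the right edge, and a 1x2 corner.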
[camm@torc19 Linux]$ make mmutstcase pre=d kb=56 m=57 n=58 mb=0 nb=0 mmrout=../CASES/ATL_dgemm_SSE_1x1xkb.c
rm -f dmm.[o,c]
./xemit_mm -p d -b 1 -M 0 -N 0 -K 56 -R -3 \
                   -lda 56 -ldb 56 -ldc 0 > dmm.c
pre=d, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=1

cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> dmm.c
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c dmm.c
In file included from dmm.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
make mmtstcase0 pre=d ta=t tb=n muladd=1 lat=4 loopO=JIK M=0 N=0 K=56 mb=0 nb=0 kb=56 mu=4 nu=4 ku=1 lda=56 ldb=56 ldc=0 lda2=56 ldb2=56 ldc2=0 csA=1 csB=1 csC=1 alpha=1 beta=1 moves="-DMoveA -DMoveB" cleanup=0 MCC=/usr/bin/gcc  MMFLAGS="-fomit-frame-pointer -O" mmobjs=dmm.o \
                mmlib="/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm"
make[1]: Entering directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
rm -f dmmtst.o
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -DdREAL -DtranAt -DtranBn \
              -DMULADD=1 -DLAT=4 -DJIK \
              -DMB0=0 -DNB0=0 -DKB0=56 \
              -DMB=0 -DNB=0 -DKB=56 \
              -DKU=1 -DNU=4 -DMU=4 \
              -DLDA=56 -DLDB=56 -DLDC=0 \
              -DcsA=1 -DcsB=1 -DcsC=1 \
              -DALPHA=1 -DBETA=1 -DMoveA -DMoveB \
              -DCLEANUP=0 \
              -o dmmtst.o -c ../mmtst.c
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -o xdmmtst dmmtst.o dmm.o \
                   /scratch/camm/ATLAS/lib/Linux/libatlas.a -lm
/scratch/camm/ATLAS/bin/Linux/ATLrun.sh /scratch/camm/ATLAS/tune/blas/gemm/Linux xdmmtst
PASSED TEST
make[1]: Leaving directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
[camm@torc19 Linux]$ make mmutstcase pre=d kb=56 k=56 m=57 n=58 mb=0 nb=0 mmrout=../CASES/ATL_dgemm_SSE_1x1xkb.c
rm -f dmm.[o,c]
./xemit_mm -p d -b 1 -M 0 -N 0 -K 56 -R -3 \
                   -lda 56 -ldb 56 -ldc 0 > dmm.c
pre=d, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=1

cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> dmm.c
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c dmm.c
In file included from dmm.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
make mmtstcase0 pre=d ta=t tb=n muladd=1 lat=4 loopO=JIK M=0 N=0 K=56 mb=0 nb=0 kb=56 mu=4 nu=4 ku=1 lda=56 ldb=56 ldc=0 lda2=56 ldb2=56 ldc2=0 csA=1 csB=1 csC=1 alpha=1 beta=1 moves="-DMoveA -DMoveB" cleanup=0 MCC=/usr/bin/gcc  MMFLAGS="-fomit-frame-pointer -O" mmobjs=dmm.o \
                mmlib="/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm"
make[1]: Entering directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
rm -f dmmtst.o
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -DdREAL -DtranAt -DtranBn \
              -DMULADD=1 -DLAT=4 -DJIK \
              -DMB0=0 -DNB0=0 -DKB0=56 \
              -DMB=0 -DNB=0 -DKB=56 \
              -DKU=1 -DNU=4 -DMU=4 \
              -DLDA=56 -DLDB=56 -DLDC=0 \
              -DcsA=1 -DcsB=1 -DcsC=1 \
              -DALPHA=1 -DBETA=1 -DMoveA -DMoveB \
              -DCLEANUP=0 \
              -o dmmtst.o -c ../mmtst.c
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -o xdmmtst dmmtst.o dmm.o \
                   /scratch/camm/ATLAS/lib/Linux/libatlas.a -lm
/scratch/camm/ATLAS/bin/Linux/ATLrun.sh /scratch/camm/ATLAS/tune/blas/gemm/Linux xdmmtst
PASSED TEST
make[1]: Leaving directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
[camm@torc19 Linux]$ make mmutstcase pre=z kb=56 k=56 m=57 n=58 mb=0 nb=0 mmrout=../CASES/ATL_dgemm_SSE_1x1xkb.c
rm -f zmm.[o,c]
./xemit_mm -p z -b 1 -M 0 -N 0 -K 56 -R -3 \
                   -lda 56 -ldb 56 -ldc 0 > zmm.c
pre=z, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=1

cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> zmm.c
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c zmm.c
In file included from zmm.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
make mmtstcase0 pre=z ta=t tb=n muladd=1 lat=4 loopO=JIK M=0 N=0 K=56 mb=0 nb=0 kb=56 mu=4 nu=4 ku=1 lda=56 ldb=56 ldc=0 lda2=56 ldb2=56 ldc2=0 csA=1 csB=1 csC=1 alpha=1 beta=1 moves="-DMoveA -DMoveB" cleanup=0 MCC=/usr/bin/gcc  MMFLAGS="-fomit-frame-pointer -O" mmobjs=zmm.o \
                mmlib="/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm"
make[1]: Entering directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
rm -f zmmtst.o
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -DzREAL -DtranAt -DtranBn \
              -DMULADD=1 -DLAT=4 -DJIK \
              -DMB0=0 -DNB0=0 -DKB0=56 \
              -DMB=0 -DNB=0 -DKB=56 \
              -DKU=1 -DNU=4 -DMU=4 \
              -DLDA=56 -DLDB=56 -DLDC=0 \
              -DcsA=1 -DcsB=1 -DcsC=1 \
              -DALPHA=1 -DBETA=1 -DMoveA -DMoveB \
              -DCLEANUP=0 \
              -o zmmtst.o -c ../mmtst.c
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -o xzmmtst zmmtst.o zmm.o \
                   /scratch/camm/ATLAS/lib/Linux/libatlas.a -lm
zmmtst.o: In function `mmtst':
zmmtst.o(.text+0xa6e): undefined reference to `ATL_zJIK0x0x56TN56x56x0_a1_bX'
zmmtst.o(.text+0xadf): undefined reference to `ATL_zJIK0x0x56TN56x56x0_a1_bX'
collect2: ld returned 1 exit status
make[1]: *** [mmtstcase0] Error 1
make[1]: Leaving directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
make: *** [mmutstcase] Error 2
[camm@torc19 Linux]$ make cmmutstcase pre=z kb=56 k=56 m=57 n=58 mb=0 nb=0 mmrout=../CASES/ATL_dgemm_SSE_1x1xkb.c
rm -f zmm.c
make BuildCobjs pre=z mb=0 nb=0 kb=56 \
                lda=56 ldb=56 ldc=0 mmrout=../CASES/ATL_dgemm_SSE_1x1xkb.c
make[1]: Entering directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
rm -f zmm_b[1,0,X].[o,c]
./xemit_mm -p z -b 0 -M 0 -N 0 -K 56 -R -3 \
                   -lda 56 -ldb 56 -ldc 0 > zmm_b0.c
pre=z, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=0

cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> zmm_b0.c
./xemit_mm -p z -b 1 -M 0 -N 0 -K 56 -R -3 \
                   -lda 56 -ldb 56 -ldc 0 > zmm_b1.c
pre=z, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=1

cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> zmm_b1.c
./xemit_mm -p z -b 8 -M 0 -N 0 -K 56 -R -3 \
                   -lda 56 -ldb 56 -ldc 0 > zmm_bX.c
pre=z, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=0, n=0, k=56, lda=56, ldb=56, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=8

cat ../CASES/ATL_dgemm_SSE_1x1xkb.c >> zmm_bX.c
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c zmm_b1.c 
In file included from zmm_b1.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c zmm_b0.c 
In file included from zmm_b0.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O -c zmm_bX.c 
In file included from zmm_bX.c:50:
/scratch/camm/ATLAS/include/contrib/camm_util.h:144:31: warning: nothing can be pasted after this token
make[1]: Leaving directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
make mmtstcase0 pre=z ta=t tb=n muladd=1 lat=4 loopO=JIK M=0 N=0 K=56 mb=0 nb=0 kb=56 mu=4 nu=4 ku=1 lda=56 ldb=56 ldc=0 lda2=56 ldb2=56 ldc2=0 csA=1 csB=1 csC=1 alpha=1 beta=1 moves="-DMoveA -DMoveB" cleanup=0 MCC=/usr/bin/gcc  MMFLAGS="-fomit-frame-pointer -O" csC=2 \
                mmobjs="zmm_b0.o zmm_b1.o zmm_bX.o" \
                mmlib="/scratch/camm/ATLAS/lib/Linux/libatlas.a -lm"
make[1]: Entering directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
rm -f zmmtst.o
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -DzREAL -DtranAt -DtranBn \
              -DMULADD=1 -DLAT=4 -DJIK \
              -DMB0=0 -DNB0=0 -DKB0=56 \
              -DMB=0 -DNB=0 -DKB=56 \
              -DKU=1 -DNU=4 -DMU=4 \
              -DLDA=56 -DLDB=56 -DLDC=0 \
              -DcsA=1 -DcsB=1 -DcsC=2 \
              -DALPHA=1 -DBETA=1 -DMoveA -DMoveB \
              -DCLEANUP=0 \
              -o zmmtst.o -c ../mmtst.c
/usr/bin/gcc  -DL2SIZE=4194304 -I/scratch/camm/ATLAS/include -I/scratch/camm/ATLAS/include/Linux -I/scratch/camm/ATLAS/include/contrib  -DAdd__ -DStringSunStyle -DATL_OS_Linux -DATL_SSE1 -fomit-frame-pointer -O3 -funroll-all-loops -o xzmmtst zmmtst.o zmm_b0.o zmm_b1.o zmm_bX.o \
                   /scratch/camm/ATLAS/lib/Linux/libatlas.a -lm
/scratch/camm/ATLAS/bin/Linux/ATLrun.sh /scratch/camm/ATLAS/tune/blas/gemm/Linux xzmmtst
PASSED TEST
make[1]: Leaving directory `/scratch/camm/ATLAS/tune/blas/gemm/Linux'
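In the session above, `make mmutstcase pre=z` emits only the beta=1 kernel (`xemit_mm ... -b 1`). The link then fails on the undefined symbol `ATL_zJIK0x0x56TN56x56x0_a1_bX`. By contrast, `make cmmutstcase` emits three files (`zmm_b0.c`, `zmm_b1.c` and `zmm_bX.c`, from `-b 0`, `-b 1` and `-b 8`), links all of them, and passes. Below is a minimal sketch of what such beta-specialized variants look like, assuming that the `bX` suffix denotes the general case of any beta other than 0 or 1. One loop nest is instantiated three times by a macro, and the copies differ only in how the product is combined with C. The names `DEFINE_MM`, `mm_b0`, `mm_b1` and `mm_bX` are hypothetical. This is not ATLAS's generated code, and it uses real doubles for brevity, although the failing case in the log is complex (pre=z).

```c
#include <stddef.h>

/* One (hypothetical) loop nest, instantiated per beta case:
 *   BETA_CASE 0 -> C = A*B           (beta == 0, C is never read)
 *   BETA_CASE 1 -> C = A*B + C       (beta == 1)
 *   BETA_CASE 2 -> C = A*B + beta*C  (any other beta)
 * Column-major storage; alpha is fixed at 1, presumably what "_a1" means. */
#define DEFINE_MM(name, BETA_CASE)                                          \
    void name(int M, int N, int K, const double *A, int lda,               \
              const double *B, int ldb, double beta, double *C, int ldc)   \
    {                                                                      \
        (void)beta; /* unused when BETA_CASE is 0 or 1 */                  \
        for (int j = 0; j < N; j++)                                        \
            for (int i = 0; i < M; i++) {                                  \
                double sum = 0.0;                                          \
                for (int k = 0; k < K; k++)                                \
                    sum += A[i + (size_t)k * lda] * B[k + (size_t)j * ldb];\
                double *c = &C[i + (size_t)j * ldc];                       \
                if (BETA_CASE == 0)      *c = sum;                         \
                else if (BETA_CASE == 1) *c += sum;                        \
                else                     *c = beta * *c + sum;             \
            }                                                              \
    }

DEFINE_MM(mm_b0, 0)
DEFINE_MM(mm_b1, 1)
DEFINE_MM(mm_bX, 2)
```

The beta case is a compile-time constant in each instantiation, so the compiler can drop the untaken branches. For beta == 0, the kernel never reads C at all.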


-- 
Camm Maguire			     			camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah