==================================================================
===                                                            ===
===           GENESIS Distributed Memory Benchmarks            ===
===                                                            ===
===                            QCD1                            ===
===                                                            ===
===    Monte-Carlo Simulation of the (3+1)-Dimensional Pure    ===
===                 SU(3) Lattice Gauge Theory                 ===
===                                                            ===
===                    Author: Eckardt Kehl                    ===
===                        PALLAS GmbH                         ===
===                    Hermulheimer Str. 10                    ===
===                    5040 Bruhl, GERMANY                     ===
===      tel.:+49-2232-18960 e-mail:karls@pallas-gmbh.de       ===
===                                                            ===
===                   Copyright: PALLAS GmbH                   ===
===                                                            ===
===            Last update: June 1993; Release: 2.2            ===
===                                                            ===
==================================================================


1. Description
--------------

This benchmark is based on a 'pure gluon' SU(3) lattice gauge theory
simulation, using the Monte-Carlo Metropolis technique. It differs
from the QCD2 benchmark in that it uses the 'quenched' approximation,
which neglects dynamical fermions.

The simulation is defined on a four-dimensional lattice which is a
discrete approximation to continuum space-time. The basic variables
are 3 by 3 complex matrices. Four such matrices are associated with
every lattice site. The lattice update is performed using a multi-hit
Metropolis algorithm.

In the parallel version of the program, the lattice can be distributed
in any one or more of the four lattice directions.


2. Operating Instructions
-------------------------

File I/O :

The distributed version reads an input file, "qcd1.dat", to determine
the required lattice size and number of processors. Further
information on this is given below.

A permanent record of the benchmark run is saved in a file called
"result". This contains information on the lattice size and the
number of processes over which the problem is distributed in each
lattice direction, and some information on the physical solution for
each iteration. The information for each iteration is also written to
standard output on channel 6 to give some idea of how the run is
progressing.

Changing problem size and numbers of processes:
-----------------------------------------------

The problem is based on a 4-dimensional space-time lattice of size:

      N = NS**3 * NT

For the purposes of the benchmark, NS & NT are specified as integer
powers of 2, so that:

      NS = 2**LOGNS ,  NT = 2**LOGNT

In the parallel version of the program the number of processors (NP)
over which the lattice is distributed is determined by the input
parameter LOGP, which is the log to base 2 of the required number of
processors, i.e.

      NP = 2**LOGP

The specified number of processors is configured as a 4D grid
internally within the program:

      NP = NPX * NPY * NPZ * NPT

where NPX, NPY, NPZ & NPT are all powers of two, and
NPT >= NPZ >= NPY >= NPX.

The local lattice size on each processor is then:

      n = (NS/NPX) * (NS/NPY) * (NS/NPZ) * (NT/NPT)

In the sequential version of the program the lattice size is set by
changing the values of LOGNS & LOGNT in PARAMETER statements in the
include file qcd1.inc.

In the parallel version of the program the parameters LOGNS, LOGNT &
LOGP are read from the input data file qcd1.dat.

The maximum numbers of processes in each dimension are specified by
PARAMETER statements in the include file `qcd1h.inc'. If any of these
values (normally 4) is exceeded, the program prints an error message
and terminates. Similarly, the maximum local lattice dimensions are
specified by PARAMETER statements in the include file `qcd1n.inc';
an error is again reported if any of these maximum dimensions is
exceeded.

These maximum values can be changed by altering the PARAMETER
statements, but care must be taken not to exceed the available node
memory as a consequence. The node memory requirement is given very
approximately by the expression:

      Node Memory (Mbyte) = (NXD+2) * (NYD+2) * (NZD+2) * (NTD+2) / 1000.
Where NXD, NYD, NZD, NTD are the maximum local lattice dimensions.

To give a rough feel for the approximate node memory requirement:

      If NTD = 8 & NXD = NYD = NZD = 4, the approximate node memory
      required for arrays is 2.2 Mbyte.

      If NXD = NYD = NZD = NTD = 8, the approximate node memory
      required for arrays is 10 Mbyte.


Suggested Problem Sizes:
------------------------

It is recommended that the benchmark is run for four standard problem
sizes with the input parameters given in the following table:

      Problem Size      LOGNS    LOGNT

       4**3 * 16          2        4
       8**3 * 16          3        4
      16**3 * 16          4        4
      32**3 * 16          5        4


Compiling and Running The Benchmark:
------------------------------------

1) Choose problem size and number of processes. In the sequential
   version this is done by editing PARAMETER statements in the file
   qcd1.inc. In the distributed version the problem size and number
   of processes are set in the input data file qcd1.dat. Upper limits
   for the numbers of processes are set in the include file
   qcd1h.inc. Similarly, the upper limits for the local lattice size
   are set in the file qcd1n.inc. These upper limits may be changed,
   but care should be taken not to exceed the available node memory
   (see above).

2) To compile and link the benchmark type:

      `make'        for the distributed version or
      `make slave'  for the single-node version.

3) If any of the parameters in the include files are changed, the
   code has to be recompiled. The make-file automatically sends only
   the affected files to the compiler. Type:

      make

4) On some systems it may be necessary to allocate the appropriate
   resources before running the benchmark, e.g. on the iPSC/860 to
   reserve a cube of 8 processors, type:

      getcube -t8

5) To run either the sequential or the distributed version of the
   benchmark, type:

      qcd1

   The progress of the benchmark execution can be monitored via the
   standard output, whilst a permanent copy of the benchmark output
   is written to a file called 'result'.

6) If the run is successful and a permanent record is required, the
   file 'result' should be copied to another file before the next run
   overwrites it.


Vectorization:
--------------

The program has been written completely in a vectorizable form. The
vector length equals half the lattice volume. The most important
subroutines for vectorization are: PRO, STAPLE, MERTRO, ADD, GATHER,
SCATTER and ACCEPT.


$Id: ReadMe,v 1.2 1994/04/20 17:19:30 igl Rel igl $
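
As a rough illustration of the sizing rules given under "Changing
problem size and numbers of processes", the arithmetic can be sketched
in a few lines of Python. This is not part of the benchmark and not
taken from its sources. The function names are hypothetical, and the
way the factors of two are dealt out over the four directions (T
first, then Z, Y, X, repeating) is only an assumption that happens to
satisfy NPT >= NPZ >= NPY >= NPX; the program itself may choose its
grid differently.

```python
def lattice_sizes(logns, lognt):
    """Return (NS, NT, N) with NS = 2**LOGNS, NT = 2**LOGNT, N = NS**3 * NT."""
    ns, nt = 2 ** logns, 2 ** lognt
    return ns, nt, ns ** 3 * nt


def processor_grid(logp):
    """Return (NPX, NPY, NPZ, NPT) with product 2**LOGP.

    ASSUMPTION: factors of two are assigned round-robin in the order
    T, Z, Y, X. This guarantees NPT >= NPZ >= NPY >= NPX, but it is
    only one possible grid, not necessarily the one QCD1 builds.
    """
    logs = [0, 0, 0, 0]              # exponents for X, Y, Z, T
    for i in range(logp):
        logs[3 - (i % 4)] += 1       # i = 0 -> T, 1 -> Z, 2 -> Y, 3 -> X
    return tuple(2 ** e for e in logs)


def local_lattice(ns, nt, grid):
    """Return the local extents (NS/NPX, NS/NPY, NS/NPZ, NT/NPT)."""
    npx, npy, npz, npt = grid
    if max(npx, npy, npz) > ns or npt > nt:
        raise ValueError("more processes than lattice sites in a direction")
    return ns // npx, ns // npy, ns // npz, nt // npt


def node_memory_mbyte(nxd, nyd, nzd, ntd):
    """Very approximate node memory in Mbyte, using the README expression."""
    return (nxd + 2) * (nyd + 2) * (nzd + 2) * (ntd + 2) / 1000.0


if __name__ == "__main__":
    # Second suggested problem size (8**3 * 16) on 8 processors.
    ns, nt, n = lattice_sizes(3, 4)
    grid = processor_grid(3)
    local = local_lattice(ns, nt, grid)
    print("N =", n, " grid =", grid, " local lattice =", local)
    print("approx. node memory (Mbyte):", node_memory_mbyte(*local))
```

The memory function takes the maximum local lattice dimensions NXD,
NYD, NZD, NTD from qcd1n.inc; passing the actual local extents, as in
the example call, only gives a lower bound on what the compiled
program reserves.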
Submitted by Mark Papiani,
last updated on 10 Jan 1995.