-------------------------------------------------------------------------------
Name of Program         : APPLU APPSP APPBT
-------------------------------------------------------------------------------
Submitter's Name        : David H. Bailey
Submitter's Organization: NASA Ames Research Center
Submitter's Address     : Mail Stop T27A-1
                          Moffett Field, CA 94035-1000
Submitter's Telephone # : 415-604-4410
Submitter's Fax #       : 415-604-3957
Submitter's Email       : dbailey@nas.nasa.gov
-------------------------------------------------------------------------------
Major Application Field : Computational Fluid Dynamics
Application Subfield(s) : None
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :

These three codes constitute the "CFD application benchmarks" of the NAS
Parallel Benchmark (NPB) suite. Those of us who developed the suite regard
them as the NPB codes that are most important and most relevant to NASA's
applications. All three have been part of the suite since its establishment
as a "paper and pencil" benchmark in 1991. Numerous vendors have submitted
and updated performance reports for these benchmarks.
-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

This code may be freely distributed internationally.
-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

All floating-point data and operations are 64-bit. There is no restriction
on integer sizes -- both 32-bit and 64-bit may be used.
-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum,
R. Fatoohi, S. Fineberg, P. Frederickson, T. Lasinski, R. Schreiber,
H. Simon, V. Venkatakrishnan and S. Weeratunga, The NAS Parallel Benchmarks,
RNR Technical Report RNR-94-007, March 1994.
-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

See previous.
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

V. K. Naik, ``Performance Issues in Implementing NAS Parallel Benchmark
Applications on IBM SP-1'', Research Report, T.J. Watson Research Center,
IBM, (in preparation) 1993.

E. Barszcz, R. Fatoohi, V. Venkatakrishnan, and S. Weeratunga, ``Solution of
Regular Sparse Triangular Linear Systems on Vector and Distributed Memory
Multiprocessors'', Tech Report RNR-93-07, NASA Ames Research Center,
Moffett Field, CA 94035, April 1993.

Rod Fatoohi and Sisira Weeratunga, ``Performance Evaluation of Three
Distributed Computing Environments for Scientific Applications'', to appear,
Proceedings of Supercomputing '94.
-------------------------------------------------------------------------------
Other relevant research papers:

-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :

Three versions of the code are available from us:

1) Sequential (reduced-size problems that will run on a workstation).
2) Parallel (for Intel iPSC/860 or Paragon).
3) Parallel (for CM-2 or CM-5).

At least one other scientist (Sundarem) has implemented the benchmarks in PVM.
-------------------------------------------------------------------------------
Total number of lines in source code:
   APPBT: 4434    APPLU: 3262    APPSP: 3493
Number of lines excluding comments  :
   APPBT: 4258    APPLU: 3104    APPSP: 3305
Size in bytes of source code        :
   APPBT: 148186  APPLU: 96370   APPSP: 92783
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :

APPBT: appbt.A.inp - Class A, 21 lines, 812 bytes
       appbt.B.inp - Class B, 21 lines, 812 bytes
APPLU: applu.A.inp - Class A, 26 lines, 918 bytes
       applu.B.inp - Class B, 26 lines, 919 bytes
APPSP: appsp.A.inp - Class A, 26 lines, 814 bytes
       appsp.B.inp - Class B, 26 lines, 817 bytes
-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if
formatted) :

standard output: formatted text
-------------------------------------------------------------------------------
Brief, high-level description of what application does:

These programs perform fluid dynamics flow simulations. They are stripped of
the complexities associated with real CFD application programs, which allows
a simpler description of the algorithms. However, they do reproduce the
essential computation and data motion characteristics of large-scale,
state-of-the-art CFD codes.
-------------------------------------------------------------------------------
Main algorithms used:

Each employs a different high-level solution scheme. See below.
-------------------------------------------------------------------------------
Skeleton sketch of application:

The three problems are:

LU: A regular-sparse, block (5 x 5) lower and upper triangular system
    solution. This problem represents the computations associated with the
    implicit operator of a newer class of implicit CFD algorithms, typified
    at NASA Ames by the code ``INS3D-LU''. This problem exhibits a somewhat
    limited amount of parallelism compared to the next two.

SP: Solution of multiple, independent systems of non-diagonally dominant,
    scalar, pentadiagonal equations. SP and the following problem BT are
    representative of computations associated with the implicit operators of
    CFD codes such as ``ARC3D'' at NASA Ames. SP and BT are similar in many
    respects, but they differ fundamentally in their
    communication-to-computation ratio.

BT: Solution of multiple, independent systems of non-diagonally dominant,
    block tridiagonal equations with a (5 x 5) block size.
-------------------------------------------------------------------------------
Brief description of I/O behaviour:

Only a brief output at final completion.
-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :

This may be done in any of several ways. On most systems, the main data
arrays are decomposed in all three dimensions.
-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :

See above.
-------------------------------------------------------------------------------
Brief description of load balance behavior :

Since the arrays involved are static, load balance depends only on how
evenly the individual arrays can be divided among processors.
-------------------------------------------------------------------------------
Give parameters that determine the problem size :

See table below.
-------------------------------------------------------------------------------
Give memory as function of problem size :

We do not have a formula for this. See table below.
-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :

We do not have a formula for this. See table below.
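
To make the SP problem from the skeleton sketch more concrete, the following
Python sketch solves one scalar pentadiagonal system by banded Gaussian
elimination without pivoting. It is an illustration only and is not taken
from the APPSP source: the function names, the five-array band layout, and
the choice of elimination without pivoting are all assumptions. Without
pivoting, stability is not guaranteed for every non-diagonally dominant
system.

```python
def apply_pentadiagonal(ll, l, d, u, uu, x):
    """Multiply a pentadiagonal matrix by x (hypothetical helper).

    Row i holds ll[i], l[i], d[i], u[i], uu[i] in columns i-2 .. i+2.
    Entries that would fall outside the matrix are ignored.
    """
    n = len(d)
    y = []
    for i in range(n):
        s = d[i] * x[i]
        if i >= 2:
            s += ll[i] * x[i - 2]
        if i >= 1:
            s += l[i] * x[i - 1]
        if i + 1 < n:
            s += u[i] * x[i + 1]
        if i + 2 < n:
            s += uu[i] * x[i + 2]
        y.append(s)
    return y


def solve_pentadiagonal(ll, l, d, u, uu, rhs):
    """Solve one pentadiagonal system without pivoting (assumed scheme)."""
    n = len(d)
    # Work on copies so the caller's bands are left untouched.
    ll, l, d, u, uu, r = (list(v) for v in (ll, l, d, u, uu, rhs))

    # Forward elimination: clear the two sub-diagonal entries in column i.
    for i in range(n - 1):
        m = l[i + 1] / d[i]
        d[i + 1] -= m * u[i]
        if i + 2 < n:
            u[i + 1] -= m * uu[i]
        r[i + 1] -= m * r[i]
        if i + 2 < n:
            m = ll[i + 2] / d[i]
            l[i + 2] -= m * u[i]
            d[i + 2] -= m * uu[i]
            r[i + 2] -= m * r[i]

    # Back substitution on the remaining upper-triangular band.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = r[i]
        if i + 1 < n:
            s -= u[i] * x[i + 1]
        if i + 2 < n:
            s -= uu[i] * x[i + 2]
        x[i] = s / d[i]
    return x
```

In the benchmark many such systems are independent of one another, so a
parallel implementation could in principle assign different systems to
different processors; how the distributed versions actually do this depends
on the implementation.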
-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :

We do not have a formula for this. It depends on implementation.
-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files, memory
required, and number of floating point operations) :

Class A size problem, with Y-MP/1 statistics:

Benchmark code            Problem   Memory   Time    Rate
                          size      (Mw)     (sec)   (Mflop/s)
LU (LU)                   64^3      30       344     189
Pentadiagonal (SP)        64^3      6        806     175
Block tridiagonal (BT)    64^3      24       923     192

Class B size problem, with C-90/1 statistics:

Benchmark code            Problem   Memory   Time    Rate
                          size      (Mw)     (sec)   (Mflop/s)
LU (LU)                   102^3     122      1973    162
Pentadiagonal (SP)        102^3     22       2160    207
Block tridiagonal (BT)    102^3     96       3554    203
-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :

HPM on Crays
-------------------------------------------------------------------------------
Other relevant information:

-------------------------------------------------------------------------------
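
The form gives no operation-count formula, but the tables above list a run
time and a sustained rate for each code, and their product gives an estimate
of the total floating-point work. The Python sketch below does that
arithmetic and also converts memory to bytes. The names are hypothetical,
and two assumptions are made: that 1 Mw means 10^6 words, and that each word
is 64 bits (8 bytes), in line with the 64-bit floating-point requirement
stated earlier. The results are estimates derived from the tabulated values,
not measured operation counts.

```python
# (class, code) -> (grid points per side, memory in Mw, time in s, Mflop/s),
# copied from the Class A (Y-MP/1) and Class B (C-90/1) tables above.
TABLE = {
    ("A", "LU"): (64, 30, 344, 189),
    ("A", "SP"): (64, 6, 806, 175),
    ("A", "BT"): (64, 24, 923, 192),
    ("B", "LU"): (102, 122, 1973, 162),
    ("B", "SP"): (102, 22, 2160, 207),
    ("B", "BT"): (102, 96, 3554, 203),
}

WORDS_PER_MW = 10**6   # assumption: 1 Mw = 10^6 words
BYTES_PER_WORD = 8     # assumption: 64-bit words


def summarize(table):
    """Derive estimated flop totals and memory in bytes from the table."""
    rows = {}
    for key, (n, mem_mw, time_s, rate_mflops) in table.items():
        flop = time_s * rate_mflops * 10**6   # seconds * flop/s
        rows[key] = {
            "flop": flop,
            "flop_per_point": flop / n**3,    # normalized by grid size
            "bytes": mem_mw * WORDS_PER_MW * BYTES_PER_WORD,
        }
    return rows


if __name__ == "__main__":
    for (cls, code), row in sorted(summarize(TABLE).items()):
        print(cls, code, f"{row['flop']:.3e} flop",
              f"{row['flop_per_point']:.0f} flop/point",
              f"{row['bytes'] / 1e6:.0f} MB")
```

Normalizing by the number of grid points makes the Class A and Class B
figures easier to compare, although the two classes were timed on different
machines and may not run the same number of iterations.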