%A R. Schreiber %T Engineering and Scientific Subroutine Library, Module Design Specification %V 1 %K saxpy %J SAXPY Computer Corporation, 255 San Geronimo Way, Sunnyvale, CA 94086 %D 1986 %L (Schreiber, 1986) %A J.J. Dongarra %A T. Hewitt %L (Dongarra and Hewitt, 1986) %T Implementing Dense Linear Algebra Algorithms Using Multitasking on the CRAY X-MP-4 %J SIAM J. Sci. Stat. Comp. %V 7, 1 %D January, 1986 %P 347-350 %A D. Lawrie %A A. Sameh %T The Computation and Communication Complexity of Parallel Banded System Solvers %J ACM TOMS %D 1985 %A J.J. Dongarra %A A. Sameh %T On Some Parallel Banded System Solvers %J Parallel Computing %V 1, 3 %D Dec. 1984 %P 223-235 %A U. Meier %T A Parallel Partition Method for Solving Banded Systems of Linear Equations %J Parallel Computing %V 2 %D 1985 %P 33-45 %A C. Ashcroft %T Parallel Reduction Methods for the Solution of Banded Systems of Equations %J General Motors Research Laboratories, GMR-5094 %D June 1985 %A L. Shampine %A R. Allen %T Numerical Computing: An Introduction %I W. B. Saunders Company %D 1973 %C Philadelphia %A J.J. Dongarra %A J. Bunch %A C. Moler %A G. Stewart %L (Dongarra \f2et al.\f1, 1976) %T LINPACK Users' Guide %I SIAM Pub. %D 1976 %C Philadelphia %A C. Lawson %A R. Hanson %A D. Kincaid %A F. Krogh %L (Lawson \f2et al.\f1, 1979a) %T Basic Linear Algebra Subprograms for Fortran Usage %K paper %J ACM Transactions on Mathematical Software %D 1979 %V 5 %P 308-323 %A C. Lawson %A R. Hanson %A D. Kincaid %A F. Krogh %L (Lawson \f2et al.\f1, 1979b) %K algorithm %T Algorithm 539: Basic Linear Algebra Subprograms for Fortran Usage %J ACM Transactions on Mathematical Software %D 1979 %V 5 %P 324-325 %A J.J. Dongarra %A F. Gustavson %A A. Karp %L (Dongarra \f2et al.\f1, 1984) %T Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine %J SIAM Review %V 26, 1 %P 91-112 %D Jan. 1984 %A J.J. Dongarra %A D.C.
Sorensen %L (Dongarra and Sorensen, 1986) %T An Aid to Fortran Programming and Debugging %R ANL MCS - TM in preparation %D 1986 %A J.J. Dongarra %A D.C. Sorensen %L (Dongarra and Sorensen, 1986) %T A Portable Environment for Developing Parallel Fortran Programs %J Parallel Computing %K loen %V 5, 1 %P 175-186 %D July 1987 %A J.J. Dongarra %A D.C. Sorensen %L (Dongarra and Sorensen, 1986) %T On Environment for Implementing Explicit Parallel Processing in Fortran %R ANL MCS - TM 79 %D 1986 %A J.J. Dongarra %A S. C. Eisenstat %L (Dongarra and Eisenstat, 1986) %T Squeezing the Most out of an Algorithm in Cray Fortran %J ACM Trans. Math. Software %V 10, 3 %D 1984 %P 221-230 %T Handbook for Automatic Computation: Volume II - Linear Algebra %I Springer-Verlag, New York %D 1971 %A J. Wilkinson %A C. Reinsch %L (Wilkinson and Reinsch, 1971) %A J. Wilkinson %T private communication %D 1976 %L (Wilkinson, 1976) %A D. Evans %A M. Hatzopoulos %L (Evans and Hatzopoulos, 1979) %T The Solution of Certain Banded Systems of Linear Equations using the Folding Algorithm %J Computer Journal %V 19 %D 1976 %P 184-187 %A Bhatt S.N %A Ipsen I.C.F %T How to Embed Trees in Hypercubes %R Yale University, Dept. of Computer Science Report, YALEU/CSD/RR-443 %D December 1985 %A Crowther W %A Goodhue J. %A Starr E %A Thomas R %A Milliken W %A Blackadar T %T Performance Measurements on a 128-node Butterfly Parallel Processor %J Proceedings of the 1985 International Conference on Parallel Processing, IEEE Computer Society %D 1985 %P 531-540 %A Deshpande S.R %A Jenevin R.M %T Scalability of a Binary Tree on a Hypercube %R University of Texas at Austin Report, TR-86-01 %D January 1986 %A George A %A Liu J. W %T Computer Solution of Large Sparse Positive Definite Systems %I Prentice-Hall, Englewood Cliffs, N.J %D 1981 %A George A %T Nested Dissection of a Regular Finite Element Mesh %J SIAM J. on Numer. Anal %V 10 %P 345-363 %D 1973 %A Gottlieb A. %A Grishman R. %A Kruskal C.P. %A McAuliffe K.P.
%A Rudolph L. %A Snir M %T The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer %J IEEE Trans. Computers %V C-32 %D 1983 %P 175-189 %A Ho C.-T. %A Johnsson S.L %T Tree Embeddings and Optimal Routing in Hypercubes %R Yale University, Dept. of Computer Science %D Report in preparation %A Jalby W. %A Meier U %T Optimizing Matrix Operations on a Parallel Multiprocessor with a Memory Hierarchy %R Univ. of Illinois, Center for Supercomputer Research and Development %D 1986 %A Johnsson S.L %T Solving Tridiagonal Systems on Ensemble Architectures %J SIAM J. Sci. Stat. Comp Also available as Yale University Report YALEU/CSD/RR-436, November 1985 %D 1986 %K 436 %A Johnsson S.L %T Solving Narrow Banded Systems on Ensemble Architectures %J ACM TOMS %V 11 %D November 1985 %A Johnsson S.L %T Band Matrix Systems Solvers on Ensemble Architectures %R Yale University Report YALEU/CSD/RR-388 %D 1985 %K 388 %A Johnsson S.L %T Data Permutations and Basic Linear Algebra Computations on Ensemble Architectures %R Yale University, Dept. of Computer Science Report YALEU/CSD/RR-367 %D February, 1985 %K 367 %A Johnsson S.L %T Communication Efficient Basic Linear Algebra Computations on Hypercube Architectures %R Dept. of Computer Science, Yale University Report, YALEU/CSD/RR-361 %D January, 1985 %K 361 %A Johnsson S.L %T Fast Banded Systems Solvers for Ensemble Architectures %R Department of Computer Science, Yale University Report, YALEU/CSD/RR-379 %D March, 1985 %K 379 %A Johnsson S.L %T Dense Matrix Operations on a Torus and a Boolean Cube %J The National Computer Conference, AFIPS %D July, 1985 %K afips %A Johnsson S.L %T Odd-Even Cyclic Reduction on Ensemble Architectures and the Solution of Tridiagonal Systems of Equations %R Dept. of Computer Science, Yale University Report, YALE/CSD/RR-339 %D October, 1984 %K 339 %A McBryan O.A.
%A Van de Velde E.F %T Hypercube Algorithms and Implementations %R Courant Institute of Mathematical Sciences, New York University %D November, 1985 %A Reingold E.M. %A Nievergelt J. %A Deo N %T Combinatorial Algorithms %I Prentice-Hall %D 1977 %A Read R %A Rose D.J %T A Graph-Theoretic Study of the Numerical Solution of Sparse Positive Definite Systems of Linear Equations %T Graph Theory and Computations %I Academic Press %P 183-217 %D 1973 %A Pfister G.F. %A Brantley W.C. %A George D.A. %A Harvey S.L %A Kleinfelder W.J. %A McAuliffe K.P. %A Melton E.A. %A Norton V.A. %A Weiss J %T The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture %L (Pfister \f2et al.\f1, 1985) %J Proceedings of the 1985 International Conference on Parallel Processing, IEEE Computer Society %P 764-771 %D 1985 %A Saad Y. %A Schultz M.H %T Data Communication in Hypercubes %R Dept. of Computer Science, Yale University Report, RR YALEU/DCS/RR-428 %D October, 1985 %A Schwartz J.T %T Ultracomputers %J ACM Trans. on Programming Languages and Systems %V 2 %D 1980 %P 484-521 %A Smith B.J %T Architecture and Applications of the HEP Multiprocessor Computer System %J Real-Time Signal Processing IV, Proc. of SPIE %P 241-248 %D 1981 %T Matrix Eigensystem Routines - EISPACK Guide, Second Edition %P Springer-Verlag, Lecture Notes in Computer Science %V 6 %D 1976 %A B.T. Smith %A J.M. Boyle %A J.J. Dongarra %A B.S. Garbow %A Y. Ikebe %A V. Klema %A C. Moler %T Matrix Eigensystem Routines - EISPACK Guide Extension %P Springer-Verlag, Lecture Notes in Computer Science %V 51 %D 1977 %A B.S. Garbow %A J.M. Boyle %A J.J. Dongarra %A C.B. Moler %T Squeezing the Most Out of High Performance Computers for Finding the Eigenvalues %A J.J. Dongarra %A L. Kaufman %A S. Hammarling %J Linear Algebra and Its Applications %V 77 %D 1986 %P 113-136 %A J.J. Dongarra %A D.C.
Sorensen %K referbug germany %L (Dongarra and Sorensen, 1986) %T Linear Algebra on High-Performance Computers %B Proceedings Parallel Computing 85 %E U. Schendel %I North Holland %P 3-32 %D 1986 %T Performance of Various Computers Using Standard Linear Equations Software in a Fortran Environment %A J.J. Dongarra %K linpack benchmark %L (Dongarra, 1986) %R Argonne National Laboratory MCS-TM-23 %D April, 1987 %A Sorensen D.C %T Buffering for Vector Performance on a Pipelined MIMD Machine %J Parallel Computing %V 1 %D 1984 %P 143-164 %A Johnsson S.L %T "Gaussian Elimination on Sparse Matrices and Concurrency" %R Caltech Computer Science Department, 1980, 4087, TR:80 %K 4087 %A M.W. Gentleman %T Private Communications %D 1985 %A M.W. Gentleman %T Implementing Nested Dissection %R Dept. of Computer Science, Univ. of Waterloo, Research report CS-82-03 %D 1982 %A Wing O. %A Huang J.W %T A Computational Model of Parallel Solution of Linear Equations %J IEEE Trans. Computers %V C-29 %D 1980 %P 632-638 %A Liu J.W.H %T Computational Models and Task Scheduling for Parallel Sparse Cholesky Factorization %R Dept. of Computer Science, York University, Downsview, Ontario, Technical report CS-85-01 %D 1985 %A George A. %A Heath M. %A Liu J. %A Ng E %T Sparse Cholesky Factorization on a Local-Memory Multiprocessor %R Department of Computer Science, York University, Downsview, Ontario, Technical report CS-86-01 %D 1986 %A Worley Patrick H. %A Schreiber Robert %T Nested dissection on a Mesh-Connected Processor Array %R Stanford University, Cent. Large Scale Sci. Computation, CLaSSiC-85-08 %D 1985 %A Flynn M.J %T Very High-Speed Computing Systems %J Proc. of the IEEE %P 1901-1909 %V 12 %D 1966 %A Seitz C.L %T The Cosmic Cube %J Communications of the ACM %V 28, 1 %D 1985 %P 22-33 %A Hillis W.D %B The Connection Machine %I MIT Press %D 1985 %A R.G. Babb %T Parallel Processing with Large Grain Data Flow Techniques %J IEEE Computer %V 17, 7 %P 55-61 %D July 1984 %A J.C. 
Browne %T Framework for Formulation and Analysis of Parallel Computation Structures %J Parallel Computing %V 3 %P 1-9 %D 1986 %A %T CRAY 2 Multitasking Users Guide %I Cray Research Inc %C Minn, MN %D 1986 %A R. Chin %A G. Hedstrom %A F. Howes %A J. McGraw %T Parallel Computation of Multiple-Scale Problems %B New Computing Environments: Parallel, Vector, and Systolic %E Ed. A. Wouk %I SIAM Pub. %C Philadelphia %D 1986 %P 134-151 %A J.C. Diaz %T Calculating the Block Preconditioner on Parallel Multivector Processors %J Proceedings of the Workshop on Applied Computing in The Energy Field %C Stillwater, Oklahoma, %D October 10, 1986 %A J.J. Dongarra %A I.S. Duff %L (Dongarra and Duff, 1987) %T Advanced Architecture Computers %R Argonne National Laboratory Report, ANL-MCS-TM-57 (Revision 1) %D January, 1987 %A J.J. Dongarra %A D.C. Sorensen %T A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem %J SIAM SISSC %V 8, 2 %D March, 1987 %A H. Jordan %T HEP Architecture, Programming and Performance %B Parallel MIMD Computation: HEP Supercomputer and Its Applications %E Ed. J. Kowalik %I MIT Press %D 1985 %A E. Lusk %A R. Overbeek %T Implementation of Monitors with Macros: A Programming Aid for the HEP and Other Parallel Processors %R Argonne National Laboratory Report, ANL-83-97 %D 1983 %A J.R. McGraw, et al., %T SISAL: Streams and Iteration in a Single Assignment Language %R Language Reference Manual, Ver. 1.2 %C Lawrence Livermore National Laboratory M-146 %A John VanRosendale %A Piyush Mehrotra %T The BLAZE Language: A Parallel Language for Scientific Programming %R ICASE Report # 85-29 %D 1985 %A Alliant Computer Systems Corp %T Alliant FX/Fortran Programmer's Handbook %C Acton Mass. %D 1985 %A J.J.M. Cuppen %T A Divide and Conquer Method for the Symmetric Tridiagonal Eigenproblem %J Numerische Mathematik %V 36 %P 177-195 %D 1981 %A S. Comer %T Private Communication %D 1986 %A Mark Guzzi %T Guy at U of I %A S. Lo %A B. Philippe %A A.
Sameh %T A Parallel Algorithm for the Real Symmetric Tridiagonal Eigenvalue Problem %R Center for Supercomputing Research and Development Report, University of Illinois, Urbana Illinois %D Nov. 1985 %A J. Francis %T The QR Transformation, Parts I and II %J Computer Journal %V 4 %P 265-271, 332-345 %A J.J. Dongarra %A J. DuCroz %A S. Hammarling %A R. Hanson %L (Dongarra \f2et al.\f1, 1986a) %T An Extended Set of Fortran Basic Linear Algebra Subprograms %K paper %R Argonne National Laboratory Report, ANL-MCS-TM-41 (Revision 3) %D November 1986 %A J.J. Dongarra %A J. DuCroz %A S. Hammarling %A R. Hanson %L (Dongarra \f2et al.\f1, 1986b) %T An Extended Set of Basic Linear Algebra Subprograms: Model Implementation and Test Programs %K algorithm %R Argonne National Laboratory Report, ANL-MCS-TM-81 %D November, 1986 %A J.J. Dongarra %A D. Sorensen %T SCHEDULE: Tools for Developing and Analyzing Parallel Fortran Programs %R Argonne National Laboratory Report, ANL-MCS-TM-86 %D November 1986 %A J.J. Dongarra %A L. Johnsson %T Solving Banded Systems on a Parallel Processor %J Parallel Computing %V 5, 1 %P 219-246 %D July 1987 %A C. Bischof %A C. Van Loan %L (Bischof and Van Loan, 1987) %T The WY Representation for Products of Householder Matrices %J SIAM SISSC %V 8, 2 %D March, 1987 %A J. DuCroz %A S. Nugent %A J. Reid %A D. Taylor %L (DuCroz \f2et al.\f1, 1981) %T Solving Large Full Sets of Linear Equations in a Paged Virtual Store %J TOMS %V 7,4 %D 1981 %P 527-536 %A D. Dodson %A J. Lewis %L (Dodson and Lewis, 1986) %T Issues relating to extension of the Basic Linear Algebra Subprograms %J ACM SIGNUM Newsletter %V 20,1 %D 1985 %P 2-18 %A D. Dodson %A J. Lewis %L (Dodson and Lewis, 1986) %T A Proposal for Sparse BLAS %J ACM SIGNUM Newsletter %V 20,1 %D 1985 %P ?? %A J.J. Dongarra %A E. 
Grosse %T Distribution of Mathematical Software via Electronic Mail %R Argonne National Laboratory Report, ANL-MCS-TM-48, (to appear in CACM) %D March 1985 %A IBM %L (IBM, 1986) %T Engineering and Scientific Subroutine Library %V Program Number: 5668-863 %J IBM %D 1986 %A M. Berry %A K. Gallivan %A W. Harrod %A W. Jalby %A S. Lo %A U. Meier %A B. Philippe %A A. Sameh %L (Berry \f2et al.\f1, 1986) %T Parallel Algorithms on the CEDAR System %J CSRD Report No. 581 %D 1986 %A I. Bucher %A T. Jordan %L (Bucher and Jordan, 1984) %T Linear Algebra Programs for use on a Vector Computer with a Secondary Solid State Storage Device %B Advances in Computer Methods for Partial Differential Equations %E R. Vichnevetsky and R Stepleman %I IMACS %P 546-550 %D 1984 %A K. Fong %A T. L. Jordan %T Some Linear Algebra Algorithms and Their Performance on CRAY-1 %R Los Alamos Scientific Laboratory, UC-32 %D June 1977 %A B. Chartres %L (Chartres, 1960) %T Adaption of the Jacobi and Givens Methods for a Computer with Magnetic Tape Backup Store %J University of Sydney Technical Report No. 8 %D 1960 %A A.C. McKellar %A E.G. Coffman Jr. %L (McKellar and Coffman, 1969) %T Organizing Matrices and Matrix Operations for Paged Memory Systems %J CACM %V 12,3 %D 1969 %P 153-165 %A C. Moler %T Matrix Computations with Fortran and Paging %J CACM %V 15,4 %D 1972 %P 268-270 %A D.W. Barron %A H.P.F. Swinnerton-Dyer %L (Barron and Swinnerton-Dyer, 1960) %T Solution of Simultaneous Linear Equations Using a Magnetic-Tape Store %J Computer J. %V 3 %D 1960 %P 28-33 %A J.J. Dongarra %A A. Hinds %T Unrolling Loops in Fortran %J Software-Practice and Experience %V 9 %P 219-226 %D 1979 %A D. Pager %T Some Notes on Speeding Up Certain Loops by Software, Firmware, and Hardware Means %J IEEE Trans. on Comp. %P 97-100 %D January 1972 %A D. Knuth %T An Empirical Study of Fortran Programs %J Software-Practice and Experience %V 1 %P 105-133 %D 1971 %A D.A.
Calahan %L (Calahan, 1986) %T Block-Oriented Local-Memory-Based Linear Equation Solution on the CRAY-2: Uniprocessor Algorithms %J Proceedings International Conference on Parallel Processing %D August 1986 %I IEEE Computer Society Press %P 375-378 %A A.K. Dave %A I.S. Duff %L (Dave and Duff, 1986) %T Sparse Matrix Calculations on the CRAY-2 %R AERE Harwell Report CSS 197 (to appear Parallel Computing) %D 1986 %A I.S. Duff %T Full Matrix Techniques in Sparse Gaussian Elimination %L (Duff, 1981) %J Numerical Analysis Proceedings, Dundee 1981, Lecture Notes in Mathematics 912 %E G.A. Watson %I Springer-Verlag %P 71-84 %C Berlin %D 1981 %A A. George %A H. Rashwan %L (George and Rashwan, 1985) %T Auxiliary Storage Methods for Solving Finite Element Systems %J SIAM SISSC %V 6 %P 882-910 %D 1985 %A K. Gallivan %A W. Jalby %A U. Meier %A A. Sameh %T The Impact of Hierarchical Memory Systems on Linear Algebra Algorithm Design %R CSRD Report No. 625 %D 1987 %A Y. Robert %A P. Sguazzero %L (Robert and Sguazzero, 1987) %T The LU Decomposition Algorithm and Its Efficient Fortran Implementation on the IBM 3090 Vector Multiprocessor %R IBM ECSEC Report ICE-0006 %D March 1987 %A Walid Abu-Sufah %A Allen D. Malony %T Vector Processing on the Alliant FX/8 Multiprocessor %J Proc. of the 1986 Int'l. Conf. on Parallel Processing, St. Charles, IL %C St. Charles, IL %P 559-566 %D Aug. 19-22, 1986 %E K. Howard, S. Jacobs, E. Swartzlander %L (Abu-Sufah and Malony, 1986) %A Frank H. McMahon %T The Livermore Fortran Kernels: A Computer Test of Numerical Performance Range %R Lawrence Livermore National Laboratory UCRL-53745 %D October 1986 %L (McMahon, 1986) %A Jack Worlton %T Understanding Supercomputer Benchmarks %J Datamation %D September 1,1984 %P 121-130 %L (Worlton, 1984) %A Phillip Ein-Dor %T Grosch's Law Re-Revisited: CPU Power and the Cost of Computation %L (Ein-Dor, 1985) %J CACM %D February 1985 %P 142-151 %A H.A. 
Grosch %T High Speed Arithmetic: The Digital Computer as a Research Tool %J J. Opt. Soc. Am %V 43, 4 %D April 1975 %P 24 %L (Grosch, 1987) %T An Agenda for Improved Evaluation of Supercomputer Performance %R National Research Council %D 1986 %T Performance Measurement %P IEEE Subcommittee on Supercomputers %D November 1986 %L (IEEE Subcomm., 1986) %A J.L. Martin %A A.L. Dana %A T. Warnock %T Tools for Measuring Software Performance on Vector Architectures %D November 1983 %C San Francisco, CA %P Symp. on Applications and Assessment of Automated tools for Software Development %L (Martin \f2et al.\f1, 1983) %A Dieter Mueller-Wichards %T Performance Estimates for Applications: An Algebraic Framework %R IBM Research RC# 12391 %D December 15, 1986 %L (Mueller-Wichards, 1986) %A R. Brice %T Benchmarking Your Benchmark: A Users Perspective %V 4,2 %D June 1983 %P 73-79 %J Measurement Technology %L (Brice, 1983) %A G. Amdahl %J AFIPS Conf. Proc. %T The Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities %D 1967 %V 30 %P 483-485 %L (Amdahl, 1983) %A J. Worlton %J Private Communication %D 1986 %K private %L (Worlton, 1986) %A J.L. Martin %A D. Mueller-Wichards %T Supercomputer Performance Evaluation: Status and Direction %J J of Supercomputing %V 1 %D May 1987 %L (Martin and Mueller-Wichards, 1987) %A K. Jordan %T Performance Comparison of Large-Scale Scientific Computers: Scalar Mainframes, Mainframes with Integrated Vector Facilities, and Supercomputers %D March 1987 %P 10-23 %J Computer %L (Jordan, 1987) %A W.F. Ballhaus, Jr. %T Supercomputers in Aerodynamics %J Frontiers of Supercomputing %C Berkeley, CA %P 195-216 %D 1986 %E N. Metropolis, D.H. Sharp, W.J. Worlton, K.R. Ames %L (Ballhaus, 1986) %A R.C. Maydew %T Sandia National Laboratories internal memorandum %D October, 1986 %L (Maydew, 1986) %A R.W. Hockney %A C.R. Jesshope %T Parallel Computers %P Adam Hilger Ltd, Bristol %D 1981 %A J.J. Dongarra %A J. DuCroz %A I. Duff %A S.
Hammarling %L (Dongarra \f2et al.\f1, 1987) %T A Proposal for a Set of Level 3 Basic Linear Algebra Subprograms %R Argonne National Laboratory Report, ANL-MCS-TM-88 %D April 1987 %A I. Duff %T Private Communications %D 1987 %A R. Maestro %T Private Communications %D 1987 %A Brian T. Smith %T Private Communications %D 1987 %A David Snelling %T Private Communications %D 1987 %A Manolis A Vavalis %K purdue %T Private Communications %D 1987 %A Adam Beguelin %T Private Communications %D 1987