# To unbundle, sh this file echo 1 1>&2 cat >1 <<'End of 1' Date: Mon, 17 Nov 86 11:33:50 est From: David Krowitz To: dongarra@anl-mcs Subject: here come the benchmarks I ported to the FX/1 Status: RO In the next several messages you should receive the following files containing the sources for the benchmarks from the netlib library which I've ported onto the FX/1 along with command files to compile and link the programs. As I mentioned earlier, you may want to replace the current version of the livermore loops in the benchmark directory of netlib my my versions as the current versions won't compile on either Apollos or Alliants due to some sloppy code. You should be getting the following files: sys type blocks current type uid used length attr rights name file uasc 1 95 P pgndwrx backup_history file uasc 1 461 P pgndwrx cwhetstoned.bld file uasc 5 4373 P pgndwrx cwhetstoned.c file uasc 1 461 P pgndwrx cwhetstones.bld file uasc 5 4370 P pgndwrx cwhetstones.c file uasc 1 397 P pgndwrx dhrystone.bld file uasc 17 16679 P pgndwrx dhrystone.c file uasc 1 242 P pgndwrx linpackd.bld file uasc 21 20733 P pgndwrx linpackd.f file uasc 1 242 P pgndwrx linpacks.bld file uasc 20 20359 P pgndwrx linpacks.f file uasc 1 258 P pgndwrx livermored.bld file uasc 87 87568 P pgndwrx livermored.f file uasc 1 258 P pgndwrx livermores.bld file uasc 87 87564 P pgndwrx livermores.f file uasc 1 247 P pgndwrx whetstoned.bld file uasc 7 7031 P pgndwrx whetstoned.f file uasc 1 248 P pgndwrx whetstones.bld file uasc 7 7030 P pgndwrx whetstones.f 19 entries, 266 blocks used. (the listing above was done on the Apollo's ... the file lengths include a 32-byte file header which is peculiar to the Apollos) Let me know if I need to resend any of the files. -- David Krowitz End of 1 echo 10 1>&2 cat >10 <<'End of 10' Date: Mon, 17 Nov 86 11:46:49 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file livermores.bld for FX/1 library Status: RO ### ### Command file to compile and bind the single precision version of ### the Livermore loops benchmark for the DSP9010 (Alliant FX/1). ### rm livermores fortran -o livermores -Ogv -AS livermores.f rm livermores.o End of 10 echo 11 1>&2 cat >11 <<'End of 11' Date: Mon, 17 Nov 86 11:45:38 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file livermored.bld for FX/1 library Status: RO ### ### Command file to compile and bind the double precision version of ### the Livermore loops benchmark for the DSP9010 (Alliant FX/1). ### rm livermored fortran -o livermored -Ogv -AS livermored.f rm livermored.o End of 11 echo 12 1>&2 cat >12 <<'End of 12' Date: Mon, 17 Nov 86 11:40:42 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file dhrystone.c for FX/1 library Status: RO /* * "DHRYSTONE" Benchmark Program * * Version: C/1 * Date: 12/01/84, RESULTS updated 10/22/85 * Author: Reinhold P. Weicker, CACM Vol 27, No 10, 10/84 pg. 1013 * Translated from ADA by Rick Richardson * Every method to preserve ADA-likeness has been used, * at the expense of C-ness. * Compile: cc -O dry.c -o drynr : No registers * cc -O -DREG=register dry.c -o dryr : Registers * Defines: Defines are provided for old C compiler's * which don't have enums, and can't assign structures. * The time(2) function is library dependant; One is * provided for CI-C86. Your compiler may be different. * The LOOPS define is initially set for 50000 loops. * If you have a machine with large integers and is * very fast, please change this number to 500000 to * get better accuracy. Please select the way to * measure the execution time using the TIME define. * For single user machines, time(2) is adequate. For * multi-user machines where you cannot get single-user * access, use the times(2) function. If you have * neither, use a stopwatch in the dead of night. * Use a "printf" at the point marked "start timer" * to begin your timings. DO NOT use the UNIX "time(1)" * command, as this will measure the total time to * run this program, which will (erroneously) include * the time to malloc(3) storage and to compute the * time it takes to do nothing. * Run: drynr; dryr * * Results: If you get any new machine/OS results, please send to: * {ihnp4,vax135,..}!houxm!vaximile!rer * and thanks to all that do. Space prevents listing * the names of those who have provided some of these * results. * Note: I order the list in increasing performance of the * "with registers" benchmark. If the compiler doesn't * provide register variables, then the benchmark * is the same for both REG and NOREG. I'm not going * to list a compiler in a better place because if it * had register variables it might do better. No * register variables is a big loss in my book. * PLEASE: Send complete information about the machine type, * clock speed, OS and C manufacturer/version. If * the machine is modified, tell me what was done. * Otherwise, I won't include it in this list. My * favorite flame on this subject was a machine that * was listed as an IBM PC/XT 8086-9.54Mhz. That must * have been some kind of co-processor board that ran * the benchmark, not the XT. Tell me what it was! * * * MACHINE MICROPROCESSOR OPERATING COMPILER DHRYSTONES/SEC. * TYPE SYSTEM NO REG REGS * -------------------------- ------------ ----------- --------------- * IBM PC/XT 8088-4.77Mhz PC/IX cc 257 287 * Cosmos 68000-8Mhz UniSoft cc 305 322 * IBM PC/XT 8088-4.77Mhz VENIX/86 2.0 cc 297 324 * IBM PC 8088-4.77Mhz MSDOS 2.0 b16cc 2.0 310 340 * IBM PC 8088-4.77Mhz MSDOS 2.0 CI-C86 2.20M 390 390 * IBM PC/XT 8088-4.77Mhz PCDOS 2.1 Lattice 2.15 403 - @ * PDP-11/34 - UNIX V7M cc 387 438 * Onyx C8002 Z8000-4Mhz IS/1 1.1 (V7) cc 476 511 * ATT PC6300 8086-8Mhz MSDOS 2.11 b16cc 2.0 632 684 * IBM PC/AT 80286-6Mhz PCDOS 3.0 CI-C86 2.1 666 684 * Macintosh 68000-7.8Mhz 2M Mac Rom Mac C 32 bit int 694 704 * Macintosh 68000-7.7Mhz - MegaMax C 2.0 661 709 * NEC PC9801F 8086-8Mhz PCDOS 2.11 Lattice 2.15 768 - @ * ATT PC6300 8086-8Mhz MSDOS 2.11 CI-C86 2.20M 769 769 * ATT 3B2/300 WE32000-?Mhz UNIX 5.0.2 cc 735 806 * IBM PC/AT 80286-6Mhz PCDOS 3.0 MS 3.0(large) 833 847 LM * VAX 11/750 - Unix 4.2bsd cc 862 877 * Fast Mac 68000-7.7Mhz - MegaMax C 2.0 839 904 + * Macintosh 68000-7.8Mhz 2M Mac Rom Mac C 16 bit int 877 909 S * IRIS-1400 68010-10Mhz Unix System V cc 909 1000 * IBM PC/AT 80286-6Mhz VENIX/86 2.1 cc 961 1000 * IBM PC/AT 80286-6Mhz PCDOS 3.0 b16cc 2.0 943 1063 * IBM PC/AT 80286-6Mhz PCDOS 3.0 MS 3.0(small) 1063 1086 * VAX 11/750 - VMS VAX-11 C 2.0 958 1091 * ATT PC7300 68010-10Mhz UNIX 5.2 cc 1041 1111 * ATT PC6300+ 80286-6Mhz MSDOS 3.1 b16cc 2.0 1111 1219 * Sun2/120 68010-10Mhz Sun 4.2BSD cc 1136 1219 * IBM PC/AT 80286-6Mhz PCDOS 3.0 CI-C86 2.20M 1219 1219 * MASSCOMP 500 68010-10MHz RTU V3.0 cc (V3.2) 1156 1238 * Cyb DataMate 68010-12.5Mhz Uniplus 5.0 Unisoft cc 1162 1250 * PDP 11/70 - UNIX 5.2 cc 1162 1250 * IBM PC/AT 80286-6Mhz PCDOS 3.1 Lattice 2.15 1250 - @ * IBM PC/AT 80286-7.5Mhz VENIX/86 2.1 cc 1190 1315 * * Sun2/120 68010-10Mhz Standalone cc 1219 1315 * ATT 3B2/400 WE32100-?Mhz UNIX 5.2 cc 1315 1315 * HP-110 8086-5.33Mhz MSDOS 2.11 Aztec-C 1282 1351 ? * IBM PC/AT 80286-6Mhz ? ? 1250 1388 ? * ATT PC6300+ 80286-6Mhz MSDOS 3.1 CI-C86 2.20M 1428 1428 * Cyb DataMate 68010-12.5Mhz Uniplus 5.0 Unisoft cc 1470 1562 S * VAX 11/780 - UNIX 5.2 cc 1515 1562 * MicroVAX-II - - - 1562 1612 * ATT 3B20 - UNIX 5.2 cc 1515 1724 * HP9000-500 B series CPU HP-UX 4.02 cc 1724 - * IBM PC/STD 80286-8Mhz ? ? 1724 1785 * Gould PN6005 - UTX 1.1(4.2BSD) cc 1675 1964 * VAX 11/785 - UNIX 5.2 cc 2083 2083 * VAX 11/785 - VMS VAX-11 C 2.0 2083 2083 * Pyramid 90x - OSx 2.3 cc 2272 2272 * Pyramid 90x - OSx 2.5 cc 3125 3125 * SUN 3/75 68020-16.67Mhz SUN 4.2 V3 cc 3333 3571 * Sun 3/180 68020-16.67Mhz Sun 4.2 cc 3333 3846 * MC 5400 68020-16.67MHz RTU V3.0 cc (V4.0) 3952 4054 * SUN-3/160C 68020-16.67Mhz Sun3.0ALPHA1 Un*x 3333 4166 * Gould PN9080 - UTX-32 1.1c cc - 4629 * MC 5600/5700 68020-16.67MHz RTU V3.0 cc (V4.0) 4504 4746 % * VAX 8600 - VMS VAX-11 C 2.0 7142 7142 * Amdahl 470 V/8 ? ? - 15015 * Amdahl 580 - UTS 5.0 Rel 1.2 cc Ver. 1.5 23076 23076 * Amdahl 5860 ? ? - 28355 * * * 15Mhz crystal substituted for original 12Mhz; * + This Macintosh was upgraded from 128K to 512K in such a way that * the new 384K of memory is not slowed down by video generator accesses. * % Single processor; MC == MASSCOMP * & Seattle Telecom STD-286 board * @ vanilla Lattice compiler used with MicroPro standard library * S Shorts used instead of ints * LM Large Memory Model. (Otherwise, all 80x8x results are small model) * ? I don't trust results marked with '?'. These were sent to me with * either incomplete information, or with times that just don't make sense. * If anybody can confirm these figures, please respond. * ************************************************************************** * * The following program contains statements of a high-level programming * language (C) in a distribution considered representative: * * assignments 53% * control statements 32% * procedure, function calls 15% * * 100 statements are dynamically executed. The program is balanced with * respect to the three aspects: * - statement type * - operand type (for simple data types) * - operand access * operand global, local, parameter, or constant. * * The combination of these three aspects is balanced only approximately. * * The program does not compute anything meaningfull, but it is * syntactically and semantically correct. * */ /* Accuracy of timings and human fatigue controlled by next two lines */ #define LOOPS 50000 /* Use this for slow or 16 bit machines */ /*#define LOOPS 500000 /* Use this for faster machines */ /* Compiler dependent options */ #undef NOENUM /* Define if compiler has no enum's */ #undef NOSTRUCTASSIGN /* Define if compiler can't assign structures */ /* define only one of the next three defines */ /*#define NOTIME /* Define if no time() function in library */ #define TIMES /* Use times(2) time function */ /*#define TIME /* Use time(2) time function */ /* define the granularity of your times(2) function (when used) */ #define HZ 60 /* times(2) returns 1/60 second (most) */ /*#define HZ 100 /* times(2) returns 1/100 second (WECo) */ #ifdef NOSTRUCTASSIGN #define structassign(d, s) memcpy(&(d), &(s), sizeof(d)) #else #define structassign(d, s) d = s #endif #ifdef NOENUM #define Ident1 1 #define Ident2 2 #define Ident3 3 #define Ident4 4 #define Ident5 5 typedef int Enumeration; #else typedef enum {Ident1, Ident2, Ident3, Ident4, Ident5} Enumeration; #endif typedef int OneToThirty; typedef int OneToFifty; typedef char CapitalLetter; typedef char String30[31]; typedef int Array1Dim[51]; typedef int Array2Dim[51][51]; struct Record { struct Record *PtrComp; Enumeration Discr; Enumeration EnumComp; OneToFifty IntComp; String30 StringComp; }; typedef struct Record RecordType; typedef RecordType * RecordPtr; typedef int boolean; #define NULL 0 #define TRUE 1 #define FALSE 0 #ifndef REG #define REG #endif extern Enumeration Func1(); extern boolean Func2(); #ifdef TIMES #include #include #endif main() { Proc0(); } /* * Package 1 */ int IntGlob; boolean BoolGlob; char Char1Glob; char Char2Glob; Array1Dim Array1Glob; Array2Dim Array2Glob; RecordPtr PtrGlb; RecordPtr PtrGlbNext; Proc0() { OneToFifty IntLoc1; REG OneToFifty IntLoc2; OneToFifty IntLoc3; REG char CharLoc; REG char CharIndex; Enumeration EnumLoc; String30 String1Loc; String30 String2Loc; #ifdef TIME long time(); long starttime; long benchtime; long nulltime; register unsigned int i; starttime = time(0); for (i = 0; i < LOOPS; ++i); nulltime = time(0) - starttime; /* Computes overhead of looping */ #endif #ifdef TIMES time_t starttime; time_t benchtime; time_t nulltime; struct tms tms; register unsigned int i; times(&tms); starttime = tms.tms_utime; for (i = 0; i < LOOPS; ++i); times(&tms); nulltime = tms.tms_utime - starttime; /* Computes overhead of looping */ #endif PtrGlbNext = (RecordPtr) malloc(sizeof(RecordType)); PtrGlb = (RecordPtr) malloc(sizeof(RecordType)); PtrGlb->PtrComp = PtrGlbNext; PtrGlb->Discr = Ident1; PtrGlb->EnumComp = Ident3; PtrGlb->IntComp = 40; strcpy(PtrGlb->StringComp, "DHRYSTONE PROGRAM, SOME STRING"); /***************** -- Start Timer -- *****************/ #ifdef TIME starttime = time(0); #endif #ifdef TIMES times(&tms); starttime = tms.tms_utime; #endif for (i = 0; i < LOOPS; ++i) { Proc5(); Proc4(); IntLoc1 = 2; IntLoc2 = 3; strcpy(String2Loc, "DHRYSTONE PROGRAM, 2'ND STRING"); EnumLoc = Ident2; BoolGlob = ! Func2(String1Loc, String2Loc); while (IntLoc1 < IntLoc2) { IntLoc3 = 5 * IntLoc1 - IntLoc2; Proc7(IntLoc1, IntLoc2, &IntLoc3); ++IntLoc1; } Proc8(Array1Glob, Array2Glob, IntLoc1, IntLoc3); Proc1(PtrGlb); for (CharIndex = 'A'; CharIndex <= Char2Glob; ++CharIndex) if (EnumLoc == Func1(CharIndex, 'C')) Proc6(Ident1, &EnumLoc); IntLoc3 = IntLoc2 * IntLoc1; IntLoc2 = IntLoc3 / IntLoc1; IntLoc2 = 7 * (IntLoc3 - IntLoc2) - IntLoc1; Proc2(&IntLoc1); /***************** -- Stop Timer -- *****************/ } #ifdef TIME benchtime = time(0) - starttime - nulltime; printf("Dhrystone time for %ld passes = %ld\n", (long) LOOPS, benchtime); printf("This machine benchmarks at %ld dhrystones/second\n", ((long) LOOPS) / benchtime); #endif #ifdef TIMES times(&tms); benchtime = tms.tms_utime - starttime - nulltime; printf("Dhrystone time for %ld passes = %ld\n", (long) LOOPS, benchtime/HZ); printf("This machine benchmarks at %ld dhrystones/second\n", ((long) LOOPS) * HZ / benchtime); #endif } Proc1(PtrParIn) REG RecordPtr PtrParIn; { #define NextRecord (*(PtrParIn->PtrComp)) structassign(NextRecord, *PtrGlb); PtrParIn->IntComp = 5; NextRecord.IntComp = PtrParIn->IntComp; NextRecord.PtrComp = PtrParIn->PtrComp; Proc3(NextRecord.PtrComp); if (NextRecord.Discr == Ident1) { NextRecord.IntComp = 6; Proc6(PtrParIn->EnumComp, &NextRecord.EnumComp); NextRecord.PtrComp = PtrGlb->PtrComp; Proc7(NextRecord.IntComp, 10, &NextRecord.IntComp); } else structassign(*PtrParIn, NextRecord); #undef NextRecord } Proc2(IntParIO) OneToFifty *IntParIO; { REG OneToFifty IntLoc; REG Enumeration EnumLoc; IntLoc = *IntParIO + 10; for(;;) { if (Char1Glob == 'A') { --IntLoc; *IntParIO = IntLoc - IntGlob; EnumLoc = Ident1; } if (EnumLoc == Ident1) break; } } Proc3(PtrParOut) RecordPtr *PtrParOut; { if (PtrGlb != NULL) *PtrParOut = PtrGlb->PtrComp; else IntGlob = 100; Proc7(10, IntGlob, &PtrGlb->IntComp); } Proc4() { REG boolean BoolLoc; BoolLoc = Char1Glob == 'A'; BoolLoc |= BoolGlob; Char2Glob = 'B'; } Proc5() { Char1Glob = 'A'; BoolGlob = FALSE; } extern boolean Func3(); Proc6(EnumParIn, EnumParOut) REG Enumeration EnumParIn; REG Enumeration *EnumParOut; { *EnumParOut = EnumParIn; if (! Func3(EnumParIn) ) *EnumParOut = Ident4; switch (EnumParIn) { case Ident1: *EnumParOut = Ident1; break; case Ident2: if (IntGlob > 100) *EnumParOut = Ident1; else *EnumParOut = Ident4; break; case Ident3: *EnumParOut = Ident2; break; case Ident4: break; case Ident5: *EnumParOut = Ident3; } } Proc7(IntParI1, IntParI2, IntParOut) OneToFifty IntParI1; OneToFifty IntParI2; OneToFifty *IntParOut; { REG OneToFifty IntLoc; IntLoc = IntParI1 + 2; *IntParOut = IntParI2 + IntLoc; } Proc8(Array1Par, Array2Par, IntParI1, IntParI2) Array1Dim Array1Par; Array2Dim Array2Par; OneToFifty IntParI1; OneToFifty IntParI2; { REG OneToFifty IntLoc; REG OneToFifty IntIndex; IntLoc = IntParI1 + 5; Array1Par[IntLoc] = IntParI2; Array1Par[IntLoc+1] = Array1Par[IntLoc]; Array1Par[IntLoc+30] = IntLoc; for (IntIndex = IntLoc; IntIndex <= (IntLoc+1); ++IntIndex) Array2Par[IntLoc][IntIndex] = IntLoc; ++Array2Par[IntLoc][IntLoc-1]; Array2Par[IntLoc+20][IntLoc] = Array1Par[IntLoc]; IntGlob = 5; } Enumeration Func1(CharPar1, CharPar2) CapitalLetter CharPar1; CapitalLetter CharPar2; { REG CapitalLetter CharLoc1; REG CapitalLetter CharLoc2; CharLoc1 = CharPar1; CharLoc2 = CharLoc1; if (CharLoc2 != CharPar2) return (Ident1); else return (Ident2); } boolean Func2(StrParI1, StrParI2) String30 StrParI1; String30 StrParI2; { REG OneToThirty IntLoc; REG CapitalLetter CharLoc; IntLoc = 1; while (IntLoc <= 1) if (Func1(StrParI1[IntLoc], StrParI2[IntLoc+1]) == Ident1) { CharLoc = 'A'; ++IntLoc; } if (CharLoc >= 'W' && CharLoc <= 'Z') IntLoc = 7; if (CharLoc == 'X') return(TRUE); else { if (strcmp(StrParI1, StrParI2) > 0) { IntLoc += 7; return (TRUE); } else return (FALSE); } } boolean Func3(EnumParIn) REG Enumeration EnumParIn; { REG Enumeration EnumLoc; EnumLoc = EnumParIn; if (EnumLoc == Ident3) return (TRUE); return (FALSE); } #ifdef NOSTRUCTASSIGN memcpy(d, s, l) register char *d; register char *s; int l; { while (l--) *d++ = *s++; } #endif /* * Library function for compilers with no time(2) function in the * library. */ #ifdef NOTIME long time(p) long *p; { /* CI-C86 time function - don't use around midnight */ long t; struct regval {unsigned int ax,bx,cx,dx,si,di,ds,es; } regs; regs.ax = 0x2c00; sysint21(®s, ®s); t = ((regs.cx>>8)*60L + (regs.cx & 0xff))*60L + (regs.dx>>8); if (p) *p = t; return t; } #endif End of 12 echo 13 1>&2 cat >13 <<'End of 13' Date: Mon, 17 Nov 86 11:46:32 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file livermored.f for FX/1 library Status: RO C PROGRAM MFLOP(TAPE5=OUTMF) C LATEST FILE MODIFICATION DATE: 31 Oct 84. C**************************************************************************** C MEASURES CPU PERFORMANCE OF THE COMPUTER/COMPILER/COMPUTATIONAL COMPLEX C**************************************************************************** C * C L. L. N. L. F O R T R A N K E R N E L S: M F L O P S * C * C These kernels measure Fortran numerical computation rates for a * C spectrum of cpu-limited computational structures. Mathematical * C through-put is measured in units of millions of floating-point * C operations executed per second, called Megaflops/sec. * C * C This program measures a realistic cpu performance range for the * C Fortran programming system on a given day. The cpu performance * C rates depend strongly on the maturity of the Fortran compiler's * C ability to translate Fortran code into efficient machine code. * C * C [ The cpu hardware capability apart from compiler maturity (or * C availability), could be measured (or simulated) by programming the * C kernels in assembly or machine code directly. These measurements * C can also serve as a framework for tracking the maturation of the * C Fortran compiler during system development.] * C * C While this test evaluates the performance of a broad sampling of * C Fortran computations, it is not an application program and hence * C it is not a benchmark per se. The performance of benchmarks and * C even workloads, if cpu limited, could be roughly estimated by * C choosing appropriate weights and loop limits for each kernel (see * C Block Data). * C * C Use of this program is granted with the request that a copy of the * C results be sent to the author at the address shown below, to be * C added to our studies of computer performance. The timing results * C will be held as proprietary data if so marked. In return, you * C will receive a copy of our latest report. * C * C * C F.H. McMahon L-35 * C Lawrence Livermore National Laboratory * C P.0. Box 808 * C Livermore, CA * C 94550 * C * C * C (C) Copyright 1983 the Regents of the * C University of California. All Rights Reserved. * C * C This work was produced under the sponsorship of * C the U.S. Department of Energy. The Government * C retains certain rights therein. * C * C**************************************************************************** C C C C C C C C C 1. We require one run of the Fortran kernels as is, that is, with C no reprogramming. In addition, the vendor may, if so desired C reprogram the Fortran kernels to demonstrate high performance C hardware features. In either case, a compiler listing of the code C actually used should be returned along with the timing results. C C 2. On computers where default single precision is REAL*4 we require C an additional run with all mantissas.ge.48 i.e. declare all REAL*8: C IMPLICIT REAL*8(A-H,O-Z) C C 3. On computers with Cache memories we require two timed runs C with the repitition counter Loop = 1 and 100 (set in subr. SIZES), C to show un-initialized as well as encached execution rates. C Increase the size of array CACHE(in subr. TEST) from 8192 to cache size. C C 4. On computers with Virtual memory systems assure a working set space C larger than the entire program so that page faults are negligible, C because we wish to measure the cpu-limited computation rates. C C 5. Conversion includes verifying or changing the following: C C First : the i/O output device number= IOU assignment C Second: the definition of function SECOND for timing and C the value of TIC:= minimum clock timing (sec.) C Third : the definition of function MOD2N in KERNEL C Fourth: the system names Komput and Kompil in REPORT C **************************************************************** C DIMENSION FLOPS(137), TR(137), RATES(137) DIMENSION LSPAN(137), WG(137), OSUM (137) REAL SECOND c How many data sets to run? parameter (ndataset = 1) C C CALL LINK('UNIT5=(output,create,text)//') c open (unit=6,file='lloop.lst') t= SECOND(0.0) IOU= 6 TIC= 1.0e-9 kern= 24 C DO 1 lset= 1,ndataset l= lset tock= TICK ( IOU, l, TIC) C CALL KERNEL( FLOPS,TR,RATES,LSPAN,WG,OSUM, tock) C c CALL REPORT( IOU, kern,FLOPS,TR,RATES,LSPAN,WG,OSUM) 1 CONTINUE C CALL REPORT( IOU,ndataset*kern,FLOPS,TR,RATES,LSPAN,WG,OSUM) C t= SECOND(0.0) - t WRITE( IOU,9) nint(t) 9 FORMAT( 1H1,//,26H CHECK CLOCK CALIBRATION: ,/, . 18H Total cpu Time = ,i6, 5H Sec. ) STOP END C C*********************************************** BLOCK DATA C*********************************************** C IMPLICIT REAL*8(A-H,O-Z) C C l1 := param-dimension governs the size of most 1-d arrays C l2 := param-dimension governs the size of most 2-d arrays C C ISPAN := Array of limits for each DO loop in the kernels C ISP2 := Array of limits for each DO loop in the kernels C ISP3 := Array of limits for each DO loop in the kernels C IPAS1 := Array of limits for multiple pass execution of each kernel C IPAS2 := Array of limits for multiple pass execution of each kernel C IPAS3 := Array of limits for multiple pass execution of each kernel C NROPS := Array of floating-point operation counts for one pass thru kernel C WT := Array of weights to average kernel execution rates. C SKALE:= Array of scale factors for SIGNAL data generator. C BIAS:= Array of scale factors for SIGNAL data generator. C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*l16 , l21= 25 ) C C/ PARAMETER( l1= 27, l2= 15 ) C/ PARAMETER( l13= 8, l13h= 8/2, l213= 8+4, l813= 8*8 ) C/ PARAMETER( l14= 16, l16= 15, l416= 4*15 , l21= 15) C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C C/ PARAMETER( m1= 1001-1, m2= 101-1, m7= 1001-6 ) C REAL LOOPS, NROPS COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C C **************************************************************** C C DATA ISPAN( 1),ISPAN( 2),ISPAN( 3),ISPAN( 4),ISPAN( 5),ISPAN( 6), . ISPAN( 7),ISPAN( 8),ISPAN( 9),ISPAN(10),ISPAN(11),ISPAN(12), . ISPAN(13),ISPAN(14),ISPAN(15),ISPAN(16),ISPAN(17),ISPAN(18), . ISPAN(19),ISPAN(20),ISPAN(21),ISPAN(22),ISPAN(23),ISPAN(24), . ISPAN(25) ./1001, 101, 1001, 1001, 1001, 64, 995, 100, . 101, 101, 1001, 1000, 64, 1001, 101, 75, . 101, 100, 101, 1000, 25, 101, 100, 1001, 0/ C C* ./l1, l2, l1, l1, l1, l13, m7, m2, C* . l2, l2, l1, m1, l13, l1, l2, l16, C* . l2, m2, l2, m1, l21, l2, m2, l1, 0/ C DATA ISP2( 1),ISP2( 2),ISP2( 3),ISP2( 4),ISP2( 5),ISP2( 6), . ISP2( 7),ISP2( 8),ISP2( 9),ISP2( 10),ISP2( 11),ISP2( 12), . ISP2( 13),ISP2( 14),ISP2( 15),ISP2( 16),ISP2( 17),ISP2( 18), . ISP2( 19),ISP2( 20),ISP2( 21),ISP2( 22),ISP2( 23),ISP2( 24), . ISP2( 25) ./101, 101, 101, 101, 101, 32, 101, 100, . 101, 101, 101, 100, 32, 101, 101, 40, . 101, 100, 101, 100, 20, 101, 100, 101, 0/ C C DATA ISP3( 1),ISP3( 2),ISP3( 3),ISP3( 4),ISP3( 5),ISP3( 6), . ISP3( 7),ISP3( 8),ISP3( 9),ISP3( 10),ISP3( 11),ISP3( 12), . ISP3( 13),ISP3( 14),ISP3( 15),ISP3( 16),ISP3( 17),ISP3( 18), . ISP3( 19),ISP3( 20),ISP3( 21),ISP3( 22),ISP3( 23),ISP3( 24), . ISP3( 25) ./27, 15, 27, 27, 27, 8, 21, 14, . 15, 15, 27, 26, 8, 27, 15, 15, . 15, 14, 15, 26, 15, 15, 14, 27, 0/ C DATA IPAS1( 1),IPAS1( 2),IPAS1( 3),IPAS1( 4),IPAS1( 5),IPAS1( 6), . IPAS1( 7),IPAS1( 8),IPAS1( 9),IPAS1(10),IPAS1(11),IPAS1(12), . IPAS1(13),IPAS1(14),IPAS1(15),IPAS1(16),IPAS1(17),IPAS1(18), . IPAS1(19),IPAS1(20),IPAS1(21),IPAS1(22),IPAS1(23),IPAS1(24), . IPAS1(25) ./ 7, 67, 9, 14, 10, 3, 4, 10, 36, 34, 11, 12, . 36, 2, 1, 25, 35, 2, 39, 1, 1, 11, 8, 5, 0/ C DATA IPAS2( 1),IPAS2( 2),IPAS2( 3),IPAS2( 4),IPAS2( 5),IPAS2( 6), . IPAS2( 7),IPAS2( 8),IPAS2( 9),IPAS2(10),IPAS2(11),IPAS2(12), . IPAS2(13),IPAS2(14),IPAS2(15),IPAS2(16),IPAS2(17),IPAS2(18), . IPAS2(19),IPAS2(20),IPAS2(21),IPAS2(22),IPAS2(23),IPAS2(24), . IPAS2(25) ./ 40, 40, 53, 70, 55, 7, 22, 6, 21, 19, 64, 68, . 41, 10, 1, 27, 20, 1, 23, 8, 1, 7, 5, 31, 0/ C DATA IPAS3( 1),IPAS3( 2),IPAS3( 3),IPAS3( 4),IPAS3( 5),IPAS3( 6), . IPAS3( 7),IPAS3( 8),IPAS3( 9),IPAS3(10),IPAS3(11),IPAS3(12), . IPAS3(13),IPAS3(14),IPAS3(15),IPAS3(16),IPAS3(17),IPAS3(18), . IPAS3(19),IPAS3(20),IPAS3(21),IPAS3(22),IPAS3(23),IPAS3(24), . IPAS3(25) ./ 28, 46, 37, 38, 40, 21, 20, 9, 26, 25, 46, 48, . 31, 8, 1, 14, 26, 2, 28, 7, 1, 8, 7, 23, 0/ C c c The following flop-counts (NROPS) are required for scalar or serial c execution. The scalar version defines the NECESSARY computation c generally, in the absence of proof to the contrary. The vector c or parallel executions are only credited with executing the same c necessary computation. If the parallel methods do more computation c than is necessary then the extra flops are not counted as through-put. c DATA NROPS( 1),NROPS( 2),NROPS( 3),NROPS( 4),NROPS( 5),NROPS( 6), . NROPS( 7),NROPS( 8),NROPS( 9),NROPS(10),NROPS(11),NROPS(12), . NROPS(13),NROPS(14),NROPS(15),NROPS(16),NROPS(17),NROPS(18), . NROPS(19),NROPS(20),NROPS(21),NROPS(22),NROPS(23),NROPS(24) . /5., 4., 2., 2., 2., 2., 16., 36., 17., 9., 1., 1., . 7., 11., 33., 7., 9., 44., 6., 26., 2., 17., 11., 1./ C DATA WT( 1), WT( 2), WT( 3), WT( 4), WT( 5), WT( 6), . WT( 7), WT( 8), WT( 9), WT(10), WT(11), WT(12), . WT(13), WT(14), WT(15), WT(16), WT(17), WT(18), . WT(19), WT(20), WT(21), WT(22), WT(23), WT(24), . WT(25) ./1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, . 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, . 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0/ C DATA SKALE( 1),SKALE( 2),SKALE( 3),SKALE( 4),SKALE( 5),SKALE( 6), . SKALE( 7),SKALE( 8),SKALE( 9),SKALE(10),SKALE(11),SKALE(12), . SKALE(13),SKALE(14),SKALE(15),SKALE(16),SKALE(17),SKALE(18), . SKALE(19),SKALE(20),SKALE(21),SKALE(22),SKALE(23),SKALE(24), . SKALE(25) ./0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 , . 0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 , . 0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1/ C C/ ./1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 , C/ . 1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 , C/ . 1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0/ C DATA BIAS( 1), BIAS( 2), BIAS( 3), BIAS( 4), BIAS( 5), BIAS( 6), . BIAS( 7), BIAS( 8), BIAS( 9), BIAS(10), BIAS(11), BIAS(12), . BIAS(13), BIAS(14), BIAS(15), BIAS(16), BIAS(17), BIAS(18), . BIAS(19), BIAS(20), BIAS(21), BIAS(22), BIAS(23), BIAS(24), . BIAS(25) ./0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 , . 0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 , . 0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0/ END C C*********************************************** SUBROUTINE INDEX C*********************************************** C C C C ------ ----------------------------------------------- C MODULE PURPOSE C ------ ----------------------------------------------- C C KERNEL executes 24 samples of Fortran computation C C REPORT prints timing results C C RESULT computes timing results into pushdown store C C SECOND cumulative CPU time for task in seconds (MKS units) C C SIGNAL generates a set of floating-point numbers near 1.0 C C SIZES test and set the loop controls before each kernel test C C SORDID simple sort C C SPACE sets memory pointers for array variables. optional. C C STATS calculates unweighted statistics C C STATW calculates weighted statistics C C SUMO check-sum with ordinal dependency C C TEST times, tests, and initializes each kernel test C C TICK measures time overhead of subroutine test C C VALID compresses valid timing results C C VALUES initializes all values C C VECTOR initializes common blocks containing type real arrays. C ------ ----------------------------------------------- C C RETURN C END C C*********************************************** SUBROUTINE KERNEL( FLOPS,TR,RATES,LSPAN,WG,OSUM, TOCK) C*********************************************************************** C * C KERNEL - Executes Sample Fortran Kernels * C * C FLOPS - Array: Number of Flops executed by each kernel * C TR - Array: Time of execution of each kernel(microsecs) * C RATES - Array: Rate of execution of each kernel(megaflops/sec)* C LSPAN - Array: Span of inner DO loop in each kernel * C WG - Array: Weight assigned to each kernel for statistics * C OSUM - Array: Checksums of the results of each kernel * C TOCK - Timing overhead per kernel(sec). Clock resolution. * C * C*********************************************************************** C * C L. L. N. L. F O R T R A N K E R N E L S: M F L O P S * C * C These kernels measure Fortran numerical computation * C rates for a spectrum of cpu-limited computational * C structures or benchmarks. Mathematical through-put * C is measured in units of millions of floating-point * C operations executed per second, called Megaflops/sec. * C * C * C Fonzi's Law: There is not now and there never will be a language * C in which it is the least bit difficult to write * C bad programs. * C * C F.H.MCMAHON 1972 * C*********************************************************************** C C l1 := param-dimension governs the size of most 1-d arrays C l2 := param-dimension governs the size of most 2-d arrays C C Loop := multiple pass control to execute kernel long enough to time. C n := DO loop control for each kernel. Controls are set in subr. SIZES C C ****************************************************************** C IMPLICIT REAL*8(A-H,O-Z) C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*l16 , l21= 25 ) C DIMENSION FLOPS(137), TR(137), RATES(137) DIMENSION LSPAN(137), WG(137), OSUM (137) C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C REAL LOOPS, NROPS INTEGER E,F,ZONE C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C C/ COMMON /ISPACE/ E(l213), F(l213), C/ 1 IX(l1), IR(l1), ZONE(l416) C/C C/ COMMON /SPACE1/ U(l1), V(l1), W(l1), C/ 1 X(l1), Y(l1), Z(l1), G(l1), C/ 2 DU1(l2), DU2(l2), DU3(l2), GRD(l1), DEX(l1), C/ 3 XI(l1), EX(l1), EX1(l1), DEX1(l1), C/ 4 VX(l1), XX(l1), RX(l1), RH(l1), C/ 5 VSP(l2), VSTP(l2), VXNE(l2), VXND(l2), C/ 6 VE3(l2), VLR(l2), VLIN(l2), B5(l2), C/ 7 PLAN(l416), D(l416), SA(l2), SB(l2) C/C C/ COMMON /SPACE2/ P(4,l813), PX(l21,l2), CX(l21,l2), C/ 1 VY(l2,l21), VH(l2,7), VF(l2,7), VG(l2,7), VS(l2,7), C/ 2 ZA(l2,7) , ZP(l2,7), ZQ(l2,7), ZR(l2,7), ZM(l2,7), C/ 3 ZB(l2,7) , ZU(l2,7), ZV(l2,7), ZZ(l2,7), C/ 4 B(l13,l13), C(l13,l13), H(l13,l13), C/ 5 U1(5,l2,2), U2(5,l2,2), U3(5,l2,2) C C ****************************************************************** C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C C COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(1001), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C C ****************************************************************** C C// DIMENSION E(96), F(96), U(1001), V(1001), W(1001), C// 1 X(1001), Y(1001), Z(1001), G(1001), C// 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), C// 3 IX(1001), XI(1001), EX(1001), EX1(1001), DEX1(1001), C// 4 VX(1001), XX(1001), IR(1001), RX(1001), RH(1001), C// 5 VSP(101), VSTP(101), VXNE(101), VXND(101), C// 6 VE3(101), VLR(101), VLIN(101), B5(101), C// 7 PLAN(300), ZONE(300), D(300), SA(101), SB(101) C//C C// DIMENSION P(4,512), PX(25,101), CX(25,101), C// 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), C// 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), C// 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), C// 4 B(64,64), C(64,64), H(64,64), C// 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C//C C//C ****************************************************************** C//C C// COMMON /POINT/ ME,MF,MU,MV,MW,MX,MY,MZ,MG,MDU1,MDU2,MDU3,MGRD, C// 1 MDEX,MIX,MXI,MEX,MEX1,MDEX1,MVX,MXX,MIR,MRX,MRH,MVSP,MVSTP, C// 2 MVXNE,MVXND,MVE3,MVLR,MVLIN,MB5,MPLAN,MZONE,MD,MSA,MSB, C// 3 MP,MPX,MCX,MVY,MVH,MVF,MVG,MVS,MZA,MZP,MZQ,MZR,MZM,MZB,MZU, C// 4 MZV,MZZ,MB,MC,MH,MU1,MU2,MU3 C//C C// POINTER (ME,E), (MF,F), (MU,U), (MV,V), (MW,W), C// 1 (MX,X), (MY,Y), (MZ,Z), (MG,G), C// 2 (MDU1,DU1),(MDU2,DU2),(MDU3,DU3),(MGRD,GRD),(MDEX,DEX), C// 3 (MIX,IX), (MXI,XI), (MEX,EX), (MEX1,EX1), (MDEX1,DEX1), C// 4 (MVX,VX), (MXX,XX), (MIR,IR), (MRX,RX), (MRH,RH), C// 5 (MVSP,VSP), (MVSTP,VSTP), (MVXNE,VXNE), (MVXND,VXND), C// 6 (MVE3,VE3), (MVLR,VLR), (MVLIN,VLIN), (MB5,B5), C// 7 (MPLAN,PLAN), (MZONE,ZONE), (MD,D), (MSA,SA), (MSB,SB) C//C C// POINTER (MP,P), (MPX,PX), (MCX,CX), C// 1 (MVY,VY), (MVH,VH), (MVF,VF), (MVG,VG), (MVS,VS), C// 2 (MZA,ZA), (MZP,ZP), (MZQ,ZQ), (MZR,ZR), (MZM,ZM), C// 3 (MZB,ZB), (MZU,ZU), (MZV,ZV), (MZZ,ZZ), C// 4 (MB,B), (MC,C), (MH,H), C// 5 (MU1,U1), (MU2,U2), (MU3,U3) C C ****************************************************************** C C STANDARD PRODUCT COMPILER DIRECTIVES MAY BE USED FOR OPTIMIZATION C CDIR$ VECTOR CLLL. OPTIMIZE LEVEL i CLLL. OPTION INTEGER (7) CLLL. OPTION NODYNEQV CLLL. COMMON DUMMY(2000) CLLL. LOC(X) =.LOC.X CLLL. IQ8QDSP = 64*LOC(DUMMY) C C ****************************************************************** C BINARY MACHINES MAY USE THE AND(P,Q) FUNCTION IF AVAILABLE C IN PLACE OF THE FOLLOWING CONGRUENCE FUNCTION (SEE KERNEL 13, 14) C CLLL. INTEGER AND, OR, XOR CLLL. AND(j,k) = j.INT.k C AND(j,k) = j.AND.k AND(j,k) = IAND(j,k) c MOD2N(i,j)= MOD(i,j) +j/2 -ISIGN(j/2,i) MOD2N(i,j)= AND(i,j-1) MOD2P(i,j)= AND(i-1,j-1) + 1 c MOD2P(i,j)= MOD2N(i-1,j) + 1 C i is Congruent to MOD2N(i,j) mod(j) cFor s-1 proj compiler: c* mod2n(i,j)= i .int. (j - 1) c* mod2p(i,j)= ((i - 1) .int. (j - 1)) + 1 C ****************************************************************** C C AMAX1(x,y)= MAX(x,y) C AMIN1(x,y)= MIN(x,y) C C C C CALL SPACE C CALL TEST(0) C C****************************************************************************** C*** KERNEL 1 HYDRO FRAGMENT C****************************************************************************** C DO 1 L = 1,Loop DO 1 k = 1,n 1 X(k)= Q + Y(k)*(R*Z(k+10) + T*Z(k+11)) C C................... CALL TEST(1) C C****************************************************************************** C*** KERNEL 2 ICCG EXCERPT (INCOMPLETE CHOLESKY - CONJUGATE GRADIENT) C****************************************************************************** C DO 200 L= 1,Loop IL= n IPNTP= 0 222 IPNT= IPNTP IPNTP= IPNTP+IL IL= IL/2 i= IPNTP CDIR$ IVDEP C DO 2 k= IPNT+2,IPNTP,2 i= i+1 2 X(i)= X(k) - V(k)*X(k-1) - V(k+1)*X(k+1) IF( IL.GT.1) GO TO 222 200 CONTINUE C C................... CALL TEST(2) C C****************************************************************************** C*** KERNEL 3 INNER PRODUCT C****************************************************************************** C DO 3 L= 1,Loop Q= 0.0 DO 3 k= 1,n 3 Q= Q + Z(k)*X(k) C C................... CALL TEST(3) C C C C C C****************************************************************************** C*** KERNEL 4 BANDED LINEAR EQUATIONS C****************************************************************************** C m= (1001-7)/2 DO 444 L= 1,Loop DO 444 k= 7,1001,m lw= k-6 CDIR$ IVDEP DO 4 j= 5,n,5 X(k-1)= X(k-1) - X(lw)*Y(j) 4 lw= lw+1 X(k-1)= Y(5)*X(k-1) 444 CONTINUE C C................... CALL TEST(4) C C****************************************************************************** C*** KERNEL 5 TRI-DIAGONAL ELIMINATION, BELOW DIAGONAL C****************************************************************************** C DO 5 L = 1,Loop DO 5 i = 2,n 5 X(i)= Z(i)*(Y(i) - X(i-1)) C C................... CALL TEST(5) C C****************************************************************************** C*** KERNEL 6 GENERAL LINEAR RECURRENCE EQUATIONS C****************************************************************************** C DO 6 L= 1,Loop DO 6 i= 2,n C W(i)= 0.01 use only if overflow occurs DO 6 k= 1,i-1 W(i)= W(i) + B(i,k) * W(i-k) 6 CONTINUE C C................... CALL TEST(6) C C****************************************************************************** C*** KERNEL 7 EQUATION OF STATE FRAGMENT C****************************************************************************** C DO 7 L= 1,Loop DO 7 k= 1,n X(k)= U(k ) + R*( Z(k ) + R*Y(k )) + . T*( U(k+3) + R*( U(k+2) + R*U(k+1)) + . T*( U(k+6) + R*( U(k+5) + R*U(k+4)))) 7 CONTINUE C C................... CALL TEST(7) C C C C C C****************************************************************************** C*** KERNEL 8 A.D.I. INTEGRATION C****************************************************************************** C DO 8 L = 1,Loop nl1 = 1 nl2 = 2 DO 8 kx = 2,3 CDIR$ IVDEP DO 8 ky = 2,n DU1(ky)=U1(kx,ky+1,nl1) - U1(kx,ky-1,nl1) DU2(ky)=U2(kx,ky+1,nl1) - U2(kx,ky-1,nl1) DU3(ky)=U3(kx,ky+1,nl1) - U3(kx,ky-1,nl1) U1(kx,ky,nl2)=U1(kx,ky,nl1) +A11*DU1(ky) +A12*DU2(ky) +A13*DU3(ky) . + SIG*(U1(kx+1,ky,nl1) -2.*U1(kx,ky,nl1) +U1(kx-1,ky,nl1)) U2(kx,ky,nl2)=U2(kx,ky,nl1) +A21*DU1(ky) +A22*DU2(ky) +A23*DU3(ky) . + SIG*(U2(kx+1,ky,nl1) -2.*U2(kx,ky,nl1) +U2(kx-1,ky,nl1)) U3(kx,ky,nl2)=U3(kx,ky,nl1) +A31*DU1(ky) +A32*DU2(ky) +A33*DU3(ky) . + SIG*(U3(kx+1,ky,nl1) -2.*U3(kx,ky,nl1) +U3(kx-1,ky,nl1)) 8 CONTINUE C C................... CALL TEST(8) C C****************************************************************************** C*** KERNEL 9 INTEGRATE PREDICTORS C****************************************************************************** C DO 9 L = 1,Loop DO 9 i = 1,n PX( 1,i)= DM28*PX(13,i) + DM27*PX(12,i) + DM26*PX(11,i) + . DM25*PX(10,i) + DM24*PX( 9,i) + DM23*PX( 8,i) + . DM22*PX( 7,i) + C0*(PX( 5,i) + PX( 6,i))+ PX( 3,i) 9 CONTINUE C C................... CALL TEST(9) C C****************************************************************************** C*** KERNEL 10 DIFFERENCE PREDICTORS C****************************************************************************** C DO 10 L= 1,Loop DO 10 i= 1,n AR = CX(5,i) BR = AR - PX(5,i) PX(5,i) = AR CR = BR - PX(6,i) PX(6,i) = BR AR = CR - PX(7,i) PX(7,i) = CR BR = AR - PX(8,i) PX(8,i) = AR CR = BR - PX(9,i) PX(9,i) = BR AR = CR - PX(10,i) PX(10,i)= CR BR = AR - PX(11,i) PX(11,i)= AR CR = BR - PX(12,i) PX(12,i)= BR PX(14,i)= CR - PX(13,i) PX(13,i)= CR 10 CONTINUE C C................... CALL TEST(10) C C****************************************************************************** C*** KERNEL 11 FIRST SUM. C****************************************************************************** C DO 11 L = 1,Loop X(1)= Y(1) DO 11 k = 2,n 11 X(k)= X(k-1) + Y(k) C C................... CALL TEST(11) C C****************************************************************************** C*** KERNEL 12 FIRST DIFF. C****************************************************************************** C DO 12 L = 1,Loop DO 12 k = 1,n 12 X(k)= Y(k+1) - Y(k) C C................... CALL TEST(12) C C****************************************************************************** C*** KERNEL 13 2-D PIC Particle In Cell C****************************************************************************** C DO 13 L= 1,Loop DO 13 ip= 1,n i1= P(1,ip) j1= P(2,ip) i1= 1 + MOD2N(i1,64) j1= 1 + MOD2N(j1,64) P(3,ip)= P(3,ip) + B(i1,j1) P(4,ip)= P(4,ip) + C(i1,j1) P(1,ip)= P(1,ip) + P(3,ip) P(2,ip)= P(2,ip) + P(4,ip) i2= P(1,ip) j2= P(2,ip) i2= MOD2N(i2,64) j2= MOD2N(j2,64) P(1,ip)= P(1,ip) + Y(i2+32) P(2,ip)= P(2,ip) + Z(j2+32) i2= i2 + E(i2+32) j2= j2 + F(j2+32) H(i2,j2)= H(i2,j2) + 1.0 13 CONTINUE C C................... CALL TEST(13) C C****************************************************************************** C*** KERNEL 14 1-D PIC Particle In Cell C****************************************************************************** C C DO 14 L= 1,Loop DO 141 k= 1,n IX(k)= INT( GRD(k)) XI(k)= FLOAT( IX(k)) EX1(k)= EX ( IX(k)) DEX1(k)= DEX ( IX(k)) 141 CONTINUE C DO 142 k= 1,n VX(k)= VX(k) + EX1(k) + (XX(k) - XI(k))*DEX1(k) XX(k)= XX(k) + VX(k) + FLX IR(k)= XX(k) RX(k)= XX(k) - IR(k) IR(k)= MOD2N( IR(k),512) + 1 XX(k)= RX(k) + IR(k) 142 CONTINUE C DO 14 k= 1,n RH(IR(k) )= RH(IR(k) ) + 1.0 - RX(k) RH(IR(k)+1)= RH(IR(k)+1) + RX(k) 14 CONTINUE C C................... CALL TEST(14) C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C****************************************************************************** C*** KERNEL 15 CASUAL FORTRAN. DEVELOPMENT VERSION. C****************************************************************************** C C C CASUAL ORDERING OF SCALAR OPERATIONS IS TYPICAL PRACTICE. C THIS EXAMPLE DEMONSTRATES THE NON-TRIVIAL TRANSFORMATION C REQUIRED TO MAP INTO AN EFFICIENT MACHINE IMPLEMENTATION. C DO 45 L = 1,Loop NR= 7 NZ= n AR= 0.053 BR= 0.073 15 DO 45 j = 2,NR DO 45 k = 2,NZ IF( j-NR) 31,30,30 30 VY(k,j)= 0.0 GO TO 45 31 IF( VH(k,j+1) -VH(k,j)) 33,33,32 32 T= AR GO TO 34 33 T= BR 34 IF( VF(k,j) -VF(k-1,j)) 35,36,36 35 R= DMAX1( VH(k-1,j), VH(k-1,j+1)) S= VF(k-1,j) GO TO 37 36 R= DMAX1( VH(k,j), VH(k,j+1)) S= VF(k,j) 37 VY(k,j)= SQRT( VG(k,j)**2 +R*R)*T/S 38 IF( k-NZ) 40,39,39 39 VS(k,j)= 0. GO TO 45 40 IF( VF(k,j) -VF(k,j-1)) 41,42,42 41 R= DMAX1( VG(k,j-1), VG(k+1,j-1)) S= VF(k,j-1) T= BR GO TO 43 42 R= DMAX1( VG(k,j), VG(k+1,j)) S= VF(k,j) T= AR 43 VS(k,j)= SQRT( VH(k,j)**2 +R*R)*T/S 45 CONTINUE C C................... CALL TEST(15) C C C C C C C C C C C C C C C****************************************************************************** C*** KERNEL 16 MONTE CARLO SEARCH LOOP C****************************************************************************** C II= n/3 LB= II+II k2= 0 k3= 0 C DO 485 L= 1,Loop m= 1 405 i1= m 410 j2= (n+n)*(m-1)+1 DO 470 k= 1,n k2= k2+1 j4= j2+k+k j5= ZONE(j4) IF( j5-n ) 420,475,450 415 IF( j5-n+II ) 430,425,425 420 IF( j5-n+LB ) 435,415,415 425 IF( PLAN(j5)-R) 445,480,440 430 IF( PLAN(j5)-S) 445,480,440 435 IF( PLAN(j5)-T) 445,480,440 440 IF( ZONE(j4-1)) 455,485,470 445 IF( ZONE(j4-1)) 470,485,455 450 k3= k3+1 IF( D(j5)-(D(j5-1)*(T-D(j5-2))**2+(S-D(j5-3))**2 . +(R-D(j5-4))**2)) 445,480,440 455 m= m+1 IF( m-ZONE(1) ) 465,465,460 460 m= 1 465 IF( i1-m) 410,480,410 470 CONTINUE 475 CONTINUE 480 CONTINUE 485 CONTINUE C C................... CALL TEST(16) C C****************************************************************************** C*** KERNEL 17 IMPLICIT, CONDITIONAL COMPUTATION C****************************************************************************** C C C RECURSIVE-DOUBLING VECTOR TECHNIQUES CAN NOT BE USED C BECAUSE CONDITIONAL OPERATIONS APPLY TO EACH ELEMENT. C DO 62 L= 1,Loop i= n j= 1 INK= -1 SCALE= 5./3. XNM= 1./3. E6= 1.03/3.07 GO TO 61 C STEP MODEL 60 E6= XNM*VSP(i)+VSTP(i) VXNE(i)= E6 XNM= E6 VE3(i)= E6 i= i+INK IF( i.EQ.j) GO TO 62 61 E3= XNM*VLR(i) +VLIN(i) XNEI= VXNE(i) VXND(i)= E6 XNC= SCALE*E3 C SELECT MODEL IF( XNM .GT.XNC) GO TO 60 IF( XNEI.GT.XNC) GO TO 60 C LINEAR MODEL VE3(i)= E3 E6= E3+E3-XNM VXNE(i)= E3+E3-XNEI XNM= E6 i= i+INK IF( i.NE.j) GO TO 61 62 CONTINUE C C................... CALL TEST(17) C C****************************************************************************** C*** KERNEL 18 2-D EXPLICIT HYDRODYNAMICS FRAGMENT C****************************************************************************** C DO 75 L= 1,Loop T= 0.0037 S= 0.0041 KN= 6 JN= n DO 70 k= 2,KN DO 70 j= 2,JN ZA(j,k)= (ZP(j-1,k+1)+ZQ(j-1,k+1)-ZP(j-1,k)-ZQ(j-1,k)) . *(ZR(j,k)+ZR(j-1,k))/(ZM(j-1,k)+ZM(j-1,k+1)) ZB(j,k)= (ZP(j-1,k)+ZQ(j-1,k)-ZP(j,k)-ZQ(j,k)) . *(ZR(j,k)+ZR(j,k-1))/(ZM(j,k)+ZM(j-1,k)) 70 CONTINUE C DO 72 k= 2,KN DO 72 j= 2,JN ZU(j,k)= ZU(j,k)+S*(ZA(j,k)*(ZZ(j,k)-ZZ(j+1,k)) . -ZA(j-1,k) *(ZZ(j,k)-ZZ(j-1,k)) . -ZB(j,k) *(ZZ(j,k)-ZZ(j,k-1)) . +ZB(j,k+1) *(ZZ(j,k)-ZZ(j,k+1))) ZV(j,k)= ZV(j,k)+S*(ZA(j,k)*(ZR(j,k)-ZR(j+1,k)) . -ZA(j-1,k) *(ZR(j,k)-ZR(j-1,k)) . -ZB(j,k) *(ZR(j,k)-ZR(j,k-1)) . +ZB(j,k+1) *(ZR(j,k)-ZR(j,k+1))) 72 CONTINUE C DO 75 k= 2,KN DO 75 j= 2,JN ZR(j,k)= ZR(j,k)+T*ZU(j,k) ZZ(j,k)= ZZ(j,k)+T*ZV(j,k) 75 CONTINUE C C................... CALL TEST(18) C C****************************************************************************** C*** KERNEL 19 GENERAL LINEAR RECURRENCE EQUATIONS C****************************************************************************** C C IF( JR.GT.1 ) GO TO 192 C KB5I= 0 DO 194 L= 1,Loop DO 191 k= 1,n B5(k+KB5I)= SA(k) +STB5*SB(k) STB5= B5(k+KB5I) -STB5 191 CONTINUE C GO TO 194 C 192 DO 193 i= 1,n k= n-i+1 B5(k+KB5I)= SA(k) +STB5*SB(k) STB5= B5(k+KB5I) -STB5 193 CONTINUE 194 CONTINUE C C................... CALL TEST(19) C C****************************************************************************** C*** KERNEL 20 DISCRETE ORDINATES TRANSPORT: CONDITIONAL RECURRENCE ON XX. C****************************************************************************** C DO 20 L= 1,Loop DO 20 k= 1,n DI= Y(k)-G(k)/( XX(k)+DK) DN= 0.2 IF( DI.NE.0.0) DN= DMAX1( 0.1D0,DMIN1( Z(k)/DI, 0.2D0)) X(k)= ((W(k)+V(k)*DN)* XX(k)+U(k))/(VX(k)+V(k)*DN) XX(k+1)= (X(k)- XX(k))*DN+ XX(k) 20 CONTINUE C C................... CALL TEST(20) C C****************************************************************************** C*** KERNEL 21 MATRIX*MATRIX PRODUCT C****************************************************************************** C DO 21 L= 1,Loop DO 21 i= 1,n DO 21 k= 1,n DO 21 j= 1,n PX(i,j)= PX(i,j) +VY(i,k) * CX(k,j) 21 CONTINUE C C................... CALL TEST(21) C C C C C C C C****************************************************************************** C*** KERNEL 22 PLANCKIAN DISTRIBUTION C****************************************************************************** C C C EXPMAX= 234.5 EXPMAX= 20.0 U(n)= 0.99*EXPMAX*V(n) DO 22 L= 1,Loop DO 22 k= 1,n CARE IF( U(k) .LT. EXPMAX*V(k)) THEN Y(k)= U(k)/V(k) CARE ELSE CARE Y(k)= EXPMAX CARE ENDIF W(k)= X(k)/( EXP( Y(k)) -1.0) 22 CONTINUE C................... CALL TEST(22) C C****************************************************************************** C*** KERNEL 23 2-D IMPLICIT HYDRODYNAMICS FRAGMENT C****************************************************************************** C DO 23 L= 1,Loop DO 23 j= 2,6 DO 23 k= 2,n QA= ZA(k,j+1)*ZR(k,j) +ZA(k,j-1)*ZB(k,j) + . ZA(k+1,j)*ZU(k,j) +ZA(k-1,j)*ZV(k,j) +ZZ(k,j) 23 ZA(k,j)= ZA(k,j) +.175*(QA -ZA(k,j)) C C................... CALL TEST(23) C C****************************************************************************** C*** KERNEL 24 FIND LOCATION OF FIRST MINIMUM IN ARRAY C****************************************************************************** C C X( n/2)= -1.0E+50 X( n/2)= -1.0E+10 DO 24 L= 1,Loop m= 1 DO 24 k= 2,n IF( X(k).LT.X(m)) m= k 24 CONTINUE C C m= imin1( n,x,1) 35 nanosec./element STACKLIBE/CRAY C................... CALL TEST(24) C C****************************************************************************** C CALL RESULT( FLOPS,TR,RATES,LSPAN,WG,OSUM, TOCK) C RETURN END C C*********************************************** FUNCTION LIMITS( ilbi, i, iubi, tag, iou, j, k ) C*********************************************** C C LIMITS tests i between ilbi, iubi C C*********************************************** IMPLICIT REAL*8(A-H,O-Z) C LIMITS= 1 IF( (ilbi .LE. i) .AND. (i .LE. iubi)) RETURN C mini= MIN0( mini,i) maxi= MAX0( maxi,i) WRITE( iou,101) tag,j,k, ilbi,i,iubi 101 FORMAT(1X,A8,2I6,6X,I9,4H.LE.,I9,4H.LE.,I9) LIMITS= 0 RETURN END C*********************************************** C %Z%%M% %I% %E% SMI SUBROUTINE REPORT( LOGIO, NK, FLOPS,TR,RATES,LSPAN,WG,OSUM ) C IMPLICIT REAL*8(A-H,O-Z) REAL LOOPS, NROPS c For s-1 proj compiler use F77 strings: CHARACTER Komput*32, Kompil*32, TAG(5)*8 C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C DIMENSION FLOPS(137), TR(137), RATES(137) DIMENSION LSPAN(137), WG(137), OSUM (137) c DIMENSION TAG(5) DIMENSION HM(12), FR(12), N1(10), N2(10), IQ(10), SUMW(10) DIMENSION LQ(5), STAT1(12), STAT2(12) DIMENSION IN(137), CSUM1(137) DIMENSION MAP1(137), MAP2(137), IN2(137), VL1(137) DIMENSION MAP(137), VL(137), WL(137), TV(137), TV1(137), TV2(137) DIMENSION FLOPS1(137), RT1(137), ISPAN1(137), WT1(137) DIMENSION FLOPS2(137), RT2(137), ISPAN2(137), WT2(137) C DATA kern /24/ C DATA FR( 1), FR( 2), FR( 3), FR( 4), FR( 5), FR( 6), . FR( 7), FR( 8), FR( 9) ./ 0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0/ C DATA SUMW( 1), SUMW( 2), SUMW( 3), SUMW( 4), SUMW( 5), SUMW( 6), . SUMW( 7) ./1.0, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5/ C DATA IQ( 1), IQ( 2), IQ( 3), IQ( 4), IQ( 5), IQ( 6), . IQ( 7) ./1, 2, 1, 2, 1, 2, 1/ C c DATA TAG( 1), TAG( 2), TAG( 3), TAG( 4) c ./8H1st QT: , 8H2nd QT: , 8H3rd QT: , 8H4th QT: / c For s-1 proj compiler use F77 string: data tag(1) /'1st QT:'/ data tag(2) /'2nd QT:'/ data tag(3) /'3rd QT:'/ data tag(4) /'4th QT:'/ C c DATA Komput/ 8HCRAY-XMP/ c DATA Komput/ 8HCRAY-1 / c DATA Komput/ 8HVAX750 / c For s-1 proj compiler use F77 string: c data komput /'Sun 2/50 REAL*4'/ c data komput /'Apollo DN560 REAL*8'/ data komput /'Alliant FX/1 REAL*8'/ C c DATA Kompil/ 8HCFT-1.12/ c DATA Kompil/ 8HCIVIC30i/ c DATA Kompil/ 8H4.2BSD / c For s-1 proj compiler use F77 string: c data kompil / 'Sun f77 Rel 2.0' / c data kompil / 'Apollo FTN 8.40 (AEGIS SR9.2.3)' / data kompil / 'Alliant Concentrix Fortran rev. 1.0' / C MODI(i,M)= (MOD(i,M) + M*( 1 - MIN0(1,MOD(i,M)))) C IF( logio.LT.0) RETURN C fuzz= 1.0e-9 DO 1000 k= 1,NK VL(k)= LSPAN(k) 1000 CONTINUE C CALL VALID( TV,MAP,neff, 1.0e-5,RATES, 1.0e+4,NK) C C Compress valid data sets mapping on MAP. C DO 1 k= 1,neff MAP1(k)= MODI( MAP(k),kern) FLOPS1(k)= FLOPS( MAP(k)) RT1(k)= TR( MAP(k)) VL1(k)= VL( MAP(k)) ISPAN1(k)= LSPAN( MAP(k)) WT1(k)= WG( MAP(k)) TV1(k)= RATES( MAP(k)) CSUM1(k)= OSUM( MAP(k)) 1 continue C CALL STATW( STAT1,TV,IN, VL1,WT1,neff) LV= STAT1(1) C CALL STATW( STAT1,TV,IN, TV1,WT1,neff) twt= STAT1(6) C if (0 .eq. 1) then WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7000) WRITE ( logio,7002) endif WRITE ( logio,7003) WRITE ( logio,7002) WRITE ( logio,7007) Komput WRITE ( logio,7008) Kompil WRITE ( logio,7009) LV if (0 .eq. 1) then WRITE ( logio,7061) WRITE ( logio,7062) WRITE ( logio,7063) WRITE ( logio,7064) WRITE ( logio,7065) WRITE ( logio,7066) WRITE ( logio,7067) WRITE ( logio,7001) WRITE ( logio,7004) WRITE ( logio,7005) WRITE ( logio,7011) (MAP1(k), FLOPS1(k), RT1(k), 1000*TV1(k), . ISPAN1(k), WT1(k), CSUM1(k), k=1,neff) WRITE ( logio,7005) endif C WRITE ( logio,7022) 1000*STAT1(3), 1000*STAT1(4) WRITE ( logio,7055) 1000*STAT1(5) WRITE ( logio,7030) 1000*STAT1(7) WRITE ( logio,7031) 1000*STAT1(9) WRITE ( logio,7033) 1000*STAT1(1) WRITE ( logio,7044) 1000*STAT1(2) C 7000 FORMAT(1H1) 7001 FORMAT(/) 7002 FORMAT( 45H ******************************************** ) 7003 FORMAT( 45H THE LIVERMORE F O R T R A N K E R N E L S ) 7004 FORMAT(/,53H KERNEL FLOPS MICROSEC KFLOP/SEC SPAN WEIGHT SUM) 7005 FORMAT( 53H ------ ----- -------- --------- ---- ------ ---) 7007 FORMAT(/,9X,16H Computer : ,A32,/) 7008 FORMAT( 9X,16H Compiler : ,A32,/) 7009 FORMAT( 9X,16HMean Vector L = ,I5,/) 7011 FORMAT(1X,i2,E11.4,E11.4,F12.1,1X,I4,1X,F6.2,E20.12) 7012 FORMAT(1X,i2,E11.4,E11.4,F12.1,1X,I4,1X,F6.2) 7022 FORMAT(/,9X,16HKFLOPS RANGE = ,F12.1,4H TO,F12.1, . 16H Kilo-Flops/Sec. ) 7055 FORMAT(/,9X,16HHARMONIC MEAN = ,F12.1,16H Kilo-Flops/Sec. ) 7030 FORMAT(/,9X,16HMEDIAN RATE = ,F12.1,16H Kilo-Flops/Sec. ) 7031 FORMAT(/,9X,16HMEDIAN DEV. = ,F12.1,16H Kilo-Flops/Sec. ) 7033 FORMAT(/,9X,16HAVERAGE RATE = ,F12.1,16H Kilo-Flops/Sec. ) 7044 FORMAT(/,9X,16HSTANDARD DEV. = ,F12.1,16H Kilo-Flops/Sec. ) 7053 FORMAT(/,9X,16HFRAC. WEIGHTS = ,F12.4) C 7061 FORMAT(9X,52HWhen the computer performance range is very large ) 7062 FORMAT(9X,52Hthe net Mflops rate of many Fortran programs and ) 7063 FORMAT(9X,52Hworkloads will be in the sub-range between the equi-) 7064 FORMAT(9X,52Hweighted harmonic and arithmetic means depending ) 7065 FORMAT(9X,52Hon the degree of code parallelism and optimization. ) 7066 FORMAT(9X,52HMore accurate estimates of cpu workload rates depend) 7067 FORMAT(9X,52Hon assigning appropriate weights for each kernel. ) C WRITE ( logio,7000) WRITE ( logio,7070) 7070 FORMAT(//,50H TOP QUARTILE: BEST ARCHITECTURE/APPLICATION MATCH ) C C Compute compression index-list MAP1: Non-zero weights. C CALL VALID( TV,MAP1,meff, 1.0e-6, WT1, 1.0e+6, neff) C C Re-order data sets mapping on IN (descending order of MFlops). C DO 2 k= 1,meff i= IN( MAP1(k)) FLOPS2(k)= FLOPS1(i) RT2(k)= RT1(i) ISPAN2(k)= ISPAN1(i) WT2(k)= WT1(i) TV2(k)= TV1(i) MAP2(k)= MODI( MAP(i),kern) 2 continue C nq= meff/4 lr= meff -4*nq LQ(1)= nq LQ(2)= nq + nq + lr LQ(3)= nq i2= 0 C DO 5 j= 1,3 i1= i2 + 1 i2= i2 + LQ(j) ll= i2 - i1 + 1 CALL STATW( STAT2,TV,IN2, TV2(i1),WT2(i1),ll) frac= STAT2(6)/( twt +fuzz) WL(j)= STAT2(5) C WRITE ( logio,7001) WRITE ( logio,7004) WRITE ( logio,7005) WRITE ( logio,7012) ( MAP2(k),FLOPS2(k),RT2(k),1000*TV2(k), . ISPAN2(k), WT2(k), k=i1,i2 ) WRITE ( logio,7005) C WRITE ( logio,7053) frac WRITE ( logio,7055) 1000*STAT2(5) WRITE ( logio,7033) 1000*STAT2(1) WRITE ( logio,7044) 1000*STAT2(2) 5 continue smf= WL(1) C return WRITE ( logio,7000) WRITE ( logio,7001) C 7301 FORMAT(9X,31H SENSITIVITY ANALYSIS ) 7302 FORMAT(9X,52HThe sensitivity of the harmonic mean rate (Mflops) ) 7303 FORMAT(9X,52Hto various weightings is shown in the table below. ) 7304 FORMAT(9X,52HSeven work distributions are generated by assigning ) 7305 FORMAT(9X,52Htwo distinct weights to ranked kernels by quartiles.) 7306 FORMAT(9X,52HForty nine possible cpu workloads are then evaluated) 7307 FORMAT(9X,52Husing seven sets of values for the total weights: ) 7341 FORMAT(3X,A7,6X,43HO O O O O X X) 7342 FORMAT(3X,A7,6X,43HO O O X X X O) 7343 FORMAT(3X,A7,6X,43HO X X X O O O) 7344 FORMAT(3X,A7,6X,43HX X O O O O O) 7346 FORMAT(13X, 48H------ ------ ------ ------ ------ ------ ------) 7348 FORMAT(3X,5HTotal,/,3X,7HWeights,20X,11HNet Mflops:,/,4X,6HX O) 7349 FORMAT(2X,9H---- ---- ) 7220 FORMAT(/,1X,2F5.2,1X,7F7.2) C WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7301) WRITE ( logio,7001) WRITE ( logio,7302) WRITE ( logio,7303) WRITE ( logio,7304) WRITE ( logio,7305) WRITE ( logio,7306) WRITE ( logio,7307) WRITE ( logio,7001) WRITE ( logio,7346) WRITE ( logio,7341) TAG(1) WRITE ( logio,7342) TAG(2) WRITE ( logio,7343) TAG(3) WRITE ( logio,7344) TAG(4) WRITE ( logio,7346) WRITE ( logio,7348) WRITE ( logio,7349) C r= meff mq= (meff+3)/4 q= mq j= 1 DO 21 i= 8,2,-2 N1(i )= j N1(i+1)= j N2(i )= j + mq + mq - 1 N2(i+1)= j + mq - 1 j= j + mq 21 continue C DO 29 j= 1,7 sumo= 1.0 - SUMW(j) DO 27 i= 1,7 p= IQ(i)*Q xt= SUMW(j)/(p + fuzz) ot= sumo /(r - p + fuzz) DO 23 k= 1,meff WL(k)= ot 23 continue k1= N1(i+2) k2= N2(i+2) DO 25 k= k1,k2 WL(k)= xt 25 continue CALL STATW( STAT2,TV,IN2, TV2,WL,meff) RT1(i)= STAT2(5) 27 continue WRITE ( logio,7220) SUMW(j), sumo, ( RT1(k), k=1,7) 29 continue C WRITE ( logio,7349) WRITE ( logio,7346) C 7101 FORMAT(4x,52HNET MFLOP RATES OF PARTIALLY OPTIMIZED FORTRAN CODE:) 7102 FORMAT(/,1X,5F7.2,4F8.2) 7103 FORMAT(3x,52H Fraction Of Operations Run At Optimal Machine Rates) C CALL STATW( STAT2,TV,IN2, TV2, WT2, meff) med= meff + 1 - INT(STAT2(8)) lh= meff - med C CALL STATW( STAT2,TV,IN2, TV2(med),WT2(med),lh) C HM(1)= STAT2(5) g= 1.0 - HM(1)/( smf + fuzz) C DO 7 k= 2,9 HM(k)= HM(1)/( 1.0 - FR(k)*g + fuzz) 7 continue C WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7101) WRITE ( logio,7102) ( HM(k), k= 1,9) WRITE ( logio,7102) ( FR(k), k= 1,9) WRITE ( logio,7103) WRITE ( logio,7001) RETURN END C********************************************** SUBROUTINE RESULT( FLOPS,TR,RATES,LSPAN,WG,OSUM, TOCK) C*********************************************************************** C * C RESULT - Computes timing Results into pushdown store. * C * C FLOPS - Array: Number of Flops executed by each kernel * C TR - Array: Time of execution of each kernel(microsecs) * C RATES - Array: Rate of execution of each kernel(megaflops/sec)* C LSPAN - Array: Span of inner DO loop in each kernel * C WG - Array: Weight assigned to each kernel for statistics * C OSUM - Array: Checksums of the results of each kernel * C TOCK - Timing overhead per kernel(sec). Clock resolution. * C * C*********************************************************************** C C IMPLICIT REAL*8(A-H,O-Z) DIMENSION FLOPS(137), TR(137), RATES(137) DIMENSION LSPAN(137), WG(137), OSUM (137) DIMENSION WTL(5) C REAL LOOPS, NROPS C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C DATA kern /24/ DATA kl/0/, limits/137/ DATA WTL(1),WTL(2),WTL(3) /1.0,2.0,1.0/ C C Push Result Arrays Down before entering new resul limit= limits - kern j = limits DO 1001 k = limit,1,-1 FLOPS(j)= FLOPS(k) TR(j)= TR(k) RATES(j)= RATES(k) LSPAN(j)= LSPAN(k) WG(j)= WG(k) OSUM(j)= OSUM(k) j = j - 1 1001 CONTINUE C C CALCULATE MFLOPS FOR EACH KERNEL tmin= 5.0*TOCK kl= kl+1 DO 1010 k = 1,kern FLOPS(k)= NROPS(k)*LOOPS(k) TR(k)= (TIME(k) - TOCK)* 1.0e+6 RATES(k)= 0.0 IF( TR(k).NE. 0.0) RATES(k)= FLOPS(k)/TR(k) IF( WT(k).LE. 0.0) RATES(k)= 0.0 IF( TIME(k).LE.tmin) RATES(k)= 0.0 LSPAN(k)= ISPAN(k) WG(k)= WT(k)*WTL(kl) OSUM(k)= CSUM(k) 1010 CONTINUE C RETURN END C C c For s-1 proj compiler we supply an externally compiled function: c Not for Sun C********************************************** function SECOND( OLDSEC) C*********************************************** C REAL SECOND C C SECOND= Cumulative CPU time for job in seconds. MKS unit is seconds. C Clock resolution should be less than 2% of Kernel 12 run-time. C C C If your system provides a timing routine that satisfies C the definition above; then simply delete this function. C C Else this function must be programmed using some C timing routine available in your system. C Timing routines with CPU clock resolution are always sufficient. C Timing routines with microsec. resolution are usually sufficient. C C Timing routines with much less resolution may require the use C of multiple-pass loops around each kernel to make the run time C at least 50 times the tick-period of the timing routine. C In these cases, you must redefine the DATA statements for IPAS1 C to increase the value of Loop set in subroutine SIZES. C C If no CPU timer is available, then you can time each kernel by C the wall clock using the PAUSE statement at the end of subr. VALUES. C C Function TICK measures the overhead time for a call to SECOND. C An independent calibration of the running time of at least C one kernel may be wise. See Subr. TEST, label 100+. C C C The following statement is deliberately incomplete: C c SECOND= C C C The following statement was used on the DEC VAX/780 VMS 3.0 C Also set: Loop= 10*Loop in Subroutine SIZES to run kernels long enough. C C REAL SECNDS C* SECOND= SECNDS( OLDSEC) C C OR: C C$ DATA INITIA /237/ C$ IF( INITIA.NE.237 ) GO TO 1 C$ NSTAT= LIB$INIT_TIMER C$ 1 INITIA= INITIA + 1 C$ C$ NSTAT = LIB$STAT_TIMER(2,ISEC) C$ SECOND= FLOAT(ISEC)*0.01 - OLDSEC C C C The following statements were used on the DEC PDP-11/23 RT-11 system. C Also set: Loop=100*Loop in Subroutine SIZES to run kernels long enough. C C* DIMENSION JT(2) C* CALL GTIM(JT) C* TIME1 = JT(1) C* TIME2 = JT(2) C* TIME = TIME1 * 65768. + TIME2 C* SECOND=TIME/60. - OLDSEC C C C The following statements were used on the Hewlett-Packard HP 9000 C Also set: Loop= 10*Loop in Subroutine SIZES to run kernels long enough. C C* INTEGER*4 ITIME(4) C* CALL TIMES( ITIME(4)) C* TIMEX= ITIME(1) + ITIME(2) + ITIME(3) + ITIME(4) C* SECOND= TIMEX/60. - OLDSEC C C C The following statements were used on the SUN workstation C Also set: Loop= 10*Loop in Subroutine SIZES to run kernels long enough. C C DIMENSION XTIME(2) C REAL*4 XTIME C XT= ETIME( XTIME) C SECOND= (XTIME(1) + XTIME(2)) - OLDSEC C C The following statements were used on the Apollo workstation C Also set: Loop= 10*Loop in Subroutine SIZES to run kernels long enough. C C INTEGER*2 CLOCK(3) C DOUBLE PRECISION DSECS C C CALL PROC1_$GET_CPUT (CLOCK) C CALL CAL_$FLOAT_CLOCK (CLOCK,DSECS) C SECOND = DSECS-OLDSEC C C The following statements were used on the Alliant FX/1 C Also set: Loop= 10*Loop in Subroutine SIZES to run kernels long enough. C REAL STIME,UTIME CALL ETIME(UTIME,STIME) SECOND=UTIME-OLDSEC RETURN END C C*********************************************** SUBROUTINE SIGNAL( V, SCALE,BIAS, n) C*********************************************** C C SIGNAL GENERATES VERY FRIENDLY FLOATING-POINT NUMBERS NEAR 1.0 C WHEN SCALE= 1.0 AND BIAS= 0. C C*********************************************** IMPLICIT REAL*8(A-H,O-Z) C DIMENSION V(n) C SCALED= SCALE BIASED= BIAS C FUZZ= 1.2345E-9 FUZZ= 1.2345E-3 BUZZ= 1.0 +FUZZ FIZZ= 1.1 *FUZZ C DO 1 k= 1,n BUZZ= (1.0 - FUZZ)*BUZZ +FUZZ FUZZ= -FUZZ V(k)=((BUZZ- FIZZ) -BIASED)*SCALED 1 CONTINUE C RETURN END C C C*********************************************** SUBROUTINE SIZES(i) C*********************************************** IMPLICIT REAL*8(A-H,O-Z) C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*l16 , l21= 25 ) C C/ PARAMETER( l1= 27, l2= 15 ) C/ PARAMETER( l13= 8, l13h= 8/2, l213= 8+4, l813= 8*8 ) C/ PARAMETER( l14= 16, l16= 15, l416= 4*15 , l21= 15) C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C C C/ PARAMETER( NNI= 2*l1 +2*l213 +l416 ) C/ PARAMETER( NN1= 17*l1 +13*l2 +2*l416 ) C/ PARAMETER( NN2= 4*l813 + 3*l21*l2 +121*l2 +3*l13*l13 ) C/ PARAMETER( Nl1= 19*l1, Nl2= 131*l2 +3*l21*l2 ) C/ PARAMETER( Nl13= 3*l13*l13 +34*l13 +32) C C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C REAL LOOPS, NROPS COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C C C ****************************************************************** C C i := kernel number C Loop := multiple pass control to execute kernel long enough to time. C n := DO loop control for each kernel. Controls are set in subr. SIZES C C ****************************************************************** C C Domain tests follow to detect overstoring of controls for array opns. C C IT= 1 IF( i.LT.0 .OR. i.GT. 24) GO TO 911 IF( n.LT.0 .OR. n.GT. 1001) GO TO 911 IF(Loop.LT.0 .OR. Loop.GT.60000) GO TO 911 C IT= 2 IF( KR.EQ.2 ) ISPAN(i+1)= ISP2(i+1) IF( KR.EQ.3 ) ISPAN(i+1)= ISP3(i+1) n = ISPAN(i+1) C Loop= IPAS1(i+1) IF( KR.EQ.2 ) Loop= 2*IPAS2(i+1) IF( KR.EQ.3 ) Loop= 8*IPAS3(i+1) C Loop= 100*Loop Loop= 10*Loop C IF( n.LT.0 .OR. n.GT. 1001) GO TO 911 IF(Loop.LT.0 .OR. Loop.GT.60000) GO TO 911 c WRITE( ION,915) Loop c 915 FORMAT(1H1,///,37H **** Number loops per iteration = ,I8) 915 FORMAT(37H **** Number loops per iteration = ,I8) N1 = 1001 N2 = 101 N13 = 64 N13H= 32 N213= 96 N813= 512 N16 = 75 N416= 300 N21 = 25 C NT1= 17*1001 +13*101 +2*300 NT2= 4*512 + 3*25*101 +121*101 +3*64*64 C RETURN C C 911 WRITE( ION,913) i, n, Loop 913 FORMAT(1H1,///,37H FATAL OVERSTORE/ DATA LOSS. TEST= ,4I8) STOP C END C*********************************************** SUBROUTINE SORDID( i,W, V,n,KIND) C*********************************************** C QUICK AND DIRTY PORTABLE SORT. C C i - RESULT INDEX-LIST. MAPS V TO SORTED W. C W - RESULT ARRAY, SORTED V. C C V - INPUT ARRAY SORTED IN PLACE. C n - INPUT NUMBER OF ELEMENTS IN V C KIND - SORT ORDER: = 1 ASCENDING MAGNITUDE C = 2 DESCENDING MAGNITUDE C C*********************************************** IMPLICIT REAL*8(A-H,O-Z) C DIMENSION i(n), W(n), V(n) C DO 1 k= 1,n W(k)= V(k) 1 i(k)= k C IF( KIND.EQ.2) GO TO 4 C DO 3 j= 1,n-1 m= j DO 2 k= j+1,n IF( W(k).LT.W(m)) m= k 2 CONTINUE X= W(j) k= i(j) W(j)= W(m) i(j)= i(m) W(m)= X i(m)= k 3 CONTINUE RETURN C C 4 DO 6 j= 1,n-1 m= j DO 5 k= j+1,n IF( W(k).GT.W(m)) m= k 5 CONTINUE X= W(j) k= i(j) W(j)= W(m) i(j)= i(m) W(m)= X i(m)= k 6 CONTINUE RETURN END C C*********************************************** SUBROUTINE SPACE C*********************************************** C C Subroutine Space dynamically allocates physical memory space C for the array variables in KERNEL by setting pointer values. C The POINTER declaration has been defined in the IBM PL1 language C and defined as a Fortran extension in Livermore and CRAY compilers. C C In general, large FORTRAN simulation programs use a memory C manager to dynamically allocate arrays to conserve high speed C physical memory and thus avoid slow disk references (page faults). C C It is sufficient for our purposes to trivially set the values C of pointers to the location of static arrays used in common. C The efficiency of pointered (indirect) computation should be measured C if available. C C*********************************************** IMPLICIT REAL*8(A-H,O-Z) C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C INTEGER E,F,ZONE C COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(1001), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C C ****************************************************************** C C// COMMON /POINT/ ME,MF,MU,MV,MW,MX,MY,MZ,MG,MDU1,MDU2,MDU3,MGRD, C// 1 MDEX,MIX,MXI,MEX,MEX1,MDEX1,MVX,MXX,MIR,MRX,MRH,MVSP,MVSTP, C// 2 MVXNE,MVXND,MVE3,MVLR,MVLIN,MB5,MPLAN,MZONE,MD,MSA,MSB, C// 3 MP,MPX,MCX,MVY,MVH,MVF,MVG,MVS,MZA,MZP,MZQ,MZR,MZM,MZB,MZU, C// 4 MZV,MZZ,MB,MC,MH,MU1,MU2,MU3 C//C C//CLLL. LOC(X) =.LOC.X C//C C// ME = LOC( E ) C// MF = LOC( F ) C// MU = LOC( U ) C// MV = LOC( V ) C// MW = LOC( W ) C// MX = LOC( X ) C// MY = LOC( Y ) C// MZ = LOC( Z ) C// MG = LOC( G ) C// MDU1 = LOC( DU1 ) C// MDU2 = LOC( DU2 ) C// MDU3 = LOC( DU3 ) C// MGRD = LOC( GRD ) C// MDEX = LOC( DEX ) C// MIX = LOC( IX ) C// MXI = LOC( XI ) C// MEX = LOC( EX ) C// MEX1 = LOC( EX1 ) C// MDEX1 = LOC( DEX1 ) C// MVX = LOC( VX ) C// MXX = LOC( XX ) C// MIR = LOC( IR ) C// MRX = LOC( RX ) C// MRH = LOC( RH ) C// MVSP = LOC( VSP ) C// MVSTP = LOC( VSTP ) C// MVXNE = LOC( VXNE ) C// MVXND = LOC( VXND ) C// MVE3 = LOC( VE3 ) C// MVLR = LOC( VLR ) C// MVLIN = LOC( VLIN ) C// MB5 = LOC( B5 ) C// MPLAN = LOC( PLAN ) C// MZONE = LOC( ZONE ) C// MD = LOC( D ) C// MSA = LOC( SA ) C// MSB = LOC( SB ) C// MP = LOC( P ) C// MPX = LOC( PX ) C// MCX = LOC( CX ) C// MVY = LOC( VY ) C// MVH = LOC( VH ) C// MVF = LOC( VF ) C// MVG = LOC( VG ) C// MVS = LOC( VS ) C// MZA = LOC( ZA ) C// MZP = LOC( ZP ) C// MZQ = LOC( ZQ ) C// MZR = LOC( ZR ) C// MZM = LOC( ZM ) C// MZB = LOC( ZB ) C// MZU = LOC( ZU ) C// MZV = LOC( ZV ) C// MZZ = LOC( ZZ ) C// MB = LOC( B ) C// MC = LOC( C ) C// MH = LOC( H ) C// MU1 = LOC( U1 ) C// MU2 = LOC( U2 ) C// MU3 = LOC( U3 ) C RETURN END C*********************************************** SUBROUTINE STATS( RESULT, X,n) C*********************************************** C C UNWEIGHTED STATISTICS: MEAN, STADEV, MIN, MAX, HARMONIC MEAN. C C RESULT(1)= THE MEAN OF X. C RESULT(2)= THE STANDARD DEVIATION OF THE MEAN OF X. C RESULT(3)= THE MINIMUM OF X. C RESULT(4)= THE MAXIMUM OF X. C RESULT(5)= THE HARMONIC MEAN C X IS THE ARRAY OF INPUT VALUES. C n IS THE NUMBER OF INPUT VALUES IN X. C C*********************************************** IMPLICIT REAL*8(A-H,O-Z) C DIMENSION X(n), RESULT(9) cLLL. OPTIMIZE LEVEL G C C DO 10 k= 1,9 10 RESULT(k)= 0.0 C IF(n.LE.0) RETURN C CALCULATE MEAN OF X. S= 0.0 DO 1 k= 1,n 1 S= S + X(k) A= S/n RESULT(1)= A C CALCULATE STANDARD DEVIATION OF X. D= 0.0 DO 2 k= 1,n 2 D= D + (X(k)-A)**2 D= D/n RESULT(2)= SQRT(D) C CALCULATE MINIMUM OF X. U= X(1) DO 3 k= 2,n 3 U= DMIN1(U,X(k)) RESULT(3)= U C CALCULATE MAXIMUM OF X. V= X(1) DO 4 k= 2,n 4 V= DMAX1(V,X(k)) RESULT(4)= V C CALCULATE HARMONIC MEAN OF X. H= 0.0 DO 5 k= 1,n IF( X(k).NE.0.0) H= H + 1.0/X(k) 5 CONTINUE IF( H.NE.0.0) H= FLOAT(n)/H RESULT(5)= H C RETURN END C*********************************************** SUBROUTINE STATW( RESULT,OX,IX, X,W,n) C*********************************************** C C WEIGHTED STATISTICS: MEAN, STADEV, MIN, MAX, HARMONIC MEAN, MEDIAN. C C RESULT(1)= THE MEAN OF X. C RESULT(2)= THE STANDARD DEVIATION OF THE MEAN OF X. C RESULT(3)= THE MINIMUM OF X. C RESULT(4)= THE MAXIMUM OF X. C RESULT(5)= THE HARMONIC MEAN C RESULT(6)= THE TOTAL WEIGHT. C RESULT(7)= THE MEDIAN. C RESULT(8)= THE MEDIAN INDEX, ASENDING. C RESULT(9)= THE ROBUST MEDIAN ABSOLUTE DEVIATION. C OX IS THE ARRAY OF RESULT ORDERED (DECENDING) Xs. C IX IS THE ARRAY OF RESULT INDEX LIST MAPS X TO OX. C C X IS THE ARRAY OF INPUT VALUES. C W IS THE ARRAY OF INPUT WEIGHTS. C n IS THE NUMBER OF INPUT VALUES IN X. C C*********************************************** IMPLICIT REAL*8(A-H,O-Z) C DIMENSION RESULT(9), OX(n), IX(n), X(n), W(n) cLLL. OPTIMIZE LEVEL G C C DO 10 k= 1,9 10 RESULT(k)= 0.0 C IF(n.LE.0) RETURN C CALCULATE MEAN OF X. A= 0.0 S= 0.0 T= 0.0 DO 1 k= 1,n S= S + W(k)*X(k) 1 T= T + W(k) IF( T.NE.0.0) A= S/T RESULT(1)= A C CALCULATE STANDARD DEVIATION OF X. D= 0.0 U= 0.0 DO 2 k= 1,n 2 D= D + W(k)*(X(k)-A)**2 IF( T.NE.0.0) D= D/T IF( D.GE.0.0) U= SQRT(D) RESULT(2)= U C CALCULATE MINIMUM OF X. U= X(1) DO 3 k= 2,n 3 U= DMIN1(U,X(k)) RESULT(3)= U C CALCULATE MAXIMUM OF X. V= X(1) DO 4 k= 2,n 4 V= DMAX1(V,X(k)) RESULT(4)= V C CALCULATE HARMONIC MEAN OF X. H= 0.0 DO 5 k= 1,n IF( X(k).NE.0.0) H= H + W(k)/X(k) 5 CONTINUE IF( H.NE.0.0) H= T/H RESULT(5)= H RESULT(6)= T C CALCULATE WEIGHTED MEDIAN CALL SORDID( IX, OX, X, n, 1) C R= 0.0 DO 70 k= 1,n R= R + W( IX(k)) IF( R.GT. 0.5*T ) GO TO 7 70 CONTINUE k= n 7 RESULT(7)= OX(k) RESULT(8)= FLOAT(k) C CALCULATE ROBUST MEDIAN ABSOLUTE DEVIATION (MAD) DO 90 k= 1,n 90 OX(k)= ABS( X(k) - RESULT(7)) C CALL SORDID( IX, OX, OX, n, 1) C R= 0.0 DO 91 k= 1,n R= R + W( IX(k)) IF( R.GT. 0.7*T ) GO TO 9 91 CONTINUE k= n 9 RESULT(9)= OX(k) C CALCULATE DESCENDING ORDERED X. CALL SORDID( IX, OX, X, n, 2) C RETURN END C*********************************************** FUNCTION SUMO( V,n) C*********************************************** C C CHECK-SUM WITH ORDINAL DEPENDENCY. C C*********************************************** IMPLICIT REAL*8(A-H,O-Z) DIMENSION V(1) S= 0.0 C DO 1 k= 1,n 1 S= S + FLOAT(k)*V(k) SUMO= S RETURN END C*********************************************** SUBROUTINE TEST(i) C*********************************************** C C TIME, TEST, AND INITIALIZE THE EXECUTION OF KERNEL i. C C****************************************************************************** C IMPLICIT REAL*8(A-H,O-Z) C C SIZES test and set the loop controls before each kernel test C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C REAL LOOPS, NROPS, NN, NP INTEGER E,F,ZONE C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(1001), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C COMMON /SPACE3/ CACHE(8192) C REAL SECOND C DATA START /0.0/, NPF/0/ C$ DATA IPF/0/, KPF/0/ C C****************************************************************************** C t= second(0) := cumulative cpu time for task in seconds. C TEMPUS= SECOND(0.0) - START C$C 5 get number of page faults (optional) C$ KSTAT= LIB$STAT_TIMER(5,KPF) C$ NPF = KPF - IPF NN= n NP= Loop IF( i.EQ.0 ) GO TO 100 CALL SIZES(i-1) TIME(i)= TEMPUS C GO TO( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, . 21, 22, 23, 24, 25 ), i C C C C****************************************************************************** C 1 CSUM (1) = SUMO ( X, n) LOOPS(1) = NP*NN GO TO 100 C C****************************************************************************** C 2 CSUM (2) = SUMO ( X, n) LOOPS(2) = NP*(NN-4) GO TO 100 C C****************************************************************************** C 3 CSUM (3) = Q LOOPS(3) = NP*NN GO TO 100 C C****************************************************************************** C 4 MM= (1001-7)/2 DO 400 k = 7,1001,MM 400 V(k)= X(k) CSUM (4) = SUMO ( V, 3) LOOPS(4) = NP*(((NN-5)/5)+1)*3 GO TO 100 C C****************************************************************************** C 5 CSUM (5) = SUMO ( X(2), n-1) LOOPS(5) = NP*(NN-1) GO TO 100 C C****************************************************************************** C 6 CSUM (6) = SUMO ( W, n) LOOPS(6) = NP*NN*((NN-1)/2) GO TO 100 C C****************************************************************************** C 7 CSUM (7) = SUMO ( X, n) LOOPS(7) = NP*NN GO TO 100 C C****************************************************************************** C 8 CSUM (8) = SUMO ( U1,5*n*2) + SUMO ( U2,5*n*2) + SUMO ( U3,5*n*2) LOOPS(8) = NP*(NN-1)*2 GO TO 100 C C****************************************************************************** C 9 CSUM (9) = SUMO ( PX, 15*n) LOOPS(9) = NP*NN GO TO 100 C C****************************************************************************** C 10 CSUM (10) = SUMO ( PX, 15*n) LOOPS(10) = NP*NN GO TO 100 C C****************************************************************************** C 11 CSUM (11) = SUMO ( X(2), n-1) LOOPS(11) = NP*(NN-1) GO TO 100 C C****************************************************************************** C 12 CSUM (12) = SUMO ( X, n-1) LOOPS(12) = NP*(NN-1) GO TO 100 C C****************************************************************************** C 13 CSUM (13) = SUMO ( P, 8*n) + SUMO ( H, 8*n) LOOPS(13) = NP*NN GO TO 100 C C****************************************************************************** C 14 CSUM (14) = SUMO ( VX,n) + SUMO ( XX,n) + SUMO ( RH,67) LOOPS(14) = NP*NN GO TO 100 C C****************************************************************************** C 15 CSUM (15) = SUMO ( VY, n*7) + SUMO ( VS, n*7) LOOPS(15) = NP*(NN-1)*5 GO TO 100 C C****************************************************************************** C 16 CSUM (16) = FLOAT( k3+k2+j5+m) NROPS(16) = k2+k2+10*k3 LOOPS(16) = 1 GO TO 100 C C****************************************************************************** C 17 CSUM (17) = SUMO ( VXNE, n) + SUMO ( VXND, n) + XNM LOOPS(17) = NP*NN GO TO 100 C C****************************************************************************** C 18 CSUM (18) = SUMO ( ZR, n*7) + SUMO ( ZZ, n*7) LOOPS(18) = NP*(NN-1)*5 GO TO 100 C C****************************************************************************** C 19 CSUM (19) = SUMO ( B5, n) + STB5 LOOPS(19) = NP*NN GO TO 100 C C****************************************************************************** C 20 CSUM (20) = SUMO ( XX(2), n) LOOPS(20) = NP*NN GO TO 100 C C****************************************************************************** C 21 CSUM (21) = SUMO ( PX, n*n) LOOPS(21) = NP*NN*NN*NN GO TO 100 C C****************************************************************************** C 22 CSUM (22) = SUMO ( W, n) LOOPS(22) = NP*NN GO TO 100 C C****************************************************************************** C 23 CSUM (23) = SUMO ( ZA, n*7) LOOPS(23) = NP*(NN-1)*5 GO TO 100 C C****************************************************************************** C 24 CSUM (24) = FLOAT(m) LOOPS(24) = NP*(NN-1) GO TO 100 C C****************************************************************************** C 25 CONTINUE GO TO 100 C C****************************************************************************** C 100 CONTINUE C CALL SIZES (i) C C The following print can be used for stop-watch timing of each kernel. C You may have to increase the iteration count Loop in Subr. SIZES. C You must increase Loop if this test of clock resolution fails: C C j5 = 23456 IF( j5.EQ.23456) GO TO 120 IF( i.EQ.0 ) GO TO 114 TERR= 100.0 IF( TEMPUS.NE. 0.0) TERR= TERR*(TICKS/TEMPUS) IF( TERR .LT. 15.0) GO TO 114 WRITE( ION,113) I 113 FORMAT(/,1X,I2,45H INACCURATE TIMING OR ERROR. NEED LONGER RUN ) 114 continue if (0 .eq. 1) then WRITE ( ION,115) i, TEMPUS, TERR, NPF 115 FORMAT( 2X,i2,9H Done T= ,E11.4,8H T err= ,F8.2,1H% , 1 I8,13H Page-Faults ) else if ( i .eq. 12) then end if C 120 CALL VALUES(i) CALL SIZES (i) CALL SIGNAL( CACHE, 1.0, 0.0, 8192) IF( j5.EQ.23456) GO TO 150 C/ pause 150 CONTINUE C$C 5 get number of page faults (optional) C$ NSTAT= LIB$STAT_TIMER(5,IPF) START= SECOND(0.0) RETURN END C C*********************************************** FUNCTION TICK( LOGIO, ITER, TIC) C*********************************************** C C MEASURE TIME OVERHEAD OF SUBROUTINe TEST. C C*********************************************** C IMPLICIT REAL*8(A-H,O-Z) C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C REAL LOOPS, NROPS INTEGER E,F,ZONE C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(1001), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C C C DIMENSION TSEC(16), STAT(12), MAP(47) C ION= LOGIO KR = ITER n = 0 Loop = 0 k2 = 0 k3 = 0 j5 = 23456 m = 0 C C CALL TEST(0) CALL TEST(1) CALL TEST(2) CALL TEST(3) CALL TEST(4) CALL TEST(5) CALL TEST(6) CALL TEST(7) CALL TEST(8) CALL TEST(9) CALL TEST(10) CALL TEST(11) CALL TEST(12) CALL TEST(13) CALL TEST(14) CALL TEST(15) CALL TEST(16) j5 = 0 C DO 10 k= 1,15 10 TSEC(k)= TIME(k+1) C CALL VALID( TIME,MAP,neff, 1.0e-6, TSEC, 1.0e+4, 15) CALL STATS( STAT, TIME, neff) TICK= STAT(1) TICKS= DMAX1( TIC, STAT(1)) C DO 20 k= 1,47 TIME(k)= 0.0 CSUM(k)= 0.0 20 CONTINUE C IF( LOGIO.LT.0) GO TO 73 if(0.eq.0) goto 73 WRITE( LOGIO, 99) WRITE( LOGIO,100) WRITE( LOGIO,102) ( STAT(k), k= 1,4) CALL STATS( STAT,U,NT1) WRITE( LOGIO,104) ( STAT(k), k= 1,4) CALL STATS( STAT,P,NT2) WRITE( LOGIO,104) ( STAT(k), k= 1,4) C 73 RETURN C 99 FORMAT(//,17H CLOCK OVERHEAD: ) 100 FORMAT(/,14X,7HAVERAGE,8X,7HSTANDEV,8X,7HMINIMUM,8X,7HMAXIMUM ) 102 FORMAT(/,1X,5H TICK,4E15.6) 104 FORMAT(/,1X,5H DATA,4E15.6) END C C*********************************************** SUBROUTINE VALID( VX,MAP,L, BL,X,BU,n ) C*********************************************** C C Compress valid data sets; form compression list. C C C VX - ARRAY OF RESULT COMPRESSED Xs. C MAP - ARRAY OF RESULT COMPRESSION INDICES C L - RESULT COMPRESSED LENGTH OF VX, MAP C - C BL - INPUT LOWER BOUND FOR VX C X - ARRAY OF INPUT VALUES. C BU - INPUT UPPER BOUND FOR VX C n - NUMBER OF INPUT VALUES IN X. C C*********************************************** IMPLICIT REAL*8(A-H,O-Z) C DIMENSION VX(n), MAP(n), X(n) CLLL. OPTIMIZE LEVEL G C m= 0 DO 1 k= 1,n IF( X(k).LE. BL .OR. X(k).GE. BU ) GO TO 1 m= m + 1 MAP(m)= k VX(m)= X(k) 1 CONTINUE C L= m RETURN END C C*********************************************** SUBROUTINE VALUES(i) C*********************************************** C IMPLICIT REAL*8(A-H,O-Z) C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C REAL LOOPS, NROPS COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C INTEGER E,F,ZONE C COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(1001), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C C C ****************************************************************** C IP1= i+1 C CALL VECTOR( i) C 13 IF( IP1.NE.13 ) GO TO 14 S= 1.0 DO 205 j= 1,4 DO 205 k= 1,512 P(j,k) = S S= S + 0.5 205 CONTINUE C DO 210 j= 1,96 E(j) = 1 F(j) = 1 210 CONTINUE C 14 IF( IP1.NE.14) GO TO 16 ONE9TH= 1./9. DO 215 j= 1,1001 JOKE= j*ONE9TH IF( JOKE.LT.1 ) JOKE= 1 EX(j) = JOKE RH(JOKE) = JOKE DEX(JOKE) = JOKE 215 CONTINUE C DO 220 j= 1,1001 VX(j) = 0.001 XX(j) = 2.001 GRD(j) = 3 + ONE9TH 220 CONTINUE C 16 IF( IP1.NE.16 ) GO TO 73 CONDITIONS: MC= 2 MR= n II= MR/3 LB= II+II D(1)= 1.01980486428764 DO 400 k= 2,300 400 D(k)= D(k-1) +1.0E-4/D(k-1) R= D(MR) DO 403 L= 1,MC m= (MR+MR)*(L-1) DO 401 j= 1,2 DO 401 k= 1,MR m= m+1 S= FLOAT(k) PLAN(m)= R*((S+1.0)/S) 401 ZONE(m)= k+k 403 CONTINUE k= MR+MR+1 ZONE(k)= MR S= D(MR-1) T= D(MR-2) C 73 CONTINUE RETURN END C C*********************************************** SUBROUTINE VECTOR(i) C*********************************************** C IMPLICIT REAL*8(A-H,O-Z) C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C/C C/ parameter( NN0= 39 ) C/ parameter( NNI= 2*l1 +2*l213 +l416 ) C/ parameter( NN1= 17*l1 +13*l2 +2*l416 ) C/ parameter( NN2= 4*512 + 3*25*101 +121*101 +3*64*64 ) C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11(39) C/ COMMON /SPACER/ A11(NN0) C REAL LOOPS, NROPS COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C C/ COMMON /SPACE1/ U(NN1) C/ COMMON /SPACE2/ P(NN2) C/C COMMON /SPACE1/ U(18930) COMMON /SPACE2/ P(34132) C IP1= i+1 NT0= 39 C CALL SIGNAL( U, SKALE(IP1), BIAS(IP1), NT1) CALL SIGNAL( P, SKALE(IP1), BIAS(IP1), NT2) CALL SIGNAL(A11, SKALE(IP1), BIAS(IP1), NT0) C RETURN C END End of 13 echo 14 1>&2 cat >14 <<'End of 14' Date: Mon, 17 Nov 86 11:43:41 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file linpacks.f for FX/1 library Status: RO program linpacks real aa(200,200),a(201,200),b(200),x(200) real time(8,6),cray,ops,total,norma,normx real resid,residn,eps,epslon integer ipvt(200) lda = 201 ldaa = 200 c n = 100 cray = .056 write(6,1) 1 format(' Please send the results of this run to:'// $ ' Jack J. Dongarra'/ $ ' Mathematics and Computer Science Division'/ $ ' Argonne National Laboratory'/ $ ' Argonne, Illinois 60439'// $ ' Telephone: 312-972-7246'// $ ' ARPAnet: DONGARRA@ANL-MCS'/) ops = (2.0e0*n**3)/3.0e0 + 2.0e0*n**2 c call matgen(a,lda,n,b,norma) t1 = second() call sgefa(a,lda,n,ipvt,info) time(1,1) = second() - t1 t1 = second() call sgesl(a,lda,n,ipvt,b,0) time(1,2) = second() - t1 total = time(1,1) + time(1,2) C C COMPUTE A RESIDUAL TO VERIFY RESULTS. C do 10 i = 1,n x(i) = b(i) 10 continue call matgen(a,lda,n,b,norma) do 20 i = 1,n b(i) = -b(i) 20 continue CALL SMXPY(n,b,n,lda,x,a) RESID = 0.0 NORMX = 0.0 DO 30 I = 1,N RESID = amax1( RESID, ABS(b(i)) ) NORMX = amax1( NORMX, ABS(X(I)) ) 30 CONTINUE eps = epslon(1.0) RESIDn = RESID/( N*NORMA*NORMX*EPS ) write(6,40) 40 format(' norm. resid resid machep', $ ' x(1) x(n)') write(6,50) residn,resid,eps,x(1),x(n) 50 format(1p5e16.8) c write(6,60) n 60 format(//' times are reported for matrices of order ',i5) write(6,70) 70 format(6x,'sgefa',6x,'sgesl',6x,'total',5x,'mflops',7x,'unit', $ 6x,'ratio') c time(1,3) = total time(1,4) = ops/(1.0e6*total) time(1,5) = 2.0e0/time(1,4) time(1,6) = total/cray write(6,80) lda 80 format(' times for array with leading dimension of',i4) write(6,110) (time(1,i),i=1,6) c call matgen(a,lda,n,b,norma) t1 = second() call sgefa(a,lda,n,ipvt,info) time(2,1) = second() - t1 t1 = second() call sgesl(a,lda,n,ipvt,b,0) time(2,2) = second() - t1 total = time(2,1) + time(2,2) time(2,3) = total time(2,4) = ops/(1.0e6*total) time(2,5) = 2.0e0/time(2,4) time(2,6) = total/cray c call matgen(a,lda,n,b,norma) t1 = second() call sgefa(a,lda,n,ipvt,info) time(3,1) = second() - t1 t1 = second() call sgesl(a,lda,n,ipvt,b,0) time(3,2) = second() - t1 total = time(3,1) + time(3,2) time(3,3) = total time(3,4) = ops/(1.0e6*total) time(3,5) = 2.0e0/time(3,4) time(3,6) = total/cray c ntimes = 10 tm2 = 0 t1 = second() do 90 i = 1,ntimes tm = second() call matgen(a,lda,n,b,norma) tm2 = tm2 + second() - tm call sgefa(a,lda,n,ipvt,info) 90 continue time(4,1) = (second() - t1 - tm2)/ntimes t1 = second() do 100 i = 1,ntimes call sgesl(a,lda,n,ipvt,b,0) 100 continue time(4,2) = (second() - t1)/ntimes total = time(4,1) + time(4,2) time(4,3) = total time(4,4) = ops/(1.0e6*total) time(4,5) = 2.0e0/time(4,4) time(4,6) = total/cray c write(6,110) (time(2,i),i=1,6) write(6,110) (time(3,i),i=1,6) write(6,110) (time(4,i),i=1,6) 110 format(6(1pe11.3)) c call matgen(aa,ldaa,n,b,norma) t1 = second() call sgefa(aa,ldaa,n,ipvt,info) time(5,1) = second() - t1 t1 = second() call sgesl(aa,ldaa,n,ipvt,b,0) time(5,2) = second() - t1 total = time(5,1) + time(5,2) time(5,3) = total time(5,4) = ops/(1.0e6*total) time(5,5) = 2.0e0/time(5,4) time(5,6) = total/cray c call matgen(aa,ldaa,n,b,norma) t1 = second() call sgefa(aa,ldaa,n,ipvt,info) time(6,1) = second() - t1 t1 = second() call sgesl(aa,ldaa,n,ipvt,b,0) time(6,2) = second() - t1 total = time(6,1) + time(6,2) time(6,3) = total time(6,4) = ops/(1.0e6*total) time(6,5) = 2.0e0/time(6,4) time(6,6) = total/cray c call matgen(aa,ldaa,n,b,norma) t1 = second() call sgefa(aa,ldaa,n,ipvt,info) time(7,1) = second() - t1 t1 = second() call sgesl(aa,ldaa,n,ipvt,b,0) time(7,2) = second() - t1 total = time(7,1) + time(7,2) time(7,3) = total time(7,4) = ops/(1.0e6*total) time(7,5) = 2.0e0/time(7,4) time(7,6) = total/cray c ntimes = 10 tm2 = 0 t1 = second() do 120 i = 1,ntimes tm = second() call matgen(aa,ldaa,n,b,norma) tm2 = tm2 + second() - tm call sgefa(aa,ldaa,n,ipvt,info) 120 continue time(8,1) = (second() - t1 - tm2)/ntimes t1 = second() do 130 i = 1,ntimes call sgesl(aa,ldaa,n,ipvt,b,0) 130 continue time(8,2) = (second() - t1)/ntimes total = time(8,1) + time(8,2) time(8,3) = total time(8,4) = ops/(1.0e6*total) time(8,5) = 2.0e0/time(8,4) time(8,6) = total/cray c write(6,140) ldaa 140 format(/' times for array with leading dimension of',i4) write(6,110) (time(5,i),i=1,6) write(6,110) (time(6,i),i=1,6) write(6,110) (time(7,i),i=1,6) write(6,110) (time(8,i),i=1,6) stop end subroutine matgen(a,lda,n,b,norma) real a(lda,1),b(1),norma c init = 1325 norma = 0.0 do 30 j = 1,n do 20 i = 1,n init = mod(3125*init,65536) a(i,j) = (init - 32768.0)/16384.0 norma = amax1(a(i,j), norma) 20 continue 30 continue do 35 i = 1,n b(i) = 0.0 35 continue do 50 j = 1,n do 40 i = 1,n b(i) = b(i) + a(i,j) 40 continue 50 continue return end subroutine sgefa(a,lda,n,ipvt,info) integer lda,n,ipvt(1),info real a(lda,1) c c sgefa factors a real matrix by gaussian elimination. c c sgefa is usually called by dgeco, but it can be called c directly with a saving in time if rcond is not needed. c (time for dgeco) = (1 + 9/n)*(time for sgefa) . c c on entry c c a real(lda, n) c the matrix to be factored. c c lda integer c the leading dimension of the array a . c c n integer c the order of the matrix a . c c on return c c a an upper triangular matrix and the multipliers c which were used to obtain it. c the factorization can be written a = l*u where c l is a product of permutation and unit lower c triangular matrices and u is upper triangular. c c ipvt integer(n) c an integer vector of pivot indices. c c info integer c = 0 normal value. c = k if u(k,k) .eq. 0.0 . this is not an error c condition for this subroutine, but it does c indicate that sgesl or dgedi will divide by zero c if called. use rcond in dgeco for a reliable c indication of singularity. c c linpack. this version dated 08/14/78 . c cleve moler, university of new mexico, argonne national lab. c c subroutines and functions c c blas saxpy,sscal,isamax c c internal variables c real t integer isamax,j,k,kp1,l,nm1 c c c gaussian elimination with partial pivoting c info = 0 nm1 = n - 1 if (nm1 .lt. 1) go to 70 do 60 k = 1, nm1 kp1 = k + 1 c c find l = pivot index c l = isamax(n-k+1,a(k,k),1) + k - 1 ipvt(k) = l c c zero pivot implies this column already triangularized c if (a(l,k) .eq. 0.0e0) go to 40 c c interchange if necessary c if (l .eq. k) go to 10 t = a(l,k) a(l,k) = a(k,k) a(k,k) = t 10 continue c c compute multipliers c t = -1.0e0/a(k,k) call sscal(n-k,t,a(k+1,k),1) c c row elimination with column indexing c do 30 j = kp1, n t = a(l,j) if (l .eq. k) go to 20 a(l,j) = a(k,j) a(k,j) = t 20 continue call saxpy(n-k,t,a(k+1,k),1,a(k+1,j),1) 30 continue go to 50 40 continue info = k 50 continue 60 continue 70 continue ipvt(n) = n if (a(n,n) .eq. 0.0e0) info = n return end subroutine sgesl(a,lda,n,ipvt,b,job) integer lda,n,ipvt(1),job real a(lda,1),b(1) c c sgesl solves the real system c a * x = b or trans(a) * x = b c using the factors computed by dgeco or sgefa. c c on entry c c a real(lda, n) c the output from dgeco or sgefa. c c lda integer c the leading dimension of the array a . c c n integer c the order of the matrix a . c c ipvt integer(n) c the pivot vector from dgeco or sgefa. c c b real(n) c the right hand side vector. c c job integer c = 0 to solve a*x = b , c = nonzero to solve trans(a)*x = b where c trans(a) is the transpose. c c on return c c b the solution vector x . c c error condition c c a division by zero will occur if the input factor contains a c zero on the diagonal. technically this indicates singularity c but it is often caused by improper arguments or improper c setting of lda . it will not occur if the subroutines are c called correctly and if dgeco has set rcond .gt. 0.0 c or sgefa has set info .eq. 0 . c c to compute inverse(a) * c where c is a matrix c with p columns c call dgeco(a,lda,n,ipvt,rcond,z) c if (rcond is too small) go to ... c do 10 j = 1, p c call sgesl(a,lda,n,ipvt,c(1,j),0) c 10 continue c c linpack. this version dated 08/14/78 . c cleve moler, university of new mexico, argonne national lab. c c subroutines and functions c c blas saxpy,sdot c c internal variables c real sdot,t integer k,kb,l,nm1 c nm1 = n - 1 if (job .ne. 0) go to 50 c c job = 0 , solve a * x = b c first solve l*y = b c if (nm1 .lt. 1) go to 30 do 20 k = 1, nm1 l = ipvt(k) t = b(l) if (l .eq. k) go to 10 b(l) = b(k) b(k) = t 10 continue call saxpy(n-k,t,a(k+1,k),1,b(k+1),1) 20 continue 30 continue c c now solve u*x = y c do 40 kb = 1, n k = n + 1 - kb b(k) = b(k)/a(k,k) t = -b(k) call saxpy(k-1,t,a(1,k),1,b(1),1) 40 continue go to 100 50 continue c c job = nonzero, solve trans(a) * x = b c first solve trans(u)*y = b c do 60 k = 1, n t = sdot(k-1,a(1,k),1,b(1),1) b(k) = (b(k) - t)/a(k,k) 60 continue c c now solve trans(l)*x = y c if (nm1 .lt. 1) go to 90 do 80 kb = 1, nm1 k = n - kb b(k) = b(k) + sdot(n-k,a(k+1,k),1,b(k+1),1) l = ipvt(k) if (l .eq. k) go to 70 t = b(l) b(l) = b(k) b(k) = t 70 continue 80 continue 90 continue 100 continue return end subroutine saxpy(n,da,dx,incx,dy,incy) c c constant times a vector plus a vector. c jack dongarra, linpack, 3/11/78. c real dx(1),dy(1),da integer i,incx,incy,ix,iy,m,mp1,n c if(n.le.0)return if (da .eq. 0.0e0) return if(incx.eq.1.and.incy.eq.1)go to 20 c c code for unequal increments or equal increments c not equal to 1 c ix = 1 iy = 1 if(incx.lt.0)ix = (-n+1)*incx + 1 if(incy.lt.0)iy = (-n+1)*incy + 1 do 10 i = 1,n dy(iy) = dy(iy) + da*dx(ix) ix = ix + incx iy = iy + incy 10 continue return c c code for both increments equal to 1 c 20 continue do 30 i = 1,n dy(i) = dy(i) + da*dx(i) 30 continue return end real function sdot(n,dx,incx,dy,incy) c c forms the dot product of two vectors. c jack dongarra, linpack, 3/11/78. c real dx(1),dy(1),dtemp integer i,incx,incy,ix,iy,m,mp1,n c sdot = 0.0e0 dtemp = 0.0e0 if(n.le.0)return if(incx.eq.1.and.incy.eq.1)go to 20 c c code for unequal increments or equal increments c not equal to 1 c ix = 1 iy = 1 if(incx.lt.0)ix = (-n+1)*incx + 1 if(incy.lt.0)iy = (-n+1)*incy + 1 do 10 i = 1,n dtemp = dtemp + dx(ix)*dy(iy) ix = ix + incx iy = iy + incy 10 continue sdot = dtemp return c c code for both increments equal to 1 c 20 continue do 30 i = 1,n dtemp = dtemp + dx(i)*dy(i) 30 continue sdot = dtemp return end subroutine sscal(n,da,dx,incx) c c scales a vector by a constant. c jack dongarra, linpack, 3/11/78. c real da,dx(1) integer i,incx,m,mp1,n,nincx c if(n.le.0)return if(incx.eq.1)go to 20 c c code for increment not equal to 1 c nincx = n*incx do 10 i = 1,nincx,incx dx(i) = da*dx(i) 10 continue return c c code for increment equal to 1 c 20 continue do 30 i = 1,n dx(i) = da*dx(i) 30 continue return end integer function isamax(n,dx,incx) c c finds the index of element having max. absolute value. c jack dongarra, linpack, 3/11/78. c real dx(1),dmax integer i,incx,ix,n c isamax = 0 if( n .lt. 1 ) return isamax = 1 if(n.eq.1)return if(incx.eq.1)go to 20 c c code for increment not equal to 1 c ix = 1 dmax = abs(dx(1)) ix = ix + incx do 10 i = 2,n if(abs(dx(ix)).le.dmax) go to 5 isamax = i dmax = abs(dx(ix)) 5 ix = ix + incx 10 continue return c c code for increment equal to 1 c 20 dmax = abs(dx(1)) do 30 i = 2,n if(abs(dx(i)).le.dmax) go to 30 isamax = i dmax = abs(dx(i)) 30 continue return end REAL FUNCTION EPSLON (X) REAL X C C ESTIMATE UNIT ROUNDOFF IN QUANTITIES OF SIZE X. C REAL A,B,C,EPS C C THIS PROGRAM SHOULD FUNCTION PROPERLY ON ALL SYSTEMS C SATISFYING THE FOLLOWING TWO ASSUMPTIONS, C 1. THE BASE USED IN REPRESENTING FLOATING POINT C NUMBERS IS NOT A POWER OF THREE. C 2. THE QUANTITY A IN STATEMENT 10 IS REPRESENTED TO C THE ACCURACY USED IN FLOATING POINT VARIABLES C THAT ARE STORED IN MEMORY. C THE STATEMENT NUMBER 10 AND THE GO TO 10 ARE INTENDED TO C FORCE OPTIMIZING COMPILERS TO GENERATE CODE SATISFYING C ASSUMPTION 2. C UNDER THESE ASSUMPTIONS, IT SHOULD BE TRUE THAT, C A IS NOT EXACTLY EQUAL TO FOUR-THIRDS, C B HAS A ZERO FOR ITS LAST BIT OR DIGIT, C C IS NOT EXACTLY EQUAL TO ONE, C EPS MEASURES THE SEPARATION OF 1.0 FROM C THE NEXT LARGER FLOATING POINT NUMBER. C THE DEVELOPERS OF EISPACK WOULD APPRECIATE BEING INFORMED C ABOUT ANY SYSTEMS WHERE THESE ASSUMPTIONS DO NOT HOLD. C C ***************************************************************** C THIS ROUTINE IS ONE OF THE AUXILIARY ROUTINES USED BY EISPACK III C TO AVOID MACHINE DEPENDENCIES. C ***************************************************************** C C THIS VERSION DATED 4/6/83. C A = 4.0E0/3.0E0 10 B = A - 1.0E0 C = B + B + B EPS = ABS(C-1.0E0) IF (EPS .EQ. 0.0E0) GO TO 10 EPSLON = EPS*ABS(X) RETURN END SUBROUTINE MM (A, LDA, N1, N3, B, LDB, N2, C, LDC) REAL A(LDA,*), B(LDB,*), C(LDC,*) C C PURPOSE: C MULTIPLY MATRIX B TIMES MATRIX C AND STORE THE RESULT IN MATRIX A. C C PARAMETERS: C C A REAL(LDA,N3), MATRIX OF N1 ROWS AND N3 COLUMNS C C LDA INTEGER, LEADING DIMENSION OF ARRAY A C C N1 INTEGER, NUMBER OF ROWS IN MATRICES A AND B C C N3 INTEGER, NUMBER OF COLUMNS IN MATRICES A AND C C C B REAL(LDB,N2), MATRIX OF N1 ROWS AND N2 COLUMNS C C LDB INTEGER, LEADING DIMENSION OF ARRAY B C C N2 INTEGER, NUMBER OF COLUMNS IN MATRIX B, AND NUMBER OF ROWS IN C MATRIX C C C C REAL(LDC,N3), MATRIX OF N2 ROWS AND N3 COLUMNS C C LDC INTEGER, LEADING DIMENSION OF ARRAY C C C ---------------------------------------------------------------------- C DO 20 J = 1, N3 DO 10 I = 1, N1 A(I,J) = 0.0 10 CONTINUE CALL SMXPY (N2,A(1,J),N1,LDB,C(1,J),B) 20 CONTINUE C RETURN END SUBROUTINE SMXPY (N1, Y, N2, LDM, X, M) REAL Y(*), X(*), M(LDM,*) C C PURPOSE: C MULTIPLY MATRIX M TIMES VECTOR X AND ADD THE RESULT TO VECTOR Y. C C PARAMETERS: C C N1 INTEGER, NUMBER OF ELEMENTS IN VECTOR Y, AND NUMBER OF ROWS IN C MATRIX M C C Y REAL(N1), VECTOR OF LENGTH N1 TO WHICH IS ADDED THE PRODUCT M*X C C N2 INTEGER, NUMBER OF ELEMENTS IN VECTOR X, AND NUMBER OF COLUMNS C IN MATRIX M C C LDM INTEGER, LEADING DIMENSION OF ARRAY M C C X REAL(N2), VECTOR OF LENGTH N2 C C M REAL(LDM,N2), MATRIX OF N1 ROWS AND N2 COLUMNS C C ---------------------------------------------------------------------- C C CLEANUP ODD VECTOR C J = MOD(N2,2) IF (J .GE. 1) THEN DO 10 I = 1, N1 Y(I) = (Y(I)) + X(J)*M(I,J) 10 CONTINUE ENDIF C C CLEANUP ODD GROUP OF TWO VECTORS C J = MOD(N2,4) IF (J .GE. 2) THEN DO 20 I = 1, N1 Y(I) = ( (Y(I)) $ + X(J-1)*M(I,J-1)) + X(J)*M(I,J) 20 CONTINUE ENDIF C C CLEANUP ODD GROUP OF FOUR VECTORS C J = MOD(N2,8) IF (J .GE. 4) THEN DO 30 I = 1, N1 Y(I) = ((( (Y(I)) $ + X(J-3)*M(I,J-3)) + X(J-2)*M(I,J-2)) $ + X(J-1)*M(I,J-1)) + X(J) *M(I,J) 30 CONTINUE ENDIF C C CLEANUP ODD GROUP OF EIGHT VECTORS C J = MOD(N2,16) IF (J .GE. 8) THEN DO 40 I = 1, N1 Y(I) = ((((((( (Y(I)) $ + X(J-7)*M(I,J-7)) + X(J-6)*M(I,J-6)) $ + X(J-5)*M(I,J-5)) + X(J-4)*M(I,J-4)) $ + X(J-3)*M(I,J-3)) + X(J-2)*M(I,J-2)) $ + X(J-1)*M(I,J-1)) + X(J) *M(I,J) 40 CONTINUE ENDIF C C MAIN LOOP - GROUPS OF SIXTEEN VECTORS C JMIN = J+16 DO 60 J = JMIN, N2, 16 DO 50 I = 1, N1 Y(I) = ((((((((((((((( (Y(I)) $ + X(J-15)*M(I,J-15)) + X(J-14)*M(I,J-14)) $ + X(J-13)*M(I,J-13)) + X(J-12)*M(I,J-12)) $ + X(J-11)*M(I,J-11)) + X(J-10)*M(I,J-10)) $ + X(J- 9)*M(I,J- 9)) + X(J- 8)*M(I,J- 8)) $ + X(J- 7)*M(I,J- 7)) + X(J- 6)*M(I,J- 6)) $ + X(J- 5)*M(I,J- 5)) + X(J- 4)*M(I,J- 4)) $ + X(J- 3)*M(I,J- 3)) + X(J- 2)*M(I,J- 2)) $ + X(J- 1)*M(I,J- 1)) + X(J) *M(I,J) 50 CONTINUE 60 CONTINUE RETURN END C C Timing routine for the Alliant FX/1 and FX/8 C Return cumulative CPU time used by process C REAL FUNCTION SECOND () REAL STIME,UTIME CALL ETIME(UTIME,STIME) SECOND=UTIME RETURN END End of 14 echo 15 1>&2 cat >15 <<'End of 15' Date: Mon, 17 Nov 86 11:50:38 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file whetstoned.bld for FX/1 library Status: RO ### ### Command file to compile and bind the double precision ### version of the Whetstone benchmark for the DSP9010 (Alliant FX/1). ### rm whetstoned fortran -o whetstoned -Og whetstoned.f rm whetstoned.o End of 15 echo 16 1>&2 cat >16 <<'End of 16' Date: Mon, 17 Nov 86 11:52:06 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file whetstones.bld for FX/1 library Status: RO ### ### Command file to compile and bind the single precision ### version of the Whetstone benchmark for the DSP9010 (Alliant FX/1). ### rm whetstones fortran -o whetstones -Og whetstones.f rm whetstones.o End of 16 echo 17 1>&2 cat >17 <<'End of 17' Date: Mon, 17 Nov 86 11:51:17 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file whetstoned.f for FX/1 library Status: RO C********************************************************************** C Benchmark #2 -- Double Precision Whetstone (A001) C C o This is a REAL*8 version of C the Whetstone benchmark program. C C o DO-loop semantics are ANSI-66 compatible. C C o Final measurements are to be made with all C WRITE statements and FORMAT sttements removed. C C********************************************************************** IMPLICIT REAL*8 (A-H,O-Z) C COMMON T,T1,T2,E1(4),J,K,L common/ptime/ptime,time0 real time0,time1,cputime,ptime C WRITE(6,1) 1 FORMAT(/' Benchmark #2 -- Double Precision Whetstone (A001)') C C Start benchmark timing at this point. C time0 = cputime() ptime = time0 C C The actual benchmark starts here. C T = .499975 T1 = 0.50025 T2 = 2.0 C C With loopcount LOOP=10, one million Whetstone instructions C will be executed in EACH MAJOR LOOP..A MAJOR LOOP IS EXECUTED C 'II' TIMES TO INCREASE WALL-CLOCK TIMING ACCURACY. C LOOP = 1000 II = 1 C DO 500 JJ=1,II C C Establish the relative loop counts of each module. C N1 = 0 N2 = 12 * LOOP N3 = 14 * LOOP N4 = 345 * LOOP N5 = 0 N6 = 210 * LOOP N7 = 32 * LOOP N8 = 899 * LOOP N9 = 616 * LOOP N10 = 0 N11 = 93 * LOOP C C Module 1: Simple identifiers C X1 = 1.0 X2 = -1.0 X3 = -1.0 X4 = -1.0 C IF (N1.EQ.0) GO TO 35 DO 30 I=1,N1 X1 = (X1 + X2 + X3 - X4)*T X2 = (X1 + X2 - X3 + X4)*T X3 = (X1 - X2 + X3 + X4)*T X4 = (-X1 + X2 + X3 + X4)*T 30 CONTINUE 35 CONTINUE C IF (JJ.EQ.II)CALL POUT(N1,N1,N1,X1,X2,X3,X4) C C Module 2: Array elements C E1(1) = 1.0 E1(2) = -1.0 E1(3) = -1.0 E1(4) = -1.0 C IF (N2.EQ.0) GO TO 45 DO 40 I=1,N2 E1(1) = (E1(1) + E1(2) + E1(3) - E1(4))*T E1(2) = (E1(1) + E1(2) - E1(3) + E1(4))*T E1(3) = (E1(1) - E1(2) + E1(3) + E1(4))*T E1(4) = (-E1(1) + E1(2) + E1(3) + E1(4))*T 40 CONTINUE 45 CONTINUE C IF (JJ.EQ.II)CALL POUT(N2,N3,N2,E1(1),E1(2),E1(3),E1(4)) C C Module 3: Array as parameter C IF (N3.EQ.0) GO TO 59 DO 50 I=1,N3 CALL PA(E1) 50 CONTINUE 59 CONTINUE C IF (JJ.EQ.II)CALL POUT(N3,N2,N2,E1(1),E1(2),E1(3),E1(4)) C C Module 4: Conditional jumps C J = 1 IF (N4.EQ.0) GO TO 65 DO 60 I=1,N4 IF (J.EQ.1) GO TO 51 J = 3 GO TO 52 51 J = 2 52 IF (J.gt.2) GO TO 53 J = 1 GO TO 54 53 J = 0 54 IF (J.LT.1) GO TO 55 J = 0 GO TO 60 55 J = 1 60 CONTINUE 65 CONTINUE C IF (JJ.EQ.II)CALL POUT(N4,J,J,X1,X2,X3,X4) C C Module 5: Omitted C Module 6: Integer arithmetic C J = 1 K = 2 L = 3 C IF (N6.EQ.0) GO TO 75 DO 70 I=1,N6 J = J * (K-J) * (L-K) K = L * K - (L-J) * K L = (L - K) * (K + J) E1(L-1) = J + K + L E1(K-1) = J * K * L 70 CONTINUE 75 CONTINUE C IF (JJ.EQ.II)CALL POUT(N6,J,K,E1(1),E1(2),E1(3),E1(4)) C C Module 7: Trigonometric functions C X = 0.5 Y = 0.5 C IF (N7.EQ.0) GO TO 85 DO 80 I=1,N7 X=T*ATAN(T2*SIN(X)*COS(X)/(COS(X+Y)+COS(X-Y)-1.0)) Y=T*ATAN(T2*SIN(Y)*COS(Y)/(COS(X+Y)+COS(X-Y)-1.0)) 80 CONTINUE 85 CONTINUE C IF (JJ.EQ.II)CALL POUT(N7,J,K,X,X,Y,Y) C C Module 8: Procedure calls C X = 1.0 Y = 1.0 Z = 1.0 C IF (N8.EQ.0) GO TO 95 DO 90 I=1,N8 CALL P3(X,Y,Z) 90 CONTINUE 95 CONTINUE C IF (JJ.EQ.II)CALL POUT(N8,J,K,X,Y,Z,Z) C C Module 9: Array references C J = 1 K = 2 L = 3 E1(1) = 1.0 E1(2) = 2.0 E1(3) = 3.0 C IF (N9.EQ.0) GO TO 105 DO 100 I=1,N9 CALL P0 100 CONTINUE 105 CONTINUE C IF (JJ.EQ.II)CALL POUT(N9,J,K,E1(1),E1(2),E1(3),E1(4)) C C Module 10: Integer arithmetic C J = 2 K = 3 C IF (N10.EQ.0) GO TO 115 DO 110 I=1,N10 J = J + K K = J + K J = K - J K = K - J - J 110 CONTINUE 115 CONTINUE C IF (JJ.EQ.II)CALL POUT(N10,J,K,X1,X2,X3,X4) C C Module 11: Standard functions C X = 0.75 C IF (N11.EQ.0) GO TO 125 DO 120 I=1,N11 X = DSQRT(DEXP(DLOG(X)/T1)) 120 CONTINUE 125 CONTINUE C IF (JJ.EQ.II)CALL POUT(N11,J,K,X,X,X,X) C C THIS IS THE END OF THE MAJOR LOOP. C 500 CONTINUE C C Stop benchmark timing at this point. C time1 = cputime() C---------------------------------------------------------------- C Performance in Whetstone KIP's per second is given by C C (100*LOOP*II)/TIME C C where TIME is in seconds. C-------------------------------------------------------------------- print *,' Double Whetstone KIPS ', &nint((100*LOOP*II)/(time1-time0)) END C SUBROUTINE PA(E) IMPLICIT REAL*8 (A-H,O-Z) DIMENSION E(4) COMMON T,T1,T2,E1(4),J,K,L J1 = 0 10 E(1) = (E(1) + E(2) + E(3) - E(4)) * T E(2) = (E(1) + E(2) - E(3) + E(4)) * T E(3) = (E(1) - E(2) + E(3) + E(4)) * T E(4) = (-E(1) + E(2) + E(3) + E(4)) / T2 J1 = J1 + 1 IF (J1 - 6) 10,20,20 C 20 RETURN END C SUBROUTINE P0 IMPLICIT REAL*8 (A-H,O-Z) COMMON T,T1,T2,E1(4),J,K,L E1(J) = E1(K) E1(K) = E1(L) E1(L) = E1(J) RETURN END C SUBROUTINE P3(X,Y,Z) IMPLICIT REAL*8 (A-H,O-Z) COMMON T,T1,T2,E1(4),J,K,L X1 = X Y1 = Y X1 = T * (X1 + Y1) Y1 = T * (X1 + Y1) Z = (X1 + Y1) / T2 RETURN END C SUBROUTINE POUT(N,J,K,X1,X2,X3,X4) IMPLICIT REAL*8 (A-H,O-Z) common/ptime/ptime,time0 real ptime,time1,time0,cputime time1 = cputime() print 10, nint(time1-time0),nint(time1-ptime),N,J,K,X1,X2,X3,X4 10 FORMAT (1X,2i3,1X,3I7,4(1PE12.4)) ptime = time1 RETURN END C REAL FUNCTION CPUTIME () REAL STIME,UTIME CALL ETIME(UTIME,STIME) CPUTIME=UTIME RETURN END End of 17 echo 18 1>&2 cat >18 <<'End of 18' Date: Mon, 17 Nov 86 11:52:58 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file whetstones.f for FX/1 library Status: RO C********************************************************************** C Benchmark #2 -- Single Precision Whetstone (A001) C C o This is a REAL*4 version of C the Whetstone benchmark program. C C o DO-loop semantics are ANSI-66 compatible. C C o Final measurements are to be made with all C WRITE statements and FORMAT sttements removed. C C********************************************************************** IMPLICIT REAL*4 (A-H,O-Z) C COMMON T,T1,T2,E1(4),J,K,L common/ptime/ptime,time0 real time0,time1,cputime,ptime C WRITE(6,1) 1 FORMAT(/' Benchmark #2 -- Single Precision Whetstone (A001)') C C Start benchmark timing at this point. C time0 = cputime() ptime = time0 C C The actual benchmark starts here. C T = .499975 T1 = 0.50025 T2 = 2.0 C C With loopcount LOOP=10, one million Whetstone instructions C will be executed in EACH MAJOR LOOP..A MAJOR LOOP IS EXECUTED C 'II' TIMES TO INCREASE WALL-CLOCK TIMING ACCURACY. C LOOP = 1000 II = 1 C DO 500 JJ=1,II C C Establish the relative loop counts of each module. C N1 = 0 N2 = 12 * LOOP N3 = 14 * LOOP N4 = 345 * LOOP N5 = 0 N6 = 210 * LOOP N7 = 32 * LOOP N8 = 899 * LOOP N9 = 616 * LOOP N10 = 0 N11 = 93 * LOOP C C Module 1: Simple identifiers C X1 = 1.0 X2 = -1.0 X3 = -1.0 X4 = -1.0 C IF (N1.EQ.0) GO TO 35 DO 30 I=1,N1 X1 = (X1 + X2 + X3 - X4)*T X2 = (X1 + X2 - X3 + X4)*T X3 = (X1 - X2 + X3 + X4)*T X4 = (-X1 + X2 + X3 + X4)*T 30 CONTINUE 35 CONTINUE C IF (JJ.EQ.II)CALL POUT(N1,N1,N1,X1,X2,X3,X4) C C Module 2: Array elements C E1(1) = 1.0 E1(2) = -1.0 E1(3) = -1.0 E1(4) = -1.0 C IF (N2.EQ.0) GO TO 45 DO 40 I=1,N2 E1(1) = (E1(1) + E1(2) + E1(3) - E1(4))*T E1(2) = (E1(1) + E1(2) - E1(3) + E1(4))*T E1(3) = (E1(1) - E1(2) + E1(3) + E1(4))*T E1(4) = (-E1(1) + E1(2) + E1(3) + E1(4))*T 40 CONTINUE 45 CONTINUE C IF (JJ.EQ.II)CALL POUT(N2,N3,N2,E1(1),E1(2),E1(3),E1(4)) C C Module 3: Array as parameter C IF (N3.EQ.0) GO TO 59 DO 50 I=1,N3 CALL PA(E1) 50 CONTINUE 59 CONTINUE C IF (JJ.EQ.II)CALL POUT(N3,N2,N2,E1(1),E1(2),E1(3),E1(4)) C C Module 4: Conditional jumps C J = 1 IF (N4.EQ.0) GO TO 65 DO 60 I=1,N4 IF (J.EQ.1) GO TO 51 J = 3 GO TO 52 51 J = 2 52 IF (J.gt.2) GO TO 53 J = 1 GO TO 54 53 J = 0 54 IF (J.LT.1) GO TO 55 J = 0 GO TO 60 55 J = 1 60 CONTINUE 65 CONTINUE C IF (JJ.EQ.II)CALL POUT(N4,J,J,X1,X2,X3,X4) C C Module 5: Omitted C Module 6: Integer arithmetic C J = 1 K = 2 L = 3 C IF (N6.EQ.0) GO TO 75 DO 70 I=1,N6 J = J * (K-J) * (L-K) K = L * K - (L-J) * K L = (L - K) * (K + J) E1(L-1) = J + K + L E1(K-1) = J * K * L 70 CONTINUE 75 CONTINUE C IF (JJ.EQ.II)CALL POUT(N6,J,K,E1(1),E1(2),E1(3),E1(4)) C C Module 7: Trigonometric functions C X = 0.5 Y = 0.5 C IF (N7.EQ.0) GO TO 85 DO 80 I=1,N7 X=T*ATAN(T2*SIN(X)*COS(X)/(COS(X+Y)+COS(X-Y)-1.0)) Y=T*ATAN(T2*SIN(Y)*COS(Y)/(COS(X+Y)+COS(X-Y)-1.0)) 80 CONTINUE 85 CONTINUE C IF (JJ.EQ.II)CALL POUT(N7,J,K,X,X,Y,Y) C C Module 8: Procedure calls C X = 1.0 Y = 1.0 Z = 1.0 C IF (N8.EQ.0) GO TO 95 DO 90 I=1,N8 CALL P3(X,Y,Z) 90 CONTINUE 95 CONTINUE C IF (JJ.EQ.II)CALL POUT(N8,J,K,X,Y,Z,Z) C C Module 9: Array references C J = 1 K = 2 L = 3 E1(1) = 1.0 E1(2) = 2.0 E1(3) = 3.0 C IF (N9.EQ.0) GO TO 105 DO 100 I=1,N9 CALL P0 100 CONTINUE 105 CONTINUE C IF (JJ.EQ.II)CALL POUT(N9,J,K,E1(1),E1(2),E1(3),E1(4)) C C Module 10: Integer arithmetic C J = 2 K = 3 C IF (N10.EQ.0) GO TO 115 DO 110 I=1,N10 J = J + K K = J + K J = K - J K = K - J - J 110 CONTINUE 115 CONTINUE C IF (JJ.EQ.II)CALL POUT(N10,J,K,X1,X2,X3,X4) C C Module 11: Standard functions C X = 0.75 C IF (N11.EQ.0) GO TO 125 DO 120 I=1,N11 X = SQRT(EXP(ALOG(X)/T1)) 120 CONTINUE 125 CONTINUE C IF (JJ.EQ.II)CALL POUT(N11,J,K,X,X,X,X) C C THIS IS THE END OF THE MAJOR LOOP. C 500 CONTINUE C C Stop benchmark timing at this point. C time1 = cputime() C---------------------------------------------------------------- C Performance in Whetstone KIP's per second is given by C C (100*LOOP*II)/TIME C C where TIME is in seconds. C-------------------------------------------------------------------- print *,' Single Whetstone KIPS ', &nint((100*LOOP*II)/(time1-time0)) END C SUBROUTINE PA(E) IMPLICIT REAL*4 (A-H,O-Z) DIMENSION E(4) COMMON T,T1,T2,E1(4),J,K,L J1 = 0 10 E(1) = (E(1) + E(2) + E(3) - E(4)) * T E(2) = (E(1) + E(2) - E(3) + E(4)) * T E(3) = (E(1) - E(2) + E(3) + E(4)) * T E(4) = (-E(1) + E(2) + E(3) + E(4)) / T2 J1 = J1 + 1 IF (J1 - 6) 10,20,20 C 20 RETURN END C SUBROUTINE P0 IMPLICIT REAL*4 (A-H,O-Z) COMMON T,T1,T2,E1(4),J,K,L E1(J) = E1(K) E1(K) = E1(L) E1(L) = E1(J) RETURN END C SUBROUTINE P3(X,Y,Z) IMPLICIT REAL*4 (A-H,O-Z) COMMON T,T1,T2,E1(4),J,K,L X1 = X Y1 = Y X1 = T * (X1 + Y1) Y1 = T * (X1 + Y1) Z = (X1 + Y1) / T2 RETURN END C SUBROUTINE POUT(N,J,K,X1,X2,X3,X4) IMPLICIT REAL*4 (A-H,O-Z) common/ptime/ptime,time0 real ptime,time1,time0,cputime time1 = cputime() print 10, nint(time1-time0),nint(time1-ptime),N,J,K,X1,X2,X3,X4 10 FORMAT (1X,2i3,1X,3I7,4(1PE12.4)) ptime = time1 RETURN END C REAL FUNCTION CPUTIME () REAL STIME,UTIME CALL ETIME(UTIME,STIME) CPUTIME=UTIME RETURN END End of 18 echo 19 1>&2 cat >19 <<'End of 19' Date: Mon, 17 Nov 86 11:47:42 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file livermores.f for FX/1 library Status: RO C PROGRAM MFLOP(TAPE5=OUTMF) C LATEST FILE MODIFICATION DATE: 31 Oct 84. C**************************************************************************** C MEASURES CPU PERFORMANCE OF THE COMPUTER/COMPILER/COMPUTATIONAL COMPLEX C**************************************************************************** C * C L. L. N. L. F O R T R A N K E R N E L S: M F L O P S * C * C These kernels measure Fortran numerical computation rates for a * C spectrum of cpu-limited computational structures. Mathematical * C through-put is measured in units of millions of floating-point * C operations executed per second, called Megaflops/sec. * C * C This program measures a realistic cpu performance range for the * C Fortran programming system on a given day. The cpu performance * C rates depend strongly on the maturity of the Fortran compiler's * C ability to translate Fortran code into efficient machine code. * C * C [ The cpu hardware capability apart from compiler maturity (or * C availability), could be measured (or simulated) by programming the * C kernels in assembly or machine code directly. These measurements * C can also serve as a framework for tracking the maturation of the * C Fortran compiler during system development.] * C * C While this test evaluates the performance of a broad sampling of * C Fortran computations, it is not an application program and hence * C it is not a benchmark per se. The performance of benchmarks and * C even workloads, if cpu limited, could be roughly estimated by * C choosing appropriate weights and loop limits for each kernel (see * C Block Data). * C * C Use of this program is granted with the request that a copy of the * C results be sent to the author at the address shown below, to be * C added to our studies of computer performance. The timing results * C will be held as proprietary data if so marked. In return, you * C will receive a copy of our latest report. * C * C * C F.H. McMahon L-35 * C Lawrence Livermore National Laboratory * C P.0. Box 808 * C Livermore, CA * C 94550 * C * C * C (C) Copyright 1983 the Regents of the * C University of California. All Rights Reserved. * C * C This work was produced under the sponsorship of * C the U.S. Department of Energy. The Government * C retains certain rights therein. * C * C**************************************************************************** C C C C C C C C C 1. We require one run of the Fortran kernels as is, that is, with C no reprogramming. In addition, the vendor may, if so desired C reprogram the Fortran kernels to demonstrate high performance C hardware features. In either case, a compiler listing of the code C actually used should be returned along with the timing results. C C 2. On computers where default single precision is REAL*4 we require C an additional run with all mantissas.ge.48 i.e. declare all REAL*8: C IMPLICIT REAL*4(A-H,O-Z) C C 3. On computers with Cache memories we require two timed runs C with the repitition counter Loop = 1 and 100 (set in subr. SIZES), C to show un-initialized as well as encached execution rates. C Increase the size of array CACHE(in subr. TEST) from 8192 to cache size. C C 4. On computers with Virtual memory systems assure a working set space C larger than the entire program so that page faults are negligible, C because we wish to measure the cpu-limited computation rates. C C 5. Conversion includes verifying or changing the following: C C First : the i/O output device number= IOU assignment C Second: the definition of function SECOND for timing and C the value of TIC:= minimum clock timing (sec.) C Third : the definition of function MOD2N in KERNEL C Fourth: the system names Komput and Kompil in REPORT C **************************************************************** C DIMENSION FLOPS(137), TR(137), RATES(137) DIMENSION LSPAN(137), WG(137), OSUM (137) REAL SECOND c How many data sets to run? parameter (ndataset = 1) C C CALL LINK('UNIT5=(output,create,text)//') c open (unit=6,file='lloop.lst') t= SECOND(0.0) IOU= 6 TIC= 1.0e-9 kern= 24 C DO 1 lset= 1,ndataset l= lset tock= TICK ( IOU, l, TIC) C CALL KERNEL( FLOPS,TR,RATES,LSPAN,WG,OSUM, tock) C c CALL REPORT( IOU, kern,FLOPS,TR,RATES,LSPAN,WG,OSUM) 1 CONTINUE C CALL REPORT( IOU,ndataset*kern,FLOPS,TR,RATES,LSPAN,WG,OSUM) C t= SECOND(0.0) - t WRITE( IOU,9) nint(t) 9 FORMAT( 1H1,//,26H CHECK CLOCK CALIBRATION: ,/, . 18H Total cpu Time = ,i6, 5H Sec. ) STOP END C C*********************************************** BLOCK DATA C*********************************************** C IMPLICIT REAL*4(A-H,O-Z) C C l1 := param-dimension governs the size of most 1-d arrays C l2 := param-dimension governs the size of most 2-d arrays C C ISPAN := Array of limits for each DO loop in the kernels C ISP2 := Array of limits for each DO loop in the kernels C ISP3 := Array of limits for each DO loop in the kernels C IPAS1 := Array of limits for multiple pass execution of each kernel C IPAS2 := Array of limits for multiple pass execution of each kernel C IPAS3 := Array of limits for multiple pass execution of each kernel C NROPS := Array of floating-point operation counts for one pass thru kernel C WT := Array of weights to average kernel execution rates. C SKALE:= Array of scale factors for SIGNAL data generator. C BIAS:= Array of scale factors for SIGNAL data generator. C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*l16 , l21= 25 ) C C/ PARAMETER( l1= 27, l2= 15 ) C/ PARAMETER( l13= 8, l13h= 8/2, l213= 8+4, l813= 8*8 ) C/ PARAMETER( l14= 16, l16= 15, l416= 4*15 , l21= 15) C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C C/ PARAMETER( m1= 1001-1, m2= 101-1, m7= 1001-6 ) C REAL LOOPS, NROPS COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C C **************************************************************** C C DATA ISPAN( 1),ISPAN( 2),ISPAN( 3),ISPAN( 4),ISPAN( 5),ISPAN( 6), . ISPAN( 7),ISPAN( 8),ISPAN( 9),ISPAN(10),ISPAN(11),ISPAN(12), . ISPAN(13),ISPAN(14),ISPAN(15),ISPAN(16),ISPAN(17),ISPAN(18), . ISPAN(19),ISPAN(20),ISPAN(21),ISPAN(22),ISPAN(23),ISPAN(24), . ISPAN(25) ./1001, 101, 1001, 1001, 1001, 64, 995, 100, . 101, 101, 1001, 1000, 64, 1001, 101, 75, . 101, 100, 101, 1000, 25, 101, 100, 1001, 0/ C C* ./l1, l2, l1, l1, l1, l13, m7, m2, C* . l2, l2, l1, m1, l13, l1, l2, l16, C* . l2, m2, l2, m1, l21, l2, m2, l1, 0/ C DATA ISP2( 1),ISP2( 2),ISP2( 3),ISP2( 4),ISP2( 5),ISP2( 6), . ISP2( 7),ISP2( 8),ISP2( 9),ISP2( 10),ISP2( 11),ISP2( 12), . ISP2( 13),ISP2( 14),ISP2( 15),ISP2( 16),ISP2( 17),ISP2( 18), . ISP2( 19),ISP2( 20),ISP2( 21),ISP2( 22),ISP2( 23),ISP2( 24), . ISP2( 25) ./101, 101, 101, 101, 101, 32, 101, 100, . 101, 101, 101, 100, 32, 101, 101, 40, . 101, 100, 101, 100, 20, 101, 100, 101, 0/ C C DATA ISP3( 1),ISP3( 2),ISP3( 3),ISP3( 4),ISP3( 5),ISP3( 6), . ISP3( 7),ISP3( 8),ISP3( 9),ISP3( 10),ISP3( 11),ISP3( 12), . ISP3( 13),ISP3( 14),ISP3( 15),ISP3( 16),ISP3( 17),ISP3( 18), . ISP3( 19),ISP3( 20),ISP3( 21),ISP3( 22),ISP3( 23),ISP3( 24), . ISP3( 25) ./27, 15, 27, 27, 27, 8, 21, 14, . 15, 15, 27, 26, 8, 27, 15, 15, . 15, 14, 15, 26, 15, 15, 14, 27, 0/ C DATA IPAS1( 1),IPAS1( 2),IPAS1( 3),IPAS1( 4),IPAS1( 5),IPAS1( 6), . IPAS1( 7),IPAS1( 8),IPAS1( 9),IPAS1(10),IPAS1(11),IPAS1(12), . IPAS1(13),IPAS1(14),IPAS1(15),IPAS1(16),IPAS1(17),IPAS1(18), . IPAS1(19),IPAS1(20),IPAS1(21),IPAS1(22),IPAS1(23),IPAS1(24), . IPAS1(25) ./ 7, 67, 9, 14, 10, 3, 4, 10, 36, 34, 11, 12, . 36, 2, 1, 25, 35, 2, 39, 1, 1, 11, 8, 5, 0/ C DATA IPAS2( 1),IPAS2( 2),IPAS2( 3),IPAS2( 4),IPAS2( 5),IPAS2( 6), . IPAS2( 7),IPAS2( 8),IPAS2( 9),IPAS2(10),IPAS2(11),IPAS2(12), . IPAS2(13),IPAS2(14),IPAS2(15),IPAS2(16),IPAS2(17),IPAS2(18), . IPAS2(19),IPAS2(20),IPAS2(21),IPAS2(22),IPAS2(23),IPAS2(24), . IPAS2(25) ./ 40, 40, 53, 70, 55, 7, 22, 6, 21, 19, 64, 68, . 41, 10, 1, 27, 20, 1, 23, 8, 1, 7, 5, 31, 0/ C DATA IPAS3( 1),IPAS3( 2),IPAS3( 3),IPAS3( 4),IPAS3( 5),IPAS3( 6), . IPAS3( 7),IPAS3( 8),IPAS3( 9),IPAS3(10),IPAS3(11),IPAS3(12), . IPAS3(13),IPAS3(14),IPAS3(15),IPAS3(16),IPAS3(17),IPAS3(18), . IPAS3(19),IPAS3(20),IPAS3(21),IPAS3(22),IPAS3(23),IPAS3(24), . IPAS3(25) ./ 28, 46, 37, 38, 40, 21, 20, 9, 26, 25, 46, 48, . 31, 8, 1, 14, 26, 2, 28, 7, 1, 8, 7, 23, 0/ C c c The following flop-counts (NROPS) are required for scalar or serial c execution. The scalar version defines the NECESSARY computation c generally, in the absence of proof to the contrary. The vector c or parallel executions are only credited with executing the same c necessary computation. If the parallel methods do more computation c than is necessary then the extra flops are not counted as through-put. c DATA NROPS( 1),NROPS( 2),NROPS( 3),NROPS( 4),NROPS( 5),NROPS( 6), . NROPS( 7),NROPS( 8),NROPS( 9),NROPS(10),NROPS(11),NROPS(12), . NROPS(13),NROPS(14),NROPS(15),NROPS(16),NROPS(17),NROPS(18), . NROPS(19),NROPS(20),NROPS(21),NROPS(22),NROPS(23),NROPS(24) . /5., 4., 2., 2., 2., 2., 16., 36., 17., 9., 1., 1., . 7., 11., 33., 7., 9., 44., 6., 26., 2., 17., 11., 1./ C DATA WT( 1), WT( 2), WT( 3), WT( 4), WT( 5), WT( 6), . WT( 7), WT( 8), WT( 9), WT(10), WT(11), WT(12), . WT(13), WT(14), WT(15), WT(16), WT(17), WT(18), . WT(19), WT(20), WT(21), WT(22), WT(23), WT(24), . WT(25) ./1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, . 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, . 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0/ C DATA SKALE( 1),SKALE( 2),SKALE( 3),SKALE( 4),SKALE( 5),SKALE( 6), . SKALE( 7),SKALE( 8),SKALE( 9),SKALE(10),SKALE(11),SKALE(12), . SKALE(13),SKALE(14),SKALE(15),SKALE(16),SKALE(17),SKALE(18), . SKALE(19),SKALE(20),SKALE(21),SKALE(22),SKALE(23),SKALE(24), . SKALE(25) ./0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 , . 0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 , . 0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1 ,0.1/ C C/ ./1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 , C/ . 1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 , C/ . 1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0 ,1.0/ C DATA BIAS( 1), BIAS( 2), BIAS( 3), BIAS( 4), BIAS( 5), BIAS( 6), . BIAS( 7), BIAS( 8), BIAS( 9), BIAS(10), BIAS(11), BIAS(12), . BIAS(13), BIAS(14), BIAS(15), BIAS(16), BIAS(17), BIAS(18), . BIAS(19), BIAS(20), BIAS(21), BIAS(22), BIAS(23), BIAS(24), . BIAS(25) ./0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 , . 0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 , . 0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0 ,0.0/ END C C*********************************************** SUBROUTINE INDEX C*********************************************** C C C C ------ ----------------------------------------------- C MODULE PURPOSE C ------ ----------------------------------------------- C C KERNEL executes 24 samples of Fortran computation C C REPORT prints timing results C C RESULT computes timing results into pushdown store C C SECOND cumulative CPU time for task in seconds (MKS units) C C SIGNAL generates a set of floating-point numbers near 1.0 C C SIZES test and set the loop controls before each kernel test C C SORDID simple sort C C SPACE sets memory pointers for array variables. optional. C C STATS calculates unweighted statistics C C STATW calculates weighted statistics C C SUMO check-sum with ordinal dependency C C TEST times, tests, and initializes each kernel test C C TICK measures time overhead of subroutine test C C VALID compresses valid timing results C C VALUES initializes all values C C VECTOR initializes common blocks containing type real arrays. C ------ ----------------------------------------------- C C RETURN C END C C*********************************************** SUBROUTINE KERNEL( FLOPS,TR,RATES,LSPAN,WG,OSUM, TOCK) C*********************************************************************** C * C KERNEL - Executes Sample Fortran Kernels * C * C FLOPS - Array: Number of Flops executed by each kernel * C TR - Array: Time of execution of each kernel(microsecs) * C RATES - Array: Rate of execution of each kernel(megaflops/sec)* C LSPAN - Array: Span of inner DO loop in each kernel * C WG - Array: Weight assigned to each kernel for statistics * C OSUM - Array: Checksums of the results of each kernel * C TOCK - Timing overhead per kernel(sec). Clock resolution. * C * C*********************************************************************** C * C L. L. N. L. F O R T R A N K E R N E L S: M F L O P S * C * C These kernels measure Fortran numerical computation * C rates for a spectrum of cpu-limited computational * C structures or benchmarks. Mathematical through-put * C is measured in units of millions of floating-point * C operations executed per second, called Megaflops/sec. * C * C * C Fonzi's Law: There is not now and there never will be a language * C in which it is the least bit difficult to write * C bad programs. * C * C F.H.MCMAHON 1972 * C*********************************************************************** C C l1 := param-dimension governs the size of most 1-d arrays C l2 := param-dimension governs the size of most 2-d arrays C C Loop := multiple pass control to execute kernel long enough to time. C n := DO loop control for each kernel. Controls are set in subr. SIZES C C ****************************************************************** C IMPLICIT REAL*4(A-H,O-Z) C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*l16 , l21= 25 ) C DIMENSION FLOPS(137), TR(137), RATES(137) DIMENSION LSPAN(137), WG(137), OSUM (137) C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C REAL LOOPS, NROPS INTEGER E,F,ZONE C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C C/ COMMON /ISPACE/ E(l213), F(l213), C/ 1 IX(l1), IR(l1), ZONE(l416) C/C C/ COMMON /SPACE1/ U(l1), V(l1), W(l1), C/ 1 X(l1), Y(l1), Z(l1), G(l1), C/ 2 DU1(l2), DU2(l2), DU3(l2), GRD(l1), DEX(l1), C/ 3 XI(l1), EX(l1), EX1(l1), DEX1(l1), C/ 4 VX(l1), XX(l1), RX(l1), RH(l1), C/ 5 VSP(l2), VSTP(l2), VXNE(l2), VXND(l2), C/ 6 VE3(l2), VLR(l2), VLIN(l2), B5(l2), C/ 7 PLAN(l416), D(l416), SA(l2), SB(l2) C/C C/ COMMON /SPACE2/ P(4,l813), PX(l21,l2), CX(l21,l2), C/ 1 VY(l2,l21), VH(l2,7), VF(l2,7), VG(l2,7), VS(l2,7), C/ 2 ZA(l2,7) , ZP(l2,7), ZQ(l2,7), ZR(l2,7), ZM(l2,7), C/ 3 ZB(l2,7) , ZU(l2,7), ZV(l2,7), ZZ(l2,7), C/ 4 B(l13,l13), C(l13,l13), H(l13,l13), C/ 5 U1(5,l2,2), U2(5,l2,2), U3(5,l2,2) C C ****************************************************************** C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C C COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(1001), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C C ****************************************************************** C C// DIMENSION E(96), F(96), U(1001), V(1001), W(1001), C// 1 X(1001), Y(1001), Z(1001), G(1001), C// 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), C// 3 IX(1001), XI(1001), EX(1001), EX1(1001), DEX1(1001), C// 4 VX(1001), XX(1001), IR(1001), RX(1001), RH(1001), C// 5 VSP(101), VSTP(101), VXNE(101), VXND(101), C// 6 VE3(101), VLR(101), VLIN(101), B5(101), C// 7 PLAN(300), ZONE(300), D(300), SA(101), SB(101) C//C C// DIMENSION P(4,512), PX(25,101), CX(25,101), C// 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), C// 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), C// 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), C// 4 B(64,64), C(64,64), H(64,64), C// 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C//C C//C ****************************************************************** C//C C// COMMON /POINT/ ME,MF,MU,MV,MW,MX,MY,MZ,MG,MDU1,MDU2,MDU3,MGRD, C// 1 MDEX,MIX,MXI,MEX,MEX1,MDEX1,MVX,MXX,MIR,MRX,MRH,MVSP,MVSTP, C// 2 MVXNE,MVXND,MVE3,MVLR,MVLIN,MB5,MPLAN,MZONE,MD,MSA,MSB, C// 3 MP,MPX,MCX,MVY,MVH,MVF,MVG,MVS,MZA,MZP,MZQ,MZR,MZM,MZB,MZU, C// 4 MZV,MZZ,MB,MC,MH,MU1,MU2,MU3 C//C C// POINTER (ME,E), (MF,F), (MU,U), (MV,V), (MW,W), C// 1 (MX,X), (MY,Y), (MZ,Z), (MG,G), C// 2 (MDU1,DU1),(MDU2,DU2),(MDU3,DU3),(MGRD,GRD),(MDEX,DEX), C// 3 (MIX,IX), (MXI,XI), (MEX,EX), (MEX1,EX1), (MDEX1,DEX1), C// 4 (MVX,VX), (MXX,XX), (MIR,IR), (MRX,RX), (MRH,RH), C// 5 (MVSP,VSP), (MVSTP,VSTP), (MVXNE,VXNE), (MVXND,VXND), C// 6 (MVE3,VE3), (MVLR,VLR), (MVLIN,VLIN), (MB5,B5), C// 7 (MPLAN,PLAN), (MZONE,ZONE), (MD,D), (MSA,SA), (MSB,SB) C//C C// POINTER (MP,P), (MPX,PX), (MCX,CX), C// 1 (MVY,VY), (MVH,VH), (MVF,VF), (MVG,VG), (MVS,VS), C// 2 (MZA,ZA), (MZP,ZP), (MZQ,ZQ), (MZR,ZR), (MZM,ZM), C// 3 (MZB,ZB), (MZU,ZU), (MZV,ZV), (MZZ,ZZ), C// 4 (MB,B), (MC,C), (MH,H), C// 5 (MU1,U1), (MU2,U2), (MU3,U3) C C ****************************************************************** C C STANDARD PRODUCT COMPILER DIRECTIVES MAY BE USED FOR OPTIMIZATION C CDIR$ VECTOR CLLL. OPTIMIZE LEVEL i CLLL. OPTION INTEGER (7) CLLL. OPTION NODYNEQV CLLL. COMMON DUMMY(2000) CLLL. LOC(X) =.LOC.X CLLL. IQ8QDSP = 64*LOC(DUMMY) C C ****************************************************************** C BINARY MACHINES MAY USE THE AND(P,Q) FUNCTION IF AVAILABLE C IN PLACE OF THE FOLLOWING CONGRUENCE FUNCTION (SEE KERNEL 13, 14) C CLLL. INTEGER AND, OR, XOR CLLL. AND(j,k) = j.INT.k C AND(j,k) = j.AND.k AND(j,k) = IAND(j,k) c MOD2N(i,j)= MOD(i,j) +j/2 -ISIGN(j/2,i) MOD2N(i,j)= AND(i,j-1) MOD2P(i,j)= AND(i-1,j-1) + 1 c MOD2P(i,j)= MOD2N(i-1,j) + 1 C i is Congruent to MOD2N(i,j) mod(j) cFor s-1 proj compiler: c* mod2n(i,j)= i .int. (j - 1) c* mod2p(i,j)= ((i - 1) .int. (j - 1)) + 1 C ****************************************************************** C C AMAX1(x,y)= MAX(x,y) C AMIN1(x,y)= MIN(x,y) C C C C CALL SPACE C CALL TEST(0) C C****************************************************************************** C*** KERNEL 1 HYDRO FRAGMENT C****************************************************************************** C DO 1 L = 1,Loop DO 1 k = 1,n 1 X(k)= Q + Y(k)*(R*Z(k+10) + T*Z(k+11)) C C................... CALL TEST(1) C C****************************************************************************** C*** KERNEL 2 ICCG EXCERPT (INCOMPLETE CHOLESKY - CONJUGATE GRADIENT) C****************************************************************************** C DO 200 L= 1,Loop IL= n IPNTP= 0 222 IPNT= IPNTP IPNTP= IPNTP+IL IL= IL/2 i= IPNTP CDIR$ IVDEP C DO 2 k= IPNT+2,IPNTP,2 i= i+1 2 X(i)= X(k) - V(k)*X(k-1) - V(k+1)*X(k+1) IF( IL.GT.1) GO TO 222 200 CONTINUE C C................... CALL TEST(2) C C****************************************************************************** C*** KERNEL 3 INNER PRODUCT C****************************************************************************** C DO 3 L= 1,Loop Q= 0.0 DO 3 k= 1,n 3 Q= Q + Z(k)*X(k) C C................... CALL TEST(3) C C C C C C****************************************************************************** C*** KERNEL 4 BANDED LINEAR EQUATIONS C****************************************************************************** C m= (1001-7)/2 DO 444 L= 1,Loop DO 444 k= 7,1001,m lw= k-6 CDIR$ IVDEP DO 4 j= 5,n,5 X(k-1)= X(k-1) - X(lw)*Y(j) 4 lw= lw+1 X(k-1)= Y(5)*X(k-1) 444 CONTINUE C C................... CALL TEST(4) C C****************************************************************************** C*** KERNEL 5 TRI-DIAGONAL ELIMINATION, BELOW DIAGONAL C****************************************************************************** C DO 5 L = 1,Loop DO 5 i = 2,n 5 X(i)= Z(i)*(Y(i) - X(i-1)) C C................... CALL TEST(5) C C****************************************************************************** C*** KERNEL 6 GENERAL LINEAR RECURRENCE EQUATIONS C****************************************************************************** C DO 6 L= 1,Loop DO 6 i= 2,n C W(i)= 0.01 use only if overflow occurs DO 6 k= 1,i-1 W(i)= W(i) + B(i,k) * W(i-k) 6 CONTINUE C C................... CALL TEST(6) C C****************************************************************************** C*** KERNEL 7 EQUATION OF STATE FRAGMENT C****************************************************************************** C DO 7 L= 1,Loop DO 7 k= 1,n X(k)= U(k ) + R*( Z(k ) + R*Y(k )) + . T*( U(k+3) + R*( U(k+2) + R*U(k+1)) + . T*( U(k+6) + R*( U(k+5) + R*U(k+4)))) 7 CONTINUE C C................... CALL TEST(7) C C C C C C****************************************************************************** C*** KERNEL 8 A.D.I. INTEGRATION C****************************************************************************** C DO 8 L = 1,Loop nl1 = 1 nl2 = 2 DO 8 kx = 2,3 CDIR$ IVDEP DO 8 ky = 2,n DU1(ky)=U1(kx,ky+1,nl1) - U1(kx,ky-1,nl1) DU2(ky)=U2(kx,ky+1,nl1) - U2(kx,ky-1,nl1) DU3(ky)=U3(kx,ky+1,nl1) - U3(kx,ky-1,nl1) U1(kx,ky,nl2)=U1(kx,ky,nl1) +A11*DU1(ky) +A12*DU2(ky) +A13*DU3(ky) . + SIG*(U1(kx+1,ky,nl1) -2.*U1(kx,ky,nl1) +U1(kx-1,ky,nl1)) U2(kx,ky,nl2)=U2(kx,ky,nl1) +A21*DU1(ky) +A22*DU2(ky) +A23*DU3(ky) . + SIG*(U2(kx+1,ky,nl1) -2.*U2(kx,ky,nl1) +U2(kx-1,ky,nl1)) U3(kx,ky,nl2)=U3(kx,ky,nl1) +A31*DU1(ky) +A32*DU2(ky) +A33*DU3(ky) . + SIG*(U3(kx+1,ky,nl1) -2.*U3(kx,ky,nl1) +U3(kx-1,ky,nl1)) 8 CONTINUE C C................... CALL TEST(8) C C****************************************************************************** C*** KERNEL 9 INTEGRATE PREDICTORS C****************************************************************************** C DO 9 L = 1,Loop DO 9 i = 1,n PX( 1,i)= DM28*PX(13,i) + DM27*PX(12,i) + DM26*PX(11,i) + . DM25*PX(10,i) + DM24*PX( 9,i) + DM23*PX( 8,i) + . DM22*PX( 7,i) + C0*(PX( 5,i) + PX( 6,i))+ PX( 3,i) 9 CONTINUE C C................... CALL TEST(9) C C****************************************************************************** C*** KERNEL 10 DIFFERENCE PREDICTORS C****************************************************************************** C DO 10 L= 1,Loop DO 10 i= 1,n AR = CX(5,i) BR = AR - PX(5,i) PX(5,i) = AR CR = BR - PX(6,i) PX(6,i) = BR AR = CR - PX(7,i) PX(7,i) = CR BR = AR - PX(8,i) PX(8,i) = AR CR = BR - PX(9,i) PX(9,i) = BR AR = CR - PX(10,i) PX(10,i)= CR BR = AR - PX(11,i) PX(11,i)= AR CR = BR - PX(12,i) PX(12,i)= BR PX(14,i)= CR - PX(13,i) PX(13,i)= CR 10 CONTINUE C C................... CALL TEST(10) C C****************************************************************************** C*** KERNEL 11 FIRST SUM. C****************************************************************************** C DO 11 L = 1,Loop X(1)= Y(1) DO 11 k = 2,n 11 X(k)= X(k-1) + Y(k) C C................... CALL TEST(11) C C****************************************************************************** C*** KERNEL 12 FIRST DIFF. C****************************************************************************** C DO 12 L = 1,Loop DO 12 k = 1,n 12 X(k)= Y(k+1) - Y(k) C C................... CALL TEST(12) C C****************************************************************************** C*** KERNEL 13 2-D PIC Particle In Cell C****************************************************************************** C DO 13 L= 1,Loop DO 13 ip= 1,n i1= P(1,ip) j1= P(2,ip) i1= 1 + MOD2N(i1,64) j1= 1 + MOD2N(j1,64) P(3,ip)= P(3,ip) + B(i1,j1) P(4,ip)= P(4,ip) + C(i1,j1) P(1,ip)= P(1,ip) + P(3,ip) P(2,ip)= P(2,ip) + P(4,ip) i2= P(1,ip) j2= P(2,ip) i2= MOD2N(i2,64) j2= MOD2N(j2,64) P(1,ip)= P(1,ip) + Y(i2+32) P(2,ip)= P(2,ip) + Z(j2+32) i2= i2 + E(i2+32) j2= j2 + F(j2+32) H(i2,j2)= H(i2,j2) + 1.0 13 CONTINUE C C................... CALL TEST(13) C C****************************************************************************** C*** KERNEL 14 1-D PIC Particle In Cell C****************************************************************************** C C DO 14 L= 1,Loop DO 141 k= 1,n IX(k)= INT( GRD(k)) XI(k)= FLOAT( IX(k)) EX1(k)= EX ( IX(k)) DEX1(k)= DEX ( IX(k)) 141 CONTINUE C DO 142 k= 1,n VX(k)= VX(k) + EX1(k) + (XX(k) - XI(k))*DEX1(k) XX(k)= XX(k) + VX(k) + FLX IR(k)= XX(k) RX(k)= XX(k) - IR(k) IR(k)= MOD2N( IR(k),512) + 1 XX(k)= RX(k) + IR(k) 142 CONTINUE C DO 14 k= 1,n RH(IR(k) )= RH(IR(k) ) + 1.0 - RX(k) RH(IR(k)+1)= RH(IR(k)+1) + RX(k) 14 CONTINUE C C................... CALL TEST(14) C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C****************************************************************************** C*** KERNEL 15 CASUAL FORTRAN. DEVELOPMENT VERSION. C****************************************************************************** C C C CASUAL ORDERING OF SCALAR OPERATIONS IS TYPICAL PRACTICE. C THIS EXAMPLE DEMONSTRATES THE NON-TRIVIAL TRANSFORMATION C REQUIRED TO MAP INTO AN EFFICIENT MACHINE IMPLEMENTATION. C DO 45 L = 1,Loop NR= 7 NZ= n AR= 0.053 BR= 0.073 15 DO 45 j = 2,NR DO 45 k = 2,NZ IF( j-NR) 31,30,30 30 VY(k,j)= 0.0 GO TO 45 31 IF( VH(k,j+1) -VH(k,j)) 33,33,32 32 T= AR GO TO 34 33 T= BR 34 IF( VF(k,j) -VF(k-1,j)) 35,36,36 35 R= AMAX1( VH(k-1,j), VH(k-1,j+1)) S= VF(k-1,j) GO TO 37 36 R= AMAX1( VH(k,j), VH(k,j+1)) S= VF(k,j) 37 VY(k,j)= SQRT( VG(k,j)**2 +R*R)*T/S 38 IF( k-NZ) 40,39,39 39 VS(k,j)= 0. GO TO 45 40 IF( VF(k,j) -VF(k,j-1)) 41,42,42 41 R= AMAX1( VG(k,j-1), VG(k+1,j-1)) S= VF(k,j-1) T= BR GO TO 43 42 R= AMAX1( VG(k,j), VG(k+1,j)) S= VF(k,j) T= AR 43 VS(k,j)= SQRT( VH(k,j)**2 +R*R)*T/S 45 CONTINUE C C................... CALL TEST(15) C C C C C C C C C C C C C C C****************************************************************************** C*** KERNEL 16 MONTE CARLO SEARCH LOOP C****************************************************************************** C II= n/3 LB= II+II k2= 0 k3= 0 C DO 485 L= 1,Loop m= 1 405 i1= m 410 j2= (n+n)*(m-1)+1 DO 470 k= 1,n k2= k2+1 j4= j2+k+k j5= ZONE(j4) IF( j5-n ) 420,475,450 415 IF( j5-n+II ) 430,425,425 420 IF( j5-n+LB ) 435,415,415 425 IF( PLAN(j5)-R) 445,480,440 430 IF( PLAN(j5)-S) 445,480,440 435 IF( PLAN(j5)-T) 445,480,440 440 IF( ZONE(j4-1)) 455,485,470 445 IF( ZONE(j4-1)) 470,485,455 450 k3= k3+1 IF( D(j5)-(D(j5-1)*(T-D(j5-2))**2+(S-D(j5-3))**2 . +(R-D(j5-4))**2)) 445,480,440 455 m= m+1 IF( m-ZONE(1) ) 465,465,460 460 m= 1 465 IF( i1-m) 410,480,410 470 CONTINUE 475 CONTINUE 480 CONTINUE 485 CONTINUE C C................... CALL TEST(16) C C****************************************************************************** C*** KERNEL 17 IMPLICIT, CONDITIONAL COMPUTATION C****************************************************************************** C C C RECURSIVE-DOUBLING VECTOR TECHNIQUES CAN NOT BE USED C BECAUSE CONDITIONAL OPERATIONS APPLY TO EACH ELEMENT. C DO 62 L= 1,Loop i= n j= 1 INK= -1 SCALE= 5./3. XNM= 1./3. E6= 1.03/3.07 GO TO 61 C STEP MODEL 60 E6= XNM*VSP(i)+VSTP(i) VXNE(i)= E6 XNM= E6 VE3(i)= E6 i= i+INK IF( i.EQ.j) GO TO 62 61 E3= XNM*VLR(i) +VLIN(i) XNEI= VXNE(i) VXND(i)= E6 XNC= SCALE*E3 C SELECT MODEL IF( XNM .GT.XNC) GO TO 60 IF( XNEI.GT.XNC) GO TO 60 C LINEAR MODEL VE3(i)= E3 E6= E3+E3-XNM VXNE(i)= E3+E3-XNEI XNM= E6 i= i+INK IF( i.NE.j) GO TO 61 62 CONTINUE C C................... CALL TEST(17) C C****************************************************************************** C*** KERNEL 18 2-D EXPLICIT HYDRODYNAMICS FRAGMENT C****************************************************************************** C DO 75 L= 1,Loop T= 0.0037 S= 0.0041 KN= 6 JN= n DO 70 k= 2,KN DO 70 j= 2,JN ZA(j,k)= (ZP(j-1,k+1)+ZQ(j-1,k+1)-ZP(j-1,k)-ZQ(j-1,k)) . *(ZR(j,k)+ZR(j-1,k))/(ZM(j-1,k)+ZM(j-1,k+1)) ZB(j,k)= (ZP(j-1,k)+ZQ(j-1,k)-ZP(j,k)-ZQ(j,k)) . *(ZR(j,k)+ZR(j,k-1))/(ZM(j,k)+ZM(j-1,k)) 70 CONTINUE C DO 72 k= 2,KN DO 72 j= 2,JN ZU(j,k)= ZU(j,k)+S*(ZA(j,k)*(ZZ(j,k)-ZZ(j+1,k)) . -ZA(j-1,k) *(ZZ(j,k)-ZZ(j-1,k)) . -ZB(j,k) *(ZZ(j,k)-ZZ(j,k-1)) . +ZB(j,k+1) *(ZZ(j,k)-ZZ(j,k+1))) ZV(j,k)= ZV(j,k)+S*(ZA(j,k)*(ZR(j,k)-ZR(j+1,k)) . -ZA(j-1,k) *(ZR(j,k)-ZR(j-1,k)) . -ZB(j,k) *(ZR(j,k)-ZR(j,k-1)) . +ZB(j,k+1) *(ZR(j,k)-ZR(j,k+1))) 72 CONTINUE C DO 75 k= 2,KN DO 75 j= 2,JN ZR(j,k)= ZR(j,k)+T*ZU(j,k) ZZ(j,k)= ZZ(j,k)+T*ZV(j,k) 75 CONTINUE C C................... CALL TEST(18) C C****************************************************************************** C*** KERNEL 19 GENERAL LINEAR RECURRENCE EQUATIONS C****************************************************************************** C C IF( JR.GT.1 ) GO TO 192 C KB5I= 0 DO 194 L= 1,Loop DO 191 k= 1,n B5(k+KB5I)= SA(k) +STB5*SB(k) STB5= B5(k+KB5I) -STB5 191 CONTINUE C GO TO 194 C 192 DO 193 i= 1,n k= n-i+1 B5(k+KB5I)= SA(k) +STB5*SB(k) STB5= B5(k+KB5I) -STB5 193 CONTINUE 194 CONTINUE C C................... CALL TEST(19) C C****************************************************************************** C*** KERNEL 20 DISCRETE ORDINATES TRANSPORT: CONDITIONAL RECURRENCE ON XX. C****************************************************************************** C DO 20 L= 1,Loop DO 20 k= 1,n DI= Y(k)-G(k)/( XX(k)+DK) DN= 0.2 IF( DI.NE.0.0) DN= AMAX1( 0.1,AMIN1( Z(k)/DI, 0.2)) X(k)= ((W(k)+V(k)*DN)* XX(k)+U(k))/(VX(k)+V(k)*DN) XX(k+1)= (X(k)- XX(k))*DN+ XX(k) 20 CONTINUE C C................... CALL TEST(20) C C****************************************************************************** C*** KERNEL 21 MATRIX*MATRIX PRODUCT C****************************************************************************** C DO 21 L= 1,Loop DO 21 i= 1,n DO 21 k= 1,n DO 21 j= 1,n PX(i,j)= PX(i,j) +VY(i,k) * CX(k,j) 21 CONTINUE C C................... CALL TEST(21) C C C C C C C C****************************************************************************** C*** KERNEL 22 PLANCKIAN DISTRIBUTION C****************************************************************************** C C C EXPMAX= 234.5 EXPMAX= 20.0 U(n)= 0.99*EXPMAX*V(n) DO 22 L= 1,Loop DO 22 k= 1,n CARE IF( U(k) .LT. EXPMAX*V(k)) THEN Y(k)= U(k)/V(k) CARE ELSE CARE Y(k)= EXPMAX CARE ENDIF W(k)= X(k)/( EXP( Y(k)) -1.0) 22 CONTINUE C................... CALL TEST(22) C C****************************************************************************** C*** KERNEL 23 2-D IMPLICIT HYDRODYNAMICS FRAGMENT C****************************************************************************** C DO 23 L= 1,Loop DO 23 j= 2,6 DO 23 k= 2,n QA= ZA(k,j+1)*ZR(k,j) +ZA(k,j-1)*ZB(k,j) + . ZA(k+1,j)*ZU(k,j) +ZA(k-1,j)*ZV(k,j) +ZZ(k,j) 23 ZA(k,j)= ZA(k,j) +.175*(QA -ZA(k,j)) C C................... CALL TEST(23) C C****************************************************************************** C*** KERNEL 24 FIND LOCATION OF FIRST MINIMUM IN ARRAY C****************************************************************************** C C X( n/2)= -1.0E+50 X( n/2)= -1.0E+10 DO 24 L= 1,Loop m= 1 DO 24 k= 2,n IF( X(k).LT.X(m)) m= k 24 CONTINUE C C m= imin1( n,x,1) 35 nanosec./element STACKLIBE/CRAY C................... CALL TEST(24) C C****************************************************************************** C CALL RESULT( FLOPS,TR,RATES,LSPAN,WG,OSUM, TOCK) C RETURN END C C*********************************************** FUNCTION LIMITS( ilbi, i, iubi, tag, iou, j, k ) C*********************************************** C C LIMITS tests i between ilbi, iubi C C*********************************************** IMPLICIT REAL*4(A-H,O-Z) C LIMITS= 1 IF( (ilbi .LE. i) .AND. (i .LE. iubi)) RETURN C mini= MIN0( mini,i) maxi= MAX0( maxi,i) WRITE( iou,101) tag,j,k, ilbi,i,iubi 101 FORMAT(1X,A8,2I6,6X,I9,4H.LE.,I9,4H.LE.,I9) LIMITS= 0 RETURN END C*********************************************** C %Z%%M% %I% %E% SMI SUBROUTINE REPORT( LOGIO, NK, FLOPS,TR,RATES,LSPAN,WG,OSUM ) C IMPLICIT REAL*4(A-H,O-Z) REAL LOOPS, NROPS c For s-1 proj compiler use F77 strings: CHARACTER Komput*32, Kompil*32, TAG(5)*8 C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C DIMENSION FLOPS(137), TR(137), RATES(137) DIMENSION LSPAN(137), WG(137), OSUM (137) c DIMENSION TAG(5) DIMENSION HM(12), FR(12), N1(10), N2(10), IQ(10), SUMW(10) DIMENSION LQ(5), STAT1(12), STAT2(12) DIMENSION IN(137), CSUM1(137) DIMENSION MAP1(137), MAP2(137), IN2(137), VL1(137) DIMENSION MAP(137), VL(137), WL(137), TV(137), TV1(137), TV2(137) DIMENSION FLOPS1(137), RT1(137), ISPAN1(137), WT1(137) DIMENSION FLOPS2(137), RT2(137), ISPAN2(137), WT2(137) C DATA kern /24/ C DATA FR( 1), FR( 2), FR( 3), FR( 4), FR( 5), FR( 6), . FR( 7), FR( 8), FR( 9) ./ 0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0/ C DATA SUMW( 1), SUMW( 2), SUMW( 3), SUMW( 4), SUMW( 5), SUMW( 6), . SUMW( 7) ./1.0, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5/ C DATA IQ( 1), IQ( 2), IQ( 3), IQ( 4), IQ( 5), IQ( 6), . IQ( 7) ./1, 2, 1, 2, 1, 2, 1/ C c DATA TAG( 1), TAG( 2), TAG( 3), TAG( 4) c ./8H1st QT: , 8H2nd QT: , 8H3rd QT: , 8H4th QT: / c For s-1 proj compiler use F77 string: data tag(1) /'1st QT:'/ data tag(2) /'2nd QT:'/ data tag(3) /'3rd QT:'/ data tag(4) /'4th QT:'/ C c DATA Komput/ 8HCRAY-XMP/ c DATA Komput/ 8HCRAY-1 / c DATA Komput/ 8HVAX750 / c For s-1 proj compiler use F77 string: c data komput /'Sun 2/50 REAL*4'/ c data komput /'Apollo DN560 REAL*4'/ data komput /'Alliant FX/1 REAL*4'/ C c DATA Kompil/ 8HCFT-1.12/ c DATA Kompil/ 8HCIVIC30i/ c DATA Kompil/ 8H4.2BSD / c For s-1 proj compiler use F77 string: c data kompil / 'Sun f77 Rel 2.0' / c data kompil / 'Apollo FTN 8.40 (AEGIS SR9.2.3)' / data kompil / 'Alliant Concentrix Fortran rev. 1.0' / C MODI(i,M)= (MOD(i,M) + M*( 1 - MIN0(1,MOD(i,M)))) C IF( logio.LT.0) RETURN C fuzz= 1.0e-9 DO 1000 k= 1,NK VL(k)= LSPAN(k) 1000 CONTINUE C CALL VALID( TV,MAP,neff, 1.0e-5,RATES, 1.0e+4,NK) C C Compress valid data sets mapping on MAP. C DO 1 k= 1,neff MAP1(k)= MODI( MAP(k),kern) FLOPS1(k)= FLOPS( MAP(k)) RT1(k)= TR( MAP(k)) VL1(k)= VL( MAP(k)) ISPAN1(k)= LSPAN( MAP(k)) WT1(k)= WG( MAP(k)) TV1(k)= RATES( MAP(k)) CSUM1(k)= OSUM( MAP(k)) 1 continue C CALL STATW( STAT1,TV,IN, VL1,WT1,neff) LV= STAT1(1) C CALL STATW( STAT1,TV,IN, TV1,WT1,neff) twt= STAT1(6) C if (0 .eq. 1) then WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7000) WRITE ( logio,7002) endif WRITE ( logio,7003) WRITE ( logio,7002) WRITE ( logio,7007) Komput WRITE ( logio,7008) Kompil WRITE ( logio,7009) LV if (0 .eq. 1) then WRITE ( logio,7061) WRITE ( logio,7062) WRITE ( logio,7063) WRITE ( logio,7064) WRITE ( logio,7065) WRITE ( logio,7066) WRITE ( logio,7067) WRITE ( logio,7001) WRITE ( logio,7004) WRITE ( logio,7005) WRITE ( logio,7011) (MAP1(k), FLOPS1(k), RT1(k), 1000*TV1(k), . ISPAN1(k), WT1(k), CSUM1(k), k=1,neff) WRITE ( logio,7005) endif C WRITE ( logio,7022) 1000*STAT1(3), 1000*STAT1(4) WRITE ( logio,7055) 1000*STAT1(5) WRITE ( logio,7030) 1000*STAT1(7) WRITE ( logio,7031) 1000*STAT1(9) WRITE ( logio,7033) 1000*STAT1(1) WRITE ( logio,7044) 1000*STAT1(2) C 7000 FORMAT(1H1) 7001 FORMAT(/) 7002 FORMAT( 45H ******************************************** ) 7003 FORMAT( 45H THE LIVERMORE F O R T R A N K E R N E L S ) 7004 FORMAT(/,53H KERNEL FLOPS MICROSEC KFLOP/SEC SPAN WEIGHT SUM) 7005 FORMAT( 53H ------ ----- -------- --------- ---- ------ ---) 7007 FORMAT(/,9X,16H Computer : ,A32,/) 7008 FORMAT( 9X,16H Compiler : ,A32,/) 7009 FORMAT( 9X,16HMean Vector L = ,I5,/) 7011 FORMAT(1X,i2,E11.4,E11.4,F12.1,1X,I4,1X,F6.2,E20.12) 7012 FORMAT(1X,i2,E11.4,E11.4,F12.1,1X,I4,1X,F6.2) 7022 FORMAT(/,9X,16HKFLOPS RANGE = ,F12.1,4H TO,F12.1, . 16H Kilo-Flops/Sec. ) 7055 FORMAT(/,9X,16HHARMONIC MEAN = ,F12.1,16H Kilo-Flops/Sec. ) 7030 FORMAT(/,9X,16HMEDIAN RATE = ,F12.1,16H Kilo-Flops/Sec. ) 7031 FORMAT(/,9X,16HMEDIAN DEV. = ,F12.1,16H Kilo-Flops/Sec. ) 7033 FORMAT(/,9X,16HAVERAGE RATE = ,F12.1,16H Kilo-Flops/Sec. ) 7044 FORMAT(/,9X,16HSTANDARD DEV. = ,F12.1,16H Kilo-Flops/Sec. ) 7053 FORMAT(/,9X,16HFRAC. WEIGHTS = ,F12.4) C 7061 FORMAT(9X,52HWhen the computer performance range is very large ) 7062 FORMAT(9X,52Hthe net Mflops rate of many Fortran programs and ) 7063 FORMAT(9X,52Hworkloads will be in the sub-range between the equi-) 7064 FORMAT(9X,52Hweighted harmonic and arithmetic means depending ) 7065 FORMAT(9X,52Hon the degree of code parallelism and optimization. ) 7066 FORMAT(9X,52HMore accurate estimates of cpu workload rates depend) 7067 FORMAT(9X,52Hon assigning appropriate weights for each kernel. ) C WRITE ( logio,7000) WRITE ( logio,7070) 7070 FORMAT(//,50H TOP QUARTILE: BEST ARCHITECTURE/APPLICATION MATCH ) C C Compute compression index-list MAP1: Non-zero weights. C CALL VALID( TV,MAP1,meff, 1.0e-6, WT1, 1.0e+6, neff) C C Re-order data sets mapping on IN (descending order of MFlops). C DO 2 k= 1,meff i= IN( MAP1(k)) FLOPS2(k)= FLOPS1(i) RT2(k)= RT1(i) ISPAN2(k)= ISPAN1(i) WT2(k)= WT1(i) TV2(k)= TV1(i) MAP2(k)= MODI( MAP(i),kern) 2 continue C nq= meff/4 lr= meff -4*nq LQ(1)= nq LQ(2)= nq + nq + lr LQ(3)= nq i2= 0 C DO 5 j= 1,3 i1= i2 + 1 i2= i2 + LQ(j) ll= i2 - i1 + 1 CALL STATW( STAT2,TV,IN2, TV2(i1),WT2(i1),ll) frac= STAT2(6)/( twt +fuzz) WL(j)= STAT2(5) C WRITE ( logio,7001) WRITE ( logio,7004) WRITE ( logio,7005) WRITE ( logio,7012) ( MAP2(k),FLOPS2(k),RT2(k),1000*TV2(k), . ISPAN2(k), WT2(k), k=i1,i2 ) WRITE ( logio,7005) C WRITE ( logio,7053) frac WRITE ( logio,7055) 1000*STAT2(5) WRITE ( logio,7033) 1000*STAT2(1) WRITE ( logio,7044) 1000*STAT2(2) 5 continue smf= WL(1) C return WRITE ( logio,7000) WRITE ( logio,7001) C 7301 FORMAT(9X,31H SENSITIVITY ANALYSIS ) 7302 FORMAT(9X,52HThe sensitivity of the harmonic mean rate (Mflops) ) 7303 FORMAT(9X,52Hto various weightings is shown in the table below. ) 7304 FORMAT(9X,52HSeven work distributions are generated by assigning ) 7305 FORMAT(9X,52Htwo distinct weights to ranked kernels by quartiles.) 7306 FORMAT(9X,52HForty nine possible cpu workloads are then evaluated) 7307 FORMAT(9X,52Husing seven sets of values for the total weights: ) 7341 FORMAT(3X,A7,6X,43HO O O O O X X) 7342 FORMAT(3X,A7,6X,43HO O O X X X O) 7343 FORMAT(3X,A7,6X,43HO X X X O O O) 7344 FORMAT(3X,A7,6X,43HX X O O O O O) 7346 FORMAT(13X, 48H------ ------ ------ ------ ------ ------ ------) 7348 FORMAT(3X,5HTotal,/,3X,7HWeights,20X,11HNet Mflops:,/,4X,6HX O) 7349 FORMAT(2X,9H---- ---- ) 7220 FORMAT(/,1X,2F5.2,1X,7F7.2) C WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7301) WRITE ( logio,7001) WRITE ( logio,7302) WRITE ( logio,7303) WRITE ( logio,7304) WRITE ( logio,7305) WRITE ( logio,7306) WRITE ( logio,7307) WRITE ( logio,7001) WRITE ( logio,7346) WRITE ( logio,7341) TAG(1) WRITE ( logio,7342) TAG(2) WRITE ( logio,7343) TAG(3) WRITE ( logio,7344) TAG(4) WRITE ( logio,7346) WRITE ( logio,7348) WRITE ( logio,7349) C r= meff mq= (meff+3)/4 q= mq j= 1 DO 21 i= 8,2,-2 N1(i )= j N1(i+1)= j N2(i )= j + mq + mq - 1 N2(i+1)= j + mq - 1 j= j + mq 21 continue C DO 29 j= 1,7 sumo= 1.0 - SUMW(j) DO 27 i= 1,7 p= IQ(i)*Q xt= SUMW(j)/(p + fuzz) ot= sumo /(r - p + fuzz) DO 23 k= 1,meff WL(k)= ot 23 continue k1= N1(i+2) k2= N2(i+2) DO 25 k= k1,k2 WL(k)= xt 25 continue CALL STATW( STAT2,TV,IN2, TV2,WL,meff) RT1(i)= STAT2(5) 27 continue WRITE ( logio,7220) SUMW(j), sumo, ( RT1(k), k=1,7) 29 continue C WRITE ( logio,7349) WRITE ( logio,7346) C 7101 FORMAT(4x,52HNET MFLOP RATES OF PARTIALLY OPTIMIZED FORTRAN CODE:) 7102 FORMAT(/,1X,5F7.2,4F8.2) 7103 FORMAT(3x,52H Fraction Of Operations Run At Optimal Machine Rates) C CALL STATW( STAT2,TV,IN2, TV2, WT2, meff) med= meff + 1 - INT(STAT2(8)) lh= meff - med C CALL STATW( STAT2,TV,IN2, TV2(med),WT2(med),lh) C HM(1)= STAT2(5) g= 1.0 - HM(1)/( smf + fuzz) C DO 7 k= 2,9 HM(k)= HM(1)/( 1.0 - FR(k)*g + fuzz) 7 continue C WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7001) WRITE ( logio,7101) WRITE ( logio,7102) ( HM(k), k= 1,9) WRITE ( logio,7102) ( FR(k), k= 1,9) WRITE ( logio,7103) WRITE ( logio,7001) RETURN END C********************************************** SUBROUTINE RESULT( FLOPS,TR,RATES,LSPAN,WG,OSUM, TOCK) C*********************************************************************** C * C RESULT - Computes timing Results into pushdown store. * C * C FLOPS - Array: Number of Flops executed by each kernel * C TR - Array: Time of execution of each kernel(microsecs) * C RATES - Array: Rate of execution of each kernel(megaflops/sec)* C LSPAN - Array: Span of inner DO loop in each kernel * C WG - Array: Weight assigned to each kernel for statistics * C OSUM - Array: Checksums of the results of each kernel * C TOCK - Timing overhead per kernel(sec). Clock resolution. * C * C*********************************************************************** C C IMPLICIT REAL*4(A-H,O-Z) DIMENSION FLOPS(137), TR(137), RATES(137) DIMENSION LSPAN(137), WG(137), OSUM (137) DIMENSION WTL(5) C REAL LOOPS, NROPS C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C DATA kern /24/ DATA kl/0/, limits/137/ DATA WTL(1),WTL(2),WTL(3) /1.0,2.0,1.0/ C C Push Result Arrays Down before entering new resul limit= limits - kern j = limits DO 1001 k = limit,1,-1 FLOPS(j)= FLOPS(k) TR(j)= TR(k) RATES(j)= RATES(k) LSPAN(j)= LSPAN(k) WG(j)= WG(k) OSUM(j)= OSUM(k) j = j - 1 1001 CONTINUE C C CALCULATE MFLOPS FOR EACH KERNEL tmin= 5.0*TOCK kl= kl+1 DO 1010 k = 1,kern FLOPS(k)= NROPS(k)*LOOPS(k) TR(k)= (TIME(k) - TOCK)* 1.0e+6 RATES(k)= 0.0 IF( TR(k).NE. 0.0) RATES(k)= FLOPS(k)/TR(k) IF( WT(k).LE. 0.0) RATES(k)= 0.0 IF( TIME(k).LE.tmin) RATES(k)= 0.0 LSPAN(k)= ISPAN(k) WG(k)= WT(k)*WTL(kl) OSUM(k)= CSUM(k) 1010 CONTINUE C RETURN END C C c For s-1 proj compiler we supply an externally compiled function: c Not for Sun C********************************************** function SECOND( OLDSEC) C*********************************************** C REAL SECOND C C SECOND= Cumulative CPU time for job in seconds. MKS unit is seconds. C Clock resolution should be less than 2% of Kernel 12 run-time. C C C If your system provides a timing routine that satisfies C the definition above; then simply delete this function. C C Else this function must be programmed using some C timing routine available in your system. C Timing routines with CPU clock resolution are always sufficient. C Timing routines with microsec. resolution are usually sufficient. C C Timing routines with much less resolution may require the use C of multiple-pass loops around each kernel to make the run time C at least 50 times the tick-period of the timing routine. C In these cases, you must redefine the DATA statements for IPAS1 C to increase the value of Loop set in subroutine SIZES. C C If no CPU timer is available, then you can time each kernel by C the wall clock using the PAUSE statement at the end of subr. VALUES. C C Function TICK measures the overhead time for a call to SECOND. C An independent calibration of the running time of at least C one kernel may be wise. See Subr. TEST, label 100+. C C C The following statement is deliberately incomplete: C c SECOND= C C C The following statement was used on the DEC VAX/780 VMS 3.0 C Also set: Loop= 10*Loop in Subroutine SIZES to run kernels long enough. C C REAL SECNDS C* SECOND= SECNDS( OLDSEC) C C OR: C C$ DATA INITIA /237/ C$ IF( INITIA.NE.237 ) GO TO 1 C$ NSTAT= LIB$INIT_TIMER C$ 1 INITIA= INITIA + 1 C$ C$ NSTAT = LIB$STAT_TIMER(2,ISEC) C$ SECOND= FLOAT(ISEC)*0.01 - OLDSEC C C C The following statements were used on the DEC PDP-11/23 RT-11 system. C Also set: Loop=100*Loop in Subroutine SIZES to run kernels long enough. C C* DIMENSION JT(2) C* CALL GTIM(JT) C* TIME1 = JT(1) C* TIME2 = JT(2) C* TIME = TIME1 * 65768. + TIME2 C* SECOND=TIME/60. - OLDSEC C C C The following statements were used on the Hewlett-Packard HP 9000 C Also set: Loop= 10*Loop in Subroutine SIZES to run kernels long enough. C C* INTEGER*4 ITIME(4) C* CALL TIMES( ITIME(4)) C* TIMEX= ITIME(1) + ITIME(2) + ITIME(3) + ITIME(4) C* SECOND= TIMEX/60. - OLDSEC C C C The following statements were used on the SUN workstation C Also set: Loop= 10*Loop in Subroutine SIZES to run kernels long enough. C C DIMENSION XTIME(2) C REAL*4 XTIME C XT= ETIME( XTIME) C SECOND= (XTIME(1) + XTIME(2)) - OLDSEC C C The following statements were used on the Apollo workstation C Also set: Loop= 10*Loop in Subroutine SIZES to run kernels long enough. C C INTEGER*2 CLOCK(3) C DOUBLE PRECISION DSECS C C CALL PROC1_$GET_CPUT (CLOCK) C CALL CAL_$FLOAT_CLOCK (CLOCK,DSECS) C SECOND = DSECS-OLDSEC C C The following statements were used on the Alliant FX/1 C Also set: Loop= 10*Loop in Subroutine SIZES to run kernels long enough. C REAL STIME,UTIME CALL ETIME(UTIME,STIME) SECOND=UTIME-OLDSEC RETURN END C C*********************************************** SUBROUTINE SIGNAL( V, SCALE,BIAS, n) C*********************************************** C C SIGNAL GENERATES VERY FRIENDLY FLOATING-POINT NUMBERS NEAR 1.0 C WHEN SCALE= 1.0 AND BIAS= 0. C C*********************************************** IMPLICIT REAL*4(A-H,O-Z) C DIMENSION V(n) C SCALED= SCALE BIASED= BIAS C FUZZ= 1.2345E-9 FUZZ= 1.2345E-3 BUZZ= 1.0 +FUZZ FIZZ= 1.1 *FUZZ C DO 1 k= 1,n BUZZ= (1.0 - FUZZ)*BUZZ +FUZZ FUZZ= -FUZZ V(k)=((BUZZ- FIZZ) -BIASED)*SCALED 1 CONTINUE C RETURN END C C C*********************************************** SUBROUTINE SIZES(i) C*********************************************** IMPLICIT REAL*4(A-H,O-Z) C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= l13/2, l213= l13+l13h, l813= 8*l13 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*l16 , l21= 25 ) C C/ PARAMETER( l1= 27, l2= 15 ) C/ PARAMETER( l13= 8, l13h= 8/2, l213= 8+4, l813= 8*8 ) C/ PARAMETER( l14= 16, l16= 15, l416= 4*15 , l21= 15) C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C C C/ PARAMETER( NNI= 2*l1 +2*l213 +l416 ) C/ PARAMETER( NN1= 17*l1 +13*l2 +2*l416 ) C/ PARAMETER( NN2= 4*l813 + 3*l21*l2 +121*l2 +3*l13*l13 ) C/ PARAMETER( Nl1= 19*l1, Nl2= 131*l2 +3*l21*l2 ) C/ PARAMETER( Nl13= 3*l13*l13 +34*l13 +32) C C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C REAL LOOPS, NROPS COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C C C ****************************************************************** C C i := kernel number C Loop := multiple pass control to execute kernel long enough to time. C n := DO loop control for each kernel. Controls are set in subr. SIZES C C ****************************************************************** C C Domain tests follow to detect overstoring of controls for array opns. C C IT= 1 IF( i.LT.0 .OR. i.GT. 24) GO TO 911 IF( n.LT.0 .OR. n.GT. 1001) GO TO 911 IF(Loop.LT.0 .OR. Loop.GT.60000) GO TO 911 C IT= 2 IF( KR.EQ.2 ) ISPAN(i+1)= ISP2(i+1) IF( KR.EQ.3 ) ISPAN(i+1)= ISP3(i+1) n = ISPAN(i+1) C Loop= IPAS1(i+1) IF( KR.EQ.2 ) Loop= 2*IPAS2(i+1) IF( KR.EQ.3 ) Loop= 8*IPAS3(i+1) C Loop= 100*Loop Loop= 10*Loop C IF( n.LT.0 .OR. n.GT. 1001) GO TO 911 IF(Loop.LT.0 .OR. Loop.GT.60000) GO TO 911 c WRITE( ION,915) Loop c 915 FORMAT(1H1,///,37H **** Number loops per iteration = ,I8) 915 FORMAT(37H **** Number loops per iteration = ,I8) N1 = 1001 N2 = 101 N13 = 64 N13H= 32 N213= 96 N813= 512 N16 = 75 N416= 300 N21 = 25 C NT1= 17*1001 +13*101 +2*300 NT2= 4*512 + 3*25*101 +121*101 +3*64*64 C RETURN C C 911 WRITE( ION,913) i, n, Loop 913 FORMAT(1H1,///,37H FATAL OVERSTORE/ DATA LOSS. TEST= ,4I8) STOP C END C*********************************************** SUBROUTINE SORDID( i,W, V,n,KIND) C*********************************************** C QUICK AND DIRTY PORTABLE SORT. C C i - RESULT INDEX-LIST. MAPS V TO SORTED W. C W - RESULT ARRAY, SORTED V. C C V - INPUT ARRAY SORTED IN PLACE. C n - INPUT NUMBER OF ELEMENTS IN V C KIND - SORT ORDER: = 1 ASCENDING MAGNITUDE C = 2 DESCENDING MAGNITUDE C C*********************************************** IMPLICIT REAL*4(A-H,O-Z) C DIMENSION i(n), W(n), V(n) C DO 1 k= 1,n W(k)= V(k) 1 i(k)= k C IF( KIND.EQ.2) GO TO 4 C DO 3 j= 1,n-1 m= j DO 2 k= j+1,n IF( W(k).LT.W(m)) m= k 2 CONTINUE X= W(j) k= i(j) W(j)= W(m) i(j)= i(m) W(m)= X i(m)= k 3 CONTINUE RETURN C C 4 DO 6 j= 1,n-1 m= j DO 5 k= j+1,n IF( W(k).GT.W(m)) m= k 5 CONTINUE X= W(j) k= i(j) W(j)= W(m) i(j)= i(m) W(m)= X i(m)= k 6 CONTINUE RETURN END C C*********************************************** SUBROUTINE SPACE C*********************************************** C C Subroutine Space dynamically allocates physical memory space C for the array variables in KERNEL by setting pointer values. C The POINTER declaration has been defined in the IBM PL1 language C and defined as a Fortran extension in Livermore and CRAY compilers. C C In general, large FORTRAN simulation programs use a memory C manager to dynamically allocate arrays to conserve high speed C physical memory and thus avoid slow disk references (page faults). C C It is sufficient for our purposes to trivially set the values C of pointers to the location of static arrays used in common. C The efficiency of pointered (indirect) computation should be measured C if available. C C*********************************************** IMPLICIT REAL*4(A-H,O-Z) C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C INTEGER E,F,ZONE C COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(1001), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C C ****************************************************************** C C// COMMON /POINT/ ME,MF,MU,MV,MW,MX,MY,MZ,MG,MDU1,MDU2,MDU3,MGRD, C// 1 MDEX,MIX,MXI,MEX,MEX1,MDEX1,MVX,MXX,MIR,MRX,MRH,MVSP,MVSTP, C// 2 MVXNE,MVXND,MVE3,MVLR,MVLIN,MB5,MPLAN,MZONE,MD,MSA,MSB, C// 3 MP,MPX,MCX,MVY,MVH,MVF,MVG,MVS,MZA,MZP,MZQ,MZR,MZM,MZB,MZU, C// 4 MZV,MZZ,MB,MC,MH,MU1,MU2,MU3 C//C C//CLLL. LOC(X) =.LOC.X C//C C// ME = LOC( E ) C// MF = LOC( F ) C// MU = LOC( U ) C// MV = LOC( V ) C// MW = LOC( W ) C// MX = LOC( X ) C// MY = LOC( Y ) C// MZ = LOC( Z ) C// MG = LOC( G ) C// MDU1 = LOC( DU1 ) C// MDU2 = LOC( DU2 ) C// MDU3 = LOC( DU3 ) C// MGRD = LOC( GRD ) C// MDEX = LOC( DEX ) C// MIX = LOC( IX ) C// MXI = LOC( XI ) C// MEX = LOC( EX ) C// MEX1 = LOC( EX1 ) C// MDEX1 = LOC( DEX1 ) C// MVX = LOC( VX ) C// MXX = LOC( XX ) C// MIR = LOC( IR ) C// MRX = LOC( RX ) C// MRH = LOC( RH ) C// MVSP = LOC( VSP ) C// MVSTP = LOC( VSTP ) C// MVXNE = LOC( VXNE ) C// MVXND = LOC( VXND ) C// MVE3 = LOC( VE3 ) C// MVLR = LOC( VLR ) C// MVLIN = LOC( VLIN ) C// MB5 = LOC( B5 ) C// MPLAN = LOC( PLAN ) C// MZONE = LOC( ZONE ) C// MD = LOC( D ) C// MSA = LOC( SA ) C// MSB = LOC( SB ) C// MP = LOC( P ) C// MPX = LOC( PX ) C// MCX = LOC( CX ) C// MVY = LOC( VY ) C// MVH = LOC( VH ) C// MVF = LOC( VF ) C// MVG = LOC( VG ) C// MVS = LOC( VS ) C// MZA = LOC( ZA ) C// MZP = LOC( ZP ) C// MZQ = LOC( ZQ ) C// MZR = LOC( ZR ) C// MZM = LOC( ZM ) C// MZB = LOC( ZB ) C// MZU = LOC( ZU ) C// MZV = LOC( ZV ) C// MZZ = LOC( ZZ ) C// MB = LOC( B ) C// MC = LOC( C ) C// MH = LOC( H ) C// MU1 = LOC( U1 ) C// MU2 = LOC( U2 ) C// MU3 = LOC( U3 ) C RETURN END C*********************************************** SUBROUTINE STATS( RESULT, X,n) C*********************************************** C C UNWEIGHTED STATISTICS: MEAN, STADEV, MIN, MAX, HARMONIC MEAN. C C RESULT(1)= THE MEAN OF X. C RESULT(2)= THE STANDARD DEVIATION OF THE MEAN OF X. C RESULT(3)= THE MINIMUM OF X. C RESULT(4)= THE MAXIMUM OF X. C RESULT(5)= THE HARMONIC MEAN C X IS THE ARRAY OF INPUT VALUES. C n IS THE NUMBER OF INPUT VALUES IN X. C C*********************************************** IMPLICIT REAL*4(A-H,O-Z) C DIMENSION X(n), RESULT(9) cLLL. OPTIMIZE LEVEL G C C DO 10 k= 1,9 10 RESULT(k)= 0.0 C IF(n.LE.0) RETURN C CALCULATE MEAN OF X. S= 0.0 DO 1 k= 1,n 1 S= S + X(k) A= S/n RESULT(1)= A C CALCULATE STANDARD DEVIATION OF X. D= 0.0 DO 2 k= 1,n 2 D= D + (X(k)-A)**2 D= D/n RESULT(2)= SQRT(D) C CALCULATE MINIMUM OF X. U= X(1) DO 3 k= 2,n 3 U= AMIN1(U,X(k)) RESULT(3)= U C CALCULATE MAXIMUM OF X. V= X(1) DO 4 k= 2,n 4 V= AMAX1(V,X(k)) RESULT(4)= V C CALCULATE HARMONIC MEAN OF X. H= 0.0 DO 5 k= 1,n IF( X(k).NE.0.0) H= H + 1.0/X(k) 5 CONTINUE IF( H.NE.0.0) H= FLOAT(n)/H RESULT(5)= H C RETURN END C*********************************************** SUBROUTINE STATW( RESULT,OX,IX, X,W,n) C*********************************************** C C WEIGHTED STATISTICS: MEAN, STADEV, MIN, MAX, HARMONIC MEAN, MEDIAN. C C RESULT(1)= THE MEAN OF X. C RESULT(2)= THE STANDARD DEVIATION OF THE MEAN OF X. C RESULT(3)= THE MINIMUM OF X. C RESULT(4)= THE MAXIMUM OF X. C RESULT(5)= THE HARMONIC MEAN C RESULT(6)= THE TOTAL WEIGHT. C RESULT(7)= THE MEDIAN. C RESULT(8)= THE MEDIAN INDEX, ASENDING. C RESULT(9)= THE ROBUST MEDIAN ABSOLUTE DEVIATION. C OX IS THE ARRAY OF RESULT ORDERED (DECENDING) Xs. C IX IS THE ARRAY OF RESULT INDEX LIST MAPS X TO OX. C C X IS THE ARRAY OF INPUT VALUES. C W IS THE ARRAY OF INPUT WEIGHTS. C n IS THE NUMBER OF INPUT VALUES IN X. C C*********************************************** IMPLICIT REAL*4(A-H,O-Z) C DIMENSION RESULT(9), OX(n), IX(n), X(n), W(n) cLLL. OPTIMIZE LEVEL G C C DO 10 k= 1,9 10 RESULT(k)= 0.0 C IF(n.LE.0) RETURN C CALCULATE MEAN OF X. A= 0.0 S= 0.0 T= 0.0 DO 1 k= 1,n S= S + W(k)*X(k) 1 T= T + W(k) IF( T.NE.0.0) A= S/T RESULT(1)= A C CALCULATE STANDARD DEVIATION OF X. D= 0.0 U= 0.0 DO 2 k= 1,n 2 D= D + W(k)*(X(k)-A)**2 IF( T.NE.0.0) D= D/T IF( D.GE.0.0) U= SQRT(D) RESULT(2)= U C CALCULATE MINIMUM OF X. U= X(1) DO 3 k= 2,n 3 U= AMIN1(U,X(k)) RESULT(3)= U C CALCULATE MAXIMUM OF X. V= X(1) DO 4 k= 2,n 4 V= AMAX1(V,X(k)) RESULT(4)= V C CALCULATE HARMONIC MEAN OF X. H= 0.0 DO 5 k= 1,n IF( X(k).NE.0.0) H= H + W(k)/X(k) 5 CONTINUE IF( H.NE.0.0) H= T/H RESULT(5)= H RESULT(6)= T C CALCULATE WEIGHTED MEDIAN CALL SORDID( IX, OX, X, n, 1) C R= 0.0 DO 70 k= 1,n R= R + W( IX(k)) IF( R.GT. 0.5*T ) GO TO 7 70 CONTINUE k= n 7 RESULT(7)= OX(k) RESULT(8)= FLOAT(k) C CALCULATE ROBUST MEDIAN ABSOLUTE DEVIATION (MAD) DO 90 k= 1,n 90 OX(k)= ABS( X(k) - RESULT(7)) C CALL SORDID( IX, OX, OX, n, 1) C R= 0.0 DO 91 k= 1,n R= R + W( IX(k)) IF( R.GT. 0.7*T ) GO TO 9 91 CONTINUE k= n 9 RESULT(9)= OX(k) C CALCULATE DESCENDING ORDERED X. CALL SORDID( IX, OX, X, n, 2) C RETURN END C*********************************************** FUNCTION SUMO( V,n) C*********************************************** C C CHECK-SUM WITH ORDINAL DEPENDENCY. C C*********************************************** IMPLICIT REAL*4(A-H,O-Z) DIMENSION V(1) S= 0.0 C DO 1 k= 1,n 1 S= S + FLOAT(k)*V(k) SUMO= S RETURN END C*********************************************** SUBROUTINE TEST(i) C*********************************************** C C TIME, TEST, AND INITIALIZE THE EXECUTION OF KERNEL i. C C****************************************************************************** C IMPLICIT REAL*4(A-H,O-Z) C C SIZES test and set the loop controls before each kernel test C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C REAL LOOPS, NROPS, NN, NP INTEGER E,F,ZONE C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(1001), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C COMMON /SPACE3/ CACHE(8192) C REAL SECOND C DATA START /0.0/, NPF/0/ C$ DATA IPF/0/, KPF/0/ C C****************************************************************************** C t= second(0) := cumulative cpu time for task in seconds. C TEMPUS= SECOND(0.0) - START C$C 5 get number of page faults (optional) C$ KSTAT= LIB$STAT_TIMER(5,KPF) C$ NPF = KPF - IPF NN= n NP= Loop IF( i.EQ.0 ) GO TO 100 CALL SIZES(i-1) TIME(i)= TEMPUS C GO TO( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, . 21, 22, 23, 24, 25 ), i C C C C****************************************************************************** C 1 CSUM (1) = SUMO ( X, n) LOOPS(1) = NP*NN GO TO 100 C C****************************************************************************** C 2 CSUM (2) = SUMO ( X, n) LOOPS(2) = NP*(NN-4) GO TO 100 C C****************************************************************************** C 3 CSUM (3) = Q LOOPS(3) = NP*NN GO TO 100 C C****************************************************************************** C 4 MM= (1001-7)/2 DO 400 k = 7,1001,MM 400 V(k)= X(k) CSUM (4) = SUMO ( V, 3) LOOPS(4) = NP*(((NN-5)/5)+1)*3 GO TO 100 C C****************************************************************************** C 5 CSUM (5) = SUMO ( X(2), n-1) LOOPS(5) = NP*(NN-1) GO TO 100 C C****************************************************************************** C 6 CSUM (6) = SUMO ( W, n) LOOPS(6) = NP*NN*((NN-1)/2) GO TO 100 C C****************************************************************************** C 7 CSUM (7) = SUMO ( X, n) LOOPS(7) = NP*NN GO TO 100 C C****************************************************************************** C 8 CSUM (8) = SUMO ( U1,5*n*2) + SUMO ( U2,5*n*2) + SUMO ( U3,5*n*2) LOOPS(8) = NP*(NN-1)*2 GO TO 100 C C****************************************************************************** C 9 CSUM (9) = SUMO ( PX, 15*n) LOOPS(9) = NP*NN GO TO 100 C C****************************************************************************** C 10 CSUM (10) = SUMO ( PX, 15*n) LOOPS(10) = NP*NN GO TO 100 C C****************************************************************************** C 11 CSUM (11) = SUMO ( X(2), n-1) LOOPS(11) = NP*(NN-1) GO TO 100 C C****************************************************************************** C 12 CSUM (12) = SUMO ( X, n-1) LOOPS(12) = NP*(NN-1) GO TO 100 C C****************************************************************************** C 13 CSUM (13) = SUMO ( P, 8*n) + SUMO ( H, 8*n) LOOPS(13) = NP*NN GO TO 100 C C****************************************************************************** C 14 CSUM (14) = SUMO ( VX,n) + SUMO ( XX,n) + SUMO ( RH,67) LOOPS(14) = NP*NN GO TO 100 C C****************************************************************************** C 15 CSUM (15) = SUMO ( VY, n*7) + SUMO ( VS, n*7) LOOPS(15) = NP*(NN-1)*5 GO TO 100 C C****************************************************************************** C 16 CSUM (16) = FLOAT( k3+k2+j5+m) NROPS(16) = k2+k2+10*k3 LOOPS(16) = 1 GO TO 100 C C****************************************************************************** C 17 CSUM (17) = SUMO ( VXNE, n) + SUMO ( VXND, n) + XNM LOOPS(17) = NP*NN GO TO 100 C C****************************************************************************** C 18 CSUM (18) = SUMO ( ZR, n*7) + SUMO ( ZZ, n*7) LOOPS(18) = NP*(NN-1)*5 GO TO 100 C C****************************************************************************** C 19 CSUM (19) = SUMO ( B5, n) + STB5 LOOPS(19) = NP*NN GO TO 100 C C****************************************************************************** C 20 CSUM (20) = SUMO ( XX(2), n) LOOPS(20) = NP*NN GO TO 100 C C****************************************************************************** C 21 CSUM (21) = SUMO ( PX, n*n) LOOPS(21) = NP*NN*NN*NN GO TO 100 C C****************************************************************************** C 22 CSUM (22) = SUMO ( W, n) LOOPS(22) = NP*NN GO TO 100 C C****************************************************************************** C 23 CSUM (23) = SUMO ( ZA, n*7) LOOPS(23) = NP*(NN-1)*5 GO TO 100 C C****************************************************************************** C 24 CSUM (24) = FLOAT(m) LOOPS(24) = NP*(NN-1) GO TO 100 C C****************************************************************************** C 25 CONTINUE GO TO 100 C C****************************************************************************** C 100 CONTINUE C CALL SIZES (i) C C The following print can be used for stop-watch timing of each kernel. C You may have to increase the iteration count Loop in Subr. SIZES. C You must increase Loop if this test of clock resolution fails: C C j5 = 23456 IF( j5.EQ.23456) GO TO 120 IF( i.EQ.0 ) GO TO 114 TERR= 100.0 IF( TEMPUS.NE. 0.0) TERR= TERR*(TICKS/TEMPUS) IF( TERR .LT. 15.0) GO TO 114 WRITE( ION,113) I 113 FORMAT(/,1X,I2,45H INACCURATE TIMING OR ERROR. NEED LONGER RUN ) 114 continue if (0 .eq. 1) then WRITE ( ION,115) i, TEMPUS, TERR, NPF 115 FORMAT( 2X,i2,9H Done T= ,E11.4,8H T err= ,F8.2,1H% , 1 I8,13H Page-Faults ) else if ( i .eq. 12) then end if C 120 CALL VALUES(i) CALL SIZES (i) CALL SIGNAL( CACHE, 1.0, 0.0, 8192) IF( j5.EQ.23456) GO TO 150 C/ pause 150 CONTINUE C$C 5 get number of page faults (optional) C$ NSTAT= LIB$STAT_TIMER(5,IPF) START= SECOND(0.0) RETURN END C C*********************************************** FUNCTION TICK( LOGIO, ITER, TIC) C*********************************************** C C MEASURE TIME OVERHEAD OF SUBROUTINe TEST. C C*********************************************** C IMPLICIT REAL*4(A-H,O-Z) C C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C REAL LOOPS, NROPS INTEGER E,F,ZONE C COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(1001), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C C C DIMENSION TSEC(16), STAT(12), MAP(47) C ION= LOGIO KR = ITER n = 0 Loop = 0 k2 = 0 k3 = 0 j5 = 23456 m = 0 C C CALL TEST(0) CALL TEST(1) CALL TEST(2) CALL TEST(3) CALL TEST(4) CALL TEST(5) CALL TEST(6) CALL TEST(7) CALL TEST(8) CALL TEST(9) CALL TEST(10) CALL TEST(11) CALL TEST(12) CALL TEST(13) CALL TEST(14) CALL TEST(15) CALL TEST(16) j5 = 0 C DO 10 k= 1,15 10 TSEC(k)= TIME(k+1) C CALL VALID( TIME,MAP,neff, 1.0e-6, TSEC, 1.0e+4, 15) CALL STATS( STAT, TIME, neff) TICK= STAT(1) TICKS= AMAX1( TIC, STAT(1)) C DO 20 k= 1,47 TIME(k)= 0.0 CSUM(k)= 0.0 20 CONTINUE C IF( LOGIO.LT.0) GO TO 73 if(0.eq.0) goto 73 WRITE( LOGIO, 99) WRITE( LOGIO,100) WRITE( LOGIO,102) ( STAT(k), k= 1,4) CALL STATS( STAT,U,NT1) WRITE( LOGIO,104) ( STAT(k), k= 1,4) CALL STATS( STAT,P,NT2) WRITE( LOGIO,104) ( STAT(k), k= 1,4) C 73 RETURN C 99 FORMAT(//,17H CLOCK OVERHEAD: ) 100 FORMAT(/,14X,7HAVERAGE,8X,7HSTANDEV,8X,7HMINIMUM,8X,7HMAXIMUM ) 102 FORMAT(/,1X,5H TICK,4E15.6) 104 FORMAT(/,1X,5H DATA,4E15.6) END C C*********************************************** SUBROUTINE VALID( VX,MAP,L, BL,X,BU,n ) C*********************************************** C C Compress valid data sets; form compression list. C C C VX - ARRAY OF RESULT COMPRESSED Xs. C MAP - ARRAY OF RESULT COMPRESSION INDICES C L - RESULT COMPRESSED LENGTH OF VX, MAP C - C BL - INPUT LOWER BOUND FOR VX C X - ARRAY OF INPUT VALUES. C BU - INPUT UPPER BOUND FOR VX C n - NUMBER OF INPUT VALUES IN X. C C*********************************************** IMPLICIT REAL*4(A-H,O-Z) C DIMENSION VX(n), MAP(n), X(n) CLLL. OPTIMIZE LEVEL G C m= 0 DO 1 k= 1,n IF( X(k).LE. BL .OR. X(k).GE. BU ) GO TO 1 m= m + 1 MAP(m)= k VX(m)= X(k) 1 CONTINUE C L= m RETURN END C C*********************************************** SUBROUTINE VALUES(i) C*********************************************** C IMPLICIT REAL*4(A-H,O-Z) C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11,A12,A13,A21,A22,A23,A31,A32,A33, 2 AR,BR,C0,CR,DI,DK, 3 DM22,DM23,DM24,DM25,DM26,DM27,DM28,DN,E3,E6,EXPMAX,FLX, 4 Q,QA,R,RI,S,SCALE,SIG,STB5,T,XNC,XNEI,XNM C REAL LOOPS, NROPS COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C INTEGER E,F,ZONE C COMMON /ISPACE/ E(96), F(96), 1 IX(1001), IR(1001), ZONE(300) C COMMON /SPACE1/ U(1001), V(1001), W(1001), 1 X(1001), Y(1001), Z(1001), G(1001), 2 DU1(101), DU2(101), DU3(101), GRD(1001), DEX(1001), 3 XI(1001), EX(1001), EX1(1001), DEX1(1001), 4 VX(1001), XX(1001), RX(1001), RH(1001), 5 VSP(101), VSTP(101), VXNE(101), VXND(101), 6 VE3(101), VLR(101), VLIN(101), B5(101), 7 PLAN(300), D(300), SA(101), SB(101) C COMMON /SPACE2/ P(4,512), PX(25,101), CX(25,101), 1 VY(101,25), VH(101,7), VF(101,7), VG(101,7), VS(101,7), 2 ZA(101,7) , ZP(101,7), ZQ(101,7), ZR(101,7), ZM(101,7), 3 ZB(101,7) , ZU(101,7), ZV(101,7), ZZ(101,7), 4 B(64,64), C(64,64), H(64,64), 5 U1(5,101,2), U2(5,101,2), U3(5,101,2) C C C ****************************************************************** C IP1= i+1 C CALL VECTOR( i) C 13 IF( IP1.NE.13 ) GO TO 14 S= 1.0 DO 205 j= 1,4 DO 205 k= 1,512 P(j,k) = S S= S + 0.5 205 CONTINUE C DO 210 j= 1,96 E(j) = 1 F(j) = 1 210 CONTINUE C 14 IF( IP1.NE.14) GO TO 16 ONE9TH= 1./9. DO 215 j= 1,1001 JOKE= j*ONE9TH IF( JOKE.LT.1 ) JOKE= 1 EX(j) = JOKE RH(JOKE) = JOKE DEX(JOKE) = JOKE 215 CONTINUE C DO 220 j= 1,1001 VX(j) = 0.001 XX(j) = 2.001 GRD(j) = 3 + ONE9TH 220 CONTINUE C 16 IF( IP1.NE.16 ) GO TO 73 CONDITIONS: MC= 2 MR= n II= MR/3 LB= II+II D(1)= 1.01980486428764 DO 400 k= 2,300 400 D(k)= D(k-1) +1.0E-4/D(k-1) R= D(MR) DO 403 L= 1,MC m= (MR+MR)*(L-1) DO 401 j= 1,2 DO 401 k= 1,MR m= m+1 S= FLOAT(k) PLAN(m)= R*((S+1.0)/S) 401 ZONE(m)= k+k 403 CONTINUE k= MR+MR+1 ZONE(k)= MR S= D(MR-1) T= D(MR-2) C 73 CONTINUE RETURN END C C*********************************************** SUBROUTINE VECTOR(i) C*********************************************** C IMPLICIT REAL*4(A-H,O-Z) C C/ PARAMETER( l1= 1001, l2= 101 ) C/ PARAMETER( l13= 64, l13h= 64/2, l213= 64+32, l813= 8*64 ) C/ PARAMETER( l14= 512, l16= 75, l416= 4*75 , l21= 25) C/C C/ parameter( NN0= 39 ) C/ parameter( NNI= 2*l1 +2*l213 +l416 ) C/ parameter( NN1= 17*l1 +13*l2 +2*l416 ) C/ parameter( NN2= 4*512 + 3*25*101 +121*101 +3*64*64 ) C COMMON /SPACES/ ION,j5,k2,k3,Loop,m,KR,N13H, 1 n,N1,N2,N13,N213,N813,N16,N416,N21,NT1,NT2 C COMMON /SPACER/ A11(39) C/ COMMON /SPACER/ A11(NN0) C REAL LOOPS, NROPS COMMON /SPACE0/ TIME(47), CSUM(47), WW(47), WT(47), TICKS, 1 SKALE(47), BIAS(47), WS(95), LOOPS(47), NROPS(47) C COMMON /SPACEI/ ISPAN(47), ISP2(47), ISP3(47), 1 IPAS1(47), IPAS2(47), IPAS3(47) C C/ COMMON /SPACE1/ U(NN1) C/ COMMON /SPACE2/ P(NN2) C/C COMMON /SPACE1/ U(18930) COMMON /SPACE2/ P(34132) C IP1= i+1 NT0= 39 C CALL SIGNAL( U, SKALE(IP1), BIAS(IP1), NT1) CALL SIGNAL( P, SKALE(IP1), BIAS(IP1), NT2) CALL SIGNAL(A11, SKALE(IP1), BIAS(IP1), NT0) C RETURN C END End of 19 echo 2 1>&2 cat >2 <<'End of 2' Date: Mon, 17 Nov 86 11:34:49 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file cwhetstoned.bld for FX/1 library Status: RO ### ### Command file to compile and bind the double precision C language ### version of the Whetstone benchmark for the DSP9010 (Alliant FX/1). ### Note the '-ce' switch used to force the program to execute on ### the CE and not on the IP. Also note the '-lm' switch used to load ### the standard C portable math library. ### rm cwhetstoned cc -o cwhetstoned -O -ce -DPOUT cwhetstoned.c -lm rm cwhetstoned.o End of 2 echo 3 1>&2 cat >3 <<'End of 3' Date: Mon, 17 Nov 86 11:37:32 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file cwhetstones.bld for FX/1 library Status: RO ### ### Command file to compile and bind the single precision C language ### version of the Whetstone benchmark for the DSP9010 (Alliant FX/1). ### Note the '-ce' switch used to force the program to execute on ### the CE and not on the IP. Also note the '-lm' switch used to load ### the standard C portable math library. ### rm cwhetstones cc -o cwhetstones -O -ce -DPOUT cwhetstones.c -lm rm cwhetstones.o End of 3 echo 4 1>&2 cat >4 <<'End of 4' Date: Mon, 17 Nov 86 11:40:14 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file dhrystone.bld for FX/1 library Status: RO ### ### Command file to compile and bind the Dhrystone benchmark ### for the DSP9010 (Alliant FX/1). Note the '-ce' switch ### used to force the program to execute on the CE and not ### on the IP. ### rm dhrystone_noregs dhrystone_regs cc -o dhrystone_noregs -O -ce dhrystone.c cc -o dhrystone_regs -O -ce -DREG=register dhrystone.c rm dhrystone.o End of 4 echo 5 1>&2 cat >5 <<'End of 5' Date: Mon, 17 Nov 86 11:38:33 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file cwhetstones.c for FX/1 library Status: RO /* * Whetstone benchmark in C. This program is a translation of the * original Algol version in "A Synthetic Benchmark" by H.J. Curnow * and B.A. Wichman in Computer Journal, Vol 19 #1, February 1976. * * Used to test compiler optimization and floating point performance. * * Compile by: cc -O -s -o whet whet.c * or: cc -O -DPOUT -s -o whet whet.c * if output is desired. */ #define ITERATIONS 1000 /* 1 Million Whetstone instructions */ #include "math.h" float x1, x2, x3, x4, x, y, z, t, t1, t2; float e1[4]; int i, j, k, l, n1, n2, n3, n4, n6, n7, n8, n9, n10, n11; float time0,time1; main() { printf ("Single Precision Whetstone Benchmark - C language version\n\n"); cputime (&time0); /* initialize constants */ t = 0.499975; t1 = 0.50025; t2 = 2.0; /* set values of module weights */ n1 = 0 * ITERATIONS; n2 = 12 * ITERATIONS; n3 = 14 * ITERATIONS; n4 = 345 * ITERATIONS; n6 = 210 * ITERATIONS; n7 = 32 * ITERATIONS; n8 = 899 * ITERATIONS; n9 = 616 * ITERATIONS; n10 = 0 * ITERATIONS; n11 = 93 * ITERATIONS; /* MODULE 1: simple identifiers */ x1 = 1.0; x2 = x3 = x4 = -1.0; for(i = 1; i <= n1; i += 1) { x1 = ( x1 + x2 + x3 - x4 ) * t; x2 = ( x1 + x2 - x3 - x4 ) * t; x3 = ( x1 - x2 + x3 + x4 ) * t; x4 = (-x1 + x2 + x3 + x4 ) * t; } #ifdef POUT pout(n1, n1, n1, x1, x2, x3, x4); #endif /* MODULE 2: array elements */ e1[0] = 1.0; e1[1] = e1[2] = e1[3] = -1.0; for (i = 1; i <= n2; i +=1) { e1[0] = ( e1[0] + e1[1] + e1[2] - e1[3] ) * t; e1[1] = ( e1[0] + e1[1] - e1[2] + e1[3] ) * t; e1[2] = ( e1[0] - e1[1] + e1[2] + e1[3] ) * t; e1[3] = (-e1[0] + e1[1] + e1[2] + e1[3] ) * t; } #ifdef POUT pout(n2, n3, n2, e1[0], e1[1], e1[2], e1[3]); #endif /* MODULE 3: array as parameter */ for (i = 1; i <= n3; i += 1) pa(e1); #ifdef POUT pout(n3, n2, n2, e1[0], e1[1], e1[2], e1[3]); #endif /* MODULE 4: conditional jumps */ j = 1; for (i = 1; i <= n4; i += 1) { if (j == 1) j = 2; else j = 3; if (j > 2) j = 0; else j = 1; if (j < 1 ) j = 1; else j = 0; } #ifdef POUT pout(n4, j, j, x1, x2, x3, x4); #endif /* MODULE 5: omitted */ /* MODULE 6: integer arithmetic */ j = 1; k = 2; l = 3; for (i = 1; i <= n6; i += 1) { j = j * (k - j) * (l -k); k = l * k - (l - j) * k; l = (l - k) * (k + j); e1[l - 2] = j + k + l; /* C arrays are zero based */ e1[k - 2] = j * k * l; } #ifdef POUT pout(n6, j, k, e1[0], e1[1], e1[2], e1[3]); #endif /* MODULE 7: trig. functions */ x = y = 0.5; for(i = 1; i <= n7; i +=1) { x = t * atan(t2*sin(x)*cos(x)/(cos(x+y)+cos(x-y)-1.0)); y = t * atan(t2*sin(y)*cos(y)/(cos(x+y)+cos(x-y)-1.0)); } #ifdef POUT pout(n7, j, k, x, x, y, y); #endif /* MODULE 8: procedure calls */ x = y = z = 1.0; for (i = 1; i <= n8; i +=1) p3(x, y, &z); #ifdef POUT pout(n8, j, k, x, y, z, z); #endif /* MODULE9: array references */ j = 1; k = 2; l = 3; e1[0] = 1.0; e1[1] = 2.0; e1[2] = 3.0; for(i = 1; i <= n9; i += 1) p0(); #ifdef POUT pout(n9, j, k, e1[0], e1[1], e1[2], e1[3]); #endif /* MODULE10: integer arithmetic */ j = 2; k = 3; for(i = 1; i <= n10; i +=1) { j = j + k; k = j + k; j = k - j; k = k - j - j; } #ifdef POUT pout(n10, j, k, x1, x2, x3, x4); #endif /* MODULE11: standard functions */ x = 0.75; for(i = 1; i <= n11; i +=1) x = sqrt( exp( log(x) / t1)); #ifdef POUT pout(n11, j, k, x, x, x, x); #endif cputime (&time1); printf ("\n Single Precision C-languare Whetstones KIPS %8.2f\n",100*ITERATIONS/(time1-time0)); } pa(e) float e[4]; { register int j; j = 0; lab: e[0] = ( e[0] + e[1] + e[2] - e[3] ) * t; e[1] = ( e[0] + e[1] - e[2] + e[3] ) * t; e[2] = ( e[0] - e[1] + e[2] + e[3] ) * t; e[3] = ( -e[0] + e[1] + e[2] + e[3] ) / t2; j += 1; if (j < 6) goto lab; } p3(x, y, z) float x, y, *z; { x = t * (x + y); y = t * (x + y); *z = (x + y) /t2; } p0() { e1[j] = e1[k]; e1[k] = e1[l]; e1[l] = e1[j]; } #ifdef POUT pout(n, j, k, x1, x2, x3, x4) int n, j, k; float x1, x2, x3, x4; { printf("%6d%6d%6d %5e %5e %5e %5e\n", n, j, k, x1, x2, x3, x4); } #endif cputime (secs) float *secs; { #include #include struct tms tms; times(&tms); *secs = tms.tms_utime/60.0; } End of 5 echo 6 1>&2 cat >6 <<'End of 6' Date: Mon, 17 Nov 86 11:36:39 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file cwhetstoned.c for FX/1 library Status: RO /* * Whetstone benchmark in C. This program is a translation of the * original Algol version in "A Synthetic Benchmark" by H.J. Curnow * and B.A. Wichman in Computer Journal, Vol 19 #1, February 1976. * * Used to test compiler optimization and floating point performance. * * Compile by: cc -O -s -o whet whet.c * or: cc -O -DPOUT -s -o whet whet.c * if output is desired. */ #define ITERATIONS 1000 /* 1 Million Whetstone instructions */ #include "math.h" double x1, x2, x3, x4, x, y, z, t, t1, t2; double e1[4]; int i, j, k, l, n1, n2, n3, n4, n6, n7, n8, n9, n10, n11; float time0,time1; main() { printf ("Double Precision Whetstone Benchmark - C language version\n\n"); cputime (&time0); /* initialize constants */ t = 0.499975; t1 = 0.50025; t2 = 2.0; /* set values of module weights */ n1 = 0 * ITERATIONS; n2 = 12 * ITERATIONS; n3 = 14 * ITERATIONS; n4 = 345 * ITERATIONS; n6 = 210 * ITERATIONS; n7 = 32 * ITERATIONS; n8 = 899 * ITERATIONS; n9 = 616 * ITERATIONS; n10 = 0 * ITERATIONS; n11 = 93 * ITERATIONS; /* MODULE 1: simple identifiers */ x1 = 1.0; x2 = x3 = x4 = -1.0; for(i = 1; i <= n1; i += 1) { x1 = ( x1 + x2 + x3 - x4 ) * t; x2 = ( x1 + x2 - x3 - x4 ) * t; x3 = ( x1 - x2 + x3 + x4 ) * t; x4 = (-x1 + x2 + x3 + x4 ) * t; } #ifdef POUT pout(n1, n1, n1, x1, x2, x3, x4); #endif /* MODULE 2: array elements */ e1[0] = 1.0; e1[1] = e1[2] = e1[3] = -1.0; for (i = 1; i <= n2; i +=1) { e1[0] = ( e1[0] + e1[1] + e1[2] - e1[3] ) * t; e1[1] = ( e1[0] + e1[1] - e1[2] + e1[3] ) * t; e1[2] = ( e1[0] - e1[1] + e1[2] + e1[3] ) * t; e1[3] = (-e1[0] + e1[1] + e1[2] + e1[3] ) * t; } #ifdef POUT pout(n2, n3, n2, e1[0], e1[1], e1[2], e1[3]); #endif /* MODULE 3: array as parameter */ for (i = 1; i <= n3; i += 1) pa(e1); #ifdef POUT pout(n3, n2, n2, e1[0], e1[1], e1[2], e1[3]); #endif /* MODULE 4: conditional jumps */ j = 1; for (i = 1; i <= n4; i += 1) { if (j == 1) j = 2; else j = 3; if (j > 2) j = 0; else j = 1; if (j < 1 ) j = 1; else j = 0; } #ifdef POUT pout(n4, j, j, x1, x2, x3, x4); #endif /* MODULE 5: omitted */ /* MODULE 6: integer arithmetic */ j = 1; k = 2; l = 3; for (i = 1; i <= n6; i += 1) { j = j * (k - j) * (l -k); k = l * k - (l - j) * k; l = (l - k) * (k + j); e1[l - 2] = j + k + l; /* C arrays are zero based */ e1[k - 2] = j * k * l; } #ifdef POUT pout(n6, j, k, e1[0], e1[1], e1[2], e1[3]); #endif /* MODULE 7: trig. functions */ x = y = 0.5; for(i = 1; i <= n7; i +=1) { x = t * atan(t2*sin(x)*cos(x)/(cos(x+y)+cos(x-y)-1.0)); y = t * atan(t2*sin(y)*cos(y)/(cos(x+y)+cos(x-y)-1.0)); } #ifdef POUT pout(n7, j, k, x, x, y, y); #endif /* MODULE 8: procedure calls */ x = y = z = 1.0; for (i = 1; i <= n8; i +=1) p3(x, y, &z); #ifdef POUT pout(n8, j, k, x, y, z, z); #endif /* MODULE9: array references */ j = 1; k = 2; l = 3; e1[0] = 1.0; e1[1] = 2.0; e1[2] = 3.0; for(i = 1; i <= n9; i += 1) p0(); #ifdef POUT pout(n9, j, k, e1[0], e1[1], e1[2], e1[3]); #endif /* MODULE10: integer arithmetic */ j = 2; k = 3; for(i = 1; i <= n10; i +=1) { j = j + k; k = j + k; j = k - j; k = k - j - j; } #ifdef POUT pout(n10, j, k, x1, x2, x3, x4); #endif /* MODULE11: standard functions */ x = 0.75; for(i = 1; i <= n11; i +=1) x = sqrt( exp( log(x) / t1)); #ifdef POUT pout(n11, j, k, x, x, x, x); #endif cputime (&time1); printf ("\n Single Precision C-languare Whetstones KIPS %8.2f\n",100*ITERATIONS/(time1-time0)); } pa(e) double e[4]; { register int j; j = 0; lab: e[0] = ( e[0] + e[1] + e[2] - e[3] ) * t; e[1] = ( e[0] + e[1] - e[2] + e[3] ) * t; e[2] = ( e[0] - e[1] + e[2] + e[3] ) * t; e[3] = ( -e[0] + e[1] + e[2] + e[3] ) / t2; j += 1; if (j < 6) goto lab; } p3(x, y, z) double x, y, *z; { x = t * (x + y); y = t * (x + y); *z = (x + y) /t2; } p0() { e1[j] = e1[k]; e1[k] = e1[l]; e1[l] = e1[j]; } #ifdef POUT pout(n, j, k, x1, x2, x3, x4) int n, j, k; double x1, x2, x3, x4; { printf("%6d%6d%6d %5e %5e %5e %5e\n", n, j, k, x1, x2, x3, x4); } #endif cputime (secs) float *secs; { #include #include struct tms tms; times(&tms); *secs = tms.tms_utime/60.0; } End of 6 echo 7 1>&2 cat >7 <<'End of 7' Date: Mon, 17 Nov 86 11:41:23 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file linpackd.bld for FX/1 library Status: RO ### ### Command file to compile and bind the double precision ### version of the Linpack benchmark for the DSP9010 (Alliant FX/1). ### rm linpackd fortran -o linpackd -Ogv -AS linpackd.f rm linpackd.o End of 7 echo 8 1>&2 cat >8 <<'End of 8' Date: Mon, 17 Nov 86 11:43:01 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file linpacks.bld for FX/1 library Status: RO ### ### Command file to compile and bind the single precision ### version of the Linpack benchmark for the DSP9010 (Alliant FX/1). ### rm linpacks fortran -o linpacks -Ogv -AS linpacks.f rm linpacks.o End of 8 echo 9 1>&2 cat >9 <<'End of 9' Date: Mon, 17 Nov 86 11:42:13 est From: David Krowitz To: dongarra@anl-mcs Subject: Here is file linpackd.f for FX/1 library Status: RO program linpackd double precision aa(200,200),a(201,200),b(200),x(200) double precision time(8,6),cray,ops,total,norma,normx double precision resid,residn,eps,epslon integer ipvt(200) lda = 201 ldaa = 200 c n = 100 cray = .056 write(6,1) 1 format(' Please send the results of this run to:'// $ ' Jack J. Dongarra'/ $ ' Mathematics and Computer Science Division'/ $ ' Argonne National Laboratory'/ $ ' Argonne, Illinois 60439'// $ ' Telephone: 312-972-7246'// $ ' ARPAnet: DONGARRA@ANL-MCS'/) ops = (2.0d0*n**3)/3.0d0 + 2.0d0*n**2 c call matgen(a,lda,n,b,norma) t1 = second() call dgefa(a,lda,n,ipvt,info) time(1,1) = second() - t1 t1 = second() call dgesl(a,lda,n,ipvt,b,0) time(1,2) = second() - t1 total = time(1,1) + time(1,2) c c compute a residual to verify results. c do 10 i = 1,n x(i) = b(i) 10 continue call matgen(a,lda,n,b,norma) do 20 i = 1,n b(i) = -b(i) 20 continue call dmxpy(n,b,n,lda,x,a) resid = 0.0 normx = 0.0 do 30 i = 1,n resid = dmax1( resid, dabs(b(i)) ) normx = dmax1( normx, dabs(x(i)) ) 30 continue eps = epslon(1.0d0) residn = resid/( n*norma*normx*eps ) write(6,40) 40 format(' norm. resid resid machep', $ ' x(1) x(n)') write(6,50) residn,resid,eps,x(1),x(n) 50 format(1p5e16.8) c write(6,60) n 60 format(//' times are reported for matrices of order ',i5) write(6,70) 70 format(6x,'dgefa',6x,'dgesl',6x,'total',5x,'mflops',7x,'unit', $ 6x,'ratio') c time(1,3) = total time(1,4) = ops/(1.0d6*total) time(1,5) = 2.0d0/time(1,4) time(1,6) = total/cray write(6,80) lda 80 format(' times for array with leading dimension of',i4) write(6,110) (time(1,i),i=1,6) c call matgen(a,lda,n,b,norma) t1 = second() call dgefa(a,lda,n,ipvt,info) time(2,1) = second() - t1 t1 = second() call dgesl(a,lda,n,ipvt,b,0) time(2,2) = second() - t1 total = time(2,1) + time(2,2) time(2,3) = total time(2,4) = ops/(1.0d6*total) time(2,5) = 2.0d0/time(2,4) time(2,6) = total/cray c call matgen(a,lda,n,b,norma) t1 = second() call dgefa(a,lda,n,ipvt,info) time(3,1) = second() - t1 t1 = second() call dgesl(a,lda,n,ipvt,b,0) time(3,2) = second() - t1 total = time(3,1) + time(3,2) time(3,3) = total time(3,4) = ops/(1.0d6*total) time(3,5) = 2.0d0/time(3,4) time(3,6) = total/cray c ntimes = 10 tm2 = 0 t1 = second() do 90 i = 1,ntimes tm = second() call matgen(a,lda,n,b,norma) tm2 = tm2 + second() - tm call dgefa(a,lda,n,ipvt,info) 90 continue time(4,1) = (second() - t1 - tm2)/ntimes t1 = second() do 100 i = 1,ntimes call dgesl(a,lda,n,ipvt,b,0) 100 continue time(4,2) = (second() - t1)/ntimes total = time(4,1) + time(4,2) time(4,3) = total time(4,4) = ops/(1.0d6*total) time(4,5) = 2.0d0/time(4,4) time(4,6) = total/cray c write(6,110) (time(2,i),i=1,6) write(6,110) (time(3,i),i=1,6) write(6,110) (time(4,i),i=1,6) 110 format(6(1pe11.3)) c call matgen(aa,ldaa,n,b,norma) t1 = second() call dgefa(aa,ldaa,n,ipvt,info) time(5,1) = second() - t1 t1 = second() call dgesl(aa,ldaa,n,ipvt,b,0) time(5,2) = second() - t1 total = time(5,1) + time(5,2) time(5,3) = total time(5,4) = ops/(1.0d6*total) time(5,5) = 2.0d0/time(5,4) time(5,6) = total/cray c call matgen(aa,ldaa,n,b,norma) t1 = second() call dgefa(aa,ldaa,n,ipvt,info) time(6,1) = second() - t1 t1 = second() call dgesl(aa,ldaa,n,ipvt,b,0) time(6,2) = second() - t1 total = time(6,1) + time(6,2) time(6,3) = total time(6,4) = ops/(1.0d6*total) time(6,5) = 2.0d0/time(6,4) time(6,6) = total/cray c call matgen(aa,ldaa,n,b,norma) t1 = second() call dgefa(aa,ldaa,n,ipvt,info) time(7,1) = second() - t1 t1 = second() call dgesl(aa,ldaa,n,ipvt,b,0) time(7,2) = second() - t1 total = time(7,1) + time(7,2) time(7,3) = total time(7,4) = ops/(1.0d6*total) time(7,5) = 2.0d0/time(7,4) time(7,6) = total/cray c ntimes = 10 tm2 = 0 t1 = second() do 120 i = 1,ntimes tm = second() call matgen(aa,ldaa,n,b,norma) tm2 = tm2 + second() - tm call dgefa(aa,ldaa,n,ipvt,info) 120 continue time(8,1) = (second() - t1 - tm2)/ntimes t1 = second() do 130 i = 1,ntimes call dgesl(aa,ldaa,n,ipvt,b,0) 130 continue time(8,2) = (second() - t1)/ntimes total = time(8,1) + time(8,2) time(8,3) = total time(8,4) = ops/(1.0d6*total) time(8,5) = 2.0d0/time(8,4) time(8,6) = total/cray c write(6,140) ldaa 140 format(/' times for array with leading dimension of',i4) write(6,110) (time(5,i),i=1,6) write(6,110) (time(6,i),i=1,6) write(6,110) (time(7,i),i=1,6) write(6,110) (time(8,i),i=1,6) stop end subroutine matgen(a,lda,n,b,norma) double precision a(lda,1),b(1),norma c init = 1325 norma = 0.0 do 30 j = 1,n do 20 i = 1,n init = mod(3125*init,65536) a(i,j) = (init - 32768.0)/16384.0 norma = dmax1(a(i,j), norma) 20 continue 30 continue do 35 i = 1,n b(i) = 0.0 35 continue do 50 j = 1,n do 40 i = 1,n b(i) = b(i) + a(i,j) 40 continue 50 continue return end subroutine dgefa(a,lda,n,ipvt,info) integer lda,n,ipvt(1),info double precision a(lda,1) c c dgefa factors a double precision matrix by gaussian elimination. c c dgefa is usually called by dgeco, but it can be called c directly with a saving in time if rcond is not needed. c (time for dgeco) = (1 + 9/n)*(time for dgefa) . c c on entry c c a double precision(lda, n) c the matrix to be factored. c c lda integer c the leading dimension of the array a . c c n integer c the order of the matrix a . c c on return c c a an upper triangular matrix and the multipliers c which were used to obtain it. c the factorization can be written a = l*u where c l is a product of permutation and unit lower c triangular matrices and u is upper triangular. c c ipvt integer(n) c an integer vector of pivot indices. c c info integer c = 0 normal value. c = k if u(k,k) .eq. 0.0 . this is not an error c condition for this subroutine, but it does c indicate that dgesl or dgedi will divide by zero c if called. use rcond in dgeco for a reliable c indication of singularity. c c linpack. this version dated 08/14/78 . c cleve moler, university of new mexico, argonne national lab. c c subroutines and functions c c blas daxpy,dscal,idamax c c internal variables c double precision t integer idamax,j,k,kp1,l,nm1 c c c gaussian elimination with partial pivoting c info = 0 nm1 = n - 1 if (nm1 .lt. 1) go to 70 do 60 k = 1, nm1 kp1 = k + 1 c c find l = pivot index c l = idamax(n-k+1,a(k,k),1) + k - 1 ipvt(k) = l c c zero pivot implies this column already triangularized c if (a(l,k) .eq. 0.0d0) go to 40 c c interchange if necessary c if (l .eq. k) go to 10 t = a(l,k) a(l,k) = a(k,k) a(k,k) = t 10 continue c c compute multipliers c t = -1.0d0/a(k,k) call dscal(n-k,t,a(k+1,k),1) c c row elimination with column indexing c do 30 j = kp1, n t = a(l,j) if (l .eq. k) go to 20 a(l,j) = a(k,j) a(k,j) = t 20 continue call daxpy(n-k,t,a(k+1,k),1,a(k+1,j),1) 30 continue go to 50 40 continue info = k 50 continue 60 continue 70 continue ipvt(n) = n if (a(n,n) .eq. 0.0d0) info = n return end subroutine dgesl(a,lda,n,ipvt,b,job) integer lda,n,ipvt(1),job double precision a(lda,1),b(1) c c dgesl solves the double precision system c a * x = b or trans(a) * x = b c using the factors computed by dgeco or dgefa. c c on entry c c a double precision(lda, n) c the output from dgeco or dgefa. c c lda integer c the leading dimension of the array a . c c n integer c the order of the matrix a . c c ipvt integer(n) c the pivot vector from dgeco or dgefa. c c b double precision(n) c the right hand side vector. c c job integer c = 0 to solve a*x = b , c = nonzero to solve trans(a)*x = b where c trans(a) is the transpose. c c on return c c b the solution vector x . c c error condition c c a division by zero will occur if the input factor contains a c zero on the diagonal. technically this indicates singularity c but it is often caused by improper arguments or improper c setting of lda . it will not occur if the subroutines are c called correctly and if dgeco has set rcond .gt. 0.0 c or dgefa has set info .eq. 0 . c c to compute inverse(a) * c where c is a matrix c with p columns c call dgeco(a,lda,n,ipvt,rcond,z) c if (rcond is too small) go to ... c do 10 j = 1, p c call dgesl(a,lda,n,ipvt,c(1,j),0) c 10 continue c c linpack. this version dated 08/14/78 . c cleve moler, university of new mexico, argonne national lab. c c subroutines and functions c c blas daxpy,ddot c c internal variables c double precision ddot,t integer k,kb,l,nm1 c nm1 = n - 1 if (job .ne. 0) go to 50 c c job = 0 , solve a * x = b c first solve l*y = b c if (nm1 .lt. 1) go to 30 do 20 k = 1, nm1 l = ipvt(k) t = b(l) if (l .eq. k) go to 10 b(l) = b(k) b(k) = t 10 continue call daxpy(n-k,t,a(k+1,k),1,b(k+1),1) 20 continue 30 continue c c now solve u*x = y c do 40 kb = 1, n k = n + 1 - kb b(k) = b(k)/a(k,k) t = -b(k) call daxpy(k-1,t,a(1,k),1,b(1),1) 40 continue go to 100 50 continue c c job = nonzero, solve trans(a) * x = b c first solve trans(u)*y = b c do 60 k = 1, n t = ddot(k-1,a(1,k),1,b(1),1) b(k) = (b(k) - t)/a(k,k) 60 continue c c now solve trans(l)*x = y c if (nm1 .lt. 1) go to 90 do 80 kb = 1, nm1 k = n - kb b(k) = b(k) + ddot(n-k,a(k+1,k),1,b(k+1),1) l = ipvt(k) if (l .eq. k) go to 70 t = b(l) b(l) = b(k) b(k) = t 70 continue 80 continue 90 continue 100 continue return end subroutine daxpy(n,da,dx,incx,dy,incy) c c constant times a vector plus a vector. c jack dongarra, linpack, 3/11/78. c double precision dx(1),dy(1),da integer i,incx,incy,ix,iy,m,mp1,n c if(n.le.0)return if (da .eq. 0.0d0) return if(incx.eq.1.and.incy.eq.1)go to 20 c c code for unequal increments or equal increments c not equal to 1 c ix = 1 iy = 1 if(incx.lt.0)ix = (-n+1)*incx + 1 if(incy.lt.0)iy = (-n+1)*incy + 1 do 10 i = 1,n dy(iy) = dy(iy) + da*dx(ix) ix = ix + incx iy = iy + incy 10 continue return c c code for both increments equal to 1 c 20 continue do 30 i = 1,n dy(i) = dy(i) + da*dx(i) 30 continue return end double precision function ddot(n,dx,incx,dy,incy) c c forms the dot product of two vectors. c jack dongarra, linpack, 3/11/78. c double precision dx(1),dy(1),dtemp integer i,incx,incy,ix,iy,m,mp1,n c ddot = 0.0d0 dtemp = 0.0d0 if(n.le.0)return if(incx.eq.1.and.incy.eq.1)go to 20 c c code for unequal increments or equal increments c not equal to 1 c ix = 1 iy = 1 if(incx.lt.0)ix = (-n+1)*incx + 1 if(incy.lt.0)iy = (-n+1)*incy + 1 do 10 i = 1,n dtemp = dtemp + dx(ix)*dy(iy) ix = ix + incx iy = iy + incy 10 continue ddot = dtemp return c c code for both increments equal to 1 c 20 continue do 30 i = 1,n dtemp = dtemp + dx(i)*dy(i) 30 continue ddot = dtemp return end subroutine dscal(n,da,dx,incx) c c scales a vector by a constant. c jack dongarra, linpack, 3/11/78. c double precision da,dx(1) integer i,incx,m,mp1,n,nincx c if(n.le.0)return if(incx.eq.1)go to 20 c c code for increment not equal to 1 c nincx = n*incx do 10 i = 1,nincx,incx dx(i) = da*dx(i) 10 continue return c c code for increment equal to 1 c 20 continue do 30 i = 1,n dx(i) = da*dx(i) 30 continue return end integer function idamax(n,dx,incx) c c finds the index of element having max. dabsolute value. c jack dongarra, linpack, 3/11/78. c double precision dx(1),dmax integer i,incx,ix,n c idamax = 0 if( n .lt. 1 ) return idamax = 1 if(n.eq.1)return if(incx.eq.1)go to 20 c c code for increment not equal to 1 c ix = 1 dmax = dabs(dx(1)) ix = ix + incx do 10 i = 2,n if(dabs(dx(ix)).le.dmax) go to 5 idamax = i dmax = dabs(dx(ix)) 5 ix = ix + incx 10 continue return c c code for increment equal to 1 c 20 dmax = dabs(dx(1)) do 30 i = 2,n if(dabs(dx(i)).le.dmax) go to 30 idamax = i dmax = dabs(dx(i)) 30 continue return end double precision function epslon (x) double precision x c c estimate unit roundoff in quantities of size x. c double precision a,b,c,eps c c this program should function properly on all systems c satisfying the following two assumptions, c 1. the base used in representing dfloating point c numbers is not a power of three. c 2. the quantity a in statement 10 is represented to c the accuracy used in dfloating point variables c that are stored in memory. c the statement number 10 and the go to 10 are intended to c force optimizing compilers to generate code satisfying c assumption 2. c under these assumptions, it should be true that, c a is not exactly equal to four-thirds, c b has a zero for its last bit or digit, c c is not exactly equal to one, c eps measures the separation of 1.0 from c the next larger dfloating point number. c the developers of eispack would appreciate being informed c about any systems where these assumptions do not hold. c c ***************************************************************** c this routine is one of the auxiliary routines used by eispack iii c to avoid machine dependencies. c ***************************************************************** c c this version dated 4/6/83. c a = 4.0d0/3.0d0 10 b = a - 1.0d0 c = b + b + b eps = dabs(c-1.0d0) if (eps .eq. 0.0d0) go to 10 epslon = eps*dabs(x) return end subroutine mm (a, lda, n1, n3, b, ldb, n2, c, ldc) double precision a(lda,*), b(ldb,*), c(ldc,*) c c purpose: c multiply matrix b times matrix c and store the result in matrix a. c c parameters: c c a double precision(lda,n3), matrix of n1 rows and n3 columns c c lda integer, leading dimension of array a c c n1 integer, number of rows in matrices a and b c c n3 integer, number of columns in matrices a and c c c b double precision(ldb,n2), matrix of n1 rows and n2 columns c c ldb integer, leading dimension of array b c c n2 integer, number of columns in matrix b, and number of rows in c matrix c c c c double precision(ldc,n3), matrix of n2 rows and n3 columns c c ldc integer, leading dimension of array c c c ---------------------------------------------------------------------- c do 20 j = 1, n3 do 10 i = 1, n1 a(i,j) = 0.0 10 continue call dmxpy (n2,a(1,j),n1,ldb,c(1,j),b) 20 continue c return end subroutine dmxpy (n1, y, n2, ldm, x, m) double precision y(*), x(*), m(ldm,*) c c purpose: c multiply matrix m times vector x and add the result to vector y. c c parameters: c c n1 integer, number of elements in vector y, and number of rows in c matrix m c c y double precision(n1), vector of length n1 to which is added c the product m*x c c n2 integer, number of elements in vector x, and number of columns c in matrix m c c ldm integer, leading dimension of array m c c x double precision(n2), vector of length n2 c c m double precision(ldm,n2), matrix of n1 rows and n2 columns c c ---------------------------------------------------------------------- c c cleanup odd vector c j = mod(n2,2) if (j .ge. 1) then do 10 i = 1, n1 y(i) = (y(i)) + x(j)*m(i,j) 10 continue endif c c cleanup odd group of two vectors c j = mod(n2,4) if (j .ge. 2) then do 20 i = 1, n1 y(i) = ( (y(i)) $ + x(j-1)*m(i,j-1)) + x(j)*m(i,j) 20 continue endif c c cleanup odd group of four vectors c j = mod(n2,8) if (j .ge. 4) then do 30 i = 1, n1 y(i) = ((( (y(i)) $ + x(j-3)*m(i,j-3)) + x(j-2)*m(i,j-2)) $ + x(j-1)*m(i,j-1)) + x(j) *m(i,j) 30 continue endif c c cleanup odd group of eight vectors c j = mod(n2,16) if (j .ge. 8) then do 40 i = 1, n1 y(i) = ((((((( (y(i)) $ + x(j-7)*m(i,j-7)) + x(j-6)*m(i,j-6)) $ + x(j-5)*m(i,j-5)) + x(j-4)*m(i,j-4)) $ + x(j-3)*m(i,j-3)) + x(j-2)*m(i,j-2)) $ + x(j-1)*m(i,j-1)) + x(j) *m(i,j) 40 continue endif c c main loop - groups of sixteen vectors c jmin = j+16 do 60 j = jmin, n2, 16 do 50 i = 1, n1 y(i) = ((((((((((((((( (y(i)) $ + x(j-15)*m(i,j-15)) + x(j-14)*m(i,j-14)) $ + x(j-13)*m(i,j-13)) + x(j-12)*m(i,j-12)) $ + x(j-11)*m(i,j-11)) + x(j-10)*m(i,j-10)) $ + x(j- 9)*m(i,j- 9)) + x(j- 8)*m(i,j- 8)) $ + x(j- 7)*m(i,j- 7)) + x(j- 6)*m(i,j- 6)) $ + x(j- 5)*m(i,j- 5)) + x(j- 4)*m(i,j- 4)) $ + x(j- 3)*m(i,j- 3)) + x(j- 2)*m(i,j- 2)) $ + x(j- 1)*m(i,j- 1)) + x(j) *m(i,j) 50 continue 60 continue return end C C Timing routine for the Alliant FX/1 and FX/8 C Return cumulative CPU time used by process C REAL FUNCTION SECOND () REAL STIME,UTIME CALL ETIME(UTIME,STIME) SECOND=UTIME RETURN END End of 9