BENCHMARKS FOR THE_ELEMENTS WITH THE PROGRAM FLOPS.C(*)

SYSTEM     THE_ELEMENTS          FAIDRA               ORIGIN4           DEDALOS
CPU        Intel Pentium II      Sun UltraSparc-II    MIPS R10000       HP CONVEX SPP2000
MHz        450                   400                  226               180
COMPILER   PGI v3.0              Sun Workstation 5.0  Silicon Graphics  HP-UX CC
OPTIONS    -O4 -tp p6 -Mnoframe  -xchip=ultra2 -xO5   -64 -mips4 -O3    -O -w -mp -64
MFLOPS(1)  125                   199                  106               114
MFLOPS(2)  95                    178                  102               84
MFLOPS(3)  142                   364                  205               149
MFLOPS(4)  188                   727                  388               254
AVERAGE    138                   367                  200               150
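
The AVERAGE row is consistent with the arithmetic mean of the four MFLOPS(1) to MFLOPS(4) ratings for each system, rounded to the nearest integer. The short C sketch below reproduces that arithmetic; the function name rounded_mean is hypothetical and is not part of flops.c.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n ratings, rounded to the nearest integer.
   Assumes non-negative inputs, as MFLOPS ratings are.
   rounded_mean is a hypothetical helper, not part of flops.c. */
int rounded_mean(const int *ratings, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += ratings[i];
    /* Adding 0.5 before truncation rounds a non-negative value. */
    return (int)(sum / (double)n + 0.5);
}
```

Calling rounded_mean on the four ratings of one system, for example {125, 95, 142, 188} for THE_ELEMENTS, gives the figure in that system's AVERAGE entry.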

(*) Flops.c

Flops.c is a C program which attempts to estimate your system's floating-point 'MFLOPS' rating for the FADD, FSUB, FMUL, and FDIV operations based on specific 'instruction mixes' (discussed below). The program provides an estimate of PEAK MFLOPS performance by making maximal use of register variables with minimal interaction with main memory. The execution loops are all small so that they will fit in any cache. Flops.c can be used along with Linpack and the Livermore kernels (which exercise memory much more extensively) to gain further insight into the limits of system performance.

The flops.c execution modules include various percentage weightings of FDIV's (from 0% to 25% FDIV's) so that the range of performance can be observed when FDIV's are used. FDIV's, being computationally more intensive than FADD's or FMUL's, can reduce performance considerably on some systems.

Flops.c consists of 8 independent 'modules' which, except for module 2, conduct numerical integration of various functions. Some of the functions (sin(x) and cos(x)) are approximated using a power series expansion accurate to 1.0e-14 over the integration interval. Module 2 estimates the value of pi based upon the Maclaurin series expansion of atan(1). MFLOPS ratings are provided for each module, but the program's overall results are summarized by the MFLOPS(1), MFLOPS(2), MFLOPS(3), and MFLOPS(4) outputs.
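
As an illustration of the kind of module described above, here is a minimal sketch, not the flops.c source, that integrates a power series approximation of sin(x) over [0, b] and converts the elapsed time into an MFLOPS figure. Several things are assumptions of the sketch: the trapezoid rule (the text does not say which integration rule flops.c uses), the Taylor coefficients through x^15, the count of 18 floating-point operations per sample, and all function names.

```c
#include <assert.h>
#include <time.h>

/* This sketch's own operation count per sample point:
   16 in sin_series (1 FMUL for x*x, 7 Horner steps of FMUL+FADD,
   1 FMUL for x*p) plus 1 FMUL (i*h) and 1 FADD (sum +=) in the loop.
   No FDIV is executed per sample. */
#define FLOPS_PER_SAMPLE 18.0

/* Taylor polynomial for sin(x) through x^15, evaluated with Horner's
   rule in x*x. For 0 <= x <= pi/3 the first omitted term, x^17/17!,
   is below 1e-14. The constant divisions are compile-time constants. */
double sin_series(double x)
{
    const double x2 = x * x;
    double p = -1.0 / 1307674368000.0;      /* -1/15! */
    p =  1.0 / 6227020800.0 + x2 * p;       /* +1/13! */
    p = -1.0 / 39916800.0   + x2 * p;       /* -1/11! */
    p =  1.0 / 362880.0     + x2 * p;       /* +1/9!  */
    p = -1.0 / 5040.0       + x2 * p;       /* -1/7!  */
    p =  1.0 / 120.0        + x2 * p;       /* +1/5!  */
    p = -1.0 / 6.0          + x2 * p;       /* -1/3!  */
    p =  1.0                + x2 * p;
    return x * p;
}

/* Trapezoid rule for the integral of sin_series over [0, b], n intervals. */
double integrate_sin(double b, long n)
{
    assert(n > 0);
    const double h = b / (double)n;
    double sum = 0.5 * (sin_series(0.0) + sin_series(b));
    for (long i = 1; i < n; i++)
        sum += sin_series((double)i * h);
    return h * sum;
}

/* Times one integration and returns an MFLOPS estimate.
   Returns 0.0 when the run is too short for the clock resolution. */
double mflops_for_integration(double b, long n, double *result)
{
    clock_t t0 = clock();
    *result = integrate_sin(b, n);
    clock_t t1 = clock();
    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return FLOPS_PER_SAMPLE * (double)n / secs / 1.0e6;
}
```

With b = pi/3 the exact integral is 1 - cos(pi/3) = 0.5, which gives a simple correctness check on the arithmetic before trusting the timing. On a fast machine n must be large enough for the loop to run much longer than the clock resolution, which is the timing-accuracy concern the text raises about iteration counts.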
 
 

The MFLOPS(1) result is identical to the result provided by all previous versions of flops.c (flops12c.c and earlier versions). It is based only upon the results from modules 2 and 3. In fact, on faster machines, MFLOPS(1) from flops.c V2.0 is expected to provide more accurate results, since the number of iterations conducted (which is reflected in the timing accuracy) is more tightly controlled than in previous versions of flops.c.

Two problems surfaced in using MFLOPS(1). First, it was difficult to completely 'vectorize' the result due to the recurrence of the 's' variable in module 2. This problem is addressed in the MFLOPS(2) result, which does not use module 2 but maintains nearly the same weighting of FDIV's (9.2%) as in MFLOPS(1) (9.6%). For scalar machines the MFLOPS(2) results 'should' be similar to the MFLOPS(1) results. However, for vector machines the MFLOPS(1) and MFLOPS(2) results may differ considerably, since the MFLOPS(2) result is expected to be more completely vectorizable.

The second problem with MFLOPS(1) centers on the percentage of FDIV's (9.6%), which was viewed as too high for an important class of problems. This concern is addressed in the MFLOPS(3) result, which does only 3.4% FDIV's, and the MFLOPS(4) result, where NO FDIV's are conducted at all.
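
To make the recurrence issue concrete, the sketch below writes the atan(1) series for pi in two ways. It is an illustration only and is not taken from module 2 of flops.c: the variable names, the loop structure, and the index-based rewrite are all assumptions. In the first version the sign 's' is carried from one iteration to the next, which is the kind of dependence that can hinder vectorization; in the second, each term depends only on the loop index.

```c
#include <assert.h>

/* Version A: the sign s and denominator u are updated from their
   values in the previous iteration (a loop-carried recurrence). */
double pi_with_recurrence(long terms)
{
    assert(terms > 0);
    double s = 1.0, u = 1.0, sum = 0.0;
    for (long i = 0; i < terms; i++) {
        sum += s / u;   /* one FDIV and one FADD per term */
        s = -s;         /* depends on the previous s */
        u += 2.0;       /* depends on the previous u */
    }
    return 4.0 * sum;
}

/* Version B: sign and denominator are derived from the index i alone,
   so each term can be computed independently of the others. */
double pi_without_recurrence(long terms)
{
    assert(terms > 0);
    double sum = 0.0;
    for (long i = 0; i < terms; i++) {
        const double s = (i & 1) ? -1.0 : 1.0;
        const double u = 2.0 * (double)i + 1.0;
        sum += s / u;
    }
    return 4.0 * sum;
}
```

Both versions add the same terms in the same order, so they should agree to within rounding. The running sum is still a reduction in both, and whether a given compiler vectorizes either loop depends on the compiler and its options. The series converges slowly (the error in pi after n terms is roughly 1/n), so a long loop is needed for even a few correct digits.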

Al Aburto (aburto@nosc.mil)