
BENCHMARKS FOR THE_ELEMENTS WITH THE PROGRAM FLOPS.C(*)

SYSTEM	THE_ELEMENTS	FAIDRA	ORIGIN4	DEDALOS
CPU	Intel Pentium II	Sun UltraSparc-II	MIPS R10000	HP CONVEX SPP2000
MHz	450	400	226	180
COMPILER	PGI v3.0	Sun Workstation 5.0	Silicon Graphics	HP-UX CC
OPTIONS	-O4 -tp p6 -Mnoframe	-xchip=ultra2 -xO5	-64 -mips4 -O3	-O -w -mp -64
MFLOPS(1)	125	199	106	114
MFLOPS(2)	95	178	102	84
MFLOPS(3)	142	364	205	149
MFLOPS(4)	188	727	388	254
AVERAGE	138	367	200	150

(*) Flops.c

Flops.c is a C program which attempts to estimate your system's floating-point MFLOPS rating for the FADD, FSUB, FMUL, and FDIV operations based on specific 'instruction mixes' (discussed below). The program provides an estimate of PEAK MFLOPS performance by making maximal use of register variables with minimal interaction with main memory. The execution loops are all small so that they will fit in any cache. Flops.c can be used along with Linpack and the Livermore kernels (which exercise memory much more extensively) to gain further insight into the limits of system performance.

The flops.c execution modules include various percentage weightings of FDIVs (from 0% to 25%) so that the range of performance can be obtained when using FDIVs. FDIVs, being computationally more intensive than FADDs or FMULs, can impact performance considerably on some systems.

Flops.c consists of 8 independent 'modules' which, except for module 2, conduct numerical integration of various functions. Some of the functions (sin(x) and cos(x)) are approximated using a power series expansion accurate to 1.0e-14 over the integration interval. Module 2 estimates the value of pi based upon the Maclaurin series expansion of atan(x) evaluated at x = 1. MFLOPS ratings are provided for each module, but the program's overall results are summarized by the MFLOPS(1), MFLOPS(2), MFLOPS(3), and MFLOPS(4) outputs.

The MFLOPS(1) result is the same measure reported by all previous versions of flops.c (flops12c.c and earlier versions). It is based only upon the results from modules 2 and 3. On faster machines, MFLOPS(1) from flops.c V2.0 is expected to provide more accurate results, since the number of iterations conducted (which is reflected in the timing accuracy) is more tightly controlled than in previous versions of flops.c.

Two problems surfaced in using MFLOPS(1). First, it was difficult to completely 'vectorize' the computation due to the recurrence of the 's' variable in module 2. This problem is addressed in the MFLOPS(2) result, which does not use module 2 but maintains nearly the same weighting of FDIVs (9.2%) as MFLOPS(1) (9.6%). For scalar machines the MFLOPS(2) results 'should' be similar to the MFLOPS(1) results. However, for vector machines the MFLOPS(1) and MFLOPS(2) results may differ considerably, since the MFLOPS(2) result is expected to be more completely vectorizable.

The second problem with MFLOPS(1) centers on the percentage of FDIVs (9.6%), which was viewed as too high for an important class of problems. This concern is addressed in the MFLOPS(3) result, which uses only 3.4% FDIVs, and in the MFLOPS(4) result, where NO FDIVs are conducted at all.

Al Aburto (aburto@nosc.mil)