BENCHMARKS FOR THE_ELEMENTS WITH THE PROGRAM FLOPS.C(*)
SYSTEM      | THE_ELEMENTS         | FAIDRA             | ORIGIN4          | DEDALOS           |
CPU         | Intel Pentium II     | Sun UltraSPARC-II  | MIPS R10000      | HP CONVEX SPP2000 |
CLOCK (MHz) | 450                  | 400                | 226              | 180               |
COMPILER    | PGI v3.0             | Sun WorkShop 5.0   | Silicon Graphics | HP-UX CC          |
OPTIONS     | -O4 -tp p6 -Mnoframe | -xchip=ultra2 -xO5 | -64 -mips4 -O3   | -O -w -mp -64     |
MFLOPS(1)   | 125                  | 199                | 106              | 114               |
MFLOPS(2)   | 95                   | 178                | 102              | 84                |
MFLOPS(3)   | 142                  | 364                | 205              | 149               |
MFLOPS(4)   | 188                  | 727                | 388              | 254               |
AVERAGE     | 138                  | 367                | 200              | 150               |
(*) Flops.c
Flops.c is a C program that attempts to estimate your system's floating-point
MFLOPS rating for the FADD, FSUB, FMUL, and FDIV operations, based on specific
'instruction mixes' (discussed below). The program provides an estimate of PEAK
MFLOPS performance by making maximal use of register variables with minimal
interaction with main memory. The execution loops are all small, so they will
fit in any cache. Flops.c can be used along with Linpack and the Livermore
kernels (which exercise memory much more extensively) to gain further insight
into the limits of system performance. The flops.c execution modules include
various percentage weightings of FDIVs (from 0% to 25%), so that the range of
performance obtained with different amounts of FDIV usage can be measured.
FDIVs, being computationally more intensive than FADDs or FMULs, can impact
performance considerably on some systems.

Flops.c consists of 8 independent 'modules' which, except for module 2, conduct
numerical integration of various functions. Some of the functions (sin(x) and
cos(x)) are approximated using a power series expansion accurate to 1.0e-14
over the integration interval. Module 2 estimates the value of pi based upon
the Maclaurin series expansion of atan(1). MFLOPS ratings are provided for
each module, but the program's overall results are summarized by the
MFLOPS(1), MFLOPS(2), MFLOPS(3), and MFLOPS(4) outputs.

The MFLOPS(1) result is the same measure provided by all previous versions of
flops.c (flops12c.c and earlier versions). It is based only upon the results
from modules 2 and 3. In fact, on faster machines, MFLOPS(1) from flops.c V2.0
is expected to provide more accurate results, since the number of iterations
performed (which affects the timing accuracy) is more tightly controlled than
in previous versions of flops.c. Two problems surfaced in using MFLOPS(1).
First, it was difficult to completely 'vectorize' the result due to the
recurrence of the 's' variable in module 2. This problem is addressed by the
MFLOPS(2) result, which does not use module 2 but maintains nearly the same
weighting of FDIVs (9.2%) as MFLOPS(1) (9.6%).
For scalar machines the MFLOPS(2) results 'should' be similar to the MFLOPS(1)
results. However, for vector machines the MFLOPS(1) and MFLOPS(2) results may
differ considerably, since the MFLOPS(2) result is expected to be more
completely vectorizable. The second problem with MFLOPS(1) centers on its
percentage of FDIVs (9.6%), which was viewed as too high for an important
class of problems. This concern is addressed by the MFLOPS(3) result, which
uses only 3.4% FDIVs, and the MFLOPS(4) result, in which NO FDIVs are
performed at all.
Al Aburto (aburto@nosc.mil)