Having discussed in detail the derivation of one particular block algorithm, we now give examples of the performance that has been achieved with a variety of block algorithms. Tables 3.2, 3.3, 3.4, 3.5, and 3.6 describe the hardware and software characteristics of the machines on which these results were obtained.
| Dec Alpha Miata | Compaq AlphaServer DS-20 |
Model | LX164 | DS-20 (21264) |
Processor | EV56 | EV6 |
Clock speed (MHz) | 533 | 500 |
Processors per node | 1 | 1 |
Operating system | Linux 2.2.7 | OSF1 V4.0 1091 |
BLAS | ATLAS (version 1.0) | DXML (version 3.5) |
Fortran compiler | g77 (egcs 2.91.60) | f77 (version 5.2) |
Fortran flags | -funroll-all-loops -fno-f2c -O3 | -O4 -fpe1 |
Precision | double (64-bit) | double (64-bit) |
| IBM Power 3 | IBM PowerPC |
Model | Winterhawk | |
Processor | 630 | 604e |
Clock speed (MHz) | 200 | 190 |
Processors per node | 1 | 1 |
Operating system | AIX 4.3 | Linux 2.2.7 |
BLAS | ESSL (3.1.1.0) | ATLAS (version 1.0) |
Fortran compiler | xlf (6.1.0.0) | g77 (egcs 2.91.66) |
Fortran flags | -O4 -qmaxmem=-1 | -funroll-all-loops -fno-f2c -O3 |
Precision | double (64-bit) | double (64-bit) |
| Intel Pentium II | Intel Pentium III |
Model | | |
Processor | Pentium II | Pentium III |
Clock speed (MHz) | 450 | 550 |
Processors per node | 1 | 1 |
Operating system | Linux 2.2.7 | Linux 2.2.5-15 |
BLAS | ATLAS (version 1.0) | ATLAS (version 1.0) |
Fortran compiler | g77 (egcs 2.91.60) | g77 (egcs 2.91.66) |
Fortran flags | -funroll-all-loops -fno-f2c -O3 | -funroll-all-loops -fno-f2c -O3 |
Precision | double (64-bit) | double (64-bit) |
| SGI Origin 2000 |
Model | IP27 |
Processor | MIPS R12000 |
Clock speed (MHz) | 300 |
Processors per node | 64 |
Operating system | IRIX 6.5 |
BLAS | SGI BLAS |
Fortran compiler | f77 (7.2.1.2m) |
Fortran flags | -O3 -64 -mips4 -r10000 -OPT:IEEE_NaN_inf=ON |
Precision | double (64-bit) |
| Sun Ultra 2 | Sun Enterprise 450 |
Model | Ultra 2 Model 2200 | Model 1300 |
Processor | Sun UltraSPARC | Sun UltraSPARC-II |
Clock speed (MHz) | 200 | 300 |
Processors per node | 1 | 1 |
Operating system | SunOS 5.5.1 | SunOS 5.5.7 |
BLAS | Sun Performance Library | Sun Performance Library |
Fortran compiler | f77 (SC5.0) | f77 (SC5.0) |
Fortran flags | -f -dalign -native -xO5 -xarch=v8plusa | -f -dalign -native -xO5 -xarch=v8plusa |
Precision | double (64-bit) | double (64-bit) |
See Gallivan et al. [52] and Dongarra et al. [43] for an alternative survey of algorithms for dense linear algebra on high-performance computers.
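Performance on machines such as those characterized above is conventionally reported as a rate in megaflops: an operation count divided by the measured run time. The following is a minimal sketch of that arithmetic, not code from LAPACK. The function names `lu_flops` and `mflops` are hypothetical. The operation count is the standard leading-order estimate of 2n^3/3 for LU factorization of an n-by-n matrix. The timing in the usage line is a made-up input, not a measurement from any of the machines listed.

```python
def lu_flops(n: int) -> float:
    """Leading-order floating-point operation count for LU
    factorization of an n-by-n matrix (assumed estimate: 2n^3/3)."""
    return 2.0 * n**3 / 3.0


def mflops(flop_count: float, seconds: float) -> float:
    """Convert an operation count and an elapsed time in seconds
    into a rate in millions of floating-point operations per second."""
    if seconds <= 0.0:
        raise ValueError("elapsed time must be positive")
    return flop_count / seconds / 1.0e6


if __name__ == "__main__":
    # Illustrative input only: n = 1000 and a made-up elapsed time of 2.0 s.
    n, elapsed = 1000, 2.0
    print(f"n = {n}: {mflops(lu_flops(n), elapsed):.1f} megaflops")
```

Keeping the operation count separate from the rate conversion lets the same `mflops` helper be used with the count for any other factorization or reduction.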