

Examples of Block Algorithms in LAPACK

Having discussed in detail the derivation of one particular block algorithm, we now describe examples of the performance achieved with a variety of block algorithms. Tables 3.2 through 3.6 summarize the hardware and software characteristics of the machines on which these timings were obtained.
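To make the structure of a block algorithm concrete, here is a minimal sketch of a blocked Cholesky factorization A = L L^T in pure Python. It is an illustration under stated assumptions, not LAPACK's implementation: it uses a right-looking variant (which may differ from the variant LAPACK's routines use), plain nested lists, and a block-size parameter `nb`; the function names `factor_diagonal_block` and `cholesky_blocked` are hypothetical. Each block step factors the nb-by-nb diagonal block with an unblocked algorithm, solves a triangular system for the panel below it, and applies a rank-nb update to the trailing submatrix. In Fortran code the last two steps are where Level 3 BLAS calls such as DTRSM and DSYRK would go, which is where blocking pays off; in pure Python there is no speed benefit, only the structure.

```python
import math


def factor_diagonal_block(a, k0, k1):
    """Unblocked Cholesky of the diagonal block a[k0:k1][k0:k1], in place.

    Earlier trailing updates have already been applied to this block, so the
    inner sums only run over columns k0..j-1 (roughly the job of DPOTF2).
    """
    for j in range(k0, k1):
        s = a[j][j] - sum(a[j][p] ** 2 for p in range(k0, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(s)
        for i in range(j + 1, k1):
            s = a[i][j] - sum(a[i][p] * a[j][p] for p in range(k0, j))
            a[i][j] = s / a[j][j]


def cholesky_blocked(a, nb):
    """Return lower-triangular L with L * L^T = a, using block size nb.

    Only the lower triangle of `a` is read; `a` itself is not modified.
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]  # private working copy
    for k in range(0, n, nb):
        k1 = min(k + nb, n)
        # (1) Factor the diagonal block: L11 * L11^T = A11.
        factor_diagonal_block(a, k, k1)
        # (2) Panel below the diagonal block: solve L21 * L11^T = A21
        #     by forward substitution (the role DTRSM would play).
        for i in range(k1, n):
            for j in range(k, k1):
                s = a[i][j] - sum(a[i][p] * a[j][p] for p in range(k, j))
                a[i][j] = s / a[j][j]
        # (3) Trailing update of the lower triangle: A22 -= L21 * L21^T
        #     (the rank-nb update DSYRK would perform).
        for i in range(k1, n):
            for j in range(k1, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k, k1))
    # Clear the strictly upper triangle, which the computation never referenced.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a


if __name__ == "__main__":
    L = cholesky_blocked([[4.0, 2.0], [2.0, 3.0]], nb=1)
    print(L)  # first row is [2.0, 0.0]; L[1][0] is 1.0
```

Changing `nb` only changes how the same arithmetic is grouped: for any block size the result agrees with the unblocked factorization up to rounding, so the block size can be tuned for a particular machine without affecting correctness.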


Table 3.2: Characteristics of the Compaq/Digital computers timed
                    | DEC Alpha Miata                 | Compaq AlphaServer DS-20
Model               | LX164                           | DS-20 (21264)
Processor           | EV56                            | EV6
Clock speed (MHz)   | 533                             | 500
Processors per node | 1                               | 1
Operating system    | Linux 2.2.7                     | OSF1 V4.0 1091
BLAS                | ATLAS (version 1.0)             | DXML (version 3.5)
Fortran compiler    | g77 (egcs 2.91.60)              | f77 (version 5.2)
Fortran flags       | -funroll-all-loops -fno-f2c -O3 | -O4 -fpe1
Precision           | double (64-bit)                 | double (64-bit)


Table 3.3: Characteristics of the IBM computers timed
                    | IBM Power 3     | IBM PowerPC
Model               | Winterhawk      |
Processor           | 630             | 604e
Clock speed (MHz)   | 200             | 190
Processors per node | 1               | 1
Operating system    | AIX 4.3         | Linux 2.2.7
BLAS                | ESSL (3.1.1.0)  | ATLAS (version 1.0)
Fortran compiler    | xlf (6.1.0.0)   | g77 (egcs 2.91.66)
Fortran flags       | -O4 -qmaxmem=-1 | -funroll-all-loops -fno-f2c -O3
Precision           | double (64-bit) | double (64-bit)


Table 3.4: Characteristics of the Intel computers timed
                    | Intel Pentium II                | Intel Pentium III
Processor           | Pentium II                      | Pentium III
Clock speed (MHz)   | 450                             | 550
Processors per node | 1                               | 1
Operating system    | Linux 2.2.7                     | Linux 2.2.5-15
BLAS                | ATLAS (version 1.0)             | ATLAS (version 1.0)
Fortran compiler    | g77 (egcs 2.91.60)              | g77 (egcs 2.91.66)
Fortran flags       | -funroll-all-loops -fno-f2c -O3 | -funroll-all-loops -fno-f2c -O3
Precision           | double (64-bit)                 | double (64-bit)


Table 3.5: Characteristics of the SGI computer timed
                    | SGI Origin 2000
Model               | IP27
Processor           | MIPS R12000
Clock speed (MHz)   | 300
Processors per node | 64
Operating system    | IRIX 6.5
BLAS                | SGI BLAS
Fortran compiler    | f77 (7.2.1.2m)
Fortran flags       | -O3 -64 -mips4 -r10000 -OPT:IEEE_NaN_inf=ON
Precision           | double (64-bit)


Table 3.6: Characteristics of the Sun computers timed
                    | Sun Ultra 2                            | Sun Enterprise 450
Model               | Ultra 2 Model 2200                     | Model 1300
Processor           | Sun UltraSPARC                         | Sun UltraSPARC-II
Clock speed (MHz)   | 200                                    | 300
Processors per node | 1                                      | 1
Operating system    | SunOS 5.5.1                            | SunOS 5.5.7
BLAS                | Sun Performance Library                | Sun Performance Library
Fortran compiler    | f77 (SC5.0)                            | f77 (SC5.0)
Fortran flags       | -f -dalign -native -xO5 -xarch=v8plusa | -f -dalign -native -xO5 -xarch=v8plusa
Precision           | double (64-bit)                        | double (64-bit)

See Gallivan et al. [52] and Dongarra et al. [43] for an alternative survey of algorithms for dense linear algebra on high-performance computers.




Susan Blackford
1999-10-01