The Level 3 BLAS specifications [40] specify the input, output and calling sequence for each routine, but allow freedom of implementation, subject to the requirement that the routines be numerically stable. Level 3 BLAS implementations can therefore be built using matrix multiplication algorithms that achieve a more favorable operation count (for suitable dimensions) than the standard multiplication technique, provided that these ``fast'' algorithms are numerically stable. The simplest fast matrix multiplication technique is Strassen's method, which can multiply two n-by-n matrices in fewer than operations, where .
The effect on the results in this chapter of using a fast Level 3 BLAS implementation can be explained as follows. In general, reasonably implemented fast Level 3 BLAS preserve all the bounds presented here (except those at the end of subsection 4.10), but the constant p(n) may increase somewhat. Also, the iterative refinement routine xyyRFS may take more steps to converge.
This is what we mean by reasonably implemented fast Level 3 BLAS. Here, ci denotes a constant depending on the specified matrix dimensions.
(1) If A is m-by-n, B is n-by-p and
is the computed
approximation to C=AB, then
(2)
The computed solution
to the triangular systems TX=B,
where T is m-by-m and B is m-by-p, satisfies
For further details, and references to fast multiplication techniques, see [27].