# Matrix Computations (4th Edition), Excerpt: 1.2 Structure and Efficiency

The efficiency of a given matrix algorithm depends upon several factors. Most obvious, and what we treat in this section, is the amount of required arithmetic and storage. How to reason about these important attributes is nicely illustrated by considering examples that involve triangular matrices, diagonal matrices, banded matrices, symmetric matrices, and permutation matrices. These are among the most important types of structured matrices that arise in practice, and various economies can be realized if they are involved in a calculation.

## 1.2.1 Band Matrices

A matrix is sparse if a large fraction of its entries are zero. An important special case is the band matrix. We say that A ∈ IR^{m×n} has lower bandwidth p if aij = 0 whenever i > j + p, and upper bandwidth q if j > i + q implies aij = 0. Here is an example of an 8-by-5 matrix that has lower bandwidth 1 and upper bandwidth 2:

$$
A \;=\;
\begin{bmatrix}
\times & \times & \times & 0 & 0 \\
\times & \times & \times & \times & 0 \\
0 & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times \\
0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & \times \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

The ×'s designate arbitrary nonzero entries. This notation is handy for indicating the structure of a matrix and we use it extensively. Band structures that occur frequently are tabulated in Table 1.2.1.

Table 1.2.1. Band terminology for m-by-n matrices

| Matrix type | Lower bandwidth | Upper bandwidth |
|---|---|---|
| diagonal | 0 | 0 |
| upper triangular | 0 | n − 1 |
| lower triangular | m − 1 | 0 |
| tridiagonal | 1 | 1 |
| upper bidiagonal | 0 | 1 |
| lower bidiagonal | 1 | 0 |
| upper Hessenberg | 1 | n − 1 |
| lower Hessenberg | m − 1 | 1 |

## 1.2.2 Triangular Matrix Multiplication

To introduce band matrix "thinking" we look at the matrix multiplication update problem C = C + AB where A, B, and C are each n-by-n and upper triangular. The 3-by-3 case is illuminating:

$$
C \;=\; C \;+\;
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
0 & a_{22} & a_{23} \\
0 & 0 & a_{33}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & b_{13} \\
0 & b_{22} & b_{23} \\
0 & 0 & b_{33}
\end{bmatrix}
$$

It suggests that the product is upper triangular and that its upper triangular entries are the result of abbreviated inner products. Indeed, since a_{ik} b_{kj} = 0 whenever k < i or j < k, we see that the update has the form

$$
c_{ij} \;=\; c_{ij} \;+\; \sum_{k=i}^{j} a_{ik}\, b_{kj}
$$

for all i and j that satisfy i ≤ j. This yields the following algorithm:

Algorithm 1.2.1 (Triangular Matrix Multiplication) Given upper triangular matrices A, B, C ∈ IR^{n×n}, this algorithm overwrites C with C + AB.
```
for i = 1:n
    for j = i:n
        for k = i:j
            C(i, j) = C(i, j) + A(i, k)·B(k, j)
        end
    end
end
```

## 1.2.3 The Colon Notation—Again

The dot product that the k-loop performs in Algorithm 1.2.1 can be succinctly stated if we extend the colon notation introduced in §1.1.8. If A ∈ IR^{m×n} and the integers p, q, and r satisfy 1 ≤ p ≤ q ≤ n and 1 ≤ r ≤ m, then

$$
A(r, p{:}q) \;=\; [\, a_{rp} \mid \cdots \mid a_{rq} \,] \;\in\; \mathbb{R}^{1\times(q-p+1)}.
$$

Likewise, if 1 ≤ p ≤ q ≤ m and 1 ≤ c ≤ n, then

$$
A(p{:}q, c) \;=\; \begin{bmatrix} a_{pc} \\ \vdots \\ a_{qc} \end{bmatrix} \;\in\; \mathbb{R}^{(q-p+1)\times 1}.
$$

With this notation we can rewrite Algorithm 1.2.1 as

```
for i = 1:n
    for j = i:n
        C(i, j) = C(i, j) + A(i, i:j)·B(i:j, j)
    end
end
```

This highlights the abbreviated inner products that are computed by the innermost loop.

## 1.2.4 Assessing Work

Obviously, upper triangular matrix multiplication involves less arithmetic than full matrix multiplication. Looking at Algorithm 1.2.1, we see that c_{ij} requires 2(j − i + 1) flops if i ≤ j. Using the approximations

$$
\sum_{p=1}^{q} p \;\approx\; \frac{q^2}{2},
\qquad
\sum_{p=1}^{q} p^2 \;\approx\; \frac{q^3}{3},
$$

we find that triangular matrix multiplication requires one-sixth the number of flops as full matrix multiplication:

$$
\sum_{i=1}^{n} \sum_{j=i}^{n} 2(j-i+1)
\;=\; \sum_{i=1}^{n} \sum_{p=1}^{n-i+1} 2p
\;\approx\; \sum_{i=1}^{n} (n-i+1)^2
\;=\; \sum_{i=1}^{n} i^2
\;\approx\; \frac{n^3}{3}.
$$

We throw away the low-order terms since their inclusion does not contribute to what the flop count "says." For example, an exact flop count of Algorithm 1.2.1 reveals that precisely n³/3 + n² + 2n/3 flops are involved. For large n (the typical situation of interest) we see that the exact flop count offers no insight beyond the simple n³/3 accounting.

Flop counting is a necessarily crude approach to the measurement of program efficiency since it ignores subscripting, memory traffic, and other overheads associated with program execution. We must not infer too much from a comparison of flop counts. We cannot conclude, for example, that triangular matrix multiplication is six times faster than full matrix multiplication. Flop counting captures just one dimension of what makes an algorithm efficient in practice. The equally relevant issues of vectorization and data locality are taken up in §1.5.
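The flop analysis above can be checked directly with a small sketch of Algorithm 1.2.1 in Python, using plain nested lists and 0-based indexing (the function name `trimul` and the flop counter are our additions, not the book's):

```python
def trimul(A, B, C):
    """Overwrite C with C + A*B, where A, B, C are n-by-n upper triangular
    matrices stored as lists of lists.

    Mirrors Algorithm 1.2.1: only entries C[i][j] with i <= j are touched,
    and each is updated by an abbreviated inner product over k = i..j.
    Returns the number of flops performed (one multiply + one add per term).
    """
    n = len(A)
    flops = 0
    for i in range(n):
        for j in range(i, n):
            for k in range(i, j + 1):      # abbreviated inner product
                C[i][j] += A[i][k] * B[k][j]
                flops += 2
    return flops
```

Running this on a 3-by-3 example and comparing against a full (unstructured) matrix product confirms both that the product is upper triangular and that the exact flop count is n³/3 + n² + 2n/3.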
## 1.2.5 Band Storage

Suppose A ∈ IR^{n×n} has lower bandwidth p and upper bandwidth q, and assume that p and q are much smaller than n. Such a matrix can be stored in a (p + q + 1)-by-n array A.band with the convention that

    aij = A.band(i − j + q + 1, j)          (1.2.1)

for all (i, j) that fall inside the band. For example, if n = 5 and A is tridiagonal (p = q = 1), then

$$
A.band \;=\;
\begin{bmatrix}
\ast & a_{12} & a_{23} & a_{34} & a_{45} \\
a_{11} & a_{22} & a_{33} & a_{44} & a_{55} \\
a_{21} & a_{32} & a_{43} & a_{54} & \ast
\end{bmatrix}
$$

Here, the "∗" entries are unused. With this data structure, our column-oriented gaxpy algorithm (Algorithm 1.1.4) transforms to the following:

Algorithm 1.2.2 (Band Storage Gaxpy) Suppose A ∈ IR^{n×n} has lower bandwidth p and upper bandwidth q and is stored in the A.band format (1.2.1). If x, y ∈ IR^n, then this algorithm overwrites y with y + Ax.

```
for j = 1:n
    α1 = max(1, j − q),  α2 = min(n, j + p)
    β1 = max(1, q + 2 − j),  β2 = β1 + α2 − α1
    y(α1:α2) = y(α1:α2) + A.band(β1:β2, j)·x(j)
end
```

Notice that by storing A column by column in A.band, we obtain a column-oriented saxpy procedure. Indeed, Algorithm 1.2.2 is derived from Algorithm 1.1.4 by recognizing that each saxpy involves a vector with a small number of nonzeros. Integer arithmetic is used to identify the location of these nonzeros. As a result of this careful zero/nonzero analysis, the algorithm involves just 2n(p + q + 1) flops with the assumption that p and q are much smaller than n.

## 1.2.6 Working with Diagonal Matrices

Matrices with upper and lower bandwidth zero are diagonal. If D ∈ IR^{m×n} is diagonal, then we use the notation

    D = diag(d1, . . . , dq),  q = min{m, n}  ⟺  di = dii.

Shortcut notations when the dimension is clear include diag(d) and diag(di). Note that if D = diag(d) ∈ IR^{n×n} and x ∈ IR^n, then Dx = d .∗ x. If A ∈ IR^{m×n}, then premultiplication by D = diag(d1, . . . , dm) ∈ IR^{m×m} scales rows,

    B = DA  ⟺  B(i, :) = di·A(i, :),  i = 1:m

while postmultiplication by D = diag(d1, . . . , dn) ∈ IR^{n×n} scales columns,

    B = AD  ⟺  B(:, j) = dj·A(:, j),  j = 1:n.

Both of these special matrix-matrix multiplications require mn flops.

## 1.2.7 Symmetry

A matrix A ∈ IR^{n×n} is symmetric if Aᵀ = A and skew-symmetric if Aᵀ = −A.
Likewise, a matrix A ∈ C^{n×n} is Hermitian if Aᴴ = A and skew-Hermitian if Aᴴ = −A. Here are some examples:

$$
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}
\ \text{(symmetric)},
\qquad
\begin{bmatrix} 0 & 2 & 3 \\ -2 & 0 & 5 \\ -3 & -5 & 0 \end{bmatrix}
\ \text{(skew-symmetric)}.
$$

For such matrices, storage requirements can be halved by simply storing the lower triangle of elements column by column, e.g.,

$$
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}
\quad\Longrightarrow\quad
A.vec = [\, 1 \;\; 2 \;\; 3 \;\; 4 \;\; 5 \;\; 6 \,].
$$

For general n, we set

    A.vec((n − j/2)(j − 1) + i) = aij,  1 ≤ j ≤ i ≤ n.          (1.2.2)

Here is a column-oriented gaxpy with the matrix A represented in A.vec.

Algorithm 1.2.3 (Symmetric Storage Gaxpy) Suppose A ∈ IR^{n×n} is symmetric and stored in the A.vec style (1.2.2). If x, y ∈ IR^n, then this algorithm overwrites y with y + Ax.

```
for j = 1:n
    for i = 1:j − 1
        y(i) = y(i) + A.vec((i − 1)n − i(i − 1)/2 + j)·x(j)
    end
    for i = j:n
        y(i) = y(i) + A.vec((j − 1)n − j(j − 1)/2 + i)·x(j)
    end
end
```

This algorithm requires the same 2n² flops that an ordinary gaxpy requires.

## 1.2.8 Permutation Matrices and the Identity

We denote the n-by-n identity matrix by In, e.g.,

$$
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
$$

We use the notation ei to designate the ith column of In. If the rows of In are reordered, then the resulting matrix is said to be a permutation matrix, e.g.,

$$
P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}. \tag{1.2.3}
$$

The representation of an n-by-n permutation matrix requires just an n-vector of integers whose components specify where the 1's occur. For example, if v ∈ IR^n has the property that vi specifies the column where the "1" occurs in row i, then y = Px implies that yi = x_{vi}, i = 1:n. In the example above, the underlying v-vector is v = [ 2 4 3 1 ].

## 1.2.9 Specifying Integer Vectors and Submatrices

For permutation matrix work and block matrix manipulation (§1.3) it is convenient to have a method for specifying structured integer vectors of subscripts. The Matlab colon notation is again the proper vehicle and a few examples suffice to show how it works. If n = 8, then

```
1:2:n    = [1 3 5 7]
n:−1:1   = [8 7 6 5 4 3 2 1]
```

Suppose A ∈ IR^{m×n} and that v ∈ IR^r and w ∈ IR^s are integer vectors with the property that 1 ≤ vi ≤ m and 1 ≤ wi ≤ n. If B = A(v, w), then B ∈ IR^{r×s} is the matrix defined by bij = a_{vi,wj} for i = 1:r and j = 1:s.
Thus, if A ∈ IR^{8×8}, v = [ 2 4 6 8 ], and w = [ 1 3 5 7 ], then

$$
A(v, w) \;=\;
\begin{bmatrix}
a_{21} & a_{23} & a_{25} & a_{27} \\
a_{41} & a_{43} & a_{45} & a_{47} \\
a_{61} & a_{63} & a_{65} & a_{67} \\
a_{81} & a_{83} & a_{85} & a_{87}
\end{bmatrix}.
$$

## 1.2.10 Working with Permutation Matrices

Using the colon notation, the 4-by-4 permutation matrix in (1.2.3) is defined by P = I4(v, :) where v = [ 2 4 3 1 ]. In general, if v ∈ IR^n is a permutation of the vector 1:n = [1, 2, . . . , n] and P = In(v, :), then

    y = Px  ⟹  y = x(v)    and    y = Pᵀx  ⟹  y(v) = x.

The second result follows from the fact that vi is the row index of the "1" in column i of Pᵀ. Note that Pᵀ(Px) = x. The inverse of a permutation matrix is its transpose.

The action of a permutation matrix on a given matrix A ∈ IR^{m×n} is easily described. If P = Im(v, :) and Q = In(w, :), then PAQᵀ = A(v, w). It also follows that In(v, :)·In(w, :) = In(w(v), :). Although permutation operations involve no flops, they move data and contribute to execution time, an issue that is discussed in §1.5.
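Because a permutation matrix is fully determined by its v-vector, the operations above can be carried out on index vectors alone, with no matrix storage at all. Here is a minimal Python sketch under that representation, using 0-based indices (so the book's v = [ 2 4 3 1 ] becomes `[1, 3, 2, 0]`; the function names are our own):

```python
def apply_p(v, x):
    """y = P*x where P = I(v, :): y[i] = x[v[i]]."""
    return [x[vi] for vi in v]

def apply_pt(v, x):
    """y = P^T * x, the inverse permutation: y[v[i]] = x[i]."""
    y = [None] * len(x)
    for i, vi in enumerate(v):
        y[vi] = x[i]
    return y

def compose(v, w):
    """v-vector of the product I(v, :) * I(w, :) = I(w(v), :)."""
    return [w[vi] for vi in v]
```

The identities Pᵀ(Px) = x and I(v, :)·I(w, :) = I(w(v), :) can then be verified by direct computation; note that no floating-point arithmetic is involved, only data movement.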