entry of $C$ is equal to the inner product of the $i$th row of $A$ and the $j$th column of $B$. Symbolically, this looks like the following:

$$C = AB = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} \begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & a_1^T b_2 & \cdots & a_1^T b_p \\ a_2^T b_1 & a_2^T b_2 & \cdots & a_2^T b_p \\ \vdots & \vdots & \ddots & \vdots \\ a_m^T b_1 & a_m^T b_2 & \cdots & a_m^T b_p \end{bmatrix}.$$

Remember that since $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, $a_i \in \mathbb{R}^n$ and $b_j \in \mathbb{R}^n$, so these inner products all make sense. This is the most "natural" representation when we represent $A$ by rows and $B$ by columns. Alternatively, we can represent $A$ by columns, and $B$ by rows. This representation leads to a much trickier interpretation of $AB$ as a sum of outer products. Symbolically,

$$C = AB = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \begin{bmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_n^T \end{bmatrix} = \sum_{i=1}^{n} a_i b_i^T.$$

Put another way, $AB$ is equal to the sum, over all $i$, of the outer product of the $i$th column of $A$ and the $i$th row of $B$. Since, in this case, $a_i \in \mathbb{R}^m$ and $b_i \in \mathbb{R}^p$, the dimension of the outer product $a_i b_i^T$ is $m \times p$, which coincides with the dimension of $C$. Chances are, the last equality above may appear confusing to you. If so, take the time to check it for yourself!

Second, we can also view matrix-matrix multiplication as a set of matrix-vector products. Specifically, if we represent $B$ by columns, we can view the columns of $C$ as matrix-vector products between $A$ and the columns of $B$. Symbolically,

$$C = AB = A \begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix} = \begin{bmatrix} Ab_1 & Ab_2 & \cdots & Ab_p \end{bmatrix}.$$

Here the $i$th column of $C$ is given by the matrix-vector product with the vector on the right, $c_i = Ab_i$. These matrix-vector products can in turn be interpreted using both viewpoints given in the previous subsection. Finally, we have the analogous viewpoint, where we represent $A$ by rows, and view the rows of $C$ as the matrix-vector products between the rows of $A$ and $B$. Symbolically,

$$C = AB = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} B = \begin{bmatrix} a_1^T B \\ a_2^T B \\ \vdots \\ a_m^T B \end{bmatrix}.$$

Here the $i$th row of $C$ is given by the matrix-vector product with the vector on the left, $c_i^T = a_i^T B$.
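To make the four viewpoints concrete, here is a minimal NumPy sketch (an illustration, not part of the original notes) that builds the same product $C = AB$ each way and checks that all four constructions agree:

```python
import numpy as np

# Small example: A is m x n, B is n x p, so C = AB is m x p.
m, n, p = 3, 4, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))
C = A @ B

# Viewpoint 1: C[i, j] is the inner product of row i of A and column j of B.
C1 = np.array([[A[i, :] @ B[:, j] for j in range(p)] for i in range(m)])

# Viewpoint 2: AB is the sum, over i, of the outer product of
# column i of A and row i of B (each outer product is m x p).
C2 = sum(np.outer(A[:, i], B[i, :]) for i in range(n))

# Viewpoint 3: column j of C is the matrix-vector product A @ B[:, j].
C3 = np.column_stack([A @ B[:, j] for j in range(p)])

# Viewpoint 4: row i of C is the vector-matrix product A[i, :] @ B.
C4 = np.vstack([A[i, :] @ B for i in range(m)])

for Ck in (C1, C2, C3, C4):
    assert np.allclose(C, Ck)
```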
It may seem like overkill to dissect matrix multiplication to such a large degree, especially when all these viewpoints follow immediately from the initial definition we gave (in about a line of math) at the beginning of this section. However, virtually all of linear algebra deals with matrix multiplications of some kind, and it is worthwhile to spend some time trying to develop an intuitive understanding of the viewpoints presented here.

In addition to this, it is useful to know a few basic properties of matrix multiplication at a higher level (each is checked numerically in the sketch that follows):

• Matrix multiplication is associative: $(AB)C = A(BC)$.
• Matrix multiplication is distributive: $A(B + C) = AB + AC$.
• Matrix multiplication is, in general, not commutative; that is, it can be the case that $AB \neq BA$. (For example, if $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times q}$, the matrix product $BA$ does not even exist if $m$ and $q$ are not equal!)

If you are not familiar with these properties, take the time to verify them for yourself. For example, to check the associativity of matrix multiplication, suppose that $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, and $C \in \mathbb{R}^{p \times q}$. Note that $AB \in \mathbb{R}^{m \times p}$, so $(AB)C \in \mathbb{R}^{m \times q}$. Similarly, $BC \in \mathbb{R}^{n \times q}$, so $A(BC) \in \mathbb{R}^{m \times q}$. Thus, the dimensions of the resulting matrices agree. To show that matrix multiplication is associative, it suffices to check that the $(i, j)$th entry of $(AB)C$ is equal to the $(i, j)$th entry of $A(BC)$. We can verify this directly using the definition of matrix multiplication:

$$\begin{aligned}
((AB)C)_{ij} &= \sum_{k=1}^{p} (AB)_{ik} C_{kj} = \sum_{k=1}^{p} \left( \sum_{l=1}^{n} A_{il} B_{lk} \right) C_{kj} \\
&= \sum_{k=1}^{p} \sum_{l=1}^{n} A_{il} B_{lk} C_{kj} = \sum_{l=1}^{n} \sum_{k=1}^{p} A_{il} B_{lk} C_{kj} \\
&= \sum_{l=1}^{n} A_{il} \left( \sum_{k=1}^{p} B_{lk} C_{kj} \right) = \sum_{l=1}^{n} A_{il} (BC)_{lj} = (A(BC))_{ij}.
\end{aligned}$$

Here, the first two and last two equalities simply use the definition of matrix multiplication, the third and fifth equalities use the distributive property of scalar multiplication over addition, and the fourth equality uses the commutativity and associativity of scalar addition. This technique for proving matrix properties by reduction to simple scalar properties will come up often, so make sure you're familiar with it.
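These properties are also easy to confirm numerically. The following sketch (again an illustration, not from the notes) checks each one on random matrices, bearing in mind that the identities hold only up to floating-point roundoff and that random square matrices fail to commute with probability one:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 2))

# Associativity: (AB)C == A(BC), up to floating-point error.
assert np.allclose((A @ B) @ C, A @ (B @ C))

# Distributivity: A(B + B2) == AB + AB2, with B2 the same shape as B
# (B2 plays the role of C in the statement A(B + C) = AB + AC).
B2 = rng.standard_normal((4, 5))
assert np.allclose(A @ (B + B2), A @ B + A @ B2)

# Non-commutativity: even for square matrices, AB != BA in general.
S = rng.standard_normal((3, 3))
T = rng.standard_normal((3, 3))
assert not np.allclose(S @ T, T @ S)
```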
3 Operations and Properties

In this section we present several operations and properties of matrices and vectors. Hopefully a great deal of this will be review for you, so the notes can just serve as a reference for these topics.

3.1 The Identity Matrix and Diagonal Matrices

The identity matrix, denoted $I \in \mathbb{R}^{n \times n}$, is a square matrix with ones on the diagonal and zeros everywhere else. That is,

$$I_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j. \end{cases}$$

It has the property that for all $A \in \mathbb{R}^{m \times n}$,

$$AI = A = IA.$$

Note that in some sense, the notation for the identity matrix is ambiguous, since it does not specify the dimension of $I$. Generally, the dimensions of $I$ are inferred from context so as to make matrix multiplication possible. For example, in the equation above, the $I$ in $AI = A$ is an $n \times n$ matrix, whereas the $I$ in $A = IA$ is an $m \times m$ matrix.

A diagonal matrix is a matrix where all non-diagonal elements are 0. This is typically denoted $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$, with

$$D_{ij} = \begin{cases} d_i & i = j \\ 0 & i \neq j. \end{cases}$$

Clearly, $I = \mathrm{diag}(1, 1, \ldots, 1)$.

3.2 The Transpose

The transpose of a matrix results from "flipping" the rows and columns. Given a matrix $A \in \mathbb{R}^{m \times n}$, its transpose, written $A^T \in \mathbb{R}^{n \times m}$, is the $n \times m$ matrix whose entries are given by

$$(A^T)_{ij} = A_{ji}.$$

We have in fact already been using the transpose when describing row vectors, since the transpose of a column vector is naturally a row vector.

The following properties of transposes are easily verified (and checked numerically in the sketch below):

• $(A^T)^T = A$
• $(AB)^T = B^T A^T$
• $(A + B)^T = A^T + B^T$
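As a quick companion to the identity, diagonal, and transpose facts above, here is a short NumPy sketch (an illustration, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# AI = A = IA, where the two identity matrices have different sizes.
assert np.allclose(A @ np.eye(4), A) and np.allclose(np.eye(3) @ A, A)

# I = diag(1, 1, ..., 1).
assert np.array_equal(np.eye(4), np.diag([1.0, 1.0, 1.0, 1.0]))

# Transpose properties: (A^T)^T = A and (AB)^T = B^T A^T.
assert np.array_equal(A.T.T, A)
assert np.allclose((A @ B).T, B.T @ A.T)

# (A + B)^T = A^T + B^T, for matrices of the same shape.
A2 = rng.standard_normal((3, 4))
assert np.allclose((A + A2).T, A.T + A2.T)
```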
3.3 Symmetric Matrices

A square matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T$. It is anti-symmetric if $A = -A^T$. It is easy to show that for any matrix $A \in \mathbb{R}^{n \times n}$, the matrix $A + A^T$ is symmetric and the matrix $A - A^T$ is anti-symmetric. From this it follows that any square matrix $A \in \mathbb{R}^{n \times n}$ can be represented as a sum of a symmetric matrix and an anti-symmetric matrix, since

$$A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$$

and the first matrix on the right is symmetric, while the second is anti-symmetric. It turns out that symmetric matrices occur a great deal in practice, and they have many nice properties which we will look at shortly. It is common to denote the set of all symmetric matrices of size $n$ as $\mathbb{S}^n$, so that $A \in \mathbb{S}^n$ means that $A$ is a symmetric $n \times n$ matrix.

3.4 The Trace

The trace of a square matrix $A \in \mathbb{R}^{n \times n}$, denoted $\mathrm{tr}(A)$ (or just $\mathrm{tr}A$ if the parentheses are obviously implied), is the sum of the diagonal elements of the matrix:

$$\mathrm{tr}A = \sum_{i=1}^{n} A_{ii}.$$

As described in the CS229 lecture notes, the trace has the following properties (included here for the sake of completeness):

• For $A \in \mathbb{R}^{n \times n}$, $\mathrm{tr}A = \mathrm{tr}A^T$.
• For $A, B \in \mathbb{R}^{n \times n}$, $\mathrm{tr}(A + B) = \mathrm{tr}A + \mathrm{tr}B$.
• For $A \in \mathbb{R}^{n \times n}$, $t \in \mathbb{R}$, $\mathrm{tr}(tA) = t\,\mathrm{tr}A$.
• For $A, B$ such that $AB$ is square, $\mathrm{tr}AB = \mathrm{tr}BA$.
• For $A, B, C$ such that $ABC$ is square, $\mathrm{tr}ABC = \mathrm{tr}BCA = \mathrm{tr}CAB$, and so on for the product of more matrices.

As an example of how these properties can be proven, we'll consider the fourth property given above. Suppose that $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$ (so that $AB \in \mathbb{R}^{m \times m}$ is a square matrix). Observe that $BA \in \mathbb{R}^{n \times n}$ is also a square matrix, so it makes sense to apply the trace operator to it. To verify that $\mathrm{tr}AB = \mathrm{tr}BA$, note that

$$\mathrm{tr}AB = \sum_{i=1}^{m} (AB)_{ii} = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} A_{ij} B_{ji} \right) = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} B_{ji} = \sum_{j=1}^{n} \sum_{i=1}^{m} B_{ji} A_{ij} = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} B_{ji} A_{ij} \right) = \sum_{j=1}^{n} (BA)_{jj} = \mathrm{tr}BA.$$
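As a concrete companion to the symmetric decomposition and the trace identity above, here is a short NumPy sketch (an illustration, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

# Symmetric/anti-symmetric decomposition: A = (A + A^T)/2 + (A - A^T)/2.
S = 0.5 * (A + A.T)   # symmetric part: S == S.T
K = 0.5 * (A - A.T)   # anti-symmetric part: K == -K.T
assert np.allclose(S, S.T) and np.allclose(K, -K.T)
assert np.allclose(A, S + K)

# Trace identities: tr(A) = tr(A^T), and tr(MN) = tr(NM) even when
# M and N are not square themselves (here 4x2 and 2x4).
assert np.isclose(np.trace(A), np.trace(A.T))
M = rng.standard_normal((4, 2))
N = rng.standard_normal((2, 4))
assert np.isclose(np.trace(M @ N), np.trace(N @ M))
```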