Linear Algebra Review and Reference Zico Kolter (updated by Chuong Do) October 7.2008 Contents 1 Basic Concepts and Notation 2 l.1 Basic Notation...···.········· 2 2 Matrix Multiplication 3 2.1 Vector-Vector Products... 4 2.2 Matrix-Vector Products 4 2.3 Matrix-Matrix Products 5 3 Operations and Properties 个 3.1 The Identity Matrix and Diagonal Matrices 8 3.2 The Transpose·········· 8 3.3 Symmetric Matrices..·..·. 8 3.4 The Trace..········· 9 3.5 Norms.····· 10 3.6 Linear Independence and Rank 11 3.7 The Inverse·.:..······· 11 3.8 Orthogonal Matrices...................·.··.. 12 3.9 Range and Nullspace of a Matrix 12 3.l0 The Determinant.·.·. 14 3.11 Quadratic Forms and Positive Semidefinite Matrices.......··· 17 3.12 Eigenvalues and Eigenvectors... 18 3.13 Eigenvalues and Eigenvectors of Symmetric Matrices 19 4 Matrix Calculus 20 4.1 The Gradient 20 4.2 The Hessian.. 2 4.3 Gradients and Hessians of Quadratic and Linear Functions 23 4.4 Least☒Squares...............······· 5 4.5 Gradients of the Determinant 25 4.6 Eigenvalues as Optimization ... 26
Linear Algebra Review and Reference Zico Kolter (updated by Chuong Do) October 7, 2008 Contents 1 Basic Concepts and Notation 2 1.1 Basic Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Matrix Multiplication 3 2.1 Vector-Vector Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 Matrix-Vector Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.3 Matrix-Matrix Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3 Operations and Properties 7 3.1 The Identity Matrix and Diagonal Matrices . . . . . . . . . . . . . . . . . . 8 3.2 The Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.3 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.4 The Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.5 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.6 Linear Independence and Rank . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.7 The Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.8 Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.9 Range and Nullspace of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . 12 3.10 The Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.11 Quadratic Forms and Positive Semidefinite Matrices . . . . . . . . . . . . . . 17 3.12 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.13 Eigenvalues and Eigenvectors of Symmetric Matrices . . . . . . . . . . . . . 19 4 Matrix Calculus 20 4.1 The Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 4.2 The Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 4.3 Gradients and Hessians of Quadratic and Linear Functions . . . . . . . . . . 23 4.4 Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4.5 Gradients of the Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4.6 Eigenvalues as Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 1
1 Basic Concepts and Notation Linear algebra provides a way of compactly representing and operating on sets of linear equations.For example,consider the following system of equations: 4x1-5x2=-13 -2x1+3x2=9. This is two equations and two variables,so as you know from high school algebra,you can find a unique solution for x and z2(unless the equations are somehow degenerate,for example if the second equation is simply a multiple of the first,but in the case above there is in fact a unique solution).In matrix notation,we can write the system more compactly as Ax=b with 4=[4],=[] As we will see shortly,there are many advantages (including the obvious space savings) to analyzing linear equations in this form. 1.1 Basic Notation We use the following notation: By A E Rmxn we denote a matrix with m rows and n columns,where the entries of A are real numbers. By z E R",we denote a vector with n entries.By convention,an n-dimensional vector is often thought of as a matrix with n rows and 1 column,known as a column vector. If we want to explicitly represent a row vector-a matrix with 1 row and n columns -we typically write zT(here aT denotes the transpose of x,which we will define shortly). The ith element of a vector x is denoted xi: T1 T= In 2
1 Basic Concepts and Notation Linear algebra provides a way of compactly representing and operating on sets of linear equations. For example, consider the following system of equations: 4x1 − 5x2 = −13 −2x1 + 3x2 = 9. This is two equations and two variables, so as you know from high school algebra, you can find a unique solution for x1 and x2 (unless the equations are somehow degenerate, for example if the second equation is simply a multiple of the first, but in the case above there is in fact a unique solution). In matrix notation, we can write the system more compactly as Ax = b with A = 4 −5 −2 3 , b = −13 9 . As we will see shortly, there are many advantages (including the obvious space savings) to analyzing linear equations in this form. 1.1 Basic Notation We use the following notation: • By A ∈ R m×n we denote a matrix with m rows and n columns, where the entries of A are real numbers. • By x ∈ R n , we denote a vector with n entries. By convention, an n-dimensional vector is often thought of as a matrix with n rows and 1 column, known as a column vector. If we want to explicitly represent a row vector — a matrix with 1 row and n columns — we typically write x T (here x T denotes the transpose of x, which we will define shortly). • The ith element of a vector x is denoted xi : x = x1 x2 . . . xn . 2
We use the notation aii (or Ai,Aii,etc)to denote the entry of A in the ith row and jth column: a11 012··· ain a21 a22 a2n A= aml am2 amn We denote the ith column of A by aj or A.j: … We denote the ith row of A by af or Ai.:: A= Note that these definitions are ambiguous (for example,the a and af in the previous two definitions are not the same vector).Usually the meaning of the notation should be obvious from its use. 2 Matrix Multiplication The product of two matrices AE Rmxn and BERnxp is the matrix C=AB∈RmxP, where C,=∑AkB k=1 Note that in order for the matrix product to exist,the number of columns in A must equal the number of rows in B.There are many ways of looking at matrix multiplication,and we'll start by examining a few special cases. 3
• We use the notation aij (or Aij , Ai,j , etc) to denote the entry of A in the ith row and jth column: A = a11 a12 · · · a1n a21 a22 · · · a2n . . . . . . . . . . . . am1 am2 · · · amn . • We denote the jth column of A by aj or A:,j : A = | | | a1 a2 · · · an | | | . • We denote the ith row of A by a T i or Ai,: : A = — a T 1 — — a T 2 — . . . — a T m — . • Note that these definitions are ambiguous (for example, the a1 and a T 1 in the previous two definitions are not the same vector). Usually the meaning of the notation should be obvious from its use. 2 Matrix Multiplication The product of two matrices A ∈ R m×n and B ∈ R n×p is the matrix C = AB ∈ R m×p , where Cij = Xn k=1 AikBkj . Note that in order for the matrix product to exist, the number of columns in A must equal the number of rows in B. There are many ways of looking at matrix multiplication, and we’ll start by examining a few special cases. 3
2.1 Vector-Vector Products Given two vectors r,ye R",the quantity ry,sometimes called the inner product or dot product of the vectors,is a real number given by 1 2 x'y∈R=[x12·cn : Tiyi i=l Observe that inner products are really just special case of matrix multiplication.Note that it is always the case that aTy=yTc. Given vectors E R,yE R(not necessarily of the same size),xyTE Rmxn is called the outer product of the vectors.It is a matrix whose entries are given by (y)=ij, i.e., T1 T141 T142 ··1yn xyT∈Rmxn T2 E21 E22 T2Un 1 2· m]= Imy1 Tmy2 TmYn As an example of how the outer product can be useful,let 1 ER"denote an n-dimensional vector whose entries are all equal to 1.Furthermore,consider the matrix A E Rmxn whose columns are all equal to some vector xE Rm.Using outer products,we can represent A compactly as, T1 --小 2 2 T2 2 [11 1]=x1 工m m 2.2 Matrix-Vector Products Given a matrix A E Rmxn and a vector x E R",their product is a vector y=Ax E Rm. There are a couple ways of looking at matrix-vector multiplication,and we will look at each of them in turn. If we write A by rows,then we can express Ax as, x Ax= 4
2.1 Vector-Vector Products Given two vectors x, y ∈ R n , the quantity x T y, sometimes called the inner product or dot product of the vectors, is a real number given by x T y ∈ R = x1 x2 · · · xn y1 x2 . . . yn = Xn i=1 xiyi . Observe that inner products are really just special case of matrix multiplication. Note that it is always the case that x T y = y T x. Given vectors x ∈ R m, y ∈ R n (not necessarily of the same size), xyT ∈ R m×n is called the outer product of the vectors. It is a matrix whose entries are given by (xyT )ij = xiyj , i.e., xyT ∈ R m×n = x1 x2 . . . xm y1 y2 · · · yn = x1y1 x1y2 · · · x1yn x2y1 x2y2 · · · x2yn . . . . . . . . . . . . xmy1 xmy2 · · · xmyn . As an example of how the outer product can be useful, let 1 ∈ R n denote an n-dimensional vector whose entries are all equal to 1. Furthermore, consider the matrix A ∈ R m×n whose columns are all equal to some vector x ∈ R m. Using outer products, we can represent A compactly as, A = | | | x x · · · x | | | = x1 x1 · · · x1 x2 x2 · · · x2 . . . . . . . . . . . . xm xm · · · xm = x1 x2 . . . xm 1 1 · · · 1 = x1 T . 2.2 Matrix-Vector Products Given a matrix A ∈ R m×n and a vector x ∈ R n , their product is a vector y = Ax ∈ R m. There are a couple ways of looking at matrix-vector multiplication, and we will look at each of them in turn. If we write A by rows, then we can express Ax as, y = Ax = — a T 1 — — a T 2 — . . . — a T m — x = a T 1 x a T 2 x . . . a T mx . 4
In other words,the ith entry of y is equal to the inner product of the ith row of A and x, 头=ax. Alternatively,let's write A in column form.In this case we see that, =Ax= 小小小小 In other words,y is a linear combination of the columns of A,where the coefficients of the linear combination are given by the entries of z. So far we have been multiplying on the right by a column vector,but it is also possible to multiply on the left by a row vector.This is written,yT=TA for A ERmxn,e Rm, and y ER".As before,we can express yf in two obvious ways,depending on whether we express A in terms on its rows or columns.In the first case we express A in terms of its columns,which gives yT=TA=2T which demonstrates that the ith entry of y is equal to the inner product of x and the ith column of A. Finally,expressing A in terms of rows we get the final representation of the vector-matrix product, y=xTA =[12 品 =1[-a-]+2[-a3-]+.+xn[-a7-] so we see that y is a linear combination of the rows of A,where the coefficients for the linear combination are given by the entries of r. 2.3 Matrix-Matrix Products Armed with this knowledge,we can now look at four different (but,of course,equivalent) ways of viewing the matrix-matrix multiplication C AB as defined at the beginning of this section. First,we can view matrix-matrix multiplication as a set of vector-vector products.The most obvious viewpoint,which follows immediately from the definition,is that the (i,j)th 5
In other words, the ith entry of y is equal to the inner product of the ith row of A and x, yi = a T i x. Alternatively, let’s write A in column form. In this case we see that, y = Ax = | | | a1 a2 · · · an | | | x1 x2 . . . xn = a1 x1 + a2 x2 + . . . + an xn . In other words, y is a linear combination of the columns of A, where the coefficients of the linear combination are given by the entries of x. So far we have been multiplying on the right by a column vector, but it is also possible to multiply on the left by a row vector. This is written, y T = x TA for A ∈ R m×n , x ∈ R m, and y ∈ R n . As before, we can express y T in two obvious ways, depending on whether we express A in terms on its rows or columns. In the first case we express A in terms of its columns, which gives y T = x TA = x T | | | a1 a2 · · · an | | | = x T a1 x T a2 · · · x T an which demonstrates that the ith entry of y T is equal to the inner product of x and the ith column of A. Finally, expressing A in terms of rows we get the final representation of the vector-matrix product, y T = x TA = x1 x2 · · · xn — a T 1 — — a T 2 — . . . — a T m — = x1 — a T 1 — + x2 — a T 2 — + ... + xn — a T n — so we see that y T is a linear combination of the rows of A, where the coefficients for the linear combination are given by the entries of x. 2.3 Matrix-Matrix Products Armed with this knowledge, we can now look at four different (but, of course, equivalent) ways of viewing the matrix-matrix multiplication C = AB as defined at the beginning of this section. First, we can view matrix-matrix multiplication as a set of vector-vector products. The most obvious viewpoint, which follows immediately from the definition, is that the (i, j)th 5