Determinants are usually calculated with a Laplace development along any given row or column. To this end, let $A = (a_{ij}) \in \mathbb{R}^n_n$. Now, define the minor $|m(i,j)|$ of $a_{ij}$ as the determinant of the $(n-1) \times (n-1)$ "submatrix" obtained by deleting the $i$th row and the $j$th column of $A$, and the cofactor of $a_{ij}$ as $c(i,j) = (-1)^{i+j} |m(i,j)|$. Then, the Laplace development of $|A|$ along the $i$th row is $|A| = \sum_{j=1}^{n} a_{ij}\, c(i,j)$, and a similar development along the $j$th column is $|A| = \sum_{i=1}^{n} a_{ij}\, c(i,j)$. By defining $\mathrm{adj}(A) = (c(j,i))$, the transpose of the matrix of cofactors, to be the adjoint of $A$, it can be shown that $A^{-1} = |A|^{-1} \mathrm{adj}(A)$. But then

Proposition 1.4 $A$ is one-to-one $\iff |A| \neq 0$.

Proof. $A$ is one-to-one means it has an inverse $B$, $|A|\,|B| = 1$, so $|A| \neq 0$. But, conversely, if $|A| \neq 0$, suppose $Ax = \sum_{j=1}^{n} x_j a_j = 0$; then, substituting $Ax$ for the $i$th column of $A$,
$$\left|\left( a_1, \ldots, \sum_{j=1}^{n} x_j a_j, \ldots, a_n \right)\right| = x_i |A| = 0, \quad i = 1, \ldots, n,$$
so that $x = 0$, whereby $A$ is one-to-one. ✷
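The following is a small numerical illustration, not part of the original development: it checks the Laplace development along a row and the identity $A^{-1} = |A|^{-1}\mathrm{adj}(A)$ with NumPy. The helper names `cofactor` and `adjoint`, and the particular $3 \times 3$ matrix, are ad hoc choices for this sketch.

```python
import numpy as np

def cofactor(A, i, j):
    # c(i, j) = (-1)^(i+j) |m(i, j)|: signed determinant of the submatrix
    # obtained by deleting row i and column j (0-based indices).
    m = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(m)

def adjoint(A):
    # adj(A) = (c(j, i)), the transpose of the matrix of cofactors.
    n = A.shape[0]
    C = np.array([[cofactor(A, i, j) for j in range(n)] for i in range(n)])
    return C.T

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
detA = np.linalg.det(A)

# Laplace development along the first row reproduces |A|.
row0 = sum(A[0, j] * cofactor(A, 0, j) for j in range(3))
print(np.isclose(row0, detA))                              # True

# A^{-1} = |A|^{-1} adj(A).
print(np.allclose(np.linalg.inv(A), adjoint(A) / detA))    # True
```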
In general, for $a_j \in \mathbb{R}^n$, $j = 1, \ldots, k$, write $A = (a_1, \ldots, a_k)$ and form the "inner product" matrix $A'A = (a_i' a_j) \in \mathbb{R}^k_k$. We find

Proposition 1.5 For $A \in \mathbb{R}^n_k$,

1. $\ker A = \ker A'A$,
2. $\operatorname{rank} A = \operatorname{rank} A'A$,
3. $a_1, \ldots, a_k$ are linearly independent in $\mathbb{R}^n \iff |A'A| \neq 0$.

Proof. If $x \in \ker A$, then $Ax = 0 \implies A'Ax = 0$, and, conversely, if $x \in \ker A'A$, then $A'Ax = 0 \implies x'A'Ax = 0 = |Ax|^2 \implies Ax = 0$. The second part follows from the relation $k = \operatorname{nullity} A + \operatorname{rank} A$, and the third part is immediate as $\ker A = \{0\}$ iff $\ker A'A = \{0\}$. ✷
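As another numerical aside, Proposition 1.5 can be checked directly; the $4 \times 3$ matrix below, whose third column is the sum of the first two, is an arbitrary example chosen for this sketch.

```python
import numpy as np

# Columns a1, a2, a3 with a3 = a1 + a2, so rank A = 2 < k = 3.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 0., 2.]])
G = A.T @ A                                 # the "inner product" matrix A'A

print(np.linalg.matrix_rank(A))             # 2
print(np.linalg.matrix_rank(G))             # 2: rank A = rank A'A
print(np.isclose(np.linalg.det(G), 0.0))    # True: dependent columns iff |A'A| = 0

# A vector in ker A'A is also in ker A (part 1).
x = np.array([1., 1., -1.])                 # Ax = 0 by construction
print(np.allclose(G @ x, 0), np.allclose(A @ x, 0))   # True True
```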
1.5 Eigenvalues and eigenvectors

We now briefly state some concepts related to eigenvalues and eigenvectors. Consider, first, the complex vector space $\mathbb{C}^n$. The conjugate of $v = x + iy \in \mathbb{C}$, $x, y \in \mathbb{R}$, is $\bar{v} = x - iy$. The concepts defined earlier are analogous in this case. The Hermitian transpose of a column vector $v = (v_i) \in \mathbb{C}^n$ is the row vector $v^H = (\bar{v}_i)'$. The inner product on $\mathbb{C}^n$ can then be written $\langle v_1, v_2 \rangle = v_1^H v_2$ for any $v_1, v_2 \in \mathbb{C}^n$. The Hermitian transpose of $A = (a_{ij}) \in \mathbb{C}^m_n$ is $A^H = (\bar{a}_{ji}) \in \mathbb{C}^n_m$ and satisfies, for $B \in \mathbb{C}^n_p$, $(AB)^H = B^H A^H$. The matrix $A \in \mathbb{C}^n_n$ is termed Hermitian iff $A = A^H$.

We now define what is meant by an eigenvalue. A scalar $\lambda \in \mathbb{C}$ is an eigenvalue of $A \in \mathbb{C}^n_n$ if there exists a vector $v \neq 0$ in $\mathbb{C}^n$ such that $Av = \lambda v$. Equivalently, $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ iff $|A - \lambda I| = 0$, which is a polynomial equation of degree $n$. Hence, there are $n$ complex eigenvalues, some of which may be real, with possibly some repetitions (multiplicity). The vector $v$ is then termed the eigenvector of $A$ corresponding to the eigenvalue $\lambda$. Note that if $v$ is an eigenvector, so is $\alpha v$, $\forall \alpha \neq 0$ in $\mathbb{C}$, and, in particular, $v/|v|$ is a normalized eigenvector.

Now, before defining what is meant by "$A$ is diagonalizable," we define a matrix $U \in \mathbb{C}^n_n$ to be unitary iff $U^H U = I = U U^H$. This means that the columns (or rows) of $U$ comprise an orthonormal basis of $\mathbb{C}^n$. We note immediately that if $\{u_1, \ldots, u_n\}$ is an orthonormal basis of eigenvectors corresponding to eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$, then $A$ can be diagonalized by the unitary matrix $U = (u_1, \ldots, u_n)$; i.e., we can write
$$U^H A U = U^H (A u_1, \ldots, A u_n) = U^H (\lambda_1 u_1, \ldots, \lambda_n u_n) = \operatorname{diag}(\lambda),$$
where $\lambda = (\lambda_1, \ldots, \lambda_n)'$. Another simple related property: if there exists a unitary matrix $U = (u_1, \ldots, u_n)$ such that $U^H A U = \operatorname{diag}(\lambda)$, then $u_i$ is an eigenvector corresponding to $\lambda_i$. To verify this, note that
$$A u_i = U \operatorname{diag}(\lambda) U^H u_i = U \operatorname{diag}(\lambda) e_i = U \lambda_i e_i = \lambda_i u_i.$$
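As a numerical aside, the two characterizations of an eigenvalue, $Av = \lambda v$ and $|A - \lambda I| = 0$, are easy to verify; the $2 \times 2$ complex matrix below is an arbitrary example for this sketch.

```python
import numpy as np

A = np.array([[2.0 + 1.0j, 1.0       ],
              [0.5,        1.0 - 1.0j]])

lam, V = np.linalg.eig(A)      # eigenvalues lam[i], eigenvectors V[:, i]

for i in range(2):
    v = V[:, i]
    # Av = lambda v, with v returned normalized to |v| = 1.
    print(np.allclose(A @ v, lam[i] * v), np.isclose(np.linalg.norm(v), 1.0))
    # |A - lambda I| = 0: lambda is a root of the characteristic polynomial.
    print(np.isclose(np.linalg.det(A - lam[i] * np.eye(2)), 0.0))
```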
Two fundamental propositions concerning Hermitian matrices are the following.

Proposition 1.6 If $A \in \mathbb{C}^n_n$ is Hermitian, then all its eigenvalues are real.

Proof. For any $v \in \mathbb{C}^n$, the scalar $v^H A v$ satisfies
$$\overline{v^H A v} = (v^H A v)^H = v^H A^H v = v^H A v,$$
which means that $v^H A v$ is real for any $v \in \mathbb{C}^n$. Now, if $Av = \lambda v$ for some $v \neq 0$ in $\mathbb{C}^n$, then $v^H A v = \lambda v^H v = \lambda |v|^2$. But since $v^H A v$ and $|v|^2$ are real, so is $\lambda$. ✷

Proposition 1.7 If $A \in \mathbb{C}^n_n$ is Hermitian and $v_1$ and $v_2$ are eigenvectors corresponding to eigenvalues $\lambda_1$ and $\lambda_2$, respectively, where $\lambda_1 \neq \lambda_2$, then $v_1 \perp v_2$.

Proof. Since $A$ is Hermitian, $A = A^H$ and $\lambda_i$, $i = 1, 2$, are real. Then,
$$A v_1 = \lambda_1 v_1 \implies v_1^H A^H = v_1^H A = \lambda_1 v_1^H \implies v_1^H A v_2 = \lambda_1 v_1^H v_2,$$
$$A v_2 = \lambda_2 v_2 \implies v_1^H A v_2 = \lambda_2 v_1^H v_2.$$
Subtracting the last two expressions, $(\lambda_1 - \lambda_2) v_1^H v_2 = 0$ and, thus, $v_1^H v_2 = 0$. ✷
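A quick numerical illustration of Propositions 1.6 and 1.7 (again, not part of the original text): for the arbitrary Hermitian matrix below, `np.linalg.eigh` returns real eigenvalues and orthonormal eigenvectors.

```python
import numpy as np

# A Hermitian matrix: A = A^H (off-diagonal entries are complex conjugates).
A = np.array([[2.0,        1.0 + 1.0j],
              [1.0 - 1.0j, 3.0       ]])
assert np.allclose(A, A.conj().T)

lam, U = np.linalg.eigh(A)     # eigh assumes Hermitian input; lam is returned as real floats

print(lam.dtype)                                       # float64: eigenvalues are real (Prop. 1.6)
print(np.allclose(U.conj().T @ U, np.eye(2)))          # eigenvectors of distinct eigenvalues are orthogonal (Prop. 1.7)
print(np.allclose(U.conj().T @ A @ U, np.diag(lam)))   # U^H A U = diag(lambda)
```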
Proposition 1.7 immediately shows that if all the eigenvalues of a Hermitian matrix $A$ are distinct, then there exists an orthonormal basis of eigenvectors, whereby $A$ is diagonalizable. Toward proving this is true even when the eigenvalues may be of a multiple nature, we need the following proposition. However, before stating it, define $T = (t_{ij}) \in \mathbb{R}^n_n$ to be a lower triangular matrix iff $t_{ij} = 0$, $i < j$. Similarly, $T \in \mathbb{R}^n_n$ is termed upper triangular iff $t_{ij} = 0$, $i > j$.

Proposition 1.8 Let $A \in \mathbb{C}^n_n$ be any matrix. There exists a unitary matrix $U \in \mathbb{C}^n_n$ such that $U^H A U$ is upper triangular.

Proof. The proof is by induction on $n$. The result is obvious for $n = 1$. Next, assume the proposition holds for $n$ and prove it is true for $n + 1$. Let $\lambda_1$ be an eigenvalue of $A$ and $u_1$, $|u_1| = 1$, be an eigenvector. Let $U_1 = (u_1, \Gamma)$ for some $\Gamma$ such that $U_1$ is unitary (such a $\Gamma$ exists from the Gram-Schmidt method). Then,
$$U_1^H A U_1 = U_1^H (\lambda_1 u_1, A\Gamma) = \begin{pmatrix} \lambda_1 & u_1^H A \Gamma \\ 0 & B \end{pmatrix},$$
where $B = \Gamma^H A \Gamma \in \mathbb{C}^n_n$. From the induction hypothesis, there exists $V$ unitary such that $V^H B V = T$ is triangular. Define
$$U_2 = \begin{pmatrix} 1 & 0' \\ 0 & V \end{pmatrix},$$
and it is clear that $U_2$ is also unitary. Finally,
$$(U_1 U_2)^H A (U_1 U_2) = U_2^H \begin{pmatrix} \lambda_1 & u_1^H A \Gamma \\ 0 & B \end{pmatrix} U_2 = \begin{pmatrix} 1 & 0' \\ 0 & V^H \end{pmatrix} \begin{pmatrix} \lambda_1 & u_1^H A \Gamma \\ 0 & B \end{pmatrix} \begin{pmatrix} 1 & 0' \\ 0 & V \end{pmatrix} = \begin{pmatrix} \lambda_1 & u_1^H A \Gamma V \\ 0 & T \end{pmatrix},$$
which is of the desired form. The proof is complete because $U \equiv U_1 U_2$ is unitary. ✷
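Proposition 1.8 is the (complex) Schur triangularization. As an aside from the text, it can be computed with SciPy's `scipy.linalg.schur`; the random complex matrix below is only an example for this sketch.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))  # arbitrary complex matrix

T, U = schur(A, output='complex')   # A = U T U^H with U unitary, T upper triangular

print(np.allclose(U.conj().T @ U, np.eye(4)))   # U is unitary
print(np.allclose(np.tril(T, k=-1), 0.0))       # T is upper triangular
print(np.allclose(U.conj().T @ A @ U, T))       # U^H A U = T
# The diagonal of T carries the eigenvalues of A.
print(np.allclose(np.sort_complex(np.diag(T)),
                  np.sort_complex(np.linalg.eigvals(A))))
```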
As a corollary we obtain that Hermitian matrices are always diagonalizable.

Corollary 1.1 Let $A \in \mathbb{C}^n_n$ be Hermitian. There exists a unitary matrix $U$ such that $U^H A U = \operatorname{diag}(\lambda)$.

Proof. Proposition 1.8 showed there exists $U$, unitary, such that $U^H A U$ is triangular. However, if $A$ is Hermitian, so is $U^H A U$. The only matrices that are both Hermitian and triangular are the diagonal matrices. ✷

In the sequel, we will always use Corollary 1.1 for $S \in \mathbb{R}^n_n$ symmetric. However, first note that when $S$ is symmetric, all its eigenvalues are real, whereby the eigenvectors can also be chosen to be real; they are the solutions of $(S - \lambda I)x = 0$. When $U \in \mathbb{R}^n_n$ is unitary, it is called an orthogonal matrix instead. A matrix $H \in \mathbb{R}^n_n$ is said to be orthogonal iff the columns (or rows) of $H$ form an orthonormal basis of $\mathbb{R}^n$, i.e., $H'H = I = HH'$. The group of orthogonal matrices in $\mathbb{R}^n_n$ will be denoted by
$$O_n = \{H \in \mathbb{R}^n_n : HH' = I\}.$$
We have proven the "spectral decomposition":

Proposition 1.9 If $S \in \mathbb{R}^n_n$ is symmetric, then there exists $H \in O_n$ such that $H'SH = \operatorname{diag}(\lambda)$.

The columns of $H$ form an orthonormal basis of eigenvectors and $\lambda$ is the vector of corresponding eigenvalues.
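For the real symmetric case of Proposition 1.9, the same computation specializes to an orthogonal $H$; the symmetric matrix below is an arbitrary example, and this check is only an illustration outside the text proper.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
S = B + B.T                         # an arbitrary real symmetric matrix

lam, H = np.linalg.eigh(S)          # real eigenvalues; columns of H are real eigenvectors

print(np.allclose(H.T @ H, np.eye(4)), np.allclose(H @ H.T, np.eye(4)))  # H'H = I = HH'
print(np.allclose(H.T @ S @ H, np.diag(lam)))                            # H'SH = diag(lambda)
print(np.allclose(S @ H, H @ np.diag(lam)))                              # columns solve (S - lambda I)x = 0
```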
Now, a symmetric matrix $S \in \mathbb{R}^n_n$ is said to be positive semidefinite, denoted $S \geq 0$ or $S \in PS_n$, iff $v'Sv \geq 0$, $\forall v \in \mathbb{R}^n$, and it is positive definite, denoted $S > 0$ or $S \in P_n$, iff $v'Sv > 0$, $\forall v \neq 0$. Finally, the positive semidefinite and positive definite matrices can be characterized in terms of eigenvalues.

Proposition 1.10 Let $S \in \mathbb{R}^n_n$ be symmetric with eigenvalues $\lambda_1, \ldots, \lambda_n$.

1. $S \geq 0$ iff $\lambda_i \geq 0$, $i = 1, \ldots, n$.
2. $S > 0$ iff $\lambda_i > 0$, $i = 1, \ldots, n$.

Note that if $S$ is positive semidefinite, then, from Proposition 1.9, we can write
$$S = HDH' = (HD^{1/2})(HD^{1/2})' = (HD^{1/2}H')^2,$$
where $D = \operatorname{diag}(\lambda_i)$ and $D^{1/2} = \operatorname{diag}(\lambda_i^{1/2})$, so that for $A = HD^{1/2}$, $S = AA'$, or for $B = HD^{1/2}H'$, $S = B^2$. The positive semidefinite matrix $B$ is often denoted $S^{1/2}$ and is the square root of $S$. If $S$ is positive definite, we can also define $S^{-1/2} = HD^{-1/2}H'$, which satisfies $(S^{-1/2})^2 = S^{-1}$. Finally, inequalities between matrices must be understood in terms of positive definiteness; i.e., for matrices $A$ and $B$, $A \geq B$ (respectively $A > B$) means $A - B \geq 0$ (respectively $A - B > 0$).
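Continuing the numerical asides, the square root construction $S^{1/2} = HD^{1/2}H'$ can be carried out directly from the spectral decomposition; the positive definite matrix below is an arbitrary example built for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
S = B @ B.T + np.eye(4)             # BB' >= 0; adding I makes S positive definite

lam, H = np.linalg.eigh(S)
print(np.all(lam > 0))              # S > 0 iff all eigenvalues are positive (Prop. 1.10)

S_half = H @ np.diag(np.sqrt(lam)) @ H.T               # S^{1/2} = H D^{1/2} H'
S_minus_half = H @ np.diag(1.0 / np.sqrt(lam)) @ H.T   # S^{-1/2} = H D^{-1/2} H'

print(np.allclose(S_half @ S_half, S))                             # (S^{1/2})^2 = S
print(np.allclose(S_minus_half @ S_minus_half, np.linalg.inv(S)))  # (S^{-1/2})^2 = S^{-1}
```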
A related decomposition, which will prove useful for canonical correlations, is the singular value decomposition (SVD).

Proposition 1.11 Let $A \in \mathbb{R}^m_n$ with $\operatorname{rank} A = r$. There exist $G \in O_m$, $H \in O_n$ such that
$$A = G \begin{pmatrix} D_\rho & 0 \\ 0 & 0 \end{pmatrix} H',$$
where $D_\rho = \operatorname{diag}(\rho_1, \ldots, \rho_r)$, $\rho_i > 0$, $i = 1, \ldots, r$.

Proof. Since $A'A \geq 0$, there exists $H = (h_1, \ldots, h_n) \in O_n$ such that
$$A'A = H \operatorname{diag}(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0) H',$$
where $\lambda_i > 0$, $i = 1, \ldots, r$. For $j > r$, $|Ah_j|^2 = h_j' A'A h_j = 0$, which means $Ah_j = 0$. For $j \leq r$, define $\rho_j = \lambda_j^{1/2}$ and $g_j = Ah_j/\rho_j$. Then, $g_i'g_j = h_i'A'Ah_j/(\rho_i \rho_j) = \delta_{ij}$; i.e., $g_1, \ldots, g_r$ are orthonormal. By completing to an orthonormal basis of $\mathbb{R}^m$, we can find $G = (g_1, \ldots, g_r, g_{r+1}, \ldots, g_m) \in O_m$. Now,
$$g_i'Ah_j = \begin{cases} 0, & j > r, \\ \rho_j \delta_{ij}, & j \leq r, \end{cases}$$
or, in matrix notation,
$$G'AH = \begin{pmatrix} D_\rho & 0 \\ 0 & 0 \end{pmatrix}.$$
✷

In the SVD, $\rho_j^2$, $j = 1, \ldots, r$, are the nonzero eigenvalues of $A'A$ and the columns of $H$ are the eigenvectors.
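As a numerical illustration (not part of the original text), NumPy's `np.linalg.svd` produces the decomposition of Proposition 1.11; the rank-2 matrix below is constructed only for this sketch, and the relation between $\rho_j^2$ and the eigenvalues of $A'A$ is checked.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # 5 x 3 matrix of rank r = 2

G, s, Ht = np.linalg.svd(A)          # A = G [diag(s) 0; 0 0] H', with Ht = H'
r = int(np.sum(s > 1e-10))
print(r)                             # 2 nonzero singular values rho_1, rho_2

D = np.zeros((5, 3))
D[:3, :3] = np.diag(s)
print(np.allclose(G @ D @ Ht, A))    # reconstructs A

# rho_j^2 are the nonzero eigenvalues of A'A; columns of H are its eigenvectors.
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(lam[:r], s[:r] ** 2))
```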
1.6 Orthogonal projections

Now recall some basic facts about orthogonal projections. By definition, an orthogonal projection $P$ is simply a linear transformation for which $x - Px \perp Py$, $\forall x, y \in \mathbb{R}^n$, but then, equivalently,
$$(x - Px)'(Py) = 0,\ \forall x, y \in \mathbb{R}^n \iff x'Py = x'P'Py,\ \forall x, y \in \mathbb{R}^n \iff P'P = P \iff P = P' = P^2.$$
A matrix $P$ such that $P = P' = P^2$ is also called an idempotent matrix. Not surprisingly, an orthogonal projection is completely determined by its image.

Proposition 1.12 If $P_1$ and $P_2$ are two orthogonal projections, then
$$\operatorname{Im} P_1 = \operatorname{Im} P_2 \iff P_1 = P_2.$$

Proof. It holds since $x - P_1 x \perp P_2 y$, $\forall x, y \in \mathbb{R}^n \implies P_2 = P_1' P_2$, and, similarly, $P_1 = P_2' P_1$, whence $P_1 = P_1' = P_2$. ✷

If $X = (x_1, \ldots, x_k)$ is any basis for $\operatorname{Im} P$, we have explicitly
$$P = X(X'X)^{-1}X'. \qquad (1.3)$$
To see this, simply write $Px = Xb$; then orthogonality, $X'(x - Xb) = 0$, determines the (unique) coefficients $b = (X'X)^{-1}X'x$.
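Formula (1.3) is easy to check numerically; the following sketch, outside the text proper, uses an arbitrary full-column-rank $5 \times 2$ matrix $X$.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 2))                    # columns span a 2-dimensional subspace

P = X @ np.linalg.inv(X.T @ X) @ X.T               # P = X (X'X)^{-1} X', formula (1.3)

print(np.allclose(P, P.T), np.allclose(P @ P, P))  # P = P' = P^2: an orthogonal projection
x = rng.standard_normal(5)
y = rng.standard_normal(5)
print(np.isclose((x - P @ x) @ (P @ y), 0.0))      # x - Px is orthogonal to Py
print(np.allclose(P @ X, X))                       # P fixes Im P, the span of the columns of X
```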
In particular, for