18.338J/16.394J: The Mathematics of Infinite Random Matrices
Essentials of Finite Random Matrix Theory
Alan Edelman
Handout #6, Tuesday, September 28, 2004

This handout provides the essential elements needed to understand finite random matrix theory. A cursory observation should reveal that the tools for infinite random matrix theory are quite different from the tools for finite random matrix theory. Nonetheless, there are significantly more published applications that use finite random matrix theory than infinite random matrix theory. Our belief is that many of the results that have historically been derived using finite random matrix theory can be reformulated and answered using infinite random matrix theory. In this sense, it is worth recognizing that in many applications it is an integral of a function of the eigenvalues that matters more than the mere distribution of the eigenvalues. For finite random matrix theory, the tools that often come into play when setting up such integrals are matrix Jacobians, joint eigenvalue densities, and the Cauchy-Binet theorem. We describe these in subsequent sections.

1 Matrix and Vector Differentiation

In this section we concern ourselves with the differentiation of matrices. Differentiating matrix and vector functions is not significantly harder than differentiating scalar functions, except that we need notation to keep track of the variables. We titled this section "matrix and vector" differentiation, but of course it is the function that we differentiate. The matrix or vector is just a notational package for the scalar functions involved. In the end, a derivative is nothing more than the "linearization" of all the involved functions. We find it useful to think of this linearization both symbolically (for manipulative purposes) and numerically (in the sense of small numerical perturbations). The differential notation captures both viewpoints very well.

We begin with the familiar product rule for scalars, d(uv) = u(dv) + v(du), from which we can derive that d(x^3) = 3x^2 dx. We refer to dx as a differential. We all unconsciously interpret the "dx" symbolically as well as numerically. Sometimes it is nice to confirm on a computer that

\[
\frac{(x+\epsilon)^3 - x^3}{\epsilon} \;\approx\; 3x^2 . \qquad (1)
\]

I do this by taking x to be 1 or 2 or randn(1) and ε to be .001 or .0001.
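One way to carry out this check in MATLAB is sketched below (a minimal sketch; the particular values of x and ε are arbitrary choices, not prescribed by the handout):

    % Numerical confirmation of (1): ((x+ep)^3 - x^3)/ep is close to 3x^2.
    x   = randn(1);                % try x = 1, x = 2, or a random scalar
    ep  = 1e-4;                    % a small perturbation, e.g. .001 or .0001
    lhs = ((x + ep)^3 - x^3) / ep;
    rhs = 3*x^2;
    disp([lhs, rhs, lhs - rhs])    % the difference should be on the order of ep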
The product rule holds for matrices as well: d(UV) = U(dV) + (dU)V. In the examples we will see some symbolic and numerical interpretations.

Example 1: Y = X^3

We use the product rule to differentiate Y(X) = X^3 and obtain

\[
d(X^3) = X^2(dX) + X(dX)X + (dX)X^2 .
\]

When I introduce undergraduate students to matrix multiplication, I tell them that matrices are like scalars, except that they do not commute. The numerical (or first order perturbation theory) interpretation applies here as well, but it may seem less familiar at first. Numerically, take X=randn(n) and E=randn(n) with ε = .001, say, and then compute

\[
\frac{(X+\epsilon E)^3 - X^3}{\epsilon} \;\approx\; X^2 E + XEX + EX^2 . \qquad (2)
\]

This is the matrix version of (1). Holding X fixed and allowing E to vary, the right-hand side is a linear function of E. There is no simpler form possible.

Symbolically (or numerically) one can take dX = E_kl, the matrix that has a one in element (k, l) and zeros elsewhere. Then we can write down the matrix of partial derivatives

\[
\frac{\partial X^3}{\partial x_{kl}} \;=\; X^2 E_{kl} + X E_{kl} X + E_{kl} X^2 .
\]

As we let k, l vary over all possible indices, we obtain all the information we need to compute the derivative in any direction E.

In general, the directional derivative of Y_ij(X) in the direction dX is given by (dY)_ij. For a particular matrix X, dY(X) is a matrix of directional derivatives corresponding to a first order perturbation in the direction E = dX. It is a matrix of linear functions corresponding to the linearization of Y(X) about X.

Structured Perturbations

We sometimes restrict our E to be a structured perturbation. For example, if X is triangular, symmetric, antisymmetric, or even sparse, then often we wish to restrict E so that the pattern is maintained in the perturbed matrix as well. An important case occurs when X is orthogonal; we will see in an example below that we will then want to restrict E so that X^T E is antisymmetric.

Example 2: y = x^T x

Here y is a scalar and dot products commute, so that dy = 2x^T dx. When y = 1, x is on the unit sphere. To stay on the sphere we need dy = 0, so that x^T dx = 0, i.e., the tangent to the sphere is perpendicular to the sphere. Note the two uses of dy. In the first case it is the change to the squared length of x. In the second case, by setting dy = 0, we find perturbations to x which to first order do not change the length at all. Indeed, if one computes (x + dx)^T(x + dx) for a small finite dx, one sees that if x^T dx = 0 then the length changes only to second order. Geometrically, one can draw a tangent to a circle: the distance to the circle is second order in the distance along the tangent.

Example 3: y = x^T A x

Again y is a scalar. We have dy = dx^T A x + x^T A dx. If A is symmetric then dy = 2x^T A dx.

Example 4: Y = X^{-1}

We have XY = I, so that X(dY) + (dX)Y = 0, and hence dY = −X^{-1}(dX)X^{-1}. We recommend that the reader compute ε^{-1}((X + εE)^{-1} − X^{-1}) numerically and verify that it is approximately −X^{-1}EX^{-1}. In other words,

\[
(X + \epsilon E)^{-1} = X^{-1} - \epsilon X^{-1} E X^{-1} + O(\epsilon^2).
\]

Example 5: I = Q^T Q

If Q is orthogonal we have Q^T dQ + dQ^T Q = 0, so that Q^T dQ is antisymmetric. In general, d(Q^T Q) = Q^T dQ + dQ^T Q, but with no orthogonality condition on Q there is no antisymmetry condition on Q^T dQ.
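The checks suggested in Examples 4 and 5 might be carried out in MATLAB as follows (a minimal sketch; the matrix size, the value of ε, and the use of expm to keep the perturbed Q orthogonal are choices made here, not part of the handout):

    % Example 4: first order expansion of the matrix inverse.
    n   = 4;  ep = 1e-6;
    X   = randn(n);  E = randn(n);
    lhs = ((X + ep*E)\eye(n) - X\eye(n)) / ep;   % ( (X+ep*E)^(-1) - X^(-1) ) / ep
    rhs = -(X\E)/X;                              % -X^(-1) * E * X^(-1)
    norm(lhs - rhs)                              % small: first order in ep

    % Example 5: perturb an orthogonal Q so that it stays orthogonal and
    % observe that Q'*dQ is antisymmetric to first order.
    [Q, R] = qr(randn(n));                       % a random orthogonal matrix
    A  = randn(n);  S = (A - A')/2;              % a random antisymmetric matrix
    Qp = Q*expm(ep*S);                           % perturbed matrix, still orthogonal
    dQ = (Qp - Q)/ep;                            % numerical differential
    norm(Q'*dQ + (Q'*dQ)')                       % small: Q'*dQ is nearly antisymmetric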
If y is a scalar function of x_1, x_2, . . . , x_n then we have the "chain rule"

\[
dy \;=\; \frac{\partial y}{\partial x_1}\,dx_1 + \frac{\partial y}{\partial x_2}\,dx_2 + \cdots + \frac{\partial y}{\partial x_n}\,dx_n .
\]

Often we wish to apply the chain rule to every element of a vector or matrix. Let

\[
X = \begin{pmatrix} p & q \\ r & s \end{pmatrix}
\qquad\text{and}\qquad
Y = X^2 = \begin{pmatrix} p^2 + qr & pq + qs \\ rp + rs & qr + s^2 \end{pmatrix};
\]

then

\[
dY = X\,dX + dX\,X . \qquad (3)
\]

2 Matrix Jacobians (getting started)

2.1 Definition

Let x ∈ R^n and y = y(x) ∈ R^n be a differentiable function of x. It is well known that the Jacobian matrix

\[
J \;=\;
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & & \vdots \\
\frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_n}
\end{pmatrix}
\;=\;
\left( \frac{\partial y_i}{\partial x_j} \right)_{i,j=1,2,\ldots,n}
\]

evaluated at a point x approximates y(x) by a linear function. Intuitively y(x + δx) ≈ y(x) + J δx, i.e., J is the matrix that allows us to invoke perturbation theory. The function y may be viewed as performing a change of variables.

Furthermore (intuitively), if a little box of n-dimensional volume ε surrounds x, then it is transformed by y into a parallelepiped of volume |det J| ε around y(x). Therefore the Jacobian determinant |det J| is the magnification factor for volumes.

If we are integrating some function of y ∈ R^n as in ∫ p(y) dy (where dy = dy_1 · · · dy_n), then when we change variables from y to x, where y = y(x), the integral becomes ∫ p(y(x)) |det(∂y_i/∂x_j)| dx. For many people this becomes a matter of notation, but one should understand intuitively that the Jacobian tells you how volume elements scale.

The determinant is 0 exactly where the change of variables breaks down. It is always instructive to see when this happens: either there is no "x" locally for each "y", or there are many, as in the example of polar coordinates at the origin.

Notation: throughout this book, J denotes the Jacobian matrix (sometimes called the derivative or simply the Jacobian in the literature). We will consistently write det J for the Jacobian determinant (unfortunately also called the Jacobian in the literature). When we say Jacobian, we will be talking about both.

2.2 Simple Examples (n = 2)

We get our feet wet with some simple 2 × 2 examples. Every reader is familiar with changing scalar variables, as in

\[
\int f(x)\, dx \;=\; \int f(y^2)\,(2y)\, dy .
\]

We want the reader to be just as comfortable when f is a scalar function of a matrix and we change X = Y^2:

\[
\int f(X)\,(dX) \;=\; \int f(Y^2)\,(\mathrm{Jacobian})\,(dY) .
\]

One can compute all of the 2 by 2 Jacobians that follow by hand, but in some cases it can be tedious and hard to get right on the first try. Code 8.1 in MATLAB takes away the drudgery and gives the right answer. Later we will learn fancy ways to get the answer without too much drudgery and also without the aid of a computer.
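The sketch below is not the handout's Code 8.1; it is only one plausible way to estimate such a 4 × 4 Jacobian by finite differences and compare its determinant with a closed form, here for Y = X^2, whose determinant formula appears in Example 1 below. The column ordering vec(X) = (p, r, q, s), i.e. column-major X(:), is the convention assumed here.

    % Finite difference estimate of the 4x4 Jacobian of the map X -> X^2,
    % columns ordered as vec(X) = [p r q s]'.
    X  = randn(2);  ep = 1e-6;
    F  = @(X) X^2;                        % the map whose Jacobian we want
    J  = zeros(4);
    for k = 1:4
        E       = zeros(2);  E(k) = 1;    % perturb one entry of X at a time
        dY      = (F(X + ep*E) - F(X)) / ep;
        J(:, k) = dY(:);                  % one column of the Jacobian
    end
    det(J)                                % compare with the closed form:
    4*trace(X)^2*det(X)                   % det J = 4 (tr X)^2 det X  (Example 1)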
" # " # 2 × 2 Example 2: Matrix Cube (Y = X3) 2 × 2 Example 3: Matrix Inverse (Y = X−1) 2 × 2 Example 4: Linear Transformation (Y = AX + B) 2 × 2 Example 5: The LU Decomposition (X = LU) 2 × 2 Example 6: A Symmetric Decomposition (S = DMD) 2 × 2 Example 7: Traceless Symmetric = Polar Decomposition (S = QΛQT , tr S = 0) 2 × 2 Example 8: The Symmetric Eigenvalue Problem (S = QΛQT ) 2 × 2 Example 9: Symmetric Congruence (Y = ATSA) Discussion: Example 1: Matrix Square (Y = X2) " # With X = p q and Y = X2 the Jacobian matrix of interest is r s ∂p ∂r ∂q ∂s 2p q r 0 ∂Y11 r p + s 0 r ∂Y21 J = q 0 p + s q ∂Y12 0 q r 2s ∂Y22 On this first example we label the columns and rows so that the elements correspond to the definition J = ∂Yij . Later we will omit the labels. We invite readers to compare with Equation (3). We see that ∂Xkl the Jacobian matrix and the differential contain the same information. We can compute then det J = 4(p + s) 2(sp − qr) = 4(tr X) 2 det(X). Notice that breakdown occurs if X is singular or has trace zero. Example 2: Matrix Cube (Y = X3) p q With X = and Y = X3 r s 3 p2 + 2 qr pq + q (p + s) 2 rp + sr qr 2 rp + sr p2 + 2 qr + (p + s) s r2 rp + 2 sr J = , 2 pq + qs q2 p (p + s) + 2 qr + s2 pq + 2 qs qr pq + 2 qs r (p + s) + sr 2 qr + 3 s2 so that 2 2 det J = 9(sp − qr) 2(qr + p + s + sp) 2 = 9 (det X) 2(tr X2 + (tr X) 2) 2 . 4 Breakdown occurs if X is singular or if the eigenvalue ratio is a complex cube root of unity. Example 3: Matrix Inverse (Y = X−1) p q With X = and Y = X−1 r s 2 −s qs sr −qr sr −ps 2 rp 1 −r J = , det X2 × qs −q2 −ps pq − 2 qr pq rp −p
" # " # " # 6 " # so that det J = (det X) −4 . Breakdown occurs if X is singular. Example 4: Linear Transformation (Y = AX + B) a b 0 0 c d 0 0 J = 0 0 a b 0 0 c d The Jacobian matrix has two copies of the constant matrix A so that det J = det A2 = (det A)2 . Breakdown occurs if A is singular. Example 5: The LU Decomposition (X = LU) The LU Decomposition computes a lower triangular L with ones on the diagonal and an upper triangular U such that X = LU. For a general 2 × 2 matrix it takes the form p q 1 0 p q = r s which exists when p = 0. r p p ps−qr 1 0 Notice the function of four variables might be written: y1 = p y2 = r/p y3 = q y4 = (ps − qr)/p = det(X)/p The Jacobian matrix is itself lower triangular 1 0 0 0 p−1 0 0 r p2 , 0 0 1 0 − J = qr p2 so that det J = 1/p. Breakdown occurs when p = 0. Example 6: A Symmetric Decomposition (S = DMD) Any symmetric matrix X = p r may be written as r s √p 0 1 r/√ps X = DMD where D = . 0 √s r/√ps 1 Since X is symmetric, there are three independent variables, and thus the Jacobian matrix is 3 × 3. The three independent elements in D and M may be thought of as functions of p, r, and s: namely y1 = √p y2 = √s and y3 = r/√ps q p r p − − 1 and M =