18.338J/16.394J: The Mathematics of Infinite Random Matrices

The Stieltjes transform based approach

Raj Rao
Handout #4, Thursday, September 16, 2004

1 The eigenvalue distribution function

For an N \times N matrix A_N, the eigenvalue distribution function^1 (e.d.f.) F^{A_N}(x) is defined as

    F^{A_N}(x) = \frac{\text{Number of eigenvalues of } A_N \le x}{N}.    (1)

As defined, the e.d.f. is right continuous and possibly atomic, i.e. with step discontinuities at discrete points. In practical terms, the derivative of (1), referred to as the (eigenvalue) level density, is simply the appropriately normalized histogram of the eigenvalues of A_N. The MATLAB code histn we distributed earlier approximates this density.

A surprising result in infinite RMT is that for some matrix ensembles the expectation E[F^{A_N}(x)] has a well defined, i.e. not zero and not infinite, limit. We drop the notational dependence on N in (1) by defining the limiting e.d.f. as

    F^{A}(x) = \lim_{N \to \infty} E[F^{A_N}(x)].    (2)

This limiting e.d.f.^2 is also sometimes referred to in the literature as the integrated density of states [2, 3]. Its derivative is referred to as the level density in the physics literature [4]. The region of support associated with this limiting density is simply the region where dF^{A}(x) \neq 0. When discussing the limiting e.d.f. we shall often distinguish between its atomic and non-atomic components.

2 The Stieltjes transform representation

One step removed from the e.d.f. is the Stieltjes transform, which has proved to be an efficient tool for determining this limiting density. For all non-real z, the Stieltjes (or Cauchy) transform of the probability measure F^{A}(x) is given by

    m_A(z) = \int \frac{1}{x - z} \, dF^{A}(x), \qquad \mathrm{Im}\, z \neq 0.    (3)

The integral above is over the whole real axis or some subset of it,^3 since for the matrices of interest, such as Hermitian or real symmetric matrices, the eigenvalues are real. When we refer to the "Stieltjes transform of A" in this paper, we are referring to m_A(z) defined as in (3), expressed in terms of the limiting density dF^{A}(x) of the random matrix ensemble A.

^1 This is also referred to in the literature as the empirical distribution function [1].
^2 Unless we state otherwise, any reference to an e.d.f. or a level density in this paper will refer to the corresponding limiting e.d.f. or density, respectively.
^3 While the Stieltjes integral is over the positive real axis, the Cauchy integral is more general [5] and can include complex contours as well. This distinction is irrelevant for several practical classes of matrices, such as the sample covariance matrices, where all of the eigenvalues are non-negative. Nonetheless, throughout this paper, (3) will be referred to as the Stieltjes transform with the implicit assumption that the integral is over the entire real axis.
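The e.d.f. in (1) is easy to evaluate empirically. A minimal sketch (Python/NumPy; the Wigner-type ensemble and the size N = 500 are our illustrative choices, not from the handout):

```python
import numpy as np

def edf(A, x):
    """Eigenvalue distribution function (1): fraction of eigenvalues of A that are <= x."""
    lam = np.linalg.eigvalsh(A)      # A is Hermitian/symmetric, so its eigenvalues are real
    return np.mean(lam <= x)

rng = np.random.default_rng(0)
N = 500
G = rng.standard_normal((N, N))
A = (G + G.T) / np.sqrt(2 * N)       # symmetric Wigner matrix; eigenvalues fill roughly [-2, 2]

print(edf(A, 0.0))   # close to 1/2 by the symmetry of the ensemble
print(edf(A, 2.5))   # essentially 1: all eigenvalues lie below the right edge
```

A normalized histogram of `lam` then approximates the level density, i.e. the derivative of (1).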
The Stieltjes transform in (3) can also be interpreted as an expectation with respect to the measure F^{A}(x):

    m_A(z) = E_X\!\left[ \frac{1}{X - z} \right].    (4)

Since there is a one-to-one correspondence between the probability measure F^{A}(x) and the Stieltjes transform, convergence of the Stieltjes transform can be used to show the convergence of the probability measure F^{A}(x). Once this convergence has been established, the Stieltjes transform can be used to yield the density using the so-called Stieltjes-Perron inversion formula [6]

    \frac{dF^{A}(x)}{dx} = \frac{1}{\pi} \lim_{\xi \to 0^+} \mathrm{Im}\; m_A(x + i\xi).    (5)

When studying the limiting distribution of large random matrices, the Stieltjes transform has proved to be particularly relevant because of its correspondence with the matrix resolvent. The trace of the matrix resolvent M_A(z), defined as M_A(z) = (A_N - zI)^{-1}, can be written as

    \mathrm{tr}[M_A(z)] = \sum_{i=1}^{N} \frac{1}{\lambda_i - z}    (6)

where \lambda_i for i = 1, 2, \ldots, N are the eigenvalues of A_N. For any fixed A_N, M_A(z) is a non-random quantity. However, when A_N is a large random matrix,

    m_A(z) = \lim_{N \to \infty} \frac{1}{N} \mathrm{tr}[M_A(z)].    (7)

The Stieltjes transform and its resolvent form in (7) are intimately linked to the classical moment problem [6]. This connection can be observed by noting that the integral in (3) can be expressed as an analytic "multipole" series expansion about z = \infty:

    m_A(z) = -\sum_{k=0}^{\infty} \int \frac{x^k}{z^{k+1}} \, dF^{A}(x)
           = -\sum_{k=0}^{\infty} \frac{1}{z^{k+1}} \int x^k \, dF^{A}(x)
           = -\frac{1}{z} - \sum_{k=1}^{\infty} \frac{M_k^A}{z^{k+1}},    (8)

where M_k^A = \int x^k \, dF^{A}(x) is the kth moment of x with respect to the probability measure dF^{A}(x). The analyticity of the Stieltjes transform about z = \infty expressed in (8) is a consequence of our implicit assumption that the region of support of the limiting density dF^{A}(x) is bounded, i.e. \lim_{x \to \infty} dF^{A}(x) = 0.

Incidentally, the \eta-transform introduced by Tulino and Verdú in [7] can be expressed in terms of m(z) and permits a series expansion about z = 0.

Given the relationship in (7), it is worth noting that the kth matrix moment M_k^A is simply

    M_k^A = \lim_{N \to \infty} \frac{1}{N} \mathrm{tr}[A_N^k] = \int x^k \, dF^{A}(x).    (9)

Equation (8), written as a multipole series, suggests that one way of computing the density would be to determine the moments of the random matrix as in (9), and then invert the Stieltjes transform using (5). For the famous semi-circular law, Wigner actually used a moment based approach in [8] to determine the density of the standard Wigner matrix, though he did not explicitly invert the Stieltjes transform as we suggested above. As the reader may imagine, such a moment based approach is not particularly useful for more general classes of random matrices. We discuss a more relevant and practically useful Stieltjes transform based approach next.
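The resolvent identity (6)-(7) and the inversion formula (5) can be checked numerically. The sketch below (Python/NumPy; the Wigner ensemble, the size N, and the smoothing parameter \xi are our illustrative choices) evaluates (1/N) tr(A_N - zI)^{-1} at z = x + i\xi and compares (1/\pi) Im m with the semicircle density \sqrt{4 - x^2}/(2\pi):

```python
import numpy as np

def stieltjes_from_resolvent(A, z):
    """Approximate m_A(z) = (1/N) tr (A - zI)^{-1} via the eigenvalues of A, as in (6)-(7)."""
    lam = np.linalg.eigvalsh(A)
    return np.mean(1.0 / (lam - z))

# Wigner matrix normalized so its level density converges to the semicircle law on [-2, 2].
rng = np.random.default_rng(0)
N = 2000
G = rng.standard_normal((N, N))
A = (G + G.T) / np.sqrt(2 * N)

xi = 0.05                 # small imaginary part; the limit xi -> 0 recovers the density
x = 0.5
m = stieltjes_from_resolvent(A, x + 1j * xi)
density_est = m.imag / np.pi                      # Stieltjes-Perron inversion (5)
density_true = np.sqrt(4 - x**2) / (2 * np.pi)    # semicircle density at x
print(density_est, density_true)                  # close for large N and small xi
```

Shrinking \xi sharpens the estimate but increases the sample fluctuations, so in practice \xi is kept somewhat larger than the local eigenvalue spacing 1/N.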
3 Stieltjes transform based approach

Instead of obtaining the Stieltjes transform directly, the so-called Stieltjes transform approach relies on finding a canonical equation that the Stieltjes transform satisfies. The Marcenko-Pastur theorem [9] was the first, and remains the most famous, example of such an approach. We include a statement of the theorem in the form found in the literature. We encourage you to rewrite this theorem, as an exercise, in a simpler manner.

Theorem 1 (The Marcenko-Pastur Theorem). Consider an N \times N matrix B_N. Assume that:

1. X_n is an n \times N matrix such that the matrix elements X^n_{ij} are independent identically distributed (i.i.d.) complex random variables with mean zero and variance 1, i.e. X^n_{ij} \in \mathbb{C}, E[X^n_{ij}] = 0 and E[|X^n_{ij}|^2] = 1.

2. n = n(N) with n/N \to c > 0 as N \to \infty.

3. T_n = \mathrm{diag}(\tau^n_1, \tau^n_2, \ldots, \tau^n_n) where \tau^n_i \in \mathbb{R}, and the e.d.f. of \{\tau^n_1, \ldots, \tau^n_n\} converges almost surely in distribution to a probability distribution function H(\tau) as N \to \infty.

4. B_N = A_N + \frac{1}{N} X_n^* T_n X_n, where A_N is a Hermitian N \times N matrix for which F^{A_N} converges vaguely to A almost surely, A being a possibly defective (i.e. with discontinuities) nonrandom distribution function.

5. X_n, T_n, and A_N are independent.

Then, almost surely, F^{B_N} converges vaguely, as N \to \infty, to a nonrandom d.f. F^{B} whose Stieltjes transform m(z), z \in \mathbb{C}, satisfies the canonical equation

    m(z) = m_A\!\left( z - c \int \frac{\tau \, dH(\tau)}{1 + \tau\, m(z)} \right).    (10)

We now illustrate the use of this theorem with a representative example. This example will help us highlight issues that will be of pedagogical interest throughout this semester.

Suppose A_N = 0, i.e. B_N = \frac{1}{N} X_n^* T_n X_n. The Stieltjes transform of A_N, by the definition in (3), is then simply

    m_A(z) = \frac{1}{0 - z} = -\frac{1}{z}.    (11)

Hence, using the Marcenko-Pastur theorem as expressed in (10), the Stieltjes transform m(z) of B_N is given by

    m(z) = -\frac{1}{z - c \int \frac{\tau \, dH(\tau)}{1 + \tau\, m(z)}}.    (12)

Rearranging the terms in this equation and using m instead of m(z) for notational convenience, we get

    z = -\frac{1}{m} + c \int \frac{\tau \, dH(\tau)}{1 + \tau m}.    (13)

Equation (13) expresses the dependence between the Stieltjes transform variable m and the probability space variable z. Such a dependence, expressed explicitly in terms of dH(\tau), will be referred to throughout this paper as a canonical equation. Equation (13) can also be interpreted as the expression for the functional inverse of m(z).

To determine the density of B_N by using the inversion formula in (5) we need to first solve (13) for m(z). In order to obtain an equation in m and z we need to first know dH(\tau) in (13). In theory, dH(\tau) could be any distribution that satisfies the conditions of the Marcenko-Pastur theorem. However, as we shall shortly recognize, for an arbitrary distribution it might not be possible to obtain an analytical, or even an easy numerical, solution for the density. On the other hand, for some specific distributions of dH(\tau) it will indeed be possible to obtain the density analytically. We consider one such distribution below.

Suppose T_n = I, i.e. the diagonal elements of T_n are non-random with d.f. dH(\tau) = \delta(\tau - 1). Equation (13) then becomes

    z = -\frac{1}{m} + \frac{c}{1 + m}.    (14)
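For a general dH(\tau), the canonical equation (12) can be solved numerically just above the real axis by a fixed-point iteration on m. A sketch (Python/NumPy; the damping factor, iteration count, discretization of H, and evaluation point are our own choices, not from the handout):

```python
import numpy as np

def canonical_m(z, tau, w, c, iters=1000, damp=0.5):
    """Damped fixed-point iteration for the canonical equation (12):
       m = -1 / (z - c * integral of tau dH(tau) / (1 + tau m)),
       with dH discretized as weights w[j] placed at atoms tau[j]."""
    m = -1.0 / z                       # initial guess; iterates stay in the upper half plane
    for _ in range(iters):
        integral = np.sum(w * tau / (1.0 + tau * m))
        m = (1 - damp) * m + damp * (-1.0 / (z - c * integral))
    return m

# T_n = I, i.e. dH(tau) = delta(tau - 1), and c = 2 as in the handout's example.
tau = np.array([1.0]); w = np.array([1.0]); c = 2.0
z = 1.0 + 0.01j                        # a point just above the real axis
m = canonical_m(z, tau, w, c)
print(m.imag / np.pi)                  # approximates the density at x = 1 via (5)
```

For this particular dH the answer can be cross-checked against the quadratic that (13) reduces to when T_n = I; the damping merely stabilizes the iteration, which is only marginally contractive inside the support as Im z shrinks.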
Rearranging the terms in the above equation we get

    z m (1 + m) = -(1 + m) + c m    (15)

which, with a bit of algebra, can be written as

    z m^2 + m(1 - c + z) + 1 = 0.    (16)

Equation (16) is a polynomial equation in m whose coefficients are polynomials in z. We will refer to such polynomials, often derived from canonical equations, as Stieltjes (transform) polynomials for the remainder of this paper. As discussed, to obtain the density using (5) we need to first solve (16) for m in terms of z. Since, from (16), we have a second degree polynomial in m, it is indeed possible to analytically solve for its roots and obtain the density.

[Figure 1: Level density dF^B/dx for B_N = (1/N) X_n^* X_n with c = 2.]

This level density, sometimes referred to in the literature as the Marcenko-Pastur distribution, is given by

    \frac{dF^{B}(x)}{dx} = \max(0,\, 1 - c)\, \delta(x) + \frac{\sqrt{(x - b_-)(b_+ - x)}}{2 \pi x} \, I_{[b_-, b_+]}(x)    (17)

where b_\pm = (1 \pm \sqrt{c})^2 and I_{[b_-, b_+]}(x) is the indicator function that is equal to 1 for b_- \le x \le b_+ and 0 elsewhere.

Figure 1 compares the histogram of the eigenvalues of 1000 realizations of the matrix B_N = \frac{1}{N} X_n^* X_n with N = 100 and n = 200 with the solid line indicating the theoretical density given by (17) for c = n/N = 2.

We now consider a modification to the Marcenko-Pastur theorem that is motivated by the sample covariance matrices that appear often in array processing applications.

3.1 The Sample Covariance Matrix

In the previous section we used the Marcenko-Pastur theorem to examine the density of a class of random matrices B_N = \frac{1}{N} X_n^* T_n X_n. Suppose we define the N \times n matrix Y_N = X_n^* T_n^{1/2}; then B_N may be written as

    B_N = \frac{1}{N} Y_N Y_N^*.    (18)

Recall that T_n was assumed to be diagonal and non-negative definite, so T_n^{1/2} can be constructed uniquely up to the sign. If Y_N were to be interpreted as a matrix of observations, then B_N written as (18) is reminiscent
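The comparison in Figure 1 can be reproduced numerically. The sketch below (Python/NumPy; we pool fewer realizations than the figure's 1000, and the bin count is our own choice) histograms the eigenvalues of B_N = (1/N) X_n^* X_n for c = 2 and compares them with the density (17), which has no atom at 0 since c > 1:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 100, 200
c = n / N                                             # c = 2, as in Figure 1
bm, bp = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2     # support edges b_- and b_+

# Pool eigenvalues over many realizations of B_N = (1/N) X* X.
eigs = []
for _ in range(200):
    X = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
    B = (X.conj().T @ X) / N
    eigs.append(np.linalg.eigvalsh(B))
eigs = np.concatenate(eigs)

def mp_density(x):
    """Continuous part of the Marcenko-Pastur density (17)."""
    return np.sqrt(np.maximum((x - bm) * (bp - x), 0.0)) / (2 * np.pi * x)

hist, edges = np.histogram(eigs, bins=50, range=(bm, bp), density=True)
mids = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - mp_density(mids))))        # small away from the edges
```

The mean of the pooled eigenvalues also matches the first moment of (17), which equals c.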
of sample covariance matrices that appear in many engineering and statistical applications. However, among other things, it is subtly different because of the normalization of Y_N Y_N^* by the number of rows N of Y_N rather than by the number of columns n. Hence, we need to come up with a definition of a sample covariance matrix that mirrors the manner in which it is used in practical applications.

With engineering, particularly signal processing, applications in mind, we introduce the n \times N matrix O_n = T_n^{1/2} X_n and define

    C_n = \frac{1}{N} O_n O_n^*    (19)

to be the sample covariance matrix (SCM). By comparing (18) and (19) while recalling the definitions of Y_N and O_n, it is clear that the eigenvalues of B_N are related to the eigenvalues of C_n. For B_N of the form in (18), the Marcenko-Pastur theorem can be used to obtain the canonical equation for m_B(z) given by (13). Recall that by m_B(z) we mean the Stieltjes transform associated with the limiting e.d.f. F^{B}(x) of B_N as N \to \infty. There is, however, an exact relationship between the non-limiting e.d.f.'s F^{B_N}(x) and F^{C_n}(x), and hence between the corresponding Stieltjes transforms m_{B_N}(z) and m_{C_n}(z), respectively. We exploit this relationship below to derive the canonical equation for C_n from the canonical equation for B_N given in (13).

[Figure 2: The matrices C_n and B_N when n > N.]

Figure 2 schematically depicts C_n and B_N when n > N, i.e. when c > 1. In this case C_n, as denoted in the figure, will have n - N zero eigenvalues. The other N eigenvalues of C_n will, however, be identically equal to the N eigenvalues of B_N. Hence, the e.d.f. of C_n can be exactly expressed in terms of the e.d.f. of B_N as

    F^{C_n}(x) = \frac{n - N}{n} I_{[0,\infty)}(x) + \frac{N}{n} F^{B_N}(x)    (20)
               = \left(1 - \frac{1}{c}\right) I_{[0,\infty)}(x) + \frac{1}{c} F^{B_N}(x).    (21)

Recalling the definition of the Stieltjes transform in (3), this implies that the Stieltjes transform m_{C_n}(z) of C_n is related to the Stieltjes transform m_{B_N}(z) of B_N by the expression

    m_{C_n}(z) = -\left(1 - \frac{1}{c}\right) \frac{1}{z} + \frac{1}{c}\, m_{B_N}(z).    (22)

Similarly, Figure 3 schematically depicts C_n and B_N when n < N, i.e. c < 1. In this case B_N, as denoted in the figure, will have N - n zero eigenvalues. The other n eigenvalues of B_N will, however, be identically equal to the n eigenvalues of C_n. Hence, as before, the e.d.f. of B_N can be exactly expressed in terms of the e.d.f. of C_n as

    F^{B_N}(x) = \frac{N - n}{N} I_{[0,\infty)}(x) + \frac{n}{N} F^{C_n}(x)    (23)
               = (1 - c)\, I_{[0,\infty)}(x) + c\, F^{C_n}(x).    (24)

Once again, recalling the definition of the Stieltjes transform in (3), this implies that the Stieltjes transform m_{C_n}(z) of C_n is related to the Stieltjes transform m_{B_N}(z) of B_N by the expression

    m_{B_N}(z) = -(1 - c)\, \frac{1}{z} + c\, m_{C_n}(z).    (25)
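Since the relationship between m_{B_N} and m_{C_n} is exact for every finite N, it can be verified directly. A sketch (Python/NumPy; the sizes, the choice T_n = I, real instead of complex entries, and the test point z are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 100, 200                      # c = n/N = 2, so here n > N
c = n / N

X = rng.standard_normal((n, N))      # real X_n for simplicity; T_n = I
B = (X.T @ X) / N                    # B_N = (1/N) X* X,  an N x N matrix
C = (X @ X.T) / N                    # C_n = (1/N) X X*,  an n x n matrix

def m(M, z):
    """Empirical Stieltjes transform (1/dim) tr (M - zI)^{-1}."""
    return np.mean(1.0 / (np.linalg.eigvalsh(M) - z))

z = 0.7 + 0.3j
lhs = m(C, z)
rhs = -(1 - 1/c) / z + m(B, z) / c   # the exact identity, valid for every finite N
print(abs(lhs - rhs))                # agreement to roughly machine precision
```

The identity holds exactly because C_n carries the N eigenvalues of B_N plus n - N zeros, which is precisely the content of the e.d.f. relation above.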