Communication Theory

Capacity of Multi-antenna Gaussian Channels*

EMRE TELATAR
Lucent Technologies Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA
telatar@lucent.com

* Invited paper.

Abstract. We investigate the use of multiple transmitting and/or receiving antennas for single user communications over the additive Gaussian channel with and without fading. We derive formulas for the capacities and error exponents of such channels, and describe computational procedures to evaluate such formulas. We show that the potential gains of such multi-antenna systems over single-antenna systems are rather large under independence assumptions for the fades and noises at different receiving antennas.

1 INTRODUCTION

We will consider a single user Gaussian channel with multiple transmitting and/or receiving antennas. We will denote the number of transmitting antennas by t and the number of receiving antennas by r. We will exclusively deal with a linear model in which the received vector y ∈ C^r depends on the transmitted vector x ∈ C^t via

    y = Hx + n                                                            (1)

where H is an r × t complex matrix and n is zero-mean complex Gaussian noise with independent, equal variance real and imaginary parts. We assume E[nn†] = I_r, that is, the noises corrupting the different receivers are independent. The transmitter is constrained in its total power to P,

    E[x†x] ≤ P.

Equivalently, since x†x = tr(xx†), and expectation and trace commute,

    E[tr(xx†)] = tr(E[xx†]) ≤ P.                                          (2)

This second form of the power constraint will prove more useful in the upcoming discussion.

We will consider several scenarios for the matrix H:

1. H is deterministic.
2. H is a random matrix (for which we shall use the notation H), chosen according to a probability distribution, and each use of the channel corresponds to an independent realization of H.
3. H is a random matrix, but is fixed once it is chosen.

The main focus of this paper is on the last two of these cases. The first case is included so as to expose the techniques used in the later cases in a more familiar context. In the cases when H is random, we will assume that its entries form an i.i.d. Gaussian collection with zero-mean, independent real and imaginary parts, each with variance 1/2. Equivalently, each entry of H has uniform phase and Rayleigh magnitude. This choice models a Rayleigh fading environment with enough separation within the receiving antennas and the transmitting antennas such that the fades for each transmitting-receiving antenna pair are independent. In all cases, we will assume that the realization of H is known to the receiver, or, equivalently, the channel output consists of the pair (y, H), and the distribution of H is known at the transmitter.
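The short numerical sketch below is illustrative only (the routine name, parameter values, and the use of Python/NumPy are choices made here, not part of the analysis): it generates one use of the channel (1) under the i.i.d. Rayleigh-fading model just described, with a circularly symmetric Gaussian input of covariance (P/t)I_t, which satisfies the power constraint (2).

```python
import numpy as np

def simulate_channel(r, t, P, seed=0):
    """One use of y = Hx + n with i.i.d. CN(0,1) fades and CN(0,1) noise.

    The input x is drawn circularly symmetric complex Gaussian with
    covariance (P/t) I_t, so that E[x^dagger x] = P.
    """
    rng = np.random.default_rng(seed)
    # Each entry of H: independent real and imaginary parts, variance 1/2 each.
    H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
    x = np.sqrt(P / (2 * t)) * (rng.standard_normal(t) + 1j * rng.standard_normal(t))
    n = (rng.standard_normal(r) + 1j * rng.standard_normal(r)) / np.sqrt(2)
    return H @ x + n, H

y, H = simulate_channel(r=4, t=4, P=10.0)
```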
2 PRELIMINARIES

A complex random vector x ∈ C^n is said to be Gaussian if the real random vector x̂ ∈ R^{2n} consisting of its real and imaginary parts, x̂ = [Re(x); Im(x)], is Gaussian. Thus, to specify the distribution of a complex Gaussian random vector x, it is necessary to specify the expectation and covariance of x̂, namely,

    E[x̂] ∈ R^{2n}   and   E[(x̂ − E[x̂])(x̂ − E[x̂])^T] ∈ R^{2n×2n}.

We will say that a complex Gaussian random vector x is circularly symmetric if the covariance of the corresponding x̂ has the structure

    E[(x̂ − E[x̂])(x̂ − E[x̂])^T] = (1/2) [[Re(Q), −Im(Q)], [Im(Q), Re(Q)]]      (3)

for some Hermitian non-negative definite Q ∈ C^{n×n}. Note that the real part of an Hermitian matrix is symmetric and the imaginary part of an Hermitian matrix is anti-symmetric, and thus the matrix appearing in (3) is real and symmetric. In this case E[(x − E[x])(x − E[x])†] = Q, and thus, a circularly symmetric complex Gaussian random vector x is specified by prescribing E[x] and E[(x − E[x])(x − E[x])†].

For any z ∈ C^n and A ∈ C^{n×m} define

    ẑ = [Re(z); Im(z)]   and   Â = [[Re(A), −Im(A)], [Im(A), Re(A)]].

Lemma 1. The mappings z ↦ ẑ and A ↦ Â have the following properties:

    C = AB      ⟹  Ĉ = ÂB̂                                                  (4a)
    C = A + B   ⟹  Ĉ = Â + B̂                                               (4b)
    C = A†      ⟹  Ĉ = Â^T                                                  (4c)
    C = A^{-1}  ⟹  Ĉ = Â^{-1}                                               (4d)
    det(Â) = |det(A)|² = det(AA†)                                            (4e)
    z = x + y   ⟹  ẑ = x̂ + ŷ                                               (4f)
    y = Ax      ⟹  ŷ = Âx̂                                                  (4g)
    Re(x†y) = x̂^T ŷ.                                                        (4h)

Proof: The properties (4a), (4b) and (4c) are immediate. (4d) follows from (4a) and the fact that Î_n = I_{2n}. (4e) follows from

    det(Â) = det( [[I, iI], [0, I]]  Â  [[I, −iI], [0, I]] )
           = det( [[A, 0], [Im(A), A*]] )
           = det(A) det(A)*.

(4f), (4g) and (4h) are immediate. □

Corollary 1. U ∈ C^{n×n} is unitary if and only if Û ∈ R^{2n×2n} is orthonormal.

Corollary 2. If Q ∈ C^{n×n} is non-negative definite then so is Q̂ ∈ R^{2n×2n}.

The probability density (with respect to the standard Lebesgue measure on C^n) of a circularly symmetric complex Gaussian with mean μ and covariance Q is given by

    γ_{μ,Q}(x) = det(πQ̂)^{-1/2} exp(−(x̂ − μ̂)^T Q̂^{-1}(x̂ − μ̂))
               = det(πQ)^{-1} exp(−(x − μ)†Q^{-1}(x − μ))

where the second equality follows from (4d)–(4h). The differential entropy of a complex Gaussian x with covariance Q is given by

    H(γ_Q) = E_{γ_Q}[−log γ_Q(x)]
           = log det(πQ) + (log e) E[x†Q^{-1}x]
           = log det(πQ) + (log e) tr(E[xx†]Q^{-1})
           = log det(πQ) + (log e) tr(I)
           = log det(πeQ).

For us, the importance of the circularly symmetric complex Gaussians is due to the following lemma: circularly symmetric complex Gaussians are entropy maximizers.

Lemma 2. Suppose the complex random vector x ∈ C^n is zero-mean and satisfies E[xx†] = Q, i.e., E[x_i x_j*] = Q_{ij}, 1 ≤ i, j ≤ n. Then the entropy of x satisfies H(x) ≤ log det(πeQ) with equality if and only if x is a circularly symmetric complex Gaussian with E[xx†] = Q.

Proof: Let p be any density function satisfying E_p[x_i x_j*] = Q_{ij}, 1 ≤ i, j ≤ n. Let γ_Q(x) = det(πQ)^{-1} exp(−x†Q^{-1}x). Observe that E_{γ_Q}[x_i x_j*] = Q_{ij}, and that log γ_Q(x) is a linear combination of the terms x_i x_j*. Thus E_{γ_Q}[log γ_Q(x)] = E_p[log γ_Q(x)]. Then,

    H(p) − H(γ_Q) = −E_p[log p(x)] + E_{γ_Q}[log γ_Q(x)]
                  = E_p[ log( γ_Q(x) / p(x) ) ]
                  ≤ 0,

with equality only if p = γ_Q. Thus H(p) ≤ H(γ_Q). □

Lemma 3. If x ∈ C^n is a circularly symmetric complex Gaussian then so is y = Ax for any A ∈ C^{m×n}.

Proof: We may assume x is zero-mean. Let Q = E[xx†]. Then y is zero-mean, ŷ = Âx̂, and

    E[ŷŷ^T] = Â E[x̂x̂^T] Â^T = (1/2) Â Q̂ Â^T = (1/2) K̂

where K = AQA†. □

Lemma 4. If x and y are independent circularly symmetric complex Gaussians, then z = x + y is a circularly symmetric complex Gaussian.

Proof: Let A = E[xx†] and B = E[yy†]. Then E[ẑẑ^T] = (1/2)Ĉ with C = A + B. □
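As a quick numerical check of the real representation of Lemma 1 (a sketch added for illustration; the helper names hat_vec and hat_mat are ad hoc), the properties (4e) and (4g) can be verified on random instances:

```python
import numpy as np

def hat_vec(z):
    """Stack real and imaginary parts: z in C^n -> zhat in R^{2n}."""
    return np.concatenate([z.real, z.imag])

def hat_mat(A):
    """A in C^{n x m} -> Ahat = [[Re A, -Im A], [Im A, Re A]] in R^{2n x 2m}."""
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
z = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# Property (4g): y = Az  <->  yhat = Ahat zhat
assert np.allclose(hat_vec(A @ z), hat_mat(A) @ hat_vec(z))
# Property (4e): det(Ahat) = |det(A)|^2
assert np.isclose(np.linalg.det(hat_mat(A)), abs(np.linalg.det(A)) ** 2)
```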
3 THE GAUSSIAN CHANNEL WITH FIXED TRANSFER FUNCTION

We will start by reminding ourselves of the case of deterministic H. The results of this section can be inferred from [1, Ch. 8].

3.1 CAPACITY

We will first derive an expression for the capacity C(H, P) of this channel. To that end, we will maximize the average mutual information I(x; y) between the input and the output of the channel over the choice of the distribution of x.

By the singular value decomposition theorem, any matrix H ∈ C^{r×t} can be written as

    H = UDV†

where U ∈ C^{r×r} and V ∈ C^{t×t} are unitary, and D ∈ R^{r×t} is non-negative and diagonal. In fact, the diagonal entries of D are the non-negative square roots of the eigenvalues of HH†, the columns of U are the eigenvectors of HH† and the columns of V are the eigenvectors of H†H. Thus, we can write (1) as

    y = UDV†x + n.

Let ỹ = U†y, x̃ = V†x, ñ = U†n. Note that U and V are invertible, ñ has the same distribution as n and E[x̃†x̃] = E[x†x]. Thus, the original channel is equivalent to the channel

    ỹ = Dx̃ + ñ                                                            (5)

where ñ is zero-mean, Gaussian, with independent, identically distributed real and imaginary parts and E[ññ†] = I_r. Since H is of rank at most min{r, t}, at most min{r, t} of the singular values of it are non-zero. Denoting these by λ_i^{1/2}, i = 1, ..., min{r, t}, we can write (5) component-wise, to get

    ỹ_i = λ_i^{1/2} x̃_i + ñ_i,   1 ≤ i ≤ min{r, t},

and the rest of the components of ỹ (if any) are equal to the corresponding components of ñ. We thus see that ỹ_i for i > min{t, r} is independent of the transmitted signal and that x̃_i for i > min{t, r} don't play any role. To maximize the mutual information, we need to choose {x̃_i : 1 ≤ i ≤ min{r, t}} to be independent, with each x̃_i having independent Gaussian, zero-mean real and imaginary parts. The variances need to be chosen via "water-filling" as

    E[Re(x̃_i)²] = E[Im(x̃_i)²] = (1/2)(μ − λ_i^{-1})^+,

where μ is chosen to meet the power constraint. Here, a^+ denotes max{0, a}. The power P and the maximal mutual information can thus be parametrized as

    P(μ) = Σ_i (μ − λ_i^{-1})^+,     C(μ) = Σ_i (log(μλ_i))^+.

Remark 1 (Reciprocity). Since the non-zero eigenvalues of H†H are the same as those of HH†, we see that the capacities of channels corresponding to H and H† are the same.

Example 1. Take H_{ij} = 1 for all i, j. We can write H as the outer product of the all-ones vectors in C^r and C^t,

    H = [1, ..., 1]† [1, ..., 1],

and we thus see that in the singular value decomposition of H the diagonal matrix D will have only one non-zero entry, √(rt). (We also see that the first column of U is (1/√r)[1, ..., 1]† and the first column of V is (1/√t)[1, ..., 1]†.) Thus,

    C = log(1 + rtP).

The x = Vx̃ that achieves this capacity satisfies E[x_i x_j*] = P/t for all i, j, i.e., the transmitters are all sending the same signal. Note that, even though each transmitter is sending a power of P/t, since their signals add coherently at the receiver, the power received at each receiver is Pt. Since each receiver sees the same signal and the noises at the receivers are uncorrelated, the overall signal to noise ratio is Prt.

Example 2. Take r = t = n and H = I_n. Then

    C = n log(1 + P/n).

For x that achieves this capacity E[x_i x_j*] = δ_{ij} P/n, i.e., the components of x are i.i.d. However, it is incorrect to infer from this conclusion that to achieve capacity one has to do independent coding for each transmitter. It is true that the capacity of this channel can be achieved by splitting the incoming data stream into t streams, coding and modulating these streams separately, and then sending the t modulated signals over the different transmitters.
But, suppose Nt bits are going to be transmitted, and we will either separate them into t groups of N bits each and use each group to select one of 2^N signals for each transmitter, or, we will use all Nt bits to select one of 2^{Nt} signal vectors. The second of these alternatives will yield a probability of error much smaller than the first, at the expense of much greater complexity. Indeed, the log of the error probability in the two cases will differ by a factor of t. (See the error exponents of parallel channels in [1, pp. 149-150].)
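To make the computation of Section 3.1 concrete, a minimal water-filling routine is sketched below (illustrative only; the bisection search for μ and the function name waterfill_capacity are choices made here, not prescribed by the paper). It returns the capacity in nats for a fixed, non-zero H and total power P.

```python
import numpy as np

def waterfill_capacity(H, P, tol=1e-10):
    """Capacity of y = Hx + n for fixed H, noise CN(0, I_r), total power P.

    Powers (mu - 1/lambda_i)^+ are assigned to the non-zero eigenvalues
    lambda_i of H^dagger H; mu is found by bisection so that they sum to P.
    Returns the capacity C(mu) = sum_i (log(mu * lambda_i))^+ in nats.
    """
    lam = np.linalg.svd(H, compute_uv=False) ** 2
    lam = lam[lam > tol]                       # non-zero eigenvalues of H^dagger H
    power = lambda mu: np.sum(np.maximum(mu - 1.0 / lam, 0.0))
    lo, hi = 0.0, 1.0 / lam.min() + P          # power(lo) = 0 <= P <= power(hi)
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        lo, hi = (mu, hi) if power(mu) < P else (lo, mu)
    mu = 0.5 * (lo + hi)
    return np.sum(np.maximum(np.log(mu * lam), 0.0))

H = np.array([[1.0, 0.5], [0.2, 1.5], [0.3, 0.1]])  # an arbitrary 3 x 2 example
print(waterfill_capacity(H, P=10.0))
```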
3.2 ALTERNATIVE DERIVATION OF THE CAPACITY

The mutual information I(x; y) can be written as

    I(x; y) = H(y) − H(y|x) = H(y) − H(n),

and thus maximizing I(x; y) is equivalent to maximizing H(y). Note that if x satisfies E[x†x] ≤ P, so does x − E[x], so we can restrict our attention to zero-mean x. Furthermore, if x is zero-mean with covariance E[xx†] = Q, then y is zero-mean with covariance E[yy†] = HQH† + I_r, and by Lemma 2 among such y the entropy is largest when y is circularly symmetric complex Gaussian, which is the case when x is circularly symmetric complex Gaussian (Lemmas 3 and 4). So, we can further restrict our attention to circularly symmetric complex Gaussian x. In this case the mutual information is given by

    I(x; y) = log det(I_r + HQH†) = log det(I_t + QH†H)

where the second equality follows from the determinant identity det(I + AB) = det(I + BA), and it only remains to choose Q to maximize this quantity subject to the constraints tr(Q) ≤ P and that Q is non-negative definite. The quantity log det(I + HQH†) will occur in this note frequently enough that we will let

    Ψ(Q, H) = log det(I + HQH†)

to denote it. Since H†H is Hermitian it can be diagonalized, H†H = U†ΛU, with unitary U and non-negative diagonal Λ = diag(λ_1, ..., λ_t). Applying the determinant identity again we see that det(I_r + HQH†) = det(I_t + Λ^{1/2} UQU† Λ^{1/2}). Observe that Q̃ = UQU† is non-negative definite when and only when Q is, and that tr(Q̃) = tr(Q); thus the maximization over Q can be carried equally well over Q̃. Note also that for any non-negative definite matrix A, det(A) ≤ ∏_i A_{ii}, thus

    det(I_t + Λ^{1/2} Q̃ Λ^{1/2}) ≤ ∏_i (1 + Q̃_{ii} λ_i)

with equality when Q̃ is diagonal. Thus we see that the maximizing Q̃ is diagonal, and the optimal diagonal entries can be found via "water-filling" to be

    Q̃_{ii} = (μ − λ_i^{-1})^+,   i = 1, ..., t,

where μ is chosen to satisfy Σ_i Q̃_{ii} = P. The corresponding maximum mutual information is given by

    C = Σ_i (log(μλ_i))^+

as before.

3.3 ERROR EXPONENTS

Knowing the capacity of a channel is not always sufficient. One may be interested in knowing how hard it is to get close to this capacity. Error exponents provide a partial answer to this question by giving an upper bound to the probability of error achievable by block codes of a given length n and rate R. The upper bound is known as the random coding bound and is given by

    P(error) ≤ exp(−n E_r(R)),

where the random coding exponent E_r(R) is given by

    E_r(R) = max_{0 ≤ ρ ≤ 1} E_0(ρ) − ρR,

where, in turn, E_0(ρ) is given by the supremum over all input distributions q_x satisfying the energy constraint of

    E_0(ρ, q_x) = −log ∫ [ ∫ q_x(x) p(y|x)^{1/(1+ρ)} dx ]^{1+ρ} dy.

In our case

    p(y|x) = det(πI_r)^{-1} exp(−(y − Hx)†(y − Hx)).

If we choose q_x as the Gaussian distribution γ_Q we get (after some algebra)

    E_0(ρ, Q) = ρ log det(I_r + (1 + ρ)^{-1} HQH†) = ρ Ψ((1 + ρ)^{-1} Q, H).

The maximization of E_0 over Q is thus the same problem as maximizing the mutual information, and we get E_0(ρ) = ρ C(P/(1 + ρ), H). To choose q_x as Gaussian is not optimal, and a distribution concentrated on a "thin spherical shell" will give better results as in [1, §7.3]; nonetheless, the above expression is a convenient lower bound to E_0 and thus yields an upper bound to the probability of error.
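The Gaussian-input bound E_0(ρ) = ρ C(P/(1 + ρ), H) can be evaluated numerically; the sketch below (illustrative only, reusing the hypothetical waterfill_capacity routine from the earlier sketch, with rates and exponents in nats) simply maximizes ρ C(P/(1 + ρ), H) − ρR over a grid of ρ ∈ [0, 1] to obtain a lower bound on the random coding exponent.

```python
import numpy as np

def random_coding_exponent_bound(H, P, R, num_rho=200):
    """Lower bound on E_r(R) = max_{0 <= rho <= 1} E_0(rho) - rho*R,
    using the Gaussian-input bound E_0(rho) >= rho * C(P/(1+rho), H).
    R is the rate in nats per channel use."""
    rhos = np.linspace(0.0, 1.0, num_rho)
    vals = [rho * waterfill_capacity(H, P / (1.0 + rho)) - rho * R for rho in rhos]
    return max(vals)
```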
4 THE GAUSSIAN CHANNEL WITH RAYLEIGH FADING

Suppose now that the matrix H is not fixed, but is a random matrix H independent of both x and n. The realization H of H is assumed to be known at the receiver, but not at the transmitter. The channel thus has input x and output (y, H) = (Hx + n, H). We will assume that the entries of H are independent and each entry is zero-mean, Gaussian, with independent real and imaginary parts, each with variance 1/2. Equivalently, each entry of H has uniformly distributed phase and Rayleigh distributed magnitude, with expected magnitude square equal to unity. This is intended to model a Rayleigh fading channel with enough physical separation within the transmitting and the receiving antennas to achieve independence in the entries of H. We will first show that such an H is invariant under unitary transformations.

Lemma 5. Suppose H ∈ C^{r×t} is a complex Gaussian matrix with independent identically distributed entries, each entry with independent real and imaginary parts with zero-mean and equal variance. Then for any unitary U ∈ C^{r×r} and V ∈ C^{t×t}, the distribution of UHV† is the same as the distribution of H.

Proof: It suffices to show that G = UH has the same distribution as H. The lemma then follows from an application of this to G†. Since the columns of H are independent, the columns of G are independent also. It remains to check
that each column of G has the same distribution as that of H. Since the columns of H are circularly symmetric complex Gaussian vectors, so are those of G. If g_j and h_j are the j-th columns of G and H respectively, then

    E[g_j g_j†] = U E[h_j h_j†] U† = E[h_j h_j†],

where the last equality holds because E[h_j h_j†] is a multiple of the identity matrix. □

In this section we will assume that the channel is memoryless: for each use of the channel an independent realization of H is drawn. In this case we are on familiar ground and the capacity can be computed as the maximum mutual information. However, the results that follow are valid verbatim for channels for which H is generated by an ergodic process: as long as the receiver observes the H process, only the first order statistics are needed to determine channel capacity.

4.1 CAPACITY

Since the receiver knows the realization of H, the channel output is the pair (y, H) = (Hx + n, H). The mutual information between input and output is then

    I(x; (y, H)) = I(x; H) + I(x; y|H) = I(x; y|H) = E_H[I(x; y|H = H)].

We know from the previous section that if x is constrained to have covariance Q, the choice of x that maximizes I(x; y|H = H) is the circularly symmetric complex Gaussian of covariance Q, and Ψ(Q, H) = log det(I_r + HQH†) is the corresponding maximal mutual information. We thus need to maximize

    Ψ(Q) = E[Ψ(Q, H)] = E[log det(I_r + HQH†)]

over the choice of non-negative definite Q subject to tr(Q) ≤ P.

Since Q is non-negative definite, we can write it as Q = UDU† where U is unitary and D is non-negative and diagonal. With this substitution

    Ψ(Q) = E[log det(I_r + (HU)D(HU)†)].

By Lemma 5 the distribution of HU is the same as that of H, and thus Ψ(Q) = Ψ(D). We can thus restrict our attention to non-negative diagonal Q. Given any such Q and any permutation matrix Π, consider Q^Π = ΠQΠ†. Since HΠ has the same distribution as H, Ψ(Q^Π) = Ψ(Q). Note that for any H, the mapping Q ↦ I_r + HQH† is linear and preserves positive definiteness. Since log det is concave on the set of positive definite matrices, Q ↦ Ψ(Q, H) = log det(I_r + HQH†) is concave. It then follows that Q ↦ Ψ(Q) is concave. Thus

    Q̄ = (1/t!) Σ_Π Q^Π

satisfies Ψ(Q̄) ≥ Ψ(Q) and tr(Q̄) = tr(Q). Note that Q̄ is a multiple of the identity matrix, and we conclude that the optimal Q must be of the form αI_t. It is clear that the maximum is achieved when α is the largest possible, namely P/t. To summarize, we have shown the following:

Theorem 1. The capacity of the channel is achieved when x is a circularly symmetric complex Gaussian with zero-mean and covariance (P/t)I_t. The capacity is given by E[log det(I_r + (P/t)HH†)].

Note that for fixed r, by the law of large numbers (1/t)HH† → I_r almost surely as t gets large. Thus, the capacity in the limit of large t equals

    r log(1 + P).                                                          (6)

4.2 EVALUATION OF THE CAPACITY

Although the expectation E[log det(I_r + (P/t)HH†)] is easy to evaluate for either r = 1 or t = 1, its evaluation gets rather involved for r and t larger than 1. We will now show how to do this evaluation. Note that det(I_r + (P/t)HH†) = det(I_t + (P/t)H†H) and define

    W = HH†  if r < t,     W = H†H  if r ≥ t,

n = max{r, t} and m = min{r, t}. Then W is an m × m random non-negative definite matrix and thus has real, non-negative eigenvalues. We can write the capacity in terms of the eigenvalues λ_1, ..., λ_m of W:

    E[ Σ_{i=1}^m log(1 + (P/t)λ_i) ].                                      (7)

The distribution law of W is called the Wishart distribution with parameters m, n and the joint density of the ordered eigenvalues is known to be (see e.g. [2] or [3, p. 37])

    K_{m,n}^{-1} ∏_i e^{−λ_i} λ_i^{n−m} ∏_{i<j} (λ_i − λ_j)²,   λ_1 ≥ λ_2 ≥ ... ≥ λ_m ≥ 0,

where K_{m,n} is a normalizing factor.
The unordered eigenvalues then have the density

    p_λ(λ_1, ..., λ_m) = (m!)^{-1} K_{m,n}^{-1} ∏_i e^{−λ_i} λ_i^{n−m} ∏_{i<j} (λ_i − λ_j)².

The expectation we wish to compute,

    E[ Σ_{i=1}^m log(1 + (P/t)λ_i) ] = Σ_{i=1}^m E[log(1 + (P/t)λ_i)] = m E[log(1 + (P/t)λ_1)],

thus depends only on the marginal distribution of a single unordered eigenvalue.
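Independently of this analytical route, the expectation in Theorem 1 can also be estimated by direct Monte Carlo simulation; the sketch below (illustrative only, not the paper's procedure, with the result in nats) averages log det(I_r + (P/t)HH†) over independent realizations of H and can serve as a numerical check of (7).

```python
import numpy as np

def ergodic_capacity_mc(r, t, P, trials=20000, seed=0):
    """Monte Carlo estimate of E[log det(I_r + (P/t) H H^dagger)] in nats."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        # i.i.d. CN(0,1) entries: real and imaginary parts of variance 1/2 each.
        H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2)
        _, logdet = np.linalg.slogdet(np.eye(r) + (P / t) * (H @ H.conj().T))
        total += logdet
    return total / trials

# e.g. ergodic_capacity_mc(2, 2, 10.0) estimates the capacity for r = t = 2, P = 10.
```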