• In 1995, Gallager proposed "Combining queueing theory with information theory for multiaccess"; in 1998, Ephremides published the paper "Information theory and communication networks: an unconsummated union"; in 2000, R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung published the celebrated paper "Network information flow", which introduced the idea of network coding; also in 2000, P. Gupta and P. R. Kumar published the paper "The capacity of wireless networks", which introduced the notion of transport capacity. Since 2003, research has been moving towards an information theory of large networks.

1.5 A model of digital communication systems

Figure 1.2: Block diagram of a digital communication system (source → source encoder → ECC/channel encoder → digital modulator → channel with interference n(t) → digital demodulator → ECC/channel decoder → source decoder → sink; bits → symbols → waveform).

• source: discrete/continuous; memoryless/with memory.
• encoder: converts the messages into a signal that is suitable for transmission over physical channels.
• channel: wireless/cable, disk.
• interference.
• sink: destination.

1.6 Review of Probability

Bayes' rule:

P(A|B) = P(A ∩ B)/P(B) = P(B|A)P(A)/P(B)

For a discrete random variable (r.v.),

• Probability Mass Function (PMF): P_X(x) = P(X = x) denotes the probability of the event that X takes on the value x.

For a continuous r.v.,
• Cumulative Distribution Function (CDF): F_X(x) = P(X ≤ x)

• Probability Density Function (PDF): p_X(x) = dF_X(x)/dx
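As a quick numerical illustration of these definitions, here is a minimal Python sketch (the numbers, event names, and the helper cdf are illustrative choices, not from the text) that checks Bayes' rule on a toy pair of events and evaluates the PMF and CDF of a small discrete random variable.

    # Toy check of Bayes' rule: P(A|B) = P(A ∩ B)/P(B) = P(B|A)P(A)/P(B).
    # All probabilities below are made-up illustrative values.
    P_A_and_B = 0.12
    P_A = 0.40
    P_B = 0.30
    P_B_given_A = P_A_and_B / P_A          # conditional probability from the joint

    lhs = P_A_and_B / P_B                  # P(A|B) computed from the joint
    rhs = P_B_given_A * P_A / P_B          # P(A|B) computed via Bayes' rule
    assert abs(lhs - rhs) < 1e-12
    print(f"P(A|B) = {lhs:.3f}")

    # PMF of a discrete r.v. X taking the values 1, 2, 3.
    pmf = {1: 0.2, 2: 0.3, 3: 0.5}

    def cdf(x):
        """F_X(x) = P(X <= x), obtained by summing the PMF."""
        return sum(p for v, p in pmf.items() if v <= x)

    print([round(cdf(x), 2) for x in (0, 1, 2, 3)])   # [0.0, 0.2, 0.5, 1.0]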
Chapter 2

Entropy & Mutual Information (Shannon's measure of information)

This chapter introduces the basic concepts and definitions required for the later discussion, mainly: entropy, mutual information, and relative entropy.

2.1 Entropy

Let X be a discrete random variable with alphabet X and PMF P_X(x) = Pr{X = x}, x ∈ X. For convenience, we will often write simply P(x) for P_X(x).

Definition 2.1.1. The entropy of a discrete r.v. X is defined as

H(X) = ∑_{x∈X} P(x) log_b(1/P(x)) = −∑_{x∈X} P(x) log_b P(x)    (2.1)

When b = 2, the unit is called the bit (binary digit); when b = e, the unit is called the nat (natural unit). (Conversion is easy: log_b x = (log_b a)(log_a x) ⇒ H_b(X) = (log_b a)H_a(X).) Unless otherwise specified, we will take all logarithms to base 2; hence all entropies will be measured in bits.

In the above definition, we use the convention that 0 log 0 = 0. Note that, equivalently, many books adopt the convention that the summation is taken over the corresponding support set. The support set of P(X), denoted by S_X, is the set of all x ∈ X such that P(x) > 0; i.e., S_X = supp(P_X) = {x : P(x) > 0}.

The entropy H(X) is also called the uncertainty of X, meaning that it is a measure of the randomness of X. Note that the entropy H(X) depends on the probabilities of the different outcomes of X, but not on the names of the outcomes. For example,

X = {Green, Blue, Red},  Y = {Sunday, Monday, Friday}
P(X): 0.2, 0.3, 0.5      P(Y): 0.2, 0.3, 0.5
H(X) = H(Y).

Remark: The entropy of X can also be interpreted as the expected value of log(1/P(X)) (i.e., the average uncertainty):

H(X) = E[log(1/P(X))]    (2.2)

where we define E[F(x)] = ∑_{x∈X} P_X(x)F(x). Recall that I(x) = log(1/P_X(x)) is the self-information of the event X = x, so H(X) = E[I(x)] is also referred to as the average self-information.

An immediate consequence of the definition is that H(X) ≥ 0.

Example 2.1.1. Let

X = 1 with probability p,
    0 with probability 1 − p.

Then

H(X) = −p log p − (1 − p) log(1 − p)    (2.3)

Equation (2.3) is often called the binary entropy function and is denoted by H_2(p). Its graph is shown in Fig. 2.1. We can see that H(X) = 1 bit when p = 1/2.

Figure 2.1: Entropy function H(p) of a binary random variable.

Example 2.1.2. Let

X = a with probability 1/2,
    b with probability 1/4,
    c with probability 1/8,
    d with probability 1/8.

Then H(X) = −(1/2) log(1/2) − (1/4) log(1/4) − (1/8) log(1/8) − (1/8) log(1/8) = 7/4 bits.

On the other hand, if X takes on values in X = {a, b, c, d} with equal probability, then we have H(X) = −(1/4) log(1/4) × 4 = 2 bits = log |X|. We can see that the uniform distribution over the range X is the maximum entropy distribution over this range. (In other words, the entropy of X is maximized when its values are equally likely.)
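To make these calculations easy to reproduce, the following Python sketch (the helper names entropy and binary_entropy are my own, not from the text) evaluates Definition 2.1.1 with the convention 0 log 0 = 0 and checks the numbers in Examples 2.1.1 and 2.1.2, as well as the bit/nat conversion H_b(X) = (log_b a)H_a(X).

    import math

    def entropy(probs, base=2):
        """H(X) = -sum_x P(x) log_b P(x), with the convention 0 log 0 = 0."""
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    def binary_entropy(p):
        """Binary entropy function H_2(p) of Example 2.1.1, in bits."""
        return entropy([p, 1 - p])

    print(round(binary_entropy(0.5), 6))                   # 1.0 bit, the maximum
    # Example 2.1.2: H(X) = 7/4 bits for the distribution (1/2, 1/4, 1/8, 1/8).
    print(round(entropy([0.5, 0.25, 0.125, 0.125]), 6))    # 1.75
    # Uniform distribution over four values: H(X) = log|X| = 2 bits.
    print(round(entropy([0.25] * 4), 6))                   # 2.0

    # Base conversion: with a = 2 and b = e, H_e(X) = (ln 2) H_2(X).
    h_bits = entropy([0.5, 0.25, 0.125, 0.125], base=2)
    h_nats = entropy([0.5, 0.25, 0.125, 0.125], base=math.e)
    assert abs(h_nats - math.log(2) * h_bits) < 1e-12

Varying p in binary_entropy traces out the curve of Fig. 2.1: the function is 0 at p = 0 and p = 1, and peaks at 1 bit when p = 1/2.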