ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE
School of Computer and Communication Sciences

Handout 18
Information Theory and Coding
Notes on Random Coding
December 12, 2003

Random Coding

In this note we prove the achievability part of the channel coding theorem for discrete memoryless channels without feedback. The discussion is largely based on Chapter 5 of R. G. Gallager, Information Theory and Reliable Communication, Wiley, 1968.

1. Discrete Memoryless Channels Without Feedback

Throughout this note we will fix the channel we wish to communicate over. Let $\mathcal{X}$ and $\mathcal{Y}$ denote the input and output alphabets of this channel. We will assume that the channel is discrete, i.e., that $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. The behavior of the channel will be completely described by specifying for each $k \geq 1$ the function
$$P_k : \mathcal{X}^k \times \mathcal{Y}^k \to \mathbb{R}, \qquad (x_1^k, y_1^k) \mapsto P_k(y_k \mid x_1^k, y_1^{k-1}),$$
which gives the probability of receiving the letter $y_k$ at the output at time $k$, given the past and current inputs to the channel, and the past outputs of the channel.

We make the following two assumptions: the channel is memoryless, and it is used without feedback. We will call a channel memoryless if
$$P_k(y_k \mid x_1^k, y_1^{k-1}) = P(y_k \mid x_k),$$
which in words means that the channel keeps no memory of the past inputs and outputs in determining the output at time $k$. Note also that the channel does not behave differently at different times: the function $P$ on the right hand side is not an explicit function of $k$.

If a memoryless channel is used without feedback, i.e., if
$$P_{X_k \mid X_1^{k-1}, Y_1^{k-1}}(x_k \mid x_1^{k-1}, y_1^{k-1}) = P_{X_k \mid X_1^{k-1}}(x_k \mid x_1^{k-1})$$
(in words: if the channel inputs do not depend on the past channel outputs), then
$$\begin{aligned}
P_{Y_1^n \mid X_1^n}(y_1^n \mid x_1^n)
&= \frac{P_{X_1^n, Y_1^n}(x_1^n, y_1^n)}{P_{X_1^n}(x_1^n)} \\
&= \frac{\prod_{k=1}^{n} P_{X_k, Y_k \mid X_1^{k-1}, Y_1^{k-1}}(x_k, y_k \mid x_1^{k-1}, y_1^{k-1})}{P_{X_1^n}(x_1^n)} \\
&= \frac{\prod_{k=1}^{n} P_{X_k \mid X_1^{k-1}, Y_1^{k-1}}(x_k \mid x_1^{k-1}, y_1^{k-1})\, P_{Y_k \mid X_1^k, Y_1^{k-1}}(y_k \mid x_1^k, y_1^{k-1})}{P_{X_1^n}(x_1^n)} \\
&= \frac{\prod_{k=1}^{n} P_{X_k \mid X_1^{k-1}}(x_k \mid x_1^{k-1})\, P_{Y_k \mid X_k}(y_k \mid x_k)}{P_{X_1^n}(x_1^n)} \\
&= \prod_{k=1}^{n} P(y_k \mid x_k),
\end{aligned}$$
where we use the memoryless and without feedback conditions at the fourth equality.
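To make the factorization just derived concrete, here is a minimal Python sketch. The binary symmetric channel with crossover probability 0.1, the matrix `P`, and the helper `channel_law` are illustrative choices of our own, not from the notes:

```python
import numpy as np

# Single-letter law P(y|x) of a binary symmetric channel: rows are
# inputs x, columns are outputs y (an illustrative choice of channel).
eps = 0.1
P = np.array([[1 - eps, eps],
              [eps, 1 - eps]])

def channel_law(y, x):
    """P(y|x) for equal-length input/output sequences over a DMC used
    without feedback: the product of single-letter transition probabilities."""
    return np.prod([P[xk, yk] for xk, yk in zip(x, y)])

# Flipping only the middle one of three transmitted zeros:
print(channel_law(y=(0, 1, 0), x=(0, 0, 0)))  # (1-eps) * eps * (1-eps)
```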
From now on, we will restrict our attention to channels used without feedback. With some abuse of notation we will let $P(\mathbf{y} \mid \mathbf{x})$ denote the probability of receiving the sequence $\mathbf{y} = y_1^n$ at the output of the channel when the channel input is the sequence $\mathbf{x} = x_1^n$. If the channel is memoryless, we see from above that
$$P(\mathbf{y} \mid \mathbf{x}) = \prod_{k=1}^{n} P(y_k \mid x_k).$$

2. Block Codes

A block code with $M$ messages and block length $n$ is a mapping from a set of $M$ messages $\{1, \dots, M\}$ to channel input sequences of length $n$. Thus, a block code is specified when we specify the $M$ channel input sequences
$$\mathbf{c}_1 = (c_{1,1}, \dots, c_{1,n}), \quad \dots, \quad \mathbf{c}_M = (c_{M,1}, \dots, c_{M,n})$$
the messages are mapped into. We will call $\mathbf{c}_m$ the codeword for message $m$. To send message $m$ with such a block code we simply give the sequence $\mathbf{c}_m$ to the channel as input.

A decoder for such a block code is a mapping from channel output sequences $\mathcal{Y}^n$ to the set of $M$ messages $\{1, \dots, M\}$. For a given decoder, let $D_m \subset \mathcal{Y}^n$ denote the set of channel outputs which are mapped to message $m$. Since an output sequence $\mathbf{y}$ is mapped to exactly one message, the $D_m$'s form a collection of disjoint sets whose union is $\mathcal{Y}^n$.

We define the rate of a block code with $M$ messages and block length $n$ as $\frac{\ln M}{n}$, and given such a code and a decoder we define
$$P_{e,m} = \sum_{\mathbf{y} \notin D_m} P(\mathbf{y} \mid \mathbf{c}_m),$$
the probability of a decoding error when message $m$ is sent. Further define
$$P_{e,\mathrm{ave}} = \frac{1}{M} \sum_{m=1}^{M} P_{e,m} \qquad \text{and} \qquad P_{e,\max} = \max_{1 \leq m \leq M} P_{e,m}$$
as the average and maximal (both over the possible messages) error probability of such a code and decoder.

Among the many possible decoding methods, the rule that minimizes $P_{e,\mathrm{ave}}$ is the maximum likelihood rule. Given a channel output sequence $\mathbf{y}$, the maximum likelihood rule decodes a message $m$ for which
$$P(\mathbf{y} \mid \mathbf{c}_m) \geq P(\mathbf{y} \mid \mathbf{c}_{m'}) \quad \text{for every } m' \neq m,$$
and if there is more than one such $m$, chooses one of them arbitrarily. We will restrict ourselves in the following to the maximum likelihood rule.
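These definitions can be checked exhaustively for a toy code. The sketch below continues the BSC example; the repetition code, the helper names `ml_decode` and `p_error`, and the choice $M = 2$, $n = 3$ are our own illustrative assumptions. Ties are broken toward the smaller message index, one arbitrary choice among those the maximum likelihood rule permits:

```python
import itertools
import numpy as np

eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])  # BSC single-letter law

def channel_law(y, x):
    """P(y|x) = prod_k P(y_k|x_k) for the DMC used without feedback."""
    return np.prod([P[xk, yk] for xk, yk in zip(x, y)])

code = [(0, 0, 0), (1, 1, 1)]          # toy block code: M = 2, n = 3
n, M = 3, len(code)
rate = np.log(M) / n                   # rate ln(M)/n, in nats per channel use

def ml_decode(y):
    """Maximum likelihood rule; np.argmax breaks ties toward message 0."""
    return int(np.argmax([channel_law(y, c) for c in code]))

def p_error(m):
    """Exact P_{e,m}: total probability of outputs decoded to some m' != m."""
    return sum(channel_law(y, code[m])
               for y in itertools.product([0, 1], repeat=n)
               if ml_decode(y) != m)

p_e = [p_error(m) for m in range(M)]
print(rate, p_e, np.mean(p_e), max(p_e))  # rate, per-message, average, max
```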
3. Error Probability for Two Codewords

Consider now the case when $M = 2$, so the block code consists of two codewords, $\mathbf{c}_1$ and $\mathbf{c}_2$. We will find a bound on $P_{e,m}$ for the maximum likelihood decoding rule.

Suppose message 1 is to be sent. The channel input is then $\mathbf{c}_1$, and the probability of receiving $\mathbf{y}$ at the channel output is $P(\mathbf{y} \mid \mathbf{c}_1)$. An error will occur if the received sequence is not in $D_1$. Since $P(\mathbf{y} \mid \mathbf{c}_2) \geq P(\mathbf{y} \mid \mathbf{c}_1)$ for every $\mathbf{y}$ not in $D_1$ (but there may be $\mathbf{y}$'s for which $P(\mathbf{y} \mid \mathbf{c}_2) = P(\mathbf{y} \mid \mathbf{c}_1)$ and $\mathbf{y} \in D_1$), we have
$$\begin{aligned}
P_{e,1} = \sum_{\mathbf{y} \notin D_1} P(\mathbf{y} \mid \mathbf{c}_1)
&\leq \sum_{\mathbf{y} :\, P(\mathbf{y} \mid \mathbf{c}_2) \geq P(\mathbf{y} \mid \mathbf{c}_1)} P(\mathbf{y} \mid \mathbf{c}_1) \\
&= \sum_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{c}_1)\, \mathbb{1}_{\{P(\mathbf{y} \mid \mathbf{c}_2) \geq P(\mathbf{y} \mid \mathbf{c}_1)\}} \\
&\leq \sum_{\mathbf{y} :\, P(\mathbf{y} \mid \mathbf{c}_2) \geq P(\mathbf{y} \mid \mathbf{c}_1)} P(\mathbf{y} \mid \mathbf{c}_1) \frac{P(\mathbf{y} \mid \mathbf{c}_2)^s}{P(\mathbf{y} \mid \mathbf{c}_1)^s} \\
&\leq \sum_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{c}_1) \frac{P(\mathbf{y} \mid \mathbf{c}_2)^s}{P(\mathbf{y} \mid \mathbf{c}_1)^s} \quad \text{for any } s \geq 0 \\
&= \sum_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{c}_1)^{1-s} P(\mathbf{y} \mid \mathbf{c}_2)^s.
\end{aligned}$$
The choice $s = 1/2$ gives us
$$P_{e,1} \leq \sum_{\mathbf{y}} \sqrt{P(\mathbf{y} \mid \mathbf{c}_1) P(\mathbf{y} \mid \mathbf{c}_2)},$$
and by symmetry, the same quantity also upper bounds $P_{e,2}$.

For a memoryless channel $P(\mathbf{y} \mid \mathbf{c}_m) = \prod_{k=1}^{n} P(y_k \mid c_{m,k})$ and we obtain
$$\begin{aligned}
P_{e,m} \leq \sum_{\mathbf{y}} \sqrt{P(\mathbf{y} \mid \mathbf{c}_1) P(\mathbf{y} \mid \mathbf{c}_2)}
&= \sum_{y_1} \cdots \sum_{y_n} \sqrt{P(y_1 \mid c_{1,1}) P(y_1 \mid c_{2,1})} \cdots \sqrt{P(y_n \mid c_{1,n}) P(y_n \mid c_{2,n})} \\
&= \Big[ \sum_{y_1} \sqrt{P(y_1 \mid c_{1,1}) P(y_1 \mid c_{2,1})} \Big] \cdots \Big[ \sum_{y_n} \sqrt{P(y_n \mid c_{1,n}) P(y_n \mid c_{2,n})} \Big] \\
&= \prod_{k=1}^{n} \Big[ \sum_{y} \sqrt{P(y \mid c_{1,k}) P(y \mid c_{2,k})} \Big].
\end{aligned}$$
For example, for a binary symmetric channel with crossover probability $\epsilon$, we see that
$$\sum_{y} \sqrt{P(y \mid c_{1,k}) P(y \mid c_{2,k})} = \begin{cases} 1 & \text{if } c_{1,k} = c_{2,k}, \\ 2\sqrt{\epsilon(1-\epsilon)} & \text{else}, \end{cases}$$
and we obtain
$$P_{e,m} \leq \big[ 4\epsilon(1-\epsilon) \big]^{d/2},$$
where $d$ is the number of places in which $\mathbf{c}_1$ and $\mathbf{c}_2$ differ (i.e., $d = |\{k : c_{1,k} \neq c_{2,k}\}|$).
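As a sanity check on this bound for the BSC, the sketch below (same illustrative channel and helper as before; the codeword pair with $d = 2$ is an arbitrary choice) compares $[4\epsilon(1-\epsilon)]^{d/2}$ against $\sum_{\mathbf{y}: P(\mathbf{y}|\mathbf{c}_2) \geq P(\mathbf{y}|\mathbf{c}_1)} P(\mathbf{y} \mid \mathbf{c}_1)$, the first upper bound in the chain above, which dominates $P_{e,1}$ under any tie-breaking rule:

```python
import itertools
import numpy as np

eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])

def channel_law(y, x):
    return np.prod([P[xk, yk] for xk, yk in zip(x, y)])

c1, c2 = (0, 0, 0), (1, 1, 0)              # codewords differing in d places
d = sum(a != b for a, b in zip(c1, c2))    # Hamming distance, here d = 2

# First upper bound in the derivation: ties are counted as errors.
pessimistic_pe1 = sum(channel_law(y, c1)
                      for y in itertools.product([0, 1], repeat=len(c1))
                      if channel_law(y, c2) >= channel_law(y, c1))
bhattacharyya = (4 * eps * (1 - eps)) ** (d / 2)
print(pessimistic_pe1, bhattacharyya)      # 0.19 <= 0.36
assert pessimistic_pe1 <= bhattacharyya
```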
4. Error Probability for Two Randomly Chosen Codewords

Suppose now that the codewords $\mathbf{c}_1$ and $\mathbf{c}_2$ are chosen randomly and independently, each from a distribution $Q$ on $\mathcal{X}^n$. Observe that the codewords are random variables $\mathbf{C}_1$ and $\mathbf{C}_2$, and the probability that the block code is the particular block code with codewords $\mathbf{c}_1$ and $\mathbf{c}_2$ is given by $Q(\mathbf{c}_1) Q(\mathbf{c}_2)$. The error probabilities $P_{e,m}$ are now random variables, since they are functions of $\mathbf{C}_1$ and $\mathbf{C}_2$. Let $\bar{P}_{e,m}$ denote the expectation of $P_{e,m}$.

We will now give an upper bound on $\bar{P}_{e,m}$ in two different ways. The first is the more straightforward, but the second will turn out to be conceptually more useful. We will take $m = 1$, but it is clear by symmetry that $\bar{P}_{e,1} = \bar{P}_{e,2}$.

Method 1: Write
$$\bar{P}_{e,1} = E[P_{e,1}] = \sum_{\mathbf{c}_1} \sum_{\mathbf{c}_2} Q(\mathbf{c}_1) Q(\mathbf{c}_2)\, E[P_{e,1} \mid \mathbf{C}_1 = \mathbf{c}_1, \mathbf{C}_2 = \mathbf{c}_2].$$
But when we are given $\mathbf{c}_1$ and $\mathbf{c}_2$, $P_{e,1}$ is no longer random, and we can use the bound of the last section on the probability of error to get
$$\begin{aligned}
\bar{P}_{e,1} &\leq \sum_{\mathbf{c}_1} \sum_{\mathbf{c}_2} Q(\mathbf{c}_1) Q(\mathbf{c}_2) \sum_{\mathbf{y}} \sqrt{P(\mathbf{y} \mid \mathbf{c}_1)} \sqrt{P(\mathbf{y} \mid \mathbf{c}_2)} \\
&= \sum_{\mathbf{y}} \Big[ \sum_{\mathbf{c}_1} Q(\mathbf{c}_1) \sqrt{P(\mathbf{y} \mid \mathbf{c}_1)} \Big] \Big[ \sum_{\mathbf{c}_2} Q(\mathbf{c}_2) \sqrt{P(\mathbf{y} \mid \mathbf{c}_2)} \Big] \\
&= \sum_{\mathbf{y}} \Big[ \sum_{\mathbf{c}} Q(\mathbf{c}) \sqrt{P(\mathbf{y} \mid \mathbf{c})} \Big]^2,
\end{aligned}$$
where the last equality follows by noting that the sums in the brackets in the next to last line are identical, since they differ only by the index of summation.

Method 2: Write
$$\bar{P}_{e,1} = E[P_{e,1}] = \sum_{\mathbf{c}_1} \sum_{\mathbf{y}} Q(\mathbf{c}_1) P(\mathbf{y} \mid \mathbf{c}_1)\, E[P_{e,1} \mid \mathbf{C}_1 = \mathbf{c}_1, \mathbf{Y} = \mathbf{y}].$$
Note that here we are computing the expectation by first conditioning on the transmitted and received sequences. Now, observe that given $\mathbf{C}_1 = \mathbf{c}_1$ and the received sequence $\mathbf{y}$, an error is sure not to occur if $\mathbf{C}_2$ is chosen such that $P(\mathbf{y} \mid \mathbf{C}_2) < P(\mathbf{y} \mid \mathbf{c}_1)$, and otherwise we can upper bound $P_{e,1}$ by 1. Thus, given $\mathbf{C}_1 = \mathbf{c}_1$ and $\mathbf{y}$, we have
$$P_{e,1} \leq \begin{cases} 1 & \text{if } P(\mathbf{y} \mid \mathbf{C}_2) \geq P(\mathbf{y} \mid \mathbf{c}_1), \\ 0 & \text{if } P(\mathbf{y} \mid \mathbf{C}_2) < P(\mathbf{y} \mid \mathbf{c}_1), \end{cases} \;=\; \mathbb{1}_{\{P(\mathbf{y} \mid \mathbf{C}_2) \geq P(\mathbf{y} \mid \mathbf{c}_1)\}}.$$
Taking expectations we see that
$$E[P_{e,1} \mid \mathbf{C}_1 = \mathbf{c}_1, \mathbf{Y} = \mathbf{y}] \leq \Pr(B_2),$$
where $B_2$ is the event that $\mathbf{C}_2$ is chosen such that $P(\mathbf{y} \mid \mathbf{C}_2) \geq P(\mathbf{y} \mid \mathbf{c}_1)$. We can bound $\Pr(B_2)$ by
$$\Pr(B_2) = \sum_{\mathbf{c}_2} Q(\mathbf{c}_2)\, \mathbb{1}_{\{P(\mathbf{y} \mid \mathbf{c}_2) \geq P(\mathbf{y} \mid \mathbf{c}_1)\}} \leq \sum_{\mathbf{c}_2} Q(\mathbf{c}_2) \frac{P(\mathbf{y} \mid \mathbf{c}_2)^s}{P(\mathbf{y} \mid \mathbf{c}_1)^s} \quad \text{for any } s \geq 0.$$
Taking $s = 1/2$ and substituting back, we obtain the same bound as before,
$$\bar{P}_{e,1} \leq \sum_{\mathbf{y}} \Big[ \sum_{\mathbf{c}} Q(\mathbf{c}) \sqrt{P(\mathbf{y} \mid \mathbf{c})} \Big]^2.$$
For memoryless channels the bound simplifies if $Q(\mathbf{c})$ is chosen to be $Q(\mathbf{c}) = \prod_{k=1}^{n} Q(c_k)$. In this case we obtain
$$\bar{P}_{e,1} \leq \Big( \sum_{y} \Big[ \sum_{x} Q(x) \sqrt{P(y \mid x)} \Big]^2 \Big)^n.$$
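For small $n$ the random-coding bound for two codewords can be verified by brute force. In the sketch below, $Q$ is taken i.i.d. uniform on $\{0,1\}^3$ (an illustrative choice, as are the helper names), and the exact average of the tie-pessimistic error probability over all 64 codeword pairs is compared with $\sum_{\mathbf{y}} [\sum_{\mathbf{c}} Q(\mathbf{c}) \sqrt{P(\mathbf{y} \mid \mathbf{c})}]^2$:

```python
import itertools
import numpy as np

eps, n = 0.1, 3
P = np.array([[1 - eps, eps], [eps, 1 - eps]])

def channel_law(y, x):
    return np.prod([P[xk, yk] for xk, yk in zip(x, y)])

seqs = list(itertools.product([0, 1], repeat=n))
Q = {c: 2.0 ** (-n) for c in seqs}   # i.i.d. uniform Q on {0,1}^n

def pairwise_error(c1, c2):
    """Error probability when c1 is sent, with ties counted against us;
    this upper bounds P_{e,1} of any maximum likelihood decoder."""
    return sum(channel_law(y, c1) for y in seqs
               if channel_law(y, c2) >= channel_law(y, c1))

avg = sum(Q[a] * Q[b] * pairwise_error(a, b) for a in seqs for b in seqs)
bound = sum(sum(Q[c] * np.sqrt(channel_law(y, c)) for c in seqs) ** 2
            for y in seqs)
print(avg, bound)   # here the bound evaluates to 0.8**3 = 0.512
assert avg <= bound
```

For this choice of $Q$ the single-letter form of the bound gives the same value, $(\sum_y [\frac{1}{2}\sqrt{P(y|0)} + \frac{1}{2}\sqrt{P(y|1)}]^2)^3 = 0.8^3$, which is one way to check the memoryless simplification above.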
5. Average Error Probability of a Randomly Chosen Code

Consider now a code with $M$ codewords, each codeword $\mathbf{c}_m$ chosen independently according to the probability distribution $Q$. Just as in the above section, the probability that the block code constructed is a particular block code with codewords $\mathbf{c}_1, \dots, \mathbf{c}_M$ is $\prod_{m=1}^{M} Q(\mathbf{c}_m)$. The error probabilities $P_{e,m}$ are again random variables, and again let $\bar{P}_{e,m}$ denote the expectation of $P_{e,m}$. By the symmetry of the construction with respect to permutations of the codewords, we see that $\bar{P}_{e,m}$ does not depend on $m$, and it will suffice to analyze $\bar{P}_{e,1}$. We will take the extension of 'method 2' of the previous section as our analysis method, and write
$$\bar{P}_{e,1} = \sum_{\mathbf{c}_1} \sum_{\mathbf{y}} Q(\mathbf{c}_1) P(\mathbf{y} \mid \mathbf{c}_1)\, E[P_{e,1} \mid \mathbf{C}_1 = \mathbf{c}_1, \mathbf{Y} = \mathbf{y}].$$
Now, for a given $\mathbf{c}_1$ and $\mathbf{y}$, define for each $m \geq 2$ the event $B_m$ that codeword $\mathbf{C}_m$ is chosen such that $P(\mathbf{y} \mid \mathbf{C}_m) \geq P(\mathbf{y} \mid \mathbf{c}_1)$, i.e., that codeword $m$ is at least as likely as the transmitted codeword. Then, just as in the previous section,
$$\begin{aligned}
E[P_{e,1} \mid \mathbf{C}_1 = \mathbf{c}_1, \mathbf{Y} = \mathbf{y}]
&\leq \Pr\Big( \bigcup_{m=2}^{M} B_m \Big) \\
&\leq \min\Big\{ 1, \sum_{m=2}^{M} \Pr(B_m) \Big\} \\
&\leq \Big[ \sum_{m=2}^{M} \Pr(B_m) \Big]^{\rho} \quad \text{for all } \rho \in [0,1].
\end{aligned}$$
The second inequality above is just the union bound; the third inequality holds because for $\rho \in [0,1]$, $x \leq x^{\rho}$ when $x \in [0,1]$, and $1 \leq x^{\rho}$ when $x \geq 1$.

Observe now that
$$\Pr(B_m) = \sum_{\mathbf{c}_m} Q(\mathbf{c}_m)\, \mathbb{1}_{\{P(\mathbf{y} \mid \mathbf{c}_m) \geq P(\mathbf{y} \mid \mathbf{c}_1)\}} \leq \sum_{\mathbf{c}_m} Q(\mathbf{c}_m) \frac{P(\mathbf{y} \mid \mathbf{c}_m)^s}{P(\mathbf{y} \mid \mathbf{c}_1)^s} \quad \text{for any } s \geq 0 \\
\phantom{\Pr(B_m)} = \sum_{\mathbf{c}} Q(\mathbf{c}) \frac{P(\mathbf{y} \mid \mathbf{c})^s}{P(\mathbf{y} \mid \mathbf{c}_1)^s},$$
and thus
$$E[P_{e,1} \mid \mathbf{C}_1 = \mathbf{c}_1, \mathbf{Y} = \mathbf{y}] \leq \Big[ (M-1) \sum_{\mathbf{c}} Q(\mathbf{c}) \frac{P(\mathbf{y} \mid \mathbf{c})^s}{P(\mathbf{y} \mid \mathbf{c}_1)^s} \Big]^{\rho}.$$
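Substituting this last display back into the expression for $\bar{P}_{e,1}$, exactly as in Method 2 above, gives a bound that can again be checked numerically for small parameters. In the sketch below the choices $M = 3$, $s = 1/2$, $\rho = 1/2$ are illustrative, the helpers are as in the earlier sketches, and ties are counted against the decoder, so the computed average upper bounds the true $E[P_{e,1}]$ and the derivation guarantees the asserted inequality:

```python
import itertools
import numpy as np

eps, n = 0.1, 3
P = np.array([[1 - eps, eps], [eps, 1 - eps]])

def channel_law(y, x):
    return np.prod([P[xk, yk] for xk, yk in zip(x, y)])

seqs = list(itertools.product([0, 1], repeat=n))
Q = {c: 2.0 ** (-n) for c in seqs}   # i.i.d. uniform Q on {0,1}^n

M, s, rho = 3, 0.5, 0.5              # illustrative parameter choices

def code_error(code):
    """Pr(error | message 1 sent), ties counted as errors: y is an
    error if some other codeword is at least as likely as code[0]."""
    return sum(channel_law(y, code[0]) for y in seqs
               if any(channel_law(y, c) >= channel_law(y, code[0])
                      for c in code[1:]))

# Exact average over all codes with M codewords drawn i.i.d. from Q.
avg = sum(float(np.prod([Q[c] for c in code])) * code_error(code)
          for code in itertools.product(seqs, repeat=M))

# Average of [(M-1) * sum_c Q(c) P(y|c)^s / P(y|c1)^s]^rho over
# the joint law Q(c1) P(y|c1), as in Method 2.
bound = sum(Q[c1] * channel_law(y, c1)
            * ((M - 1) * sum(Q[c] * channel_law(y, c) ** s
                             / channel_law(y, c1) ** s for c in seqs)) ** rho
            for c1 in seqs for y in seqs)
print(avg, bound)
assert avg <= bound
```

Tightening this check by optimizing over $s$ and $\rho$ is exactly what leads, in Gallager's development, to the random coding exponent; the notes stop at the conditional bound, so the substitution step here is our own (standard) continuation.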