Preface

Part I of this book by itself may be regarded as a comprehensive textbook in information theory. The main reason why the book is in the present form is that, in my opinion, the discussion of network coding in Part II is incomplete without Part I. Nevertheless, except for Chapter 21 on multi-source network coding, Part II by itself may be used satisfactorily as a textbook on single-source network coding.

An elementary course on probability theory and an elementary course on linear algebra are prerequisites to Part I and Part II, respectively. For Chapter 11, some background knowledge of digital communication systems would be helpful, and for Chapter 20, some prior exposure to discrete-time linear systems is necessary. The reader is recommended to read the chapters according to the above chart. However, one will not have much difficulty jumping around in the book because there should be sufficient references to the relevant previous sections.

This book inherits the writing style of the previous book, namely that all the derivations are from first principles. The book contains a large number of examples, where important points are very often made. To facilitate the use of the book, there is a summary at the end of each chapter.

This book can be used as a textbook or a reference book. As a textbook, it is ideal for a two-semester course, with the first and second semesters covering selected topics from Part I and Part II, respectively. A comprehensive instructor's manual is available upon request. Please contact the author at whyeung@ie.cuhk.edu.hk for information and access.

Just like any other lengthy document, this book surely contains errors and omissions. To alleviate the problem, a list of errata will be maintained at the book homepage http://www.ie.cuhk.edu.hk/IT_book2/.

Hong Kong, China
December 2007
Raymond W. Yeung
Acknowledgments

The current book, an expansion of my previous book A First Course in Information Theory, was written within the year 2007. Thanks to the generous support of the Friedrich Wilhelm Bessel Research Award from the Alexander von Humboldt Foundation of Germany, I had the luxury of working on the project full-time from January to April, when I visited Munich University of Technology. I would like to thank Joachim Hagenauer and Ralf Koetter for nominating me for the award and for hosting my visit. I would also like to thank the Department of Information Engineering, The Chinese University of Hong Kong, for making this arrangement possible.

There are many individuals who have directly or indirectly contributed to this book. First, I am indebted to Toby Berger, who taught me information theory and writing. I am most thankful to Zhen Zhang, Ning Cai, and Bob Li for their friendship and inspiration. Without the results obtained through our collaboration, the book could not possibly be in its current form. I would also like to thank Venkat Anantharam, Vijay Bhargava, Dick Blahut, Agnes and Vincent Chan, Tom Cover, Imre Csiszár, Tony Ephremides, Bob Gallager, Bruce Hajek, Te Sun Han, Jim Massey, Prakash Narayan, Alon Orlitsky, Shlomo Shamai, Sergio Verdú, Victor Wei, Frans Willems, and Jack Wolf for their support and encouragement throughout the years. I would also like to thank all the collaborators of my work for their contributions and all the anonymous reviewers for their useful comments.

I would like to thank a number of individuals who helped in the project. I benefited tremendously from the discussions with David Tse, who gave a lot of suggestions for writing the chapters on differential entropy and continuous-valued channels. Terence Chan, Ka Wo Cheung, Bruce Hajek, Siu-Wai Ho, Siu Ting Ho, Tat Ming Lok, Prakash Narayan, Will Ng, Sagar Shenvi, Xiang-Gen Xia, Shaohua Yang, Ken Zeger, and Zhixue Zhang gave many valuable comments at different stages of the writing. My graduate students Silas Fong, Min Tan, and Shenghao Yang proofread the chapters on network coding in great detail. Silas Fong also helped compose the figures throughout the book.
On the domestic side, I am most grateful to my wife Rebecca for her love. During our stay in Munich, she took good care of the whole family so that I was able to concentrate on my writing. We are most thankful to our family friend Ms. Pui Yee Wong for taking care of Rebecca when she was ill during the final stage of the project, and to my sister Georgiana for her moral support. In this regard, we are indebted to Dr. Yu Lap Yip for his timely diagnosis. I would also like to thank my sister-in-law Ophelia Tsang, who comes over during the weekends to help take care of our daughter Shannon, who continues to be the sweetheart of the family and was most supportive during the time her mom was ill.
Contents

1 The Science of Information  1

Part I Components of Information Theory

2 Information Measures  7
   2.1 Independence and Markov Chains  7
   2.2 Shannon's Information Measures  12
   2.3 Continuity of Shannon's Information Measures for Fixed Finite Alphabets  18
   2.4 Chain Rules  21
   2.5 Informational Divergence  23
   2.6 The Basic Inequalities  26
   2.7 Some Useful Information Inequalities  28
   2.8 Fano's Inequality  32
   2.9 Maximum Entropy Distributions  36
   2.10 Entropy Rate of a Stationary Source  38
   Appendix 2.A: Approximation of Random Variables with Countably Infinite Alphabets by Truncation  41
   Chapter Summary  43
   Problems  45
   Historical Notes  50

3 The I-Measure  51
   3.1 Preliminaries  52
   3.2 The I-Measure for Two Random Variables  53
   3.3 Construction of the I-Measure μ*  55
   3.4 μ* Can Be Negative  59
   3.5 Information Diagrams  61
   3.6 Examples of Applications  67
   Appendix 3.A: A Variation of the Inclusion–Exclusion Formula  74
   Chapter Summary  76
   Problems  78
   Historical Notes  80

4 Zero-Error Data Compression  81
   4.1 The Entropy Bound  81
   4.2 Prefix Codes  86
      4.2.1 Definition and Existence  86
      4.2.2 Huffman Codes  88
   4.3 Redundancy of Prefix Codes  93
   Chapter Summary  97
   Problems  98
   Historical Notes  99

5 Weak Typicality  101
   5.1 The Weak AEP  101
   5.2 The Source Coding Theorem  104
   5.3 Efficient Source Coding  106
   5.4 The Shannon–McMillan–Breiman Theorem  107
   Chapter Summary  109
   Problems  110
   Historical Notes  112

6 Strong Typicality  113
   6.1 Strong AEP  113
   6.2 Strong Typicality Versus Weak Typicality  121
   6.3 Joint Typicality  122
   6.4 An Interpretation of the Basic Inequalities  131
   Chapter Summary  131
   Problems  132
   Historical Notes  135

7 Discrete Memoryless Channels  137
   7.1 Definition and Capacity  141
   7.2 The Channel Coding Theorem  149
   7.3 The Converse  152
   7.4 Achievability  157
   7.5 A Discussion  164
   7.6 Feedback Capacity  167
   7.7 Separation of Source and Channel Coding  172
   Chapter Summary  175
   Problems  176
   Historical Notes  181

8 Rate-Distortion Theory  183
   8.1 Single-Letter Distortion Measures  184
   8.2 The Rate-Distortion Function R(D)  187