        k=ij[k][ip[(c+2) % 10][7 & m++]];
    }
    for (j=0;j<=9;j++)                          Find which appended digit will check properly.
        if (!ij[k][ip[j][m & 7]]) break;
    *ch=j+48;                                   Convert to ASCII.
    return k==0;
}

CITED REFERENCES AND FURTHER READING:

McNamara, J.E. 1982, Technical Aspects of Data Communication, 2nd ed. (Bedford, MA: Digital Press). [1]

da Cruz, F. 1987, Kermit, A File Transfer Protocol (Bedford, MA: Digital Press). [2]

Morse, G. 1986, Byte, vol. 11, pp. 115–124 (September). [3]

LeVan, J. 1987, Byte, vol. 12, pp. 339–341 (November). [4]

Sarwate, D.V. 1988, Communications of the ACM, vol. 31, pp. 1008–1013. [5]

Griffiths, G., and Stones, G.C. 1987, Communications of the ACM, vol. 30, pp. 617–620. [6]

Wagner, N.R., and Putter, P.S. 1989, Communications of the ACM, vol. 32, pp. 106–110. [7]

20.4 Huffman Coding and Compression of Data

A lossless data compression algorithm takes a string of symbols (typically ASCII characters or bytes) and translates it reversibly into another string, one that is on the average of shorter length. The words "on the average" are crucial; it is obvious that no reversible algorithm can make all strings shorter: there just aren't enough short strings to be in one-to-one correspondence with longer strings. (Of the 2^N bit strings of length N, only 2^N − 1, counting the empty string, are shorter.) Compression algorithms are possible only when, on the input side, some strings, or some input symbols, are more common than others. These can then be encoded in fewer bits than rarer input strings or symbols, giving a net average gain.

There exist many, quite different, compression techniques, corresponding to different ways of detecting and using departures from equiprobability in input strings. In this section and the next we shall consider only variable length codes with defined word inputs. In these, the input is sliced into fixed units, for example ASCII characters, while the corresponding output comes in chunks of variable size. The simplest such method is Huffman coding [1], discussed in this section. Another example, arithmetic compression, is discussed in §20.5.

At the opposite extreme from defined-word, variable length codes are schemes that divide up the input into units of variable length (words or phrases of English text, for example) and then transmit these, often with a fixed-length output code. The most widely used code of this type is the Ziv-Lempel code [2]. References [3-6] give the flavor of some other compression techniques, with references to the large literature.

The idea behind Huffman coding is simply to use shorter bit patterns for more common characters. We can make this idea quantitative by considering the concept of entropy. Suppose the input alphabet has Nch characters, and that these occur in the input string with respective probabilities pi, i = 1, ..., Nch, so that ∑ pi = 1. Then the fundamental theorem of information theory says that strings consisting of
independently random sequences of these characters (a conservative, but not always realistic assumption) require, on the average, at least

        H = −∑ pi log2 pi                                    (20.4.1)

bits per character. Here H is the entropy of the probability distribution. Moreover, coding schemes exist which approach the bound arbitrarily closely. For the case of equiprobable characters, with all pi = 1/Nch, one easily sees that H = log2 Nch, which is the case of no compression at all. Any other set of pi's gives a smaller entropy, allowing some useful compression.

Notice that the bound of (20.4.1) would be achieved if we could encode character i with a code of length Li = −log2 pi bits: Equation (20.4.1) would then be the average ∑ pi Li. The trouble with such a scheme is that −log2 pi is not generally an integer. How can we encode the letter "Q" in 5.32 bits? Huffman coding makes a stab at this by, in effect, approximating all the probabilities pi by integer powers of 1/2, so that all the Li's are integral. If all the pi's are in fact of this form, then a Huffman code does achieve the entropy bound H.

The construction of a Huffman code is best illustrated by example. Imagine a language, Vowellish, with the Nch = 5 character alphabet A, E, I, O, and U, occurring with the respective probabilities 0.12, 0.42, 0.09, 0.30, and 0.07. Then the construction of a Huffman code for Vowellish is accomplished in the following table (an asterisk marks the stage at which a node is dropped from the active list):

    Node           Stage:  1      2      3      4      5
      1   A:              0.12   0.12*
      2   E:              0.42   0.42   0.42   0.42*
      3   I:              0.09*
      4   O:              0.30   0.30   0.30*
      5   U:              0.07*
      6   UI:                    0.16*
      7   AUI:                          0.28*
      8   AUIO:                                0.58*
      9   EAUIO:                                      1.00

Here is how it works, proceeding in sequence through Nch stages, represented by the columns of the table. The first stage starts with Nch nodes, one for each letter of the alphabet, containing their respective relative frequencies. At each stage, the two smallest probabilities are found, summed to make a new node, and then dropped from the list of active nodes. All active nodes (including the new composite) are then carried over to the next stage (column). In the table, the names assigned to new nodes (e.g., AUI) are inconsequential. In the example shown, it happens that (after stage 1) the two smallest nodes are always an original node and a composite one; this need not be true in general: the two smallest probabilities might be both original nodes, or both composites, or one of each. At the last stage, all nodes will have been collected into one grand composite of total probability 1.
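As a concrete check on these numbers, the short C program below (our illustration, not one of this chapter's routines) evaluates the entropy (20.4.1) for the Vowellish probabilities. It prints H ≈ 1.995 bits per character, noticeably less than the log2 5 ≈ 2.32 bits that five equiprobable characters would require.

#include <stdio.h>
#include <math.h>

/* Illustration only: entropy (20.4.1) of the Vowellish alphabet. */
int main(void)
{
    double p[5]={0.12,0.42,0.09,0.30,0.07};     /* probabilities of A,E,I,O,U */
    double h=0.0;
    int i;

    for (i=0;i<5;i++)
        h -= p[i]*log(p[i])/log(2.0);           /* accumulate -pi*log2(pi) */
    printf("H = %.4f bits per character\n",h);  /* prints H = 1.9950 */
    return 0;
}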
[Figure 20.4.1 appears here.]

Figure 20.4.1. Huffman code for the fictitious language Vowellish, in tree form. A letter (A, E, I, O, or U) is encoded or decoded by traversing the tree from the top down; the code is the sequence of 0's and 1's on the branches. The value to the right of each node is its probability; to the left, its node number in the accompanying table.

Now, to see the code, you redraw the data in the above table as a tree (Figure 20.4.1). As shown, each node of the tree corresponds to a node (row) in the table, indicated by the integer to its left and probability value to its right. Terminal nodes, so called, are shown as circles; these are single alphabetic characters. The branches of the tree are labeled 0 and 1. The code for a character is the sequence of zeros and ones that lead to it, from the top down. For example, E is simply 0, while U is 1010.

Any string of zeros and ones can now be decoded into an alphabetic sequence. Consider, for example, the string 1011111010. Starting at the top of the tree we descend through 1011 to I, the first character. Since we have reached a terminal node, we reset to the top of the tree, next descending through 11 to O. Finally 1010 gives U. The string thus decodes to IOU.

These ideas are embodied in the following routines. Input to the first routine hufmak is an integer vector of the frequency of occurrence of the nchin ≡ Nch alphabetic characters, i.e., a set of integers proportional to the pi's. hufmak, along with hufapp, which it calls, performs the construction of the above table, and also the tree of Figure 20.4.1. The routine utilizes a heap structure (see §8.3) for efficiency; for a detailed description, see Sedgewick [7].
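To make the decoding traversal concrete before the routines themselves, here is a small self-contained C program (our sketch, not part of the original code) that hard-codes the Vowellish tree of Figure 20.4.1 as left- and right-child arrays, essentially the representation that hufmak builds below, and decodes the example string 1011111010:

#include <stdio.h>

/* Sketch only: decode a Huffman bit string by walking the Vowellish tree
   of Figure 20.4.1.  Nodes 1..5 are the terminal characters A,E,I,O,U;
   nodes 6..9 are the composites; node 9 (EAUIO) is the root.  A 0 bit
   takes the left branch, a 1 bit the right branch.  Index 0 is unused,
   to match the 1-based node numbering in the text. */
int main(void)
{
    const char *alphabet="AEIOU";
    int left[10] ={0, 0,0,0,0,0, 5,1,7,2};      /* left children of nodes 6..9 */
    int right[10]={0, 0,0,0,0,0, 3,6,4,8};      /* right children of nodes 6..9 */
    int nch=5,root=9,node=root;
    const char *b,*bits="1011111010";

    for (b=bits;*b;b++) {
        node = (*b == '1') ? right[node] : left[node];
        if (node <= nch) {                      /* reached a terminal node */
            putchar(alphabet[node-1]);          /* emit its character */
            node=root;                          /* reset to the top of the tree */
        }
    }
    putchar('\n');                              /* prints IOU */
    return 0;
}

A production decoder would take its bits from an encoded buffer and its tree from the hcode structure below, but the traversal logic is the same. Note also that the code lengths read off the tree (1 bit for E, 2 for O, 3 for A, 4 each for I and U) average 0.42(1) + 0.30(2) + 0.12(3) + 0.09(4) + 0.07(4) = 2.02 bits per character, quite close to the entropy H ≈ 1.99 computed earlier.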
#include "nrutil.h"

typedef struct {
    unsigned long *icod,*ncod,*left,*right,nch,nodemax;
} huffcode;

void hufmak(unsigned long nfreq[], unsigned long nchin, unsigned long *ilong,
    unsigned long *nlong, huffcode *hcode)
Given the frequency of occurrence table nfreq[1..nchin] of nchin characters, construct the
Huffman code in the structure hcode. Returned values ilong and nlong are the character
number that produced the longest code symbol, and the length of that symbol. You should
check that nlong is not larger than your machine's word length.
{
    void hufapp(unsigned long index[], unsigned long nprob[], unsigned long n,
        unsigned long i);
    int ibit;
    long node,*up;
    unsigned long j,k,*index,n,nused,*nprob;
    static unsigned long setbit[32]={0x1L,0x2L,0x4L,0x8L,0x10L,0x20L,
        0x40L,0x80L,0x100L,0x200L,0x400L,0x800L,0x1000L,0x2000L,
        0x4000L,0x8000L,0x10000L,0x20000L,0x40000L,0x80000L,0x100000L,
        0x200000L,0x400000L,0x800000L,0x1000000L,0x2000000L,0x4000000L,
        0x8000000L,0x10000000L,0x20000000L,0x40000000L,0x80000000L};

    hcode->nch=nchin;                           Initialization.
    index=lvector(1,(long)(2*hcode->nch-1));    Heap of active node numbers.
    up=(long *)lvector(1,(long)(2*hcode->nch-1));   Parent links; the sign records left/right.
    nprob=lvector(1,(long)(2*hcode->nch-1));    Node probabilities (relative frequencies).
    for (nused=0,j=1;j<=hcode->nch;j++) {
        nprob[j]=nfreq[j];
        hcode->icod[j]=hcode->ncod[j]=0;
        if (nfreq[j]) index[++nused]=j;         Only characters that actually occur enter the heap.
    }
    for (j=nused;j>=1;j--) hufapp(index,nprob,nused,j);   Sort nprob into a heap structure in index.
    k=hcode->nch;
    while (nused > 1) {                         Combine heap nodes, remaking the heap at each stage.
        node=index[1];                          Smallest remaining probability.
        index[1]=index[nused--];
        hufapp(index,nprob,nused,1);
        nprob[++k]=nprob[index[1]]+nprob[node];
        hcode->left[k]=node;                    Store left and right children of a node.
        hcode->right[k]=index[1];
        up[index[1]] = -(long)k;                Indicate whether a node is a left or right child of its parent.
        up[node]=index[1]=k;
        hufapp(index,nprob,nused,1);
    }
    up[hcode->nodemax=k]=0;                     The last composite node is the root of the tree.
    for (j=1;j<=hcode->nch;j++) {               Make the Huffman code from the tree.
        if (nprob[j]) {
            for (n=0,ibit=0,node=up[j];node;node=up[node],ibit++) {
                if (node < 0) {                 A negative link marks a right child: emit a 1 bit.
                    n |= setbit[ibit];
                    node = -node;
                }
            }
            hcode->icod[j]=n;
            hcode->ncod[j]=ibit;
        }
    }
    *nlong=0;
    for (j=1;j<=hcode->nch;j++) {               Find the longest code symbol.
        if (hcode->ncod[j] > *nlong) {
            *nlong=hcode->ncod[j];
            *ilong=j-1;
        }
    }
    free_lvector(nprob,1,(long)(2*hcode->nch-1));
    free_lvector((unsigned long *)up,1,(long)(2*hcode->nch-1));
    free_lvector(index,1,(long)(2*hcode->nch-1));
}

void hufapp(unsigned long index[], unsigned long nprob[], unsigned long n,
    unsigned long i)
Used by hufmak to maintain a heap structure in the array index[1..n].
{
    unsigned long j,k;

    k=index[i];                                 Sift element k down to its place in the heap.
    while (i <= (n>>1)) {
        if ((j = i << 1) < n && nprob[index[j]] > nprob[index[j+1]]) j++;   Select the smaller child.
        if (nprob[k] <= nprob[index[j]]) break;     Heap condition satisfied; stop.
        index[i]=index[j];                      Otherwise promote the child and descend.
        i=j;
    }
    index[i]=k;
}

Note that the structure hcode must be defined and allocated in your main program with statements like this:

#include "nrutil.h"
#define MC 512                                  Maximum anticipated value of nchin in hufmak.
#define MQ (2*MC-1)

typedef struct {
    unsigned long *icod,*ncod,*left,*right,nch,nodemax;
} huffcode;
...
huffcode hcode;
...
hcode.icod=(unsigned long *)lvector(1,MQ);      Allocate space within hcode.
hcode.ncod=(unsigned long *)lvector(1,MQ);
hcode.left=(unsigned long *)lvector(1,MQ);
hcode.right=(unsigned long *)lvector(1,MQ);
for (j=1;j<=MQ;j++) hcode.icod[j]=hcode.ncod[j]=0;

Once the code is constructed, one encodes a string of characters by repeated calls to hufenc, which simply does a table lookup of the code and appends it to the output message.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    unsigned long *icod,*ncod,*left,*right,nch,nodemax;
} huffcode;

void hufenc(unsigned long ich, unsigned char **codep, unsigned long *lcode,
    unsigned long *nb, huffcode *hcode)
Huffman encode the single character ich (in the range 0..nch-1) using the code in the
structure hcode, write the result to the character array *codep[1..lcode] starting at bit
nb (whose smallest valid value is zero), and increment nb appropriately. This routine is
called repeatedly, once for each character in the message being encoded.