File Compression Using a fixed number of bits to represent each character,called the encoding of the character,the size of the file depends only on the number of characters in the file. Since the frequency of some characters may be much larger than others,it is reasonable to use variable length encodings
File Compression ◼ Using a fixed number of bits to represent each character, called the encoding of the character, the size of the file depends only on the number of characters in the file. ◼ Since the frequency of some characters may be much larger than others, it is reasonable to use variable length encodings
File Compression Intuitively,those characters with large frequencies should be assigned short encodings,whereas long encodings may be assigned to those characters with small frequencies. ■ When the encodings vary in length,we stipulate that the encoding of one character must not be the prefix of the encoding of another character;such codes are called prefix codes. ■ For instance,if we assign the encodings 10 and 101 to the letters“a”and“b',there will be an ambiguity as to whether 10 is the encoding of“a”or is the prefix of the encoding of the letter“b
File Compression ◼ Intuitively, those characters with large frequencies should be assigned short encodings, whereas long encodings may be assigned to those characters with small frequencies. ◼ When the encodings vary in length, we stipulate that the encoding of one character must not be the prefix of the encoding of another character; such codes are called prefix codes. ◼ For instance, if we assign the encodings 10 and 101 to the letters “a” and “b”, there will be an ambiguity as to whether 10 is the encoding of “a” or is the prefix of the encoding of the letter “b
File Compression Once the prefix constraint is satisfied,the decoding becomes unambiguous;the sequence of bits is scanned until an encoding of some character is found. One way to“parse”a given sequence of bits is to use a full binary tree,in which each internal node has exactly two branches labeled by 0 an 1.The leaves in this tree corresponding to the characters. Each sequence of 0's and 1's on a path from the root to a leaf corresponds to a character encoding
File Compression ◼ Once the prefix constraint is satisfied, the decoding becomes unambiguous; the sequence of bits is scanned until an encoding of some character is found. ◼ One way to “parse” a given sequence of bits is to use a full binary tree, in which each internal node has exactly two branches labeled by 0 an 1. The leaves in this tree corresponding to the characters. Each sequence of 0’s and 1’s on a path from the root to a leaf corresponds to a character encoding
File Compression The algorithm presented is due to Huffman. a The algorithm consists of repeating the following procedure until Cconsists of only one character. >Let c and c be two characters with minimum frequencies. Create a new node cwhose frequency is the sum of the frequencies of c and c and make c and c the children of c. Let C=Cici chuc)
File Compression ◼ The algorithm presented is due to Huffman. ◼ The algorithm consists of repeating the following procedure until C consists of only one character. ➢ Let ci and cj be two characters with minimum frequencies. ➢ Create a new node c whose frequency is the sum of the frequencies of ci and cj , and make ci and cj the children of c. ➢ Let C=C-{ci , cj}{c}
File Compression Example: C=a,b,c,d,er f(a)=20 f(b)=7 f(G)=10 f(d)=4 f(e)=18
File Compression ◼ Example: C={a, b, c, d, e} f (a)=20 f (b)=7 f (c)=10 f (d)=4 f (e)=18