if further approximations could be constructed,but the labor involved becomes enormous at the next stage." A program can automate this laborious task.Our C program to generate order-k Markov chains will store at most five megabytes of text in the array inputchars: int k =2; char inputchars[5000000]; char *word[1000000]; int nword 0; We could implement Shannon's algorithm directly by scanning through the complete input text to generate each word (though this might be slow on large texts).We will instead employ the array word as a kind of suffix array pointing to the characters,except that it starts only on word boundaries(a common modification).The variable mword holds the number of words.We read the file with this code: word[0]inputchars while scanf("号s",word[nword])!=EoF word[nword+1]word[nword]strlen(word[nword])+1 nword++ Each word is appended to inputchars(no other storage allocation is needed),and is terminated by the null character supplied by scanf. After we read the input,we will sort the word array to bring together all pointers that point to the same sequence of k words.This function does the comparisons: int wordncmp(char *p,char*q) n k for(;*p==*q;p++,q++) 1f(*p==0&&--n==0) return 0 return *p -*q It scans through the two strings while the characters are equal.At every null character,it decrements the counter n and returns equal after seeing k identical words.When it finds unequal characters,it returns the difference. After reading the input,we append k null characters(so the comparison function doesn't run off the end),print the first k words in the document(to start the random output),and call the sort: for i=[0,k) word[nword][i]=0 for i=[0,k) print word[i] qsort (word,nword,sizeof(word[0]),sortcmp) The sortcmp function,as usual,adds a level of indirection to its pointers
if further approximations could be constructed, but the labor involved becomes enormous at the next stage.'' A program can automate this laborious task. Our C program to generate order-k Markov chains will store at most five megabytes of text in the array inputchars: int k = 2; char inputchars[5000000]; char *word[1000000]; int nword = 0; We could implement Shannon's algorithm directly by scanning through the complete input text to generate each word (though this might be slow on large texts). We will instead employ the array word as a kind of suffix array pointing to the characters, except that it starts only on word boundaries (a common modification). The variable nword holds the number of words. We read the file with this code: word[0] = inputchars while scanf("%s", word[nword]) != EOF word[nword+1] = word[nword] + strlen(word[nword]) + 1 nword++ Each word is appended to inputchars (no other storage allocation is needed), and is terminated by the null character supplied by scanf. After we read the input, we will sort the word array to bring together all pointers that point to the same sequence of k words. This function does the comparisons: int wordncmp(char *p, char* q) n = k for ( ; *p == *q; p++, q++) if (*p == 0 && --n == 0) return 0 return *p - *q It scans through the two strings while the characters are equal. At every null character, it decrements the counter n and returns equal after seeing k identical words. When it finds unequal characters, it returns the difference. After reading the input, we append k null characters (so the comparison function doesn't run off the end), print the first k words in the document (to start the random output), and call the sort: for i = [0, k) word[nword][i] = 0 for i = [0, k) print word[i] qsort(word, nword, sizeof(word[0]), sortcmp) The sortcmp function, as usual, adds a level of indirection to its pointers
Our space-efficient structure now contains a great deal of information about the k-grams in the text.If k is 1 and the input text is'of the people,by the people,for the people",the word array might look like this: word[0]:by the word[1]:for the word[2]:of the word[3]:people word[4]:people,for word[5]:people,by word[6]:the people, word[7]:the people word[8]:the people, For clarity,this picture shows only the first k+/words pointed to by each element of word,even though more words usually follow.If we seek a word to follow the phrase'the",we look it up in the suffix array to discover three choices:'people,"twice and'people"once. We may now generate nonsense text with this pseudocode sketch: phrase first phrase in input array loop perform a binary search for phrase in word[0..nword-1] for all phrases equal in the first k words select one at random,pointed to by p phrase word following p if k-th word of phrase is length 0 break print k-th word of phrase We initialize the loop by setting phrase to the first characters in the input(recall that those words were already printed on the output file).The binary search uses the code in Section 9.3 to locate the first occurrence of phrase(it is crucial to find the very first occurrence;the binary search of Section 9.3 does just this).The next loop scans through all equal phrases,and uses Solution 12.10 to select one of them at random.If the k-th word of that phrase is of length zero,the current phrase is the last in the document,so we break the loop. The complete pseudocode implements those ideas,and also puts an upper bound on the number of words it will generate. phrase inputchars for (wordsleft 10000;wordsleft 0;wordsleft--) 1=-1 u nword whi1e1+1!=u m=(1+u)/2 if wordncmp (word[m],phrase)<0 1=m else u=m
Our space-efficient structure now contains a great deal of information about the k-grams in the text. If k is 1 and the input text is ``of the people, by the people, for the people'', the word array might look like this: word[0]: by the word[1]: for the word[2]: of the word[3]: people word[4]: people, for word[5]: people, by word[6]: the people, word[7]: the people word[8]: the people, For clarity, this picture shows only the first k+1 words pointed to by each element of word, even though more words usually follow. If we seek a word to follow the phrase ``the'', we look it up in the suffix array to discover three choices: ``people,'' twice and ``people'' once. We may now generate nonsense text with this pseudocode sketch: phrase = first phrase in input array loop perform a binary search for phrase in word[0..nword-1] for all phrases equal in the first k words select one at random, pointed to by p phrase = word following p if k-th word of phrase is length 0 break print k-th word of phrase We initialize the loop by setting phrase to the first characters in the input (recall that those words were already printed on the output file). The binary search uses the code in Section 9.3 to locate the first occurrence of phrase (it is crucial to find the very first occurrence; the binary search of Section 9.3 does just this). The next loop scans through all equal phrases, and uses Solution 12.10 to select one of them at random. If the k-th word of that phrase is of length zero, the current phrase is the last in the document, so we break the loop. The complete pseudocode implements those ideas, and also puts an upper bound on the number of words it will generate: phrase = inputchars for (wordsleft = 10000; wordsleft > 0; wordsleft--) l = -1 u = nword while l+1 != u m = (l + u) / 2 if wordncmp(word[m], phrase) < 0 l = m else u = m
for (i=0;wordncmp(phrase,word[uti])==0;i++) if rand()号(i+1)==0 p word[u+i] phrase skip(p,1) if strlen(skip(phrase,k-1))==0 break print skip(phrase,k-1) Chapter 3 of Kernighan and Pike's Practice of Programming(described in Section 5.9)is devoted to the general topic of''Design and Implementation".They build their chapter around the problem of word-level Markov-text generation because''it is typical of many programs:some data comes in,some data goes out,and the processing depends on a little ingenuity."They give some interesting history of the problem,and implement programs for the task in C,Java,C++,Awk and Perl. The program in this section compares favorably with their C program for the task.This code is about half the length of theirs;representing a phrase by a pointer to k consecutive words is space-efficient and easy to implement. For inputs of size near a megabyte,the two programs are of roughly comparable speed.Because Kernighan and Pike use a slightly larger structure and make extensive use of the inefficient malloc,the program in this column uses an order of magnitude less memory on my system.If we incorporate the speedup of Solution 14 and replace the binary search and sorting by a hash table,the program in this section becomes about a factor of two faster(and increases memory use by about 50%). Next:Section 15.4.Principles. Copyright1999 Lucent Technologies.All rights reserved.Wed 18 Oct 2000
for (i = 0; wordncmp(phrase, word[u+i]) == 0; i++) if rand() % (i+1) == 0 p = word[u+i] phrase = skip(p, 1) if strlen(skip(phrase, k-1)) == 0 break print skip(phrase, k-1) Chapter 3 of Kernighan and Pike's Practice of Programming (described in Section 5.9) is devoted to the general topic of ``Design and Implementation''. They build their chapter around the problem of word-level Markov-text generation because ``it is typical of many programs: some data comes in, some data goes out, and the processing depends on a little ingenuity.'' They give some interesting history of the problem, and implement programs for the task in C, Java, C++, Awk and Perl. The program in this section compares favorably with their C program for the task. This code is about half the length of theirs; representing a phrase by a pointer to k consecutive words is space-efficient and easy to implement. For inputs of size near a megabyte, the two programs are of roughly comparable speed. Because Kernighan and Pike use a slightly larger structure and make extensive use of the inefficient malloc, the program in this column uses an order of magnitude less memory on my system. If we incorporate the speedup of Solution 14 and replace the binary search and sorting by a hash table, the program in this section becomes about a factor of two faster (and increases memory use by about 50%). Next: Section 15.4. Principles. Copyright © 1999 Lucent Technologies. All rights reserved. Wed 18 Oct 2000
Letter-Level Markov Text (Illustrating Section 15.3 Programmng of Programming Pearls) Section 15.3 describes letter-level Markov text and gives a few examples.Here are some more examples,generated by this Second Edition program from several sources. The Book of Romans(King James Version) Jon Bentley Order-1:Runs chg Sprif ighaifay fe;ie llathis,fur Gos ngithigh fLom sist aminth uces yom Je Movin th we hof I juthe peathor ly dis igstredithe the,ist oth t s th uth re?hathlivicoto burecon on,Forer mautht w Lowepe sucthet nese,optro cl cth,alditheakengerdisa of bea wepis nd on ichencouteeinthtemef id,n thard f heans rat ithooowh, halang s mail whet alf it th,avoterd myond che su:pulece t hethy tinen,Forund ur;h,y tousio Order-2:For unto yousay law to do retway hein:thein ther on,Who dopento the he but wit forethered Jesin:ach minto at of the livence,wholet ye judge of heren the me of th.Ford fulne,andethe a refor oure dowe God he hathess.Becom thers,bed hat the sing ousne any med boanined that wer praoh,aryphe knot that law of the ef:I mints Chrom home insgrahalso ded ford to wits norks:but san,wrace to thento the sid;have all the as ith:buth.To antiondignam to unte of So now the rise selievelesed law,As sh to but is of tore wrink mignot him pe:ford me,to prough ithe ard be.Forke eare st;expe counto the pe he is abled an he again forears,the be not de,lesion thre th, Whou witilethaturer an ofesusned to pon aw. Order-3:For righbour in from her own Sion,There not,which confidentillined;For thereignation and thes ves: things is gospel,do nothink wised;and root is of the law?of unto him which hearath ye werefor the est,but child seeth;But neithough them what life.But wise workenned in their we knowing to them whets of Jesurreceivereus, and,and commit is unwise anot,deated are trivet;in put God,thy meat who what want ought,which and thereignor God same by the circumcision.As it wise himselves deate. Order-4:Therein the law to Gomorrha.Owe not atten for it was lieth believed.The gosperously report?For Israel, not in business shalt not hope.For I receiving the works:unto these to that in my hundred you:for of God,the changed began,which is thou,nor all thing as by the law of Christ uncircumcised the me infirmity againstanding and Sarah's way the carnally preferring inst alive the law,thou the law of sin;and bitten as by ther. Order-5:What their lusts there in your father,because also maketh not,it is among ought of his he liveth,if it fulfil things which cause division verily,the gospel of God.For I be saved.For they have mercy.Now if thou boast of patience of envy,murder,debate,deceive tree,which are the righteousness.Much more shall rise thankful;but have you into Spain. King James Bible(Complete Text)
Letter-Level Markov Text (Illustrating Section 15.3 of Programming Pearls) Section 15.3 describes letter-level Markov text and gives a few examples. Here are some more examples, generated by this program from several sources. The Book of Romans (King James Version) Order-1: Runs ch g Sprif ighaifay fe; ie llathis, fur Gos ngithigh f Lom sist aminth uces yom Je Movin th we hof I juthe peathor ly dis igstredithe the, ist oth t s th uth re? hathlivicoto burecon on, Forer mautht w Lowepe sucthet nese, optro cl cth, alditheakengerdisa of bea wepis nd on ichencouteeinthtemef id, n thard f heans rat ithooowh, halang s mail whet alf it th, avoterd myond che su: pulece t hethy tinen, Forund ur; h, y tousio Order-2: For unto yousay law to do retway hein: thein ther on, Who dopento the he but wit forethered Jesin: ach minto at of the livence, wholet ye judge of heren the me of th. Ford fulne, andethe a refor oure dowe God he hathess. Becom thers, bed hat the sing ousne any med boanined that wer praoh, aryphe knot that law of the ef: I mints Chrom home insgrahalso ded ford to wits norks: but san, wrace to thento the sid; have all the as ith: buth. To antiondignam to unte of So now the rise selievelesed law; As sh to but is of tore wrink mignot him pe: ford me, to prough ithe ard be. Forke eare st; expe counto the pe he is abled an he again forears, the be not de, lesion thre th, Whou witilethaturer an ofesusned to pon aw. Order-3: For righbour in from her own Sion, There not, which confidentillined; For thereignation and thes ves: things is gospel, do nothink wised; and root is of the law? of unto him which hearath ye werefor the est, but child seeth; But neithough them what life. But wise workenned in their we knowing to them whets of Jesurreceivereus, and, and commit is unwise anot, deated are trivet; in put God, thy meat who what want ought, which and thereignor God same by the circumcision. As it wise himselves deate. Order-4: Therein the law to Gomorrha. Owe not atten for it was lieth believed. The gosperously report? For Israel, not in business shalt not hope. For I receiving the works: unto these to that in my hundred you: for of God, the changed began, which is thou, nor all thing as by the law of Christ uncircumcised the me infirmity againstanding and Sarah's way the carnally preferring inst alive the law, thou the law of sin; and bitten as by ther. Order-5: What their lusts there in your father, because also maketh not, it is among ought of his he liveth, if it fulfil things which cause division verily, the gospel of God. For I be saved. For they have mercy. Now if thou boast of patience of envy, murder, debate, deceive tree, which are the righteousness. Much more shall rise thankful; but have you into Spain. King James Bible (Complete Text)
Order-1:Ched t ainone wand LORD,Thenathan g u t;wt Sona t essose Anasesed an trer.send Ege fomongold she eyothrofo Andecondiore chizande Gouaip prkecofovire wid,g ie ay I Fag h veds,my ye on They, Theayotingaspuly obe wandoplacco Moued wanere hern tiedef hiverdath ade:e oe thalld mive by ifond nd hand ra. omed weleche thant f u an f sththil,s. Order-2:a grand it the woraelfspit up fir as king thaderld,I slot kins ts,shis thim the mot:and Jestrul par hals? Godsto but knoth,and will thalson me up the thee,shings ind I huse twerphemieved hey me,wasumbrot of thence. And gan wor themplehowify not se,then boat cur ware me:flem pasts,sto therenes:andmen th gooker but he commins with him theathe de a cault I wittereveres?whighte,for of Shinged.Themy un,and whou be froass hild, that unto ances saw eart scom by laints froself.And hall youll led inks carthe Fathrues. Order-3:Against of Ashekeloverth with his uncill be pill prehold,We came womb?God;my be thethe you.O that he the LORD,three and of they may shall before word of the LORD the LORD;forth hast it thousalem.As counto his sould rein this he shalt not side unto hire;Let unto yet,any.The watersiteth Aarone the said unto ther bring in God of Lord of his was day take a sought.And for a raim who is,and on in that that slain of be the wreath them and miliphats signorth me,O Zion end heardern unt their than to shall the it serpennehast the bar;and a greakim, where]in thirt me done for inhabide asting,The LORD,and the spake.Like the had where mand Laisin but he nations. Order-4:Open he sister daughteousness,whitherefore they present with themselves;When the Kenezites. Therefore shall appointed that yea,Lord GOD.And I may servants or lips.And he sons of God.The left horses anger to my right unto this his king of Enan.And the lastiness.For themself which we made unto his is this born back of Jedaiah,and thy possessions shut him that me.Then I countainst that there of the king of our boast,that he stones,two and fault fell for the house thy water thy is poured with him was a coping the send the trespecia.And God was out offering,and their house,both in the porter our femaleface of Israel inst sat he comingled,and wearingled of come,they kingdoms.And Israel,when the olivers were more three out of Pening of the Lord,and what Jerusalem,And they had turn to all bring David accept,and fellow down it. Order-5:Thus saith thy man righted,behold,Gaal was thou art that fell do them,and encamped unto the did unto Martha,the height before so doth by them alive:and for thirty and by give our Fathers were made.Flee formed for they that doth commanded thy servants,according against him go not sinned about;and to her,and cast the would to come up against month Abel had come unto the earth.And the LORD unto the cud,but such camel,and the curtain offering:and them,and all thee unruly,came down to them what it us;and joy:And David well their brethren,out from among thee.And it came,bewailed over Israel said unto all gall:in his he.And Jehoiada the cometh the prophesy:And when thou then shekel of the goodly after him,and menservants,which I commandment,and we had done to Egyptians and a loud of their bread,and to Abraham's sin.Howbeit let him not; then spake us the salt?or stretcheth of Abib,the other to you,That hath promise.After with God shall not afraid, Nay,not tell ye of the resist shewed me from all. Column 10 of Programming Pearls Order-1:nome atwoce te smstarond toce acthare est juthel vers Th ay theome aytinglyt f vinca tiforachthedabj) thionfo rpobaro ske g,beruthainsse iedif ton eane ustioutinde.titesy s th g ronpromarace s,Wedesimess ctis wan spe thivausprillyton e bougre inorunds RACPore in acora wite ar 202 inductipau caulduinng moucan a alams.Ind onthacooon tepancheplofasicenderysppl ay,Otes s. Order-2:Eng gingent intessic diver frow hanamidex,In thatry a triblepromple the le de thes pace ing romemples
Order-1: Ched t ainone wand LORD, Thenathan g u t; w t Sona t essose Anasesed an trer. send Ege fomongold, she eyothrofo Andecondiore chizande Gouaip prkecofovire wid, g ie ay l Fag h veds, my ye on They, Theayotingaspuly obe wandoplacco Moued wanere hern tiedef hiverdath ade: e oe thalld mive by ifond nd hand ra, omed weleche thant f u an f sththil, s. Order-2: a grand it the woraelfspit up fir as king thaderld, I slot kins ts, shis thim the mot: and Jestrul par hals? Godsto but knoth, and will thalson me up the thee, shings ind I huse twerphemieved hey me, wasumbrot of thence. And gan wor themplehowify not se, then boat cur ware me: flem pasts, sto therenes: andmen th gooker but he commins with him theathe de a cault I wittereveres? whighte, for of Shinged. Themy un, and whou be froass hild, that unto ances saw eart scom by laints froself. And hall youll led inks carthe Fathrues. Order-3: Against of Ashekeloverth with his uncill be pill prehold, We came womb? God; my be the the you. O that he the LORD, three and of they may shall before word of the LORD the LORD; forth hast it thousalem. As counto his sould rein this he shalt not side unto hire; Let unto yet, any. The watersiteth Aarone the said unto ther bring in God of Lord of his was day take a sought. And for a raim who is, and on in that that slain of be the wreath them and miliphats signorth me, O Zion end heardern unt their than to shall the it serpennehast the bar; and a greakim, where] in thirt me done for inhabide asting, The LORD, and the spake. Like the had where mand Laisin but he nations. Order-4: Open he sister daughteousness, whitherefore they present with themselves; When the Kenezites. Therefore shall appointed that yea, Lord GOD. And I may servants or lips. And he sons of God. The left horses anger to my right unto this his king of Enan. And the lastiness. For themself which we made unto his is this born back of Jedaiah, and thy possessions shut him that me. Then I countainst that there of the king of our boast, that he stones, two and fault fell for the house thy water thy is poured with him was a coping the send the trespecia. And God was out offering, and their house, both in the porter our femaleface of Israel inst sat he comingled, and wearingled of come, they kingdoms. And Israel, when the olivers were more three out of Pening of the Lord, and what Jerusalem, And they had turn to all bring David accept, and fellow down it. Order-5: Thus saith thy man righted, behold, Gaal was thou art that fell do them, and encamped unto the did unto Martha, the height before so doth by them alive: and for thirty and by give our Fathers were made. Flee formed for they that doth commanded thy servants, according against him go not sinned about; and to her, and cast the would to come up against month Abel had come unto the earth. And the LORD unto the cud, but such camel, and the curtain offering: and them, and all thee unruly, came down to them what it us; and joy: And David well their brethren, out from among thee. And it came, bewailed over Israel said unto all gall: in his he. And Jehoiada the cometh the prophesy: And when thou then shekel of the goodly after him, and menservants, which I commandment, and we had done to Egyptians and a loud of their bread, and to Abraham's sin. Howbeit let him not; then spake us the salt? or stretcheth of Abib, the other to you, That hath promise. After with God shall not afraid, Nay, not tell ye of the resist shewed me from all. Column 10 of Programming Pearls Order-1: nome atwoce te smstarond toce acthare est juthel vers Th ay theome aytinglyt f vinca tiforachthedabj) thionfo rpobaro ske g, beruthainsse iedif ton eane ustioutinde. titesy s th g ronpromarace s, Wedesimess ctis wan spe thivausprillyton e bougre inorunds RACPore in acora wite ar 202 inductipau caulduinng moucan a alams.Ind onthacooon tepancheplofasicenderysppl ay,0tes s. Order-2: Eng gingent intessic diver frow hanamidex, In thatry a triblepromple the le de thes pace ing romemples