TABLE OF CONTENTS xvii

CHAPTER SEVEN: PERMUTATIONS 345
7.1 Basic Properties of Permutations 347
7.2 Algorithms on Permutations 355
7.3 Representations of Permutations 358
7.4 Enumeration Problems 366
7.5 Analyzing Properties of Permutations with CGFs 372
7.6 Inversions and Insertion Sorts 384
7.7 Left-to-Right Minima and Selection Sort 393
7.8 Cycles and In Situ Permutation 401
7.9 Extremal Parameters 406

CHAPTER EIGHT: STRINGS AND TRIES 415
8.1 String Searching 416
8.2 Combinatorial Properties of Bitstrings 420
8.3 Regular Expressions 432
8.4 Finite-State Automata and the Knuth-Morris-Pratt Algorithm 437
8.5 Context-Free Grammars 441
8.6 Tries 448
8.7 Trie Algorithms 453
8.8 Combinatorial Properties of Tries 459
8.9 Larger Alphabets 465

CHAPTER NINE: WORDS AND MAPPINGS 473
9.1 Hashing with Separate Chaining 474
9.2 The Balls-and-Urns Model and Properties of Words 476
9.3 Birthday Paradox and Coupon Collector Problem 485
9.4 Occupancy Restrictions and Extremal Parameters 495
9.5 Occupancy Distributions 501
9.6 Open Addressing Hashing 509
9.7 Mappings 519
9.8 Integer Factorization and Mappings 532

List of Theorems 543
List of Tables 545
List of Figures 547
Index 551
NOTATION

⌊x⌋     floor function                          largest integer less than or equal to x
⌈x⌉     ceiling function                        smallest integer greater than or equal to x
{x}     fractional part                         x − ⌊x⌋
lg N    binary logarithm                        log₂ N
ln N    natural logarithm                       logₑ N
(n k)   binomial coefficient                    number of ways to choose k out of n items
[n k]   Stirling number of the first kind       number of permutations of n elements that have k cycles
{n k}   Stirling number of the second kind      number of ways to partition n elements into k nonempty subsets
ϕ       golden ratio                            (1 + √5)/2 = 1.61803···
γ       Euler's constant                        .57721···
σ       Stirling's constant                     √(2π) = 2.50662···
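The two kinds of Stirling numbers defined above satisfy simple recurrences, which a few lines of code can make concrete. The following sketch is our own illustration (the function names are not from the text); it computes both kinds recursively and checks them against the binomial coefficient and the constants listed above.

```python
from math import comb, sqrt, pi

def stirling_cycle(n, k):
    # Stirling number of the first kind [n k]: permutations of n
    # elements that have exactly k cycles.
    # Recurrence: [n k] = [n-1 k-1] + (n-1) * [n-1 k]
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return stirling_cycle(n - 1, k - 1) + (n - 1) * stirling_cycle(n - 1, k)

def stirling_subset(n, k):
    # Stirling number of the second kind {n k}: partitions of n
    # elements into k nonempty subsets.
    # Recurrence: {n k} = {n-1 k-1} + k * {n-1 k}
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return stirling_subset(n - 1, k - 1) + k * stirling_subset(n - 1, k)

print(comb(4, 2))            # 6: ways to choose 2 of 4 items
print(stirling_cycle(4, 2))  # 11: permutations of 4 elements with 2 cycles
print(stirling_subset(4, 2)) # 7: partitions of 4 elements into 2 subsets
print((1 + sqrt(5)) / 2)     # golden ratio phi = 1.61803...
print(sqrt(2 * pi))          # Stirling's constant sigma = 2.50662...
```

A useful sanity check on the first recurrence: summing [n k] over all k counts every permutation of n elements once, so the row sums are n!.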
CHAPTER ONE

ANALYSIS OF ALGORITHMS

MATHEMATICAL studies of the properties of computer algorithms have spanned a broad spectrum, from general complexity studies to specific analytic results. In this chapter, our intent is to provide perspective on various approaches to studying algorithms, to place our field of study into context among related fields, and to set the stage for the rest of the book. To this end, we illustrate concepts within a fundamental and representative problem domain: the study of sorting algorithms.

First, we will consider the general motivations for algorithmic analysis. Why analyze an algorithm? What are the benefits of doing so? How can we simplify the process? Next, we discuss the theory of algorithms and consider as an example mergesort, an "optimal" algorithm for sorting. Following that, we examine the major components of a full analysis for a sorting algorithm of fundamental practical importance, quicksort. This includes the study of various improvements to the basic quicksort algorithm, as well as some examples illustrating how the analysis can help one adjust parameters to improve performance.

These examples illustrate a clear need for a background in certain areas of discrete mathematics. In Chapters 2 through 4, we introduce recurrences, generating functions, and asymptotics: basic mathematical concepts needed for the analysis of algorithms. In Chapter 5, we introduce the symbolic method, a formal treatment that ties together much of this book's content. In Chapters 6 through 9, we consider basic combinatorial properties of fundamental algorithms and data structures. Since there is a close relationship between fundamental methods used in computer science and classical mathematical analysis, we simultaneously consider some introductory material from both areas in this book.
1.1 Why Analyze an Algorithm?

There are several answers to this basic question, depending on one's frame of reference: the intended use of the algorithm, the importance of the algorithm in relationship to others from both practical and theoretical standpoints, the difficulty of analysis, and the accuracy and precision of the required answer.
The most straightforward reason for analyzing an algorithm is to discover its characteristics in order to evaluate its suitability for various applications or compare it with other algorithms for the same application. The characteristics of interest are most often the primary resources of time and space, particularly time. Put simply, we want to know how long an implementation of a particular algorithm will run on a particular computer, and how much space it will require. We generally strive to keep the analysis independent of particular implementations; we concentrate instead on obtaining results for essential characteristics of the algorithm that can be used to derive precise estimates of true resource requirements on various actual machines.

In practice, achieving independence between an algorithm and characteristics of its implementation can be difficult to arrange. The quality of the implementation and properties of compilers, machine architecture, and other major facets of the programming environment have dramatic effects on performance. We must be cognizant of such effects to be sure the results of analysis are useful. On the other hand, in some cases, analysis of an algorithm can help identify ways for it to take full advantage of the programming environment.
Occasionally, some property other than time or space is of interest, and the focus of the analysis changes accordingly. For example, an algorithm on a mobile device might be studied to determine the effect upon battery life, or an algorithm for a numerical problem might be studied to determine how accurate an answer it can provide. Also, it is sometimes appropriate to address multiple resources in the analysis. For example, an algorithm that uses a large amount of memory may use much less time than an algorithm that gets by with very little memory. Indeed, one prime motivation for doing a careful analysis is to provide accurate information to help in making proper tradeoff decisions in such situations.

The term analysis of algorithms has been used to describe two quite different general approaches to putting the study of the performance of computer programs on a scientific basis. We consider these two in turn.

The first, popularized by Aho, Hopcroft, and Ullman [2] and Cormen, Leiserson, Rivest, and Stein [6], concentrates on determining the growth of the worst-case performance of the algorithm (an "upper bound"). A prime goal in such analyses is to determine which algorithms are optimal in the sense that a matching "lower bound" can be proved on the worst-case performance of any algorithm for the same problem. We use the term theory of algorithms
to refer to this type of analysis. It is a special case of computational complexity, the general study of relationships between problems, algorithms, languages, and machines. The emergence of the theory of algorithms unleashed an Age of Design where multitudes of new algorithms with ever-improving worst-case performance bounds have been developed for multitudes of important problems. To establish the practical utility of such algorithms, however, more detailed analysis is needed, perhaps using the tools described in this book.

The second approach to the analysis of algorithms, popularized by Knuth [17][18][19][20][22], concentrates on precise characterizations of the best-case, worst-case, and average-case performance of algorithms, using a methodology that can be refined to produce increasingly precise answers when desired. A prime goal in such analyses is to be able to accurately predict the performance characteristics of particular algorithms when run on particular computers, in order to be able to predict resource usage, set parameters, and compare algorithms. This approach is scientific: we build mathematical models to describe the performance of real-world algorithm implementations, then use these models to develop hypotheses that we validate through experimentation.
We may view both these approaches as necessary stages in the design and analysis of efficient algorithms. When faced with a new algorithm to solve a new problem, we are interested in developing a rough idea of how well it might be expected to perform and how it might compare to other algorithms for the same problem, even the best possible. The theory of algorithms can provide this. However, so much precision is typically sacrificed in such an analysis that it provides little specific information that would allow us to predict performance for an actual implementation or to properly compare one algorithm to another. To be able to do so, we need details on the implementation, the computer to be used, and, as we see in this book, mathematical properties of the structures manipulated by the algorithm. The theory of algorithms may be viewed as the first step in an ongoing process of developing a more refined, more accurate analysis; we prefer to use the term analysis of algorithms to refer to the whole process, with the goal of providing answers with as much accuracy as necessary.

The analysis of an algorithm can help us understand it better, and can suggest informed improvements. The more complicated the algorithm, the more difficult the analysis. But it is not unusual for an algorithm to become simpler and more elegant during the analysis process. More important, the