computer. Such an abstract model can then be used as the basis for comparing different algorithms. This approach is particularly justified since, in practice, we are often interested in the rate at which the time (space) requirement of a program increases as the size of the input increases. These ideas are fully discussed in section 1.4. For now, let us begin by introducing some terminology regarding the time and space complexity of an algorithm.

n - size: a measure of the quantity of input data for an algorithm. For instance, in the search problem, N, the number of elements in the array, is the size of the problem, whereas in the second example the magnitude of the parameter n should be taken as the size. When deciding on the size of the input, we must ensure that the behaviour of the algorithm will depend on it. Indeed, to be precise, we should take as the size of the input the total number of bits or bytes (units of store) needed to represent it. Therefore, the size of the input for the search problem is actually (log a1 + log a2 + ... + log aN) bits, where the ai values are the elements of the array (logs are to base 2). Similarly, in the sequence problem, the size of the input is actually log n. However, in most real applications the size of numbers is bounded by the word-length of the computer being used, and thus a word of memory is taken as the unit of size. This is why in the search problem we take N as the size of the input. For the second example, however, it is not satisfactory to take the size of the input as one number, since the behaviour of the algorithm for the problem is not a function of the upper bound for the argument n, but is a function of the actual number of bits in the binary representation of n. This is, of course, log n bits. In this case, however, we may assume that a monadic number representation is used, where the number n is represented as a sequence of n 'ones'. Naturally, the actual size of the input would then be n. (Chapter 12 discusses monadic and binary representations and their effects on the analysis of algorithms; see page 340.)

T(n) - time complexity: the time needed by an algorithm to complete execution, as a function of the size of input n.

S(n) - space complexity: the space (memory) needed by an algorithm to complete execution, as a function of the size of input n.

Because of the similarities between time and space, we shall restrict ourselves to time complexity for the moment. For many algorithms the actual time and space complexities depend on the particular input data of size n as well as on the size n itself. For instance, the search function depends not only on N, the size of the array, but also on the search value C. If, for instance, A[1] = C, the function search would be approximately 128 times faster than if A[128] = C or if C were not in the array at all. For this reason, it is useful to distinguish best-case, worst-case and average-case time and space complexities.
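The function search of section 1.1 is not reproduced in this excerpt. A minimal Pascal sketch consistent with the operation counts derived below might look as follows; the type name Table, the array bound 128 and the sentinel result 0 are assumptions introduced here for illustration only, and the array is assumed to be sorted (as the b_search function discussed later requires).

    type
      Table = array [1..128] of integer;

    function search(var A: Table; N, C: integer): integer;
    var
      J: integer;
    begin
      J := 1;                            { one assignment }
      while (A[J] < C) and (J < N) do    { two tests per iteration }
        J := J + 1;                      { one assignment, one addition }
      if A[J] = C then                   { one test }
        search := J                      { one assignment }
      else
        search := 0
    end;

With this shape, A[1] = C ends the loop immediately, while A[128] = C forces the loop body to run 127 times, which is the source of the factor of roughly 128 mentioned above.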
Tmax(n) - worst-case time complexity: the maximum over all inputs of size n.

Tmin(n) - best-case time complexity: the minimum over all inputs of size n.

Tavg(n) - average-case time complexity: the average over all inputs of size n. Usually the assumption is made that all input sequences are equally likely. For instance, for the search problem it is assumed that C may be in any location of the array with equal probability.

The space complexity notation is similar. As we discussed before, we would like to choose an abstract notion of time and relate this to high-level languages and specific algorithms. For instance, for the search algorithm of section 1.1 we could choose the numbers of test, assignment and addition operations required as a measure of the time requirement. This is particularly suitable since a computer takes a constant time for each of these operations. Now we can derive the time complexity of the search function. For simplicity, let us assume that C indeed exists in the array (that is, for some i, 1 ≤ i ≤ 128, A[i] = C). If we use the notations 'a' for assignment, 't' for test and 's' for addition (sum) operation times, then for the function search:

    Tmin(n) = 1a + 2t + 1t + 1a = 2a + 3t

This occurs when the first element of the array contains C. The first assignment is for initialising J, two tests are for the while loop (A[J] < C, J < N), one test is for A[J] = C and finally one assignment to search is needed. The worst case occurs when A[N] = C. Thus

    Tmax(n) = 1a + (n-1)a + (n-1)s + (2n)t + 1t + 1a
            = (n+1)a + (2n+1)t + (n-1)s

For the average case, we need to assume that all sequences of n integers in A are equally probable; that is, the probability that the ith element is C is 1/n, or P(A[i] = C) = 1/n. If A[i] = C, then Ti, the time needed, would be

    Ti = (i+1)a + (2i+1)t + (i-1)s

and

    Tavg(n) = Σ (i=1 to n) P(A[i] = C) × Ti = Σ (i=1 to n) Ti × (1/n) = (1/n) Σ (i=1 to n) Ti
            = (1/n) Σ (i=1 to n) [(i+1)a + (2i+1)t + (i-1)s]
            = ((n+3)/2)a + (n+2)t + ((n-1)/2)s
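As a quick sanity check on this closed form (a sketch, not part of the text), the following program tallies the per-position costs Ti = (i+1)a + (2i+1)t + (i-1)s directly, assuming unit costs a = t = s = 1, and averages them over all n positions of C.

    { Hypothetical check: with a = t = s = 1, Ti = 4i + 1, so the
      average over i = 1..n should come out to 2n + 3. }
    program AvgCheck;
    const
      n = 128;
    var
      i: integer;
      total: real;
    begin
      total := 0;
      for i := 1 to n do
        total := total + (i + 1) + (2*i + 1) + (i - 1);
      writeln('Tavg = ', total/n : 0 : 1)   { prints 259.0 for n = 128 }
    end.

With these unit costs, Tmax(n) = 4n + 1 and Tmin(n) = 5, so the printed value 2n + 3 also equals (Tmax(n) + Tmin(n))/2.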
Note that Tavg(n) = (Tmax(n) + Tmin(n))/2; that is, the average-case analysis is the average of the best-case and worst-case analyses. However, this result cannot be generalised to all algorithms (for instance, see the tree-searching algorithm in chapter 4).

As an abstract measure for space, we can take one integer to be the unit of space, and thus S(n) = 1 for the search function. This is because only one variable, J, is used in the function.

The time and space complexities of the b_search function can be found similarly. The crucial point is that, in the worst case, the body of the while loop is executed log2 n times. This is because initially an array of size n is considered, in the second execution of the loop an array of size n/2 is considered, and so on:

    n, n/2, n/4, ..., 1

This, therefore, implies that the while loop is executed a maximum of log2 n times. It can be shown that

    Tmax(n) = 3(log2 n)t + 2(log2 n)a + 2(log2 n)s + (log2 n)m + (log2 n)d

and

    S(n) = 4

where 'm' and 'd' stand for subtraction (minus) and division respectively. A different way to analyse this function will be discussed in section 1.5. In the following sections we shall show that the growth rate of time complexities is usually more important than absolute time measurements, and we shall develop a notation to express such growth rates.

1.3 Asymptotic Time Complexity - Big O and Big Ω Notations

As can be seen from the formulae of the time complexities of the function search, specifying time complexities can become very complex as we consider other types of operations (for example, logical operations, array access etc.). Usually, in practice, we only require the behaviour of an algorithm when the size of the input gets very large. Furthermore, we are more interested in the rate at which time and space complexities grow than in absolute complexities. For instance, in the search problem, we are more interested in the fact that, when the size of the problem doubles, the time complexity of the function search approximately doubles, whereas the time complexity of the function b_search only increases by a constant amount of time. In such circumstances we can legitimately assume that simple operations such as assignments, tests etc. take the same amount of time, one unit of time, and also, for large n, small coefficients and constants can be eliminated. For instance, we can assert
    search:   Tmax(n) ≈ n units of time
    b_search: Tmax(n) ≈ log2 n units of time

These gross approximations indeed yield the general behaviour of an algorithm for increasingly large values of n (size) rather than exact measures of the time requirement. They indicate the growth rate of time complexities. When n, the size of the input, gets very large, such simplifications yield what are known as the asymptotic complexities.

Big O and Big Ω Notations

We say T(n) = O(f(n)) if there exist constants C and n0 such that T(n) ≤ Cf(n) for all n ≥ n0. In other words, T(n) is bounded above by the function f(n) for large n. T(n) is said to be of the order of f(n). For example

    3n = O(n):             3n ≤ 4n for all n ≥ 1
    n + 1024 = O(n):       n + 1024 ≤ 1025n for all n ≥ 1
    3n² + 5n + 3 = O(n²):  3n² + 5n + 3 ≤ 11n² for all n ≥ 1
    n² = O(n³):            n² ≤ n³ for all n ≥ 1

But n³ ≠ O(n²).

For the search function above, we can find its asymptotic behaviour as

    Tmin(n) = 2a + 3t
            ≤ 2X + 3X     where X = max(a, t)
            = 5X
            ≤ C × 1       where C = 5X (a constant), for all n

Therefore, according to the definition of Big O:

    Tmin(n) = O(1)

For large n, the difference between assignment and test times can therefore be discarded. Similarly

    Tavg(n) = ((n+3)/2)a + (n+2)t + ((n-1)/2)s
            = O(n)a + O(n)t + O(n)s
            = O(n)

and

    Tmax(n) = (n+1)a + (2n+1)t + (n-1)s
            = O(n)a + O(n)t + O(n)s
            = O(n)

Tmin(n) = O(1) is said to be of constant time and Tavg(n) = O(n) is said to be of linear time.
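To make the contrast between linear and logarithmic growth concrete, here is a small demonstration program (an illustration of the claim above, not taken from the text): each time n doubles, the unit count for search doubles, while the count for b_search increases by just one.

    { Hypothetical illustration: tabulate n against log2 n as n
      repeatedly doubles, computing log2 n by repeated halving. }
    program Growth;
    var
      n, k, x: integer;
    begin
      n := 1;
      while n <= 1024 do
      begin
        k := 0;
        x := n;
        while x > 1 do
        begin
          x := x div 2;
          k := k + 1
        end;
        writeln('n = ', n:5, '   search ~ ', n:5, '   b_search ~ ', k:2);
        n := 2 * n
      end
    end.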
To simplify the derivation of asymptotic time complexities of algorithms, certain arithmetic operations can be performed on them. A list of these is given below. For brevity, f and g are used instead of f(n) and g(n) in some places.

    O(f) + O(g) = O(max(f, g))
    O(f) + O(g) = O(f + g)
    O(f) × O(g) = O(f × g)
    If g(n) ≤ f(n) for all n ≥ n0 (for a given n0)
        then O(f) + O(g) = O(f)
    O(c × f(n)) = O(f(n))    (c is a constant)
    f(n) = O(f(n))

Note that Big O is a mechanism for finding an upper bound on the growth rate of the time complexity of an algorithm, and not a least upper bound. This is why the first two formulae above give different upper bounds for O(f) + O(g). These are used in different circumstances depending on what upper bound is to be found (see also the end of this chapter).

The proofs of these rules can be derived from the definition of Big O. For instance

Lemma: If T(n) = O(f(n)) and G(n) = O(g(n)),
then T(n) + G(n) = O(max(f(n), g(n))).

Proof

T(n) = O(f(n)) implies there exist n1 and c1 such that T(n) ≤ c1 f(n) for all n ≥ n1.
G(n) = O(g(n)) implies there exist n2 and c2 such that G(n) ≤ c2 g(n) for all n ≥ n2.

Let us take n3 = max(n1, n2), c3 = max(c1, c2) and h(n) = max(f(n), g(n)) for all n.

Since n3 ≥ n1:   T(n) ≤ c1 f(n) for all n ≥ n3.
Since c3 ≥ c1:   T(n) ≤ c3 f(n) for all n ≥ n3.
Since h(n) ≥ f(n): T(n) ≤ c3 h(n) for all n ≥ n3.
Similarly:       G(n) ≤ c3 h(n) for all n ≥ n3.
Therefore:       T(n) + G(n) ≤ c3 (h(n) + h(n)) = 2 c3 h(n).

Thus T(n) + G(n) = O(f(n)) + O(g(n)) = O(h(n)) where h(n) = max(f(n), g(n)), and the proof is complete. The other rules can be proved in a similar manner.

As an example of using these rules, let us consider the following double-loop Pascal statement:

    for I := 1 to n do
      for J := 1 to I do
        begin S1; S2; S3; S4 end;
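As a brief worked application of these rules: if each of the statements S1, S2, S3 and S4 takes O(1) time, the inner loop body costs O(1) and is executed I times for each value of I, so the total number of executions is

    Σ (I=1 to n) I = n(n+1)/2

and the whole statement therefore takes O(n(n+1)/2) = O(n²/2 + n/2) = O(n²) time, using the rules O(c × f) = O(f) and O(f) + O(g) = O(f) when g(n) ≤ f(n).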