J Appl.Pob.40,513-518(2003) Printed in israel O Applied Probability Trust 2003 THE COUPON-COLLECTOR'S PROBLEM REVISITED SHMUEL OREN * AND SHELDON M. ROSS, "s University of california, Berkeley Abstract Consider the classical coupon-collector's problem in which items of m distinct types arrive in sequence. An arriving item is installed in system i2 I if i is the smallest index uch that system i does not contain an item of the arrival,'s type. We study the expected number of items in system at the moment when system l first contains an item of each Keywords: Coupon-collector's problem; hyperharmonic numbers; Poissonization AMS 2000 Subject Classification: Primary 60C05 1. Introduction Consider the classical coupon-collector's problem with m distinct types of items. The items arrive in sequence, with the types of the successive items being independent random variables that are each equal to k with probability Pk, kai pk= 1. An arriving item is installed in systemis 1 if i is the smallest index such that system i does not contain an item of the arrival,'s type. Let U, j> 2, denote the number of unfilled types in systems when system 1 first contains an item of each type. Foata et al. [2]and Foata and Zeilberger [1], using nonelementary mathematics, obtained recursive formulae and generating functions for E[Um for the equally likely case, where Pk 1/m. In Section 2 we derive, using basic probability, the recursion and a closed-form expression for E[U for the equally likely case. The general case is considered in Section 3 where an exact expression and bounds for E[U are determined. Comments concerning computation, as well as a simulation approach, are also presented in Section 3. 2. The equally likely case Assume, in this section, that all Pk =1/m. Furthermore, assume that the problem ends when system I has one item of each type, and let a denote the event that at least j type-k coupons have arrived. With 1(A)denoting the indicator variable for the event A 1-1(A) eceived 20 January 2003 ial Engineering and Operations Research, University of Califomia, Berkeley Email address: smross@ieor. berkeley. edu Research supported by the National Science Foundation Grant ECS-0224779 with the University of California. 513
J. Appl. Prob. 40, 513-518 (2003) Printed in Israel ? Applied Probability Trust 2003 THE COUPON-COLLECTOR'S PROBLEM REVISITED ILAN ADLER,* SHMUEL OREN * AND SHELDON M. ROSS,* ** University of California, Berkeley Abstract Consider the classical coupon-collector's problem in which items of m distinct types arrive in sequence. An arriving item is installed in system i > 1 if i is the smallest index such that system i does not contain an item of the arrival's type. We study the expected number of items in system j at the moment when system 1 first contains an item of each type. Keywords: Coupon-collector's problem; hyperharmonic numbers; Poissonization AMS 2000 Subject Classification: Primary 60C05 1. Introduction Consider the classical coupon-collector's problem with m distinct types of items. The items arrive in sequence, with the types of the successive items being independent random variables that are each equal to k with probability pk, Ek=l Pk = 1. An arriving item is installed in system i > 1 if i is the smallest index such that system i does not contain an item of the arrival's type. Let UJ, j > 2, denote the number of unfilled types in system j when system 1 first contains an item of each type. Foata et al. [2] and Foata and Zeilberger [ 1], using nonelementary mathematics, obtained recursive formulae and generating functions for E[UT] for the equally likely case, where pk = 1/m. In Section 2 we derive, using basic probability, the recursion and a closed-form expression for E[Uj] for the equally likely case. The general case is considered in Section 3 where an exact expression and bounds for E[Uj] are determined. Comments concerning computation, as well as a simulation approach, are also presented in Section 3. 2. The equally likely case Assume, in this section, that all Pk = 1/m. Furthermore, assume that the problem ends when system 1 has one item of each type, and let Ak denote the event that at least j type-k coupons have arrived. With 1(A) denoting the indicator variable for the event A, m U? = E[1 - 1(Ak)]. k=l Received 10 December 2002; revision received 20 January 2003. * Postal address: Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA 94720, USA. ** Email address: smross@ieor.berkeley.edu Research supported by the National Science Foundation Grant ECS-0224779 with the University of California. 513
. ADLER ETAL E[Um]=>[1-P(A)] [1-P(Am)] (1) Let bj.i denote the event that at least j type-m coupons arrive before the first coupon of type i arrives. Then P(A) Bii and the inclusion-exclusion probability equality give(forj2 2) ∑P(B"n1…Bm Using(1), this gives the following result Next, using basic probability arguments, we obtain a recursive expression for E[U l that was first presented in [l] and [2]. Let C be the event that at least j type-k coupons have already arrived at the moment when each of the item types l,..., k-l has arrived. Also, let X be the number of types 1 ,., k- l that have not yet arrived when the first coupon of type k arrives With PK= P(Ck), we obtain that =∑P P/+ ∑P where pk =(k-1)/k for k=1,2 Substituting A=C for j2 2 into(D)gives E[UM]=m[1-Pil. j>2 Thus, using(2)and (3), we obtain that E[U2]=m
I. ADLER ETAL. Thus, m E[Um] = E[1- P(Ak)] k=l =m[l - P(Am)]. (1) Let Be. denote the event that at least j type-m coupons arrive before the first coupon of J,I type i arrives. Then m-1 P(A) = P( Bji) and the inclusion-exclusion probability equality give (for j > 2) m-1 :(,~)_C-i)k+l P(BTmi P(A7) = (-1)k+ L ... B ) E P(Bj,, * BJ,ik k=1 il<i2<' <ik m-l = ( -1)k+l m-(k I) k=l Using (1), this gives the following result. Proposition 1. For j > 2, m (m) i-j Next, using basic probability arguments, we obtain a recursive expression for E[Uj ] that was first presented in [1] and [2]. Let Ck be the event that at least j type-k coupons have already arrived at the moment when each of the item types 1,..., k - 1 has arrived. Also, let Xk be the number of types 1, ..., k - 1 that have not yet arrived when the first coupon of type k arrives. With Pk = P(Ck), we obtain that k-I k = P (Ck Xk =r) r)(X r r=O k-I -= ? ' pr+l k / -1 r=O k -k E -i (2) r=l where Pk = (k - 1)/k fork = 1, 2,.... Substituting Am = Cm for j > 2 into (1) gives E[Um] = ml - Pm], j 2. (3) Thus, using (2) and (3), we obtain that m r m E[U=] =- m - r - r=l k r=1 k=l 514
515 d,forj≥3, EU/]=m-∑P EIUR-JJ U内+-1 We have thus proven the followin LUNi Remark 1. Equating the two expressions for elU I given by Propositions I and 2 yields explicit expression for the hyperharmonic number, which is defined in [2] by the recursive formula given in Proposition 2 3. The general case: Poissonization In the general case, we suppose that each item is of type k with probability Pk, 2kI Pk=1 To analyze this case, let us start by assuming that, rather than stopping when system l is filled, items continue coming forever. Suppose also that successive items arrive at times distributed according to a Poisson process with rate 1. Under this scenario, the arrival processes of the distinct types are independent Poisson processes, with respective rates Pk, k= 1 Because 1-P(a' denotes the probability that there have been less than j type-k arrivals when system I becomes full, we obtain upon conditioning on the arrival time of the jth item of type k 1-P(A4)= e e Pi)dx (-1)! The expected number of unfilled slots in system is now obtained from ∑Ⅱ-P(A),j≥2 5) k=1 The following lemma will be used to obtain bounds on E[UnI Lemma 1. For positive values xi, Ti=(l -e-xi) is a Schur concave function of y
The coupon-collector's problem revisited and, for j > 3, m E[U7] =m-E Pi- k=l -i = m-E(1 -E[UJ-k I _ k=l E[U_1] m E[O We have thus proven the following. Proposition 2. We have m E[U2 ] = k k=l and, for j > 3, m E[Uk_ ] E[Uj?] =j E k=l Remark 1. Equating the two expressions for E[UJ'] given by Propositions 1 and 2 yields an explicit expression for the hyperharmonic number, which is defined in [2] by the recursive formula given in Proposition 2. 3. The general case: Poissonization In the general case, we suppose that each item is of type k with probability Pk, Lkml Pk = 1. To analyze this case, let us start by assuming that, rather than stopping when system 1 is filled, items continue coming forever. Suppose also that successive items arrive at times distributed according to a Poisson process with rate 1. Under this scenario, the arrival processes of the distinct types are independent Poisson processes, with respective rates Pk, k = 1, ..., m. Because 1 - P(Ak) denotes the probability that there have been less than j type-k arrivals when system 1 becomes full, we obtain upon conditioning on the arrival time of the jth item of type k that 1 P(Ak 0I pkPkX (pkx)'j- 1- P(Aj) pke-PX ( ) n - -e-Pix)dx, j > 2. (4) The expected number of unfilled slots in system j is now obtained from m E[UJ7] = Z[1 - P(A5)], j > 2. (5) k=1 The following lemma will be used to obtain bounds on E[UJ]. Lemma 1. For positive values xi, f1=l (1 - e-xi) is a Schur concave function of y = (Y1, , Yr), where yi = ln(xi). 515
516 L. ADLER ET AL. Proof. With y =In(x) Because In(x)in increasing in x, by the Ostrowski condition for Schur concavity(see [31)it suffices to show that rie ( x2)>x2e-(1-e)ifx1<x2 But this inequality follows because xe /(1-e)is a decreasing function of x Lower and upper bounds on E[U], fairly tight for va (1/m, 1/m,., 1/m), can be obtained from the inequalities (-e-mkrm-I ≤4-en)≤(1-e8) where mk=mini*lpi)and gk=(+k Pi)(m-l. That is, &k is the geometric mean of the alues Pi for i#k. The second inequality of(6) follows from Lemma I We obtain from(4)and (6)that 1-P(A)≤/pe e-gkxm-l x (Pkx) Pk pk Pk where A= rgk Pk. Substituting the preceding inequality into(5)and considering both equalities of() gives EU"1≤∑ (-1) We will now derive a second set of lower and upper bounds for E[U]. Let b, denote the event that at least j coupons of type k arrive before the first of type i arrives. Then, using the 3.2.3 of [SD), we obtain that P(A)=P(∪b P(Bk k((pk+ pi)/(Pk+ pi+ pr))
I. ADLER ETAL. Proof With y = In(x), -(1 -e-X) = xe-X ay Because ln(x) in increasing in x, by the Ostrowski condition for Schur concavity (see [3]) it suffices to show that xle-I (1-e-X2) > x2e-X2(1 -e-X) if XI < x2. But this inequality follows because xe-X /( - e-X) is a decreasing function of x. Lower and upper bounds on E[UJ], fairly tight for values of (Pl, P2, .... P) close to (1/m, l/m, .., l/m), can be obtained from the inequalities (1 - e-mkx)m-1 < nH( - ePix) < (1 - e- ") (6) i#k where mk = minigk{pi) and gk = (f-ifk Pi)l/(m-l) That is, gk is the geometric mean of the values pi for i : k. The second inequality of (6) follows from Lemma 1. We obtain from (4) and (6) that 1 - P(AJ) < ke PkX (p k ' (1 - e-gkx)m-1 dx - (j - 1)I m- p-(rm-+Pk)x (PkX)j- l m= (m- 1)(l)r ke-(rgk+ P k) dx r=O where A = rgk + Pk. Substituting the preceding inequality into (5) and considering both inequalities of (6) gives r= r rgk + Pk (j-)! = ri (m - 1 )(rl( Pk ) . ~0^ ~ ~ ~ = r rgk + Pk Pk where X = rgk ? Pk. Substituting the preceding inequality into (5) and considering both E( r) ) r +pk < E[Um] < EI)r ) r )rmk + Pk (r - rgl Pk r=0 k=l r=O k=lrgk?- (7) We will now derive a second set of lower and upper bounds for E[UJ ]. Let Bki denote the event that at least j coupons of type k arrive before the first of type i arrives. Then, using the conditional expectation inequality (Proposition 3.2.3 of [5]), we obtain that P(A?) - P(U Bj,i P(Bk, ) > p(Bj i) (8) i1 + P(M i 1 + Er:i,k J,r lB,) jk _ 1yF ? (Pk/(Pk + Pi))j i~k - r+ ri,k((Pk + Pi)/(Pk + Pi + Pr))' 516
he coupon-collector's problem revisited 517 where(8)follows from the conditional expectation inequality and(9)from P(Bk. B P时,B)=P(B (pk/(pk+pi+pr)j (Pk/(Pk+Pi) pk+ pi Pk+ Pi+ pr Therefore. we obtain our second upper bound for E[1=∑11-P(A分小 E[U]≤ (Pk/(Pk +Pi)) +ik(Pk+ Pi//(Pk+Pi+pr) To obtain a lower bound, let X; denote the time of the first type-i event, and let denote the time of the j th type-k event in the Poissonization scheme(which results in T and Xifor i# k being independent). Then, from(4) P(4=E∏ Using the well-known result that elf(X)g(X)> elf(X)elg(X)] whenever f and g ar increasing functions [4, p. 339], which easily generalizes to the product of any number of positive increasing functions, the preceding equation yields that P(45≥E-cr =∏IP(T>X) ∏[-P(T<X) pk Pi t pk Thus, we have the lower bound E[Um]≥ k=1i≠k Pi t pk Remark 2. (i) Our computational experiments verify that the bounds given in(7)work well for probabilities Pi which are roughly the same, while the bounds given in(10)and(11)are erw (ii) For the equal-probabilities case, the explicit expression for E[Um ]of Proposition 1 is faster to compute than the recursive expression of Proposition 2. However, for large m(say m> 150). the explicit expression(but not the recursive one) is computationally unstable
The coupon-collector's problem revisited where (8) follows from the conditional expectation inequality and (9) from P(BkrBki) P(Bj k Bk ) = n' ,ir j,r ,i' P(Bkj P(Bj,) (Pk/(Pk + Pi + Pr))j (Pk/(Pk + Pi))' = ( Pk + Pi ) Pk + Pi + Pr J Therefore, we obtain our second upper bound for E[Uj ] = km1[1 - P(Ak)]: E[Ujm] <m-L m ?(Pk/(Pk+ PiW (10) JE[? < m Ek - 1 + Erfi,k(Pk + Pi)J/(Pk + Pi + Pr (10) To obtain a lower bound, let Xi denote the time of the first type-i event, and let Tk denote the time of the jth type-k event in the Poissonization scheme (which results in Tk and Xi for i = k being independent). Then, from (4), 1-P(A5)=E fl(-e-PT'k -i:k Using the well-known result that E[f(X)g(X)] > E[f(X)]E[g(X)] whenever f and g are increasing functions [4, p. 339], which easily generalizes to the product of any number of positive increasing functions, the preceding equation yields that 1-P(Ak) > E[1-e PiT"] ifk = P(Tk > Xi) i#k = H[ - p(Tf < Xi)] i#k i9k- t-k -( Pi +Pk J Thus, we have the lower bound k= ik - ( ) Pk iO urputi l i if i i k Remark 2. (i) Our computational experiments verify that the bounds given in (7) work well for probabilities pi which are roughly the same, while the bounds given in (10) and (11) are tighter otherwise. (ii) For the equal-probabilities case, the explicit expression for E[Uj ] of Proposition 1 is faster to compute than the recursive expression of Proposition 2. However, for large m (say m > 150), the explicit expression (but not the recursive one) is computationally unstable. 517