bounds)estimated by the existing results can be arbitrarily is no larger than large depending on either the number of groups [9],the M 1 number of levels [10],or the parameter of the Zipf distribution K'(1- N1+K证 (9) [13].It is remarkable that such a simple coded caching N scheme,with a very simple choice of Ni,can achieve such a Now,suppose that the individual cache size M is much smaller strong performance guarantee,independently of the popularity than Ni,and the global cache size K'M is much larger than distribution. N(note that this is precisely the regime where coded caching As we mentioned earlier,although both our lower bound and will be most helpful[7).Then,we have 1-÷≈1and achievable bound have some similarity to those in [13],our 1+MM.Thus,the expression in (9)is approximately lower bound is sharper in the order sense because it accounts equal to N/M.The significance of this observation is that this for the contribution of unpopular files,and our achievable approximated expression is independent of K.In other words, bound is also tighter in the order sense because it divides in a suitable regime of interest,the exact popularity of the the files into 3 groups under some parameter settings.We "popular files"does not seem to matter!It is then plausible to believe that such difference is the main reason why we can argue that,even when we reduce the popularity values to attain constant-factor performance guarantees independent of in System 2,there is no substantial change in the lower-bound the popularity distribution,while the performance gap in [13] performance.Of course,this argument needs to be carefully only holds when the file popularity follows a Zipf distribution made.Further,we have to account for not only popular files, and when the system sizes approach infinity,and even then the but also unpopular files.The proofs in the next section will performance gap in [13]still depends on the parameter a of make this intuition precise. Zipf distribution(which may approach infinity as a approaches 1).A brief comparison with [13]is presented in Table I. IV.LOWER BOUND ON THE EXPECTED RATE We also note that,for certain ranges of the exponent a of the Zipf distributions,the performance characterization in [13] In this section,we present the proof of Theorem 1,i.e.,the for RLFU may be tighter than the 55 factor reported in(8). lower bound. Further,both our achievable scheme and RLFU are special The proof consists of two parts.Subsections A-C focus on cases of the larger class of RAP-GCC schemes reported in popular files 1 to Ni,and prove the part that R(K,F,P)> [13].Thus,the results in [13]and in this paper combined (N-M).where Mr =max{M,3).This proof is provide a more complete characterization of the performance composed of 5 steps.From the first step to the fourth one, guarantees for coded caching schemes across both Zipf and we map the original system into a series of reduced systems. non-Zipf distributions. whose information-theoretical rate is strictly smaller than previous ones.Then,we bound the rate needed for popular files in the fourth step.We note that the ideas used in the A.Main Intuition 1st to 4th reduction steps are similar to [9][13].In the fifth step,we introduce a novel "merging"technique in Subsection Before we present the proofs for these main results,we D to account for the unpopular files and prove the part that would like to illustrate the main intuition behind.First,con- R(KF,P)(N+N2-M).Finally,in Subsection sider only the "popular files"1 to N.i.e.,assuming that the E,we deal with the case when N+N2<M,and establish unpopular files N+1 to N are removed.Let us refer to the second term in(6).A brief summary of the constructed this system as "System 1".In our proof,we will consider systems are presented in Table II.As we discussed earlier, an alternate system where the popularity of all popular files while some of the techniques for quantifying the impact of is reduced toWe will refer to this alterate system 1 popular files are similar to [913],our treatment of unpopular as "System 2"(see Section IV-A).Intuitively,the average files is new and is the key reason for the sharper constant- transmission rate in System 2 is no larger than that in System factor characterization in our paper. 1.Further,since all files are with the same popularity in System 2,the average-case and the worst-case performance will not differ too much [9].Thus,one can then use System A.Reduction Steps I 2 2 to derive a lower bound on the average transmission rate, Recall that the set of files is given byF=[F1,F2,...,F} and compare it with an upper bound attained by an achievable and their popularity distribution is given by p scheme pi,p2,..,PN).Next,we will compare to a series of reduced However,the potential problem of this argument is that, systems with different sets of files and popularity distributions. when we reduce the popularity of all popular files to i. Again,let N be the integer defined in Theorem 1. some popularity values could be reduced by several orders of In the first constructed system,the set of files is given by magnitude.It is then unclear why the lower bound derived F={Fo,F1,F2,...,FN},where Fo denotes the empty file, from System 2 is still a reasonable lower bound for System which is introduced for ease of presentation.Its correspond- 1.The intuition behind this insensitivity can be explained as ing popularity distribution is P1={po,P1,p2,...,PN}and follows.Suppose that there are Kusers in System 1 that po-1-P In other words,we replace all unpopular request any of the popular files.Then,according to the result files FN+1,...,FN in the original system by the empty file in [7],the worst-case transmission rate to serve these K'users Fo,and reassign their popularity all to Fo.Intuitively,the
6 bounds) estimated by the existing results can be arbitrarily large depending on either the number of groups [9], the number of levels [10], or the parameter of the Zipf distribution [13]. It is remarkable that such a simple coded caching scheme, with a very simple choice of N1, can achieve such a strong performance guarantee, independently of the popularity distribution. As we mentioned earlier, although both our lower bound and achievable bound have some similarity to those in [13], our lower bound is sharper in the order sense because it accounts for the contribution of unpopular files, and our achievable bound is also tighter in the order sense because it divides the files into 3 groups under some parameter settings. We believe that such difference is the main reason why we can attain constant-factor performance guarantees independent of the popularity distribution, while the performance gap in [13] only holds when the file popularity follows a Zipf distribution and when the system sizes approach infinity, and even then the performance gap in [13] still depends on the parameter α of Zipf distribution (which may approach infinity as α approaches 1). A brief comparison with [13] is presented in Table I. We also note that, for certain ranges of the exponent α of the Zipf distributions, the performance characterization in [13] for RLFU may be tighter than the 55 factor reported in (8). Further, both our achievable scheme and RLFU are special cases of the larger class of RAP-GCC schemes reported in [13]. Thus, the results in [13] and in this paper combined provide a more complete characterization of the performance guarantees for coded caching schemes across both Zipf and non-Zipf distributions. A. Main Intuition Before we present the proofs for these main results, we would like to illustrate the main intuition behind. First, consider only the “popular files” 1 to N1, i.e., assuming that the unpopular files N1 + 1 to N are removed. Let us refer to this system as “System 1”. In our proof, we will consider an alternate system where the popularity of all popular files is reduced to 1 KMr . We will refer to this alternate system as “System 2” (see Section IV-A). Intuitively, the average transmission rate in System 2 is no larger than that in System 1. Further, since all files are with the same popularity in System 2, the average-case and the worst-case performance will not differ too much [9]. Thus, one can then use System 2 to derive a lower bound on the average transmission rate, and compare it with an upper bound attained by an achievable scheme. However, the potential problem of this argument is that, when we reduce the popularity of all popular files to 1 KMr , some popularity values could be reduced by several orders of magnitude. It is then unclear why the lower bound derived from System 2 is still a reasonable lower bound for System 1. The intuition behind this insensitivity can be explained as follows. Suppose that there are K′ users in System 1 that request any of the popular files. Then, according to the result in [7], the worst-case transmission rate to serve these K′ users is no larger than K′ (1 − M N1 ) 1 1 + K′M N1 . (9) Now, suppose that the individual cache size M is much smaller than N1, and the global cache size K′M is much larger than N1 (note that this is precisely the regime where coded caching will be most helpful [7]). Then, we have 1 − M N1 ≈ 1 and 1+ K′M N1 ≈ K′M N1 . Thus, the expression in (9) is approximately equal to N1/M. The significance of this observation is that this approximated expression is independent of K′ . In other words, in a suitable regime of interest, the exact popularity of the “popular files” does not seem to matter! It is then plausible to argue that, even when we reduce the popularity values to 1 KMr in System 2, there is no substantial change in the lower-bound performance. Of course, this argument needs to be carefully made. Further, we have to account for not only popular files, but also unpopular files. The proofs in the next section will make this intuition precise. IV. LOWER BOUND ON THE EXPECTED RATE In this section, we present the proof of Theorem 1, i.e., the lower bound. The proof consists of two parts. Subsections A-C focus on popular files 1 to N1, and prove the part that R(K, F,P) ≥ 1 11Mr (N1 − M), where Mr = max{M, 3}. This proof is composed of 5 steps. From the first step to the fourth one, we map the original system into a series of reduced systems, whose information-theoretical rate is strictly smaller than previous ones. Then, we bound the rate needed for popular files in the fourth step. We note that the ideas used in the 1st to 4th reduction steps are similar to [9][13]. In the fifth step, we introduce a novel “merging” technique in Subsection D to account for the unpopular files and prove the part that R(K, F,P) ≥ 1 11Mr (N1 + N2 − M). Finally, in Subsection E, we deal with the case when N1 + N2 ≤ M, and establish the second term in (6). A brief summary of the constructed systems are presented in Table II. As we discussed earlier, while some of the techniques for quantifying the impact of popular files are similar to [9][13], our treatment of unpopular files is new and is the key reason for the sharper constantfactor characterization in our paper. A. Reduction Steps 1 & 2 Recall that the set of files is given by F = {F1, F2, ..., FN } and their popularity distribution is given by P = {p1, p2, ..., pN }. Next, we will compare to a series of reduced systems with different sets of files and popularity distributions. Again, let N1 be the integer defined in Theorem 1. In the first constructed system, the set of files is given by F1 = {F0, F1, F2, ..., FN1 }, where F0 denotes the empty file, which is introduced for ease of presentation. Its corresponding popularity distribution is P1 = {p0, p1, p2, ..., pN1 } and p0 = 1 − PN1 i=1 pi . In other words, we replace all unpopular files FN1+1, ..., FN in the original system by the empty file F0, and reassign their popularity all to F0. Intuitively, the
TABLE II:Brief introduction of constructed systems satisfying property (a).We then map Wi to another request Systems Definition W:,such that the first N elements in W;remain the same as Map the“unpopular”files from N+1to in W:,while other elements are mapped to the empty file.Let System 1 N to a virtual empty file with their sum be the transmission rate to serve the request pattern W.. probability. By such a construction,we now verify property (b)holds. Reduce the popularity of files F1...FN to We wish to show that the probability with which the mapped System 2 K.Some users may request a same file. request patterns is W!={f1,f2...fk},where fk is the With probability (K1),all users request file requested by user k,is the same as the probability with the empty file.With probability 1-(K1), which the same pattern Wi=[f1,f2,...,fr}is requested in System 3 W(K,F1,P1).We first fix a user k and compare the probabili- exactly K3 users request K3 distinct files, ty that user k requests file fk.If f E [F1,F2,...,FN},since and other users request empty files. we did not change their popularity,the probability pw(f) In each request pattern,exactly K3 users request K3 distinct files,and other users that user k requests fk in the mapped request pattern is the System 4 request empty files. same as the probability pw,(f)that user k requests fr in W(K,F1,P1).If fr=0,according to our construction,the (K,F3,P3) Reduce the the popularity of files F1....,FN probability pw(f)that user k requests fr in the mapped in(K,F,P)to立 request pattern is equal to the probability that user k re- (K,F4,P) Merge the unpopular files into virtual files quests any file in [FN+1,...,FN}in W(K,F,P).which is with popularity 2KM and <KM. NPo Hence,it is also the same as the probability Reduce the probability of all files (except the pw,(f)that user k requests the empty file in W(K,F1,P1). (K,F4:Ps) empty file)towhich is similar to the Finally,the probability that the mapped request pattem is W, System 2. 不 P(W={i,,fK})= IIpw:(fe) new system should require a lower transmission rate than the k=1 original system,which is stated in the following lemma. Lemma 1:Let R(K,F1,P1)be the minimum expected rate Πpw,(f) required to meet the requests by the K users,each of which k=1 randomly requests a file in F1 according to the popularity =P(Wj={f1,....fk}) distribution P1.We have Thus,W:has the same distribution as Wi,and hence r and R(K,F,P)≥R(K,F,P) (10) r'have the same distribution,satisfying property (b). Proof:For a given cache placement and transmission Further,for any caching and transmission scheme that can scheme 3,let r=r(Wi)be the transmission rate for satisfy users'request Wi.it must can satisfy W,since W the random request pattern Wi in W(K,F,P),where Wi can be seen as a subset of Wi.Hence,the rate to serve the is chosen randomly in W(K,F,P).Note that since Wi is request pattern W!is clearly no larger than the rate to serve random,r is also a random variable.Further,the average the request pattern Wi.Thus we have r'almost surely, transmission rate is R3(K,F,P)=E[r].Similarly,let satisfying property (c).The result of the lemma then follows. r=r(Wi)denote the transmission rate,where Wi is ■ chosen randomly from the new distribution in W(K,F1,P). We next create another new system by a further adjustment Again,R(K,F1,P1)=Er'].Note that the two expectations on the tuple (K,F1,P1).Note that N<KM.Otherwise, corresponding to two distributions,driven by W(K,F,P)and we will have∑1A>KM,~pNa≥1,which is a contra- W(K,F1,P1),respectively.Therefore,r and r'cannot be diction to1.Define a new popularity distribution directly compared in a pointwise manner,e.g.,the number P2=(1-ki Kit:Kht:Kir}over the files F1.In of request patterns in W(K,F,P)is larger than that in other words,compared to (K,F1,P1),in this new system, W(K,F1,P1). each non-empty file is requested with a smaller probability In the following,we will show that r'sr,i.e.,r'is KMIntuitively,its expected transmission rate should be even stochastic dominated by r.To do so,we will use Theorem 3.1 lower,which is stated below. in [19].Specifically,we need to find two coupled variablesf Lemma 2:Let R(K,F1,P2)be the minimum expected rate and r with the following properties: required to meet the requests by K users,each of which (a)r and f have the same distribution; randomly requests a file in Fi according to the popularity (b)r'and r'have the same distribution; distribution P2.We have (c)r≥ralmost surely.. Then,applying Theorem 3.1 in [19].we can conclude that R(K,F1,P1)>R(K,F1,P2). (11) r'<D r and E(r)=E(f)>E(r)=E(r'). This lemma is similar to the uniformization argument in It remains to show the existence of and which are the proof of Claim 2 in [9].The proof can also use a similar constructed as follows.For every WiE W(K,F,P),we let coupling idea as in Lemma 1.For completeness,we provide r=r and thus they must have exactly the same distribution, the proof in the appendix
7 TABLE II: Brief introduction of constructed systems Systems Definition System 1 Map the “unpopular” files from N1 + 1 to N to a virtual empty file with their sum probability. System 2 Reduce the popularity of files F1,..., FN1 to 1 KMr . Some users may request a same file. System 3 With probability π(K1), all users request the empty file. With probability 1 − π(K1), exactly K3 users request K3 distinct files, and other users request empty files. System 4 In each request pattern, exactly K3 users request K3 distinct files, and other users request empty files. (K, F3,P3) Reduce the the popularity of files F1,..., FN1 in (K, F,P) to 1 KMr . (K, F4,P4) Merge the unpopular files into virtual files with popularity ≥ 1 KMr and < 2 KMr . (K, F4,P5) Reduce the probability of all files (except the empty file) to 1 KMr , which is similar to the System 2. new system should require a lower transmission rate than the original system, which is stated in the following lemma. Lemma 1: Let R(K, F1,P1) be the minimum expected rate required to meet the requests by the K users, each of which randomly requests a file in F1 according to the popularity distribution P1. We have R(K, F,P) ≥ R(K, F1,P1). (10) Proof: For a given cache placement and transmission scheme F, let r = rF(Wi) be the transmission rate for the random request pattern Wi in W(K, F,P), where Wi is chosen randomly in W(K, F,P). Note that since Wi is random, r is also a random variable. Further, the average transmission rate is RF(K, F,P) = E[r]. Similarly, let r ′ = rF(Wj ) denote the transmission rate, where Wj is chosen randomly from the new distribution in W(K, F1,P1). Again, RF(K, F1,P1) = E[r ′ ]. Note that the two expectations corresponding to two distributions, driven by W(K, F,P) and W(K, F1,P1), respectively. Therefore, r and r ′ cannot be directly compared in a pointwise manner, e.g., the number of request patterns in W(K, F,P) is larger than that in W(K, F1,P1). In the following, we will show that r ′ ≤D r, i.e., r ′ is stochastic dominated by r. To do so, we will use Theorem 3.1 in [19]. Specifically, we need to find two coupled variables rˆ and rˆ′ with the following properties: (a) r and rˆ have the same distribution; (b) r ′ and rˆ′ have the same distribution; (c) rˆ ≥ rˆ′ almost surely. Then, applying Theorem 3.1 in [19], we can conclude that r ′ ≤D r and E(r) = E(ˆr) ≥ E(rˆ′) = E(r ′ ). It remains to show the existence of rˆ and rˆ′ , which are constructed as follows. For every Wi ∈ W(K, F,P), we let rˆ = r and thus they must have exactly the same distribution, satisfying property (a). We then map Wi to another request W′ i , such that the first N1 elements in Wi remain the same as in W′ i , while other elements are mapped to the empty file. Let rˆ ′ be the transmission rate to serve the request pattern W′ i . By such a construction, we now verify property (b) holds. We wish to show that the probability with which the mapped request patterns is W′ i = {f1, f2, ..., fK}, where fk is the file requested by user k, is the same as the probability with which the same pattern Wj = {f1, f2, ..., fK} is requested in W(K, F1,P1). We first fix a user k and compare the probability that user k requests file fk. If fk ∈ {F1, F2, ..., FN1 }, since we did not change their popularity, the probability pW′ i (fk) that user k requests fk in the mapped request pattern is the same as the probability pWj (fk) that user k requests fk in W(K, F1,P1). If fk = ∅, according to our construction, the probability pW′ i (fk) that user k requests fk in the mapped request pattern is equal to the probability that user k reP quests any file in {FN1+1, ..., FN } in W(K, F,P), which is N i=N1+1 pi = p0. Hence, it is also the same as the probability pWj (fk) that user k requests the empty file in W(K, F1,P1). Finally, the probability that the mapped request pattern is W′ i , is P(W′ i = {f1, ..., fK}) = Y K k=1 pW′ i (fk) = Y K k=1 pWj (fk) = P(Wj = {f1, ..., fK}). Thus, W′ i has the same distribution as Wj , and hence rˆ′ and r ′ have the same distribution, satisfying property (b). Further, for any caching and transmission scheme that can satisfy users’ request Wi , it must can satisfy W′ i , since W′ i can be seen as a subset of Wi . Hence, the rate to serve the request pattern W′ i is clearly no larger than the rate to serve the request pattern Wi . Thus we have rˆ ≥ rˆ′ almost surely, satisfying property (c). The result of the lemma then follows. We next create another new system by a further adjustment on the tuple (K, F1,P1). Note that N1 ≤ KMr. Otherwise, we will have PN i=1 pi > KMr · pN1 ≥ 1, which is a contradiction to PN i=1 pi = 1. Define a new popularity distribution P2 = {1 − N1 KMr , 1 KMr , 1 KMr , ..., 1 KMr } over the files F1. In other words, compared to (K, F1,P1), in this new system, each non-empty file is requested with a smaller probability 1 KMr . Intuitively, its expected transmission rate should be even lower, which is stated below. Lemma 2: Let R(K, F1,P2) be the minimum expected rate required to meet the requests by K users, each of which randomly requests a file in F1 according to the popularity distribution P2. We have R(K, F1,P1) ≥ R(K, F1,P2). (11) This lemma is similar to the uniformization argument in the proof of Claim 2 in [9]. The proof can also use a similar coupling idea as in Lemma 1. For completeness, we provide the proof in the appendix