Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity Yitong Yin* Department of Computer Science,Yale University yitong.yin@yale.edu. Abstract.We study the nondeterministic cell-probe complexity of static data structures.We introduce cell-probe proofs (CPP),a proof system for the cell-probe model,which describes verifications instead of compu- tations in the cell-probe model.We present a combinatorial characteri- zation of CPP.With this novel tool,we prove the following lower bounds for the nondeterministic cell-probe complexity of static data structures: We show that there exist data structure problems which have super- constant nondeterministic cell-probe complexity.In particular,we show that for the exact nearest neighbor search(NNS)problem or the partial match problem in high dimensional Hamming space,there does not exist a static data structure with Poly(n)cells,each of which contains n(1)bits,such that the nondeterministic cell-probe complexity is O(1),where n is the number of points in the data set for the NNS or partial match problem. For the polynomial evaluation problem,if single-cell nondeterminis- tic probes are sufficient,then either the size of a single cell is close to the size of the whole polynomial,or the total size of the data structure is close to that of a naive data structure that stores results for all possible queries. 1 Introduction We study the problem of nondeterministic cell-probe complexity of static data structures. Given a set Y of data instances,and a set X of possible queries,a data structure problem can be abstractly defined as a function f mapping each pair consisting of a query r E X and a data instance y E Y to an answer.One of the most well-studied examples of data structure problems is the "membership query":X=[m]is a data universe,Y=(m),and f(,y)=1 if ey and f(,y)=0 if otherwise. There are some other important examples of data structure problems: Exact nearest neighbor search (NNS):given a metric space U,let X =U and Y =()and for every zEX and yEy,f(z,y)is defined as the closest point to x in y according to the metric. Supported by a Kempner Foundation Fellowship and NSF grant CNS-0435201
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity Yitong Yin? Department of Computer Science, Yale University yitong.yin@yale.edu. Abstract. We study the nondeterministic cell-probe complexity of static data structures. We introduce cell-probe proofs (CPP), a proof system for the cell-probe model, which describes verifications instead of computations in the cell-probe model. We present a combinatorial characterization of CPP. With this novel tool, we prove the following lower bounds for the nondeterministic cell-probe complexity of static data structures: – We show that there exist data structure problems which have superconstant nondeterministic cell-probe complexity. In particular, we show that for the exact nearest neighbor search (NNS) problem or the partial match problem in high dimensional Hamming space, there does not exist a static data structure with Poly(n) cells, each of which contains n o(1) bits, such that the nondeterministic cell-probe complexity is O(1), where n is the number of points in the data set for the NNS or partial match problem. – For the polynomial evaluation problem, if single-cell nondeterministic probes are sufficient, then either the size of a single cell is close to the size of the whole polynomial, or the total size of the data structure is close to that of a naive data structure that stores results for all possible queries. 1 Introduction We study the problem of nondeterministic cell-probe complexity of static data structures. Given a set Y of data instances, and a set X of possible queries, a data structure problem can be abstractly defined as a function f mapping each pair consisting of a query x ∈ X and a data instance y ∈ Y to an answer. One of the most well-studied examples of data structure problems is the “membership query”: X = [m] is a data universe, Y = [m] n , and f(x, y) = 1 if x ∈ y and f(x, y) = 0 if otherwise. There are some other important examples of data structure problems: Exact nearest neighbor search (NNS): given a metric space U, let X = U and Y = U n , and for every x ∈ X and y ∈ Y , f(x, y) is defined as the closest point to x in y according to the metric. ? Supported by a Kempner Foundation Fellowship and NSF grant CNS-0435201
Yitong Yin Partial match:=(01,)4,Y=((0)),and f(,y)e11}such that for every z∈X and y∈Y,f(x,y)=1 if and only if there exists z∈y having either xi =zi or i=*for every i. Polynomial evaluation:X=2k is a finite field,Y=2kd is the set of all (d-1)-degree polynomials over the finite field 2,and f(x,y)returns the value of y(r). A classic computational model for static data structures is the cell-probe model [11].For each data instance y,a table of cells is constructed to store y. This table is called a static data structure for some problem f.Upon a query an all-powerful algorithm tries to compute f(r,y),based on adaptive random access (probes)to the cells. The cell-probe model is a clean and general model for static data structures and serves as a great tool for the study of lower bounds.Previous research on static data structures in the cell-probe model has focused on the complexity of adaptive cell-probes.In this work.we focus on the complexity of nondeterministic cell-probes and the tradeoff between the number of probes needed and with space.We speculate that it is an important problem because:(1)in considering the complexity of data structures,nondeterminism is a very natural extension to the cell-probe model;(2)instead of adaptive computations,nondeterministic cell-probes capture the question of verification,which is a natural and important aspect of data structures. Although nondeterministic cell-probe complexity is an important problem, there are few general tools and techniques for studying it,especially for the case of static data structures.In fact,because of the great generality of the cell- probe model,even for deterministic cell-probe complexity,super-constant lower bounds for static data structures are rare.Nondeterminism grants the cell-probe model extra power and makes non-trivial lower bounds even rarer.For many standard examples of data structure problems,such as membership query,it is easy to construct a data structure that has standard space usage and constant nondeterministic cell-probe complexity. It is thus worth asking whether there exists any data structure problem such that in data structures with feasible sizes (polynomial in the size of data set), the nondeterministic cell-probe complexity is super-constant.More importantly, it calls for a general technique to prove lower bounds on the nondeterministic cell-probe complexity of static data structures. 1.1 Our contribution In this paper,we initiate the study of nondeterministic cell-probe complexity for static data structures.As a first step,we characterize the power of a single- cell nondeterministic probe.Although at first glance this may seem like a very restricted case,by applying a trivial parameter reduction,we show that the case of a single-cell probe is actually a canonical case for all nondeterministic cell- probe mechanisms,and is thus sufficient to prove super-constant lower bounds for general nondeterministic cell-probe mechanisms
2 Yitong Yin Partial match: X = {0, 1, ∗}d , Y = {0,1} d n , and f(x, y) ∈ {0, 1} such that for every x ∈ X and y ∈ Y , f(x, y) = 1 if and only if there exists z ∈ y having either xi = zi or xi = ∗ for every i. Polynomial evaluation: X = 2k is a finite field, Y = 2kd is the set of all (d − 1)-degree polynomials over the finite field 2k , and f(x, y) returns the value of y(x). A classic computational model for static data structures is the cell-probe model [11]. For each data instance y, a table of cells is constructed to store y. This table is called a static data structure for some problem f. Upon a query x, an all-powerful algorithm tries to compute f(x, y), based on adaptive random access (probes) to the cells. The cell-probe model is a clean and general model for static data structures and serves as a great tool for the study of lower bounds. Previous research on static data structures in the cell-probe model has focused on the complexity of adaptive cell-probes. In this work, we focus on the complexity of nondeterministic cell-probes and the tradeoff between the number of probes needed and with space. We speculate that it is an important problem because: (1) in considering the complexity of data structures, nondeterminism is a very natural extension to the cell-probe model; (2) instead of adaptive computations, nondeterministic cell-probes capture the question of verification, which is a natural and important aspect of data structures. Although nondeterministic cell-probe complexity is an important problem, there are few general tools and techniques for studying it, especially for the case of static data structures. In fact, because of the great generality of the cellprobe model, even for deterministic cell-probe complexity, super-constant lower bounds for static data structures are rare. Nondeterminism grants the cell-probe model extra power and makes non-trivial lower bounds even rarer. For many standard examples of data structure problems, such as membership query, it is easy to construct a data structure that has standard space usage and constant nondeterministic cell-probe complexity. It is thus worth asking whether there exists any data structure problem such that in data structures with feasible sizes (polynomial in the size of data set), the nondeterministic cell-probe complexity is super-constant. More importantly, it calls for a general technique to prove lower bounds on the nondeterministic cell-probe complexity of static data structures. 1.1 Our contribution In this paper, we initiate the study of nondeterministic cell-probe complexity for static data structures. As a first step, we characterize the power of a singlecell nondeterministic probe. Although at first glance this may seem like a very restricted case, by applying a trivial parameter reduction, we show that the case of a single-cell probe is actually a canonical case for all nondeterministic cellprobe mechanisms, and is thus sufficient to prove super-constant lower bounds for general nondeterministic cell-probe mechanisms
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity 3 We introduce cell-probe proofs.a proof system in the cell-probe model.This notion of proofs corresponds to considering verifications instead of computations in the cell-probe model.Unlike the fully adaptive computations in the traditional cell-probe model,the formulation of cell-probe proofs shows a combinatorial simplicity.We introduce a combinatorial structure that fully characterizes which problems have single-cell proofs,and general cell-probe proofs are reduced to this case. With these novel tools,we show following lower bounds on nondeterministic cell-probe complexity: We show that there exist static data structure problems with super-constant nondeterministic cell-probe complexity.In particular,we show that for the exact nearest neighbor search(NNS)problem or partial match problem in high dimensional Hamming space,there does not exist a static data structure with Poly(n)cells,each of which contains n()bits,such that the nonde- terministic cell-probe complexity is O(1),where n is the number of points in the data set for the NNS or partial match problem. -For the polynomial evaluation problem,if for a static data structure,the single-cell nondeterministic probes are sufficient to answer queries,then ei- ther the size of the single cell is close to the size of the whole polynomial,or the total size of the data structure is close to that of the naive data structure that stores results for all possible queries. 1.2 Related work To the best of our knowledge,there is no general technique for proving lower bounds for nondeterministic cell-probe complexity of static data structures.Nor do there exist any non-trivial lower bounds for this question.Previous work on static data structures in the cell-probe model have focused on the complexity of adaptive cell-probes.The most important tool for proving such lower bounds is asymmetric communication complexity as introduced by Miltersen et al.in [10]. In [6],Fredman and Saks introduce the chronogram method.This powerful technique is specialized for proving the query /update trade-off for dynamic data structures,especially for the problems which are hard only in the dynamic case. It is worth noting that the chronogram method can prove nondeterministic lower bounds for certain dynamic data structure problems.This is formally addressed by Husfeldt and Rauhe in [7],and recently by Demaine and Patrascu in [5]. However,as pointed in [7],this is only a by-product of the nondeterministic nature of chronogram method and can only yield amortized query/update trade- offs for dynamic data structure problems with a certain property.Because of the unique structure of the chronogram method,this technique can not be utilized to prove lower bounds for static data structures. 2 Cell-probe proofs A static data structure problem or just data structure problem,is rep resented as a boolean function f:Xx Y10,1}.For the purposes of proving
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity 3 We introduce cell-probe proofs, a proof system in the cell-probe model. This notion of proofs corresponds to considering verifications instead of computations in the cell-probe model. Unlike the fully adaptive computations in the traditional cell-probe model, the formulation of cell-probe proofs shows a combinatorial simplicity. We introduce a combinatorial structure that fully characterizes which problems have single-cell proofs, and general cell-probe proofs are reduced to this case. With these novel tools, we show following lower bounds on nondeterministic cell-probe complexity: – We show that there exist static data structure problems with super-constant nondeterministic cell-probe complexity. In particular, we show that for the exact nearest neighbor search (NNS) problem or partial match problem in high dimensional Hamming space, there does not exist a static data structure with Poly(n) cells, each of which contains n o(1) bits, such that the nondeterministic cell-probe complexity is O(1), where n is the number of points in the data set for the NNS or partial match problem. – For the polynomial evaluation problem, if for a static data structure, the single-cell nondeterministic probes are sufficient to answer queries, then either the size of the single cell is close to the size of the whole polynomial, or the total size of the data structure is close to that of the naive data structure that stores results for all possible queries. 1.2 Related work To the best of our knowledge, there is no general technique for proving lower bounds for nondeterministic cell-probe complexity of static data structures. Nor do there exist any non-trivial lower bounds for this question. Previous work on static data structures in the cell-probe model have focused on the complexity of adaptive cell-probes. The most important tool for proving such lower bounds is asymmetric communication complexity as introduced by Miltersen et al. in [10]. In [6], Fredman and Saks introduce the chronogram method. This powerful technique is specialized for proving the query/update trade-off for dynamic data structures, especially for the problems which are hard only in the dynamic case. It is worth noting that the chronogram method can prove nondeterministic lower bounds for certain dynamic data structure problems. This is formally addressed by Husfeldt and Rauhe in [7], and recently by Demaine and P˘atra¸scu in [5]. However, as pointed in [7], this is only a by-product of the nondeterministic nature of chronogram method and can only yield amortized query/update tradeoffs for dynamic data structure problems with a certain property. Because of the unique structure of the chronogram method, this technique can not be utilized to prove lower bounds for static data structures. 2 Cell-probe proofs A static data structure problem or just data structure problem, is represented as a boolean function f : X × Y → {0, 1}. For the purposes of proving
Yitong Yin lower bounds,we only consider decision problems.We refer to each y E Y as data and each rE X as a query.For each pair of r and y,f(,y)specifies the result of the query r to the data structure that represents the data y. In the cell-probe model (c.f.[11,6]),the data instance y is preprocessed and stored in cells,and for each query z,the value of f(r,y)is decided by adaptive probes to the cells.Formally,a cell-probe scheme consists of a table structure and a query algorithm.The table structure T:Y×I一{0,l}6 specifies a table T:I{0,1]for each data instance y,which maps indices of cells to their contents.Given a query r,the query algorithm makes a sequence of probes i1,i2,...to the cells,where ik depends on x and all previous cell probes (i1,T(i)),(i2,T(i2)),...,(ik-1,T(ik-1)).The value of f(r,y)is decided at last based on the collected information. In this work,we focus on nondeterministic cell-probes.Given a query z to a data instance y,a set of t cells i1,i2,...,it are probed nondeterministi- cally,such that the value of f(,y)is decided based on the probed information (i1,Ty(i1)》,(2,Ty(i2)》,,(it,Ty(it)》. In order to formally characterize nondeterministic cell-probes for data struc- tures,we introduce a new concept,cell-probe proofs,which formalizes the notion of proofs and verifications in the cell-probe model.For a specific data structure problem f,a cell-probe proof system(CPP)may be defined for f as described below. We can think of a cell-probe proof system as a game played between an honest verifier and an untrusted prover.Both of them have unlimited computa- tional power.Given an instance of data,a table of cells is honestly constructed according to the rules known to both prover and verifier.Both the prover and the verifier know the query,but only the prover can observe the whole table and thus knows the data.The prover tries to convince the verifier about the result of the query to the data by revealing certain cells.After observing the revealed cells,the verifier either decides the correct answer,or rejects the proof,but can not be tricked by the prover into returning a wrong answer. Formally,a cell-probe proof system (CPP)consists of three parts: -A table structure T:Y×I→{0,l}b.For any data,a table T:I→{0,1b is a mapping from indices of cells to their contents. -A prover P.For every x and y,Pry CI is a set of cells.We refer to Pry as a proof and {(i,T(i))|iE Pry}as a certificate. -A verifier v,which maps the queries with the certificates to the answers (0,1,L).Given an instance of data y,for any query z,both of the following conditions hold: (Completeness)3Py≤I:v(a,{i,Tg(i)》li∈Pxy})=f(x,y),and (Soundnes)P'cI:e,{k,T,(》lieP')={ 了f(x,) An (s,b,t)-CPP is a CPP such that for every x and y:(1)the table has s cells,i.e.I=s;(2)each cell contains b bits;(3)each proof consists of t cell probes,i.e.Prul =t
4 Yitong Yin lower bounds, we only consider decision problems. We refer to each y ∈ Y as data and each x ∈ X as a query. For each pair of x and y, f(x, y) specifies the result of the query x to the data structure that represents the data y. In the cell-probe model (c.f. [11, 6]), the data instance y is preprocessed and stored in cells, and for each query x, the value of f(x, y) is decided by adaptive probes to the cells. Formally, a cell-probe scheme consists of a table structure and a query algorithm. The table structure T : Y × I → {0, 1} b specifies a table Ty : I → {0, 1} b for each data instance y, which maps indices of cells to their contents. Given a query x, the query algorithm makes a sequence of probes i1, i2, . . . to the cells, where ik depends on x and all previous cell probes hi1, Ty(i1)i,hi2, Ty(i2)i, . . . ,hik−1, Ty(ik−1)i. The value of f(x, y) is decided at last based on the collected information. In this work, we focus on nondeterministic cell-probes. Given a query x to a data instance y, a set of t cells i1, i2, . . . , it are probed nondeterministically, such that the value of f(x, y) is decided based on the probed information hi1, Ty(i1)i,hi2, Ty(i2)i, . . . ,hit, Ty(it)i. In order to formally characterize nondeterministic cell-probes for data structures, we introduce a new concept, cell-probe proofs, which formalizes the notion of proofs and verifications in the cell-probe model. For a specific data structure problem f, a cell-probe proof system (CPP) may be defined for f as described below. We can think of a cell-probe proof system as a game played between an honest verifier and an untrusted prover. Both of them have unlimited computational power. Given an instance of data, a table of cells is honestly constructed according to the rules known to both prover and verifier. Both the prover and the verifier know the query, but only the prover can observe the whole table and thus knows the data. The prover tries to convince the verifier about the result of the query to the data by revealing certain cells. After observing the revealed cells, the verifier either decides the correct answer, or rejects the proof, but can not be tricked by the prover into returning a wrong answer. Formally, a cell-probe proof system (CPP) consists of three parts: – A table structure T : Y ×I → {0, 1} b . For any data y, a table Ty : I → {0, 1} b is a mapping from indices of cells to their contents. – A prover P. For every x and y, Pxy ⊆ I is a set of cells. We refer to Pxy as a proof and {hi, Ty(i)i | i ∈ Pxy} as a certificate. – A verifier v, which maps the queries with the certificates to the answers {0, 1, ⊥}. Given an instance of data y, for any query x, both of the following conditions hold: (Completeness) ∃Pxy ⊆ I : v(x, {hi, Ty(i)i | i ∈ Pxy}) = f(x, y), and (Soundness) ∀P 0 ⊆ I : v(x, {hi, Ty(i)i | i ∈ P 0}) = f(x, y) ⊥ . An (s, b, t)-CPP is a CPP such that for every x and y: (1) the table has s cells, i.e. |I| = s; (2) each cell contains b bits; (3) each proof consists of t cell probes, i.e. |Pxy| = t
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity 5 Erample:For the membership problem[11],wherex=[m]and Y=(),and f(x,y)=1 if and only if ry,a naive construction shows a 2-cell proof:with a sorted table storing y,if x Ey,the proof is the cell that contains z,if y,the proof consists of two consecutive cells which are the predecessor and successor of z.The same CPP also works for predecessor search[2]. The notion of cell-probe proofs captures the necessary information to answer queries,and characterizes the nondeterministic probes in the cell-probe model. It is natural to see that for a cell-probe scheme,for any query,the cells probed by an adaptive algorithm contain a cell-probe proof.This can be seen as a data structure counterpart of P C NP. It is important to note that although a data structure problem is nothing but a boolean function,CPP is very different from the certificate complexity of boolean functions [4.In CPP,the prover and the verifier communicate with each other via a table structure,which distinguishes CPP from standard certificate complexity.For any data structure problem,the table structure can always store the results for all queries,making one cell-probe sufficient to prove the result, which is generally impossible in the model of certificate complexity. Unlike adaptive cell-probes,CPP has a static nature,which is convenient for reductions.As stated by the following lemma.any CPP can be trivially reduced to 1-cell proofs. Lemma 1 (reduction lemma).For any data structure problem f,if there erists an (s,b,t)-CPP,then there erists an (st,bt,1)-CPP. Proof.Just store every t-tuple of cells in the (s,b,t)-CPP as a new cell in the (s,bt,1)-CPP. 0 3 Characterization of CPPs We now introduce a combinatorial characterization of CPP.Given a set system Fs2r,for any y∈Y,we let F(y)={F∈F|y∈F}.For convenience,for a partition P of Y,we abuse this notation and let P(y)denote the set FP that y∈F Definition 1.We say a set system FC2 is an s x k-partition of y,if F is a union of s number of partitions of Y,where the cardinality of each partition is at most k. This particular notion of partitions of Y fully captures the structure of cell- probe proofs.In this extended abstract,we only provide the characterization of 1-cell proofs.As shown by Lemma 1,this is a canonical case for cell-probe proofs. Theorem 1.There is an(s,b,l)-CPP for f:X×Y→{0,l},if and only if there exists an s x 2b-partition F of Y,such that for every xE X and every y∈Y,there is an F∈F(y)that f(x,F)l=l
Cell-Probe Proofs and Nondeterministic Cell-Probe Complexity 5 Example: For the membership problem[11], where X = [m] and Y = [m] n , and f(x, y) = 1 if and only if x ∈ y, a naive construction shows a 2-cell proof: with a sorted table storing y, if x ∈ y, the proof is the cell that contains x, if x 6∈ y, the proof consists of two consecutive cells which are the predecessor and successor of x. The same CPP also works for predecessor search[2]. The notion of cell-probe proofs captures the necessary information to answer queries, and characterizes the nondeterministic probes in the cell-probe model. It is natural to see that for a cell-probe scheme, for any query, the cells probed by an adaptive algorithm contain a cell-probe proof. This can be seen as a data structure counterpart of P ⊆ NP. It is important to note that although a data structure problem is nothing but a boolean function, CPP is very different from the certificate complexity of boolean functions [4]. In CPP, the prover and the verifier communicate with each other via a table structure, which distinguishes CPP from standard certificate complexity. For any data structure problem, the table structure can always store the results for all queries, making one cell-probe sufficient to prove the result, which is generally impossible in the model of certificate complexity. Unlike adaptive cell-probes, CPP has a static nature, which is convenient for reductions. As stated by the following lemma, any CPP can be trivially reduced to 1-cell proofs. Lemma 1 (reduction lemma). For any data structure problem f, if there exists an (s, b, t)-CPP, then there exists an (s t , bt, 1)-CPP. Proof. Just store every t-tuple of cells in the (s, b, t)-CPP as a new cell in the (s t , bt, 1)-CPP. ut 3 Characterization of CPPs We now introduce a combinatorial characterization of CPP. Given a set system F ⊆ 2 Y , for any y ∈ Y , we let F(y) = {F ∈ F | y ∈ F}. For convenience, for a partition P of Y , we abuse this notation and let P(y) denote the set F ∈ P that y ∈ F. Definition 1. We say a set system F ⊆ 2 Y is an s × k-partition of Y , if F is a union of s number of partitions of Y , where the cardinality of each partition is at most k. This particular notion of partitions of Y fully captures the structure of cellprobe proofs. In this extended abstract, we only provide the characterization of 1-cell proofs. As shown by Lemma 1, this is a canonical case for cell-probe proofs. Theorem 1. There is an (s, b, 1)-CPP for f : X × Y → {0, 1}, if and only if there exists an s × 2 b -partition F of Y , such that for every x ∈ X and every y ∈ Y , there is an F ∈ F(y) that |f(x, F)| = 1