ZOE:Fast Cardinality Estimation for Large-Scale RFID Systems Yuanging Zheng,Mo Li School of Computer Engineering,Nanyang Technological University,Singapore [yuanqing1,limo}@ntu.edu.sg Abstract-Estimating the RFID cardinality with accuracy Further improving the time efficiency of each estimation guarantee is an important task in large-scale RFID systems.This round will significantly benefit the entire cardinality estimation paper proposes a fast RFID cardinality estimation scheme.The process,meet the stringent time requirement of many realtime proposed Zero-One Estimator (ZOE)protocol rapidly converges to optimal parameter settings and achieves high estimation applications,and support larger scale of RFID systems.While efficiency.ZOE significantly improves the cardinality estimation pursuing the estimation efficiency at the optimum,we are efficiency,achieving 3x performance gain compared with existing also aiming at reducing the processing overhead of resource- protocols.Meanwhile,ZOE guarantees arbitrary accuracy re- constrained RFID tags.Most existing probabilistic approaches quirement without imposing computation and memory overhead at RFID tags.Due to the simplicity and robustness,the ZOE require generating large volume of random numbers or alter- protocol provides reliable cardinality estimation even over noisy natively pre-storing them at RFID tags,which leads to heavy channel.We implement a prototype system using the USRP computation and storage burden for RFID tags.We aim at software defined radio and Intel WISP RFID tags.We also shifting such processing overhead from resource-constrained evaluate the performance of ZOE with extensive simulations. RFID tags to the much more powerful RFID readers.Besides, The evaluation of ZOE shows encouraging results in terms of most existing works assume a reliable wireless channel be- estimation accuracy,time efficiency,as well as robustness. tween the RFID reader and tags,which is contradicting with the fact that the wireless channel is mostly error-prone. I.INTRODUCTION In this paper,we present Zero-One Estimator (ZOE):a Radio Frequency Identification (RFID)systems [9]have simple yet fast RFID cardinality estimation protocol with recently received significant interests from both industry and guaranteed accuracy requirement.ZOE first tunes the system academia.A large-scale RFID system usually consists of parameters and converges to optimal settings with a bisection multiple RFID readers and a huge amount of RFID tags [23]. search with small overhead.With the optimized parameter An RFID tag is capable of storing its unique ID as well as settings,ZOE estimates the RFID cardinality with only one some other information and wirelessly transmitting them back bit feedback from tags at each round providing extremely high to readers.By verifying the unique IDs of RFID tags attached estimation efficiency.Through simply XORing the original to physical objects,RFID readers are able to identify and random number stored at each RFID tag with a common itemize the objects.Due to small form factor and low cost of random number generated by RFID readers,ZOE shifts the RFID tags,RFID systems provide us a scalable and economic burden of generating randomness from lightweight tags to way for managing massive objects in a variety of applications relatively powerful readers.We further take the noisy wire- including inventory management [8,14,17,22,25],logistics less channel into consideration and propose Error Estimation (26,281,object tracking [15,181,etc. and Adjustment (EEA)algorithm to adjust estimation results Fast estimating the cardinality of RFID tags,accordingly according to the practical communication error rates. the number of labeled items,is of primary importance in We implement a prototype system using the Universal many applications,e.g.,estimating the number of conference Software Radio Peripheral (USRP)[4]in concert with the attendees with RFID badges.The estimated RFID cardinality Intel Wireless Identification and Sensing Platform (WISP) with guaranteed accuracy may also serve as primary inputs to [21].We implement the ZOE reader functionality with USRP upper-layer RFID protocols.For instance,Aloha-based RFID Software Defined Radio(SDR)that interrogates programmable identification protocols can achieve near-optimal performance WISP tags.The ZOE protocol only requires slight updates to if the frame size is set according to the number of tags the EPCglobal Class 1 Generation 2 (C1G2)standard.We In order to achieve efficient RFID cardinality estimation, also evaluate ZOE with extensive simulations to study its probabilistic estimation approaches have been proposed which performance in large-scale RFID systems. aim to estimate the approximate number of RFID tags.Some The rest of this paper is organized as follows.We first recent approaches achieve O(log n)estimation efficiency to briefly review the related work in Section II.In Section III, the number of RFID tags n [11,16].One most recent protocol, we introduce our system model and describe the problem of Probabilistic Estimation Tree (PET),achieves O(log log n) cardinality estimation.We give detailed description on ZOE time efficiency for each estimation round [271.Nevertheless. in Section IV.We present the implementation of ZOE in existing protocols require many independent estimation rounds Section V.We evaluate the performance of ZOE with extensive so as to achieve high accuracy on the estimation results. simulations in Section VI.Section VII concludes this paper
ZOE: Fast Cardinality Estimation for Large-Scale RFID Systems Yuanqing Zheng, Mo Li School of Computer Engineering, Nanyang Technological University, Singapore {yuanqing1, limo}@ntu.edu.sg Abstract—Estimating the RFID cardinality with accuracy guarantee is an important task in large-scale RFID systems. This paper proposes a fast RFID cardinality estimation scheme. The proposed Zero-One Estimator (ZOE) protocol rapidly converges to optimal parameter settings and achieves high estimation efficiency. ZOE significantly improves the cardinality estimation efficiency, achieving 3x performance gain compared with existing protocols. Meanwhile, ZOE guarantees arbitrary accuracy requirement without imposing computation and memory overhead at RFID tags. Due to the simplicity and robustness, the ZOE protocol provides reliable cardinality estimation even over noisy channel. We implement a prototype system using the USRP software defined radio and Intel WISP RFID tags. We also evaluate the performance of ZOE with extensive simulations. The evaluation of ZOE shows encouraging results in terms of estimation accuracy, time efficiency, as well as robustness. I. INTRODUCTION Radio Frequency Identification (RFID) systems [9] have recently received significant interests from both industry and academia. A large-scale RFID system usually consists of multiple RFID readers and a huge amount of RFID tags [23]. An RFID tag is capable of storing its unique ID as well as some other information and wirelessly transmitting them back to readers. By verifying the unique IDs of RFID tags attached to physical objects, RFID readers are able to identify and itemize the objects. Due to small form factor and low cost of RFID tags, RFID systems provide us a scalable and economic way for managing massive objects in a variety of applications including inventory management [8, 14, 17, 22, 25], logistics [26, 28], object tracking [15, 18], etc. Fast estimating the cardinality of RFID tags, accordingly the number of labeled items, is of primary importance in many applications, e.g., estimating the number of conference attendees with RFID badges. The estimated RFID cardinality with guaranteed accuracy may also serve as primary inputs to upper-layer RFID protocols. For instance, Aloha-based RFID identification protocols can achieve near-optimal performance if the frame size is set according to the number of tags. In order to achieve efficient RFID cardinality estimation, probabilistic estimation approaches have been proposed which aim to estimate the approximate number of RFID tags. Some recent approaches achieve O(log n) estimation efficiency to the number of RFID tags n [11, 16]. One most recent protocol, Probabilistic Estimation Tree (PET), achieves O(log log n) time efficiency for each estimation round [27]. Nevertheless, existing protocols require many independent estimation rounds so as to achieve high accuracy on the estimation results. Further improving the time efficiency of each estimation round will significantly benefit the entire cardinality estimation process, meet the stringent time requirement of many realtime applications, and support larger scale of RFID systems. While pursuing the estimation efficiency at the optimum, we are also aiming at reducing the processing overhead of resourceconstrained RFID tags. Most existing probabilistic approaches require generating large volume of random numbers or alternatively pre-storing them at RFID tags, which leads to heavy computation and storage burden for RFID tags. We aim at shifting such processing overhead from resource-constrained RFID tags to the much more powerful RFID readers. Besides, most existing works assume a reliable wireless channel between the RFID reader and tags, which is contradicting with the fact that the wireless channel is mostly error-prone. In this paper, we present Zero-One Estimator (ZOE): a simple yet fast RFID cardinality estimation protocol with guaranteed accuracy requirement. ZOE first tunes the system parameters and converges to optimal settings with a bisection search with small overhead. With the optimized parameter settings, ZOE estimates the RFID cardinality with only one bit feedback from tags at each round providing extremely high estimation efficiency. Through simply XORing the original random number stored at each RFID tag with a common random number generated by RFID readers, ZOE shifts the burden of generating randomness from lightweight tags to relatively powerful readers. We further take the noisy wireless channel into consideration and propose Error Estimation and Adjustment (EEA) algorithm to adjust estimation results according to the practical communication error rates. We implement a prototype system using the Universal Software Radio Peripheral (USRP) [4] in concert with the Intel Wireless Identification and Sensing Platform (WISP) [21]. We implement the ZOE reader functionality with USRP Software Defined Radio (SDR) that interrogates programmable WISP tags. The ZOE protocol only requires slight updates to the EPCglobal Class 1 Generation 2 (C1G2) standard. We also evaluate ZOE with extensive simulations to study its performance in large-scale RFID systems. The rest of this paper is organized as follows. We first briefly review the related work in Section II. In Section III, we introduce our system model and describe the problem of cardinality estimation. We give detailed description on ZOE in Section IV. We present the implementation of ZOE in Section V. We evaluate the performance of ZOE with extensive simulations in Section VI. Section VII concludes this paper
II.RELATED WORK backscatters a message or keeps silent according to the reader's One closely related problem is RFID identification which command.Such a communication model has been widely aims at collecting the tag IDs of all RFIDs in the interro- adopted in many RFID systems compliant with the de facto gation area [7,19,20,24,29].The identification protocols EPCglobal C1G2 standard [2]. based on collision arbitration can generally be classified into The practical communication channel is mostly error-prone two categories:Ahola-based protocols [20]and Tree-based depending on various factors including transmission power, protocols [7,29].In small-scale RFID systems,we may interrogation distance,antenna gain,interference,etc [6]. directly apply the identification protocols to count the exact tag Due to the channel attenuation even if there are some tags cardinality.Nevertheless,such a method becomes infeasible transmitting back responses,the reader may fail to detect them due to the low efficiency of identification protocols.Rather in practice.We call such missing detection errors as false than identifying all the tags,probabilistic estimation protocols negatives.On the other hand,the reader may falsely detect a estimate the number of tags which meets the customized busy channel due to the interferences,even when no response accuracy requirement. is transmitted.We call such errors as false positives. Kodialam and Nandagopal present Unified Simple Estimator (USE)and Unified Probabilistic Estimator(UPE)[12].Those B.Problem description schemes are vulnerable to multiple counting problems when The objective of this work is to efficiently and accurately multiple RFID readers are deployed to cover the interrogation estimate the tag cardinality.To meet stringent realtime re- region.Besides,the protocols require a cardinality upper bound known in advance.Kodialam et al.propose Enhanced quirement,the estimation protocol should compute an accurate tag cardinality in an efficient manner.Since the cardinality of Zero-Based (EZB)estimator to estimate relatively large num- ber of tags [13]. RFID tags could easily grow up to tens of thousands (e.g.,a Recent probabilistic estimation approaches achieve typical port inventory application concerns hundreds of con- tainers,each of which contains a large number of products), O(log n)estimation efficiency to the total number of RFID we need to design a scalable and efficient estimation approach tags n.Han et al.propose the First Non-Empty slot Based that can guarantee the customized accuracy requirement. (FNEB)estimator with binary search method [11].Qian et al. [16]propose the Lottery Frame (LoF)based estimator,which Consistent with the existing approaches [11,16,27],the is a replicate-insensitive estimation protocol.Both approaches accuracy requirement is presented by (e,6)-approximation.We denote by n the estimated number of the tag cardinality while require the tags to cooperate with the reader by generating large volume of random numbers and respond accordingly. the actual number is n.Given the accuracy requirement of One most recent protocol,Probabilistic Estimating Tree (PET) (,0)-approximation,we expect an estimation result n,which based estimator [27],advances the estimation efficiency and satisfies Pr-n<En}>1-6.For example,when achieves O(log logn)processing time efficiency to the tag the actual tag cardinality is 10000,=5%,6 =1%)- cardinality n.Such approaches only consider an ideal wireless approximation expects an estimation result within the interval 9500,10500 with a probability of 99%and above. channel which is free of transmission error. We abstract the time efficiency with the total time slots used III.SYSTEM MODEL AND PROBLEM DESCRIPTION to estimate the cardinality.Most recent approaches only need to distinguish an idle slot from a busy slot [11,16,27].The A.System model smaller number of time slots means the shorter processing time We consider a large-scale RFID system consisting of three and thus the higher time efficiency,and vice versa.Meanwhile, major components:a large volume of RFID tags,several RFID we seek to reduce the computation and memory burden at readers,and a backend server connecting the readers.Multiple the RFID tags to facilitate the use of low-cost but resource- RFID readers are normally deployed to ensure a full coverage constrained passive tags rather than expensive active ones. in large-scale RFID systems.The backend server coordinates While most recent estimation protocols assume zero error the RFID readers and initiates the cardinality estimation pro- rates in the underlying wireless channel,we target at practical cess.The RFID readers relay the commands received from channel conditions with false negatives and false positives. the backend server and broadcast to tags,and later report When the bit error rate is high,it becomes very difficult to the tags'responses back to the server.When coordinated and derive accurate cardinality estimation.Nevertheless,when the synchronized,the multiple readers can logically be considered channel is in mild conditions,we expect that the estimation as one RFID reader.The RFID system may use lightweight protocol computes a reasonably accurate estimation result. passive RFID tags or powerful active ones. RFID systems may operate over a wide variety of frequency IV.ESTIMATION PROTOCOL bands (e.g.,13.56/433/900MHz).We exclusively focus on the RFID systems operating in the 900MHz ultra-high frequency In this section,we first discuss the design principle of band.We assume that the RFID system works on frame-slotted ZOE protocol.We then consolidate the essential idea with Aloha model.RFID readers initiate interrogation by sending a cardinality estimation protocol,which provides O(1)time operation codes and specifying PHY/MAC parameters.When efficiency for each estimation round.Table I summarizes the energized by the continuous waves from reader,each tag key notations used across this paper
II. RELATED WORK One closely related problem is RFID identification which aims at collecting the tag IDs of all RFIDs in the interrogation area [7, 19, 20, 24, 29]. The identification protocols based on collision arbitration can generally be classified into two categories: Ahola-based protocols [20] and Tree-based protocols [7, 29]. In small-scale RFID systems, we may directly apply the identification protocols to count the exact tag cardinality. Nevertheless, such a method becomes infeasible due to the low efficiency of identification protocols. Rather than identifying all the tags, probabilistic estimation protocols estimate the number of tags which meets the customized accuracy requirement. Kodialam and Nandagopal present Unified Simple Estimator (USE) and Unified Probabilistic Estimator (UPE) [12]. Those schemes are vulnerable to multiple counting problems when multiple RFID readers are deployed to cover the interrogation region. Besides, the protocols require a cardinality upper bound known in advance. Kodialam et al. propose Enhanced Zero-Based (EZB) estimator to estimate relatively large number of tags [13]. Recent probabilistic estimation approaches achieve O(log n) estimation efficiency to the total number of RFID tags n. Han et al. propose the First Non-Empty slot Based (FNEB) estimator with binary search method [11]. Qian et al. [16] propose the Lottery Frame (LoF) based estimator, which is a replicate-insensitive estimation protocol. Both approaches require the tags to cooperate with the reader by generating large volume of random numbers and respond accordingly. One most recent protocol, Probabilistic Estimating Tree (PET) based estimator [27], advances the estimation efficiency and achieves O(log log n) processing time efficiency to the tag cardinality n. Such approaches only consider an ideal wireless channel which is free of transmission error. III. SYSTEM MODEL AND PROBLEM DESCRIPTION A. System model We consider a large-scale RFID system consisting of three major components: a large volume of RFID tags, several RFID readers, and a backend server connecting the readers. Multiple RFID readers are normally deployed to ensure a full coverage in large-scale RFID systems. The backend server coordinates the RFID readers and initiates the cardinality estimation process. The RFID readers relay the commands received from the backend server and broadcast to tags, and later report the tags’ responses back to the server. When coordinated and synchronized, the multiple readers can logically be considered as one RFID reader. The RFID system may use lightweight passive RFID tags or powerful active ones. RFID systems may operate over a wide variety of frequency bands (e.g., 13.56/433/900MHz). We exclusively focus on the RFID systems operating in the 900MHz ultra-high frequency band. We assume that the RFID system works on frame-slotted Aloha model. RFID readers initiate interrogation by sending operation codes and specifying PHY/MAC parameters. When energized by the continuous waves from reader, each tag backscatters a message or keeps silent according to the reader’s command. Such a communication model has been widely adopted in many RFID systems compliant with the de facto EPCglobal C1G2 standard [2]. The practical communication channel is mostly error-prone depending on various factors including transmission power, interrogation distance, antenna gain, interference, etc [6]. Due to the channel attenuation even if there are some tags transmitting back responses, the reader may fail to detect them in practice. We call such missing detection errors as false negatives. On the other hand, the reader may falsely detect a busy channel due to the interferences, even when no response is transmitted. We call such errors as false positives. B. Problem description The objective of this work is to efficiently and accurately estimate the tag cardinality. To meet stringent realtime requirement, the estimation protocol should compute an accurate tag cardinality in an efficient manner. Since the cardinality of RFID tags could easily grow up to tens of thousands (e.g., a typical port inventory application concerns hundreds of containers, each of which contains a large number of products), we need to design a scalable and efficient estimation approach that can guarantee the customized accuracy requirement. Consistent with the existing approaches [11, 16, 27], the accuracy requirement is presented by (ε,δ)-approximation. We denote by nˆ the estimated number of the tag cardinality while the actual number is n. Given the accuracy requirement of (ε,δ)-approximation, we expect an estimation result nˆ, which satisfies Pr{|nˆ − n| ≤ εn} ≥ 1 − δ. For example, when the actual tag cardinality is 10000, (ε = 5%, δ = 1%)- approximation expects an estimation result within the interval [9500, 10500] with a probability of 99% and above. We abstract the time efficiency with the total time slots used to estimate the cardinality. Most recent approaches only need to distinguish an idle slot from a busy slot [11, 16, 27]. The smaller number of time slots means the shorter processing time and thus the higher time efficiency, and vice versa. Meanwhile, we seek to reduce the computation and memory burden at the RFID tags to facilitate the use of low-cost but resourceconstrained passive tags rather than expensive active ones. While most recent estimation protocols assume zero error rates in the underlying wireless channel, we target at practical channel conditions with false negatives and false positives. When the bit error rate is high, it becomes very difficult to derive accurate cardinality estimation. Nevertheless, when the channel is in mild conditions, we expect that the estimation protocol computes a reasonably accurate estimation result. IV. ESTIMATION PROTOCOL In this section, we first discuss the design principle of ZOE protocol. We then consolidate the essential idea with a cardinality estimation protocol, which provides O(1) time efficiency for each estimation round. Table I summarizes the key notations used across this paper
TABLE I FNEB KEY NOTATIONS Symbols Descriptions 234567890▣ n Actual number of tags Oin) ZOE 公 Estimated number of tags 20E OR 12345678 m Estimation rounds P H) A uniform hash function O(log n) 0(1) HB(.) Binary representation of (. ☐Idle slot ☐Busy slot R() Position of right-most zero in HB(.) (a)Estimation frame and interested slot (b)One aggregated slot A threshold affecting the behavior of tags Fig.1.An illustrative comparison of ZOE with conventional schemes of 入 Load factor n/20 FNEB,LoF and PET. Channel error rate cardinality estimation,we present detailed theoretical analysis A.Principle and consolidate the aforementioned design principle with a prototype system in the following. The conventional approaches take advantage of the frame- slotted Aloha protocol to estimate the number of tags.The B.Zero-One Estimator protocol RFID reader normally needs to examine the state of each time slot in the frame.Figure 1(a)presents illustrative examples of We describe the detailed ZOE protocol in this section. most recent protocols,e.g.,FNEB [11].LoF [16],and PET 1)Tag:When probed by a reader in the cardinality esti- [27].In FNEB,each tag randomly selects a slot from O(n) mation process,each tag independently computes a random slots with uniform distribution functions to send a response number with a uniform distribution hash function (id,s), in each estimation round.If the frame size is fixed,then the where s denotes a random seed.For simplicity,we omit the first busy slot (herein,slot 3)indicates the tag population (i.e., notation of s for the hash functions.We denote by HB(id)the the smaller the first busy slot is,the larger the tag population binary representation of H(id).We also denote by R(id)the would be).Leveraging the monotonic feature,FNEB locates index of the right-most zero bit in HB(id)as follows the first busy slot by examining O(logn)slots.LoF and R(id)=min{iHB(id)[i]=0}. (1) PET reduce the frame size to O(logn)by letting each tag If R(idi)>the tag responds to the reader where 6 is a select a slot with geometric distribution functions,such that threshold received from the reader;and if R(id)<0 the tag nearly 1/2i of tags respond in the ith slot.PET achieves keeps silent. O(log log n)time efficiency leveraging a probabilistic binary Let the random variable of R(id;)be Ri,1<i<n,then tree structure.The estimation results of such protocols may we have the probability statistically vary for each estimation round.For instance.the Pr(Ri)=pR(1-p), first busy slot in one estimation round might dramatically deviate from the expected value.Existing protocols thus need where p denotes the probability that a bit of HB(id)turns many independent estimation rounds to derive an average to out to be '1'.Typically,we assume p=0.5,i.e.,the hash accurately approximate the actual number of tags. function is a uniform distribution hash function.Then we have To improve the estimation efficiency,we propose the ZOE p=1-p=0.5,and protocol in which each frame only contains one slot.In particular,all the responses from tags aggregate at a single Pr(Ra)=p= slot,leading to either an idle slot if no tag responds or a busy The probability that a tag keeps silent given a threshold slot otherwise as illustrated in Figure 1(b).Suppose there exist 0-1 8-1 n RFID tags,and each tag responds with the probability of P,and keeps silent with the probability of 1-P.Intuitively, P(,<)=∑Pr(=∑2=1-2 k=0 k=0 the more tags there are,the higher probability that the reader observes a busy slot,and vice versa.We can thus measure the On the other hand,a tag will respond to the reader with the ratio of busy (idle)slots,and infer the tag population with a probability of 1-Pr(Ri<0)=. priori knowledge of P. 2)Reader:a reader initiates cardinality estimation by send- Unlike conventional approaches where only small portion ing a random seed and the threshold 0 to the tags,and waits of tags participate in each time slot,in ZOE the responses for the responses from the tags.In the case that R(id;)<, from all the tags aggregate in the single time slot,which ∀i∈{1,2,.,n,the reader observes no reply from the tags.Therefore,with the assumption of independent,identical allows ZOE to make extensive use of each time slot,thereby achieving higher time efficiency.By intelligently setting the distribution (i.i.d)for Ri,the probability that there is no reply system parameters,ZOE only needs comparable number of from the tags (i.e.,the channel is idle)is as follows estimation rounds with conventional approaches,meaning that 1 ≈e~n/2e =e-入, we reduce the overhead in each estimation round to O(1) Pr(ide=Pr(R:<]”=(1-2p while keeping the number of estimation rounds similar to where the load factor入=n/2,and入>0. those in conventional approaches.Towards the more efficient The probability that there is a reply (no matter a singleton
TABLE I KEY NOTATIONS Symbols Descriptions n Actual number of tags nˆ Estimated number of tags m Estimation rounds H(.) A uniform hash function HB(.) Binary representation of H(.) R(.) Position of right-most zero in HB(.) θ A threshold affecting the behavior of tags λ Load factor n/2 θ q Channel error rate A. Principle The conventional approaches take advantage of the frameslotted Aloha protocol to estimate the number of tags. The RFID reader normally needs to examine the state of each time slot in the frame. Figure 1(a) presents illustrative examples of most recent protocols, e.g., FNEB [11], LoF [16], and PET [27]. In FNEB, each tag randomly selects a slot from O(n) slots with uniform distribution functions to send a response in each estimation round. If the frame size is fixed, then the first busy slot (herein, slot 3) indicates the tag population (i.e., the smaller the first busy slot is, the larger the tag population would be). Leveraging the monotonic feature, FNEB locates the first busy slot by examining O(log n) slots. LoF and PET reduce the frame size to O(log n) by letting each tag select a slot with geometric distribution functions, such that nearly 1/2 i of tags respond in the ith slot. PET achieves O(log log n) time efficiency leveraging a probabilistic binary tree structure. The estimation results of such protocols may statistically vary for each estimation round. For instance, the first busy slot in one estimation round might dramatically deviate from the expected value. Existing protocols thus need many independent estimation rounds to derive an average to accurately approximate the actual number of tags. To improve the estimation efficiency, we propose the ZOE protocol in which each frame only contains one slot. In particular, all the responses from tags aggregate at a single slot, leading to either an idle slot if no tag responds or a busy slot otherwise as illustrated in Figure 1(b). Suppose there exist n RFID tags, and each tag responds with the probability of P, and keeps silent with the probability of 1 − P. Intuitively, the more tags there are, the higher probability that the reader observes a busy slot, and vice versa. We can thus measure the ratio of busy (idle) slots, and infer the tag population with a priori knowledge of P. Unlike conventional approaches where only small portion of tags participate in each time slot, in ZOE the responses from all the tags aggregate in the single time slot, which allows ZOE to make extensive use of each time slot, thereby achieving higher time efficiency. By intelligently setting the system parameters, ZOE only needs comparable number of estimation rounds with conventional approaches, meaning that we reduce the overhead in each estimation round to O(1) while keeping the number of estimation rounds similar to those in conventional approaches. Towards the more efficient LoF PET FNEB 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 ZOE 1 Idle slot Busy slot (a) Estimation frame and interested slot LoF PET FNEB 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 ZOE Idle slot Busy slot id1 id2 idn ... id3 id4 OR P P P P P P ZOE id1 id2 idn OR P P P P id3 P id4 ... P ZOE id1 id2 idn OR P P P P id3 P id4 ... P (b) One aggregated slot Fig. 1. An illustrative comparison of ZOE with conventional schemes of FNEB, LoF and PET. cardinality estimation, we present detailed theoretical analysis and consolidate the aforementioned design principle with a prototype system in the following. B. Zero-One Estimator protocol We describe the detailed ZOE protocol in this section. 1) Tag: When probed by a reader in the cardinality estimation process, each tag independently computes a random number with a uniform distribution hash function H(id, s), where s denotes a random seed. For simplicity, we omit the notation of s for the hash functions. We denote by HB(id) the binary representation of H(id). We also denote by R(id) the index of the right-most zero bit in HB(id) as follows R(id) = min{i|HB(id)[i] = 0}. (1) If R(idi) ≥ θ the tag responds to the reader where θ is a threshold received from the reader; and if R(idi) < θ the tag keeps silent. Let the random variable of R(idi) be Ri , 1 ≤ i ≤ n, then we have the probability Pr(Ri) = p Ri (1 − p), where p denotes the probability that a bit of HB(id) turns out to be ‘1’. Typically, we assume p = 0.5, i.e., the hash function is a uniform distribution hash function. Then we have p = 1 − p = 0.5, and Pr(Ri) = p Ri+1 = 1 2Ri+1 . The probability that a tag keeps silent given a threshold θ is Pr(Ri < θ) = X θ−1 k=0 Pr(k) =X θ−1 k=0 1 2 k+1 = 1 − 1 2 θ . On the other hand, a tag will respond to the reader with the probability of 1 − Pr(Ri < θ) = 1 2 θ . 2) Reader: a reader initiates cardinality estimation by sending a random seed and the threshold θ to the tags, and waits for the responses from the tags. In the case that R(idi) < θ, ∀i ∈ {1, 2, . . . , n}, the reader observes no reply from the tags. Therefore, with the assumption of independent, identical distribution (i.i.d) for Ri , the probability that there is no reply from the tags (i.e., the channel is idle) is as follows Pr(idle) = [Pr(Ri < θ)]n = 1 − 1 2 θ n ≈ e −n/2 θ = e −λ , where the load factor λ = n/2 θ , and λ > 0. The probability that there is a reply (no matter a singleton
Algorithm 1 ZOE algorithm for RFID readers requirement can be represented as follows ca(x)ma 1:m←[exa P Pr{li-ml≤en}=Pr{e-1+e)≤≤e-A-e. 2:Initiate the estimation,broadcast 0 3:fori←1 to m do We define a random variable Y=where 4: Generate a random seed s and broadcast it E(X)=e-入,anda=o()=g9.By the central limit Vm 5: if there is no response in the slot then theorem [10],we know Y is asymptotically standard normal 6: X:←-1 distribution. 7: else 8: X:←-0 Given a particular error probability 6.we can find a constant 9: end if c that satisfies 10:end for 1:←品∑X -6=Pr-e≤y≤c=rf(5) 12:return←-29ln where erf is the Gaussian error function [10].Therefore,we can guarantee the accuracy requirement Prn-n<en}> Algorithm 2 ZOE algorithm for each RFID tag 1-6 if we have the following conditions 1:Receive the threshold 0 e-X(1+e)-e-X 2:while TRUE do ≤-c and e-A1-e)-e-H ≥c.(4) 3: Receive the random seed s;Compute R(id) ifR(id)≥0 then According to (4),we have 5: Respond immediately 6: else w2m 7: Keep silent 8: end if 9:end while ≥ (5) Therefore,with such m estimation frames,ZOE can guar- reply from one tag or replies from multiple tags)is antee the accuracy requirement of Pr{-n<en}>1-6. Pr(reply)=1-Pr(idle)≈1-e-n/2°=1-e-A Algorithm 1 regulates the behavior of the RFID reader.The reader calculates the estimation rounds m according to (5) We define a random variable X which takes value 1 with probability Pr(idle)e-n/2 given an accuracy requirement (line 1).The reader initiates e-and value 0 with probability Pr(reply)1-e-/2=1-e-.Then we the estimation process by energizing the tags and sending the threshold 6(line 2).The reader generates random seeds and have broadcasts them (line 3-4),and records the tags'responses Pr(X=1)≈e-入,Pr(X=0)≈1-e-A (line 5-9).The average X is thus calculated based on the Obviously,the random variable X follows the Bernoulli dis- m estimation rounds (line 11).Finally,the estimated tag tribution.Therefore,the expectation and the standard deviation cardinality is computed according to (3)(line 12). of X are as follows Each tag performs simple tasks as regulated in Algorithm E(X)=e-入,o(X)=VWar(X)=Ve-A(1-e-λ) 2.In each estimation round,when receiving a random seed s,the tag computes the random number R(id)according to The maximum standard deviation of X is (1).The tag keeps silent or responds to the reader according a(X)max =0.5,when e-=0.5. to R(id)and the threshold 0.If R(id)>the tag sends a We define the random process as the response,and otherwise keeps silent (line 2-9). 2 Yet one problem remains.In (5),we see that m depends on average of m independent observations,where Xi denotes the ith observation of random variable X.We assume the trials A=n/20 indicating that the threshold 0 may influence the of Xi(1<i<m)are i.i.d,then we have E(X)=E(X)and estimation efficiency.In the following,we discuss how to set (X)=N the threshold to guarantee the optimal performance of ZOE. m According to the law of large numbers [10],when m is C.Parameter setting large we have =E()=E(X)=e-n/2°=e-A Before we perform the estimation process,we need to set (2) the threshold 0 which directly influences the behaviors of the According to (2),we can estimate the load factor as follows tags and the estimation efficiency.If is too big,the reader 入=-lnx, will consistently observe idle slots,i.e.,X 1;if 0 is too where入denotes the estimation of入. small,the reader will observe busy slots in almost every time slots,i.e.,X=0,with high probability.In either situation,it The observation of can thus be used to estimate the tag consumes extra processing time to meet an accuracy require- cardinality n as follows ment.As a matter of fact,if we look at the lower bound of =-20 In. (3) the estimation round m measured in(⑤),since入=n/2e,the Since the result may vary slightly because of the estimation lower bound depends on the tag cardinality which is not known variance,we seek a guaranteed cardinality estimation result, in advance.We denote by f()=e-(l-e-ea)≈e-xe入, e.g.,Pr-n<En}>1-6.The estimation accuracy the denominator of in (5).To reduce the number
Algorithm 1 ZOE algorithm for RFID readers 1: m ← [ cσ(X)max e−λ(1−e−ελ) ] 2 2: Initiate the estimation, broadcast θ 3: for i ← 1 to m do 4: Generate a random seed s and broadcast it 5: if there is no response in the slot then 6: Xi ← 1 7: else 8: Xi ← 0 9: end if 10: end for 11: X¯ ← 1 m Pm i=1 Xi 12: return nˆ ← −2 θ ln X¯ Algorithm 2 ZOE algorithm for each RFID tag 1: Receive the threshold θ 2: while TRUE do 3: Receive the random seed s; Compute R(id) 4: if R(id) ≥ θ then 5: Respond immediately 6: else 7: Keep silent 8: end if 9: end while reply from one tag or replies from multiple tags) is Pr(reply) = 1 − Pr(idle) ≈ 1 − e −n/2 θ = 1 − e −λ . We define a random variable X which takes value 1 with probability Pr(idle) ≈ e −n/2 θ = e −λ and value 0 with probability Pr(reply) ≈ 1 − e −n/2 θ = 1 − e −λ . Then we have Pr(X = 1) ≈ e −λ ,Pr(X = 0) ≈ 1 − e −λ . Obviously, the random variable X follows the Bernoulli distribution. Therefore, the expectation and the standard deviation of X are as follows E(X) = e −λ , σ(X) = p V ar(X) = q e−λ(1 − e−λ). The maximum standard deviation of X is σ(X)max = 0.5, when e −λ = 0.5. We define the random process X¯ = 1 m Pm i=1 Xi as the average of m independent observations, where Xi denotes the ith observation of random variable X. We assume the trials of Xi(1 ≤ i ≤ m) are i.i.d, then we have E(X¯) = E(X) and σ(X¯) = σ(X) √m . According to the law of large numbers [10], when m is large we have X¯ = E(X¯) = E(X) = e −n/2 θ = e −λ . (2) According to (2), we can estimate the load factor as follows λˆ = − ln X, ¯ where λˆ denotes the estimation of λ. The observation of X¯ can thus be used to estimate the tag cardinality nˆ as follows nˆ = −2 θ ln X. ¯ (3) Since the result may vary slightly because of the estimation variance, we seek a guaranteed cardinality estimation result, e.g., P r{|nˆ − n| ≤ εn} ≥ 1 − δ. The estimation accuracy requirement can be represented as follows Pr{|nˆ − n| ≤ εn} = Pr{e −λ(1+ε) ≤ X¯ ≤ e −λ(1−ε) }. We define a random variable Y = X¯−µ σ , where µ = E(X¯) = e −λ , and σ = σ(X¯) = σ(X) √m . By the central limit theorem [10], we know Y is asymptotically standard normal distribution. Given a particular error probability δ, we can find a constant c that satisfies 1 − δ = Pr{−c ≤ Y ≤ c} = erf( c √ 2 ), where erf is the Gaussian error function [10]. Therefore, we can guarantee the accuracy requirement P r{|nˆ − n| ≤ εn} ≥ 1 − δ if we have the following conditions e −λ(1+ε) − e −λ σ ≤ −c and e −λ(1−ε) − e −λ σ ≥ c. (4) According to (4), we have m ≥ max{[ cσ(X)max e−λ(1 − e−ελ) ] 2 , [ cσ(X)max e−λ(e ελ − 1)] 2 } ≥ [ cσ(X)max e−λ(1 − e−ελ) ] 2 . (5) Therefore, with such m estimation frames, ZOE can guarantee the accuracy requirement of P r{|nˆ −n| ≤ εn} ≥ 1−δ. Algorithm 1 regulates the behavior of the RFID reader. The reader calculates the estimation rounds m according to (5) given an accuracy requirement (line 1). The reader initiates the estimation process by energizing the tags and sending the threshold θ (line 2). The reader generates random seeds and broadcasts them (line 3-4), and records the tags’ responses (line 5-9). The average X¯ is thus calculated based on the m estimation rounds (line 11). Finally, the estimated tag cardinality is computed according to (3) (line 12). Each tag performs simple tasks as regulated in Algorithm 2. In each estimation round, when receiving a random seed s, the tag computes the random number R(id) according to (1). The tag keeps silent or responds to the reader according to R(id) and the threshold θ. If R(id) ≥ θ the tag sends a response, and otherwise keeps silent (line 2-9). Yet one problem remains. In (5), we see that m depends on λ = n/2 θ indicating that the threshold θ may influence the estimation efficiency. In the following, we discuss how to set the threshold to guarantee the optimal performance of ZOE. C. Parameter setting Before we perform the estimation process, we need to set the threshold θ which directly influences the behaviors of the tags and the estimation efficiency. If θ is too big, the reader will consistently observe idle slots, i.e., X¯ = 1; if θ is too small, the reader will observe busy slots in almost every time slots, i.e., X¯ = 0, with high probability. In either situation, it consumes extra processing time to meet an accuracy requirement. As a matter of fact, if we look at the lower bound of the estimation round m measured in (5), since λ = n/2 θ , the lower bound depends on the tag cardinality which is not known in advance. We denote by f(λ) = e −λ (1 − e −ελ) ≈ e −λ ελ, the denominator of cσ(X)max e−λ(1−e−ελ) in (5). To reduce the number
Algorithm 3 Threshold setting algorithm 0=16 6=8 0=120=10 1:low←0.high←-32 2:while low high do 025 3: mid←-(low+high)/2 0.2 .1 4: 0mid,Compute X with Algorithm 1 0. 5: if下>=2+e1and<=兰 te-i then 6: 0←-mid;break; 7: end if if then 32 64 96 8: Time slot E(X) 22 9 high←mid (a) (b) 10: else Fig.2.Parameter setting process:(a)Fast convergence to the optimal 11: low←-mid threshold value with bisection search method;(b)When E(X)0 or 12: end if E(X)→1,the variance is very small.. 13:end while 14:return 0 is sufficient shortly).At the first step (1-32),we start with the threshold 0=16.The reader observes 32 consecutive idle of estimation rounds,we maximize the denominator f(A) slots denoted by'I's in Figure 2(a).Since X>e-,we adjust since the numerator co(X)max is constant given an accu- the parameter by decreasing 0 at the second step(33-64),and racy requirement.We compute the first order derivative of f()≈e-入e入as follows. we repeat again 32 trials with the threshold =8.The reader observes 32 straight busy slots denoted by '0's.At the third df(X) =ee-入(1-X)】 (6) step (65-96),the threshold is tuned to be (8+16)/2 =12. d入 In this case,the reader observes both '1's and '0's ( According to (6).we find the first order derivative vanishes 24/32 =0.75 e-1).At the final step (97-128),we run at 1,and we have se->0.Therefore,the lower bound the estimation with =(8+12)/2=10.in which case mmin is achieved at入≈l,i.e,when X=e-A≈e-l the reader also observes mixed 'I's and '0's (11/32). This observation motivates us to adapt the threshold Since X =11/320.34 at the final step is quite close to according to the observation of a short sequence of the tags' e-1≈0.37,we set the threshold0tobe10. responses such that X becomes close to e-1.When the reader Here,we elaborate why the empirical number of 32 trials observes too many idle slots,i.e.,X>e-1,it decrements is sufficient for the threshold setting.Figure 2(b)plots the the threshold to increase the probability that tags would send variance of X against the expectation of X.We find that responses;when the reader observes almost all the busy slots, when the expectation E(X)0 or E(X)-1,the variance i.e..Xe,it increments the threshold to decrease the Var(X)0,indicating that when 6 is either too big or too probability that tags would send responses. small,X shall be relatively stable around E(X)to tell the The expected value of X is monotonically non-decreasing scale of n.Therefore,it is safe to roughly and rapidly estimate against the threshold.We exploit such a monotonic feature to the scale of tag cardinality and set the threshold accordingly fast converge to an optimal threshold.We can reach a suitable with a small number of runs.This is the reason why we can 0 that provides usXe-1 with bisection search.Since use a small sequence of 32 slots to calculate the optimal 6. we know the target average of X,i.e.,e-10.37,we can The parameter setting process involves several bisection terminate the bisection search when the intermediate value of steps to determine a threshold.This small amount of overhead X becomes very close to e0.37.In particular,we adapt after further reduction by early termination becomes almost and terminate the bisection process when the intermediate negligible (about 3%of the total estimation overhead).There- value X(Qint)reaches the interval [(e-2 +e-1)/2,(e-0.5 fore,we can first tune the threshold and converges to an e)/2]and use int as the threshold. optimal parameter setting at a very small cost.Using such Algorithm 3 presents the threshold setting process using an optimal threshold,we can estimate the accurate cardinality bisection search method.The threshold is set to be the average with minimal number of estimation rounds achieving higher of low and high.The low end and high end are adjusted overall estimation efficiency. according toX (line 8-12).X is computed according to Algorithm I with the parameters0=mid (line 4).Finally,the D.Discussion two ends converge,and the average is used as the threshold 1)Reliable estimation with unreliable channel:Most ex- 6(line 2).When the intermediate value X becomes very isting protocols study the cardinality estimation with reliable close to the target average e-1,the parameter setting process communication channel,while the wireless channel is error- terminates and 0=mid is used as the threshold (line 5-7). prone depending on various conditions.The recent protocols Figure 2(a)depicts an example of setting the threshold.In fail to capture the actual cardinality over noisy channels even the experiment,the actual tag cardinality is 1024,and thus with the knowledge of error rates.For instance,the false the optimal threshold is 6 log2 1024 =10.We repeat a detection of response signal might tamper the monotonic small number of trials in each bisection step to derive X.In feature of response signal along the estimation path in PET Figure 2(a),we see that the experiment consists of 4 steps protocol [27],and substantially degrades estimation accuracy (i.e.,=16,8,12,and 10),and the number of trials is set LoF 16]also relies on channel qualities,and estimation to be an empirical number of 32(we will elaborated why 32 accuracy decreases dramatically even with a small error rate
Algorithm 3 Threshold setting algorithm 1: low ← 0, high ← 32 2: while low < high do 3: mid ← (low + high)/2 4: θ ← mid, Compute X¯ with Algorithm 1 5: if X >¯ = e−2+e−1 2 and X <¯ = e−0.5+e−1 2 then 6: θ ← mid; break; 7: end if 8: if X >¯ e−0.5+e−1 2 then 9: high ← mid 10: else 11: low ← mid 12: end if 13: end while 14: return θ of estimation rounds, we maximize the denominator f(λ) since the numerator cσ(X)max is constant given an accuracy requirement. We compute the first order derivative of f(λ) ≈ e −λ ελ as follows, df(λ) dλ = εe−λ (1 − λ). (6) According to (6), we find the first order derivative vanishes at λ ≈ 1, and we have εe−λ > 0. Therefore, the lower bound mmin is achieved at λ ≈ 1, i.e., when X¯ = e −λ ≈ e −1 . This observation motivates us to adapt the threshold θ according to the observation of a short sequence of the tags’ responses such that X¯ becomes close to e −1 . When the reader observes too many idle slots, i.e., X¯ e −1 , it decrements the threshold to increase the probability that tags would send responses; when the reader observes almost all the busy slots, i.e., X¯ e −1 , it increments the threshold to decrease the probability that tags would send responses. The expected value of X¯ is monotonically non-decreasing against the threshold. We exploit such a monotonic feature to fast converge to an optimal threshold. We can reach a suitable θ that provides us X¯ ≈ e −1 with bisection search. Since we know the target average of X¯, i.e., e −1 ≈ 0.37, we can terminate the bisection search when the intermediate value of X¯ becomes very close to e −1 ≈ 0.37. In particular, we adapt θ and terminate the bisection process when the intermediate value X¯(θint) reaches the interval [(e −2 + e −1 )/2,(e −0.5 + e −1 )/2] and use θint as the threshold. Algorithm 3 presents the threshold setting process using bisection search method. The threshold is set to be the average of low and high. The low end and high end are adjusted according to X¯ (line 8-12). X¯ is computed according to Algorithm 1 with the parameters θ = mid (line 4). Finally, the two ends converge, and the average is used as the threshold θ (line 2). When the intermediate value X¯ becomes very close to the target average e −1 , the parameter setting process terminates and θ = mid is used as the threshold (line 5-7). Figure 2(a) depicts an example of setting the threshold. In the experiment, the actual tag cardinality is 1024, and thus the optimal threshold is θ = log2 1024 = 10. We repeat a small number of trials in each bisection step to derive X¯. In Figure 2(a), we see that the experiment consists of 4 steps (i.e., θ = 16, 8, 12, and 10), and the number of trials is set to be an empirical number of 32 (we will elaborated why 32 0 1 1 32 64 96 128 X Time slot θ=16 θ=8 θ=12 θ=10 (a) 0 0.05 0.1 0.15 0.2 0.25 0.3 0 0.25 0.5 0.75 1 Var(X) E(X) (b) Fig. 2. Parameter setting process: (a) Fast convergence to the optimal threshold value with bisection search method; (b) When E(X) → 0 or E(X) → 1, the variance is very small. is sufficient shortly). At the first step (1-32), we start with the threshold θ = 16. The reader observes 32 consecutive idle slots denoted by ‘1’s in Figure 2(a). Since X¯ e −1 , we adjust the parameter by decreasing θ at the second step (33-64), and we repeat again 32 trials with the threshold θ = 8. The reader observes 32 straight busy slots denoted by ‘0’s. At the third step (65-96), the threshold is tuned to be (8 + 16)/2 = 12. In this case, the reader observes both ‘1’s and ‘0’s (X¯ = 24/32 = 0.75 > e−1 ). At the final step (97-128), we run the estimation with θ = (8 + 12)/2 = 10, in which case the reader also observes mixed ‘1’s and ‘0’s (X¯ = 11/32). Since X¯ = 11/32 ≈ 0.34 at the final step is quite close to e −1 ≈ 0.37, we set the threshold θ to be 10. Here, we elaborate why the empirical number of 32 trials is sufficient for the threshold setting. Figure 2(b) plots the variance of X against the expectation of X. We find that when the expectation E(X) → 0 or E(X) → 1, the variance V ar(X) → 0, indicating that when θ is either too big or too small, X¯ shall be relatively stable around E(X) to tell the scale of n. Therefore, it is safe to roughly and rapidly estimate the scale of tag cardinality and set the threshold accordingly with a small number of runs. This is the reason why we can use a small sequence of 32 slots to calculate the optimal θ. The parameter setting process involves several bisection steps to determine a threshold. This small amount of overhead after further reduction by early termination becomes almost negligible (about 3% of the total estimation overhead). Therefore, we can first tune the threshold θ and converges to an optimal parameter setting at a very small cost. Using such an optimal threshold, we can estimate the accurate cardinality with minimal number of estimation rounds achieving higher overall estimation efficiency. D. Discussion 1) Reliable estimation with unreliable channel: Most existing protocols study the cardinality estimation with reliable communication channel, while the wireless channel is errorprone depending on various conditions. The recent protocols fail to capture the actual cardinality over noisy channels even with the knowledge of error rates. For instance, the false detection of response signal might tamper the monotonic feature of response signal along the estimation path in PET protocol [27], and substantially degrades estimation accuracy. LoF [16] also relies on channel qualities, and estimation accuracy decreases dramatically even with a small error rate