Every Bit Counts-Fast and Scalable RFID Estimation Muhammad Shahzad and Alex X.Liu Department of Computer Science and Engineering Michigan State University East Lansing,Michigan,USA (shahzadm,alexliu@cse.msu.edu ABSTRACT 1.INTRODUCTION Radio Frequency Identification (RFID)systems have been 1.1 Motivation and Problem Statement widely deployed for various applications such as object track- ing,3D positioning,supply chain management,inventory RFID systems are widely used in various applications such control,and access control.This paper concerns the fun- as object tracking 12,3D positioning 19,indoor localiza- damental problem of estimating RFID tag population size, tion 13,supply chain management 9,inventory control, which is needed in many applications such as tag identifi- and access control [4,11]because the cost of commercial cation,warehouse monitoring,and privacy sensitive RFID RFID tags is negligible compared to the value of the prod- systems.In this paper,we propose a new scheme for es- ucts to which they are attached (e.g.,as low as 5 cents per timating tag population size called Average Run based Tag tag [15]).An RFID system consists of tags and readers. estimation (ART).The technique is based on the average A tag is a microchip with an antenna in a compact pack- run-length of ones in the bit string received using the stan- age that has limited computing power and communication dardized framed slotted Aloha protocol.ART is significantly range.There are two types of tags:(1)passive tags,which faster than prior schemes because its estimator has smaller are powered up by harvesting the radio frequency energy from readers (as they do not have their own power sources) variance compared to the variances of estimators of prior schemes.For example,given a required confidence interval and have communication range often less than 20 feet;(2) of 0.1%and a required reliability of 99.9%,ART is consis- active tags,which have their own power sources and have tently 7 times faster than the fastest existing schemes (UPE relatively longer communication range.A reader has a ded- and EZB)for any tag population size.Furthermore,ART's icated power source with significant computing power.It estimation time is observably independent of the tag popula- transmits a query to a set of tags and the tags respond over tion sizes.ART is easy to deploy because it neither requires a shared wireless medium. modification to tags nor to the communication protocol be- This paper concerns the fundamental problem of estimat- tween tags and readers.ART only needs to be implemented ing the size of a given tag population.This is needed in on readers as a software module.ART works with multiple many applications such as tag identification,privacy sensi- readers with overlapping regions. tive RFID systems,and warehouse monitoring.In tag iden- tification protocols,which read the ID stored in each tag, population size is estimated at the start to guide the iden- Categories and Subject Descriptors tification process.For example,for tag identification pro- C.2.1 [Computer-Communication Networks):Network tocols that are based on the framed slotted Aloha protocol Architecture and Design-Wireless communication;C.2.8 (standardized in EPCGlobal Class-1 Generation-2 (C1G2) [Mobile Computing]:Algorithm Design and Analysis RFID standard [3]and implemented in commercial RFID systems),tag estimation is often used to calculate the op- timal frame size [7].In privacy sensitive RFID systems, General Terms such as those used in parks for continuously monitoring the Algorithms,Design,Performance,Experimentation number of visitors in different areas of a park to plan the guided trips efficiently,readers may not have the permission to identify human individuals.In warehouses with RFID Keywords based monitoring systems,managers often need to perform RFID,Tags,Estimation a quick estimation of the number of products left in stock for various purposes such as the detection of employee theft Note that although tag population size can be accurately measured by tag identification,the speed will be too slow. Permission to make digital or hard copies of all or part of this work for We formally define the tag estimation problem as:given personal or classroom use is granted without fee provided that copies are a tag population of unknoun size t,a confidence interval not made or distributed for profit or commercial advantage and that copies B∈(0,l,and a required reliability a∈[0,1),a set of read- bear this notice and the full citation on the first page.To copy otherwise,to ers needs to collaboratively compute the estimated number republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee. of tags t so that Plt-tl<Bt>a.When the number of MobiCom'/2,August 22-26,2012,Istanbul,Turkey. readers is one,we call this problem single-reader estimation: Copyright2012ACM978-1-4503-1159-5/12/08.s15.00. otherwise,we call this problem multi-reader estimation. 365
Every Bit Counts - Fast and Scalable RFID Estimation Muhammad Shahzad and Alex X. Liu Department of Computer Science and Engineering Michigan State University East Lansing, Michigan, USA {shahzadm, alexliu}@cse.msu.edu ABSTRACT Radio Frequency Identification (RFID) systems have been widely deployed for various applications such as object tracking, 3D positioning, supply chain management, inventory control, and access control. This paper concerns the fundamental problem of estimating RFID tag population size, which is needed in many applications such as tag identifi- cation, warehouse monitoring, and privacy sensitive RFID systems. In this paper, we propose a new scheme for estimating tag population size called Average Run based Tag estimation (ART). The technique is based on the average run-length of ones in the bit string received using the standardized framed slotted Aloha protocol. ART is significantly faster than prior schemes because its estimator has smaller variance compared to the variances of estimators of prior schemes. For example, given a required confidence interval of 0.1% and a required reliability of 99.9%, ART is consistently 7 times faster than the fastest existing schemes (UPE and EZB) for any tag population size. Furthermore, ART’s estimation time is observably independent of the tag population sizes. ART is easy to deploy because it neither requires modification to tags nor to the communication protocol between tags and readers. ART only needs to be implemented on readers as a software module. ART works with multiple readers with overlapping regions. Categories and Subject Descriptors C.2.1 [Computer-Communication Networks]: Network Architecture and Design – Wireless communication; C.2.8 [Mobile Computing]: Algorithm Design and Analysis General Terms Algorithms, Design, Performance, Experimentation Keywords RFID, Tags, Estimation Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MobiCom’12, August 22–26, 2012, Istanbul, Turkey. Copyright 2012 ACM 978-1-4503-1159-5/12/08 ...$15.00. 1. INTRODUCTION 1.1 Motivation and Problem Statement RFID systems are widely used in various applications such as object tracking [12], 3D positioning [19], indoor localization [13], supply chain management [9], inventory control, and access control [4, 11] because the cost of commercial RFID tags is negligible compared to the value of the products to which they are attached (e.g., as low as 5 cents per tag [15]). An RFID system consists of tags and readers. A tag is a microchip with an antenna in a compact package that has limited computing power and communication range. There are two types of tags: (1) passive tags, which are powered up by harvesting the radio frequency energy from readers (as they do not have their own power sources) and have communication range often less than 20 feet; (2) active tags, which have their own power sources and have relatively longer communication range. A reader has a dedicated power source with significant computing power. It transmits a query to a set of tags and the tags respond over a shared wireless medium. This paper concerns the fundamental problem of estimating the size of a given tag population. This is needed in many applications such as tag identification, privacy sensitive RFID systems, and warehouse monitoring. In tag identification protocols, which read the ID stored in each tag, population size is estimated at the start to guide the identification process. For example, for tag identification protocols that are based on the framed slotted Aloha protocol (standardized in EPCGlobal Class-1 Generation-2 (C1G2) RFID standard [3] and implemented in commercial RFID systems), tag estimation is often used to calculate the optimal frame size [7]. In privacy sensitive RFID systems, such as those used in parks for continuously monitoring the number of visitors in different areas of a park to plan the guided trips efficiently, readers may not have the permission to identify human individuals. In warehouses with RFIDbased monitoring systems, managers often need to perform a quick estimation of the number of products left in stock for various purposes such as the detection of employee theft. Note that although tag population size can be accurately measured by tag identification, the speed will be too slow. We formally define the tag estimation problem as: given a tag population of unknown size t, a confidence interval β ∈ (0, 1], and a required reliability α ∈ [0, 1), a set of readers needs to collaboratively compute the estimated number of tags t ˜ so that P |t ˜− t| ≤ βt ≥ α. When the number of readers is one, we call this problem single-reader estimation; otherwise, we call this problem multi-reader estimation. 365
A tag estimation scheme should satisfy the following three some prior schemes require modification to tags and some requirements: demand unrealistic system parameters. For example.the scheme in [14 requires each tag to store thousands of hash 1.Reliability:The actual reliability should always be functions.which is not practical to implement on passive greater than or equal to the required reliability.The tags and is not compliant with the C1G2 standard.As reliability a given as input is called the required re- another example,the scheme in 6 uses increasingly large liability.The reliability that an estimation scheme frame sizes as population size increases (e.g.,the frame size achieves is called its actual reliability. required by the scheme in[6]is greater than half of tag popu- lation size),which soon exceeds the maximum limit allowed 2.Scalability:The estimation time needs to be scalable by the C1G2 Standard. to large population sizes because in many applications, the number of passive tags can be very large due to their low cost,easy disposability,and powerless oper- 2.RELATED WORK ation. The first tag estimation scheme,called Unified Proba- bilistic Estimator (UPE),was proposed by Kodialam and 3.Deployability:The estimation scheme needs to be com- Nandagopal in 2006 7.UPE uses the framed slotted Aloha pliant with the C1G2 standard and should not require protocol and makes estimation based on either the number of any changes to tags empty slots or that of collision slots in a frame.Besides this 1.2 Proposed Approach estimator having larger variance than ART,UPE requires the differentiation among empty,single,and collision slots, In this paper,we propose a new scheme called Aver- which takes significantly more time than differentiating be- age Run based Tag estimation(ART),which satisfies all of tween empty and non-empty slots.According to C1G2. the above three requirements.The communication protocol a reader requires 300us to detect an empty slot,1500us used by ART is the standardized framed slotted Aloha pro- to detect a collision,and 3000us to complete a successful tocol,in which a reader first broadcasts a value f to the tags read.In [8],Kodialam et al.proposed an improved framed in its vicinity where f represents the number of time slots slotted Aloha protocol based estimation scheme called En- present in a forthcoming frame.Then each tag randomly hanced Zero Based (EZB)estimator,which performs esti- picks a time slot in the frame and replies during that slot mation based on the total number of Os in a frame.While Thus,the reader gets a binary sequence of Os and Is by rep- UPE estimates population size in each round and averages resenting a slot with no tag replies as 0 and a slot with one the estimated sizes when all rounds are finished,EZB only or more tag replies as 1.The key idea of ART is to estimate records the total number of 0s in each frame and at the end tag population size based on the average run size of ls in of all rounds,EZB first averages the recorded values and the binary sequence.We show that the average run size of then uses it to do estimation. Is in a frame monotonously increases with the increase in the size of tag population.Thus,average run size of 1s is an In [14],Qian et al.proposed an estimation scheme called Lottery Frame (LoF).Compared to UPE and EZB,LoF is indicator of tag population size. faster;however,it is impractical to implement as it requires 1.3 Advantages of ART over Prior Art each tag to store a large number (i.e.,the number of bits in a tag ID times the number of frames,which can be in the scale ART is advantageous in terms of speed and deployability. of thousands)of unique hash functions.LoF needs to modify For speed,ART is faster than all prior schemes.For exam- both tags and the communication protocol between readers ple,given a confidence interval of 0.1%and the required reli- and tags,which makes it non-compliant with C1G2.Han et ability of 99.9%.ART is consistently 7 times faster than the al.proposed a tag estimation scheme called First Non Empty fastest existing schemes(i.e.,UPE [7]and EZB [8])for any Based (FNEB)estimator.which is based on the size of the tag population size.The reason behind ART being faster first run of 0s in a frame [6].FNEB is based on an unrealistic than prior schemes is that the new estimator that we pro- assumption that frame size can be arbitrarily large.Li et al pose in this paper,namely the average run size of Is,has sig- proposed an estimation scheme called Maximum Likelihood nificantly smaller variance compared to the estimators used Estimator(MLE)for active tags with the goal of minimizing in prior schemes (such as the total number of 0s 7,8 and power consumption of active tags [10].In [17],Shah and the location of the first 1 in the binary sequence 6),as we Wong proposed a multi-reader tag estimation scheme which analytically show in Section 4.2.An estimator with small is based on an unrealistic assumption that any tag covered variance is faster because the estimation process needs to be by multiple readers only replies to one reader. repeated fewer times to achieve the required reliability.The intuitive reason that our estimator has smaller variance than prior estimators is that our estimator uses more information 3.ART-SCHEME OVERVIEW available in the bit sequence.Furthermore,ART estimation time is observably independent of tag population sizes.In 3.1 Communication Protocol Overview contrast,as tag volume increases,the estimation time of ART uses the framed slotted Aloha protocol specified in some prior schemes (e.g.,FNEB 6)increases. C1G2 as its MAC layer communication protocol.In this pro For deployability,ART neither requires modification to tocol,the reader first tells tags the frame size f,which is typ- the tags nor to the communication protocol between tags ically no more than 512 slots for practical reasons16.and a and readers.ART only needs to be implemented on the random seed number R.Later in the paper,we will see how a reader side as a software module without any hardware mod- simple use of seed number R will make it straightforward to ifications.ART also does not demand any unpractical sys- extend our estimation scheme to use multiple readers with tem parameters beyond the C1G2 standard.In contrast overlapping regions.Each tag within the transmission range 366
A tag estimation scheme should satisfy the following three requirements: 1. Reliability: The actual reliability should always be greater than or equal to the required reliability. The reliability α given as input is called the required reliability. The reliability that an estimation scheme achieves is called its actual reliability. 2. Scalability: The estimation time needs to be scalable to large population sizes because in many applications, the number of passive tags can be very large due to their low cost, easy disposability, and powerless operation. 3. Deployability: The estimation scheme needs to be compliant with the C1G2 standard and should not require any changes to tags. 1.2 Proposed Approach In this paper, we propose a new scheme called Average Run based Tag estimation (ART), which satisfies all of the above three requirements. The communication protocol used by ART is the standardized framed slotted Aloha protocol, in which a reader first broadcasts a value f to the tags in its vicinity where f represents the number of time slots present in a forthcoming frame. Then each tag randomly picks a time slot in the frame and replies during that slot. Thus, the reader gets a binary sequence of 0s and 1s by representing a slot with no tag replies as 0 and a slot with one or more tag replies as 1. The key idea of ART is to estimate tag population size based on the average run size of 1s in the binary sequence. We show that the average run size of 1s in a frame monotonously increases with the increase in the size of tag population. Thus, average run size of 1s is an indicator of tag population size. 1.3 Advantages of ART over Prior Art ART is advantageous in terms of speed and deployability. For speed, ART is faster than all prior schemes. For example, given a confidence interval of 0.1% and the required reliability of 99.9%, ART is consistently 7 times faster than the fastest existing schemes (i.e., UPE [7] and EZB [8]) for any tag population size. The reason behind ART being faster than prior schemes is that the new estimator that we propose in this paper, namely the average run size of 1s, has significantly smaller variance compared to the estimators used in prior schemes (such as the total number of 0s [7, 8] and the location of the first 1 in the binary sequence [6]), as we analytically show in Section 4.2. An estimator with small variance is faster because the estimation process needs to be repeated fewer times to achieve the required reliability. The intuitive reason that our estimator has smaller variance than prior estimators is that our estimator uses more information available in the bit sequence. Furthermore, ART estimation time is observably independent of tag population sizes. In contrast, as tag volume increases, the estimation time of some prior schemes (e.g., FNEB [6]) increases. For deployability, ART neither requires modification to the tags nor to the communication protocol between tags and readers. ART only needs to be implemented on the reader side as a software module without any hardware modifications. ART also does not demand any unpractical system parameters beyond the C1G2 standard. In contrast, some prior schemes require modification to tags and some demand unrealistic system parameters. For example, the scheme in [14] requires each tag to store thousands of hash functions, which is not practical to implement on passive tags and is not compliant with the C1G2 standard. As another example, the scheme in [6] uses increasingly large frame sizes as population size increases (e.g., the frame size required by the scheme in [6] is greater than half of tag population size), which soon exceeds the maximum limit allowed by the C1G2 Standard. 2. RELATED WORK The first tag estimation scheme, called Unified Probabilistic Estimator (UPE), was proposed by Kodialam and Nandagopal in 2006 [7]. UPE uses the framed slotted Aloha protocol and makes estimation based on either the number of empty slots or that of collision slots in a frame. Besides this estimator having larger variance than ART, UPE requires the differentiation among empty, single, and collision slots, which takes significantly more time than differentiating between empty and non-empty slots. According to C1G2, a reader requires 300μs to detect an empty slot, 1500μs to detect a collision, and 3000μs to complete a successful read. In [8], Kodialam et al. proposed an improved framed slotted Aloha protocol based estimation scheme called Enhanced Zero Based (EZB) estimator, which performs estimation based on the total number of 0s in a frame. While UPE estimates population size in each round and averages the estimated sizes when all rounds are finished, EZB only records the total number of 0s in each frame and at the end of all rounds, EZB first averages the recorded values and then uses it to do estimation. In [14], Qian et al. proposed an estimation scheme called Lottery Frame (LoF). Compared to UPE and EZB, LoF is faster; however, it is impractical to implement as it requires each tag to store a large number (i.e., the number of bits in a tag ID times the number of frames, which can be in the scale of thousands) of unique hash functions. LoF needs to modify both tags and the communication protocol between readers and tags, which makes it non-compliant with C1G2. Han et al. proposed a tag estimation scheme called First Non Empty Based (FNEB) estimator, which is based on the size of the first run of 0s in a frame [6]. FNEB is based on an unrealistic assumption that frame size can be arbitrarily large. Li et al. proposed an estimation scheme called Maximum Likelihood Estimator (MLE) for active tags with the goal of minimizing power consumption of active tags [10]. In [17], Shah and Wong proposed a multi-reader tag estimation scheme which is based on an unrealistic assumption that any tag covered by multiple readers only replies to one reader. 3. ART — SCHEME OVERVIEW 3.1 Communication Protocol Overview ART uses the framed slotted Aloha protocol specified in C1G2 as its MAC layer communication protocol. In this protocol, the reader first tells tags the frame size f, which is typically no more than 512 slots for practical reasons [16], and a random seed number R. Later in the paper, we will see how a simple use of seed number R will make it straightforward to extend our estimation scheme to use multiple readers with overlapping regions. Each tag within the transmission range 366
of the reader then uses f,R,and its ID to select a slot in cation to tags,this probability is implemented by"virtually" the frame by evaluating a hash function h(f,R,ID)whose extending frame size 1/p times,i.e.,the reader announces a result is in 1,f following a uniform distribution.Each tag frame size of f/p but terminates the frame after the first f has a counter initialized with the slot number it chose to slots.According to C1G2.the reader can terminate a frame reply.After each slot.the reader first transmits an end of at any point.By adjusting p,ART is able to estimate tag slot signal and then each tag decrements its counter by one population of arbitrarily large sizes In any given slot,all the tags whose counters are equal to 1 respond to the reader.In essence,each tag picks a random 3.3 Formal Development:Overview and As- slot from 1 to f following a uniform distribution.If no tag sumptions replies in a slot,it is called an empty slot;if exactly one tag To formally develop an estimator,we first need to de- replies,it is called a singleton slot:and if two or more tags rive the equation for the expected value of average run size reply,it is called a collision slot. of Is as a function of frame size f,tag population size t, 3.2 Estimation Scheme Overview and persistence probability p.We then use the inverse of this function to get the estimated value t from the observed At the end of a frame,the reader obtains a sequence of 0s value of the average run size of Is.To achieve the required and Is by representing an empty slot with 0 and a singleton reliability and minimized estimation time,we optimize f,p, or collision slot with 1.In this binary sequence,a run is a and the number of rounds n so that the total number of slots subsequence where all bits in this subsequence are 0s (or 1s) (f+3)×n is minimized while satisfying P{lt-t≤Bt}≥a but the bits before and after the subsequence are 1s (or 0s), We add 3 to f because at the start of each frame,the reader if they exist.For example,frame 011100 has 3 runs:0.111. waits for about 1ms (3 empty slots)to let the tags get and 00. energized [3,16].We use f to denote f+3. ART uses the average run size of Is to estimate tag popu- To make the formal development tractable,we assume lation size.The intuition is that as tag population increases, that instead of picking a single slot to reply at the start the average run size of 1s increases(and that of Os decreases). of frame of size f,a tag independently decides to reply in We illustrate this intuition using the simulation results in each slot of the frame with probability 1/f regardless of Figure 1,which shows that the average run size of ls in- its decision about previous or forthcoming slots.Vogt first creases as tag population size increases from 0 to 160.The used this assumption for the analysis of framed slotted Aloha markers in this figure are the average of 100 runs.The lines protocol for RFID and justified its use by recognizing that above and below each marker show the standard deviation this problem belongs to a class of problems known as "occu- of the experiments.This figure shows that given a tag pop- pancy problems",which deals with the allocation of balls to ulation size and a frame size,there is a distinct expected urns 18.Ever since,the use of this assumption has been a value of the average run size of 1s.The expected value of norm in the formal analysis of all Aloha based RFID proto- the average run size of 1s is a monotonic function of the cols2,6-8,10.14.17,18,20. number of tags,which means that a unique inverse of this The implication of this assumption is that when a tag in- function exists.Thus,given the observed average run size dependently chooses a slot to reply,it can end up choosing of Is,using the inverse function,we can get the estimated more than one slots in the same frame or even not choosing value t of tag population size t.Similar to other tag esti- any at all,which is not in accordance with C1G2 standard mation schemes.ART also uses multiple frames obtained that requires a tag to pick exactly one slot in a frame.Note from multiple rounds of the framed slotted Aloha protocol that even with the independence assumption,the expected to reduce its estimation variance and therefore increase its number of slots that a tag chooses in a frame is still one estimation reliability.Using different seed values for different As we draw our estimate from a large number of frames to frames,in each frame,the same tag will choose a different achieve required reliability,we can expect to observe this ex- slot to respond. pected number.Therefore,the analysis with the assumption of independence is asymptotically the same as that with- 20 ×1 out the independence assumption.Bordenave et al.further 0 explained in detail why this independence assumption in an alyzing Aloha based protocols provides results just as accu- rate as if all the analysis was done without this assump- tion 1.Note that this independence assumption is made 10 only to make the formal development tractable.The simu- lations in Section 6 are based on the C1G2 standard where a tag chooses exactly one slot at the start of frame. 4.ART-ESTIMATION ALGORITHM 100 Next.we first focus on the single-reader version of ART.In Section 5.6,we present a method to extend ART to handle multiple-readers with overlapping regions.Table 1 lists the Figure 1:Average run size vs.t (f=16) symbols used in this paper. To scale to large tag population sizes,ART uses a persis- 4.1 Average Run Based Tag Estimation tence probability p by which a tag decides whether it should For ART.in each round of the Aloha protocol,we calcu- reply to the reader in a given frame.The persistence proba- late the average run size of b.For example,the average run bility was first introduced in 7.To avoid making any modifi- size of 1 in frame 01110011 (which has two runs of 1,i.e.. 367
of the reader then uses f, R, and its ID to select a slot in the frame by evaluating a hash function h(f, R, ID) whose result is in [1, f] following a uniform distribution. Each tag has a counter initialized with the slot number it chose to reply. After each slot, the reader first transmits an end of slot signal and then each tag decrements its counter by one. In any given slot, all the tags whose counters are equal to 1 respond to the reader. In essence, each tag picks a random slot from 1 to f following a uniform distribution. If no tag replies in a slot, it is called an empty slot; if exactly one tag replies, it is called a singleton slot; and if two or more tags reply, it is called a collision slot. 3.2 Estimation Scheme Overview At the end of a frame, the reader obtains a sequence of 0s and 1s by representing an empty slot with 0 and a singleton or collision slot with 1. In this binary sequence, a run is a subsequence where all bits in this subsequence are 0s (or 1s) but the bits before and after the subsequence are 1s (or 0s), if they exist. For example, frame 011100 has 3 runs: 0, 111, and 00. ART uses the average run size of 1s to estimate tag population size. The intuition is that as tag population increases, the average run size of 1s increases (and that of 0s decreases). We illustrate this intuition using the simulation results in Figure 1, which shows that the average run size of 1s increases as tag population size increases from 0 to 160. The markers in this figure are the average of 100 runs. The lines above and below each marker show the standard deviation of the experiments. This figure shows that given a tag population size and a frame size, there is a distinct expected value of the average run size of 1s. The expected value of the average run size of 1s is a monotonic function of the number of tags, which means that a unique inverse of this function exists. Thus, given the observed average run size of 1s, using the inverse function, we can get the estimated value t ˜ of tag population size t. Similar to other tag estimation schemes, ART also uses multiple frames obtained from multiple rounds of the framed slotted Aloha protocol to reduce its estimation variance and therefore increase its estimation reliability. Using different seed values for different frames, in each frame, the same tag will choose a different slot to respond. 0 50 100 150 0 5 10 15 20 Number of tags t Average size of runs 1s 0s Figure 1: Average run size vs. t (f = 16) To scale to large tag population sizes, ART uses a persistence probability p by which a tag decides whether it should reply to the reader in a given frame. The persistence probability was first introduced in [7]. To avoid making any modifi- cation to tags, this probability is implemented by “virtually” extending frame size 1/p times, i.e., the reader announces a frame size of f /p but terminates the frame after the first f slots. According to C1G2, the reader can terminate a frame at any point. By adjusting p, ART is able to estimate tag population of arbitrarily large sizes. 3.3 Formal Development: Overview and Assumptions To formally develop an estimator, we first need to derive the equation for the expected value of average run size of 1s as a function of frame size f, tag population size t, and persistence probability p. We then use the inverse of this function to get the estimated value t ˜ from the observed value of the average run size of 1s. To achieve the required reliability and minimized estimation time, we optimize f, p, and the number of rounds n so that the total number of slots (f + 3)×n is minimized while satisfying P{|t ˜−t| ≤ βt} ≥ α. We add 3 to f because at the start of each frame, the reader waits for about 1ms (≈ 3 empty slots) to let the tags get energized [3, 16]. We use f to denote f + 3. To make the formal development tractable, we assume that instead of picking a single slot to reply at the start of frame of size f, a tag independently decides to reply in each slot of the frame with probability 1/f regardless of its decision about previous or forthcoming slots. Vogt first used this assumption for the analysis of framed slotted Aloha protocol for RFID and justified its use by recognizing that this problem belongs to a class of problems known as “occupancy problems”, which deals with the allocation of balls to urns [18]. Ever since, the use of this assumption has been a norm in the formal analysis of all Aloha based RFID protocols [2, 6–8, 10, 14, 17, 18, 20]. The implication of this assumption is that when a tag independently chooses a slot to reply, it can end up choosing more than one slots in the same frame or even not choosing any at all, which is not in accordance with C1G2 standard that requires a tag to pick exactly one slot in a frame. Note that even with the independence assumption, the expected number of slots that a tag chooses in a frame is still one. As we draw our estimate from a large number of frames to achieve required reliability, we can expect to observe this expected number. Therefore, the analysis with the assumption of independence is asymptotically the same as that without the independence assumption. Bordenave et al. further explained in detail why this independence assumption in analyzing Aloha based protocols provides results just as accurate as if all the analysis was done without this assumption [1]. Note that this independence assumption is made only to make the formal development tractable. The simulations in Section 6 are based on the C1G2 standard where a tag chooses exactly one slot at the start of frame. 4. ART — ESTIMATION ALGORITHM Next, we first focus on the single-reader version of ART. In Section 5.6, we present a method to extend ART to handle multiple-readers with overlapping regions. Table 1 lists the symbols used in this paper. 4.1 Average Run Based Tag Estimation For ART, in each round of the Aloha protocol, we calculate the average run size of b. For example, the average run size of 1 in frame 01110011 (which has two runs of 1, i.e., 367
Table 1:Symbols used in the paper THEOREM 1.Let St.i be the random variable representing Symbol Description the run size of b starting at location i in a frame of size f. actual tag population size Let a=f-i+1.The expectation and variance of Ssi are: t tm upper bound on of tags (2) tM maximum of tags that can be estimated s小=0- estimated of tags a required reliability Var(5)=B+2a+1g+2-g6)-ga+" (3) B required confidence interval (1-96)2 frame size fop optimal frame size PROOF.To calculate the expected value of S.i,we first j f+3 to include delay between two need its probability density function PiS.=s.The prob- consecutive frames ability that a run starting at location i will be of length s is n of rounds (i.e.,frames) the product of the probability that s consecutive slots are b nop optimal of rounds (i.e..frames) p persistence probability and s+ith slot is B,where 6=1-b.Thus, Pop optimal persistence probability i(1-g)ifs<a R random seed from reader P(Sb.i=s)= if s=a h(f,R,D)unform hash function in 1,f 96 0 if s>a b value of a slot:b=0 or b=1 、6 1-6 where 1≤i≤fand1≤s≤a.To derive the expected 96 probability that a slot is b value and variance of S.,we use the moment generating Sbi random variable for run size of b starting at i function: E. expected value Var(.) variance (r)=Eles1=∑e“PSt=sy Cove)】 covariance = Yi random variable for of b slots in frame element of sample space of Y random variable for of runs of b in frame =(ge')°+1-p)∑(ge') Ro 8=1 T element of sample space of R Xb random variable for average run size of b in Differentiating both side w.r.t.7,we get a frame a-1 4 expected value of Xo standard deviation ofo p'(r)=a(ge')°+(1-go)ge')∑s(ge')-l o0】 $=1 number of ways in which y occurrences of b ξ{,y,r} and f-y occurrences of b can be arranged d (1-(qoe")a =alge'y°+1-gp)lge)ae可(1-9%er -1 in f slots while ensuring that the number of runs of b are r (1-96)(a-1)(g6e')a+1-a(g%e')°+qg%e alge")a+ (1-9%er)2 111 and 11)is (3+2)/2 =2.5.After n rounds,we obtain n average run sizes of b and then calculate the average of Evaluation of o'(r)at r =0 gives us Equation (2).To find these n values.This final value is then used as the expected Var(S.),we calculate E[S]by taking the second deriva- value of the average run size of b in a frame to estimate the tive of '(r)and setting T=0.Thus, tag population size The probability that a slot in a frame is b.where b=0 or ES,=6+96+2a-19+2-2a+1)g8+1 (1-96)2 1,can be calculated using Lemma 1. Evaluating E[S]-E2[S6.]gives Equation (3). LEMMA 1.Let t be the actual tag population size,f be the frame size,p be the persistence probability (i.e.,the proba- Let Xo be the random variable representing the average run bility that a tag participates in a frame),and go be the prob- size of b in a frame.Next,we calculate the expectation and ability that a slot in a frame is b.Thus: variance of Xo using the expectation and variance of Sa.i ∫(1-)fb=0 from Theorem 1.The expectation of Xo will be used to esti- 9s=1-(-)fb=1 (1) mate the tag population size and the variance of Xo will be used to calculate optimal values for f,p,and n.Let Yo be the random variable representing the number of times b oc- PROOF.The probability that a tag chooses a given slot in a frame is p/f.The probability that it does not choose that ead6 meo爱 slot is 1-2.The probability that none of the tags choose holds for any frame.Next,we first calculate E[Yo],Var(Y), that slot is (1-),which is the value of go.As the tags ER,Var(R),and Cov(Y,R)in Lemmas 2 and 3.Then choose the slots independently,g is the same for each slot of we use them to calculate E[Xo]and Var(X)in Theorem 2. the frame.The probability that a slot is chosen by at least Using Equation (14)in Theorem 2,replacing E[X]by the one tag is 1-go,which is the value of gi. average run size of b from n frames,we obtain an equation with only one variable t.Finally,we use Brent's method to Let S.be the random variable representing the run size obtain the numerical solution of this equation.The result of b starting at location i in a frame.If the i-th slot is not is the estimated tag population size.Since ART uses Xo to the starting point of a run,then S.=0.The expectation estimate the tag population size,we call Xo the estimator and variance of So.i can be calculated using Theorem 1. of ART. 368
Table 1: Symbols used in the paper Symbol Description t actual tag population size tm upper bound on # of tags tM maximum # of tags that can be estimated t ˜ estimated # of tags α required reliability β required confidence interval f frame size fop optimal frame size f f + 3 to include delay between two consecutive frames n # of rounds (i.e., frames) nop optimal # of rounds (i.e., frames) p persistence probability pop optimal persistence probability R random seed from reader h(f, R, ID) unform hash function in [1, f] b value of a slot: b = 0 or b = 1 b 1-b qb probability that a slot is b Sb,i random variable for run size of b starting at i E[.] expected value Var(.) variance Cov(.) covariance Yb random variable for # of b slots in frame y element of sample space of Yb Rb random variable for # of runs of b in frame r element of sample space of Rb Xb random variable for average run size of b in a frame μ{.} expected value of Xb σ{.} standard deviation of Xb ξ{f, y, r} number of ways in which y occurrences of b and f − y occurrences of b can be arranged in f slots while ensuring that the number of runs of b are r 111 and 11) is (3 + 2)/2=2.5. After n rounds, we obtain n average run sizes of b and then calculate the average of these n values. This final value is then used as the expected value of the average run size of b in a frame to estimate the tag population size. The probability that a slot in a frame is b, where b = 0 or 1, can be calculated using Lemma 1. Lemma 1. Let t be the actual tag population size, f be the frame size, p be the persistence probability ( i.e., the probability that a tag participates in a frame), and qb be the probability that a slot in a frame is b. Thus: qb = (1 − p f ) t if b = 0 1 − (1 − p f ) t if b = 1 (1) Proof. The probability that a tag chooses a given slot in a frame is p/f. The probability that it does not choose that slot is 1 − p f . The probability that none of the tags choose that slot is (1 − p f ) t , which is the value of q0. As the tags choose the slots independently, qb is the same for each slot of the frame. The probability that a slot is chosen by at least one tag is 1 − q0, which is the value of q1. Let Sb,i be the random variable representing the run size of b starting at location i in a frame. If the i-th slot is not the starting point of a run, then Sb,i = 0. The expectation and variance of Sb,i can be calculated using Theorem 1. Theorem 1. Let Sb,i be the random variable representing the run size of b starting at location i in a frame of size f. Let a = f − i + 1. The expectation and variance of Sb,i are: E[Sb,i] = qb 1 − qb (1 − qa b ) (2) Var(Sb,i) = qb + (2a + 1)(qa+2 b − qa+1 b ) − q 2(a+1) b (1 − qb)2 (3) Proof. To calculate the expected value of Sb,i, we first need its probability density function P{Sb,i = s}. The probability that a run starting at location i will be of length s is the product of the probability that s consecutive slots are b and s + i th slot is b, where b = 1 − b. Thus, P{Sb,i = s} = ⎧ ⎨ ⎩ qs b (1 − qb) if s<a qs b if s = a 0 if s>a where 1 ≤ i ≤ f and 1 ≤ s ≤ a. To derive the expected value and variance of Sb,i, we use the moment generating function: φ(τ ) = E[e τSb,i ] = a s=1 e τsP{Sb,i = s} = (qbe τ ) a + (1 − qb) a −1 s=1 (qbe τ ) s Differentiating both side w.r.t. τ , we get φ (τ )=a(qbe τ ) a + (1 − qb)(qbe τ ) a −1 s=1 s(qbe τ ) s−1 = a(qbe τ ) a + (1 − qb)(qbe τ ) d d(qbeτ ) 1 − (qbeτ ) a 1 − qbeτ − 1 = a(qbe τ ) a + (1 − qb) (a − 1)(qbeτ ) a+1 − a(qbeτ ) a + qbeτ (1 − qbeτ )2 Evaluation of φ (τ ) at τ = 0 gives us Equation (2). To find Var(Sb,i), we calculate E[S2 b,i] by taking the second derivative of φ (τ ) and setting τ = 0. Thus, E[S2 b,i] = qb + q2 b + (2a − 1)qa+2 b − (2a + 1)qa+1 b (1 − qb)2 Evaluating E[S2 b,i] − E2[Sb,i] gives Equation (3). Let Xb be the random variable representing the average run size of b in a frame. Next, we calculate the expectation and variance of Xb using the expectation and variance of Sb,i from Theorem 1. The expectation of Xb will be used to estimate the tag population size and the variance of Xb will be used to calculate optimal values for f, p, and n. Let Yb be the random variable representing the number of times b occurs in a frame and Rb be the random variable representing the number of runs of b in a frame. By definition, Xb = Yb Rb holds for any frame. Next, we first calculate E[Yb], Var(Yb), E[Rb], Var(Rb), and Cov(Yb, Rb) in Lemmas 2 and 3. Then, we use them to calculate E[Xb] and Var(Xb) in Theorem 2. Using Equation (14) in Theorem 2, replacing E[Xb] by the average run size of b from n frames, we obtain an equation with only one variable t. Finally, we use Brent’s method to obtain the numerical solution of this equation. The result is the estimated tag population size. Since ART uses Xb to estimate the tag population size, we call Xb the estimator of ART. 368
LEMMA 2.Let Yi be the random variable representing the Here we used the fact that the frame size is always greater mumber of times b occurs in a frame and Ro be the random than 1 during the estimation process whenever the informa- variable representing the number of runs of b in a frame. tion about runs is used.As I;~Bernoulli(o),its variance Given tag population size t,frame size f,and persistence is that of a bernoulli random variable given by probability p,we have: Var(Ii)=E[Ii](1-E[Ii]) (⑧) E[Yi]=fqb (4) Var(Y)=fqb(1-q6) (5) Note that Ii and I,are dependent on each other if and only if (iff)i=j-1 because Ij-1 and I;can not both be 1 in ER]=(9地+f(1-9B) (6) the same frame.Other than that,Vi <j-1,I;and Ij are Var(Ro)=f(go -4q+6qi -3q6)+(3q6-8q6 +5q6) independent.Thus, (7) 0 ifi<j-1 PROOF.The expected size of a run of b starting at loca- Cov(Ii,Ij)= -EIE[]=-E[(1-B) tion i is E[S..A run of b starting at location i means that if i=j-1 the slot i-1 is b.Thus Hence we have: EYa=ES6,l+∑ES,dl×P{slot i-1is Var(R)=Var(I1)+Var(Ij)+2Cov(I1,I2) i=2 1=2 "0-+-x +2>Cov(Ij-1,Ij) 1=3 =-或+mú-)-*(色二 =qb(1-9B)+(f-1)g(1-9b){1-qB(1-9b)} 1-9%一 -2q6(1-6)-2(J-2)96(1-9%)2 =∫q地 =f(9%-4g6+6g-3q6)+(3q6-8q+5q6)▣ Each slot i of frame f has probability go of being b.So Y~Binom(f,go).Thus,Var(Yo)is simply the variance of LEMMA 3.Given tag population size t,frame size f,and a binomial distribution with parameters f and g: persistence probability p,we have: Var(Y)=fgo(1-gb) 「1 Cov(Y6,R)=yrq%(1-q)/-".{,,r} Let 72,...,Yf represent the sequence of binary random =0r=0 variables representing the value of each slot in a frame of size -E[Y]E[Ru] (9) f.Since each tag randomly and independently picks a slot in the frame,all i are identically distributed.Furthermore. where P=b}=go.Let Ii be the indicator random variable whose value is 1 if a run of b begins at Yi. ()[2)+2")+--] ifr>1∧0<y<fAr≤yAr≤f-y-1 4={0=)v6=6-1=&i>) (-)[2)+-g-] Thus. 5{f,,r}= ifr=1A0<y<fAr≤yAr≤f-y-1 R=I 1 ifr=1Ay=f =1 Because 1 ifr=0Ay=0 P{Y:=b}=9b if i=1 E[= P{-1=b,=b}=9%(1-96)ifi>1 0 otherwise we get PROOF.By definition,we have E=2=+2n1-)=n+-m》 Cov(Yi,R)=>yPYi=y.R.=r}-E[Yi]E[Rs] =2 y=0T=0 (10) As Ro is the sum of f identically distributed random vari- Here P=y,R=r}represents the probability that ex- ables,we use the general expression for the variance of the actly y out of f slots in the frame are b and at the same time sum of identically distributed random variables to obtain the number of runs of b is r.This probability is difficult to the variance of R. evaluate directly,but conditioning on Yo simplifies the task. P{Y=,B=r}=P{R=rY=}×P{=} Var(Rb)=Var(Ii) (11) As Y~Binom(f,)we have: a+2∑cowu. P=}= (12) j=2i< 369
Lemma 2. Let Yb be the random variable representing the number of times b occurs in a frame and Rb be the random variable representing the number of runs of b in a frame. Given tag population size t, frame size f, and persistence probability p, we have: E[Yb] = fqb (4) Var(Yb) = fqb(1 − qb) (5) E[Rb] = qb qb + f(1 − qb) (6) Var(Rb) = f(qb − 4q2 b + 6q3 b − 3q4 b ) + (3q2 b − 8q3 b + 5q4 b ) (7) Proof. The expected size of a run of b starting at location i is E[Sb,i]. A run of b starting at location i means that the slot i − 1 is b. Thus, E[Yb] = E[Sb,1] + f i=2 E[Sb,i] × P slot i − 1 is b = qb 1 − qb (1 − qf b ) + f i=2 qb 1 − qb (1 − qf−i+1 b ) × (1 − qb) = qb 1 − qb (1 − qf b ) + qb (f − 1) − qf+1 b q−2 b − q −(f+1) b 1 − q−1 b = fqb Each slot i of frame f has probability qb of being b. So Yb ∼ Binom(f, qb). Thus, Var(Yb) is simply the variance of a binomial distribution with parameters f and qb: Var(Yb) = fqb(1 − qb) Let γ1, γ2,..., γf represent the sequence of binary random variables representing the value of each slot in a frame of size f. Since each tag randomly and independently picks a slot in the frame, all γi are identically distributed. Furthermore, P {γi = b} = qb. Let Ii be the indicator random variable whose value is 1 if a run of b begins at γi. Ii = 1 if (γi = b, i = 1) ∨ (γi = b ∧ γi−1 = b, i > 1) 0 otherwise Thus, Rb = f i=1 Ii Because E[Ii] = P {γi = b} = qb if i = 1 P γi−1 = b, γi = b = qb(1 − qb) if i > 1 we get E[Rb] = f i=1 E[Ii] = qb + f i=2 qb(1 − qb) = qb qb + f(1 − qb) As Rb is the sum of f identically distributed random variables, we use the general expression for the variance of the sum of identically distributed random variables to obtain the variance of Rb. Var(Rb) = Var( f i=1 Ii) = f i=1 Var(Ii)+2 f j=2 ∀i<j Cov(Ii, Ij ) Here we used the fact that the frame size is always greater than 1 during the estimation process whenever the information about runs is used. As Ii ∼ Bernoulli(qb), its variance is that of a bernoulli random variable given by Var(Ii) = E[Ii](1 − E[Ii]) (8) Note that Ii and Ij are dependent on each other if and only if (iff) i = j − 1 because Ij−1 and Ij can not both be 1 in the same frame. Other than that, ∀i<j − 1, Ii and Ij are independent. Thus, Cov(Ii, Ij )= ⎧ ⎨ ⎩ 0 if i<j − 1 −E[Ii]E[Ij ] = −E[Ii]qb(1 − qb) if i = j − 1 Hence we have: Var(Rb) = Var(I1) + f j=2 Var(Ij ) + 2Cov(I1, I2) +2 f j=3 Cov(Ij−1, Ij ) = qb(1 − qb)+(f − 1)qb(1 − qb) {1 − qb(1 − qb)} −2q2 b (1 − qb) − 2(f − 2)q2 b (1 − qb) 2 = f(qb − 4q2 b + 6q3 b − 3q4 b ) + (3q2 b − 8q3 b + 5q4 b ) Lemma 3. Given tag population size t, frame size f, and persistence probability p, we have: Cov(Yb, Rb) = f y=0 f 2 r=0 yrqy b (1 − qb) f−y.ξ {f, y, r} −E[Yb]E[Rb] (9) where ξ {f, y, r}= ⎧ ⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨ ⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩ y−1 r−1 f−y−1 r−2 + 2f−y−1 r−1 + f−y−1 r if r > 1 ∧ 0 <y<f ∧ r ≤ y ∧ r ≤ f − y − 1 y−1 r−1 2 f−y−1 r−1 + f−y−1 r if r = 1 ∧ 0 <y<f ∧ r ≤ y ∧ r ≤ f − y − 1 1 if r = 1 ∧ y = f 1 if r = 0 ∧ y = 0 0 otherwise Proof. By definition, we have Cov(Yb, Rb) = f y=0 f r=0 yrP {Yb = y, Rb = r} − E[Yb]E[Rb] (10) Here P {Yb = y, Rb = r} represents the probability that exactly y out of f slots in the frame are b and at the same time the number of runs of b is r. This probability is difficult to evaluate directly, but conditioning on Yb simplifies the task. P {Yb = y, Rb = r} = P {Rb = r|Yb = y} × P {Yb = y} (11) As Yb ∼ Binom(f, qb), we have: P {Yb = y} = f y qy b (1 − qb) f−y (12) 369