Energy Efficient Algorithms for the RFID Estimation Problem Tao Lif Samuel Wuts Shigang Chent Mark Yang§ Department of Computer Information Science Engineering Department of Epidemiology and Health Policy Research SDepartment of Statistics University of Florida,Gainesville,FL 32611,USA Abstract-RFID has been gaining popularity for inventory methods that estimate the number of tags with an accuracy control,object tracking,and supply chain management in ware- that can be arbitrarily set.This is called the RFID estimation houses,retail stores,hospitals,etc.Periodically and automatically problem.The follow-up work by Qian et al.[5]significantly estimating the number of RFID tags deployed in a large area has many important applications in inventory management and reduces the estimation time.It can be shown that even for theft detection.The prior work focuses on designing time-efficient the applications that require reading the actual IDs of all tags, algorithms that can estimate tens of thousands of tags in seconds. estimating the number of tags as a pre-processing step will We observe that,for a RFID reader to access tags in a large help make the main procedure of reading the IDs much more area,active tags are likely to be used.These tags are battery- efficient [3].Another advantage of estimating the number of powered and use their own energy for information transmission. However,recharging batteries for tens of thousands of tags is tags without reading the IDs is that it ensures the anonymity laborious.Unlike the prior work,this paper studies the RFID of the tags,which may be useful in privacy-sensitive scenarios estimation problem from the energy angle.Our goal is to reduce involving RFID-enhanced passports or driver's licences,where the amount of energy that is consumed by the tags during counting the number of people present is needed but revealing the estimation procedure.We design several energy-efficient their identities is not necessary. probabilistic algorithms that iteratively refine a control parameter to optimize the information carried in the transmissions from the Is time efficiency the only performance metric for the RFID tags,such that both the number and the size of the transmissions estimation problem?We argue that energy cost is also an are minimized. important issue that must be carefully dealt with.In any application that requires a RFID reader to access tags in a large I.INTRODUCTION area,it is likely that battery-powered active tags will be used. The radio-frequency identification (RFID)technology has Passive tags harvest energy from the radio signal of the reader been widely used in various commercial applications,including and use such minute amount of energy to deliver information inventory control,object tracking,and supply chain manage- back to the reader.Their typical reading range is only several ment.RFID tags (each storing a unique ID)are attached meters,which do not fit well with the big warehouse scenario. to goods at warehouses,merchandizes at retail stores,or Active tags use their own power to transmit.A longer reading equipment at hospitals,allowing an authenticated RFID reader range can be achieved by transmitting at higher power.They are to quickly access the properties of each individual item or also richer in resources for implementing advanced functions. collect the statistical information about a large group of items. Their price becomes less of a concern if they are used for This paper focuses on a RFID-enabled function that is very expensive merchandizes (such as refrigerators)or reused many useful in inventory management.Imagine a large warehouse times as goods moving in and out of the warehouse.But storing thousands of refrigerators,tens of thousands of furni- active tags also have a problem.They are powered by batteries. ture pieces,or hundreds of thousands of footwear.A national Recharging batteries for tens of thousands of tags is a laborious retail survey showed that administration error,vendor fraud operation,considering that the tagged products may stack up, and employee theft caused about 20 billion dollars lost a year making tags not easily accessible.To prolong the tags'lifetime [1].Hence,it is desirable to have a quick way of counting the and reduce the frequency of battery recharge,all functions that number of items in the warehouse or in each section of the involve large-scale transmission by many tags should be made warehouse.To timely detect theft or management errors,such energy-efficient.The prior work focuses on energy-efficient counting may be performed frequently. anti-collision protocols that minimize the energy consumption If each item is attached with a RFID tag,the counting of a mobile reader [6],[7]when the reader collects the IDs of problem can be solved by a RFID reader that receives the the tags.We believe this paper is the first to study energy- IDs transmitted (or backscattered)from the tags [2].However. efficient solutions for the RFID estimation problem (which reading the actual IDs of the tags can be time-consuming does not require reading the IDs of all tags). because so many of them have to be delivered in the same The paper has three major contributions.First,we observe low-rate channel and collisions caused by simultaneous trans- that there exists an asymmetry in energy cost.Solving the RFID missions by different tags make the matter worse.To address estimation problem incurs energy cost both at the RFID reader this problem,Kodialam and Nandagopal [3],[4]showed that and at the active tags.The asymmetry is that the energy cost the reading time can be greatly reduced through probabilistic at the tags should be minimized while the energy cost at the
Energy Efficient Algorithms for the RFID Estimation Problem Tao Li† Samuel Wu‡§ Shigang Chen† Mark Yang§ †Department of Computer & Information Science & Engineering ‡Department of Epidemiology and Health Policy Research §Department of Statistics University of Florida, Gainesville, FL 32611, USA Abstract—RFID has been gaining popularity for inventory control, object tracking, and supply chain management in warehouses, retail stores, hospitals, etc. Periodically and automatically estimating the number of RFID tags deployed in a large area has many important applications in inventory management and theft detection. The prior work focuses on designing time-efficient algorithms that can estimate tens of thousands of tags in seconds. We observe that, for a RFID reader to access tags in a large area, active tags are likely to be used. These tags are batterypowered and use their own energy for information transmission. However, recharging batteries for tens of thousands of tags is laborious. Unlike the prior work, this paper studies the RFID estimation problem from the energy angle. Our goal is to reduce the amount of energy that is consumed by the tags during the estimation procedure. We design several energy-efficient probabilistic algorithms that iteratively refine a control parameter to optimize the information carried in the transmissions from the tags, such that both the number and the size of the transmissions are minimized. I. INTRODUCTION The radio-frequency identification (RFID) technology has been widely used in various commercial applications, including inventory control, object tracking, and supply chain management. RFID tags (each storing a unique ID) are attached to goods at warehouses, merchandizes at retail stores, or equipment at hospitals, allowing an authenticated RFID reader to quickly access the properties of each individual item or collect the statistical information about a large group of items. This paper focuses on a RFID-enabled function that is very useful in inventory management. Imagine a large warehouse storing thousands of refrigerators, tens of thousands of furniture pieces, or hundreds of thousands of footwear. A national retail survey showed that administration error, vendor fraud and employee theft caused about 20 billion dollars lost a year [1]. Hence, it is desirable to have a quick way of counting the number of items in the warehouse or in each section of the warehouse. To timely detect theft or management errors, such counting may be performed frequently. If each item is attached with a RFID tag, the counting problem can be solved by a RFID reader that receives the IDs transmitted (or backscattered) from the tags [2]. However, reading the actual IDs of the tags can be time-consuming because so many of them have to be delivered in the same low-rate channel and collisions caused by simultaneous transmissions by different tags make the matter worse. To address this problem, Kodialam and Nandagopal [3], [4] showed that the reading time can be greatly reduced through probabilistic methods that estimate the number of tags with an accuracy that can be arbitrarily set. This is called the RFID estimation problem. The follow-up work by Qian et al. [5] significantly reduces the estimation time. It can be shown that even for the applications that require reading the actual IDs of all tags, estimating the number of tags as a pre-processing step will help make the main procedure of reading the IDs much more efficient [3]. Another advantage of estimating the number of tags without reading the IDs is that it ensures the anonymity of the tags, which may be useful in privacy-sensitive scenarios involving RFID-enhanced passports or driver’s licences, where counting the number of people present is needed but revealing their identities is not necessary. Is time efficiency the only performance metric for the RFID estimation problem? We argue that energy cost is also an important issue that must be carefully dealt with. In any application that requires a RFID reader to access tags in a large area, it is likely that battery-powered active tags will be used. Passive tags harvest energy from the radio signal of the reader and use such minute amount of energy to deliver information back to the reader. Their typical reading range is only several meters, which do not fit well with the big warehouse scenario. Active tags use their own power to transmit. A longer reading range can be achieved by transmitting at higher power. They are also richer in resources for implementing advanced functions. Their price becomes less of a concern if they are used for expensive merchandizes (such as refrigerators) or reused many times as goods moving in and out of the warehouse. But active tags also have a problem. They are powered by batteries. Recharging batteries for tens of thousands of tags is a laborious operation, considering that the tagged products may stack up, making tags not easily accessible. To prolong the tags’ lifetime and reduce the frequency of battery recharge, all functions that involve large-scale transmission by many tags should be made energy-efficient. The prior work focuses on energy-efficient anti-collision protocols that minimize the energy consumption of a mobile reader [6], [7] when the reader collects the IDs of the tags. We believe this paper is the first to study energyefficient solutions for the RFID estimation problem (which does not require reading the IDs of all tags). The paper has three major contributions. First, we observe that there exists an asymmetry in energy cost. Solving the RFID estimation problem incurs energy cost both at the RFID reader and at the active tags. The asymmetry is that the energy cost at the tags should be minimized while the energy cost at the
reader is relatively a less concern because the reader's battery respond in those slots.Using the probabilistic counting meth- can be easily replaced or it may be powered by an external ods,the reader estimates the number of tags based on the source.To exploit this asymmetry,our new algorithms follows number of empty slots or the number of collision slots.Their a common framework that trades more energy cost at the reader best estimator is called the Unified Probabilistic Estimator for less cost at the tags.The reader will continuously refine and (UPE).A follow-up work by the same authors proposes the broadcast a control parameter called contention probability, Enhanced Zero-Based Estimator (EZB)[4],which makes its which optimizes the amount of information the reader can estimation based on the number of empty slots.The Lottery- extract from the transmissions by the tags.This in turn reduces Frame scheme (LoF)[5]by Qian et al.employs a geometric the number of transmissions by the tags that are necessary to distribution-based scheme to determine which slot in a time achieve a certain estimation accuracy. frame each tag will respond.It significantly reduces the esti- Second,we investigate statistical methods,including the mation time.However,every tag must respond during each of maximum likelihood estimation (MLE)and the average sum the time frames(their number ranges from tens to thousands), estimation (ASE)that are different from the probabilistic resulting in large energy cost when active tags use their own counting methods [8]used by [3].[4].Our estimation al- power to transmit.Also related is a novel security protocol gorithms optimize their performance by iteratively applying proposed by Tan et al.to monitor the missing RFID tags in MLE or ASE with continuously refined parameters.The new the presence of dishonest RFID readers [15]. algorithms not only require fewer transmissions by the tags None of the above estimators are designed with energy but also minimize the size of each transmission.The number conservation in mind.In the following,we will present our of transmissions made by the tags in our best algorithm is less energy efficient estimators. than one third of the number in the state-of-the-art algorithms. III.PROBLEM DEFINITION AND SYSTEM MODEL In terms of the total number of bits transmitted by the tags,it is more than an order of magnitude smaller. A.RFID Estimation Problem Third,we formally analyze the confidence intervals of the The problem is to design efficient algorithms to estimate the estimations made by the new algorithms and establish the number of RFID tags in a deployment area without actually termination conditions for any given accuracy requirement.We reading the ID of each tag.Let N be the actual number of tags perform extensive simulations to demonstrate that the measured and N be the estimate.The estimation accuracy is specified results match well with the analytical results and that the new by a confidence interval with two parameters:a probability algorithms perform far better in terms of energy saving than value a and an error bound B,both in the range of(0,1).The the best existing algorithms. requirement is that the probability forto fall in the interval The rest of the paper is organized as follows.Section 2 [1-B,1+B]is at least a,i.e., discusses the related work.Section 3 defines the problem to be solved and the system model.Sections 4-6 propose three new Prob(1-B)N≤N≤(1+3)N)≥a. solutions for the RFID estimation problem.Section 7 evaluates Our goal is to achieve the above estimation accuracy with the new algorithms through simulations.Section 8 draws the minimum energy overhead to the active tags. conclusion. B.System Model II.RELATED WORK The type of active RFID systems considered in this paper is Most existing work focuses on how to efficiently read the applicable to a large deployment area that are hundreds of feet IDs of the RFID tags.When queried by the RFID reader,the or more across.Passive tags are beyond the scope of this paper. tags will respond with their IDs.Since the communication If they were used,one would have to take the reader and move environment is wireless,inevitably collisions will happen when around the whole area,collecting tag information once every multiple tags choose the same slot.Collision arbitration pro- few feet.Active tags allow a RFID reader to collect information tocols mainly fall into two categories:framed ALOHA-based from one location.The tagged goods (such as apparel)may approach [9],[10],[11],[7]and tree-based approach [12],[13]. stack in piles,and there may be obstacles,such as racks filled [14],[6].In the former approach,the polling request carries with merchandize,between a tag and the reader.We expect the the frame length,and each tag individually chooses a slot in active tags are designed to transmit with significant power that the frame to transmit its ID.The process repeats until all the is high enough to ensure reliable information delivery in such tags successfully transmit their IDs to the reader.In the later a demanding environment.Hence,the energy cost due to the approach,the reader first sends out an ID prefix string.The tags'transmissions is the main concern in our algorithm design; tags whose IDs match the string will respond.If a collision it increases at least in the square of the maximum distance to be happens,the reader will append a'0'or'1'to the prefix string covered by the RFID system.Energy consumption that powers and send out the new string.This process repeats until only the tag's circuit for computing and receiving information is not one tag responds.Essentially the approach traverses a binary affected by long distance and obstacles.The energy consumed tree with the IDs of the tags being the leaf nodes. by the RFID reader is of less concern.We assume the reader Instead of identifying individual RFID tags,Kodialam and transmits at sufficiently high power. Nandagopal [3]study the problem of estimating the cardinality We use the following communication protocol between the of a tag set.In their approach,information from tags are reader and the tags.The reader first synchronizes the clocks collected by a RFID reader in a series of time frames.Each of the tags and then performs a sequence of pollings.In each frame consists of a number of slots,and tags probabilistically polling,the reader sends out a request,which is followed by a
reader is relatively a less concern because the reader’s battery can be easily replaced or it may be powered by an external source. To exploit this asymmetry, our new algorithms follows a common framework that trades more energy cost at the reader for less cost at the tags. The reader will continuously refine and broadcast a control parameter called contention probability, which optimizes the amount of information the reader can extract from the transmissions by the tags. This in turn reduces the number of transmissions by the tags that are necessary to achieve a certain estimation accuracy. Second, we investigate statistical methods, including the maximum likelihood estimation (MLE) and the average sum estimation (ASE) that are different from the probabilistic counting methods [8] used by [3], [4]. Our estimation algorithms optimize their performance by iteratively applying MLE or ASE with continuously refined parameters. The new algorithms not only require fewer transmissions by the tags but also minimize the size of each transmission. The number of transmissions made by the tags in our best algorithm is less than one third of the number in the state-of-the-art algorithms. In terms of the total number of bits transmitted by the tags, it is more than an order of magnitude smaller. Third, we formally analyze the confidence intervals of the estimations made by the new algorithms and establish the termination conditions for any given accuracy requirement. We perform extensive simulations to demonstrate that the measured results match well with the analytical results and that the new algorithms perform far better in terms of energy saving than the best existing algorithms. The rest of the paper is organized as follows. Section 2 discusses the related work. Section 3 defines the problem to be solved and the system model. Sections 4-6 propose three new solutions for the RFID estimation problem. Section 7 evaluates the new algorithms through simulations. Section 8 draws the conclusion. II. RELATED WORK Most existing work focuses on how to efficiently read the IDs of the RFID tags. When queried by the RFID reader, the tags will respond with their IDs. Since the communication environment is wireless, inevitably collisions will happen when multiple tags choose the same slot. Collision arbitration protocols mainly fall into two categories: framed ALOHA-based approach [9], [10], [11], [7] and tree-based approach [12], [13], [14], [6]. In the former approach, the polling request carries the frame length, and each tag individually chooses a slot in the frame to transmit its ID. The process repeats until all the tags successfully transmit their IDs to the reader. In the later approach, the reader first sends out an ID prefix string. The tags whose IDs match the string will respond. If a collision happens, the reader will append a ‘0’ or ‘1’ to the prefix string and send out the new string. This process repeats until only one tag responds. Essentially the approach traverses a binary tree with the IDs of the tags being the leaf nodes. Instead of identifying individual RFID tags, Kodialam and Nandagopal [3] study the problem of estimating the cardinality of a tag set. In their approach, information from tags are collected by a RFID reader in a series of time frames. Each frame consists of a number of slots, and tags probabilistically respond in those slots. Using the probabilistic counting methods, the reader estimates the number of tags based on the number of empty slots or the number of collision slots. Their best estimator is called the Unified Probabilistic Estimator (UPE). A follow-up work by the same authors proposes the Enhanced Zero-Based Estimator (EZB) [4], which makes its estimation based on the number of empty slots. The LotteryFrame scheme (LoF) [5] by Qian et al. employs a geometric distribution-based scheme to determine which slot in a time frame each tag will respond. It significantly reduces the estimation time. However, every tag must respond during each of the time frames (their number ranges from tens to thousands), resulting in large energy cost when active tags use their own power to transmit. Also related is a novel security protocol proposed by Tan et al. to monitor the missing RFID tags in the presence of dishonest RFID readers [15]. None of the above estimators are designed with energy conservation in mind. In the following, we will present our energy efficient estimators. III. PROBLEM DEFINITION AND SYSTEM MODEL A. RFID Estimation Problem The problem is to design efficient algorithms to estimate the number of RFID tags in a deployment area without actually reading the ID of each tag. Let N be the actual number of tags and Nˆ be the estimate. The estimation accuracy is specified by a confidence interval with two parameters: a probability value α and an error bound β, both in the range of (0, 1). The requirement is that the probability for N Nˆ to fall in the interval [1 − β, 1 + β] is at least α, i.e., P rob((1 − β)Nˆ ≤ N ≤ (1 + β)Nˆ) ≥ α. Our goal is to achieve the above estimation accuracy with minimum energy overhead to the active tags. B. System Model The type of active RFID systems considered in this paper is applicable to a large deployment area that are hundreds of feet or more across. Passive tags are beyond the scope of this paper. If they were used, one would have to take the reader and move around the whole area, collecting tag information once every few feet. Active tags allow a RFID reader to collect information from one location. The tagged goods (such as apparel) may stack in piles, and there may be obstacles, such as racks filled with merchandize, between a tag and the reader. We expect the active tags are designed to transmit with significant power that is high enough to ensure reliable information delivery in such a demanding environment. Hence, the energy cost due to the tags’ transmissions is the main concern in our algorithm design; it increases at least in the square of the maximum distance to be covered by the RFID system. Energy consumption that powers the tag’s circuit for computing and receiving information is not affected by long distance and obstacles. The energy consumed by the RFID reader is of less concern. We assume the reader transmits at sufficiently high power. We use the following communication protocol between the reader and the tags. The reader first synchronizes the clocks of the tags and then performs a sequence of pollings. In each polling, the reader sends out a request, which is followed by a
slotted time frame during which the tags respond.The polling A.Overview request from the reader carries a contention probability 0< MLEA uses the polling protocol described in Section III-B p 1 and a frame size f.Each tag will participate in the The frame size f is fixed to be one slot.The reader adjusts the current polling with probability p.If it decides to participate, contention probability for each polling.Let pi be the contention it will pick a slot uniformly at random from the frame,and probability of the ith polling.MLEA only records whether the transmit a bit string (called response)in that slot.The format of the response depends on the application.If the tag decides sole slot in each polling is empty or non-empty.Based on this information,it refines the estimate N until the accuracy to not participate,it will keep silent.In our solutions,p will requirement is met.Let z;be the slot state of the ith polling. be set in the order of When at least one tag responds,the slot is non-empty and If we know a lower bound Nmin of N,the contention zi=1.When no tag responds,it is empty and zi=0.The probability can be implemented efficiently to conserve energy. sequence of zi,i>1,forms the response vector. For example,a company's inventory of certain goods may be At the ith polling,each tag has a probability of pi to transmit routinely in thousands and never be reduced below a certain and,if any tag transmits,z;will be one.Hence, number before,or the company has a policy on the minimum inventory,or the RFID estimation becomes unnecessary when Prob{zi=1}=1-(1-pi)N1-e-NpL, (1) the number of tags is below a threshold.In these cases,we will have a lower bound Nmin,which can be much smaller than N. where N is the the actual number of tags. At the beginning of a polling,each tag makes a probabilistic If the contention probabilities of the pollings are picked decision:It goes to sleep for the whole polling period with too small,the response vector will contain mostly zeros.If probability 1-N and wakes up until the next period starts. the contention probabilities are picked too large,the response or it stays awaketo receive the polling request with probability vector will contain mostly ones.Both cases do not provide 1and then decides to respond with probability px Nmin. sufficient statistical information for accurate estimation.As Nmin For example,if N =10,000 and Nmin =1,000,then only will be discussed shortly,our analysis shows that the optimal 10 tags stay awake in each polling.In Section IV-D,another contention probability is pi=1.594/N.The problem is that energy-reduction method,called request-less pollings,will be we do not know N(which is the quantity we want to estimate). proposed to eliminate most polling requests. In order to determine pi,MLEA consists of an initialization Optionally,the reader's request may include a predicate phase and an iterative phase.The former quickly produces and only tags that satisfy the predicate will participate in the a coarse estimation of N.The latter refines the contention polling.For example,suppose all tags deployed in one section probability and generates the estimation result of a warehouse carry the 96-bit GEN2 IDs that begin with "000"in the Serial Number field.In order to estimate the B.Initialization Phase number of tags in this section,the request carries a predicate We want to pick a small value for the initial contention testing whether the first three bits of a tag's Serial Number is probability.The expected number of responding tags at the “000*」 first polling is Np1.If p is picked too large,a lot of A slot is said to be empty if no tag responds (transmits) tags will respond,which is wasteful because one response in the slot.It is called a singleton slot if exactly one tag or many responses produce the same information-a non- responds.It is a collision slot if more than one tag responds.A empty slot.Suppose we know an upper bound Nmax of N. singleton or collision slot is also called a non-empty slot.The This information is often available in practice.For example, Philips I-Code system [16]requires a slot length of 10 bits we know Nmaz is 10,000 if the warehouse is designed to in order to distinguish singleton slots from collision slots.On hold no more than 10,000 microwaves (each tagged with a the contrary.one bit is enough if we only need to distinguish RFID),or the company's inventory policy requires that in-store empty slots from non-empty slots-'0'means empty and'1' microwaves should not exceed 10,000,or the warehouse only means non-empty.Hence,the response will be much shorter has 10,000 RFID tags in use.Nmar can be much bigger than (or consume much less energy)if an algorithm only needs to N.We pick p=such that the expected number of know empty/non-empty slots,instead of all three types of slots responding tags is no more than one.If=0.we multiply the as required by [3]. contention probability by a constant C(>1),i.e.,p2=p1x C In order to prolong the lifetime of the tags,there are for the second polling.We continue multiplying the contention two ways to reduce their energy consumption:reducing the probability by C after each polling until a non-empty slot is size of each response and reducing the number of responses. observed.When that happens (say,at the Ith polling),we have We will design algorithms that require only the knowledge a coarse estimation of N to be 1/p.Then we move to the of empty/non-empty slots and employ statistical methods to next phase.When C is relatively large,the initialization phase minimize the transmissions needed from the tags. only takes a few pollings to complete due to the exponential increase of the contention probability. IV.MAXIMUM LIKELIHOOD ESTIMATION ALGORITHM Our first estimator for the number of RFID tags is called C.Iterative Phase the maximum likelihood estimation algorithm (MLEA).It fully This phase iteratively refines the estimation result after each utilizes the information from all pollings in order to minimize polling,and terminates when the specified accuracy require- the number of tag responses it needs to meet the accuracy ment is met.Let N;be the estimated number of tags after the requirement ith polling.The reader performs three tasks.First,it sets the
slotted time frame during which the tags respond. The polling request from the reader carries a contention probability 0 < p ≤ 1 and a frame size f. Each tag will participate in the current polling with probability p. If it decides to participate, it will pick a slot uniformly at random from the frame, and transmit a bit string (called response) in that slot. The format of the response depends on the application. If the tag decides to not participate, it will keep silent. In our solutions, p will be set in the order of 1 N . If we know a lower bound Nmin of N, the contention probability can be implemented efficiently to conserve energy. For example, a company’s inventory of certain goods may be routinely in thousands and never be reduced below a certain number before, or the company has a policy on the minimum inventory, or the RFID estimation becomes unnecessary when the number of tags is below a threshold. In these cases, we will have a lower bound Nmin, which can be much smaller than N. At the beginning of a polling, each tag makes a probabilistic decision: It goes to sleep for the whole polling period with probability 1− 1 Nmin and wakes up until the next period starts, or it stays awake to receive the polling request with probability 1 Nmin and then decides to respond with probability p × Nmin. For example, if N = 10, 000 and Nmin = 1, 000, then only 10 tags stay awake in each polling. In Section IV-D, another energy-reduction method, called request-less pollings, will be proposed to eliminate most polling requests. Optionally, the reader’s request may include a predicate and only tags that satisfy the predicate will participate in the polling. For example, suppose all tags deployed in one section of a warehouse carry the 96-bit GEN2 IDs that begin with “000” in the Serial Number field. In order to estimate the number of tags in this section, the request carries a predicate testing whether the first three bits of a tag’s Serial Number is “000”. A slot is said to be empty if no tag responds (transmits) in the slot. It is called a singleton slot if exactly one tag responds. It is a collision slot if more than one tag responds. A singleton or collision slot is also called a non-empty slot. The Philips I-Code system [16] requires a slot length of 10 bits in order to distinguish singleton slots from collision slots. On the contrary, one bit is enough if we only need to distinguish empty slots from non-empty slots — ‘0’ means empty and ‘1’ means non-empty. Hence, the response will be much shorter (or consume much less energy) if an algorithm only needs to know empty/non-empty slots, instead of all three types of slots as required by [3]. In order to prolong the lifetime of the tags, there are two ways to reduce their energy consumption: reducing the size of each response and reducing the number of responses. We will design algorithms that require only the knowledge of empty/non-empty slots and employ statistical methods to minimize the transmissions needed from the tags. IV. MAXIMUM LIKELIHOOD ESTIMATION ALGORITHM Our first estimator for the number of RFID tags is called the maximum likelihood estimation algorithm (MLEA). It fully utilizes the information from all pollings in order to minimize the number of tag responses it needs to meet the accuracy requirement. A. Overview MLEA uses the polling protocol described in Section III-B. The frame size f is fixed to be one slot. The reader adjusts the contention probability for each polling. Let pi be the contention probability of the ith polling. MLEA only records whether the sole slot in each polling is empty or non-empty. Based on this information, it refines the estimate Nˆ until the accuracy requirement is met. Let zi be the slot state of the ith polling. When at least one tag responds, the slot is non-empty and zi = 1. When no tag responds, it is empty and zi = 0. The sequence of zi , i ≥ 1, forms the response vector. At the ith polling, each tag has a probability of pi to transmit and, if any tag transmits, zi will be one. Hence, P rob{zi = 1} = 1 − (1 − pi) N ≈ 1 − e −Npi , (1) where N is the the actual number of tags. If the contention probabilities of the pollings are picked too small, the response vector will contain mostly zeros. If the contention probabilities are picked too large, the response vector will contain mostly ones. Both cases do not provide sufficient statistical information for accurate estimation. As will be discussed shortly, our analysis shows that the optimal contention probability is pi = 1.594/N. The problem is that we do not know N (which is the quantity we want to estimate). In order to determine pi , MLEA consists of an initialization phase and an iterative phase. The former quickly produces a coarse estimation of N. The latter refines the contention probability and generates the estimation result. B. Initialization Phase We want to pick a small value for the initial contention probability. The expected number of responding tags at the first polling is N p1. If p1 is picked too large, a lot of tags will respond, which is wasteful because one response or many responses produce the same information — a nonempty slot. Suppose we know an upper bound Nmax of N. This information is often available in practice. For example, we know Nmax is 10,000 if the warehouse is designed to hold no more than 10,000 microwaves (each tagged with a RFID), or the company’s inventory policy requires that in-store microwaves should not exceed 10,000, or the warehouse only has 10,000 RFID tags in use. Nmax can be much bigger than N. We pick p1 = 1 Nmax such that the expected number of responding tags is no more than one. If z1 = 0, we multiply the contention probability by a constant C(> 1), i.e., p2 = p1 ×C for the second polling. We continue multiplying the contention probability by C after each polling until a non-empty slot is observed. When that happens (say, at the lth polling), we have a coarse estimation of N to be 1/pl . Then we move to the next phase. When C is relatively large, the initialization phase only takes a few pollings to complete due to the exponential increase of the contention probability. C. Iterative Phase This phase iteratively refines the estimation result after each polling, and terminates when the specified accuracy requirement is met. Let Nˆ i be the estimated number of tags after the ith polling. The reader performs three tasks. First, it sets the
contention probability as follows before sending out the polling 3)Termination Condition:We show in Appendix A that, request: when i is large,N;approximately follows the Gaussian distri- Pi= (2) bution: N-1 Norm( (1-(1-p)) i(1-p)Nln2(1-p) where N-1 is the estimate after the previous polling and w is a constant,which will be determined shortly.Second,based According to (6),the variance of N can be written as p on the received z;and the history information,the reader finds Hence,the confidence interval of N is the new estimate of N that maximizes the following likelihood function: eNiPI-1 N:±Za·1 ip明 (9) L:=Π1-p)1-1-(1-p)), (3) where Z is the a percentile for the standard Gaussian distribu- =1 tion.For example,when a=95%,Za =1.96.Because N is where (1-p)N(1-)(1-(1-pj)N)=is the probability that undetermined,we use N;as an approximation when computing the standard deviation in (9) the state of the slot in the jth polling is zj.Namely,we want The termination condition for EMLA is therefore to find Ni arg max{Li} eNiP -1 (4) N ≤N·B, (10) 评 Third,after computing Ni,the reader has to determine if the where B is the error bound.The above inequality can be confidence interval of the estimation meets the requirement.In rewritten as the following,we show how the above tasks can be achieved. 1)Determine the Contention Probability:Using the i≥ ZaVeNIp:-1 (11) 6-method [17],we derive the variance of Ni in Appendix NipiB A. Because p:=1.594/N,we have Var(N)≈ 1-(1-p)N (5) i(1-p)N1n2(1-p) 1.544.Z8 i≥ (12) 32 When N is large and pi is small,we can approximate(1-p:)N as e-Npe and In(1-pi)as pi.The above variance becomes Hence we can theoretically compute the approximate number of pollings that is required in order to meet the accuracy Var()≈e-l requirements.For example,if a =95%and B =5%,2372 (6) ip pollings will be required.Note that (12)is independent with the actual number of tags,N.Hence,our approach has perfect We pick the value of pi that minimizes Var(Ni).Differen- scalability. tiating the right side of (6)and letting it be zero,we have Fig.I shows the simulation result of MLEA when N pi=1.594/N.Because MLE approach provides statistically 10.000.a =95%and 3=5%.The simulation setup can be consistent estimate,when i is large,N can be closely approx- found in Section VII.The middle curve is the estimated number imated by Ni-1.Hence,we set the value of pi as follows: of tags,Ni,with respect to the number pollings.It converges 1.594 to the true value N represented by the central straight line.The P1= (7) upper and lower curves represent the 95%confidence interval, N-1 which shrinks as the number of pollings increases. 2)Compute the value of Ni:We compute the new estimate D.Request-less Pollings of N that maximizes (3).Since the maxima is not affected by We observe that,after a number of pollings,the value of pi monotone transformations,we use logarithm to turn the right will stay in an increasingly small range and does not change side of the equation from product to summation: much.It becomes unnecessary for the reader to transmit it at each polling.Hence,we improve MLEA as follows:If In(Li 1-z)ln(1-p)+n(1-(1-p) the percentage change in pi during a certain number M of consecutive pollings is within a small threshold,the reader will broadcast the latest value of pi and indicate that it will no To find the maxima,we differentiate both sides: longer transmit polling requests (unless the value of p:changes beyond the threshold,in which case the reader will broadcast aln(Li) )血(1-)--p)1- the new value of pi one more time).Without receiving further 1-(1-p5)N polling requests,the tags will respond with the same contention (8) probability in the subsequent slots.This is called the request- We then set the right side to zero to solve for estimate Ni. less pollings.Note that the termination condition in (10) Note that the derivative is a monotone function of N.we can remains correct.With the threshold being 10%and M=10, numerically obtain Ni through the bisection search. the simulation results (which are omitted to save space)show
contention probability as follows before sending out the polling request: pi = ω Nˆ i−1 , (2) where Nˆ i−1 is the estimate after the previous polling and ω is a constant, which will be determined shortly. Second, based on the received zi and the history information, the reader finds the new estimate of N that maximizes the following likelihood function: Li = Y i j=1 (1 − pj ) N(1−zj ) (1 − (1 − pj ) N ) zj , (3) where (1−pj ) N(1−zj ) (1−(1−pj ) N ) zj is the probability that the state of the slot in the jth polling is zj . Namely, we want to find Nˆ i = arg max{Li} N . (4) Third, after computing Nˆ i , the reader has to determine if the confidence interval of the estimation meets the requirement. In the following, we show how the above tasks can be achieved. 1) Determine the Contention Probability: Using the δ−method [17], we derive the variance of Nˆ i in Appendix A. V ar(Nˆ i) ≈ 1 − (1 − pi) N i(1 − pi)N ln2 (1 − pi) . (5) When N is large and pi is small, we can approximate (1−pi) N as e −Npi and ln(1 − pi) as pi . The above variance becomes V ar(Nˆ i) ≈ e Npi − 1 ip2 i . (6) We pick the value of pi that minimizes V ar(Nˆ i). Differentiating the right side of (6) and letting it be zero, we have pi = 1.594/N. Because MLE approach provides statistically consistent estimate, when i is large, N can be closely approximated by Nˆ i−1. Hence, we set the value of pi as follows: pi = 1.594 Nˆ i−1 . (7) 2) Compute the value of Nˆ i: We compute the new estimate of N that maximizes (3). Since the maxima is not affected by monotone transformations, we use logarithm to turn the right side of the equation from product to summation: ln(Li) = X i j=1 · N(1 − zj ) ln(1 − pj ) + zj ln(1 − (1 − pj ) N ) ¸ . To find the maxima, we differentiate both sides: ∂ ln(Li) ∂N = Xi j=1 · (1 − zj ) ln(1 − pj ) − zj (1 − pj ) N ln(1 − pj ) 1 − (1 − pj )N ¸ . (8) We then set the right side to zero to solve for estimate Nˆ i . Note that the derivative is a monotone function of N, we can numerically obtain Nˆ i through the bisection search. 3) Termination Condition: We show in Appendix A that, when i is large, Nˆ i approximately follows the Gaussian distribution: Normµ N, (1 − (1 − pi) N ) i(1 − pi)N ln2 (1 − pi) ¶ . According to (6), the variance of Nˆ i can be written as e Npi−1 ip2 i . Hence, the confidence interval of N is Nˆ i ± Zα · s eNˆipi − 1 ip2 i , (9) where Zα is the α percentile for the standard Gaussian distribution. For example, when α = 95%, Zα = 1.96. Because N is undetermined, we use Nˆ i as an approximation when computing the standard deviation in (9). The termination condition for EMLA is therefore Zα · s eNˆipi − 1 ip2 i ≤ Nˆ i · β, (10) where β is the error bound. The above inequality can be rewritten as √ i ≥ Zα p eNˆipi − 1 Nˆ ipiβ . (11) Because pi = 1.594/Nˆ i , we have i ≥ 1.544 · Z 2 α β 2 . (12) Hence we can theoretically compute the approximate number of pollings that is required in order to meet the accuracy requirements. For example, if α = 95% and β = 5%, 2372 pollings will be required. Note that (12) is independent with the actual number of tags, N. Hence, our approach has perfect scalability. Fig. 1 shows the simulation result of MLEA when N = 10, 000, α = 95% and β = 5%. The simulation setup can be found in Section VII. The middle curve is the estimated number of tags, Nˆ i , with respect to the number pollings. It converges to the true value N represented by the central straight line. The upper and lower curves represent the 95% confidence interval, which shrinks as the number of pollings increases. D. Request-less Pollings We observe that, after a number of pollings, the value of pi will stay in an increasingly small range and does not change much. It becomes unnecessary for the reader to transmit it at each polling. Hence, we improve MLEA as follows: If the percentage change in pi during a certain number M of consecutive pollings is within a small threshold, the reader will broadcast the latest value of pi and indicate that it will no longer transmit polling requests (unless the value of pi changes beyond the threshold, in which case the reader will broadcast the new value of pi one more time). Without receiving further polling requests, the tags will respond with the same contention probability in the subsequent slots. This is called the requestless pollings. Note that the termination condition in (10) remains correct. With the threshold being 10% and M = 10, the simulation results (which are omitted to save space) show
20000 ASEA also consists of an initialization phase and an iterative phase.The initialization phase of ASEA is the same as the 15000 initialization phase of MLEA,except that when the reader 10000 obtains the first non-zero result z at the lth polling with probability pi.it computes a coarse estimation of N as 000 Then it moves to the next phase,which is described below. B.Iterative Phase 500100015002000 number of pollings This phase iteratively refines the estimation after each polling,and terminates when the specified accuracy require- Fig.1.The middle curve shows the estimated number of tags with respect ment is met.The reader performs four tasks in the ith iteration. to the number of pollings.The upper and lower curves show the confidence interval.The straightline shows the true number of tags First,it computes the contention probability pi by (13)before sending out the polling request. 1 that the performance difference caused by request-less pollings =- (13) is negligibly small even though the contention probability during request-less pollings may be slightly off the value set by (7).Request-less pollings can also be applied to the algorithms where Nis the estimation after the previous polling. in the next two sections. Second,the reader computes the number of responses zi in the current frame. E.Information Loss due to Collision Third,it computes Ni as follows: MLEA has a frame size of one slot.It obtains only binary information at each polling.Since the target contention prob- N= ∑=1+1z(1+ (14) ability is set to be 1.594/N,the probability for more than one ∑=+1 tag to respond in each polling is significant.However,no matter where s is introduced to compensate for collision and the how many tags respond,the information that the reader receives is always the same,i.e.,zi=1,which implies information iterative phase begins from the(1+1)th polling.The intuition loss when two or more tags decide to transmit at a polling behind (14)is given as follows:All pollings are performed Let's compare two scenarios.In one scenario,only one tag independent of each other.Hence,the previous pollings with responds at a polling.In the other,two tags respond.These the contention probability pi(l<j<i)and the number two scenarios generate the same information but the energy of responses zj can be treated as one polling with the con- cost of the second scenario is twice of the first.To address tention probability and the number of responses this problem.we design two more algorithms that reduce the j=11j.The correctness of this treatment will be estab- probability of collision and,moreover,compensate the impact lished by the analysis on the confidence interval and also by of collision in its computation. the simulation results. Fourth,after computing Ni,the reader determines if the V.AVERAGE SUM ESTIMATION ALGORITHM estimate meets the accuracy requirement.In the following,we The average sum estimation algorithm (ASEA)is our sec- discuss the details of the second and fourth steps. ond estimator for the number of RFID tags.It also utilizes 1)Compute the number of responses:At the ith polling, history information from previous pollings.However,instead the reader measures the number of non-empty slots in the of only obtaining binary information,it computes the number frame,denoted as zi,which is an integer in the range of [0..f] of responses in each polling.Because more information can be Due to possible collision,the actual number of responses, extracted,it requires a fewer number of responses than MLEA. denoted as,can be greater.However,we show that i closely Moreover,the computation performed by ASEA after each approximates x;.Their tiny difference can be computed and polling is much simpler than MLEA's complicated bisection corrected. search. Since each tag independently decides to respond with prob- A.Overview ability pi,follows a binomial distribution with parameters N and pi,i.e.. ASEA uses the same polling protocol as MLEA does,except that its frame size f is larger than one in order to reduce the Probfi=k}= p (1-p:)N-k (15) probability of collision.The result of the ith polling,i,is no longer a binary value.Instead,it is an estimate of the number IfpP;≈l/N and N is sufficiently large,.Prob{x幸=2}≈ of tags that respond at the polling. 0.1839,Prob{x=3}≈0.0613,Prob{x=4}≈0.0153, ASEA takes three steps to solve the collision.First,it slightly and the probability decreases exponentially with respect to k. reduces the contention probability p;.Second,it increases the frame size f such that the tags that decide to respond at a Probfz>4}is only about 0.0037.Next we compute the polling are likely to respond at different slots in the frame.We probability for collisions to happen at the ith polling,which is pick values for pi and f such that the collision probability is denoted as Probifcollision}. negligibly small.Third,we compensate the impact of collision 1We could also use (7).In that case,the frame size f would increase in our computation. accordingly to keep a small collision probability
0 5000 10000 15000 20000 0 500 1000 1500 2000 number of pollings Fig. 1. The middle curve shows the estimated number of tags with respect to the number of pollings. The upper and lower curves show the confidence interval. The straightline shows the true number of tags. that the performance difference caused by request-less pollings is negligibly small even though the contention probability during request-less pollings may be slightly off the value set by (7). Request-less pollings can also be applied to the algorithms in the next two sections. E. Information Loss due to Collision MLEA has a frame size of one slot. It obtains only binary information at each polling. Since the target contention probability is set to be 1.594/N, the probability for more than one tag to respond in each polling is significant. However, no matter how many tags respond, the information that the reader receives is always the same, i.e., zi = 1, which implies information loss when two or more tags decide to transmit at a polling. Let’s compare two scenarios. In one scenario, only one tag responds at a polling. In the other, two tags respond. These two scenarios generate the same information but the energy cost of the second scenario is twice of the first. To address this problem, we design two more algorithms that reduce the probability of collision and, moreover, compensate the impact of collision in its computation. V. AVERAGE SUM ESTIMATION ALGORITHM The average sum estimation algorithm (ASEA) is our second estimator for the number of RFID tags. It also utilizes history information from previous pollings. However, instead of only obtaining binary information, it computes the number of responses in each polling. Because more information can be extracted, it requires a fewer number of responses than MLEA. Moreover, the computation performed by ASEA after each polling is much simpler than MLEA’s complicated bisection search. A. Overview ASEA uses the same polling protocol as MLEA does, except that its frame size f is larger than one in order to reduce the probability of collision. The result of the ith polling, xi , is no longer a binary value. Instead, it is an estimate of the number of tags that respond at the polling. ASEA takes three steps to solve the collision. First, it slightly reduces the contention probability pi . Second, it increases the frame size f such that the tags that decide to respond at a polling are likely to respond at different slots in the frame. We pick values for pi and f such that the collision probability is negligibly small. Third, we compensate the impact of collision in our computation. ASEA also consists of an initialization phase and an iterative phase. The initialization phase of ASEA is the same as the initialization phase of MLEA, except that when the reader obtains the first non-zero result xl at the lth polling with probability pl , it computes a coarse estimation of N as xl pl . Then it moves to the next phase, which is described below. B. Iterative Phase This phase iteratively refines the estimation after each polling, and terminates when the specified accuracy requirement is met. The reader performs four tasks in the ith iteration. First, it computes the contention probability pi by (13) before sending out the polling request. pi = 1 Nˆ i−1 , (13) where Nˆ i−1 is the estimation after the previous polling. 1 Second, the reader computes the number of responses xi in the current frame. Third, it computes Nˆ i as follows: Nˆ i = Pi j=l+1 xj (1 + ε) Pi j=l+1 pj , (14) where ε is introduced to compensate for collision and the iterative phase begins from the (l + 1)th polling. The intuition behind (14) is given as follows: All pollings are performed independent of each other. Hence, the previous pollings with the contention probability pj (l < j ≤ i) and the number of responses xj can be treated as one polling with the contention probability Pi j=l+1 pj and the number of responses Pi j=l+1 xj . The correctness of this treatment will be established by the analysis on the confidence interval and also by the simulation results. Fourth, after computing Nˆ i , the reader determines if the estimate meets the accuracy requirement. In the following, we discuss the details of the second and fourth steps. 1) Compute the number of responses: At the ith polling, the reader measures the number of non-empty slots in the frame, denoted as xi , which is an integer in the range of [0..f]. Due to possible collision, the actual number of responses, denoted as x ∗ i , can be greater. However, we show that xi closely approximates x ∗ i . Their tiny difference can be computed and corrected. Since each tag independently decides to respond with probability pi , x ∗ i follows a binomial distribution with parameters N and pi , i.e., P rob{x ∗ i = k} = µ N k ¶ p k i (1 − pi) N−k . (15) If pi ≈ 1/N and N is sufficiently large, P rob{x ∗ i = 2} ≈ 0.1839, P rob{x ∗ i = 3} ≈ 0.0613, P rob{x ∗ i = 4} ≈ 0.0153, and the probability decreases exponentially with respect to k. P rob{x ∗ i > 4} is only about 0.0037. Next we compute the probability for collisions to happen at the ith polling, which is denoted as P robi{collision}. 1We could also use (7). In that case, the frame size f would increase accordingly to keep a small collision probability