2013 Proceedings IEEE INFOCOM An Efficient Protocol for RFID Multigroup Threshold-based Classification Wen Luo Yan Oiao Shigang Chen Department of Computer Information Science Engineering,University of Florida Abstract-RFID technology has many applications such as different countries or manufacturers in a port.One challenge object tracking,automatic inventory control,and supply chain is to determine whether the number of tags in each group is management.They can be used to identify individual objects or above or below a prescribed threshold value.The threshold count the population of each type of objects in a deployment area,no matter whether the objects are passports,retail may be set high to identify the populous groups,it may be products,books or even humans.Most existing work adopts a set to a level that triggers certain actions such as replenishing "flat"RFID system model and performs functions of collecting the stocks,or even multiple thresholds can be used to classify tag IDs,estimating the number of tags,or detecting the groups based on the range of their population sizes.Solving missing tags.However,in practice,tags are often attached to this multigroup threshold-based classification problem gives objects of different groups,which may represent a different product type in a warehouse,a different book category in a us a basic tool to access a large population of numerous library,etc.An interesting problem,called multigroup threshold- groups. based classification,is to determine whether the number of Precise classification requires us to know the precise objects in each group is above or below a prescribed threshold number of tags in each group.Tag identification protocols value.Solving this problem is important for inventory tracking [4,[5,[6],[7],[8],[9],[10l,[11]can do that,,but it takes applications.If the number of groups is very large,it will be inefficient to measure the groups one at a time.The best them significant time to complete if the number of tags is very existing solution for multigroup threshold-based classification large.One way to improve efficiency is relaxing the problem is based on generic group testing,whose design is however from accurate classification to approximate classification [2]. geared towards detecting a small number of populous groups where the classification accuracy can be tuned to meet a Its performance degrades quickly when the number of groups above the threshold become large.In this paper,we propose a pre-defined requirement.We may use cardinality estimation new classification protocol based on logical bitmaps.It achieves protocols [4].[5].[12],[13].[14]to estimate the number high efficiency by measuring all groups in a mixed fashion. of tags in each group,and classify the group based on In the meantime,we show that the new method is able to the estimation.However,those protocols are efficient when perform threshold-based classification with an accuracy that estimating a small number of large groups,but they are not can be pre-set to any desirable level,allowing tradeoff between time efficiency and accuracy. efficient when estimating a large number of small groups, because their execution time for each group is largely I.INTRODUCTION indifferent in group size,as we will demonstrate shortly. In [2],Sheng et al.apply group testing to approximately Radio-frequency identification(RFID)has rich application detect popular groups.When the number of groups above in cyber-physical systems for object tracking,automatic the threshold is small,their performance is good.However, inventory control,and supply chain management [1].[2].the performance of the group-testing-based solution degrades [3].Practical RFID systems widely exist for automatic toll quickly (in terms of the execution time)when the number of payment,access control to parking garages,object tracking, groups above threshold becomes large. theft prevention,tracking and monitoring.An RFID system In this paper,we propose a new classification protocol typically consists of three components:readers,tags and the that is scalable to a large number of groups.Its design is middleware software.Small RFID tags,each with a unique drastically different from traditional approaches that measure ID,are attached to objects,allowing an RFID reader to the size of one group at a time.It measures the sizes of quickly access the properties of each individual object or all groups together at once in a mixed fashion.Yet,the collect statistical information about a large group of objects.new protocol is able to perform threshold-based classification Much of the existing work on RFID systems is to design with an accuracy that can be pre-set to any desirable level, tag identification protocols that read the IDs from tags [4].allowing tradeoff between time efficiency and accuracy.Our [5],[6],[7],[8],[9],[10],[11].Other work designs efficient main contributions are summarized as follows: protocols to estimate the number of tags in a large RFID 1.We design an iterative protocol for threshold-based system [4],[5],[12],[13],[14],detects missing tags [15],classification in a multi-group RFID system based on logical [16],[17],or collects useful information [18]. bitmaps that share time slots uniformly at random among all This paper investigates a different problem.In practice, groups during the process of measuring their populations. tags are often attached to objects belonging to different We use the maximum likelihood estimation method to groups,for instance,different brands of shoes in a large shoe extract per-group information from the shared slots.Such store,different titles of books in a bookstore,and goods from slot sharing greatly reduces the amount of time it takes to 978-1-4673-5946-7/13/$31.00©20131EEE 890
An Efficient Protocol for RFID Multigroup Threshold-based Classification Wen Luo Yan Qiao Shigang Chen Department of Computer & Information Science & Engineering, University of Florida Abstract—RFID technology has many applications such as object tracking, automatic inventory control, and supply chain management. They can be used to identify individual objects or count the population of each type of objects in a deployment area, no matter whether the objects are passports, retail products, books or even humans. Most existing work adopts a “flat” RFID system model and performs functions of collecting tag IDs, estimating the number of tags, or detecting the missing tags. However, in practice, tags are often attached to objects of different groups, which may represent a different product type in a warehouse, a different book category in a library, etc. An interesting problem, called multigroup thresholdbased classification, is to determine whether the number of objects in each group is above or below a prescribed threshold value. Solving this problem is important for inventory tracking applications. If the number of groups is very large, it will be inefficient to measure the groups one at a time. The best existing solution for multigroup threshold-based classification is based on generic group testing, whose design is however geared towards detecting a small number of populous groups. Its performance degrades quickly when the number of groups above the threshold become large. In this paper, we propose a new classification protocol based on logical bitmaps. It achieves high efficiency by measuring all groups in a mixed fashion. In the meantime, we show that the new method is able to perform threshold-based classification with an accuracy that can be pre-set to any desirable level, allowing tradeoff between time efficiency and accuracy. I. INTRODUCTION Radio-frequency identification (RFID) has rich application in cyber-physical systems for object tracking, automatic inventory control, and supply chain management [1], [2], [3]. Practical RFID systems widely exist for automatic toll payment, access control to parking garages, object tracking, theft prevention, tracking and monitoring. An RFID system typically consists of three components: readers, tags and the middleware software. Small RFID tags, each with a unique ID, are attached to objects, allowing an RFID reader to quickly access the properties of each individual object or collect statistical information about a large group of objects. Much of the existing work on RFID systems is to design tag identification protocols that read the IDs from tags [4], [5], [6], [7], [8], [9], [10], [11]. Other work designs efficient protocols to estimate the number of tags in a large RFID system [4], [5], [12], [13], [14], detects missing tags [15], [16], [17], or collects useful information [18]. This paper investigates a different problem. In practice, tags are often attached to objects belonging to different groups, for instance, different brands of shoes in a large shoe store, different titles of books in a bookstore, and goods from different countries or manufacturers in a port. One challenge is to determine whether the number of tags in each group is above or below a prescribed threshold value. The threshold may be set high to identify the populous groups, it may be set to a level that triggers certain actions such as replenishing the stocks, or even multiple thresholds can be used to classify groups based on the range of their population sizes. Solving this multigroup threshold-based classification problem gives us a basic tool to access a large population of numerous groups. Precise classification requires us to know the precise number of tags in each group. Tag identification protocols [4], [5], [6], [7], [8], [9], [10], [11] can do that, but it takes them significant time to complete if the number of tags is very large. One way to improve efficiency is relaxing the problem from accurate classification to approximate classification [2], where the classification accuracy can be tuned to meet a pre-defined requirement. We may use cardinality estimation protocols [4], [5], [12], [13], [14] to estimate the number of tags in each group, and classify the group based on the estimation. However, those protocols are efficient when estimating a small number of large groups, but they are not efficient when estimating a large number of small groups, because their execution time for each group is largely indifferent in group size, as we will demonstrate shortly. In [2], Sheng et al. apply group testing to approximately detect popular groups. When the number of groups above the threshold is small, their performance is good. However, the performance of the group-testing-based solution degrades quickly (in terms of the execution time) when the number of groups above threshold becomes large. In this paper, we propose a new classification protocol that is scalable to a large number of groups. Its design is drastically different from traditional approaches that measure the size of one group at a time. It measures the sizes of all groups together at once in a mixed fashion. Yet, the new protocol is able to perform threshold-based classification with an accuracy that can be pre-set to any desirable level, allowing tradeoff between time efficiency and accuracy. Our main contributions are summarized as follows: 1. We design an iterative protocol for threshold-based classification in a multi-group RFID system based on logical bitmaps that share time slots uniformly at random among all groups during the process of measuring their populations. We use the maximum likelihood estimation method to extract per-group information from the shared slots. Such slot sharing greatly reduces the amount of time it takes to 978-1-4673-5946-7/13/$31.00 ©2013 IEEE 2013 Proceedings IEEE INFOCOM 890
2013 Proceedings IEEE INFOCOM complete classification. of interest in this paper.The first type is called a tag-ID slot, 2.Given an accuracy requirement,we show analytically whose length is denoted as Tiag,during which a reader is how to compute optimal system parameters that minimize able to broadcast a 96-bit tag ID.The second type is called a the protocol execution time under the constraint of the short-response slot,whose length is denoted as Tshort,which requirement.Our estimation method based on logical bitmaps carries one-bit information:'0'for an empty slot when no tag ensures that false positive/false negative ratios are bounded, transmits.and '1'for an non-empty slot when one or more where false positive occurs when a below-threshold group is tags transmit a signal to make the channel busy. reported as above-threshold and false negative occurs when Significant asymmetry exists between readers and tags: an above-threshold group is not reported. Tags are supposed to be used in large quantities,and they 3.We comprehensively evaluate the proposed solution and must be cheap.The cost for a reader is less of a concern compare it with existing protocols.Our simulation results because its number in use is much fewer.Therefore,unlike match well with the analytical results,which demonstrate that tags,the reader is not limited in storage space,computation the new protocol performs far better in terms of execution power,or energy supply.If necessary,it can be connected to time than the best existing work. a powerful server for resources.With a high-quality antenna, The rest of the paper is organized as follows.Section II a reader is able to receive weak signals from tags.With low- presents the system model and defines the problem to be quality antennas,although tags can receive strong signals solved.Section III discusses the related work and gives the from the reader,they cannot receive each other's weak motivation for our solution.Section IV proposes our two- signals.They may not even sense whether the channel is phase protocol for the RFID threshold-based classification busy or idle,i.e.,whether another tag is transmitting.Nor can problem.Section V evaluates the new protocol through they sense if collision has occurred when two tags transmit simulations.Section VI draws the conclusion. simultaneously.Hence,a CSMA/CA-like MAC protocol [19] cannot be assumed in an RFID system.But the reader can II.PROBLEM DEFINITION AND SYSTEM MODEL detect whether the channel is idle or whether collision occurs A.System Model Such asymmetry points out a design principle that we should There are three types of RFID tags.Passive tags are follow:pushing the complexity to the reader while leaving most widely deployed today.They are cheap,but do the tags simple not have internal power sources.Passive tags rely on radio waves emitted from an RFID reader to power their circuit and transmit information back to the reader through B.Multigroup Threshold-based Classification Problem backscattering.They have short operational ranges,typically a few meters in an indoor environment.To cover a large Consider a big warehouse with tens of thousands of items area,arrays of RFID reader antennas must be installed.Semi- Each item is attached with an RFID tag for communication passive tags carry batteries to power their circuit,but still with an RFID reader.The items are divided into different rely on backscattering to transmit information.Active tags groups based on certain properties,which can be the use their own battery power to transmit,and consequently product sub-category,production date,or production place. do not need any energy supply from the reader.Active tags To support grouping,each tag ID should contain two operate at a much longer distance,making them particularly components:a group ID,which identifies the group the tag suitable for applications that cover a large area,where one belongs to,and a member ID,which identifies a specific tag or a few RFID readers are installed to access all tagged in the group.Clearly,all tags in a group must carry the same objects and perform management functions automatically. group ID,while tags in different groups carry different group With richer onboard resources,active tags are likely to gain IDs.We assume that the RFID reader knows the group IDs more popularity in the future,particularly when their prices in the system. drop over time as manufactural technologies are improved We define the population or size of a group as the and markets are expanded. number of tags in this group.As we have explained in Communication between readers and tags is time-slotted. the introduction,while it is possible to perform precise Readers send out a request,which is followed by a slotted multigroup classification at high cost,the focus of this paper time frame during which tags transmit in their selected is to study efficient solutions for approximate multigroup slots.The readers may take turns to transmit the request classification.We formally define the problem as follows: in order to avoid interference,or a more sophisticated Let h be the threshold and a be a large probability value. scheduling algorithm may be used to allow readers that do not We require that any group whose population exceeds h interfere to transmit simultaneously.When a tag transmits,should be reported with a probability of at least a.Let I as long as one reader receives the transmission correctly,be another integer parameter smaller than h and B be a the transmission will be successful.In our protocol design, small probability value.We also require that the probability we can logically treat all readers as one,which transmits of reporting any group with I or fewer tags should be no a request and then listens to the tags'responses.There are more than B.Let k:be the population of an arbitrary group different types of time slots [9],among which two types are g.Our performance objectives can be expressed in terms of 89
complete classification. 2. Given an accuracy requirement, we show analytically how to compute optimal system parameters that minimize the protocol execution time under the constraint of the requirement. Our estimation method based on logical bitmaps ensures that false positive/false negative ratios are bounded, where false positive occurs when a below-threshold group is reported as above-threshold and false negative occurs when an above-threshold group is not reported. 3. We comprehensively evaluate the proposed solution and compare it with existing protocols. Our simulation results match well with the analytical results, which demonstrate that the new protocol performs far better in terms of execution time than the best existing work. The rest of the paper is organized as follows. Section II presents the system model and defines the problem to be solved. Section III discusses the related work and gives the motivation for our solution. Section IV proposes our twophase protocol for the RFID threshold-based classification problem. Section V evaluates the new protocol through simulations. Section VI draws the conclusion. II. PROBLEM DEFINITION AND SYSTEM MODEL A. System Model There are three types of RFID tags. Passive tags are most widely deployed today. They are cheap, but do not have internal power sources. Passive tags rely on radio waves emitted from an RFID reader to power their circuit and transmit information back to the reader through backscattering. They have short operational ranges, typically a few meters in an indoor environment. To cover a large area, arrays of RFID reader antennas must be installed. Semipassive tags carry batteries to power their circuit, but still rely on backscattering to transmit information. Active tags use their own battery power to transmit, and consequently do not need any energy supply from the reader. Active tags operate at a much longer distance, making them particularly suitable for applications that cover a large area, where one or a few RFID readers are installed to access all tagged objects and perform management functions automatically. With richer onboard resources, active tags are likely to gain more popularity in the future, particularly when their prices drop over time as manufactural technologies are improved and markets are expanded. Communication between readers and tags is time-slotted. Readers send out a request, which is followed by a slotted time frame during which tags transmit in their selected slots. The readers may take turns to transmit the request in order to avoid interference, or a more sophisticated scheduling algorithm may be used to allow readers that do not interfere to transmit simultaneously. When a tag transmits, as long as one reader receives the transmission correctly, the transmission will be successful. In our protocol design, we can logically treat all readers as one, which transmits a request and then listens to the tags’ responses. There are different types of time slots [9], among which two types are of interest in this paper. The first type is called a tag-ID slot, whose length is denoted as Ttag, during which a reader is able to broadcast a 96-bit tag ID. The second type is called a short-response slot, whose length is denoted as Tshort, which carries one-bit information: ‘0’ for an empty slot when no tag transmits, and ‘1’ for an non-empty slot when one or more tags transmit a signal to make the channel busy. Significant asymmetry exists between readers and tags: Tags are supposed to be used in large quantities, and they must be cheap. The cost for a reader is less of a concern because its number in use is much fewer. Therefore, unlike tags, the reader is not limited in storage space, computation power, or energy supply. If necessary, it can be connected to a powerful server for resources. With a high-quality antenna, a reader is able to receive weak signals from tags. With lowquality antennas, although tags can receive strong signals from the reader, they cannot receive each other’s weak signals. They may not even sense whether the channel is busy or idle, i.e., whether another tag is transmitting. Nor can they sense if collision has occurred when two tags transmit simultaneously. Hence, a CSMA/CA-like MAC protocol [19] cannot be assumed in an RFID system. But the reader can detect whether the channel is idle or whether collision occurs. Such asymmetry points out a design principle that we should follow: pushing the complexity to the reader while leaving the tags simple. B. Multigroup Threshold-based Classification Problem Consider a big warehouse with tens of thousands of items. Each item is attached with an RFID tag for communication with an RFID reader. The items are divided into different groups based on certain properties, which can be the product sub-category, production date, or production place. To support grouping, each tag ID should contain two components: a group ID, which identifies the group the tag belongs to, and a member ID, which identifies a specific tag in the group. Clearly, all tags in a group must carry the same group ID, while tags in different groups carry different group IDs. We assume that the RFID reader knows the group IDs in the system. We define the population or size of a group as the number of tags in this group. As we have explained in the introduction, while it is possible to perform precise multigroup classification at high cost, the focus of this paper is to study efficient solutions for approximate multigroup classification. We formally define the problem as follows: Let h be the threshold and α be a large probability value. We require that any group whose population exceeds h should be reported with a probability of at least α. Let l be another integer parameter smaller than h and β be a small probability value. We also require that the probability of reporting any group with l or fewer tags should be no more than β. Let k be the population of an arbitrary group g. Our performance objectives can be expressed in terms of 2013 Proceedings IEEE INFOCOM 891
2013 Proceedings IEEE INFOCOM 140 UPE broadcasts a polling request,followed by a slotted time EZB 120 frame during which tags respond.Receiving the request,each Enhanced FNEB EMLEA tag independently picks up a time slot to transmit its ID. 100 80 To address collision,the reader has to repeat this process 60 for a certain number of pollings before all tag IDs can be successfully received.The second category is tree-based [8], 30 [9],[10],[11].These protocols resolve collision by traversing 转教结衬 a binary tree with the IDs of the tags being the leaf nodes.The 0 10000 2000030000 4000050000 reader first broadcasts an ID prefix string.The tags whose Group size IDs match the string will respond.If collision happens,the reader will append an '0'or'1'to the prefix string and send Fig.1.The estimation time with respect to the group size for UPE,EZB. Enhanced FNEB and EMLEA.when a'=99%and B=1%. out the new string.This process repeats until only one tag responds.Applying to the problem in this paper,these two TABLE I NOTATIONS types of protocols do not work well for large-scale RFID systems due to their long identification time. Symbols Descriptions A probabilistic analytical model is proposed by Kodialam 1-a upper bound of false negative ratio and Nandagopal for anonymously estimating tag population B upper bound of false positive ratio [12].Their estimators are the Zero Estimator(ZE),Collision m bit length of logical bitmap Estimator (CE),and the Unified Probabilistic Estimator number of tags (UPE),which collect information from tags in a series of s number of above-threshold groups time frames and estimates the population of tags based on k actual number of tags in an arbitrary group the number of empty slots and/or the number of collision estimated number of tags in an arbitrary group slots.A follow-up work by the same authors proposes ri random number in the ith polling the Enhanced Zero-Based Estimator (EZB)[13],which is length of the time frame each polling an asymptotically unbiased estimator and makes estimation H() hash function whose range is 0,f-1 only based on the number of empty slots.Qian et al. mid a tag's member ID provide a replicate-insensitive estimation algorithm called the 9id a tag's group ID Lottery-Frame scheme (LoF)[14].The Enhanced First Non- a prescribed higher bound threshold value Empty slots Based Estimator(Enhanced FNEB)[20]can be a prescribed lower bound threshold value used to estimate tag population in both static and dynamic number of pollings environments by measuring the position of the first non- empty slot in each frame.Li et al.[21]study the estimation problem for large-scale RFID systems from the energy angle based on Maximum Likelihood Estimation (MLE). conditional probabilities as follows: They design several energy-efficient probabilistic algorithms Prob{group g is reported by the reader k≥h}≥a that iteratively refine a control parameter to optimize the (1) information carried in the transmissions from tags,such Prob[group g is reported by the readerk≤I}≤B that both the number and the size of the transmissions are We treat the report of a group with l or fewer tags as a minimized. false positive,and the non-report of a group with h or more B.Motivation tags as a false negative.Hence,the above objectives can also We first show the performance of some existing work be stated as bounding the false positive ratio by B and the false negative ratio by 1-a. through simulation.From the simulation results,we argue that these protocols are not time-efficient when they are III.PRELIMINARY applied to the multigroup threshold-based classification problem. A.Prior Work Figure 1 presents the execution time of four existing One possible solution for the multigroup threshold-based protocols [12].[13],[20],[21]with respect to the number classification problem is to use a reader to collect the of tags in a group;details about the simulation setting and actual tag IDs from tags,where each ID contains bits that parameters can be found in Section V-A.The protocols identify the group of the tag.For a reader to successfully are designed to estimate the size of a single group.To collect tag IDs in proximity,collision arbitration protocols perform multigroup threshold-based classification,they have must be considered so that replies from multiple tags to estimate one group at a time.Their estimation accuracy will not be garbled due to collision.In general,collision is specified by a confidence interval:The probability for the arbitration protocols can be classified into two categories.estimate to deviate from the true group size by B'percentage The first category is ALOHA-based [4].[5],[6].[7].In or more should not exceed a',where a'and B'are two pre- these protocols,communication is initialized when a reader specified system parameters.They are set to 99%and 1% 892
0 20 40 60 80 100 120 140 0 10000 20000 30000 40000 50000 Time in seconds Group size UPE EZB Enhanced FNEB EMLEA Fig. 1. The estimation time with respect to the group size for UPE, EZB, Enhanced FNEB and EMLEA, when α = 99% and β = 1%. TABLE I NOTATIONS Symbols Descriptions 1 − α upper bound of false negative ratio β upper bound of false positive ratio m bit length of logical bitmap n number of tags S number of above-threshold groups k actual number of tags in an arbitrary group kˆ estimated number of tags in an arbitrary group ri random number in the ith polling f length of the time frame each polling H(·) hash function whose range is [0, f − 1] mid a tag’s member ID gid a tag’s group ID h a prescribed higher bound threshold value l a prescribed lower bound threshold value w number of pollings conditional probabilities as follows: P rob{ group g is reported by the reader |k ≥ h} ≥ α P rob{ group g is reported by the reader |k ≤ l} ≤ β (1) We treat the report of a group with l or fewer tags as a false positive, and the non-report of a group with h or more tags as a false negative. Hence, the above objectives can also be stated as bounding the false positive ratio by β and the false negative ratio by 1 − α. III. PRELIMINARY A. Prior Work One possible solution for the multigroup threshold-based classification problem is to use a reader to collect the actual tag IDs from tags, where each ID contains bits that identify the group of the tag. For a reader to successfully collect tag IDs in proximity, collision arbitration protocols must be considered so that replies from multiple tags will not be garbled due to collision. In general, collision arbitration protocols can be classified into two categories. The first category is ALOHA-based [4], [5], [6], [7]. In these protocols, communication is initialized when a reader broadcasts a polling request, followed by a slotted time frame during which tags respond. Receiving the request, each tag independently picks up a time slot to transmit its ID. To address collision, the reader has to repeat this process for a certain number of pollings before all tag IDs can be successfully received. The second category is tree-based [8], [9], [10], [11]. These protocols resolve collision by traversing a binary tree with the IDs of the tags being the leaf nodes. The reader first broadcasts an ID prefix string. The tags whose IDs match the string will respond. If collision happens, the reader will append an ‘0’ or ‘1’ to the prefix string and send out the new string. This process repeats until only one tag responds. Applying to the problem in this paper, these two types of protocols do not work well for large-scale RFID systems due to their long identification time. A probabilistic analytical model is proposed by Kodialam and Nandagopal for anonymously estimating tag population [12] . Their estimators are the Zero Estimator (ZE), Collision Estimator (CE), and the Unified Probabilistic Estimator (UPE), which collect information from tags in a series of time frames and estimates the population of tags based on the number of empty slots and/or the number of collision slots. A follow-up work by the same authors proposes the Enhanced Zero-Based Estimator (EZB) [13], which is an asymptotically unbiased estimator and makes estimation only based on the number of empty slots. Qian et al. provide a replicate-insensitive estimation algorithm called the Lottery-Frame scheme (LoF) [14]. The Enhanced First NonEmpty slots Based Estimator (Enhanced FNEB) [20] can be used to estimate tag population in both static and dynamic environments by measuring the position of the first nonempty slot in each frame. Li et al. [21] study the estimation problem for large-scale RFID systems from the energy angle based on Maximum Likelihood Estimation (MLE). They design several energy-efficient probabilistic algorithms that iteratively refine a control parameter to optimize the information carried in the transmissions from tags, such that both the number and the size of the transmissions are minimized. B. Motivation We first show the performance of some existing work through simulation. From the simulation results, we argue that these protocols are not time-efficient when they are applied to the multigroup threshold-based classification problem. Figure 1 presents the execution time of four existing protocols [12], [13], [20], [21] with respect to the number of tags in a group; details about the simulation setting and parameters can be found in Section V-A. The protocols are designed to estimate the size of a single group. To perform multigroup threshold-based classification, they have to estimate one group at a time. Their estimation accuracy is specified by a confidence interval: The probability for the estimate to deviate from the true group size by β percentage or more should not exceed α , where α and β are two prespecified system parameters. They are set to 99% and 1% 2013 Proceedings IEEE INFOCOM 892
2013 Proceedings IEEE INFOCOM respectively in our simulation.From the figure,we observe IV.AN EFFICIENT THRESHOLD-BASED CLASSIFICATION that the estimation time changes very little with respect to the PROTOCOL number of tags.For example,UPE takes about 20 seconds to This s estimate the tag population in the range from 500 to 50,000.If section presents an efficient Threshold-Based Classification (TBC)Protocol,which is a combination of there are two groups of 25,000 tags each,the total estimation time for the two groups will be 40 seconds.However,if dynamic slot sharing among groups and maximum likelihood estimation of group sizes. there are 100 groups of 500 tags each,the total estimation time will be 2,000 seconds!Hence,these protocols are not A.Dynamic Slot Sharing suitable when there are numerous small groups Let's begin with a single group and use a well known Sheng,Tan and Li studied how to reduce the execution approach [12]of estimating the group population for our time for identifying populous groups whose sizes are discussion:The RFID reader broadcasts a polling request, larger than a threshold [2].They start with a simple fast asking tags to respond in a subsequent time frame of f slots. threshold checking scheme (TCS),which approximately Upon receiving the request,each tag randomly selects a slot answers whether the number of involved tags exceeds a in the frame to transmit.Listening to the channel,the reader threshold with high probability.Based on TCS.they propose converts the time frame into a bitmap.Each bit corresponds two probabilistic protocols.The first one is based on generic to a slot,'0'for an empty slot and'1'for a non-empty slot. group testing(GT),and the second protocol is a combination The reader then counts the number of '0'bits.Intuitively. of group testing and divide-and-conquer.Simulations show when there are more tags transmitting,fewer slots will be left that their best protocol can significantly reduce the execution empty,which means fewer '0'bits.A functional relationship time for populous group discovery.However,it is also can be established between the number of tags in the group shown in the simulation results that their execution time is and the number of '0'bits in the bitmap [12],and we can approximately proportional to the number of groups above use this function to estimate the former from the latter.When the threshold.Hence,the performance of the protocol will there are multiple groups,we can repeat the above scheme, deteriorate if the number of groups above the threshold using a different time frame for each group. is large.In addition,the RFID reader must be able to Analysis has shown that '0'bits must account for a distinguish three types of slots:(1)empty slot,during which reasonable portion of the bitmap in order to ensure accuracy. no tag transmit;(2)singleton slot,during which only one tag Hence,to handle a large tag group,we should conservatively transmits,and(3)collision slot,during which more than one set the frame size to be large enough such that a reasonable tag transmits. number of '0'bits will remain after all tags pick their slots to transmit.But when there are a mix of large and small We follow two general design principles when designing groups,there is no one-frame-size-fit-all:A small frame size our time-efficient classification protocol.First,we want for all groups will not do well for large groups;a large to minimize the length of each time slot.Based on the frame size is good for large groups,but it is wasteful for parameters of the Philips I-Code specification [22].in order small groups.This is particularly true when the majority of to transmit a 96-bit ID from a tag to an RFID read,we need all groups are small but there are a few large ones.Can we a slot of 2.11 ms.However,if the reader is not interested in choose a different frame size for each group based on its IDs but wants to distinguish collision slots from singleton or population?No,because per-group population is not given empty slots [2],tags should transmit 10-bit long responses, but what we want to know. using slots of 0.491 ms each.Furthermore,if the reader One way to solve the above dilemma is to share all does not need to distinguish collision slots from singleton slots among all groups.To do so,we have to abandon the slots but only wants to know whether the slots are empty or traditional approach of one time frame per group [2],[12], not,tags can transmit one-bit short responses,using slots [13],[20],[21],and instead use a single time frame for all of 0.321 ms each to carry one bit information (channel groups-a new approach that measures the sizes of all busy or idle);this is the type of slots we will use in our groups together:Each group ID is pseudo-randomly hashed protocol design.Note that 0.302 ms idle time is included to a certain number of slots in the time frame.Each tag in in each slot to separate it from neighboring slots.Second,the group will probabilistically pick one of these slots to we want to minimize the number of slots.Figure 1 clearly transmit.Listening to the channel,the reader converts the shows that traditional approaches of measuring one group at time frame into a bitmap.For each group,it extracts the bits a time will not work well when there are a large number that the group ID is hashed to.Those bits form the logical of groups.We take a new design that is drastically different bitmap of the group,from which the group size is estimated. from traditional ones:measuring the groups all at once.This In this approach,each bit and the corresponding slot may design has an interesting feature that its execution time is be shared by more than one group.This sharing introduces largely insensitive to the number of groups if the total number noise;the logical bitmap of one group may carry some bits of tags is about the same.It makes our protocol particularly that are set to'1'not by transmissions of tags in this group, suitable for situations where there are a large number of small but by transmissions of tags from other groups that happen to or medium-sized groups be hashed to the same time slots.Fortunately,in a bird's-eye 893
respectively in our simulation. From the figure, we observe that the estimation time changes very little with respect to the number of tags. For example, UPE takes about 20 seconds to estimate the tag population in the range from 500 to 50,000. If there are two groups of 25,000 tags each, the total estimation time for the two groups will be 40 seconds. However, if there are 100 groups of 500 tags each, the total estimation time will be 2,000 seconds! Hence, these protocols are not suitable when there are numerous small groups. Sheng, Tan and Li studied how to reduce the execution time for identifying populous groups whose sizes are larger than a threshold [2]. They start with a simple fast threshold checking scheme (TCS), which approximately answers whether the number of involved tags exceeds a threshold with high probability. Based on TCS, they propose two probabilistic protocols. The first one is based on generic group testing (GT), and the second protocol is a combination of group testing and divide-and-conquer. Simulations show that their best protocol can significantly reduce the execution time for populous group discovery. However, it is also shown in the simulation results that their execution time is approximately proportional to the number of groups above the threshold. Hence, the performance of the protocol will deteriorate if the number of groups above the threshold is large. In addition, the RFID reader must be able to distinguish three types of slots: (1) empty slot, during which no tag transmit; (2) singleton slot, during which only one tag transmits, and (3) collision slot, during which more than one tag transmits. We follow two general design principles when designing our time-efficient classification protocol. First, we want to minimize the length of each time slot. Based on the parameters of the Philips I-Code specification [22], in order to transmit a 96-bit ID from a tag to an RFID read, we need a slot of 2.11 ms. However, if the reader is not interested in IDs but wants to distinguish collision slots from singleton or empty slots [2], tags should transmit 10-bit long responses, using slots of 0.491 ms each. Furthermore, if the reader does not need to distinguish collision slots from singleton slots but only wants to know whether the slots are empty or not, tags can transmit one-bit short responses, using slots of 0.321 ms each to carry one bit information (channel busy or idle); this is the type of slots we will use in our protocol design. Note that 0.302 ms idle time is included in each slot to separate it from neighboring slots. Second, we want to minimize the number of slots. Figure 1 clearly shows that traditional approaches of measuring one group at a time will not work well when there are a large number of groups. We take a new design that is drastically different from traditional ones: measuring the groups all at once. This design has an interesting feature that its execution time is largely insensitive to the number of groups if the total number of tags is about the same. It makes our protocol particularly suitable for situations where there are a large number of small or medium-sized groups. IV. AN EFFICIENT THRESHOLD-BASED CLASSIFICATION PROTOCOL This section presents an efficient Threshold-Based Classification (TBC) Protocol, which is a combination of dynamic slot sharing among groups and maximum likelihood estimation of group sizes. A. Dynamic Slot Sharing Let’s begin with a single group and use a well known approach [12] of estimating the group population for our discussion: The RFID reader broadcasts a polling request, asking tags to respond in a subsequent time frame of f slots. Upon receiving the request, each tag randomly selects a slot in the frame to transmit. Listening to the channel, the reader converts the time frame into a bitmap. Each bit corresponds to a slot, ‘0’ for an empty slot and ‘1’ for a non-empty slot. The reader then counts the number of ‘0’ bits. Intuitively, when there are more tags transmitting, fewer slots will be left empty, which means fewer ‘0’ bits. A functional relationship can be established between the number of tags in the group and the number of ‘0’ bits in the bitmap [12], and we can use this function to estimate the former from the latter. When there are multiple groups, we can repeat the above scheme, using a different time frame for each group. Analysis has shown that ‘0’ bits must account for a reasonable portion of the bitmap in order to ensure accuracy. Hence, to handle a large tag group, we should conservatively set the frame size to be large enough such that a reasonable number of ‘0’ bits will remain after all tags pick their slots to transmit. But when there are a mix of large and small groups, there is no one-frame-size-fit-all: A small frame size for all groups will not do well for large groups; a large frame size is good for large groups, but it is wasteful for small groups. This is particularly true when the majority of all groups are small but there are a few large ones. Can we choose a different frame size for each group based on its population? No, because per-group population is not given but what we want to know. One way to solve the above dilemma is to share all slots among all groups. To do so, we have to abandon the traditional approach of one time frame per group [2], [12], [13], [20], [21], and instead use a single time frame for all groups — a new approach that measures the sizes of all groups together: Each group ID is pseudo-randomly hashed to a certain number of slots in the time frame. Each tag in the group will probabilistically pick one of these slots to transmit. Listening to the channel, the reader converts the time frame into a bitmap. For each group, it extracts the bits that the group ID is hashed to. Those bits form the logical bitmap of the group, from which the group size is estimated. In this approach, each bit and the corresponding slot may be shared by more than one group. This sharing introduces noise; the logical bitmap of one group may carry some bits that are set to ‘1’ not by transmissions of tags in this group, but by transmissions of tags from other groups that happen to be hashed to the same time slots. Fortunately, in a bird’s-eye 2013 Proceedings IEEE INFOCOM 893
2013 Proceedings IEEE INFOCOM view,all slots are shared by all groups uniformly at random Clearly,for tags of group g,the indices of their (through independent hashing),which means the noise is selected slots in the frame can only be H(gid F(ri,0)), uniformly distributed in the whole time frame.Such uniform H(gid F(ri,1))....,H(gidF(ri,m-1)).These slots noise is measurable.To estimate the size of a group,we will or more precisely,the bits converted from these slots,form use the logical bitmap of that group.but subtract the noise the logical bitmap of g.denoted as LB(g). that other groups introduce into the bitmap. Note that the value of H(tid)mod m gives the index of To further improve performance,the reader repeats the the corresponding bit in the logical bitmap.For example,if above approach multiple times to gather multiple independent a tag selects the H(gidF(ri,H(tid)mod m))th slot to logical bitmaps for each group,and estimation based on transmit,the (H(tid)mod m)th bit in the logical bitmap multiple logical bitmaps reduces the variance of the result. will be set to '1'.Essentially,we embed the logical bitmaps of all in Bi. B.Overview Our threshold-based classification(TBC)protocol consists D.Report Phase of three phases:the parameter-precomputing phase,the frame phase,and the report phase.The parameter-precomputing After w pollings,the reader obtains w bitmaps,Bi, phase computes system parameters for optimal performance l≤i≤w.It sends the bitmaps to an offline data of the protocol.Using these parameters,the frame phase processing module.There,the logical bitmaps of each group makes w(1)polling requests,each of them followed by is extracted.For an arbitrary group g,we extract a logical a time frame,during which tags of all groups transmit in bitmap LB;(g)from B;as follows:Set the jth bit of LB;(g) to be the H(gid F(ri,j))th bit in Bi,i.e..LB;(g)i]= selected slots.The reader converts each time frame into a bitmap,from which logical bitmaps are extracted.Using B[H(gd⊕F(r,j)l,where1≤i≤wand0≤j≤m-l. these logical bitmaps,the report phase employs the Maximum Let z;be the number of zeros observed in LBi(g).Let Likelihood Estimation (MLE)method to report the above- n be the total number of tags in the system and k be the threshold groups. actual population of group g.Below we derive the formula There are three system parameters:1)w is the number of to compute an estimate k of the population. pollings (or time frames),2)f is the size of each time frame, Consider the ith polling in the frame phase and an arbitrary i.e.,the number of slots in a frame or the number of bits in bit b in LBi(g).A tag in group g has a probability of to the bitmap that the frame is converted to,and 3)m is the select this bit and set it to'1'because the tag only sets one of size of each logical bitmap.Clearly,m<f.The execution the m bits in the logical bitmap of g.Any tag in other groups time of TBC is dominated by the w time frames,which have has a probability of to set this bit to '1'due to dynamic wx f time slots in total.We want to find the optimal values slot sharing across the whole frame.Hence,the probability for w,f and m such that the constraints in (1)are met and for b to remain zero is the value of w x f is minimized. g=0-7--a (2) Before presenting the parameter-precomputing phase for how the optimal system parameters are computed,we will Hence,the likelihood function L;for us to observe z;bits first describe the frame phase and the report phase because of zeros in LBi(g)is deriving the formulas for optimal system parameters relies on the knowledge of how the frame phase works. =a--1- C.Frame Phase The frame phase is composed of w pollings,which are x0-1--1-m (3) performed in a similar way:In the ith polling(where 1 i<w),an RFID reader first broadcasts a request message, The likelihood function L for us to observe all zi values, including a random number ri and the system parameters, 1≤i≤w,in the w logical bitmaps is L=Π1La.That f and m.The request message also serves the purpose of 1s, synchronizing the clocks of all tags for starting a time frame of f slots right after the request. =Π[-子-1-r Consider an arbitrary tag t in an arbitrary group g.The tag computes a hash value H(gid F(ri,H(tid)mod m)) as the index of the time slot selected for its transmission, ×a-1-n-a-aym-] (4) where H()is a hash function whose range is [0,f-1], gid is the tag's group ID,tid is the tag's member ID,and We want to find an estimate k that maximizes L,namely, F(z,y)is a pseudo-random number function that takes two input parameters:z and y.F(r,y)uses x as the seed, =arg max (5) generates y random numbers,and outputs the yth number. k The transmission from all participating tags forms a bitmap Since the maximum is not affected by monotone Bi. transformations,we take the logarithm of both sides 894
view, all slots are shared by all groups uniformly at random (through independent hashing), which means the noise is uniformly distributed in the whole time frame. Such uniform noise is measurable. To estimate the size of a group, we will use the logical bitmap of that group, but subtract the noise that other groups introduce into the bitmap. To further improve performance, the reader repeats the above approach multiple times to gather multiple independent logical bitmaps for each group, and estimation based on multiple logical bitmaps reduces the variance of the result. B. Overview Our threshold-based classification (TBC) protocol consists of three phases: the parameter-precomputing phase, the frame phase, and the report phase. The parameter-precomputing phase computes system parameters for optimal performance of the protocol. Using these parameters, the frame phase makes w(≥ 1) polling requests, each of them followed by a time frame, during which tags of all groups transmit in selected slots. The reader converts each time frame into a bitmap, from which logical bitmaps are extracted. Using these logical bitmaps, the report phase employs the Maximum Likelihood Estimation (MLE) method to report the abovethreshold groups. There are three system parameters: 1) w is the number of pollings (or time frames), 2) f is the size of each time frame, i.e., the number of slots in a frame or the number of bits in the bitmap that the frame is converted to, and 3) m is the size of each logical bitmap. Clearly, m<f. The execution time of TBC is dominated by the w time frames, which have w × f time slots in total. We want to find the optimal values for w, f and m such that the constraints in (1) are met and the value of w × f is minimized. Before presenting the parameter-precomputing phase for how the optimal system parameters are computed, we will first describe the frame phase and the report phase because deriving the formulas for optimal system parameters relies on the knowledge of how the frame phase works. C. Frame Phase The frame phase is composed of w pollings, which are performed in a similar way: In the ith polling (where 1 ≤ i ≤ w), an RFID reader first broadcasts a request message, including a random number ri and the system parameters, f and m. The request message also serves the purpose of synchronizing the clocks of all tags for starting a time frame of f slots right after the request. Consider an arbitrary tag t in an arbitrary group g. The tag computes a hash value H(gid F(ri, H(tid) mod m)) as the index of the time slot selected for its transmission, where H(·) is a hash function whose range is [0, f − 1], gid is the tag’s group ID, tid is the tag’s member ID, and F(x, y) is a pseudo-random number function that takes two input parameters: x and y. F(x, y) uses x as the seed, generates y random numbers, and outputs the yth number. The transmission from all participating tags forms a bitmap Bi. Clearly, for tags of group g, the indices of their selected slots in the frame can only be H(gid F(ri, 0)), H(gid F(ri, 1)), ... , H(gid F(ri, m − 1)). These slots or more precisely, the bits converted from these slots, form the logical bitmap of g, denoted as LBi(g). Note that the value of H(tid) mod m gives the index of the corresponding bit in the logical bitmap. For example, if a tag selects the H(gid F(ri, H(tid) mod m))th slot to transmit, the (H(tid) mod m)th bit in the logical bitmap will be set to ‘1’. Essentially, we embed the logical bitmaps of all in Bi. D. Report Phase After w pollings, the reader obtains w bitmaps, Bi, 1 ≤ i ≤ w. It sends the bitmaps to an offline data processing module. There, the logical bitmaps of each group is extracted. For an arbitrary group g, we extract a logical bitmap LBi(g) from Bi as follows: Set the jth bit of LBi(g) to be the H(gid F(ri, j))th bit in Bi, i.e., LBi(g)[j] = Bi[H(gid F(ri, j))], where 1 ≤ i ≤ w and 0 ≤ j ≤ m−1. Let xi be the number of zeros observed in LBi(g). Let n be the total number of tags in the system and k be the actual population of group g. Below we derive the formula to compute an estimate ˆ k of the population. Consider the ith polling in the frame phase and an arbitrary bit b in LBi(g). A tag in group g has a probability of 1 m to select this bit and set it to ‘1’ because the tag only sets one of the m bits in the logical bitmap of g. Any tag in other groups has a probability of 1 f to set this bit to ‘1’ due to dynamic slot sharing across the whole frame. Hence, the probability for b to remain zero is q = (1 − 1 f ) n−k(1 − 1 m) k. (2) Hence, the likelihood function Li for us to observe xi bits of zeros in LBi(g) is Li = ((1 − 1 f ) n−k(1 − 1 m) k) xi × (1 − (1 − 1 f ) n−k(1 − 1 m) k) m−xi . (3) The likelihood function L for us to observe all xi values, 1 ≤ i ≤ w, in the w logical bitmaps is L = w i=1 Li. That is, L = w j=1 ((1 − 1 f ) n−k(1 − 1 m) k) xi × (1 − (1 − 1 f ) n−k(1 − 1 m) k) m−xi . (4) We want to find an estimate ˆ k that maximizes L, namely, ˆ k = arg max{L} k . (5) Since the maximum is not affected by monotone transformations, we take the logarithm of both sides 2013 Proceedings IEEE INFOCOM 894