Probabilistic Optimal Tree Hopping for RFID Identification Muhammad Shahzad and Alex X.Liu Department of Computer Science and Engineering Michigan State University East Lansing,Michigan,USA [shahzadm,alexliu@cse.msu.edu ABSTRACT 1.INTRODUCTION Radio Frequency Identification (RFID)systems are widely used in various applications such as supply chain manage- 1.1 Background and Problem Statement ment,inventory control,and object tracking.Identifying As the cost of commercial RFID tags,which is as low as RFID tags in a given tag population is the most fundamental 5 cents per tag 21,has become negligible compared to the operation in RFID systems.While the Tree Walking(TW) prices of the products to which they are attached,RFID sys- protocol has become the industrial standard for identifying tems are being increasingly used in various applications such RFID tags,little is known about the mathematical nature as supply chain management [13],indoor localization [18], of this protocol and only some ad-hoc heuristics exist for 3D positioning [27],object tracking [17],inventory control, optimizing it.In this paper,first,we analytically model the electronic toll collection,and access control [5,16].For ex- TW protocol,and then using that model,propose the Tree ample,Walmart has started to use RFID tags to track jeans Hopping(TH)protocol that optimizes TW both theoreti- and underwear for better inventory control.An RFID sys- cally and practically.The key novelty of TH is to formulate tem consists of tags and readers.A tag is a microchip com- tag identification as an optimization problem and find the bined with an antenna in a compact package that has limited optimal solution that ensures the minimal average number computing power and communication range.There are two of queries.With this solid theoretical underpinning,for dif- types of tags:(1)passive tags,which do not have their own ferent tag population sizes ranging from 100 to 100K tags, power source,are powered up by harvesting the radio fre- TH significantly outperforms the best prior tag identifica- quency energy from readers,and have communication ranges tion protocols on the metrics of the total number of queries often less than 20 feet;(2)active tags.which come with their per tag,the total identification time per tag,and the average own power sources and have relatively longer communica- number of responses per tag by an average of 50%,10%,and tion ranges.A reader has a dedicated power source with 30%,respectively,when tag IDs are uniformly distributed in significant computing power.RFID systems mostly work in the ID space,and of 26%,37%,and 26%,respectively,when a query-response fashion where a reader transmits queries tag IDs are non-uniformly distributed. to a set of tags and the tags respond with their IDs over a shared wireless medium. Categories and Subject Descriptors This paper addresses the fundamental RFID tag identifi- cation problem,namely reading all IDs of a given set of tags. C.2.1 [Computer-Communication Networks):Network which is needed in almost all RFID systems.Because tags Architecture and Design -Wireless communication;C.2.8 respond over a shared wireless medium,tag identification Mobile Computing]:Algorithm Design and Analysis protocols are also called collision arbitration,tag singula- tion,or tag anti-collision protocols.Tag identification pro- tocols need to be scalable as the number of tags that need to General Terms be identified could be as large as tens of thousands with the Algorithms.Design,Performance,Experimentation increasing adoption of RFID tags.An RFID system with a large number of tags may require multiple readers with overlapping regions.In this paper,we first focus on the sin- Keywords gle reader version of the tag identification problem and then RFID:Tags:Identification extend our solution to the multiple reader problem. 1.2 Summary and Limitations of Prior Art The industrial standard.EPCGlobal Class 1 Generation 2 Permission to make digital or hard copies of all or part of this work for (C1G2)RFID 9,adopted two tag identification protocols personal or classroom use is granted without fee provided that copies are namely framed slotted Aloha and Tree Walking (TW).In not made or distributed for profit or commercial advantage and that copies framed slotted Aloha.a reader first broadcasts a value f to bear this notice and the full citation on the first page.To copy otherwise,to the tags in its vicinity where f represents the number of time republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee. slots present in a forthcoming frame.Then each tag whose S/GMETR/CS'/3.June 17-21,2013.Pittsburgh,PA,USA inventory bit is 0 randomly picks a time slot in the frame Copyright2013ACM978-1-4503-1900-3/13/06..$15.00. and replies during that slot.Each C1G2 compliant tag has 293
Probabilistic Optimal Tree Hopping for RFID Identification Muhammad Shahzad and Alex X. Liu Department of Computer Science and Engineering Michigan State University East Lansing, Michigan, USA {shahzadm, alexliu}@cse.msu.edu ABSTRACT Radio Frequency Identification (RFID) systems are widely used in various applications such as supply chain management, inventory control, and object tracking. Identifying RFID tags in a given tag population is the most fundamental operation in RFID systems. While the Tree Walking (TW) protocol has become the industrial standard for identifying RFID tags, little is known about the mathematical nature of this protocol and only some ad-hoc heuristics exist for optimizing it. In this paper, first, we analytically model the TW protocol, and then using that model, propose the Tree Hopping (TH) protocol that optimizes TW both theoretically and practically. The key novelty of TH is to formulate tag identification as an optimization problem and find the optimal solution that ensures the minimal average number of queries. With this solid theoretical underpinning, for different tag population sizes ranging from 100 to 100K tags, TH significantly outperforms the best prior tag identification protocols on the metrics of the total number of queries per tag, the total identification time per tag, and the average number of responses per tag by an average of 50%, 10%, and 30%, respectively, when tag IDs are uniformly distributed in the ID space, and of 26%, 37%, and 26%, respectively, when tag IDs are non-uniformly distributed. Categories and Subject Descriptors C.2.1 [Computer-Communication Networks]: Network Architecture and Design – Wireless communication; C.2.8 [Mobile Computing]: Algorithm Design and Analysis General Terms Algorithms, Design, Performance, Experimentation Keywords RFID; Tags; Identification Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMETRICS’13, June 17-21, 2013, Pittsburgh, PA, USA. Copyright 2013 ACM 978-1-4503-1900-3/13/06 ...$15.00. 1. INTRODUCTION 1.1 Background and Problem Statement As the cost of commercial RFID tags, which is as low as 5 cents per tag [21], has become negligible compared to the prices of the products to which they are attached, RFID systems are being increasingly used in various applications such as supply chain management [13], indoor localization [18], 3D positioning [27], object tracking [17], inventory control, electronic toll collection, and access control [5, 16]. For example, Walmart has started to use RFID tags to track jeans and underwear for better inventory control. An RFID system consists of tags and readers. A tag is a microchip combined with an antenna in a compact package that has limited computing power and communication range. There are two types of tags: (1) passive tags, which do not have their own power source, are powered up by harvesting the radio frequency energy from readers, and have communication ranges often less than 20 feet; (2) active tags, which come with their own power sources and have relatively longer communication ranges. A reader has a dedicated power source with significant computing power. RFID systems mostly work in a query-response fashion where a reader transmits queries to a set of tags and the tags respond with their IDs over a shared wireless medium. This paper addresses the fundamental RFID tag identifi- cation problem, namely reading all IDs of a given set of tags, which is needed in almost all RFID systems. Because tags respond over a shared wireless medium, tag identification protocols are also called collision arbitration, tag singulation, or tag anti-collision protocols. Tag identification protocols need to be scalable as the number of tags that need to be identified could be as large as tens of thousands with the increasing adoption of RFID tags. An RFID system with a large number of tags may require multiple readers with overlapping regions. In this paper, we first focus on the single reader version of the tag identification problem and then extend our solution to the multiple reader problem. 1.2 Summary and Limitations of Prior Art The industrial standard, EPCGlobal Class 1 Generation 2 (C1G2) RFID [9], adopted two tag identification protocols, namely framed slotted Aloha and Tree Walking (TW). In framed slotted Aloha, a reader first broadcasts a value f to the tags in its vicinity where f represents the number of time slots present in a forthcoming frame. Then each tag whose inventory bit is 0 randomly picks a time slot in the frame and replies during that slot. Each C1G2 compliant tag has 293
(a)Nodes visited using TW protocol (b)Nodes visited using TH protocol Figure 1:Identifying a population of 9 tags over ID space of 2 using Tree Walking and Tree Hopping an inventory bit,which is initialized to be 0.In any slot, nodes in the binary tree.Although several heuristics have if exactly one tag responds,the reader successfully gets the been proposed to reduce the number of visits to collision ID of that tag and issues a command to the tag to change nodes 15,19,all these heuristics based methods are not its inventory bit to 1.The key limitation of framed slotted guaranteed to minimize such futile visits.Prior Aloha-TW Aloha is that it can not identify large tag populations due to hybrid protocols also have this limitation. the finite possible size of f,which is typically no more than 512 [23].Qian et al.have shown that framed slotted Aloha 1.3 System Model is most efficient when f is equal to the number of tags 20 As most commercially available tags and readers already Therefore.although theoretically any arbitrarily large tag comply with the C1G2 standard,we do not assume changes population can be identified by indefinitely increasing the to either tags or the physical protocol that they use to com- frame size,practically this is infeasible because during the municate with readers.We assume that readers can be repro- entire identification process,Aloha based protocols require grammed to adopt new tag identification software.For reli- all tags,including those that have been identified,to stay able tag identification,we are given the probability of suc- powered up and listen to all the messages from the reader in cessful query-response communication between the reader order to maintain the value of the inventory bit.This results and a tag. in high instability because any intermittent loss of power at a tag will set its inventory bit back to 0,leading the tag to 1.4 Proposed Approach contend in the subsequent frame.The instability of Aloha To address the fundamental limitations that lie in the based protocols has formally been proven by Rosenkrantz heuristic nature of prior TW based protocols,we propose and Towsley in [22]. a new approach to tag identification called Tree Hopping TW is a fundamental multiple access protocol,which was (TH).The key novel idea of TH is to formulate the tag iden- first invented by U.S.Army for testing soldiers for syphilis tification problem as an optimization problem and find the during World War II [4].TW was proposed as an RFID optimal solution that ensures the minimal expected number tag identification protocol by Law et al.in [12].In TW,a of queries(i.e.,nodes visited on the binary tree).In TH,we reader first queries 0 and all the tags whose IDs start with first quickly estimate the tag population size.Second,based 0 respond.If result of the query is a successful read (i.e., on the estimated tag population size,we calculate the opti- exactly one tag responds)or an empty read (i.e.,no tag mal level to start tree traversal so that the expected number responds),the reader queries 1 and all the tags whose IDs of queries is minimal,hop directly to the left most node start with 1 respond.If the result of the query is a collision, on that level,and then perform DFT on the subtree rooted the reader generates two new query strings by appending a at that node.Third,after that subtree is traversed,we re 0 and a 1 to the previous query string and queries the tags estimate the size of remaining unidentified tag population. with these new query strings.All the tags whose IDs start re-calculate the new optimal level,hop directly to the new with the new query string respond.This process continues optimal node,and perform DFT on the subtree rooted at until all the tags have been identified.This identification that node.Hopping to optimal nodes in this manner skips a process is essentially a partial Depth First Traversal(DFT) large number of collision nodes.This process continues un- on the complete binary tree over the tag ID space,and the til all the tags have been identified.Figure 1(b)shows the actual traversal forms a binary tree where the leaf nodes nodes traversed by TH for the same population of 9 tags as represent successful or empty reads and the internal nodes in Figure 1(a).Here a skipped node is one that TW visits represent collisions.Figure 1(a)shows the tree walking pro- but TH does not.We can see that TH traverses 11 nodes to cess for identifying 9 tags over a tag ID space of size 2.Here identify these 9 tags.In comparison,TW traverses 16 nodes a successful read node is one that an identification proto- as shown in Figure 1(a).This difference scales significantly col visits and there is exactly one tag in the subtree rooted as tag population size increases. at this node,an empty read node is one that an identifica- tion protocol visits and there is no tag in the subtree rooted 1.4.1 Population Size Estimation at this node,and a collision node is one that an identifi- TH first uses a framed slotted Aloha based method to cation protocol visits and there are more than one tags in quickly estimate the tag population size.For this,TH re- the subtree rooted at this node.The key limitation of TW quires each tag to respond to the reader with a probability based protocols is that they visit a large number of collision q.As C1G2 compliant tags do not support this probabilistic 294
(3,0) (3,3) (3,4) (3,5) (3,6) (3,7) 11 12 14 15 6 (3,2) 3 4 7 (3,1) 10 13 2 5 9 16 1 8 1 (2,0) (2,1) (2,2) (2,3) (4,0) (1,0) (1,1) (4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7) (4,8) (4,9) (4,10) (4,11) (4,12) (4,13) (4,14) (4,15) Tag Skipped Node (0,0) 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 (3,0) (3,3) (3,4) (3,5) (3,6) (3,7) 6 7 9 10 3 (3,2) 1 2 4 (3,1) 5 8 11 (2,0) (2,1) (2,2) (2,3) (4,0) (1,0) (1,1) (4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7) (4,8) (4,9) (4,10) (4,11) (4,12) (4,13) (4,14) (4,15) Empty Read Collision Successful Read (0,0) 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 (a) Nodes visited using TW protocol (b) Nodes visited using TH protocol 1 Figure 1: Identifying a population of 9 tags over ID space of 2 4 using Tree Walking and Tree Hopping an inventory bit, which is initialized to be 0. In any slot, if exactly one tag responds, the reader successfully gets the ID of that tag and issues a command to the tag to change its inventory bit to 1. The key limitation of framed slotted Aloha is that it can not identify large tag populations due to the finite possible size of f, which is typically no more than 512 [23]. Qian et al. have shown that framed slotted Aloha is most efficient when f is equal to the number of tags [20]. Therefore, although theoretically any arbitrarily large tag population can be identified by indefinitely increasing the frame size, practically this is infeasible because during the entire identification process, Aloha based protocols require all tags, including those that have been identified, to stay powered up and listen to all the messages from the reader in order to maintain the value of the inventory bit. This results in high instability because any intermittent loss of power at a tag will set its inventory bit back to 0, leading the tag to contend in the subsequent frame. The instability of Aloha based protocols has formally been proven by Rosenkrantz and Towsley in [22]. TW is a fundamental multiple access protocol, which was first invented by U.S. Army for testing soldiers for syphilis during World War II [4]. TW was proposed as an RFID tag identification protocol by Law et al. in [12]. In TW, a reader first queries 0 and all the tags whose IDs start with 0 respond. If result of the query is a successful read (i.e., exactly one tag responds) or an empty read (i.e., no tag responds), the reader queries 1 and all the tags whose IDs start with 1 respond. If the result of the query is a collision, the reader generates two new query strings by appending a 0 and a 1 to the previous query string and queries the tags with these new query strings. All the tags whose IDs start with the new query string respond. This process continues until all the tags have been identified. This identification process is essentially a partial Depth First Traversal (DFT) on the complete binary tree over the tag ID space, and the actual traversal forms a binary tree where the leaf nodes represent successful or empty reads and the internal nodes represent collisions. Figure 1(a) shows the tree walking process for identifying 9 tags over a tag ID space of size 24 . Here a successful read node is one that an identification protocol visits and there is exactly one tag in the subtree rooted at this node, an empty read node is one that an identification protocol visits and there is no tag in the subtree rooted at this node, and a collision node is one that an identifi- cation protocol visits and there are more than one tags in the subtree rooted at this node. The key limitation of TW based protocols is that they visit a large number of collision nodes in the binary tree. Although several heuristics have been proposed to reduce the number of visits to collision nodes [15, 19], all these heuristics based methods are not guaranteed to minimize such futile visits. Prior Aloha-TW hybrid protocols also have this limitation. 1.3 System Model As most commercially available tags and readers already comply with the C1G2 standard, we do not assume changes to either tags or the physical protocol that they use to communicate with readers. We assume that readers can be reprogrammed to adopt new tag identification software. For reliable tag identification, we are given the probability of successful query-response communication between the reader and a tag. 1.4 Proposed Approach To address the fundamental limitations that lie in the heuristic nature of prior TW based protocols, we propose a new approach to tag identification called Tree Hopping (TH). The key novel idea of TH is to formulate the tag identification problem as an optimization problem and find the optimal solution that ensures the minimal expected number of queries (i.e., nodes visited on the binary tree). In TH, we first quickly estimate the tag population size. Second, based on the estimated tag population size, we calculate the optimal level to start tree traversal so that the expected number of queries is minimal, hop directly to the left most node on that level, and then perform DFT on the subtree rooted at that node. Third, after that subtree is traversed, we reestimate the size of remaining unidentified tag population, re-calculate the new optimal level, hop directly to the new optimal node, and perform DFT on the subtree rooted at that node. Hopping to optimal nodes in this manner skips a large number of collision nodes. This process continues until all the tags have been identified. Figure 1(b) shows the nodes traversed by TH for the same population of 9 tags as in Figure 1(a). Here a skipped node is one that TW visits but TH does not. We can see that TH traverses 11 nodes to identify these 9 tags. In comparison, TW traverses 16 nodes as shown in Figure 1(a). This difference scales significantly as tag population size increases. 1.4.1 Population Size Estimation TH first uses a framed slotted Aloha based method to quickly estimate the tag population size. For this, TH requires each tag to respond to the reader with a probability q. As C1G2 compliant tags do not support this probabilistic 294
responding,we implement this by"virtually"extending the tree of (2,2),which is wasteful as all the tags in the subtree frame sizetimes.To estimate the tag population size,the of(2,2)have already been identified.Similarly,if there had reader announces a frame size of I but terminates it after the been a third leftmost node on the new optimal level and if first slot.The reader issues several single-slot frames while TH starts the DFT from that third left most node.it will not reducing q with a geometric distribution (i.e.,q=in visit the subtree of(2,3),resulting in tag (4,13)not being the i-th frame)until the reader gets an empty slot.Suppose identified.To avoid both scenarios,i.e.,some subtrees being the empty slot occurred in the i-th frame,TH estimates the traversed multiple times and some subtrees with tags not tag population size to be 1.2897 x 2-2 based on Flajolet being traversed,after the optimal level is recalculated,TH and Martin's algorithm used in databases 6. hops to the root of the largest subtree that can contain the next tag to be identified but does not contain any previously 1.4.2 Finding Optimal Level identified tag.The level at which this root is located can not To determine the optimal level Yop that TH directly hops be smaller than the new optimal level.For the example in to,we first calculate the expected number of nodes that Figure 1(b),after the subtree rooted at node (2,2)has been TH will visit if it starts DFTs from nodes on any given traversed,the recalculated optimal level is 1 and the next level y.Let b be the number of bits in each tag ID(which node that TH hops to is(2,3). is 64 for C1G2 compliant tags),then,we have 1 <y< Our experimental results in Figure 2 show that when the b.If y is small,more collision nodes will be visited while tags are not uniformly distributed in the ID space,our tech- if it is large,more empty read nodes will be visited.Our nique of dynamically adjusting Yp according to the left- objective is to calculate an optimal level Yop that will result over population size significantly reduces the total number in the smallest number of nodes visited.To find Yop,we first of queries and the average number of responses per tag. derive the expression for calculating the expected number The two curves "TH w re-estimation-Seg"and "TH w/o re of nodes visited by TH if TH directly hops to level y.Then estimation-Seg"show the total number of queries needed. we calculate the value of which minimizes this expression. respectively,with and without the dynamic adjustment of This value of y is the value of optimal level Yp.We present Yop for non-uniformly distributed tag IDs.For example,for the technical details of finding Yop in Section 3. 10K tags,this dynamic level adjustment reduces the total number of queries by 31.5%.Our experimental results in Fig- 1.4.3 Population Size Re-estimation ure 2 also show that when the tags are uniformly distributed If the tags that we want to identify are uniformly dis- in the ID space,there is no need to dynamically adjust Yop. tributed in the ID space [0,2-1],then performing DFTs The two curves“TH w re-estimation-Uni”and“THw/ore from each node on level Yop will result in minimum num- estimation-Uni"show the total number of queries needed. ber of nodes visited.However,in reality,the tags may not respectively,with and without the dynamic adjustment for be uniformly distributed.In such cases,each time when the uniformly distributed tag IDs.These two curves are quite DFT of a subtree is finished,TH needs to re-estimate the close because for uniformly distributed tag IDs,Yop does total tag population size to find the next optimal level and not usually change after each DFT and thus the benefit of the hoping destination node.TH performs the re-estimation dynamically adjusting Yop is relatively small. as follows.Let z be the first tag population size estimated using the Aloha based method,c be the number of tags that have been identified,and s be the size of the tag ID space TH w/o re-estimation-Seq OTH w re-estimation-Sec covered by the nodes visited.Naturally,z-r is an estimate ATH w/o re-estimation-Un of the remaining tag population size;however,we cannot use TH w re-estimation-Unl this estimate to calculate the next optimal level because the remaining leftover ID space may not form a complete binary tree.Instead,based on the node density in the remaining ID space,TH extrapolates the total tag population size to be and uses it to find the next hopping destination node.Note that if tags are uniformly distributed,we have 10 10 ×2=2 #18s 1.4.4 Finding Hopping Destination Figure 2:Impact of re-estimation Each time after a DFT is done and the new optimal level is recalculated,TH needs to find the next node to hop to which may not be the leftmost node on the optimal level 2. RELATED WORK Consider the example shown in Figure 1(b).Assuming a uni- We review existing tag identification protocols,which can form distribution,the optimal level to start the DFT is 3.In be classified as nondeterministic,deterministic.or hybrid this paper,we use(,p)to denote the ph node on level l.TH protocols. performs DFTs on the subtrees of nodes (3,0)to (3,5)and identifies 8 out of 9 tags.Based on the number of remaining 2.1 Nondeterministic Identification Protocols tags after the last DFT,which is 1,the optimal level for the Existing such protocols are either based on framed slot- next hop is changed from 3 to 1.However,if TH starts the ted Aloha [28]or Binary Splitting(BS)[2].As we discussed DFT from the leftmost node on level 1,which is (1,0),it above.Aloha based protocols only work for small tag popu- will result in identifying all tags in its subtree again which lations.In BS[2],the identification process starts with the is wasteful.Similarly,if TH starts the DFT from the second reader asking the tags to respond.If more than one tag eftmost node on level 1,which is (1,1),it will visit the sub- responds,BS divides and subdivides the population into 295
responding, we implement this by “virtually” extending the frame size 1 q times. To estimate the tag population size, the reader announces a frame size of 1 q but terminates it after the first slot. The reader issues several single-slot frames while reducing q with a geometric distribution (i.e., q = 1 2i−1 in the i-th frame) until the reader gets an empty slot. Suppose the empty slot occurred in the i-th frame, TH estimates the tag population size to be 1.2897 × 2 i−2 based on Flajolet and Martin’s algorithm used in databases [6]. 1.4.2 Finding Optimal Level To determine the optimal level γop that TH directly hops to, we first calculate the expected number of nodes that TH will visit if it starts DFTs from nodes on any given level γ. Let b be the number of bits in each tag ID (which is 64 for C1G2 compliant tags), then, we have 1 ≤ γ ≤ b. If γ is small, more collision nodes will be visited while if it is large, more empty read nodes will be visited. Our objective is to calculate an optimal level γop that will result in the smallest number of nodes visited. To find γop, we first derive the expression for calculating the expected number of nodes visited by TH if TH directly hops to level γ. Then we calculate the value of γ which minimizes this expression. This value of γ is the value of optimal level γop. We present the technical details of finding γop in Section 3. 1.4.3 Population Size Re-estimation If the tags that we want to identify are uniformly distributed in the ID space [0, 2 b − 1], then performing DFTs from each node on level γop will result in minimum number of nodes visited. However, in reality, the tags may not be uniformly distributed. In such cases, each time when the DFT of a subtree is finished, TH needs to re-estimate the total tag population size to find the next optimal level and the hoping destination node. TH performs the re-estimation as follows. Let z be the first tag population size estimated using the Aloha based method, x be the number of tags that have been identified, and s be the size of the tag ID space covered by the nodes visited. Naturally, z − x is an estimate of the remaining tag population size; however, we cannot use this estimate to calculate the next optimal level because the remaining leftover ID space may not form a complete binary tree. Instead, based on the node density in the remaining ID space, TH extrapolates the total tag population size to be z−x 2b−s × 2 b and uses it to find the next hopping destination node. Note that if tags are uniformly distributed, we have z−x 2b−s × 2 b = z. 1.4.4 Finding Hopping Destination Each time after a DFT is done and the new optimal level is recalculated, TH needs to find the next node to hop to, which may not be the leftmost node on the optimal level. Consider the example shown in Figure 1(b). Assuming a uniform distribution, the optimal level to start the DFT is 3. In this paper, we use (l, p) to denote the p th node on level l. TH performs DFTs on the subtrees of nodes (3, 0) to (3, 5) and identifies 8 out of 9 tags. Based on the number of remaining tags after the last DFT, which is 1, the optimal level for the next hop is changed from 3 to 1. However, if TH starts the DFT from the leftmost node on level 1, which is (1, 0), it will result in identifying all tags in its subtree again which is wasteful. Similarly, if TH starts the DFT from the second leftmost node on level 1, which is (1, 1), it will visit the subtree of (2, 2), which is wasteful as all the tags in the subtree of (2, 2) have already been identified. Similarly, if there had been a third leftmost node on the new optimal level and if TH starts the DFT from that third left most node, it will not visit the subtree of (2, 3), resulting in tag (4, 13) not being identified. To avoid both scenarios, i.e., some subtrees being traversed multiple times and some subtrees with tags not being traversed, after the optimal level is recalculated, TH hops to the root of the largest subtree that can contain the next tag to be identified but does not contain any previously identified tag. The level at which this root is located can not be smaller than the new optimal level. For the example in Figure 1(b), after the subtree rooted at node (2, 2) has been traversed, the recalculated optimal level is 1 and the next node that TH hops to is (2, 3). Our experimental results in Figure 2 show that when the tags are not uniformly distributed in the ID space, our technique of dynamically adjusting γop according to the leftover population size significantly reduces the total number of queries and the average number of responses per tag. The two curves “TH w re-estimation-Seq” and “TH w/o reestimation-Seq” show the total number of queries needed, respectively, with and without the dynamic adjustment of γop for non-uniformly distributed tag IDs. For example, for 10K tags, this dynamic level adjustment reduces the total number of queries by 31.5%. Our experimental results in Figure 2 also show that when the tags are uniformly distributed in the ID space, there is no need to dynamically adjust γop. The two curves “TH w re-estimation-Uni” and “TH w/o reestimation-Uni” show the total number of queries needed, respectively, with and without the dynamic adjustment for uniformly distributed tag IDs. These two curves are quite close because for uniformly distributed tag IDs, γop does not usually change after each DFT and thus the benefit of dynamically adjusting γop is relatively small. 102 103 104 1 1.5 2 2.5 3 # Tags # Queries / # Tags TH w/o re−estimation − Seq TH w re−estimation − Seq TH w/o re−estimation − Uni TH w re−estimation − Uni Figure 2: Impact of re-estimation 2. RELATED WORK We review existing tag identification protocols, which can be classified as nondeterministic, deterministic, or hybrid protocols. 2.1 Nondeterministic Identification Protocols Existing such protocols are either based on framed slotted Aloha [28] or Binary Splitting (BS) [2]. As we discussed above, Aloha based protocols only work for small tag populations. In BS [2], the identification process starts with the reader asking the tags to respond. If more than one tag responds, BS divides and subdivides the population into 295
smaller groups until each group has only one or no tag. This process of random subdivision incurs a lot of collisions. Table 1:Notations Furthermore.BS requires the tags to perform operations Symbol Description that are not supported by the C1G2 standard.ABS is a BS b of bits in tag ID,which is 64 for C1G2 tags based protocol that is designed for continuous identification n size of the whole ID space,which is 20 of tags 14. node whose top-to-down vertical level is (L,p) 1≤l≤b and left-.to-right horizontal position 2.2 Deterministic Identification Protocols is0≤p≤2b-1 estimated number of tags in the population There are 3 such protocols:(1)the basic TW protocol 12 level from which TH performs DFTs (2)the Adaptive Tree Walking (ATW)protocol 24,and (3) Yop optimal level to perform DFTs the TW-based Smart Trend Traversal (STT)protocol 19 0 tag response probability used in estimation ATW is an optimized version of TW that always starts DFTs I(L.p) indicator random variable,1 if (l,p)is visited from the level of log z,where z is the size of tag population. T random variable for total of nodes visited This is the traditional wisdom for optimizing TW.The key ET expected of nodes visited to identify all tags limitation of ATW is that it is optimal only when all tag in the population IDs are evenly spaced in the ID space;however,this is often Pi(t.p) probability of visiting (l,p)if it is left child Pri.p) probability of visiting (l.p)if it is right child not true in real-world applications.In contrast,during the identification process,our TH protocol adaptively chooses size of ID space covered by the parent of the current node being visited the optimal level to hop to based on distribution of IDs.STT improves TW using some ad-hoc heuristics to select prefixes 不 of tags covered by the parent of the current node being visited for next queries based upon the type of response to previous Ps:Pe probabilities of successful read,collision.or queries.It assumes that the number of tags identified in the P empty read at parent of the current node past k queries is the same as the number of tags that will B repetitions of query to handle unreliable channel be identified in the next k queries.This may not be true in 9 actual probability of reading a tag u reality. required probability of reading a tag maximum of nodes visited to identify all tags 2.3 Hybrid Identification Protocols 0o of subtrees covering no tags 61 of subtrees covering 1 tag Hybrid protocols combine features from nondeterministic and deterministic protocols.There are two major such pro tocols:Multi slotted scheme with Assigned Slots (MAS)[15 3.1 Average Number of Queries and Adaptively Splitting-based Arbitration Protocol (ASAP) 20.MAS is a TW-based protocol in which each tag that Let random variable T denote the total number of nodes matches the reader's query picks up one of the f time slots that TH visits to identify all tags.Note that each node visit to respond.For large populations,due to the finite prac- corresponds to one reader query.We next calculate E[T. tical size of f 512,for queries corresponding to higher Let I(l.p)be an indicator random variable whose value is levels in the binary tree,the response in each of the f slots 1 if and only if node (I,p)is visited.Thus,T is the sum of is most likely a collision.which increases the identification I(I.p)for all l and all p. time.ASAP divides and subdivides the tag population until 14 the size of each subset is below a certain threshold and then T I(L.p) (1) applies Aloha on each subset.For this,ASAP requires tags l=1p=0 to be able to pick slots using a geometric distribution,which makes it incompliant with the C1G2 standard.Furthermore, Let P[(l,p)}be the probability that TH visits node (l,p). subdividing the population before the actual identification Thus,ET can be expressed as follows: is in itself very time consuming. b2-1 3. OPTIMAL TREE HOPPING P{(Lp)} (2) l=1p=0 After quick population size estimation using Flajolet and Martin's algorithm [6],TH needs to find the optimal level Next,we focus on expressing P{(,p)}using variable 7, to hop to.First,we derive an expression to calculate the where y denotes the level that TH hops to.Recall that TH expected number of queries (i.e.,the number of nodes that skips all nodes on levels from 1 to y-1 and performs DET TH will visit)if it starts DFTs from the nodes on level y,as- on each of the2'nodes on level,where1≤y≤b. suming that tags are uniformly distributed in the ID space. Note that the root node of the whole binary tree is al- Second,as the derived expression is too complex to calculate ways meaningless to visit as it corresponds to a query of the optimal value of y that minimizes the expected number length 0.Here P{(l,p)}is calculated differently depending of queries by simply differentiating the expression with re- on whether node (1,p)is the left child of its parent or the spect to y,we present a numerical method to calculate the right.Let P{(,p)}and P{(l,p)}denote the probability optimal level Yop.If tags are not uniformly distributed,each of visiting (I,p)when (l,p)is the left and right child of its time when the DFT on a node is completed,as stated in Sec- parent,respectively.If the estimated total number of tags tion 1.4.TH re-estimates the total population size based on z is zero,then P{(l,p))=P(l,p)}=0 for all I and p. the initial estimate and the number of tags that have been Below we assume z>0.As TH skips all nodes from levels identified,re-calculates the new optimal level,and finds the 1 to y-1,we have hopping destination node.Table 1 summarizes the symbols P(,p)=P,p)=0if1<I<Y (3) used in this paper. 296
smaller groups until each group has only one or no tag. This process of random subdivision incurs a lot of collisions. Furthermore, BS requires the tags to perform operations that are not supported by the C1G2 standard. ABS is a BS based protocol that is designed for continuous identification of tags [14]. 2.2 Deterministic Identification Protocols There are 3 such protocols: (1) the basic TW protocol [12], (2) the Adaptive Tree Walking (ATW) protocol [24], and (3) the TW-based Smart Trend Traversal (STT) protocol [19]. ATW is an optimized version of TW that always starts DFTs from the level of log z, where z is the size of tag population. This is the traditional wisdom for optimizing TW. The key limitation of ATW is that it is optimal only when all tag IDs are evenly spaced in the ID space; however, this is often not true in real-world applications. In contrast, during the identification process, our TH protocol adaptively chooses the optimal level to hop to based on distribution of IDs. STT improves TW using some ad-hoc heuristics to select prefixes for next queries based upon the type of response to previous queries. It assumes that the number of tags identified in the past k queries is the same as the number of tags that will be identified in the next k queries. This may not be true in reality. 2.3 Hybrid Identification Protocols Hybrid protocols combine features from nondeterministic and deterministic protocols. There are two major such protocols: Multi slotted scheme with Assigned Slots (MAS) [15] and Adaptively Splitting-based Arbitration Protocol (ASAP) [20]. MAS is a TW-based protocol in which each tag that matches the reader’s query picks up one of the f time slots to respond. For large populations, due to the finite practical size of f ≤ 512, for queries corresponding to higher levels in the binary tree, the response in each of the f slots is most likely a collision, which increases the identification time. ASAP divides and subdivides the tag population until the size of each subset is below a certain threshold and then applies Aloha on each subset. For this, ASAP requires tags to be able to pick slots using a geometric distribution, which makes it incompliant with the C1G2 standard. Furthermore, subdividing the population before the actual identification is in itself very time consuming. 3. OPTIMAL TREE HOPPING After quick population size estimation using Flajolet and Martin’s algorithm [6], TH needs to find the optimal level to hop to. First, we derive an expression to calculate the expected number of queries (i.e., the number of nodes that TH will visit) if it starts DFTs from the nodes on level γ, assuming that tags are uniformly distributed in the ID space. Second, as the derived expression is too complex to calculate the optimal value of γ that minimizes the expected number of queries by simply differentiating the expression with respect to γ, we present a numerical method to calculate the optimal level γop. If tags are not uniformly distributed, each time when the DFT on a node is completed, as stated in Section 1.4, TH re-estimates the total population size based on the initial estimate and the number of tags that have been identified, re-calculates the new optimal level, and finds the hopping destination node. Table 1 summarizes the symbols used in this paper. Table 1: Notations Symbol Description b # of bits in tag ID, which is 64 for C1G2 tags n size of the whole ID space, which is 2b (l, p) node whose top-to-down vertical level is 1 ≤ l ≤ b and left-to-right horizontal position is 0 ≤ p ≤ 2 b − 1 z estimated number of tags in the population γ level from which TH performs DFTs γop optimal level to perform DFTs q tag response probability used in estimation I(l, p) indicator random variable, 1 if (l, p) is visited T random variable for total # of nodes visited E[T] expected # of nodes visited to identify all tags in the population Pl {(l, p)} probability of visiting (l, p) if it is left child Pr {(l, p)} probability of visiting (l, p) if it is right child m size of ID space covered by the parent of the current node being visited k # of tags covered by the parent of the current node being visited Ps, Pc, probabilities of successful read, collision, or Pe empty read at parent of the current node β repetitions of query to handle unreliable channel g actual probability of reading a tag u required probability of reading a tag V maximum # of nodes visited to identify all tags θ0 # of subtrees covering no tags θ1 # of subtrees covering 1 tag 3.1 Average Number of Queries Let random variable T denote the total number of nodes that TH visits to identify all tags. Note that each node visit corresponds to one reader query. We next calculate E[T ]. Let I(l, p) be an indicator random variable whose value is 1 if and only if node (l, p) is visited. Thus, T is the sum of I(l, p) for all l and all p. T = Xb l=1 2Xl−1 p=0 I(l, p) (1) Let P {(l, p)} be the probability that TH visits node (l, p). Thus, E[T ] can be expressed as follows: E[T ] = Xb l=1 2Xl−1 p=0 P {(l, p)} (2) Next, we focus on expressing P {(l, p)} using variable γ, where γ denotes the level that TH hops to. Recall that TH skips all nodes on levels from 1 to γ − 1 and performs DFT on each of the 2γ nodes on level γ, where 1 ≤ γ ≤ b. Note that the root node of the whole binary tree is always meaningless to visit as it corresponds to a query of length 0. Here P {(l, p)} is calculated differently depending on whether node (l, p) is the left child of its parent or the right. Let Pl {(l, p)} and Pr {(l, p)} denote the probability of visiting (l, p) when (l, p) is the left and right child of its parent, respectively. If the estimated total number of tags z is zero, then Pl {(l, p)} = Pr {(l, p)} = 0 for all l and p. Below we assume z > 0. As TH skips all nodes from levels 1 to γ − 1, we have Pl {(l, p)} = Pr {(l, p)} = 0 if 1 ≤ l < γ (3) 296
As TH performs DFT from each node on level y,it visits n-罗>z-1,then the probability that TH visits(亿,p)is each node on this level.Thus,we have equal to the probability that (l,p-1)is not an empty read, B{(L,p)}=P,{(l,p)}=1ifl=Y (4) which is 1-()/(2)based on Equation (6).Finally,we have For each remaining level y<I<b,when (l,p)is the left child of its parent,P(l,p)is equal to the probability that the parent of(l,p)is a collision node.When (I,p)is the P.{(L,p) ifn-受>z-1 (10) right child of its parent,if the parent is a collision node ifn-受≤z-1 and (l,p-1)is an empty read node,then (p)will also be a collision node.Thus,instead of visiting (p),TH should directly hop to the left child of(,p).Therefore,P(l,p)}is Case 2:n-m=z-1. equal to the probability that the parent of(I,p)is a collision In this case,.z-k≤n-m=z-l,which means k≥1. node and (l,p-1)is not an empty read node. As the parent of(l,p)covers k 1 tags,the probability Let k denote the number of tags covered by the parent of of the parent of(l,p)being an empty read is 0 and the node (l,p)(i.e.,the number of tags that are in the subtree rooted at the parent of (,p).Let mdenote the probability of the parent of(l,p)being a successful read is m(-T)/(g)=m(-)/(g)=m/(g)based on Equation maximum number of tags that the parent of(亿,p)can cover (7).If (p)is the left child of its parent,then TH visits it and n=2 denote the number of tags in the whole ID space. if and only if the parent of(I,p)is a collision node.Thus, The probability that the parent of(l,p)covers k of z tags the probability of visiting(l,p)is equal to the probability of follows a hypergeometric distribution: the parent of(l,p)being a collision node,which is equal to )二 1-P-P.Thus,we have P(#tags=k}= () (5) A{,p}=1-P-B=1-百 n (11) Let P be the probability that the parent of(l,p)is an empty read.Thus, If (l,p)is the right child of its parent,then TH visits it if and only if both the parent of (I,p)is a collision node and (1,p-1) n一m P.=P{#tags =0}= (6) is not an empty read.The probability that the parent of(,p) is a collision node is 1-m/()as calculated above.Given that the parent of (l,p)is a collision node,the probability Let P be the probability that the parent of(l,p)is a suc- cessful read.Thus. that(亿,p-1)is an empty read is(-)-受)/(g)-m P,P{#tags=1)= m(c) (7) (12) g Let Pe be the probability that the parent of(l,p)is a collision node.Thus, Case 3.n-m>z-1. In this case,k 0.Similar to the above calculations, Pe=1-(P+P)=1- (m)m("m) (8) based on Equations (6)and (7),we have: (G) Next we calculate P{(l,p)}and P{(,p)}for y<I<b B{Lp)}=1-P-P=1- )+m( (13) for the following three cases:n-m<z-1,n-m =2-1. () and n-m z-1.Note that n-m is the size of the ID space that is not covered by the parent of (l,p),and z-k P{L,p)}= 1 )+m(二 is the remaining number of tags that are not covered by the parent of (p).Thus,z-k<n-m. x1- 二)-{m)+受(二] (14) Case 1:n-m<z-1. -{m)+m(-} In this case,.z-k≤n-m<z-l,which means k≥2. Finally,Equations (3)through(14)completely define the Thus,as the parent of(l,p)covers at least two tags,it must probabilities P{(,p)}and P {(l,p)}.Note that as tags are be a collision node,i.e.P=1.Thus,if (l,p)is the left child uniformly distributed,the probability of visiting node(l,p) of its parent,TH for sure visits it: is independent of the horizontal position p. B{(0,p)}=1 (9) The expected number of queries can now be calculated using Theorem 1. If(,p)is the right child of its parent,TH visits it if and only if node (l,p-1),which is the left sibling of (l,p),is THEOREM 1.For a population of z tags uniformly dis- not an empty read.If (,p-1)is an empty read,as its tributed in the ID space,where each tag has an ID of b bits, parent is a collision node,(t,p)must also be a collision node, if TH hops to level y to perform DET from each node on which means that TH will directly visit the left child of this level,the erpected number of queries for identifying all (1,p)instead of (l,p).The size of the ID space covered by z tags is: (L,p-1)is受.Ifn-受≤z-l,then node(l,p-1)covers at least one tag,which means that (l,p-1)is not an empty E四=2+∑2-[B{,p}+P{0,pH(15) read and TH for sure visits (l,p),i.e.,P.{(l,p)}=1.If l=y+1 297
As TH performs DFT from each node on level γ, it visits each node on this level. Thus, we have Pl {(l, p)} = Pr {(l, p)} = 1 if l = γ (4) For each remaining level γ < l ≤ b, when (l, p) is the left child of its parent, Pl {(l, p)} is equal to the probability that the parent of (l, p) is a collision node. When (l, p) is the right child of its parent, if the parent is a collision node and (l, p − 1) is an empty read node, then (l, p) will also be a collision node. Thus, instead of visiting (l, p), TH should directly hop to the left child of (l, p). Therefore, Pr {(l, p)} is equal to the probability that the parent of (l, p) is a collision node and (l, p − 1) is not an empty read node. Let k denote the number of tags covered by the parent of node (l, p) (i.e., the number of tags that are in the subtree rooted at the parent of (l, p)). Let m = 2b−l+1 denote the maximum number of tags that the parent of (l, p) can cover and n = 2b denote the number of tags in the whole ID space. The probability that the parent of (l, p) covers k of z tags follows a hypergeometric distribution: P {#tags = k} = m k n−m z−k n z (5) Let Pe be the probability that the parent of (l, p) is an empty read. Thus, Pe = P {#tags = 0} = n−m z n z (6) Let Ps be the probability that the parent of (l, p) is a successful read. Thus, Ps = P {#tags = 1} = m n−m z−1 n z (7) Let Pc be the probability that the parent of (l, p) is a collision node. Thus, Pc = 1 − (Pe + Ps) = 1 − n−m z n z − m n−m z n z (8) Next we calculate Pl {(l, p)} and Pr {(l, p)} for γ < l ≤ b for the following three cases: n − m < z − 1, n − m = z − 1, and n − m > z − 1. Note that n − m is the size of the ID space that is not covered by the parent of (l, p), and z − k is the remaining number of tags that are not covered by the parent of (l, p). Thus, z − k ≤ n − m. Case 1: n − m < z − 1. In this case, z − k ≤ n − m < z − 1, which means k ≥ 2. Thus, as the parent of (l, p) covers at least two tags, it must be a collision node, i.e. Pc = 1. Thus, if (l, p) is the left child of its parent, TH for sure visits it: Pl {(l, p)} = 1 (9) If (l, p) is the right child of its parent, TH visits it if and only if node (l, p − 1), which is the left sibling of (l, p), is not an empty read. If (l, p − 1) is an empty read, as its parent is a collision node, (l, p) must also be a collision node, which means that TH will directly visit the left child of (l, p) instead of (l, p). The size of the ID space covered by (l, p − 1) is m 2 . If n − m 2 ≤ z − 1, then node (l, p − 1) covers at least one tag, which means that (l, p − 1) is not an empty read and TH for sure visits (l, p), i.e., Pr {(l, p)} = 1. If n − m 2 > z − 1, then the probability that TH visits (l, p) is equal to the probability that (l, p − 1) is not an empty read, which is 1 − n− m 2 z / n z based on Equation (6). Finally, we have Pr {(l, p)} = 1 − ( n− m 2 z ) ( n z ) if n − m 2 > z − 1 1 if n − m 2 ≤ z − 1 (10) Case 2: n − m = z − 1. In this case, z − k ≤ n − m = z − 1, which means k ≥ 1. As the parent of (l, p) covers k ≥ 1 tags, the probability of the parent of (l, p) being an empty read is 0 and the probability of the parent of (l, p) being a successful read is m n−m z−1 / n z = m z−1 z−1 / n z = m/n z based on Equation (7). If (l, p) is the left child of its parent, then TH visits it if and only if the parent of (l, p) is a collision node. Thus, the probability of visiting (l, p) is equal to the probability of the parent of (l, p) being a collision node, which is equal to 1 − Pe − Ps. Thus, we have Pl {(l, p)} = 1 − Pe − Ps = 1 − mn z (11) If (l, p) is the right child of its parent, then TH visits it if and only if both the parent of (l, p) is a collision node and (l, p−1) is not an empty read. The probability that the parent of (l, p) is a collision node is 1 − m/n z as calculated above. Given that the parent of (l, p) is a collision node, the probability that (l, p−1) is an empty read is n− m 2 z − m 2 / n z −m . Pr {(l, p)} = 1 − mn z . 1 − n− m 2 z − m 2 n z − m (12) Case 3. n − m > z − 1. In this case, k ≥ 0. Similar to the above calculations, based on Equations (6) and (7), we have: Pl {(l, p)} = 1 − Pe − Ps = 1 − n−m z + m n−m z−1 n z (13) Pr {(l, p)} = 1 − n−m z + m n−m z−1 n z × 1 − n− m 2 z − n−m z + m 2 n−m z−1 n z − n−m z + m n−m z−1 (14) Finally, Equations (3) through (14) completely define the probabilities Pl {(l, p)} and Pr {(l, p)}. Note that as tags are uniformly distributed, the probability of visiting node (l, p) is independent of the horizontal position p. The expected number of queries can now be calculated using Theorem 1. Theorem 1. For a population of z tags uniformly distributed in the ID space, where each tag has an ID of b bits, if TH hops to level γ to perform DFT from each node on this level, the expected number of queries for identifying all z tags is: E[T ] = 2γ + Xb l=γ+1 2 l−1 [Pl {(l, p)} + Pr {(l, p)}] (15) 297