Tell Me What I See: Recognize RFID Tagged Objects in Augmented Reality Systems

Lei Xie†, Jianqiang Sun†, Qingliang Cai†, Chuyu Wang†, Jie Wu‡, and Sanglu Lu†
†State Key Laboratory of Novel Software Technology, Nanjing University, China
‡Department of Computer Information and Sciences, Temple University, USA
{lxie,sanglu}@nju.edu.cn, {SunJQ,caiqingliang,wangcyu217}@dislab.nju.edu.cn, jiewu@temple.edu

ABSTRACT
Nowadays, people usually depend on augmented reality (AR) systems to obtain an augmented view of a real-world environment. With the help of advanced AR technology (e.g., object recognition), users can effectively distinguish multiple objects of different types. However, these techniques can only offer limited degrees of distinction among different objects and cannot provide more inherent information about these objects. In this paper, we leverage RFID technology to further label different objects with RFID tags. We deploy additional RFID antennas on a COTS depth camera and propose a continuous scanning-based scheme to scan the objects, i.e., the system continuously rotates and samples the depth of field and RF-signals from these tagged objects. In this way, by pairing the tags with the objects according to the correlations between the depth of field and RF-signals, we can accurately identify and distinguish multiple tagged objects to realize the vision of "tell me what I see" from the augmented reality system. For example, in front of multiple unknown people wearing RFID tagged badges at public events, our system can identify these people and further show their inherent information from the RFID tags, such as their names, jobs, titles, etc. We have implemented a prototype system to evaluate the actual performance. The experiment results show that our solution achieves an average match ratio of 91% in distinguishing up to dozens of tagged objects with a high deployment density.

ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous

Author Keywords
RFID; Augmented Reality System; Prototype Design

INTRODUCTION
With the proliferation of augmented reality technology, people nowadays start to leverage augmented reality (AR) systems (e.g., Microsoft Kinect, Google Glass) to obtain an augmented view of a real-world environment. For example, devices like the Microsoft Kinect [13], i.e., a depth camera, can effectively conduct object recognition based on pattern recognition technology. Therefore, users can effectively distinguish multiple objects of different categories; e.g., a specified object in the camera can be recognized as a vase, a laptop, or a pillow based on its natural features.

Figure 1. Tell me what I see from the augmented reality system
However, these techniques can only offer a limited degree of distinction among different objects, since multiple objects of the same type may have very similar features; e.g., the system cannot effectively distinguish between two laptops of the same brand, even if they are owned by different people. Moreover, they cannot provide more inherent information about these objects, e.g., the specific configuration, manufacturer, and production date of a laptop. Therefore, it is rather difficult to provide these functions by purely leveraging AR technology.

Fortunately, the rise of RFID technology has brought new opportunities to meet these demands [27, 31, 15]. RFID tags can be used to label different objects and to store the inherent information of these objects in their onboard memory. Moreover, in comparison to optical markers such as QR codes, a COTS RFID tag has an onboard memory of up to 4K or 8K bytes, and it can be effectively identified even if it is hidden in/under the object. This provides us with an opportunity to effectively distinguish these objects, even if they belong to the same brand and have the same appearance. Figure 1 shows a typical application scenario of the above vision. In this scenario, multiple people are standing or sitting together in a cafe while wearing RFID tagged badges. From the camera's view, the depth camera can recognize multiple objects, or rather human subjects, as well as their depth from its embedded depth sensor, which is associated with the distance to the camera. The RFID reader can identify multiple tags within the scanning range; moreover, it is able to extract signal features like the received signal strength (RSSI)
and phase from the RFID tags. By effectively pairing this information together, the system can realize the vision of "tell me what I see" from the augmented reality system. For example, as shown in Figure 1, the inherent information extracted from the RFID tags, such as names, jobs, and titles, can be directly associated with the corresponding human subjects in the camera's view. This provides us with more opportunities to communicate with unknown people by leveraging this novel RFID-assisted augmented reality.

Although many schemes for RFID-based localization [32, 28, 34] have been proposed, they mainly focus on absolute object localization, and usually require anchor nodes like reference tags for accurate localization. They are not suitable for distinguishing multiple tagged objects for two reasons. First, we only need to distinguish the relative location instead of the absolute location of multiple tagged objects, by pairing the tags to the objects based on the correlation between the depth of field and RF-signals. Second, the depth camera cannot effectively use the anchor nodes, and it is impractical to deploy multiple anchor nodes in conventional AR applications.

In this paper, we leverage RFID technology [33, 16] to further label different objects with RFID tags. We deploy additional RFID antennas on the COTS depth camera and propose a continuous scanning-based scheme to scan the objects, i.e., the system continuously rotates and samples the depth of field and RF-signals from these tagged objects. In this way, we can accurately identify and distinguish multiple tagged objects by sufficiently exploring the inherent correlations between the depth of field and the received RF-signal. Specifically, we respectively extract the RSSI and phase value from the RF-signals, and pair the tags with the objects according to the correlation between the depth value and the RSSI/phase value.

However, there are several challenges in distinguishing multiple tagged objects in AR systems. The first challenge is to conduct accurate pairing between the objects and the tags. In real applications, the tagged objects are usually placed in very close proximity, and the number of objects is usually on the order of dozens. In this situation, it is difficult to realize accurate pairing due to the large cardinality and mutual interference. The second challenge is to mitigate the interference from issues like the multi-path effect and object occlusion in realistic settings. These issues can introduce nonnegligible interference in pairing the tags with the objects, such as missing tags or objects that fail to be identified. The third challenge is to devise an efficient solution without any additional assistance, like anchor nodes. It is impractical to intentionally deploy anchor nodes in real applications due to the intensive deployment costs in manpower and time.

This paper represents the first study of using RFID technology to precisely distinguish multiple objects in augmented reality systems. Specifically, we make three key contributions in this paper. 1) To the best of our knowledge, we are the first to consider identifying and distinguishing multiple tagged objects with RFID systems; this provides a key supporting technology for augmented reality systems to realize the vision of "tell me what I see" from the AR system. 2) We conduct an extensive experimental study to explore the inherent correlations between the depth of field and RF-signals from the tagged objects.
We thus propose continuous scanning-based solutions that respectively leverage the RSSI and phase value from RF-signals to accurately distinguish the multiple tagged objects. 3) We implemented a prototype system and evaluated the actual performance with case studies. Our solution achieves an average match ratio of 91% in distinguishing up to dozens of RFID tagged objects with a high deployment density.

RELATED WORK
Depth camera-based pattern recognition: Depth camera-based pattern recognition aims at using the depth and RGB data captured from the camera to recognize objects in a more accurate manner. Based on depth processing [11, 18], a number of technologies have been proposed for object recognition [23] and gesture recognition [5, 21, 8, 30, 22]. Nirjon et al. solve the problem of localizing and tracking household objects using depth-camera sensors [20]. A Kinect-based pose estimation method [21] has been proposed in the context of physical exercise, examining the accuracy of joint localization and the robustness of pose estimation with respect to orientation and occlusions.

RFID in Ubiquitous Applications: RFID has been investigated in various ubiquitous applications, including indoor localization [34, 24], activity sensing [2], tabletop interaction [9], physical object search [19], etc. Prior work on RFID-based localization primarily relied on Received Signal Strength [34, 24] or Angle of Arrival [1] to acquire the absolute location of an object. The state-of-the-art systems use the phase value to estimate the absolute or relative location of an object with higher accuracy [33, 27, 17, 25]. RF-IDraw uses a 2-dimensional array of RFID antennas to track the movement trajectory of a finger attached with an RFID tag, so that it can reconstruct the trajectory shape of the specified finger [29]. Tagoram exploits tag mobility to build a virtual antenna array, and uses a differential augmented hologram to facilitate the instant tracking of a mobile RFID tag [32]. Find My Stuff (FiMS) provides search support for physical objects inside furniture, at the room level, and in multiple locations [19].

Combined use in augmented reality environments: Recent works further consider using both the depth camera and RFID for indoor localization and object recognition in augmented reality environments [26, 14, 6, 3]. Wang et al. propose an indoor real-time location system that combines active RFID and Kinect, leveraging the positioning capability of identified RFID tags and the object extraction ability of Kinect [26]. Klompmaker et al. use RFID and depth-sensing cameras to enable personalized, authenticated tangible interactions on a tabletop [14]. Galatas et al. propose a multimodal context-aware localization system, using RFID and 3-D audio-visual information from two Kinect sensors deployed at various locations [6]. Cerrada et al. present a method to improve object recognition by combining vision-based techniques applied to range-sensor captured 3D data with object identification obtained from RFID tags [3].

SYSTEM OVERVIEW
Design Goals
We aim to implement a supporting technology for AR systems to realize the vision of "tell me what I see from the
augmented system",by leveraging RFID tags to label differ- 3D- Applications ent objects.In order to achieve this goal,we need to collect Camera the responses from multiple tags and objects,and then pair RFID Matching the RFID tags to the corresponding objects,according to the Algorithm correlations between the depth of field and RF-signals.There- 1 Fcature Sampling fore,we need to consider the following metrics in regard to and Extraction system performance:1)Accuracy:Since the objects are usu- Laptop RSSI RFID Depth ally placed in very close proximity,there is a high accuracy Reader requirement in distinguishing these objects,i.e.,the average @0回 match ratios should be greater than a certain value,e.g.,85%. 30 2)Time-efficiency:Since the AR applications are usually exe- cuted in a real-time approach,it is essential to reduce the time (a)Prototype System (b)Software framework delay in identifying and distinguishing the multiple objects. Figure 2.System Framework 3)Robustness:The environmental factors,like the multi-path peatable experiments.We set a typical indoor environment, effect and partial occlusion,may cause the responses from the i.e..a 10mx 8m lobby,as the testing environment. tagged objects to be missing or distorted.Besides.the tagged objects could be partially hidden behind each other due to the Extract the Depth of Field from Depth-Camera randomness in the deployment.The solution should be robust Depth cameras,such as the Microsoft Kinect,are a kind of to these noises and distractions range camera,which produces a 2D image showing the dis- System Framework tance to points in a scene from a specific point,normally We design a prototype system as shown in Figure 2(a).We associated with a depth sensor.Therefore,the depth camera deploy one or two additional RFID antennas to the COTS can effectively estimate the distance to a specified object ac- depth camera.The RFID antenna(s)and the depth camera are cording to the depth.because the depth is linearly increasing fixed to a rotating shaft so that they can rotate simultaneously. with the distance.If multiple objects are placed in different For the RFID system,we use the COTS ImpinJ R420 reader positions in the scene,they are usually at different distances [10],one or two Laird S9028 antennas,and multiple Alien away from the depth camera.Therefore,it is possible to dis- 9640 general purpose tags;for the depth camera,we use the tinguish among different objects according to the depth values Microsoft Kinect for windows.They are both connected to from the depth camera. 
a laptop placed on the mobile robot.The mobile robot can perform a 360 degree rotation along with the rotation axis.In Experiment Observations the following sections,without loss of generality,we evaluate We first conduct experiment to evaluate the characteristics of the depth.We arbitrarily place three objects A.B,and C the performance using the above configurations.By attaching in front of the depth camera,i.e..Microsoft Kinect,object the RFID tags to the specified objects,we propose a continuous scanning-based scheme to scan the objects,i.e.,the system A is a box at distance 68cm,object B is a can at distance continuously rotates and samples the depth of field and RF- 95cm and object C is a tripod at distance 150cm.We then signals from these tagged objects.In this way,we can obtain collect the depth histogram from the depth sensor.As shown in the depth of specified objects from the depth sensor inside the Figure 3(a),the X-axis denotes the depth value,and the Y-axis depth camera,we can also extract the signal features such as denotes the number of pixels at the specified depth.We find the RSSI and phase values from the RF-signals of the RFID that,as A and B are regular-shaped objects,there are respective tags.By accurately pairing these information,the tags and the peaks in the depth histogram for object A and B,meaning that objects can be effectively bound together. many pixels are detected from this distance.Therefore,A and B can be easily distinguished according to the distance Figure 2(b)further shows the software framework.The system However,there exist two peaks in the corresponding distance is mainly composed of three layers:the sensor data collection of object C,because object C is an irregularly-shaped object layer,the middleware layer,and the application layer.For the (the concave shape of the tripod).there might be a number of sensor data collection layer,the depth camera recognizes mul- pixels at different distances.Moreover,we can also find some tiple objects and collects the corresponding depth distribution, background noises past the distance of 175 cm,which can be while the RFID system collects multiple tag IDs and extracts produced by background objects.such as the wall and floor. the corresponding RSSIs or phases from the RF-signals of This implies that,for the object with a continuous surface. RFID tags.For the middleware layer,we aim to sample and the depth sensor usually detects a peak in the vicinity of its extract some features from the raw sensor data,and conduct an distance,for an irregularly-shaped object,the depth sensor accurate matching among the objects and RFID tags.For the detects multiple peaks with intermittent depths.Nevertheless application layer,the AR applications can use the matching we find that these peaks are usually very close in distance. results directly to realize various objectives. In order to further validate the relationship between the depth FEATURE SAMPLING AND EXTRACTION and distance,we set multiple horizontal lines with different In this section,we investigate the feature sampling and extrac- distances to the Kinect(from 500 mm to 2500 mm).For each tion based on the observations from empirical studies.Without horizontal line,we then move a certain object along the line loss of generality,in the following each experiment observa- and respectively obtain the depth value from the Kinect.We tion is summarized from the statistical properties of 100 re- show the experiment results in Figure 3(b).Here we find
augmented reality system", by leveraging RFID tags to label different objects. In order to achieve this goal, we need to collect the responses from multiple tags and objects, and then pair the RFID tags to the corresponding objects according to the correlations between the depth of field and RF-signals. Therefore, we need to consider the following metrics in regard to system performance: 1) Accuracy: Since the objects are usually placed in very close proximity, there is a high accuracy requirement in distinguishing these objects, i.e., the average match ratio should be greater than a certain value, e.g., 85%. 2) Time-efficiency: Since AR applications are usually executed in real time, it is essential to reduce the time delay in identifying and distinguishing the multiple objects. 3) Robustness: Environmental factors, like the multi-path effect and partial occlusion, may cause the responses from the tagged objects to be missing or distorted. Besides, the tagged objects could be partially hidden behind each other due to the randomness of the deployment. The solution should be robust against these noises and distractions.

System Framework
We design a prototype system as shown in Figure 2(a). We deploy one or two additional RFID antennas on the COTS depth camera. The RFID antenna(s) and the depth camera are fixed to a rotating shaft so that they can rotate simultaneously. For the RFID system, we use the COTS Impinj R420 reader [10], one or two Laird S9028 antennas, and multiple Alien 9640 general purpose tags; for the depth camera, we use the Microsoft Kinect for Windows. They are both connected to a laptop placed on a mobile robot. The mobile robot can perform a 360 degree rotation around the rotation axis. In the following sections, without loss of generality, we evaluate the performance using the above configurations. By attaching the RFID tags to the specified objects, we propose a continuous scanning-based scheme to scan the objects, i.e., the system continuously rotates and samples the depth of field and RF-signals from these tagged objects. In this way, we can obtain the depth of specified objects from the depth sensor inside the depth camera, and we can also extract signal features such as the RSSI and phase values from the RF-signals of the RFID tags. By accurately pairing this information, the tags and the objects can be effectively bound together.

Figure 2(b) further shows the software framework. The system is mainly composed of three layers: the sensor data collection layer, the middleware layer, and the application layer. In the sensor data collection layer, the depth camera recognizes multiple objects and collects the corresponding depth distribution, while the RFID system collects multiple tag IDs and extracts the corresponding RSSIs or phases from the RF-signals of the RFID tags. In the middleware layer, we aim to sample and extract features from the raw sensor data, and conduct an accurate matching between the objects and RFID tags. In the application layer, the AR applications can use the matching results directly to realize various objectives.

Figure 2. System Framework: (a) prototype system (3D camera, RFID antennas and reader, rotating module, laptop, rotation axis); (b) software framework (sensor data collection, feature sampling and extraction, matching algorithm, applications)
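To make the layered framework concrete, the sketch below outlines the data flow through the three layers in Python. All names here (DepthSample, TagSample, extract_features, match_tags) are our own illustrative assumptions rather than the authors' implementation; the two middleware steps are detailed in the following sections.

```python
# A minimal sketch of the three-layer software framework; hypothetical
# names, not the authors' implementation.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DepthSample:           # sensor data collection layer (depth camera)
    angle: float             # rotation angle of the scanning system
    histogram: List[int]     # number of pixels per depth bin

@dataclass
class TagSample:             # sensor data collection layer (RFID reader)
    angle: float
    tag_id: str
    rssi: float              # received signal strength (dBm)
    phase: float             # phase value (radians)

def extract_features(depth_samples: List[DepthSample],
                     tag_samples: List[TagSample]):
    """Middleware layer, step 1: per-object depth features and per-tag
    RSSI/phase series (see FEATURE SAMPLING AND EXTRACTION)."""
    ...

def match_tags(object_features, tag_features) -> Dict[str, int]:
    """Middleware layer, step 2: pair tag IDs with object indices by
    correlating depth with RF-signals (see MATCHING ALGORITHM)."""
    ...
```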
FEATURE SAMPLING AND EXTRACTION
In this section, we investigate the feature sampling and extraction based on observations from empirical studies. Without loss of generality, each experiment observation in the following is summarized from the statistical properties of 100 repeatable experiments. We set a typical indoor environment, i.e., a 10 m × 8 m lobby, as the testing environment.

Extract the Depth of Field from the Depth Camera
Depth cameras, such as the Microsoft Kinect, are a kind of range camera, which produces a 2D image showing the distance to points in a scene from a specific viewpoint, normally obtained by a depth sensor. Therefore, the depth camera can effectively estimate the distance to a specified object according to the depth, because the depth increases linearly with the distance. If multiple objects are placed at different positions in the scene, they are usually at different distances from the depth camera. Therefore, it is possible to distinguish among different objects according to the depth values from the depth camera.

Experiment Observations
We first conduct experiments to evaluate the characteristics of the depth. We arbitrarily place three objects A, B, and C in front of the depth camera, i.e., a Microsoft Kinect: object A is a box at a distance of 68 cm, object B is a can at a distance of 95 cm, and object C is a tripod at a distance of 150 cm. We then collect the depth histogram from the depth sensor. As shown in Figure 3(a), the X-axis denotes the depth value, and the Y-axis denotes the number of pixels at the specified depth. We find that, as A and B are regular-shaped objects, there are respective peaks in the depth histogram for objects A and B, meaning that many pixels are detected at their distances. Therefore, A and B can be easily distinguished according to the distance. However, there exist two peaks at the corresponding distance of object C: because object C is an irregularly-shaped object (the concave shape of the tripod), there might be a number of pixels at different distances. Moreover, we can also find some background noise beyond the distance of 175 cm, which can be produced by background objects such as the wall and floor. This implies that, for an object with a continuous surface, the depth sensor usually detects a peak in the vicinity of its distance, while for an irregularly-shaped object, the depth sensor detects multiple peaks with intermittent depths. Nevertheless, we find that these peaks are usually very close in distance.

In order to further validate the relationship between the depth and distance, we set multiple horizontal lines at different distances from the Kinect (from 500 mm to 2500 mm). For each horizontal line, we then move a certain object along the line and respectively obtain the depth value from the Kinect. We show the experiment results in Figure 3(b). Here we find
that, for each horizontal line, the depth values of the object keep nearly constant, with rather small deviations; for different horizontal lines, these depth values show obvious variations. Due to the limitation of the Kinect's field of view, the Kinect has a smaller view angle at closer distances. This observation implies that the depth value collected from depth cameras depicts the vertical distance, rather than the absolute distance, between the objects and the depth camera.

Figure 3. Experiment results of depth value: (a) the depth histogram of multiple objects; (b) the depth value of objects on different horizontal lines

Depth Feature Extraction
To extract the depths of specified objects from the depth histogram of multiple objects, we set a threshold t to detect peaks in regard to the number of pixels. We iterate from the minimum depth to the maximum depth in the histogram; if the number of pixels at a certain depth is larger than t, we identify it as a peak p(d_i, n_i) with depth d_i and number of pixels n_i. To address the multiple-peaks problem of irregularly-shaped objects, we set another threshold Δd: if the differences between these peaks' depth values are smaller than Δd, we combine them into one peak. The values of both t and Δd are selected based on empirical values from a number of experimental studies (t = 200 and Δd = 10 cm in our implementation). Then, each peak actually represents a specified object. For each peak, we respectively find the leftmost depth d_l and the rightmost depth d_r whose numbers of pixels satisfy n_i > 0. We then compute the average depth for the specified object as $\bar{d} = \sum_{i=l}^{r} \left( d_i \times \frac{n_i}{\sum_{j=l}^{r} n_j} \right)$, i.e., a weighted average according to the number of pixels at each depth around the peak.
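The peak detection, peak merging, and weighted averaging described above can be condensed into a short sketch. This is a minimal re-implementation from the description, assuming the histogram arrives as parallel lists of sorted depth bins and per-bin pixel counts; the function and variable names are ours, while the default thresholds follow the paper's empirical values (t = 200 and Δd = 10 cm).

```python
def extract_object_depths(depths, pixels, t=200, delta_d=10.0):
    """Estimate one average depth per object from a depth histogram.

    depths : sorted depth bins (cm); pixels : pixel count per bin.
    A sketch of the paper's method, not the authors' code.
    """
    # 1) Threshold-based peak detection: any bin with more than t pixels.
    peaks = [i for i, n in enumerate(pixels) if n > t]

    # 2) Merge peaks closer than delta_d (irregularly-shaped objects
    #    produce several nearby peaks that belong to one object).
    groups = []
    for i in peaks:
        if groups and depths[i] - depths[groups[-1][-1]] < delta_d:
            groups[-1].append(i)
        else:
            groups.append([i])

    # 3) Weighted average depth over each peak's support [d_l, d_r],
    #    i.e., the surrounding bins with non-zero pixel counts.
    object_depths = []
    for g in groups:
        l, r = g[0], g[-1]
        while l > 0 and pixels[l - 1] > 0:
            l -= 1
        while r < len(pixels) - 1 and pixels[r + 1] > 0:
            r += 1
        total = sum(pixels[l:r + 1])
        avg = sum(d * n for d, n in zip(depths[l:r + 1], pixels[l:r + 1])) / total
        object_depths.append(avg)
    return object_depths
```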
Extract the Received Signal Strength from RF-Signals
The received signal strength (RSSI) measures the power of the received radio signal, which generally decreases as the distance between the tag and the reader increases. However, according to a previous study [34], the RSSI is impacted by various issues like the multi-path effect, path loss, etc. This indicates that the RSSI does not always have a monotonic relationship with the distance. Therefore, with the RSSI from a specified tag, the RFID system can only roughly estimate the distance between the reader and the tag.

Experiment Observations
It is found that, inside the RFID antenna's effective scanning range, the RSSI from the tag is also impacted by the tag's position offset from the center of the antenna beam. To validate this judgment, we separate the RFID reader and the tag by a distance d, and then evaluate the average RSSI value while gradually rotating the antenna from an offset degree of −40° to +40°. Figure 4 shows the experiment results. We find that, as the distance between the tag and the reader increases from 50 cm to 150 cm, the RSSI decreases rapidly; when the distance increases further, the RSSI decreases slowly. Moreover, for a given distance, the RSSI from the tag always reaches its maximum value when the antenna directly faces the tag. As we further increase the offset degree of the rotation, the RSSI gradually decreases. This is because the antenna outputs the maximum transmitting power in the central area of the beam, and thus the RSSI of the backscattered RF-signals reaches its maximum value when the tag is in the center. As the tag's position deviates from the center of the antenna beam, the RSSI of the backscattered RF-signals decreases accordingly. We call the position achieving the peak RSSI value the perpendicular point, since the perpendicular bisector of the RFID antenna crosses this point.

Figure 4. The variation of RSSI while rotating the RFID antenna (offsets from −40° to +40°, at distances of 50, 100, 150, and 200 cm)

Although the RSSI can only measure the vertical distance between the tag and the antenna at a coarse granularity, with different offset degrees from the tag to the center of the antenna beam, the RSSI changes in a convex curve with its peak value at the perpendicular point. We can further leverage this property to differentiate the positions of various objects in the horizontal dimension.
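Since the RSSI-versus-rotation-angle curve is convex with its peak at the perpendicular point, one simple way to exploit this property is to smooth the sampled curve and take the angle of its maximum. The sketch below is our own illustration of that idea; the function name and the moving-average smoothing are assumptions, not the paper's algorithm.

```python
def perpendicular_point(angles, rssi, window=5):
    """Estimate the rotation angle at which the antenna directly
    faces the tag, i.e., the perpendicular point.

    angles : rotation angles (degrees) sampled during scanning.
    rssi   : average RSSI (dBm) measured at each angle.
    """
    # Moving-average smoothing to suppress per-read RSSI noise.
    half = window // 2
    smoothed = [
        sum(rssi[max(0, i - half):i + half + 1])
        / len(rssi[max(0, i - half):i + half + 1])
        for i in range(len(rssi))
    ]
    # The convex curve peaks where the tag sits on the antenna's
    # perpendicular bisector.
    peak = max(range(len(smoothed)), key=smoothed.__getitem__)
    return angles[peak]

# e.g., perpendicular_point(list(range(-40, 41)), measured_rssi)
```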
Extract the Phase Value from RF-Signals
Background
Phase is a basic attribute of a signal, along with amplitude and frequency. The phase value of an RF signal describes the degree by which the received signal is offset from the sent signal, ranging from 0 to 360 degrees. Let d be the distance between the RFID antenna and the tag; the signal then traverses a round trip of length 2d in each backscatter communication. Therefore, the phase value θ output by the RFID reader can be expressed as [25, 4]:

$\theta = \left( \frac{2\pi}{\lambda} \times 2d + \mu \right) \bmod 2\pi, \quad (1)$

where λ is the wavelength. Besides the RF phase rotation over distance, the reader's transmitter, the tag's reflection characteristic, and the reader's receiver will also introduce additional phase rotations, denoted as θ_T, θ_R, and θ_TAG, respectively. We use µ = θ_T + θ_R + θ_TAG to denote this diversity term in Eq. (1). Since µ is rather stable according to previous results [32], and it is only related to the physical properties of the specified tag-antenna pair, we can record µ for different tags in advance. Then, according to each tag's response, we can calibrate the phase value by offsetting the diversity term. Thus, the phase value can be used as an accurate and stable metric to measure distance.
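As a small illustration of Eq. (1), the sketch below gives the forward model and the calibration step under our own naming assumptions; the per-pair diversity terms µ are assumed to be profiled offline, as the text describes.

```python
import math

def expected_phase(d, wavelength, mu):
    """Forward model of Eq. (1): phase reported by the reader for a
    tag at distance d, given the per-pair diversity term mu."""
    return (2 * math.pi / wavelength * 2 * d + mu) % (2 * math.pi)

def calibrate_phase(theta_measured, mu):
    """Offset the pre-recorded diversity term so that the remaining
    phase depends only on the round-trip distance."""
    return (theta_measured - mu) % (2 * math.pi)

# e.g., for a 920 MHz carrier, wavelength ≈ 3e8 / 920e6 ≈ 0.326 m;
# expected_phase(1.0, 0.326, mu=0.9) gives the wrapped phase at 1 m.
```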
Estimate the Vertical Distance from the Phase Value
According to the definition in Eq. (1), the phase value is a periodic function of the distance. Hence, given a specified phase value from the RF-signal, there can be multiple solutions for estimating the distance between the tag and the antenna. Therefore, we can deploy an RFID antenna array to scan the tags from slightly different positions, so as to figure out the unique solution for the distance. Without loss of generality, in this paper we separate two RFID antennas by a distance d, and use them to scan the RFID tags and respectively obtain their phase values from the RF-signals, as shown in Figure 5.

Since the depth value from depth cameras like the Kinect measures the vertical distance, instead of the absolute distance, between the objects and the depth camera, it is essential to measure the vertical distance between the tags and the RFID antennas in order to achieve a perfect match between the collected RF-signals and the depth of field. However, it is rather difficult to directly measure the vertical distance via the phase value. Figure 5 shows the relationship between the vertical distance and the absolute distance. In regard to a specified RFID tag, suppose its absolute distances to Antenna 1 and Antenna 2 are respectively d_1 and d_2; we then need to derive its vertical distance h to the antenna pair.

Figure 5. Compute the (x, y) coordinate of the tag

If we respectively use A_1 and A_2 to denote the midpoints of Antenna 1 and Antenna 2, and use T to denote the position of the tag, the three sides ⟨T, A_1⟩, ⟨T, A_2⟩, and ⟨A_1, A_2⟩ form a triangle. Since antennas A_1 and A_2 are separated by a fixed distance d, according to Heron's formula [12], the area of this triangle is $A = \sqrt{s(s-d_1)(s-d_2)(s-d)}$, where s is the semiperimeter of the triangle, i.e., $s = \frac{d_1+d_2+d}{2}$. Moreover, since the area of this triangle can also be computed as $A = \frac{1}{2} h \times d$, we can compute the vertical distance as $h = \frac{2\sqrt{s(s-d_1)(s-d_2)(s-d)}}{d}$. Then, according to Apollonius' theorem [7], for the triangle composed of points A_1, A_2, and T, the length of the median TO bisecting the side A_1A_2 is equal to $m = \frac{1}{2}\sqrt{2d_1^2 + 2d_2^2 - d^2}$. Hence, the horizontal distance between the tag and the midpoint of the two antennas, i.e., T′O, should be $\sqrt{m^2 - h^2}$. Therefore, if we build a local coordinate system with the origin set to the midpoint of the two antennas, the coordinate (x′, y′) is computed as follows:

$x' = \begin{cases} \sqrt{\frac{1}{2}d_1^2 + \frac{1}{2}d_2^2 - \frac{1}{4}d^2 - h^2}, & d_1 \ge d_2 \\ -\sqrt{\frac{1}{2}d_1^2 + \frac{1}{2}d_2^2 - \frac{1}{4}d^2 - h^2}, & d_1 < d_2 \end{cases} \quad (2)$

$y' = h. \quad (3)$

Therefore, the next problem we need to address is to estimate the absolute distances between the tag and the antennas according to the extracted phase values from the RF-signals. Suppose the RFID system respectively obtains two phase values θ_1 and θ_2 from the two separated RFID antennas; then, according to the definition in Eq. (1), the possible distances from the tag to the two antennas are $d_1 = \frac{1}{2}\left(\frac{\theta_1}{2\pi} + k_1\right)\lambda$ and $d_2 = \frac{1}{2}\left(\frac{\theta_2}{2\pi} + k_2\right)\lambda$. Here, k_1 and k_2 are integers ranging from 0 to +∞. Due to the multiple solutions of k_1 and k_2, there could be multiple candidate positions for the tag. However, since the difference of the lengths of two sides of a triangle is smaller than the length of the third side, i.e., |d_1 − d_2| < d, we can leverage this constraint to effectively eliminate many infeasible solutions of k_1 and k_2. Besides, due to the limited scanning range of the RFID system (the maximum scanning range l is usually smaller than 10 m), the values of k_1 and k_2 are upper bounded by a certain threshold, i.e., $\frac{2l}{\lambda}$.
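The candidate enumeration and the triangle geometry above translate directly into code. The sketch below, a minimal version under our own naming assumptions, enumerates the feasible (k_1, k_2) pairs under the |d_1 − d_2| < d and 2l/λ constraints, then applies Heron's formula and Apollonius' theorem to obtain the candidate (x′, y′) coordinates per Eqs. (2) and (3).

```python
import math

def candidate_positions(theta1, theta2, d, wavelength, max_range=10.0):
    """Candidate tag coordinates (x', y') in the two-antenna frame.

    theta1, theta2 : calibrated phase values from the two antennas.
    d              : antenna separation; all lengths in meters.
    """
    k_max = int(2 * max_range / wavelength)
    positions = []
    for k1 in range(k_max + 1):
        d1 = 0.5 * (theta1 / (2 * math.pi) + k1) * wavelength
        for k2 in range(k_max + 1):
            d2 = 0.5 * (theta2 / (2 * math.pi) + k2) * wavelength
            if abs(d1 - d2) >= d:           # triangle inequality pruning
                continue
            s = (d1 + d2 + d) / 2           # semiperimeter (Heron)
            area_sq = s * (s - d1) * (s - d2) * (s - d)
            if area_sq < 0:                 # degenerate triangle
                continue
            h = 2 * math.sqrt(area_sq) / d  # vertical distance, Eq. (3)
            x_sq = 0.5 * d1**2 + 0.5 * d2**2 - 0.25 * d**2 - h**2
            if x_sq < 0:
                continue
            # Sign follows Eq. (2): positive on the side of Antenna 2.
            x = math.sqrt(x_sq) if d1 >= d2 else -math.sqrt(x_sq)
            positions.append((x, h))
    return positions
```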
Figure 6 shows an example of the feasible positions of the target tag according to the obtained phase values θ_1 and θ_2. The feasible solutions include multiple positions like A ∼ D, which respectively belong to two hyperbolas H_1 and H_2. Due to the existence of multiple solutions, we can use these hyperbolas to denote a superset of the feasible positions in a straightforward manner.

Figure 6. Estimate the distance from the phase values of RF-signals (candidate positions A ∼ D on hyperbolas H_1 and H_2; antenna separation d < λ/2)

MATCHING ALGORITHM VIA CONTINUOUS SCANNING
Motivation
To identify and distinguish the multiple tagged objects, a straightforward solution is to scan the tags in a static manner, where both the depth camera and the RFID antenna(s) are deployed at a fixed position without moving. The system scans the objects and tags simultaneously, and respectively collects the depth values and RF-signals from these tagged objects. We can then pair the tags with the objects accordingly. However, when multiple tagged objects are placed at close vertical distances to the system, this solution cannot effectively distinguish multiple tagged objects at different horizontal distances.

To address this problem, we propose a continuous scanning-based solution as follows: we continuously rotate the scanning system (including the depth camera and the RFID antennas), and simultaneously sample the depth of field and RF-signals from the multiple tagged objects. Hence, we are able to collect a continuous series of features such as the depth, RSSI, and phase values during continuous scanning. While the scanning system is rotating, the vertical distances between the multiple objects and the scanning system are continuously changing, from which we can further derive the differences among multiple tagged objects at different horizontal distances. In this way, we are able to further distinguish multiple tagged objects with close vertical distances but at different positions.
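To illustrate how the continuously sampled series can be paired, the sketch below correlates each tag's phase-derived vertical-distance series with each object's depth series over the rotation, and solves the resulting assignment globally. This is a hedged illustration of the idea only; the paper's actual matching algorithm is developed in the remainder of this section and may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tags_to_objects(tag_series, object_series):
    """Pair tags with objects via continuous-scanning correlation.

    tag_series    : {tag_id: np.array of phase-derived vertical
                     distances, one per rotation step}
    object_series : {object_id: np.array of depth values, one per
                     rotation step}
    Returns {tag_id: object_id} maximizing the total correlation over
    all pairs (an illustrative sketch, not the paper's algorithm).
    """
    tags, objects = list(tag_series), list(object_series)
    corr = np.zeros((len(tags), len(objects)))
    for i, t in enumerate(tags):
        for j, o in enumerate(objects):
            corr[i, j] = np.corrcoef(tag_series[t], object_series[o])[0, 1]
    # Hungarian algorithm on negated correlations (it minimizes cost).
    rows, cols = linear_sum_assignment(-corr)
    return {tags[i]: objects[j] for i, j in zip(rows, cols)}
```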