Multi-Touch in the Air: Device-Free Finger Tracking and Gesture Recognition via COTS RFID

Chuyu Wang†, Jian Liu‡, Yingying Chen‡, Hongbo Liu*, Lei Xie†, Wei Wang†, Bingbing He†, Sanglu Lu†
†State Key Laboratory for Novel Software Technology, Nanjing University, China
Email: {wangcyu217, hebb}@dislab.nju.edu.cn, {lxie, ww, sanglu}@nju.edu.cn
‡WINLAB, Rutgers University, New Brunswick, NJ, USA
Email: jianliu@winlab.rutgers.edu, yingche@scarletmail.rutgers.edu
*Indiana University-Purdue University, Indianapolis, IN, USA
Email: hl45@iupui.edu

Abstract—Recently, gesture recognition has gained considerable attention in emerging applications (e.g., AR/VR systems) to provide a better user experience for human-computer interaction. Existing solutions usually recognize gestures based on wearable sensors or specialized signals (e.g., WiFi, acoustic and visible light), but they either incur high energy consumption or are susceptible to the ambient environment, which prevents them from efficiently sensing fine-grained finger movements. In this paper, we present RF-finger, a device-free system based on Commercial-Off-The-Shelf (COTS) RFID, which leverages a tag array on a letter-size paper to sense the fine-grained finger movements performed in front of the paper. Particularly, we focus on two kinds of sensing modes: finger tracking recovers the moving trace of finger writings; multi-touch gesture recognition identifies the multi-touch gestures involving multiple fingers. Specifically, we build a theoretical model to extract the fine-grained reflection feature from the raw RF-signal, which describes the finger influence on the tag array at cm-level resolution. For the finger tracking, we leverage K-Nearest Neighbors (KNN) to pinpoint the finger position relying on the fine-grained reflection features, and obtain a smoothed trace via a Kalman filter.
Additionally, we construct the reflection image of each multi-touch gesture from the reflection features by regarding the multiple fingers as a whole. Finally, we use a Convolutional Neural Network (CNN) to identify the multi-touch gestures based on the images. Extensive experiments validate that RF-finger can achieve as high as 88% and 92% accuracy for finger tracking and multi-touch gesture recognition, respectively.

I. INTRODUCTION

With the flourishing of ubiquitous sensing techniques, human-computer interaction is undergoing a reform: natural human gestures, e.g., finger movements in the air, are progressively replacing traditional typing-based input devices such as keyboards to provide a better user experience. Such gesture-based interactions have promoted the development of both Virtual Reality (VR) and Augmented Reality (AR) systems, where users can directly control virtual objects by performing gestures in the air, e.g., writing words, manipulating the tellurion or playing VR games. Toward this end, gesture-based interaction can further enable operations on smart devices in Internet-of-Things (IoT) environments, e.g., withdrawing the curtains, controlling the smart TVs.

Yingying Chen and Lei Xie are the co-corresponding authors.

Fig. 1. Illustrations of applications of RF-finger based on an RFID tag array: tracking the finger on smart devices, recognizing multi-touch gestures, and manipulating in VR gaming.

Therefore, accurately recognizing gestures in the air, especially fine-grained finger movements, has great potential to provide a better user experience in emerging VR applications and IoT manipulations, which will have a market value of USD 48.56 billion by the year of 2024 [2].

Existing gesture recognition solutions can be divided into two categories: (i) Device-based approaches usually require the user to wear sensors, e.g., an RFID tag or smartwatch, and track the motion of the sensors to recognize the gestures [15, 17].
These studies usually derive the gestures by building theoretical models to depict the signal changes received from the sensors. However, device-based approaches either suffer from an uncomfortable user experience (e.g., attaching an RFID tag on the finger) or short life cycles due to high energy consumption. (ii) Device-free approaches recognize gestures from ambient signals through different kinds of techniques without requiring the user to wear any devices. As the most popular solutions, camera-based systems, such as Kinect and LeapMotion, construct the body or finger structure from video streams for accurate gesture recognition. Nevertheless, they usually involve high computation and may raise privacy concerns for users. More recent works try to recognize gestures based on WiFi [16], acoustic signals [18] and visible light [9]. However, these solutions are either easily affected by environmental noise or incapable of sensing fine-grained gestures at the finger level. In this work, we are in search of a new device-free mechanism that can recognize finger-level gestures to facilitate the growing
VR applications and IoT operations.

Fig. 2. Preliminary study of the RF signal reflection: (a) experiment setup; (b) received signal (phase and RSSI) of vertical movement; (c) received signal of horizontal movement.

Recent advances demonstrate that the emerging RFID technology not only can sense the status of objects with device-based solutions [7, 10–12, 20], but also has the potential to provide device-free sensing by leveraging the multi-path effect [4, 21]. In this work, we present RF-finger, a device-free system based on an RFID tag array, to sense fine-grained finger movements. Unlike previous studies, which either locate the human body in a coarse-grained manner [21] or simply detect a single stroke from the hand movement for letter recognition [4], RF-finger focuses on tracking the finger trace and recognizing multi-touch gestures, which involves a smaller tracking subject and more complicated multi-touch gestures than existing problems. As shown in Figure 1, by leveraging the tag array attached on a letter-size paper, RF-finger seeks to support different applications including writing, multi-touch operations, gaming, etc.

Specifically, we deploy only one RFID antenna behind the tag array to continuously measure the signals emitted from the tag array, and recognize the gestures based on the corresponding signal changes. In designing the RF-finger system, we need to solve three main challenging problems. i) How to track the trajectory of the finger writings? Since the finger usually affects several adjacent tags due to the multi-path effect, it is inaccurate to simply locate the finger at the position of a single tag.
In our work, we theoretically model the impact of the moving finger on the tag array to extract the reflection features, and then exploit the reflection features to pinpoint the finger with cm-level resolution. ii) How to recognize the multi-touch gesture? A multi-touch gesture means the RF-signals reflected from multiple fingers are mixed together in the tag array, making it even more difficult to distinguish these fingers for gesture recognition. To address this problem, we regard the multiple fingers as a whole for recognition and then extract the reflection feature of the multiple fingers as images. We then leverage a Convolutional Neural Network (CNN) to automatically classify the corresponding gestures from the image features. iii) How to obtain stable signal quality from the tag array? In real RFID systems, misreading is a common phenomenon due to the dynamic environments that affect the signal quality, especially when reading multiple tags simultaneously, such as a tag array. To address this problem, we utilize a signal model to depict the mutual interference between tags, which yields a tag deployment that re-arranges adjacent tags in perpendicular orientations to reduce the interference.

The contributions of RF-finger are summarized as follows: i) We design a new device-free solution based on Commercial-Off-The-Shelf (COTS) RFID for both finger tracking and multi-touch gesture recognition. To the best of our knowledge, we are the first to recognize multi-touch gestures with an RFID system through a device-free approach. ii) We build a theoretical model to depict the reflection relationship between the tag array and the fingers caused by the multi-path effect. The theoretical model provides guidelines to develop two algorithms to track the finger trajectories and recognize the multi-touch gestures. iii) We experimentally investigate the impact of tag array deployment on the signal quality.
We analyze the mutual interference between tags via a signal model and provide recommendations on tag deployment to reduce the interference. iv) We implement a system prototype, RF-finger, for finger tracking and gesture recognition. Experiments show that RF-finger can achieve an average accuracy of 88% and 92% for finger tracking and gesture recognition, respectively.

II. PRELIMINARIES & CHALLENGES

In order to design a system to track fine-grained finger movements, we first conduct several preliminary studies on the impact of finger movement on RF-signals, and the feasibility of using an RFID tag array for gesture recognition. Based on the observations, we summarize three challenges for designing our system.

A. Preliminaries

Impact of Finger Movement on RF-Signals. The RFID technique has been widely used in locating and sensing systems based on the physical modalities of the RF-signal [20], i.e., phase and Received Signal Strength Indicator (RSSI). Moreover, when a human moves around the tag, both the phase and RSSI change accordingly due to the multi-path environment variance [21]. Therefore, we first investigate the impact of finger movement on RF-signals, as the finger is much smaller than the human body. As shown in Figure 2(a), a typical finger movement can be decomposed into two basic directions: horizontal movement (i.e., swiping in front of the tag) and vertical movement (i.e., approaching/departing the tag). Hence, we conduct two experiments to investigate the influence of these two finger movements. Figure 2(b) presents the signal's phase and RSSI readings when the finger is moving towards (i.e., vertically) the tag from 20cm away. We find that both the phase and RSSI readings change in a wavy pattern, and
the peak-to-peak amplitude [1] increases slowly as the finger approaches. This indicates that the approaching finger leads to a larger reflection effect.

Fig. 3. Preliminary study of the tag array deployment: (a) universal deployment; (b) RSSI distribution (in dBm) of the universal deployment.

Additionally, when we swipe the finger 40cm along the horizontal direction as shown in Figure 2(a), we observe a similar phenomenon in Figure 2(c). The peak-to-peak amplitude first increases and then decreases as the finger swipes across the tag. The results indicate that the peak-to-peak amplitude correlates with the distance between the finger and the tag, which is analyzed later in Section III. Since the peak-to-peak amplitude indicates the linear distance between finger and tag, we can deploy a tag array to track the moving finger.

Signal Interference within a Tag Array. When we deploy the tag array to capture the finger movement, the density of the array is a fundamental factor determining the granularity of the sensed gestures. For example, a sparse tag array can only recognize coarse-grained strokes based on the detected tags affected by the whole hand [4]. Therefore, to recognize finger-level gestures, we should exploit a dense tag array deployment to provide better recognition capability. In this work, we use the small RFID tag AZ-9629, whose size is only 2.25cm × 2.25cm, so that the tags can be arranged tightly. Specifically, we deploy a 5 × 7 tag array in a 15cm × 21cm rectangular space, where each tag occupies only a 3cm × 3cm cell. A simple deployment is to universally deploy all tags with the same orientation as shown in Figure 3(a).
Under this deployment, Figure 3(b) shows the RSSI distribution of the 35 tags in the unit of dBm when there is no finger around. We observe that the RSSI readings vary greatly across different tags due to the electromagnetic induction between the dense tags [8]. In particular, larger RSSI values are captured from the marginal tags than from the tags in the center. Therefore, a new deployment is proposed in Section IV-B to provide stable and uniform RF-signals.

B. Challenges

To develop the finger-level gesture tracking system under realistic settings, a number of challenges need to be addressed.

Tracking Fine-grained Finger-writing. Given the 3cm × 3cm area of each tag, detecting the most significantly disturbed tag can only achieve a coarse-grained resolution of the finger moving trace. Moreover, the dense tag deployment may also lead to detection errors due to the mutual tag interference shown in Figure 3(b). Therefore, we should have an in-depth understanding of the signals from the tag array during the finger movement and then develop the system to track the finger trace in fine granularity.

Fig. 4. Reflection model of a tag: the tag receives a free-space signal from the RFID antenna and a reflection signal from the swiping finger, shown as two components in the I-Q plane.

Recognizing Multi-touch Gestures. Unlike finger-writing, a multi-touch gesture means several parts of the tag array are affected by different fingers. However, the distance between adjacent fingers is similar to the size of the tag, and a finger may affect tags even 10cm away, as shown in Figure 2(b) and Figure 2(c). Hence, it is difficult to distinguish these fingers from the coarse-grained tag information. To address this challenge, we treat multiple fingers as a whole without distinguishing each finger and design a novel solution to recognize the multi-touch gestures from the whole of the multiple fingers.

Reducing the Mutual Interference of Tag Array.
The received signal of an RFID tag can be easily affected by adjacent tags, as shown in Figure 3(b). Such interference may lead to large tracking errors; we thus need to find a way to obtain uniform signals across all tags by reducing the mutual interference effect of the tag array.

III. MODELING FINGER TRACKING UPON A TAG ARRAY

In this section, we introduce the reflection effect of the RFID tag array with a theoretical wireless model. Particularly, we start from the reflection of a single tag, which explains the experimental results in Section II and introduces how we extract the reflection feature in our system. Then, we move forward to the reflection of a tag array, which integrates the reflection features of nearby tags to facilitate the perception of the fine-grained finger movement and the multi-touch gestures.

A. Impact of Finger Movement on a Single Tag

The signal received from the tag is typically represented as a stream of complex numbers. In theory, it can be expressed as:

S = X · Sh, (1)

where X is the stream of binary bits modulated by the tag, and Sh = α e^(Jθ) is the channel parameter of the received signal. In an RFID system, we can obtain the channel-related information, including both the RSS in the unit of dBm, denoted as R, and the phase value, denoted as θ; thus the channel parameter Sh can be calculated as:

Sh = √(10^(R/10)/1000) · e^(Jθ) = √(10^(R/10−3)) · e^(Jθ). (2)

Figure 4 illustrates the reflections in an RFID system with a simple case, where the finger swipes across a tag. Besides the free-space signals directly sent from the RFID antenna, the tag also receives the signals reflected by the moving finger. In the corresponding I-Q plane, the two received signals can be
Fig. 5. Reflection model of a tag array: (a) reflections from the tag array; (b) reflection RSS of the tag array.

represented as S_free and S_reflect, respectively. Therefore, the actual signal received by the reader can be represented as:

S_actual = S_free + S_reflect. (3)

Here, the finger movement affects S_reflect due to the change of the reflection path, so both the RSS and the phase of S_actual vary accordingly. In order to track the finger movements, we need to separate S_reflect from the received signal to roughly describe the distance of the reflection path. Specifically, we can estimate S_reflect by subtracting S_free from S_actual, where S_free can be measured without the reflecting object.

B. Impact of Finger Movement on a Tag Array

The single-tag model depicts the signal change on one tag caused by the finger movement, but a tag array involves multiple tags, meaning that the finger affects several adjacent tags at the same time. To better understand the signals reflected from the finger, we derive the theoretical model of the tag array as follows. In Figure 5(a), we use a one-dimensional tag array to illustrate the finger's impact on the tag array for simplicity. Specifically, the antenna A interrogates six tags T_1 to T_6, while the finger H hovers above the tag array.

According to the single-tag model, we can derive the reflection feature S_reflect for each tag. Additionally, S_reflect can be further divided into two parts based on the reflection path in Figure 5(a):

S_reflect = S_{A \to H} \cdot S_{H \to T_i}, (4)

where S_{A \to H} represents the signal from A to H, and S_{H \to T_i} represents the signal reflected from H to T_i, which varies with the tag's position. In an ideal channel model [5], S_{H \to T_i} is defined as:

S_{H \to T_i} = \frac{1}{d_{HT_i}^2} e^{j\theta_{HT_i}}, (5)

where d_{HT_i} is the distance between H and T_i, and \theta_{HT_i} is the phase shift over the distance d_{HT_i}. Formally, the phase shift can be calculated from the wavelength \lambda as:

\theta_{HT_i} = \frac{2\pi d_{HT_i}}{\lambda} \bmod 2\pi. (6)

For each tag T_i, we can combine Eq. (4) and Eq. (5) to calculate the power of S_reflect [5] as:

P_reflect = |S_reflect|^2 = C \cdot \frac{1}{d_{HT_i}^4}, (7)

where |\cdot| denotes the modulus of the complex number, and C = |S_{A \to H}|^2 is a constant power. Therefore, the magnitude of P_reflect is determined by d_{HT_i}, meaning that the finger yields larger reflection power at the closer tags. Given the position of H, we can calculate the distribution of the reflection power P_reflect in the 2D space from Eq. (7).

Fig. 6. System framework.

Figure 5(b) illustrates the case where the finger is at the (0, 0) coordinate with 3 cm height, and each 1 cm × 1 cm grid cell is supposed to deploy a tag. We set C to 1 for simplicity in this figure. We note that the power is highly concentrated at the position of the finger. Therefore, we can use the theoretical power distribution as a pattern to estimate the finger position from the measured power distribution of the whole tag array. By computing the theoretical power distribution in a fine-resolution manner, we are able to refine the recognition resolution of the tag array with correlation-based interpolation. In Section V-B, we will show the effectiveness of the tag array model by extracting the reflection feature from the reflection power distribution.

IV. SYSTEM OVERVIEW

A. System Architecture

The major objective of our work is to recognize fine-grained finger gestures via a device-free approach. Towards this end, we design an RFID-based system, RF-finger, which captures the signal changes on the tag array for gesture recognition.
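To make the reflection model concrete, the power pattern of Eq. (7) can be evaluated numerically. The sketch below is only an illustration of the model (the function name and grid layout are our assumptions, not the paper's implementation); it reproduces the Figure 5(b) setup, with the finger at (0, 0) at 3 cm height over a 1 cm-spaced tag grid and C = 1:

```python
import numpy as np

def reflection_power(finger_xy, height=3.0, C=1.0, extent=5, step=1.0):
    """Evaluate P_reflect = C / d^4 (Eq. (7)) over a 2D tag grid.

    finger_xy: (x, y) finger position in cm; height: finger height in cm.
    Returns the grid coordinates and the reflection power at each tag.
    """
    xs = np.arange(-extent, extent + step, step)
    X, Y = np.meshgrid(xs, xs)
    # Distance d_HT from the finger H to each tag T on the plane.
    d = np.sqrt((X - finger_xy[0]) ** 2 + (Y - finger_xy[1]) ** 2 + height ** 2)
    return X, Y, C / d ** 4

X, Y, P = reflection_power((0.0, 0.0))
peak = np.unravel_index(np.argmax(P), P.shape)
print(X[peak], Y[peak])  # the peak lies at the tag directly under the finger
```

As in Figure 5(b), the power concentrates sharply at the tag directly below the finger and decays with the fourth power of distance, which is what makes the pattern usable as a localization template.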
As shown in Figure 6, RF-finger consists of four main components: two core modules, Signal Pre-processing and Reflection Feature Extraction, followed by two functionality modules, Finger Tracking and Multi-touch Recognition. Specifically, RF-finger takes as input the time-series signal s_i(t) received from each tag i of the tag array, including both the RSSI and the phase information. The Signal Pre-processing module first calibrates the measured signal by interpolating the misread samples and smoothing the signal. Next, we divide the smoothed signals into separate gestures by analyzing the signal variance of all tags, which accurately estimates the starting and ending points of a gesture. Then, the Reflection Feature Extraction module extracts the reflection features of the gesture based on our reflection model in Section III.

After extracting the reflection features from the RF signal, the two main functionality modules follow for finger tracking and multi-touch gesture recognition. For finger writing, the Finger Tracking module locates the finger from the reflection features at each time stamp based on the K-Nearest Neighbors (KNN) algorithm. Locations at consecutive time stamps are connected together and smoothed via a Kalman filter to obtain a fine-grained trace. For the multi-touch gestures,
Fig. 7. Shuffled deployment of the dense tag array: (a) shuffled deployment; (b) RSSI distribution of the shuffled deployment (roughly −48 to −54 dBm across the 5×7 array).

the Multi-touch Recognition module leverages a Convolutional Neural Network (CNN) to automatically classify each gesture from its visual features. Particularly, it constructs a 3-frame image of the gesture from the reflection features, which describes the influence range of the multiple fingers in the starting/middle/ending periods of the gesture. Then we train the neural network model on the 3-frame images for gesture classification. Finally, we recognize the gestures by analyzing the classification scores of the CNN.

B. Dense Tag Array Deployment

As illustrated in Section II-A, we observe that adjacent tags in the dense tag array have great impacts on the signal quality of other tags due to the electromagnetic interference between them [8, 19]. As a result, parallel-deployed tags affect nearby tags through this mutual interference. To eliminate it, we shuffle the orientations of some of the tags, as shown in Figure 7(a), making nearby tags perpendicular to each other. In this way, we minimize the electromagnetic coupling between nearby tags. As a result, we achieve a stable RSSI measurement across all tags, as shown in Figure 7(b). Therefore, we adopt this perpendicular deployment of the tag array in our system.

V. RF-FINGER SYSTEM DESIGN

In this section, we present the detailed design of the proposed RF-finger system.
Specifically, we first preprocess the raw RF signals and then extract the reflection features to depict the finger's influence on the tag array. Finally, we track the finger trace and recognize the multi-touch gestures from these reflection features.

A. Signal Preprocessing

Given that the received RF signals involve inherent measurement defects, such as misread tags and noise, a data calibration process is developed to improve the reliability of the RF signals by interpolating the misread samples and smoothing the signal. In an RFID system, misreadings are usually caused by the highly dynamic environment during the finger movement. Therefore, we can interpolate the misread RF signals from adjacent sampling rounds, relying on the continuous movement of the finger. Take a phase stream \theta(t) as an example, which is the time series of phase values from one tag. If there is a misread phase \theta(t_i), we calculate the interpolated value from the neighboring phase readings as:

\hat{\theta}(t_i) = \theta(t_{i-1}) + (\theta(t_{i+1}) - \theta(t_{i-1})) \cdot \frac{t_i - t_{i-1}}{t_{i+1} - t_{i-1}}, (8)

where \theta(t_{i-1}) and \theta(t_{i+1}) are the two adjacent phase readings before and after time t_i. After interpolation, a moving average filter is applied to smooth the signal, which further removes high-frequency noise.

Fig. 8. Illustration of signal preprocessing in RF-finger: (a) signal calibration; (b) gesture segmentation.

Figure 8(a) illustrates the effectiveness of our data calibration by comparing the phase stream before and after calibration. The phase stream shown in the figure is from one tag in the array while the user performs the right-rotate gesture. From the enlarged figure, we can clearly see that the misreadings are well interpolated.
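The calibration step can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes misread samples are flagged as NaN in a uniformly sampled phase stream, in which case Eq. (8) reduces to ordinary linear interpolation, followed by moving-average smoothing:

```python
import numpy as np

def calibrate(phase, win=5):
    """Repair misread (NaN) phase samples via Eq. (8)'s linear
    interpolation, then smooth with a moving-average filter.

    Assumes uniformly spaced samples; leading/trailing NaNs are
    clamped to the nearest valid reading (a simplification).
    """
    phase = np.array(phase, dtype=float)  # work on a copy
    t = np.arange(len(phase))
    bad = np.isnan(phase)
    # Eq. (8): linear interpolation between the adjacent valid readings.
    phase[bad] = np.interp(t[bad], t[~bad], phase[~bad])
    # Moving-average smoothing to suppress high-frequency noise.
    return np.convolve(phase, np.ones(win) / win, mode="same")
```

With win=1 (interpolation only), a stream such as [2.0, nan, 2.2] is repaired to [2.0, 2.1, 2.2], matching Eq. (8) for equally spaced timestamps.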
Moreover, after smoothing, the high-frequency serrated waves are removed.

To capture the signal pattern of a specific finger movement, we need to identify its starting and ending points, which correspond to the moments when the user raises the hand up and drops it down. Therefore, a segmentation method based on the variance of the calibrated RF signals is developed to detect the hand-raising/releasing actions and thereby segment the gestures. Intuitively, we observe that the signal is stable when the hand is down, whereas the signals of some tags experience distinct variations when the user performs a gesture. Therefore, we further leverage a sliding window to calculate the variance stream of each tag from the calibrated RF signals; the starting/ending points should exhibit large variance values. Figure 8(b) illustrates the variance streams of all 35 tags, taking the calibrated phase streams as input. We find that only part of the tags have large signal variance at the same time, because the finger only affects the several tags close to it. Thus, we continuously calculate the maximum variance within each sliding window to obtain the maximum variance stream. Based on the first and last peaks of the maximum variance stream, we can detect the hand-raising/releasing actions and then take the signal stream between them as the gesture signal.

B. Reflection Feature Extraction

After signal preprocessing, we have the segmented, denoised signal of each individual gesture, so we first leverage the reflection model in Section III-A to derive the reflection signal S_reflect of each tag. Then we extract the reflection features from S_reflect as a likelihood distribution over the tag array zone, where the likelihood of each position depicts the probability that the finger is located at that position. Before defining the likelihood, we derive the reflection signal of each tag by removing the free-space signal as S_reflect = S_actual − S_free.
Particularly, S_actual is collected during the gesture period, while S_free is collected before the gesture.
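This subtraction step can be sketched as follows. The sketch is a hedged illustration, not the paper's code: it assumes per-tag RSSI (dBm) and phase arrays of shape (n_tags, n_samples), converts them to complex channel parameters via Eq. (2), and subtracts the free-space component estimated by averaging an idle window recorded before the gesture:

```python
import numpy as np

def channel(rss_dbm, phase):
    """Eq. (2): complex channel parameter from RSS (dBm) and phase (rad)."""
    rss = np.asarray(rss_dbm, dtype=float)
    return np.sqrt(10.0 ** (rss / 10.0 - 3.0)) * np.exp(1j * np.asarray(phase))

def extract_reflection(rss_gesture, phase_gesture, rss_idle, phase_idle):
    """Estimate S_reflect = S_actual - S_free for each tag.

    S_free is estimated per tag by averaging an idle window (no hand
    present) recorded before the gesture; the (n_tags, n_samples)
    array layout is an assumption made for this illustration.
    """
    s_free = channel(rss_idle, phase_idle).mean(axis=-1, keepdims=True)
    return channel(rss_gesture, phase_gesture) - s_free
```

The reflection power |S_reflect|^2 of each tag then feeds the likelihood-based feature described above.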