AirContour: Building Contour-based Model for In-Air Writing Gesture Recognition

YAFENG YIN and LEI XIE, State Key Laboratory for Novel Software Technology, Nanjing University, China
TAO GU, RMIT University, Australia
YIJIA LU and SANGLU LU, State Key Laboratory for Novel Software Technology, Nanjing University, China

Recognizing in-air hand gestures will benefit a wide range of applications such as sign-language recognition, remote control with hand gestures, and "writing" in the air as a new way of text input. This article presents AirContour, which focuses on in-air writing gesture recognition with a wrist-worn device. We propose a novel contour-based gesture model that converts human gestures to contours in 3D space and then recognizes the contours as characters. Different from 2D contours, 3D contours may suffer from problems such as contour distortion caused by different viewing angles, contour difference caused by different writing directions, and contour distribution across different planes. To address these problems, we introduce Principal Component Analysis (PCA) to detect the principal/writing plane in 3D space, and then tune the projected 2D contour in the principal plane through reversing, rotating, and normalizing operations, so that the 2D contour has the right orientation and a normalized size under a uniform view. After that, we propose both an online approach, AC-Vec, and an offline approach, AC-CNN, for character recognition. The experimental results show that AC-Vec achieves an accuracy of 91.6% and AC-CNN achieves an accuracy of 94.3% for gesture/character recognition, both outperforming the existing approaches.
CCS Concepts: • Human-centered computing → Ubiquitous and mobile computing design and evaluation methods; Empirical studies in ubiquitous and mobile computing;

Additional Key Words and Phrases: AirContour, in-air writing, contour-based gesture model, principal component analysis (PCA), gesture recognition

ACM Reference format:
Yafeng Yin, Lei Xie, Tao Gu, Yijia Lu, and Sanglu Lu. 2019. AirContour: Building Contour-based Model for In-Air Writing Gesture Recognition. ACM Trans. Sen. Netw. 15, 4, Article 44 (October 2019), 25 pages. https://doi.org/10.1145/3343855

This work is supported by National Natural Science Foundation of China under Grant Nos. 61802169, 61872174, 61832008, 61321491; JiangSu Natural Science Foundation under Grant No. BK20180325; the Fundamental Research Funds for the Central Universities under Grant No. 020214380049; and Australian Research Council (ARC) Discovery Project Grants DP190101888 and DP180103932. This work is partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Authors' addresses: Y. Yin, L. Xie (corresponding author), Y. Lu, and S. Lu, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; emails: {yafeng, lxie}@nju.edu.cn, lyj@smail.nju.edu.cn, sanglu@nju.edu.cn; T. Gu, School of Computer Science and Information Technology, RMIT University, Melbourne VIC 3000, Australia; email: tao.gu@rmit.edu.au.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Request permissions from permissions@acm.org. © 2019 Association for Computing Machinery. 1550-4859/2019/10-ART44 $15.00 https://doi.org/10.1145/3343855 ACM Transactions on Sensor Networks, Vol. 15, No. 4, Article 44. Publication date: October 2019
1 INTRODUCTION

With the advancement of rich embedded sensors, mobile or wearable devices (e.g., smartphones, smartwatches) have been widely used in activity recognition [21, 23, 26, 31, 37, 41, 45] and benefit many human-computer interactions, e.g., motion-sensing games [25], sign-language recognition [12], in-air writing [1], and so on. As a typical interaction mode, writing in the air has attracted wide attention [6, 9, 10, 36, 39]. It allows users to write characters freely in the air with the arm and hand, without focusing attention on the small screen or tiny keys of a device [2]. As shown in Figure 1, a user carrying/wearing a sensor-embedded device writes in the air, and the gesture is recognized as a character. Recognizing in-air writing gestures is a key technology for gesture-based interaction in the air and can be used in many scenarios. For example, a user can "write" commands in the air to control an unmanned aerial vehicle (UAV) while looking at the scene transmitted from the UAV in a virtual reality (VR) headset, which avoids taking off the VR headset and entering commands with a controller. Another example is replacing traditional on-screen text input by "writing" the text message in the air, which allows users to interact with mobile or wearable devices that have a tiny screen or none at all. Besides, when one hand of the user is occupied, typing on a keyboard becomes inconvenient; sensor-assisted in-air input can capture hand gestures and lay them out as text or images [1]. Compared with existing handwriting, voice, or camera-based input, in-air writing with inertial sensors can tolerate a limited screen, environmental noise, and poor lighting conditions. In this article, we focus on recognizing in-air writing gestures as characters.

In inertial sensor-based gesture recognition, many approaches have been proposed.
Some data-driven approaches [2, 7, 10, 15, 35] tend to extract features from sensor data to train classifiers for gesture recognition, while paying little attention to human activity analysis. If the user performs gestures with more degrees of freedom, i.e., the gestures may have large variations in speed, size, or orientation, then this type of approach may fail to recognize them with high accuracy. In contrast, some pattern-driven approaches [1, 13, 32] try to capture the moving patterns of gestures for activity recognition. For example, Agrawal et al. [1] utilize segmented strokes and a grammar tree to recognize capital letters in a 2D plane. However, due to the complexity of analyzing human activities, this type of approach may redefine the gesture patterns or constrain the gestures to a limited area (e.g., a limited 2D plane), which may degrade the user experience. To track continuous in-air gestures, Shen et al. [29] utilize a 5-DoF arm model and HMM to track the 3D posture of the arm. However, in 3D space, tracking is not directly linked to recognition, especially when the trajectory (e.g., a handwriting trajectory) lies in different planes. Therefore, it remains challenging to apply existing approaches to recognize in-air writing gestures that occur in 3D space with more degrees of freedom while guaranteeing the user experience.

To address the aforementioned issues, in this article, we explore contours to represent in-air writing gestures and propose a novel contour-based gesture model, where the "contour" is represented as a sequence of coordinate points over time. We use an off-the-shelf wrist-worn device (e.g., a smartwatch) to collect sensor data, and our basic idea is to build a 3D contour model for each gesture and utilize the contour feature to recognize gestures as characters, as illustrated in Figure 1.
Since the gesture contour keeps the essential movement patterns of in-air gestures, it can tolerate the intra-class variability of gestures. It is worth noting that while the proposed "contour-gesture" model is applied to in-air writing gesture recognition in this work, it can also be used in sign-language recognition and remote control with hand gestures [40]. However, different from 2D contours, building 3D contours presents several challenges, i.e., contour distortion caused by different viewing angles, contour difference caused by different writing directions, and contour distribution across different planes, making it difficult to recognize 3D contours as 2D characters.
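To make the core idea above concrete, the sketch below illustrates detecting a principal/writing plane with PCA and projecting a 3D contour onto it. This is a hypothetical illustration, not the paper's implementation: the function names, the SVD-based PCA, and the bounding-box normalization are our own assumptions, and the paper's full calibration additionally involves reversing and rotating the projected contour.

```python
import numpy as np

def principal_plane_projection(points):
    """Project a 3D gesture contour (N x 3 array of points over time)
    onto its principal plane, i.e., the plane capturing the most variance."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    contour_2d = centered @ vt[:2].T   # coordinates within the principal plane
    normal = vt[2]                     # least-variance direction = plane normal
    return contour_2d, normal

def normalize_contour(contour_2d):
    """Scale a 2D contour so its larger bounding-box side has unit length
    (one simple form of size normalization)."""
    span = contour_2d.max(axis=0) - contour_2d.min(axis=0)
    return contour_2d / span.max()

# Toy example: a circular "contour" drawn on a plane tilted about the x-axis.
t = np.linspace(0, 2 * np.pi, 200)
flat = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
c, s = np.cos(0.5), np.sin(0.5)
tilt = np.array([[1.0, 0.0, 0.0],
                 [0.0, c, -s],
                 [0.0, s, c]])
rng = np.random.default_rng(0)
points = flat @ tilt.T + 0.01 * rng.normal(size=(200, 3))

contour_2d, normal = principal_plane_projection(points)
# The recovered plane normal should align with the tilted plane's true normal.
true_normal = tilt @ np.array([0.0, 0.0, 1.0])
print(abs(normal @ true_normal))  # close to 1
```

Using the SVD of the centered points yields the principal directions directly; the least-variance direction serves as an estimate of the writing plane's normal, which is why most of the contour ends up in or close to the recovered plane.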
Fig. 1. AirContour: in-air writing gesture recognition based on contours.

To solve this problem, we first describe the range of viewing angles based on the way that the device is worn, which indicates the possible writing directions. We then apply Principal Component Analysis (PCA) to detect the principal/writing plane, i.e., the plane in which most of the contour is located or to which it is close. After that, we calibrate the 2D projected contour in the principal plane for gesture/character recognition, while considering the distortion caused by dimensionality reduction and the difference in gesture sizes.

We make the following contributions in this article:

• To the best of our knowledge, we are the first to propose a contour-based gesture model to recognize in-air writing gestures. The model is designed to solve the new challenges in 3D gesture contours, e.g., observation ambiguity and the uncertain orientation and distribution of 3D contours, and to tolerate the intra-class variability of gestures. The contour-based gesture model can be applied not only in in-air writing gesture recognition, but also in many other scenarios such as sign-language recognition, motion-sensing games, and remote control with hand gestures.

• To recognize gesture contours in 3D space as characters in a 2D plane, we introduce PCA for dimensionality reduction and a series of calibrations for 2D contours. Specifically, we first utilize PCA to detect the principal/writing plane, and then project the 3D contour onto the principal plane for dimensionality reduction. After that, we calibrate the 2D contour in the principal plane through reversing, rotating, and normalizing operations, to give it the right orientation and a normalized size under a uniform view, i.e., to make the 2D contour suitable for character recognition.

• We conduct extensive experiments to verify the effectiveness of the proposed contour-based gesture model.
In addition, based on the model, we propose an online approach, AC-Vec, and an offline approach, AC-CNN, to recognize 2D contours as characters. The experimental results show that AC-Vec and AC-CNN achieve an accuracy of 91.6% and 94.3%, respectively, for gesture/character recognition, and both outperform the existing approaches.

2 RELATED WORK

In this section, we describe and analyze the state of the art related to in-air gesture recognition, tracking, writing in the air, and handwritten character recognition, especially focusing on inertial sensor-based techniques.

In-air gesture recognition: Parate et al. [26] design a mobile solution called RisQ to detect smoking gestures and sessions with a wristband and use a machine learning pipeline to process the sensor data. Blank et al. [7] present a system for table-tennis stroke detection and classification by attaching inertial sensors to table-tennis rackets. Thomaz et al. [31] describe the implementation
and evaluation of an approach to infer eating moments using a 3-axis accelerometer in a smartwatch. Xu et al. [35] build a classifier to identify users' hand and finger gestures utilizing the essential features of accelerometer and gyroscope data measured from a smartwatch. Huang et al. [18] build a system to monitor brushing quality using a manual toothbrush, modified by attaching small magnets to the handle, and an off-the-shelf smartwatch. These approaches typically extract features from sensor data and apply machine learning techniques for gesture recognition.

In-air gesture tracking: Zhou et al. [42–44] utilize a kinematic chain to track human upper-limb motion by placing multiple devices on the arm. Cutti et al. [11] utilize joint angles to track the movements of the upper limbs by placing sensors on the chest, shoulder, arm, and wrist. Chen et al. [8] design a wearable system consisting of a pair of magnetometers on the fingers and a permanent magnet affixed to the thumb, and introduce uTrack to convert the thumb and fingers into a continuous input system (e.g., 3D pointing). Shen et al. [29] utilize a 5-DoF arm model and HMM to track the 3D posture of the arm, using both motion and magnetic sensors in a smartwatch. In fact, accurate in-air gesture tracking in real time can be very challenging. Besides, obtaining the 3D moving trajectory does not by itself mean recognizing in-air gestures. In this article, we do not require accurate trajectory tracking; instead, we aim to obtain the gesture contour and recognize it as a character.

Writing in the air: Zhang et al. [39] quantize data into small integral vectors based on acceleration orientation and then use HMM to recognize the 10 Arabic numerals. Wang et al. [32] present IMUPEN to reconstruct motion trajectories and recognize handwritten digits. Bashir et al. [6] use a pen equipped with inertial sensors and apply DTW to recognize handwritten characters. Agrawal et al.
[1] recognize handwritten capital letters and Arabic numerals in a 2D plane based on strokes and a grammar tree, using the built-in accelerometer of a smartphone. Amma et al. [2] design a glove equipped with inertial sensors and use SVM, HMM, and a statistical language model to recognize capital letters, sentences, and so on. Deselaers et al. [13] present GyroPen to reconstruct the writing path for pen-like interaction. Xu et al. [36] utilize a continuous-density HMM and the Viterbi algorithm to recognize handwritten digits and letters using inertial sensors. In this article, we focus on single in-air character recognition without the assistance of a language model. For a character, we do not define specific strokes or require a pen-up for stroke segmentation, while tolerating the intra-class variability caused by writing speeds, gesture sizes, and writing directions, as well as the observation ambiguity caused by viewing angles in 3D space.

Handwritten character recognition: In addition to inertial sensor-based approaches, many image processing techniques [3, 14, 16] have also been adopted for recognizing handwritten characters in a 2D plane (i.e., an image). Bahlmann et al. [4] combine DTW and SVMs to establish a Gaussian DTW (GDTW) kernel for online recognition of UNIPEN handwriting data. Rayar et al. [28] propose a preselection method for CNN-based classification and evaluate it on handwritten character recognition in images. Rao et al. [27] propose a newly designed network structure based on an extended nonlinear-kernel residual network to recognize handwritten characters on the MNIST and SVHN datasets. These approaches focus on recognizing hand-moving trajectories in a 2D plane, while our article focuses on transforming the 3D gesture into a proper 2D contour and then utilizing the contour's space-time features to recognize contours as characters.
3 TECHNICAL CHALLENGES AND DEFINITIONS IN IN-AIR GESTURE RECOGNITION

3.1 Intra-class Variability in Sensor Data

As shown in Figure 2, even when the user performs the same type of gesture (e.g., writes "t"), the sensor data can be quite different due to variations in writing speed (Figure 2(a)), gesture size (Figure 2(b)), writing direction (Figure 2(c)), and so on. This indicates that directly using the
AirContour:Building Contour-based Model for In-Air Writing Gesture Recognition 44:5 Fast 220 4r40 —2340 A mopiis.m 一a闪 40smp网9arm10He120 Sampsecnee50H2 Slow 0'G1 R的h礼 —n—有 (a)Different speeds (b)Different sizes (cm) (c)Different directions Fig.2.Linear acceleration of writing the same character"t." extracted features from sensor data may fail to recognize in-air gestures accurately.In regard to the definitions of speeds,sizes,and directions,they can be found in Section 6.2. To handle the intra-class variability of in-air gestures,e.g.,the variation of speed,amplitude, and orientation of gestures,we present the contour-based gesture model,which utilizes contours to correlate sensor data with human gestures.The"contour"is represented with a sequence of coordinate points over time.Additionally,to avoid the differences caused by facing directions,we transform the sensor data from a device coordinate system to a human coordinate system shown in Figure 5(a),i.e.,we analyze the 3D contours in a human coordinate system.In this article,we take the instance of writing characters in the air to illustrate the contour-based gesture model. The characters refer to the alphabet,i.e.,"a"-"z,"and we use the term"character"and "letter" interchangeably throughout the article.It is worth mentioning that in-air writing letters can be different from printed letters due to joined-up writing.In particular,we remove the point of"i" and"j,”and use“t”to represent the letter“I”for simplification. 
extracted features from sensor data may fail to recognize in-air gestures accurately. The definitions of speeds, sizes, and directions can be found in Section 6.2.

Fig. 2. Linear acceleration of writing the same character "t": (a) different speeds; (b) different sizes (cm); (c) different directions.

To handle the intra-class variability of in-air gestures, e.g., the variation of speed, amplitude, and orientation of gestures, we present the contour-based gesture model, which utilizes contours to correlate sensor data with human gestures. The "contour" is represented as a sequence of coordinate points over time. Additionally, to avoid the differences caused by facing directions, we transform the sensor data from the device coordinate system to the human coordinate system shown in Figure 5(a), i.e., we analyze the 3D contours in the human coordinate system. In this article, we take writing characters in the air as the instance to illustrate the contour-based gesture model. The characters refer to the alphabet, i.e., "a"–"z," and we use the terms "character" and "letter" interchangeably throughout the article. It is worth mentioning that in-air written letters can differ from printed letters due to joined-up writing. In particular, we remove the dot of "i" and "j," and use "ι" to represent the letter "l" for simplification.

3.2 Difference between 2D Contours and 3D Contours

Usually, people are used to recognizing and reading handwritten characters in a 2D plane, e.g., on a piece of paper. Therefore, we can map a 2D gesture contour to a 2D character for recognition. However, based on extensive observations and experimental study, we find that 3D contour recognition is quite different from 2D contour recognition.
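To make the 3D-to-2D gap concrete, consider a minimal sketch of the principal-plane idea this article applies later: given a gesture contour recorded as a sequence of 3D coordinate points, PCA can estimate the writing plane and project the 3D contour onto it. The function name and toy stroke below are illustrative assumptions, not the article's implementation.

```python
# Illustrative sketch only: the article detects the principal/writing plane
# with PCA; the function and data names here are assumptions for demonstration.
import numpy as np

def principal_plane_projection(points):
    """Project an (N, 3) contour onto its principal (writing) plane.

    Returns the (N, 2) in-plane coordinates and the plane's unit normal
    (the direction of least variance).
    """
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    contour_2d = centered @ vt[:2].T  # coordinates in the writing plane
    normal = vt[2]                    # least-variance direction = plane normal
    return contour_2d, normal

# Toy "L"-shaped stroke written roughly in the x-z plane (y is small jitter).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 25)
x = np.concatenate([np.zeros(25), t])        # second half: horizontal stroke
z = np.concatenate([1.0 - t, np.zeros(25)])  # first half: vertical stroke
y = 0.01 * rng.standard_normal(50)           # small out-of-plane noise
contour_2d, normal = principal_plane_projection(np.stack([x, y, z], axis=1))
print(contour_2d.shape)      # (50, 2)
print(abs(normal[1]) > 0.9)  # recovered normal is (nearly) the y axis: True
```

Note that PCA recovers the in-plane contour only up to rotation and reflection within the plane, which is exactly why the projected 2D contour still needs the reversing, rotating, and normalizing steps described later in the article.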
In fact, recognizing 3D contours as 2D characters is a challenging task, due to the contour distortion caused by viewing angles, the contour difference caused by writing directions, and the contour distribution across different planes, as described below.

3.2.1 Viewing Angles. There is a uniform viewing angle for a 2D character contour, while there are multiple viewing angles for a 3D character contour. In a predefined plane-coordinate system, the 2D gesture contour is discriminative and can be used for character recognition; it is consistent with people's cognitive habits for handwritten letters. However, in 3D space, even in a predefined coordinate system, we can look at the 3D contour from different viewing angles, and thus the observed 3D contour can be quite different. As shown in Figure 3, when we look at the 3D contour of "t" from left to right, the shape and orientation of the character contour change a lot, as indicated by the contours in the red circles in Figures 3(a), 3(b), and 3(c). For a character, its contour consists of one or several strokes in a sequential order and the right orientation. If the character contour changes, it can lead to the misrecognition of characters. For example, when we