GlassGesture: Exploring Head Gesture Interface of Smart Glasses

Shanhe Yi, Zhengrui Qin, Ed Novak, Yafeng Yin†, Qun Li
College of William and Mary, Williamsburg, VA, USA
†State Key Laboratory for Novel Software Technology, Nanjing University, China
{syi,zhengrui,ejnovak,liqun}@cs.wm.edu, †yyf@dislab.nju.edu.cn

Abstract—We have seen an emerging trend towards wearables nowadays. In this paper, we focus on smart glasses, whose current interfaces are difficult to use, error-prone, and provide no or insecure user authentication. We thus present GlassGesture, a system that improves Google Glass through a gesture-based user interface, which provides efficient gesture recognition and robust authentication. First, our gesture recognition enables the use of simple head gestures as input. It is accurate across various wearer activities, regardless of noise. In particular, we improve the recognition efficiency significantly by employing a novel similarity search scheme. Second, our gesture-based authentication can identify the owner through features extracted from head movements. We improve the authentication performance by proposing new features based on peak analyses and employing an ensemble method. Last, we implement GlassGesture and present extensive evaluations. GlassGesture achieves a gesture recognition accuracy near 96%. For authentication, GlassGesture can accept authorized users in nearly 92% of trials, and reject attackers in nearly 99% of trials. We also show that in 100 trials, imitators cannot successfully masquerade as the authorized user even once.

I. INTRODUCTION

In recent years, we have seen an emerging trend towards wearables, which are designed to improve the usability of computers worn on the human body while being more aesthetically pleasing and fashionable at the same time. One category of wearable devices is smart glasses (eyewear), which are usually equipped with a heads-up, near-eye display and various sensors, mounted on a pair of glasses. Among the many kinds of smart eyewear, Google Glass (Glass for short) is the most iconic product. However, since Glass is a new type of wearable device, its user interface is less than ideal.

On one hand, there is no virtual or physical keyboard attached to Glass. Currently, Glass offers two primary input methods, each of which suffers in many scenarios. First, there is a touchpad mounted on the right-hand side of the device. Tapping and swiping on the touchpad is error-prone for users: 1) the user needs to raise their hands and fingers to the side of their forehead to locate the touchpad and perform actions, which can be difficult or dangerous when the user is walking or driving; 2) since the touchpad is very narrow and slim, some gestures, such as slide up/down or tap, can be easily confused; 3) when the user puts Glass on their head, or takes it off, it is very easy to accidentally touch the touchpad, causing erroneous input. Second, Glass supports voice commands and speech recognition. A significant drawback is that voice input cannot be applied in every scenario; for example, when the user is talking directly with someone, or is in a conference or meeting. An even worse example is that other people can accidentally activate Glass using voice commands, as long as the command is loud enough to be picked up by Glass. Additionally, disabled users are at a severe disadvantage using Glass if they cannot speak, or have lost control of their arms or fine motor skills.
On the other hand, authentication on Glass is very cumbersome and is based solely on the touchpad [1]. As a wearable device, Glass contains rich private information, including point-of-view (POV) photo/video recordings, deep integration of social/communication apps, and personal accounts of all kinds. There would be a severe information leak if Glass were accessed by a malicious user. Thus, any user interface for Glass needs to provide schemes to reject unauthorized access. However, the current authentication on Glass is far from mature: a "password" is set by performing four consecutive swiping or tapping actions on the touchpad, similar to a traditional four-digit PIN code. This system has many problems. First, the entropy is low, as only five touchpad gestures (tap, swipe forward with one or two fingers, or swipe backward with one or two fingers) are available, which form a limited set of permutations. Second, these gestures are difficult to perform correctly on the narrow touchpad, especially when the user is not still. Third, this sort of password is hard to remember because it is unorthodox. Finally, this system is very susceptible to shoulder-surfing attacks: any attacker can easily observe the pattern from possibly several meters away, with no special equipment.

Fig. 1: Head Movements

To solve all of these problems, we propose the use of head gestures (gestures for short) as an alternative user interface for smart eyewear devices like Google Glass. Because head gestures are an intuitive option, we can leverage them as a hands-free and easy-to-use interface.
A head gesture is a short burst of several discrete and consecutive movements of the user's head, as illustrated in Fig. 1. Motion sensors (i.e., the accelerometer and gyroscope) on Glass are able to measure and detect all kinds of head movements due to their high electromechanical sensitivity. However, smart eyewear presents new challenges for head gesture interface design. We need to answer questions such as "What are easy-to-perform head gestures?", "How do we accurately recognize those gestures?", "How do we make the system efficient on resource-limited hardware?", and "How does the system reject unauthorized access?"

In this paper, we present GlassGesture, a system aiming to improve the usability of Glass by providing a novel user interface based on head gestures. To the authors' knowledge, this is the first work to consider head-gesture-based recognition and authentication for smart glasses. First, GlassGesture provides head gesture recognition as a form of user interface. This has several advantages over the current input methods, because head gestures are easy to perform, intuitive, hands-free, user-defined, and accessible for the disabled. In some situations, it may be considered inappropriate or even rude to operate Glass through the provided touchpad or voice commands; head gestures, in comparison, can be tiny and not easily noticeable, which mitigates the social awkwardness. Second, the head gesture user interface can authenticate users. In particular, head gestures have not yet been exploited for authentication in the literature. We propose a novel head-gesture-based authentication scheme that uses simple head gestures to answer security questions. For example, we ask the user to answer a yes-or-no question by shaking (no) or nodding (yes) her head. However, an attacker who knows the answer to the security questions can still access the device. To mitigate such attacks, we further propose to leverage unique signatures extracted from these head gestures to distinguish the owner of the device from other users. Compared to the original, touchpad-based authentication, our proposed head-gesture-based authentication is more resistant to shoulder-surfing attacks and requires much less effort from the user.

In summary, we make the following contributions:
• For gesture recognition, our system increases the input space of Glass by enabling small, easy-to-perform head gestures. We propose a reference gesture library exclusively for head movements. We utilize activity context information to adaptively set thresholds for robust gesture detection. We use a weighted dynamic time warping (DTW) algorithm to match templates for better accuracy. We speed up gesture matching with a novel scheme, which reduces the time cost by at least 55%.
• For authentication, we demonstrate that "head gestures can be used as passwords". We design a two-factor authentication scheme in which we ask users to perform head gestures to answer questions shown in the near-eye display. To characterize head gestures, we identify a set of useful features and propose new features based on peak analyses. We also explore several optimizations, such as a one-class ensemble classifier and one-class feature selection, to improve the authentication performance.
• We prototype our system on Google Glass. We design experiments to evaluate gesture recognition during different user activities, and we collect a total of around 6000 gesture samples from 18 users to evaluate the authentication performance. Our evaluation shows that GlassGesture achieves accurate gesture recognition, and that it can reliably accept authorized users and reject attackers.

II. RELATED WORK

Activity Recognition. Researchers have shown that when a smart device is carried by a user, it can provide context information about the user's activities [2]–[4]. However, in this paper we are not aiming to improve upon state-of-the-art activity recognition systems; we use a simple activity detector only to tune parameters for gesture detection.

Gesture Recognition. It has been shown that gestures as input can be precise and fast. While there is a broad range of gesture recognition techniques based on vision, wireless signals, and touch screens [5]–[7], we focus mainly on motion-sensor-based gesture recognition because it is low-cost, computationally feasible, and easy to deploy on mobile devices [8]. We differ from these works in that we propose a head-gesture-based interface for smart glasses, and we carefully design the system around head gestures, which face different challenges, such as noise from user activities and performance on resource-constrained devices. For head gesture recognition, existing work mainly focuses on vision-based methods [9], while GlassGesture utilizes sensors mounted on the user's head. For gesture recognition on Google Glass, Head Wake Up and Head Nudge [10] are two built-in gesture detectors, offered as experimental features, which monitor the angle of the head. A similar open-source implementation can be found in [11]. In contrast, GlassGesture is more advanced in that it can recognize self-defined, free-form head gestures efficiently and accurately.

User Authentication. There has been research on authenticating users based on the unique patterns they exhibit while interacting with a phone [12]–[17] through touch screens and motion sensors. These systems show that such authentication schemes are less susceptible to shoulder surfing and do not require the user to memorize a passcode. For authentication on Google Glass, the works [18] and [19] are touchpad-gesture-based authentication schemes, which require continuous user effort to hold fingers on the touchpad. Our work is orthogonal in that it brings easy authentication to smart glasses using head gestures, which is simple, hands-free, and requires less effort.

III. GLASSGESTURE SYSTEM DESIGN

In this section, we present the system design of GlassGesture. First, we give an overview of our system and its architecture. Then we introduce each module and elaborate on its corresponding components.
Fig. 2: System Architecture

A. System Overview

Our system consists of two modules, which together form our gesture-based interface. The first module allows users to input small gestures using their heads; the second module authenticates users based on their head gestures. The architecture of our system is illustrated in Fig. 2, which shows that the gesture recognition module is the cornerstone. We leverage an activity detector to tune the parameters for more accurate gesture detection, based on the user's activity context. An enrollment submodule is in charge of managing the gesture templates. The gesture recognizer runs the DTW matching algorithm to recognize potential gestures. The gesture-based authentication module is built on top of the first module: it extracts features from the raw sensor data for training, and with the trained classifiers we form a two-step authentication mechanism, using simple head gestures to answer security questions first, and identifying the correct, unique signatures in the gesture movement data second. In the following sections, we present the design details of each module.
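The data flow of Fig. 2 can be summarized in code. Below is a minimal structural sketch in Python; all class and method names are hypothetical (the actual prototype runs on Glass), and the detector, recognizer, and authenticator components correspond to the sketches in the following subsections.

```python
# Minimal structural sketch of the Fig. 2 architecture (hypothetical names,
# not the authors' code). Sensor batches flow through the activity detector,
# which tunes the gesture detector; detected segments are matched against
# enrolled templates; authentication consumes recognized gestures.
class GlassGesturePipeline:
    def __init__(self, activity_detector, gesture_detector,
                 gesture_recognizer, authenticator):
        self.activity_detector = activity_detector
        self.gesture_detector = gesture_detector
        self.gesture_recognizer = gesture_recognizer
        self.authenticator = authenticator

    def on_sensor_batch(self, accel, gyro):
        # 1. Infer the activity context from accelerometer features.
        activity = self.activity_detector.classify(accel)
        # 2. Adapt the gesture-detection threshold to that context.
        self.gesture_detector.set_threshold_for(activity)
        # 3. Segment a candidate gesture from the gyroscope stream.
        candidate = self.gesture_detector.segment(gyro)
        if candidate is None:
            return None
        # 4. Match the candidate against enrolled templates via weighted DTW.
        gesture = self.gesture_recognizer.match(candidate)
        # 5. In authentication mode, additionally verify the wearer's identity.
        if gesture is not None and self.authenticator.active:
            return self.authenticator.verify(gesture, candidate)
        return gesture
```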
B. Head Gesture Recognition

Observations and Challenges. We have made some preliminary observations from the collected trace in Fig. 3: 1) Different activities add different amounts of noise. It is not easy to derive a general criterion for gesture detection across the many kinds of activities the user may be participating in at the time the gesture is made. 2) Head gestures mainly consist of rotations rather than accelerations. We see obvious gyroscope readings while the user is performing head gestures in various activities, compared to relatively noisy accelerometer readings. Therefore, it is possible to provide head gesture detection/recognition through the gyroscope data alone. 3) Head gestures can be used rather frequently by the user, so we need an efficient recognition scheme for performance considerations.

Fig. 3: Collected Sensor Trace: The user sits still for about 17 s, then stands up and walks for about 10 s, then runs for a few seconds and stops. During each activity (marked in the accelerometer plot), she performs several head gestures such as nodding, shaking, and looking up/down/left/right (sensor coordinate reference [20]).

In summary, we face three challenges in designing this module. 1) Head gesture library: there is no library that defines the most suitable head gestures for smart glasses. 2) Noise: the sensors on Glass are used to collect head movements, but may at the same time also collect noise from other user activities, which deteriorates the performance of the gesture recognition. 3) Computation: in recognition tasks, computationally intensive algorithms may need to be called frequently, resulting in unsatisfactory performance. The system must therefore be optimized to be extremely efficient, without sacrificing substantial recognition accuracy.
Head Gesture Library. We need to provide a head gesture library as a reference, since head gestures are quite different from traditional hand gestures. For example: 1) head gestures mainly consist of rotational movement; 2) users moving their heads have limited freedom in 3D space (e.g., humans can usually only look up and down within a range of less than 180°); 3) in order to convey more information, we need a new set of head gestures beside the traditional ones that are already in use (e.g., shaking for "no" and nodding for "yes"). In light of these constraints, we develop six basic candidate gesture categories adapted from works [8] and [21]: 1) nod, 2) look up/down/left/right, 3) shake, 4) circle, 5) triangle, and 6) rectangle. To clear up confusion when drawing (performing, acting out) a gesture, we suggest the user move their head just like drawing something in the air in front of themselves, using their nose like a pen tip.

TABLE I: Head gesture candidates. (The table ranks each category by number of strokes, ease of performing, frequency of occurrence in Fig. 4, and ease of repetition; nod, look up/down/left/right, shake, and circle are kept, while triangle and rectangle are dropped.)

With the purpose of figuring out which gestures are suitable, we performed a simple survey to rank how easy each category is to perform for untrained users. It is important to note that the survey, and all data collections in this paper, have gone through the IRB approval process. In total, we received 22 effective responses. The study results are presented in Table I. Our survey results indicate that nodding and shaking are popular and usually convey special meanings (e.g., "yes" and "no"). Circles are easy to perform since they are single-stroke. The rectangle and triangle gestures are the least favored, due to the multiple strokes they entail. Simple "look up/down/left/right" gestures are easy and fast, but they appear frequently in daily head movement, as shown in Fig. 4, another study we have done to understand the frequency of daily-life head gestures. This leads us to believe there would be a significant false positive rate if these gestures were utilized naively. However, 81% of participants think they are easy to perform repeatedly, so we decide to keep them, as long as the user is willing to repeat them two or three times consecutively to reduce the false positive rate. It is important to note that this head gesture library is only a default reference; GlassGesture allows the user to define new, arbitrary head gestures. We also evaluate our gesture recognition system with "number" and "letter" input later in this work.

Fig. 4: Gesture frequency of a user seated, working at a desk for about 20 minutes. The number in the name is the repetition count; "cw" is short for clockwise, "ccw" for counter-clockwise.
Activity Detector. The observations made in Fig. 3 motivate the need for a user-activity context service to help detect head gestures in different activity contexts. Normally, Google Play Services provides activity APIs, which could be leveraged; unfortunately, they are not supported on Glass at the time of writing. To fill this gap, we have implemented a simple activity detector using the accelerometer. Samples from the accelerometer are chunked by an overlapping sliding window. We extract features, such as the mean, standard deviation (std), and root mean square (rms), from each axis in every chunk. Then a decision tree is used as the classifier, due to its simplicity and efficiency. The classifier currently gives one of four outputs: 1) sitting/standing, which indicates that the user's head is fixed and the user's body is not moving; 2) walking; 3) running; and 4) driving. Using a 50 Hz sampling rate and a 10-second window with a 5-second overlap, the classifier gives an average accuracy of 98% in our preliminary experiments, which is adequate for use in our system.
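The activity detector described above maps naturally onto a windowed feature extractor plus an off-the-shelf decision tree. The following Python sketch illustrates the idea under the paper's stated parameters (50 Hz sampling, 10 s windows with 5 s overlap; mean/std/rms per axis); the use of scikit-learn and the tree depth are our assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FS = 50              # sampling rate (Hz), as stated in the paper
WIN = 10 * FS        # 10-second window
HOP = 5 * FS         # 5-second overlap -> 5-second hop

def window_features(accel):
    """accel: (n, 3) accelerometer samples. Returns one row of features per
    overlapping window: mean, std, and rms on each axis (9 features)."""
    rows = []
    for start in range(0, len(accel) - WIN + 1, HOP):
        w = accel[start:start + WIN]
        rows.append(np.concatenate([
            w.mean(axis=0),                    # mean per axis
            w.std(axis=0),                     # std per axis
            np.sqrt((w ** 2).mean(axis=0)),    # rms per axis
        ]))
    return np.asarray(rows)

# Labels follow the paper's four classes:
# 0 = sitting/standing, 1 = walking, 2 = running, 3 = driving.
clf = DecisionTreeClassifier(max_depth=5)      # depth is an assumption
# clf.fit(window_features(train_accel), train_labels)
# activity = clf.predict(window_features(recent_accel))[-1]
```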
Gesture Detector. The goal of the gesture detector is to capture potential gestures from the sensor's time-series data. To find a potential gesture, we begin by windowing the gyroscope samples (30 samples per window) and calculating the rolling standard deviation (std). A threshold on the gyroscope rolling std is set according to the current activity context, given by the activity detector. To determine the thresholds, we collect user gyroscope data in different activities, with and without head gestures, and apply a histogram-based method, as shown in Fig. 5. In our current implementation, we disable the gesture recognition function when the user is running or driving, for safety concerns. If the rolling std is below the current threshold, we know that there cannot be any gesture present, and the samples are discarded. Otherwise, we start to buffer both accelerometer and gyroscope readings. We keep these buffered samples until the rolling std drops below the threshold, indicating that the user is no longer moving and the gesture has finished. We then check the sample length and drop all the buffered samples if the length is too short or too long (a head gesture usually ranges from 30 to 240 samples at a 50 Hz sampling rate).

Fig. 5: Thresholds under different activities ((a) sitting/standing, (b) walking, (c) running; x-axis: gyroscope standard deviation, with and without head gestures). The threshold is set small when the user is sitting or standing, to enable detection of even tiny head gestures (0.15); it is set much larger when the user is walking or running (0.7 and 1.3, respectively).
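A rolling-std segmenter matching this description might look as follows. The thresholds come from Fig. 5 and the length bounds from the text; how the per-axis stds are combined into a single statistic (here, their Euclidean norm) is our assumption, as the paper does not specify it.

```python
import numpy as np

# Rolling-std thresholds per activity, taken from Fig. 5.
THRESHOLDS = {"sitting/standing": 0.15, "walking": 0.7, "running": 1.3}
ROLL_WIN = 30               # rolling window of 30 gyroscope samples
MIN_LEN, MAX_LEN = 30, 240  # valid gesture length at 50 Hz (0.6 s to 4.8 s)

def rolling_std(gyro, win=ROLL_WIN):
    """Per-axis rolling std of the (n, 3) gyroscope stream, combined into
    one statistic via the Euclidean norm (an assumption, see above)."""
    out = np.empty(max(0, len(gyro) - win + 1))
    for i in range(len(out)):
        out[i] = np.linalg.norm(gyro[i:i + win].std(axis=0))
    return out

def segment_gestures(gyro, activity):
    """Yield (start, end) sample indices of candidate gestures."""
    thr = THRESHOLDS[activity]        # threshold adapted to activity context
    active = rolling_std(gyro) > thr  # True while the head is moving
    start = None
    for i, moving in enumerate(active):
        if moving and start is None:
            start = i                 # movement begins: start buffering
        elif not moving and start is not None:
            if MIN_LEN <= i - start <= MAX_LEN:   # drop too-short/too-long
                yield start, i
            start = None
```

In the full pipeline, the `activity` argument would be driven by the activity detector's most recent output, and segmentation would simply be skipped when the detected activity is running or driving.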
Gesture Recognizer. The gesture recognizer is the core of the gesture recognition module. A head gesture is defined as a time series of gyroscope samples about 0.5 s to 2 s long. The raw gyroscope sensor data, S, can be written as an infinite stream of four-tuples, i.e., S = (x, y, z, t)_1, (x, y, z, t)_2, .... Likewise, a gesture G is defined as a subset of sequential elements in S, i.e., a contiguous segment G ⊂ S. We refer to gestures that the system has already learned as "gesture templates", denoted as G_t. Because the system is passively listening, the user can perform any gesture at any time, so the problem becomes finding a gesture G in the infinite time series S and identifying which template G_t is the closest match.

1) Gesture Template Enrollment: GlassGesture selects templates from gestures recorded as the user performs them, in a gesture template enrollment procedure. This allows the system to be maximally accurate for its user. During enrollment, we require users to sit still while recording a new gesture. The recorded time series are normalized, and error cases are filtered out. We then create templates from the recorded gestures using a clustering algorithm called affinity propagation, which has been shown to be an effective method [22]. The selected gesture, i.e., the affinity propagation cluster center, is stored as a gesture template in the system for recognition later.
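The enrollment step can be sketched with scikit-learn's affinity propagation run on a precomputed similarity matrix (negated pairwise DTW distances, since affinity propagation expects similarities rather than distances). Picking the exemplar of the largest cluster is our reading of "the affinity propagation cluster center"; the authors do not spell out tie-breaking, and the sketch assumes the clustering converges.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def enroll_template(recordings, dtw_distance):
    """Choose one template from repeated recordings of the same gesture.
    recordings: list of (len_i, 3) normalized gyroscope arrays.
    dtw_distance: pairwise distance function, e.g. the weighted DTW
    sketched in the next subsection."""
    n = len(recordings)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Affinity propagation expects similarities, so negate distances.
            sim[i, j] = -dtw_distance(recordings[i], recordings[j])
    ap = AffinityPropagation(affinity="precomputed", random_state=0)
    labels = ap.fit_predict(sim)          # assumes convergence
    # Keep the exemplar (cluster center) of the largest cluster.
    largest = np.bincount(labels).argmax()
    exemplar = ap.cluster_centers_indices_[largest]
    return recordings[exemplar]
```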
2) Weighted Dynamic Time Warping: We use a weighted DTW algorithm to measure how well two gestures match; it has several advantages, such as simplicity, working directly on raw data, and computational feasibility on wearables [8]. DTW calculates the distance between two gestures by scaling one in the time domain until the distance is minimized. The algorithm takes two time series: a potential gesture G and a gesture template G_t. Assuming that G is of length l and G_t is of length l_t, with i ∈ [1, l] and j ∈ [1, l_t], for a 3-axis gyroscope time series we have

dtw(G, G_t) = w_x · D_{l,l_t}(x) + w_y · D_{l,l_t}(y) + w_z · D_{l,l_t}(z)

The function D denotes the matching distance or cost, which is calculated per axis as

D_{i,j} = d(G(i), G_t(j)) + min{ D_{i-1,j-1}, D_{i,j-1}, D_{i-1,j} }

where d is a distance measure; we use the Euclidean distance (ED). We add weights (w_x, w_y, w_z) to each axis to better capture the differences between gestures, since we have found that head gestures have different movement distributions along each axis. For example, a nodding gesture is much stronger in the x-axis than in the y-axis or z-axis. Weights are calculated from the std of each axis of the template as

w_x = std(G_tx) / (std(G_tx) + std(G_ty) + std(G_tz))

The best match (the minimal D_{l,l_t}) is optimal in the sense of an optimal alignment of the samples. We say that G matches G_t if dtw(G, G_t) is below a certain threshold. To recognize which gesture is present in a given window, we need to run DTW over all templates. Whichever template has the lowest DTW distance to the target, and is below a safety threshold, is selected as the recognition result.

3) Efficient Similarity Search: DTW is a pair-wise template matching algorithm, which means that to detect a gesture naively, we need to traverse all gesture templates. It costs O(N²) to compare two time series of length N (we set l = l_t = N for simplicity), which is not efficient when there is a large number of gesture templates. We propose several schemes to optimize the performance.

First, to reduce the search complexity, we want to build a k-dimensional (k-d) tree to perform k-nearest-neighbor (kNN) searches. However, tree branch pruning based on the triangle inequality introduces errors if applied directly to DTW distances between gesture templates, since DTW distance is a non-metric and does not satisfy the triangle inequality [23]. Therefore, we build the tree using the Euclidean distance (ED) instead, which is a metric and thus preserves the triangle inequality, allowing us to prune safely.

Second, to further reduce the computation, we down-sample the inputs before calculating the ED, and then build the k-d tree. To recognize a target gesture, we first use the down-sampled target gesture to perform a kNN search over the k-d tree. Then we iterate over the k candidate templates, calculating their DTW distance to the target with no down-sampling, to find the best match at the best accuracy.

The construction of the k-d tree is given in Alg. 1, and the kNN search in Alg. 2. Say we have m templates, all of length N. It costs O(m·N²) to match a target gesture by iterating over all templates using DTW. The set of m gesture templates in N-space can first be down-sampled to n_ED-space (each template has length n_ED, n_ED ≪ N). We build a k-d tree of size O(m) in O(m log m) time on the down-sampled templates, a cost which can be amortized. A kNN search query can then be answered in O(m^{1-1/n_ED} + k), where k is the number of query results. In total, the time cost is O(m^{1-1/n_ED} + k + k·N²).

Lastly, we can also down-sample the gesture data before running DTW after the kNN search. The time cost then becomes O(m^{1-1/n_ED} + k + k·n_DTW²), where n_DTW ≪ N is the down-sampled length for DTW. However, it is non-trivial to choose a proper n_DTW, since we do not want the down-sampling to remove important features of the time series; if it does, the DTW algorithm may fail to differentiate two slightly different gestures. We evaluate n_DTW through our experiments in the evaluation section.
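Putting the two pieces together, the following Python sketch implements the per-axis weighted DTW recurrence and the ED-based k-d tree prefilter (standing in for Alg. 1 and Alg. 2, which are not reproduced in this extraction). The down-sampled length n_ed = 8 and k = 3 candidates are illustrative values, not the paper's.

```python
import numpy as np
from scipy.spatial import cKDTree

def dtw_1d(a, b):
    """D[i,j] = d(a_i, b_j) + min(D[i-1,j-1], D[i,j-1], D[i-1,j]) on one axis."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])   # Euclidean distance in 1-D
            D[i, j] = cost + min(D[i - 1, j - 1], D[i, j - 1], D[i - 1, j])
    return D[-1, -1]

def weighted_dtw(g, t):
    """dtw(G, Gt) = wx*D(x) + wy*D(y) + wz*D(z), weights from template stds."""
    w = t.std(axis=0)
    w = w / w.sum()                  # w_x = std(Gtx) / (sum of per-axis stds)
    return sum(w[k] * dtw_1d(g[:, k], t[:, k]) for k in range(3))

def downsample(g, n):
    """Down-sample a (len, 3) gesture to n samples and flatten for the tree."""
    idx = np.linspace(0, len(g) - 1, n).round().astype(int)
    return g[idx].ravel()

class GestureIndex:
    """ED-based k-d tree over down-sampled templates (cf. Alg. 1), queried
    with kNN and refined by full-resolution weighted DTW (cf. Alg. 2)."""
    def __init__(self, templates, n_ed=8, k=3):
        self.templates, self.n_ed = templates, n_ed
        self.k = min(k, len(templates))
        # ED is a metric, so tree pruning is safe; DTW distances are not.
        self.tree = cKDTree([downsample(t, n_ed) for t in templates])

    def match(self, g, threshold):
        _, idx = self.tree.query(downsample(g, self.n_ed), k=self.k)
        cands = np.atleast_1d(idx)
        # Refine the k candidates with full-resolution weighted DTW.
        dists = [weighted_dtw(g, self.templates[i]) for i in cands]
        j = int(np.argmin(dists))
        # Accept only if the best DTW distance is below the safety threshold.
        return (int(cands[j]), dists[j]) if dists[j] < threshold else (None, dists[j])
```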
C. Head-Gesture-based Authentication

Basic Idea. As we mentioned previously, Glass does not have a robust authentication scheme. To secure the interface in GlassGesture, we propose the use of signatures extracted from simple head gestures. In order to lead the user to perform a natural and instinctual gesture, a "yes or no" security question that can be answered using head gestures is presented on the near-eye display, and the user answers with head movements. In this way, the instinctual gestures (nodding and shaking) can be treated as consistent head movements. The answer (gesture) is then verified by the system: features are extracted from the motion sensors and fed into a trained classifier. If the answer is correct and the classifier labels the gesture as belonging to the user, the user is accepted; otherwise, the user is rejected. Thus, we form a two-factor authentication scheme. While we mainly test the feasibility of the "nod" and "shake" gestures, since they convey social meanings in answering questions, we do not rule out the possibility of other head gestures. This scheme has several advantages over the existing authentication done on the touchpad. First, the user does not have to remember anything, as the signatures we extract are inherent in their movements. Second, nod and shake are simple gestures, taking almost no effort from the user. Finally, an attacker cannot brute-force this system even with significant effort, because 1) the near-eye display is a private display, which prevents shoulder surfing of the security questions; and 2) the signatures of the head gestures are hard to observe by the human eye unaided by special equipment, and are difficult to forge even with explicit knowledge of the features.

Threat Model. We have identified three types of possible attackers. The Type-I attacker has no prior information whatsoever. This attacker simply has physical access to the user's Glass and attempts to authenticate as the user. Type-I attacks are very likely to fail and ultimately amount to a brute-force attack, which can be mitigated by locking the device after a few consecutive authentication failures. The Type-II attacker may know the answers to the user-specific security questions, but will try to authenticate with head gestures in their own natural style (not trying to imitate the correct user's motions or features). The Type-III attacker, the most powerful, not only knows the answers to the security questions, but is also able to observe authentication instances (e.g., through a video clip). This attacker can try to perform the gestures in a manner similar to the owner, in an attempt to fool the system. Note that no security mechanism can guarantee that an attacker with physical access will never obtain the data on the device. The proposed authentication method can slow the attacker down, foil naive or inexperienced attackers, and make the task of extracting data from the device more difficult.

Authentication Setup. In this offline setup phase, the user first needs to establish a large set of security questions with