Keystroke Recognition Using WiFi Signals

Kamran Ali† Alex X. Liu†‡ Wei Wang‡ Muhammad Shahzad†
†Dept. of Computer Science and Engineering, Michigan State University, USA
‡State Key Laboratory for Novel Software Technology, Nanjing University, China
†{alikamr3,alexliu,shahzadm}@cse.msu.edu, ‡ww@nju.edu.cn

ABSTRACT
Keystroke privacy is critical for ensuring the security of computer systems and the privacy of human users, as what is being typed could be passwords or privacy sensitive information. In this paper, we show for the first time that WiFi signals can also be exploited to recognize keystrokes. The intuition is that while typing a certain key, the hands and fingers of a user move in a unique formation and direction and thus generate a unique pattern in the time-series of Channel State Information (CSI) values, which we call the CSI-waveform for that key. In this paper, we propose a WiFi signal based keystroke recognition system called WiKey. WiKey consists of two Commercial Off-The-Shelf (COTS) WiFi devices, a sender (such as a router) and a receiver (such as a laptop). The sender continuously emits signals and the receiver continuously receives signals. When a human subject types on a keyboard, WiKey recognizes the typed keys based on how the CSI values change at the WiFi signal receiver end. We implemented the WiKey system using a TP-Link TL-WR1043ND WiFi router and a Lenovo X200 laptop. WiKey achieves more than 97.5% detection rate for detecting the keystroke and 96.4% recognition accuracy for classifying single keys. In real-world experiments, WiKey can recognize keystrokes in a continuously typed sentence with an accuracy of 93.5%.

Categories and Subject Descriptors
C.2.1 [Network Architecture]: Wireless Communications; D.4.6 [Security and Protection]: Keystroke recovery

Keywords
Gesture recognition; Wireless security; Keystroke recovery; Channel State Information; COTS WiFi devices

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
MobiCom’15, September 7–11, 2015, Paris, France.
© 2015 ACM. ISBN 978-1-4503-3619-2/15/09 ...$15.00.
DOI: http://dx.doi.org/10.1145/2789168.2790109.

1. INTRODUCTION
Keystroke privacy is critical for ensuring the security of computer systems and the privacy of human users, as what is being typed could be passwords or privacy sensitive information. The research community has studied various ways to recognize keystrokes, which can be classified into three categories: acoustic emission based approaches, electromagnetic emission based approaches, and vision based approaches. Acoustic emission based approaches recognize keystrokes based on either the observation that different keys in a keyboard produce different typing sounds [1, 2] or the observation that the acoustic emanations from different keys arrive at different surrounding smartphones at different times, as the keys are located at different places in a keyboard [3]. Electromagnetic emission based approaches recognize keystrokes based on the observation that the electromagnetic emanations from the electrical circuit underneath different keys in a keyboard are different [4]. Vision based approaches recognize keystrokes using vision technologies [5].

In this paper, we show for the first time that WiFi signals can also be exploited to recognize keystrokes. WiFi signals are pervasive in our daily life at home, offices, and even shopping centers.
The key intuition is that while typing a certain key, the hands and fingers of a user move in a unique formation and direction and thus generate a unique pattern in the time-series of Channel State Information (CSI) values, which we call the CSI-waveform for that key. The keystrokes of each key introduce relatively unique multi-path distortions in WiFi signals, and this uniqueness can be exploited to recognize keystrokes. Due to the high data rates supported by modern WiFi devices, WiFi cards provide enough CSI values within the duration of a keystroke to construct a high resolution CSI-waveform for each keystroke.

We propose a WiFi signal based keystroke recognition system called WiKey. WiKey consists of two Commercial Off-The-Shelf (COTS) WiFi devices, a sender (such as a router) and a receiver (such as a laptop), as shown in Figure 1. The sender continuously emits signals and the receiver continuously receives signals. When a human subject types on a keyboard, WiKey recognizes the typed keys, on the WiFi signal receiver end, based on how the CSI values change. CSI values quantify the aggregate effect of wireless phenomena such as fading, multi-paths, and Doppler shift on the wireless signals in a given environment. When the environment changes, such as when a key is pressed, the impact of these wireless phenomena on the wireless signals changes, resulting in unique changes in the CSI values.

There are three key technical challenges. The first technical challenge is to segment the CSI time series to identify the start time and end time of each keystroke. We studied the characteristics of typical CSI-waveforms of different keystrokes and observed that the waveforms of different keys show similar rising and falling trends in the changing rate of CSI values.
Figure 1: WiKey System

Based on this observation, we design a keystroke extraction algorithm that utilizes CSI streams of all transmit-receive (TX-RX) antenna pairs to determine the approximate start and end points of individual keystrokes in a given CSI-waveform by continuously matching the trends in the CSI time series with the experimentally observed trends using a sliding window approach.

The second technical challenge is to extract distinguishing features for generating classification models for each of the 37 keys (10 digits, 26 letters, and 1 space-bar). As the keys on a keyboard are closely placed, conventional features such as maximum peak power, mean amplitude, root mean square deviation of signal amplitude, second/third central moment, rate of change, signal energy or entropy, and number of zero crossings cannot be used because the values of these features for adjacent keys are almost identical. To address this challenge, we use the CSI-waveform shapes of each key from each TX-RX antenna pair as features. As the waveforms for each key contain a large number of samples, we apply the Discrete Wavelet Transform (DWT) technique on these waveforms to reduce the number of samples while keeping the shape-preserving time and frequency domain information intact. We use the waveforms resulting from the DWT of individual keystrokes as their shape features.

The third technical challenge is to compare the shape features of any two keystrokes. The midpoints of extracted CSI-waveforms of different keystrokes rarely align with each other because the start and end points determined by the extraction algorithm are never exact. Moreover, the lengths of different keystroke waveforms also differ because the duration of pressing any key is often different. Consequently, the midpoints and lengths of shape features do not match either.
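As a rough illustration of the DWT-based sample reduction described for the shape features above, the following sketch keeps only the low-pass (approximation) coefficients of a multi-level transform. It assumes a Haar wavelet, three decomposition levels, and a synthetic bump standing in for a CSI-waveform; none of these choices come from this section, and the function name `haar_approximation` is hypothetical.

```python
import numpy as np

def haar_approximation(waveform, levels=3):
    """Return the Haar DWT approximation coefficients after `levels` stages.

    Each stage low-pass filters and downsamples by 2, so the number of
    samples is roughly halved while the coarse shape is retained.
    The Haar wavelet is an assumption chosen for simplicity.
    """
    x = np.asarray(waveform, dtype=float)
    for _ in range(levels):
        if x.size % 2:                      # pad odd lengths by repeating the last sample
            x = np.append(x, x[-1])
        # Haar low-pass step: scaled sum of each non-overlapping pair
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    return x

# Synthetic stand-in for one keystroke's CSI-waveform (not real CSI data)
t = np.linspace(0.0, 1.0, 512)
waveform = np.exp(-((t - 0.5) ** 2) / 0.01)

feature = haar_approximation(waveform, levels=3)
print(len(waveform), "->", len(feature))    # 512 -> 64
```

The reduced waveform has one eighth of the samples but its peak stays at the same relative position, which is the property a shape feature needs.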
Another issue is that the shapes of different keystroke waveforms of the same key are often distorted versions of each other because of the slightly different formation and direction of motion of hands and fingers while pressing that key. Thus, two shape features cannot be compared using standard measures like correlation coefficient or Euclidean distance. To address this challenge, we use the Dynamic Time Warping (DTW) technique to quantify the distance between the two shape features. DTW can find the minimum distance alignment between two waveforms of different lengths.

The key novelty of this paper is in proposing the first WiFi signal based keystroke recognition approach. Some recent work uses CSI values to recognize various macro aspects of human movements such as falling down [6], household activities [7], detection of human presence [8], and estimating the number of people in a crowd [9]. These schemes extract coarse grained information from the CSI values to recognize macro-movements such as falling down or full-body/limb gestures. They cannot be directly adapted to recognize keystrokes because such coarse grained information does not capture the minor variations in the CSI values caused by human micro-movements such as those of hands and fingers while typing. Some recent work, namely WiHear, uses CSI values to extract the micro-movements of the mouth to recognize 9 syllables in the spoken words [10]. However, WiHear uses special hardware including directional antennas and stepper motors to direct WiFi beams towards the speaker’s mouth and extract the micro-movements.

We implemented the WiKey system using COTS devices, i.e., a TP-Link TL-WR1043ND WiFi router and a Lenovo X200 laptop with an Intel 5300 WiFi NIC. In the evaluation process, we built a keystroke database of 10 human subjects with IRB approval. WiKey achieves more than 97.5% detection rate for detecting the keystroke and 96.4% recognition accuracy for classifying single keys.
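To make the DTW comparison described earlier in this section more concrete, here is a minimal sketch of the textbook dynamic-programming DTW distance between two one-dimensional waveforms of different lengths. It is not the authors' implementation: the absolute-difference local cost, the absence of a warping window, and the synthetic waveforms are all assumptions, and `dtw_distance` and `bump` are hypothetical names.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance with |a_i - b_j| as local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = a.size, b.size
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i, j] = d + min(cost[i - 1, j],      # a[i-1] matched again
                                 cost[i, j - 1],      # b[j-1] matched again
                                 cost[i - 1, j - 1])  # both advance
    return cost[n, m]

def bump(length, center):
    """Synthetic single-peak waveform (a stand-in for a shape feature)."""
    t = np.linspace(0.0, 1.0, length)
    return np.exp(-((t - center) ** 2) / 0.005)

key_a = bump(60, 0.4)
stretched = np.repeat(key_a, 2)        # same shape, pressed "twice as slowly"
ramp = np.linspace(0.0, 1.0, 80)       # a differently shaped waveform

print(dtw_distance(key_a, stretched))  # time-stretched copy aligns perfectly
print(dtw_distance(key_a, ramp))       # different shape gives a clearly larger distance
```

A sample-by-sample Euclidean distance is not even defined for `key_a` and `stretched` because their lengths differ, whereas DTW absorbs the difference in duration through the alignment.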
In real-world experiments, WiKey can recognize keystrokes in a continuously typed sentence with an accuracy of 93.5%.

In this paper, we have shown that fine grained activity recognition is possible by using COTS WiFi devices. Thus, the techniques proposed in this paper can be used for several HCI applications. Examples include zoom-in, zoom-out, scrolling, sliding, and rotating gestures for operating personal computers, gesture recognition for gaming consoles, in-home gesture recognition for operating various household devices, and applications such as writing and drawing in the air. Other than being a potential attack, our WiKey technology can potentially be used to build virtual keyboards where human users type on a printed keyboard.

2. RELATED WORK

2.1 Device Free Activity Recognition
Device-free activity recognition solutions use the variations in the wireless channel to recognize human activities in a given environment. Existing solutions can be grouped into three categories: (1) Received Signal Strength (RSS) based, (2) CSI based, and (3) Software Defined Radio (SDR) based.

RSS Based: Sigg et al. proposed activity recognition schemes that utilize RSS values of WiFi signals to recognize four activities including crawling, lying down, standing up, and walking [11, 12]. They achieved activity recognition rates of over 80% for these four activities. To obtain the RSS values from WiFi signals, they used USRPs, which are specialized hardware devices compared to the COTS WiFi devices that we used in our work. While RSS values can be used for recognizing macro-movements, they are not suitable for recognizing micro-movements such as those of fingers and hands in keyboard typing because RSS values only provide coarse-grained information about the channel variations and do not contain fine-grained information about small scale fading and multi-path effects caused by these micro-movements.
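The coarse-versus-fine distinction drawn above can be illustrated with a toy two-path channel. The sketch below is only an illustration under stated assumptions: a 2.4 GHz carrier, a 20 MHz channel reported as 30 subcarriers, one direct path plus one reflection at half amplitude, a 2 cm change in the reflected path, and RSS approximated as the mean power across subcarriers. None of these numbers come from this section; they are chosen so the effect is easy to see.

```python
import numpy as np

C = 3.0e8        # speed of light in m/s
FC = 2.4e9       # assumed carrier frequency in Hz
BW = 20.0e6      # assumed channel bandwidth in Hz
N_SUB = 30       # assumed number of reported subcarriers

def csi_two_path(extra_path_m):
    """Per-subcarrier channel response: direct path plus one weaker reflection."""
    freqs = FC + (np.arange(N_SUB) - N_SUB / 2) * (BW / N_SUB)
    delay = extra_path_m / C               # extra delay of the reflected path
    return 1.0 + 0.5 * np.exp(-2j * np.pi * freqs * delay)

def to_db(power):
    return 10.0 * np.log10(power)

before = csi_two_path(15.00)   # reflection travels 15 m farther than the direct path
after = csi_two_path(15.02)    # hypothetical 2 cm change from a small hand movement

power_before = np.abs(before) ** 2
power_after = np.abs(after) ** 2

# RSS-like view: a single aggregate power value per measurement
rss_change_db = abs(to_db(power_after.mean()) - to_db(power_before.mean()))
# CSI-like view: one value per subcarrier
per_subcarrier_change_db = np.abs(to_db(power_after) - to_db(power_before))

print(f"aggregate (RSS-like) change: {rss_change_db:.3f} dB")
print(f"largest per-subcarrier change: {per_subcarrier_change_db.max():.2f} dB")
```

In this setup the aggregate power barely moves, while individual subcarriers change by several dB, which is the kind of fine-grained variation that a single RSS value averages away.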
CSI Based: CSI values obtained from COTS WiFi network interface cards (NICs) (such as Intel 5300 and Atheros 9390) have been recently proposed for activity recognition [6–10, 13] and localization [14–16]. Han et al. proposed WiFall that detects the fall of a human subject in an indoor environment using CSI values [6]. Zhou et al. proposed a passive human detection scheme which exploits multi-path variations for detecting human presence in an indoor environment using CSI values [8]. Zou et al. proposed Electronic Frog Eye that counts the number of people in a crowd using
CSI values by treating the people reflecting the WiFi signals used cepstrum features [22]instead of FFT as keystroke fea- as "virtual antennas"[9].Wang et al.proposed E-eyes that tures and used unsupervised learning with language model exploits CSI values for recognizing household activities such correction on the collected features before using them for as washing dishes and taking a shower [7.Nandakumar et supervised training and recognition of different keystrokes al.leverage the CSI and RSS information from off-the-shelf Zhu et al.proposed a context-free geometry-based approach WiFi devices to classify four arm gestures-push,pull,lever for recognizing keystrokes that leverage the acoustic eman- and punch [13].The fundamental difference between these ations from keystrokes to first calculate the time difference schemes and our scheme is that these schemes extract coarse of keystroke arrival and then estimate the physical locations grained features from the CSI values provided by the COTS of the keystrokes to identify which keys are pressed [3. WiFi NIC to perform these tasks while our proposed scheme Electromagnetic Emissions Based Vuagnoux et al. 
refines these CSI to capture fine grained variations in the used a USRP to capture the electromagnetic emanations wireless channel for recognizing keystrokes.Wang et al.pro- while pressing the keys [4.These electromagnetic emana- pose WiHear that uses CSI values recognizes the shape of tions originated from the electrical circuit underneath each mouth while speaking to detect whether a person is utter- key in conventional keyboards.The authors proposed to cap- ing one of a set of nine predefined nine syllables [10].While ture the entire raw electromagnetic spectrum and process it WiHear can capture the micro-movements of lips,it uses to recognize the keystrokes.Unfortunately,this scheme is special purpose directional antennas with stepper motors highly susceptible to background electromagnetic noise that for directing the antenna beams towards a person's mouth exists in almost all environments these days such as due to to obtain a clean signal for recognizing mouth movements microwave ovens,refrigerators,and televisions. 
In contrast,our proposed scheme does not use any special Video Camera Based Balzarotti et al.proposed purpose equipment and recognizes the micro-movements of ClearShot that processes the video of a person typing to fingers and hands using COTS WiFi NIC reconstruct the sentences (s)he types 5.The authors pro- SDR Based:Researchers have proposed schemes that pose to use context and language sensitive analysis for re- utilize SRDs and special purpose hardware to transmit and constructing the sentences receive custom modulated signals for activity recognition [17-20].Pu et al.proposed WiSee that uses a special pur- 3.CHANNEL STATE INFORMATION pose receiver design on USRPs to extract small Doppler shifts from OFDM WiFi transmissions to recognize human Modern WiFi devices that support IEEE 802.11n/ac gestures [17].Kellogg et al.proposed to use a special pur- standard typically consist of multiple transmit and mul- pose analog envelop detector circuit for recognizing gestures tiple receive antennas and thus support MIMO.Each MIMO within a distance of up to 2.5 feet using backscatter sig- channel between each transmit-receive (TX-RX)antenna nals from RFID or TV transmissions 18.Lyonnet et al. pair of a transmitter and receiver comprises of multiple sub- use micro Doppler signatures to classify gaits of human carriers.These WiFi devices continuously monitor the state subjects into multiple categories using specialized Doppler of the wireless channel to effectively perform transmit power radars [19].Adib et al.proposed WiTrack that uses a spe- allocations and rate adaptations for each individual MIMO stream such that the available capacity of the wireless chan- cially designed frequency modulated carrier wave radio fron- tend to track human movements behind a wall [20].Recently. 
nel is maximally utilized [23.These devices quantify the state of the channel in terms of CSI values.The CSI val- Chen et al.proposed an SDR based custom receiver design which can be used to track keystrokes using wireless sig- ues essentially characterize the Channel Frequency Response nals [21].In contrast to all these schemes,our scheme does (CFR)for each subcarrier between each transmit-receive not use any specialized hardware or SDRs rather utilizes (TX-RX)antenna pair.As the received signal is the res- COTS WiFi NICs to recognize keystrokes ultant of constructive and destructive interference of several multipath signals scattered from the walls and surrounding objects,the disturbances caused by movement of hands and 2.2 Keystrokes Recognition fingers while typing on a keyboard near the WiFi receiver To the best of our knowledge,there is no prior work on re- not only lead to changes in previously existing multipaths cognizing keystrokes by leveraging variations in wireless sig- but also to the creation of new multipaths.These changes nals using commodity WiFi devices.Other than the SDRs are captured in the CSI values for all subcarriers between based keystroke tracking approach proposed in [21 which every TX-RX antenna pair and can then be used to recog- uses wireless signals to track keystrokes,researchers have nize keystrokes. proposed several keystrokes recognition schemes that are Let Mr denote the number of transmit antennas,MR de- based on other sensing modalities such as acoustics 1-3,22 note the number of receive antennas and Se denote the num- electromagnetic emissions 4,and video cameras 5.Next, ber of OFDM sub-carriers.Let Xi and Y;represent the MT we give a brief overview of the other existing schemes that dimensional transmitted signal vector and MR dimensional utilize these sensing modalities to recognize keystrokes. 
received signal vector,respectively,for subcarrier i and let Acoustics Based:Asonov et al.proposed a scheme Ni represent an MR dimensional noise vector.An MR x MT to recognize keystrokes by leveraging the observation that MIMO system at any time instant can be represented by the different keys of a given keyboard produce slightly dif- following equation. ferent sounds during regular typing [1].They used back- Yi=H:Xi+Wii∈1,Se (1) propagation neural network for keystroke recognition and fast fourier transform (FFT)of the time window of every In the equation above,the MR x Mr dimensional channel keystroke peak as features for training the classifiers.Zhuang matrix Hi represents the Channel State Information (CSI) et al.proposed another scheme that recognizes keystrokes for the sub-carrier i.Any two communicating WiFi devices based on the sounds generated during key presses [2].They estimate this channel matrix Hi for every subcarrier by reg-
CSI values by treating the people reflecting the WiFi signals as “virtual antennas” [9]. Wang et al. proposed E-eyes that exploits CSI values for recognizing household activities such as washing dishes and taking a shower [7]. Nandakumar et al. leverage the CSI and RSS information from off-the-shelf WiFi devices to classify four arm gestures - push, pull, lever, and punch [13]. The fundamental difference between these schemes and our scheme is that these schemes extract coarse grained features from the CSI values provided by the COTS WiFi NIC to perform these tasks while our proposed scheme refines these CSI to capture fine grained variations in the wireless channel for recognizing keystrokes. Wang et al. propose WiHear that uses CSI values recognizes the shape of mouth while speaking to detect whether a person is uttering one of a set of nine predefined nine syllables [10]. While WiHear can capture the micro-movements of lips, it uses special purpose directional antennas with stepper motors for directing the antenna beams towards a person’s mouth to obtain a clean signal for recognizing mouth movements. In contrast, our proposed scheme does not use any special purpose equipment and recognizes the micro-movements of fingers and hands using COTS WiFi NIC. SDR Based: Researchers have proposed schemes that utilize SRDs and special purpose hardware to transmit and receive custom modulated signals for activity recognition [17–20]. Pu et al. proposed WiSee that uses a special purpose receiver design on USRPs to extract small Doppler shifts from OFDM WiFi transmissions to recognize human gestures [17]. Kellogg et al. proposed to use a special purpose analog envelop detector circuit for recognizing gestures within a distance of up to 2.5 feet using backscatter signals from RFID or TV transmissions [18] . Lyonnet et al. use micro Doppler signatures to classify gaits of human subjects into multiple categories using specialized Doppler radars [19]. Adib et al. 
proposed WiTrack, which uses a specially designed frequency modulated continuous wave (FMCW) radio frontend to track human movements behind a wall [20]. Recently, Chen et al. proposed an SDR-based custom receiver design that can be used to track keystrokes using wireless signals [21]. In contrast to all these schemes, our scheme does not use any specialized hardware or SDRs; rather, it utilizes COTS WiFi NICs to recognize keystrokes.

2.2 Keystrokes Recognition

To the best of our knowledge, there is no prior work on recognizing keystrokes by leveraging variations in wireless signals using commodity WiFi devices. Other than the SDR-based keystroke tracking approach proposed in [21], which uses wireless signals to track keystrokes, researchers have proposed several keystroke recognition schemes based on other sensing modalities such as acoustics [1–3, 22], electromagnetic emissions [4], and video cameras [5]. Next, we give a brief overview of the existing schemes that utilize these sensing modalities to recognize keystrokes.

Acoustics Based: Asonov et al. proposed a scheme to recognize keystrokes by leveraging the observation that different keys of a given keyboard produce slightly different sounds during regular typing [1]. They used a backpropagation neural network for keystroke recognition and the fast Fourier transform (FFT) of the time window of every keystroke peak as features for training the classifiers. Zhuang et al. proposed another scheme that recognizes keystrokes based on the sounds generated during key presses [2]. They used cepstrum features [22] instead of FFT as keystroke features and used unsupervised learning with language model correction on the collected features before using them for supervised training and recognition of different keystrokes. Zhu et al.
proposed a context-free, geometry-based approach for recognizing keystrokes that leverages the acoustic emanations from keystrokes to first calculate the time difference of keystroke arrival and then estimate the physical locations of the keystrokes to identify which keys are pressed [3].

Electromagnetic Emissions Based: Vuagnoux et al. used a USRP to capture the electromagnetic emanations produced while keys are pressed [4]. These electromagnetic emanations originate from the electrical circuit underneath each key in conventional keyboards. The authors proposed capturing the entire raw electromagnetic spectrum and processing it to recognize the keystrokes. Unfortunately, this scheme is highly susceptible to the background electromagnetic noise that exists in almost all environments these days, for example from microwave ovens, refrigerators, and televisions.

Video Camera Based: Balzarotti et al. proposed ClearShot, which processes the video of a person typing to reconstruct the sentences (s)he types [5]. The authors propose using context- and language-sensitive analysis for reconstructing the sentences.

3. CHANNEL STATE INFORMATION

Modern WiFi devices that support the IEEE 802.11n/ac standards typically have multiple transmit and multiple receive antennas and thus support MIMO. Each MIMO channel between a transmit-receive (TX-RX) antenna pair of a transmitter and receiver comprises multiple subcarriers. These WiFi devices continuously monitor the state of the wireless channel to effectively perform transmit power allocation and rate adaptation for each individual MIMO stream such that the available capacity of the wireless channel is maximally utilized [23]. These devices quantify the state of the channel in terms of CSI values. The CSI values essentially characterize the Channel Frequency Response (CFR) for each subcarrier between each TX-RX antenna pair.
As the received signal is the resultant of constructive and destructive interference of several multipath signals scattered from the walls and surrounding objects, the disturbances caused by the movement of hands and fingers while typing on a keyboard near the WiFi receiver not only change previously existing multipaths but also create new multipaths. These changes are captured in the CSI values for all subcarriers between every TX-RX antenna pair and can then be used to recognize keystrokes.

Let $M_T$ denote the number of transmit antennas, $M_R$ denote the number of receive antennas, and $S_c$ denote the number of OFDM subcarriers. Let $X_i$ and $Y_i$ represent the $M_T$ dimensional transmitted signal vector and the $M_R$ dimensional received signal vector, respectively, for subcarrier $i$, and let $N_i$ represent an $M_R$ dimensional noise vector. An $M_R \times M_T$ MIMO system at any time instant can be represented by the following equation.

$$Y_i = H_i X_i + N_i, \quad i \in [1, S_c] \qquad (1)$$

In the equation above, the $M_R \times M_T$ dimensional channel matrix $H_i$ represents the Channel State Information (CSI) for subcarrier $i$. Any two communicating WiFi devices estimate this channel matrix $H_i$ for every subcarrier by regularly transmitting a known preamble of OFDM symbols between each other. For each TX-RX antenna pair, the driver of our Intel 5300 WiFi NIC reports CSI values for $S_c = 30$ OFDM subcarriers of the 20 MHz WiFi channel [24]. This leads to 30 matrices with dimensions $M_R \times M_T$ per CSI sample.

4. NOISE REMOVAL

The CSI values provided by commodity WiFi NICs are inherently noisy because of the frequent changes in internal CSI reference levels, transmit power levels, and transmission rates. To use CSI values for recognizing keystrokes, such noise must first be removed from the CSI time series. For this, WiKey first passes the CSI time series through a low-pass filter to remove high-frequency noise. Unfortunately, a simple low-pass filter does not denoise the CSI values very efficiently. Although stricter low-pass filtering can remove more noise, it also causes loss of useful information from the signal. To extract the useful signal from the noisy CSI time series, WiKey leverages our observation that the variations in the CSI time series of all subcarriers due to the movements of hands and fingers are correlated. Therefore, it applies Principal Component Analysis (PCA) on the filtered subcarriers to extract the signals that only contain variations caused by movements of hands. Next, we first describe the process of applying the low-pass filter to the CSI time series and then explain how WiKey extracts the hand and finger movement signal using our PCA based approach.

4.1 Low Pass Filtering

The frequencies of the variations caused by the movements of hands and fingers lie at the low end of the spectrum, while the frequencies of the noise lie at the high end of the spectrum. To remove noise in such a situation, the Butterworth low-pass filter is a natural choice: it does not significantly distort the phase information in the signal and has a maximally flat amplitude response in the passband, and thus does not distort the hand and finger movement signal much.
WiKey applies the Butterworth filter to the CSI time series of all subcarriers in each TX-RX antenna pair so that every stream experiences similar effects of the phase distortion and group delay introduced by the filter. Although this process helps in removing some high-frequency noise, the noise is not completely eliminated because the Butterworth filter has a relatively slow gain roll-off in the stopband.

We observed experimentally that the frequencies of the variations in the CSI time series due to hand and finger movements while typing lie approximately between 3 Hz and 80 Hz. As we sample CSI values at a rate of $F_s = 2500$ samples/s, we set the cut-off frequency $\omega_c$ of the Butterworth filter at $\omega_c = \frac{2\pi f}{F_s} = \frac{2\pi \cdot 80}{2500} \approx 0.2$ rad/sample. Figure 2(a) shows the amplitudes of the unfiltered CSI waveform of a keystroke and Figure 2(b) shows the output of the Butterworth filter. We observe that the Butterworth filter successfully removes most of the bursty noise from the CSI waveforms.

Figure 2: Original and filtered CSI time series. (a) Original time series; (b) Filtered time series. Axes: Sample (x), Amplitude (y).

4.2 PCA Based Filtering

We observed experimentally that the movements of hands and fingers result in correlated changes in the CSI time series for each subcarrier in every transmit-receive antenna pair. Figure 3 plots the amplitudes of the CSI time series of 10 different subcarriers for one transmit-receive antenna pair while a user was repeatedly pressing a key. We observe from this figure that all subcarriers show correlated variations in their time series when the user presses the keys. The subcarriers that are closely spaced in frequency show identical variations, whereas the subcarriers that are farther apart in frequency show non-identical changes. Despite the non-identical changes, a strong correlation still exists even across the subcarriers that are far apart in frequency.
WiKey leverages this correlation and calculates the principal components from all CSI time series. It then chooses those principal components that represent the most common variations among all CSI time series.

Figure 3: Correlated variations in subcarriers. (a) Subcarriers #1, 2, 3, 4, 5; (b) Subcarriers #5, 10, 15, 20, 25. Axes: Sample (x), Absolute Value (y).

There are two main advantages of using PCA. First, PCA reduces the dimensionality of the CSI information obtained from the 30 subcarriers in each TX-RX stream, which is useful because using information from all subcarriers for keystroke extraction and recognition significantly increases the computational complexity of the scheme. Consequently, PCA automatically enables WiKey to obtain the signals that are representative of hand and finger movements, without having to devise new techniques and define new parameters for selecting appropriate subcarriers for further processing. Second, PCA helps in removing noise from the signals by taking advantage of the correlated variations in the CSI time series of different subcarriers. It removes the uncorrelated noisy components, which cannot be removed through traditional low-pass filtering. This PCA based noise reduction is one of the major reasons behind the high keystroke extraction and recognition accuracies of our scheme.
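The low-pass step of Section 4.1 can be illustrated with a minimal sketch. It assumes a second-order Butterworth filter designed with the bilinear transform; the filter order is not stated in the text, so the order, the function names `butterworth_lowpass_coeffs` and `lowpass`, and the synthetic 10 Hz and 600 Hz test tones are all assumptions made for illustration. Only the 80 Hz cut-off and the 2500 samples/s rate come from the text.

```python
import math


def butterworth_lowpass_coeffs(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass biquad (bilinear transform).

    The order (2) is an assumption; the paper does not state it.
    """
    k = math.tan(math.pi * cutoff_hz / fs_hz)  # pre-warped cut-off
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (1.0,
         2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def lowpass(samples, b, a):
    """Filter one CSI amplitude time series (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


FS = 2500.0      # CSI sampling rate from the text (samples/s)
CUTOFF = 80.0    # upper end of the keystroke band from the text (Hz)

# Normalized cut-off as computed in the text: 2*pi*80/2500, about 0.2
omega_c = 2.0 * math.pi * CUTOFF / FS

b, a = butterworth_lowpass_coeffs(CUTOFF, FS)

# Synthetic stand-ins for a CSI stream: a slow "movement" tone and a
# fast "noise" tone (both hypothetical, unit amplitude).
n = 5000
slow = [math.sin(2.0 * math.pi * 10.0 * t / FS) for t in range(n)]
fast = [math.sin(2.0 * math.pi * 600.0 * t / FS) for t in range(n)]

# Peak amplitude after the start-up transient has died out.
peak_slow = max(abs(v) for v in lowpass(slow, b, a)[n // 2:])
peak_fast = max(abs(v) for v in lowpass(fast, b, a)[n // 2:])

print(f"omega_c = {omega_c:.3f}")
print(f"10 Hz tone peak after filtering:  {peak_slow:.3f}")
print(f"600 Hz tone peak after filtering: {peak_fast:.3f}")
```

In this sketch the same coefficients would be applied to every subcarrier stream of a TX-RX pair, which matches the text's point that all streams should experience the same phase distortion and group delay.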
5. KEYSTROKE EXTRACTION

WiKey segments the CSI time series to extract the CSI waveforms for individual keystrokes. For this, WiKey operates on the CSI time series resulting from the Butterworth filtering. Let $H_{t,r}(i)$ be an $S_c \times 1$ dimensional vector containing the CSI values of the $S_c$ subcarriers between an arbitrary TX-RX antenna pair $t$-$r$ for the $i$-th CSI sample. Let $H_{t,r}$ be an $N \times S_c$ dimensional matrix containing the CSI values of the $S_c$ subcarriers between an arbitrary TX-RX antenna pair $t$-$r$ for $N$ consecutive CSI samples. This matrix is given by the following equation.

$$H_{t,r} = [H_{t,r}(1) \,|\, H_{t,r}(2) \,|\, H_{t,r}(3) \,|\, \ldots \,|\, H_{t,r}(N)]^T \qquad (2)$$

The columns of the matrix $H_{t,r}$ represent the CSI time series for each OFDM subcarrier. To detect the starting and ending points of any arbitrary keystroke, WiKey first normalizes the $H_{t,r}$ matrix such that every CSI stream has zero mean and unit variance. We denote the normalized version of $H_{t,r}$ by $Z_{t,r}$. WiKey then performs the PCA based dimensionality reduction and denoising (as described in Section 4.2) on $Z_{t,r}$, and the resultant waveforms are further processed to detect the starting and ending points of the keystrokes from this particular TX-RX antenna pair. WiKey repeats this process on the CSI time series for all antenna pairs and obtains values for the starting and ending points of keystrokes based on the CSI time series from each antenna pair one by one. Finally, WiKey combines the starting and ending points obtained from all TX-RX antenna pairs to calculate a robust estimate of the starting and ending points of the time windows containing those keystrokes. Next we explain these steps in more detail.

5.1 PCA on Normalized Stream

Let $\Phi_Z^{\{1:p\}}$ be an $S_c \times p$ dimensional matrix that contains the top $p$ principal components obtained from PCA on $Z_{t,r}$.
We remove the first component from those top $p$ principal components based on our observation that the first component captures the majority of the noise, while subsequent components contain information about the movements of hands and fingers while typing. This happens because PCA ranks principal components in descending order of their variance, due to which the noisy components with higher variance get ranked among the top principal components. Due to the correlated nature of the variations in multiple CSI time series, the removal of this PCA component does not lead to any significant information loss, as the remaining PCA components still contain enough information for successfully detecting the starting and ending points of the keystrokes.

If we exclude the first component, the projection of the CSI stream $Z_{t,r}$ of the $t$-$r$ transmit-receive antenna pair onto the remaining principal components $\Phi_Z^{\{2:p\}}$ can then be written as:

$$Z_{t,r}^{\{2:p\}} = Z_{t,r} \times \Phi_Z^{\{2:p\}} \qquad (3)$$

where $Z_{t,r}^{\{2:p\}}$ is an $N \times (p-1)$ dimensional matrix containing the projected CSI streams in its columns. We choose $p = 4$ in our implementation based on our observation that only the top 4 principal components contained the most significant variations in CSI values caused by different keystrokes. Figure 4(a) shows the result of projecting the normalized CSI time series $Z_{t,r}$ onto its top 4 principal components. We observe from Figure 4(b) that by removing the first principal component, we essentially remove the noisiest projection among all 4 projections of $Z_{t,r}$.

Figure 4: PCA of Z-normalized CSI stream $Z_{t,r}$. (a) Top 4 projections (PCA 1 to PCA 4); (b) Projections 2, 3 & 4. Axes: Sample (x), Projected CSI values (y).
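The normalization and projection of Equations (2) and (3) can be sketched as follows. This is a minimal illustration, assuming PCA is computed from the eigen-decomposition of the covariance matrix of the normalized stream; the function name `project_without_first_pc` and the synthetic input are hypothetical, and the synthetic data is meant only to show the matrix shapes, not to reproduce the observation that the first component is noise-dominated.

```python
import numpy as np


def project_without_first_pc(H, p=4):
    """Z-normalize an N x Sc CSI matrix and project it onto PCs 2..p.

    H plays the role of H_{t,r} in Eq. (2); the return value plays the
    role of Z_{t,r}^{2:p} in Eq. (3). Hypothetical helper, not the
    authors' implementation.
    """
    # Zero mean and unit variance for every subcarrier stream (column).
    Z = (H - H.mean(axis=0)) / H.std(axis=0)

    # Principal components = eigenvectors of the Sc x Sc covariance matrix.
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-rank by descending variance
    phi = eigvecs[:, order[:p]]              # Sc x p, top p components

    # Drop the first component and project: N x (p - 1).
    return Z @ phi[:, 1:]


# Synthetic stand-in for one TX-RX pair: N samples of Sc = 30 subcarriers.
rng = np.random.default_rng(0)
N, Sc = 2000, 30
t = np.arange(N)
shared = np.sin(2.0 * np.pi * t / 400.0)          # variation common to all subcarriers
gains = rng.uniform(0.5, 1.5, size=Sc)            # per-subcarrier scaling
H = 10.0 + np.outer(shared, gains) + 0.1 * rng.standard_normal((N, Sc))

projected = project_without_first_pc(H, p=4)
variances = projected.var(axis=0)
print(projected.shape)
```

With $p = 4$, as chosen in the text, the result has three columns, one per retained principal component, and each column is a time series that the keystroke detection step can then scan.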
5.2 Keystroke Detection

Although existing DFAR schemes propose techniques to automatically detect the start and end of activities, they cannot be directly adapted to detecting the start and end of keystrokes. Existing schemes use simple threshold based algorithms for detecting the start and end of activities. While threshold based schemes work well for macro-movements, they are not well suited for micro-movements such as those of hands and fingers while typing, where we need to precisely segment the time series of keystrokes that are closely spaced in time. Unlike general purpose threshold based algorithms, we propose a keystroke detection algorithm that provides better detection accuracy, since it is strictly based on the experimentally observed shapes of different keystroke waveforms. The intuition behind our algorithm is that the CSI time series of every keystroke shows a typical increasing and decreasing trend in the rates of change of the CSI time series, similar to the one shown in Figure 2. To detect such increases and decreases in the rates of change, our algorithm uses a moving window approach to detect the increasing and decreasing trends in rates of change in all $p-1$ time series for each transmit-receive antenna pair, i.e., on each column of $Z_{t,r}^{\{2:p\}}$. Our algorithm detects the starting and ending points of keystrokes in the following six steps.

First, the algorithm calculates the mean absolute deviation (MAD) for each of the $p-1$ time series for each window of size $W$ at the $j$-th iteration. This is done primarily to detect the extent of variations in the values of a given time series. The main reason behind choosing MAD instead of variance is that in calculating variance, the deviations from the mean are squared, which gives more weight to extreme values. In