SpiderMon: Towards Using Cell Towers as Illuminating Sources for Keystroke Monitoring

Kang Ling, Yuntang Liu, Ke Sun, Wei Wang, Lei Xie and Qing Gu
State Key Laboratory for Novel Software Technology, Nanjing University
{lingkang,yuntangliu,kesun}@smail.nju.edu.cn, {ww,lxie,guq}@nju.edu.cn

Abstract: Cellular network operators deploy base stations with a high density to ensure radio signal coverage for 4G/5G networks. While users enjoy the high-speed connection provided by cellular networks, an adversary could exploit the dense cellular deployment to detect nearby human movements and even recognize the keystroke movements of a victim by passively listening to the CRS broadcast from base stations. To demonstrate this, we develop SpiderMon, the first attempt to perform passive continuous keystroke monitoring using the signal transmitted by commercial cellular base stations. Our experimental results show that SpiderMon can detect keystroke movements at a distance of 15 meters and can recover a 6-digit PIN input with a success rate of more than 51% within ten trials when the victim is behind a wall.

I. INTRODUCTION

Keystroke inference attacks are extremely dangerous since the attacker could infer the content or even passwords typed by the user through side-channels that can hardly be detected. Existing works have used videos [1], [2], Inertial Measurement Units (IMU) [3], [4], and sound signals [5]–[9] in side-channel attacks that effectively infer the keystroke sequence; see Table I. Recently, researchers discovered that Wi-Fi radio signals can also be used as the medium for keystroke inference attacks [10]–[13]. However, most of these existing attack models are short-range or require active signal transmission.

In this paper, we show for the first time that an attacker can passively listen to commercial 4G/5G signals and infer the keystroke sequence of a victim at a distance of 15 meters (Figure 1).
As cellular network operators are using high-density deployments to improve radio signal coverage for 4G/5G networks, such attacks could become pervasive in the near future. Currently, for outdoor areas, macro/micro Base Stations (BSs) are deployed with a high density of more than 0.3 BS/km² in urban regions [14]. For indoor areas, radio repeaters and femtocells are deployed in most buildings to improve the radio signal quality [15]. As envisioned by the Ultra-Dense Networks (UDN) in 5G networks, the distance between cellular access points could be a few meters for indoor deployments and 50 meters for outdoor deployments [16]. While users enjoy the high-speed connections provided by 4G/5G cellular networks, such dense cellular deployment leads to severe information leakage issues that most users are unaware of.

Figure 1. SpiderMon leverages cellular base stations as illuminating sources for passive keystroke monitoring.

The cellular signal is a new type of side-channel attack medium that could be more harmful than Wi-Fi signals. First, cellular-based attackers are passive listeners. They use the signal transmitted by commercial cellular BSs as the “illuminating sources”. Therefore, these attackers are harder to detect since they do not transmit any signal. Second, cellular signals have larger coverage areas than Wi-Fi signals. Compared to Wi-Fi APs, which are mostly installed in buildings, cellular signals cover both outdoor and indoor areas. Third, cellular BSs provide highly stable reference signal sources. Cellular BSs use GPS-regulated oscillators and low-noise amplifiers to generate the Cell-Specific Reference Signal (CRS) at a regular rate of up to 4,000 times per second; these signals are more stable in both phase and amplitude than the signals generated by low-end Wi-Fi devices. Finally, Wi-Fi transmissions could be easily blocked since they use Carrier-Sense Multiple Access (CSMA) protocols.
However, it is against FCC regulations to interfere with cellular transmissions. Thus, users cannot protect themselves by transmitting an interfering signal, as suggested in PhyCloak [17].

We develop SpiderMon, a system that performs long-range keystroke monitoring using the signal transmitted by commercial cellular BSs. (We name the system SpiderMon because it monitors the victim through the small disturbances of a time-frequency grid formed by the LTE CRS, as shown in Figure 2(d), just as a spider uses its web to detect its prey.) The design of SpiderMon faces three technical challenges. First, capturing the subtle changes caused by keystroke movements at a distance of 15 meters is challenging. To address this challenge, we first use a directional antenna to amplify the signal reflected by the victim and to reduce the interference from nearby movements. We then design a block Principal Component Analysis (PCA) algorithm that further amplifies the signal by combining signals in different subcarriers. Second, it is challenging to infer the keystroke sequence of a continuous typing process, where the victim types in a natural manner by continuously moving from one key to the next. Existing works treat each keystroke
separately by assuming that the user always returns to a given posture after each keystroke [10]. To handle continuous typing, we model the process as a Hidden Markov Model (HMM) and use the LTE signal to infer the transitions between subsequent keystrokes. Third, the LTE signal contains both data transmissions and reference signals, so the raw data rate is 122.88 MBytes per second, which makes real-time data processing and logging a challenge. To enable long-term monitoring, we build a signal processing frontend running on a workstation that compresses the measurements to a rate of 800 kBytes per second so that the results can be efficiently processed and stored in real time for hours.

Table I. COMPARISON AMONG SIDE-CHANNEL BASED KEYSTROKE INFERENCE METHODS

System              Attack Distance   Side-Channel Signal   Passive Listening   Continuous Typing   NLOS
Owusu et al. [3]    On device         IMU (Smartphone)      Yes                 Yes                 /
Liu et al. [4]      Wearable          IMU (Smartwatch)      Yes                 Yes                 /
Shukla et al. [1]   5 meters          Video                 Yes                 Yes                 No
Sun et al. [2]      2 meters          Video                 Yes                 Yes                 No
Asonov et al. [8]   1 meter           Acoustic              Yes                 Yes                 /
Zhu et al. [6]      40 centimeters    Acoustic              Yes                 Yes                 /
Wikey [10]          30 centimeters    Wi-Fi                 No                  No                  Yes
WindTalker [12]     1.5 meters        Wi-Fi                 No                  No                  Yes
SpiderMon           5∼15 meters       LTE                   Yes                 Yes                 Yes

Our experimental results show that SpiderMon can detect 95% of keystrokes at a distance of 15 meters. When the victim is behind a wall at a distance of 5 meters, SpiderMon can recover a 6-digit PIN input with a success rate of more than 51% within ten trials, and this success rate is above 36% at 15 meters with line-of-sight.

In summary, we have made the following contributions:
• To the best of our knowledge, we are the first to show that commercial 4G/5G cellular signals can be used for fine-grained human activity monitoring.
• We build a real-time cellular signal analysis system with Commercial Off-The-Shelf (COTS) USRP devices and workstations.
Our system can process commercial LTE signals with a bandwidth of 20 MHz and extract 4,000 × 200 CRS samples per second in real time.
• We propose to leverage the HMM to infer continuous keystroke sequences. Our extensive evaluations on keystroke sequence inference show that this method outperforms the traditional individual keystroke recovery scheme.

II. RELATED WORK

We divide the existing related work into the following four areas: LTE physical layer measurements, Radio Frequency (RF) based activity monitoring systems, keystroke inference attacks, and protection against RF-based attacks.

LTE Physical Layer Measurements: Existing LTE physical layer measurement tools mainly focus on the networking or ranging problem. LTE physical layer information, such as the Channel Quality Indicator (CQI), can be used in cross-layer design to improve the TCP throughput of the cellular network [18], [19]. The real-time LTE radio resource monitor (RMon) extracts the PHY-layer resource allocation information to help LTE video streaming [20]. LTEye uses a USRP N210 to decode LTE signals with a bandwidth of 10 MHz and to perform user localization [21]. Soft-LTE uses the Sora software radio to implement the LTE uplink with the full bandwidth but does not implement the downlink [22]. Marco et al. [23] proposed a method for extracting TOA information from LTE CIR signals and achieved 20-meter accuracy for vehicular position tracking. However, most of these systems [20], [21], [24] do not support real-time operations on the full 20 MHz LTE bandwidth.

RF-based Activity Monitoring Systems: Different types of RF signals, including Wi-Fi [25]–[28], FMCW radar [29], [30], 60 GHz radar [31], [32], and RFID [33], [34], have been used for human activity monitoring. Most of the above RF-based attacks require an active transmitting device to be placed around the victim. There are systems that use signals transmitted by GSM BSs to perform through-wall monitoring [35].
However, GSM-based systems only extract coarse-grained Doppler shift data, while LTE-based systems can measure the signal phase with high accuracy.

Keystroke Inference Attacks: Existing keystroke inference attacks use different types of sensors to capture the keystroke signal, including sound [5]–[8], IMU [3], [4], video [1], [36], and RF signals [10], [12], [13]. Asonov et al. [8] first demonstrated that different keys can be distinguished by their unique typing sounds. Zhuang et al. [7] and Berger et al. [37] improved keystroke recognition accuracy by adding a language model. Liu et al. [4] achieved 65% inference accuracy in top-3 candidates using the IMU on a smartwatch. Sun et al. [36] detected and quantified the subtle motion patterns of the back of the device induced by a user’s keystrokes using videos. WiPass [13] and WindTalker [12] further use the Wi-Fi CSI to snoop on the unlock patterns and PINs entered on mobile devices. However, these methods have their own shortcomings. Sound-based and Wi-Fi-based methods tend to work only over limited distances. IMU-based solutions need to compromise the victim’s wearables, while video-based solutions are limited by lighting conditions and obstructions such as ATM keypad covers.

Protection against RF-based Attacks: Most existing privacy protection systems transmit interfering signals to prevent attackers from measuring key RF parameters that are vital for activity recognition. PhyCloak [17] leverages an RF signal relay to disturb the amplitude, delay, and Doppler shift of the signal received by the attacker so that the attacker cannot reliably infer the activity of the user. Aegis [38] uses randomized amplifications, fan movements, and antenna rotations to distort the same set of RF signal parameters. However, these protection schemes actively
transmit signals in the targeted frequency band, so they cannot be applied to cellular-based attacks, as it is against FCC regulations to transmit interfering signals in the licensed band.

III. ATTACK SCENARIO AND LTE BACKGROUND

In this section, we first present the attack scenario of our system. We then introduce the background of the LTE system and discuss its protocol design with a focus on downlink Cell-Specific Reference Signals (CRS).

A. Attack Scenario

We consider an attack scenario where the adversary attempts to infer the PIN code of a user when he/she inputs it on an ATM or a smart door lock. The adversary may not have direct access to the target, but can deploy equipment at a distance of 5∼15 meters, e.g., in a building across the road or behind a nearby wall. We assume that there is at least one LTE base station within a distance of 150 meters of the victim. The LTE coverage could be provided by a macro-cell or an indoor small cell. This requirement can usually be fulfilled in most urban areas. By passively listening to the LTE signal reflected by the victim, the adversary may infer the PIN input by the victim using a probabilistic model.

B. LTE Primer

We give a brief introduction to the LTE signal format and show how LTE signals form a time-frequency grid that can be used for human activity monitoring. Note that the 5G cellular system uses an OFDM modulation scheme and frame structure similar to those of the LTE system. Therefore, most of the following discussion applies to both 4G and 5G systems.

Time Domain: In the time domain, LTE BSs transmit radio frames that have a fixed duration of 10 ms. Each frame contains ten subframes with a duration of 1 ms, and each subframe contains two slots of 0.5 ms. Depending on the configuration of the BS, each slot consists of six (in the case of the extended cyclic prefix) or seven (in the case of the normal cyclic prefix) OFDM symbols, each with a duration of 66.67 µs.
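The timing hierarchy above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of SpiderMon itself: the constant and function names are hypothetical, and the only inputs are the durations quoted in the text. It shows how much of each 0.5 ms slot is left for cyclic prefixes once the 66.67 µs symbols are accounted for.

```python
# Durations quoted in the text, in microseconds.
FRAME_US = 10_000.0
SLOT_US = 500.0
SYMBOL_US = 66.67  # useful OFDM symbol time, excluding the cyclic prefix

# Symbols per slot for the two cyclic-prefix configurations.
SYMBOLS_PER_SLOT = {"normal": 7, "extended": 6}


def slot_budget(cp_type: str) -> dict:
    """Return symbol counts and the time budget of one slot (hypothetical helper)."""
    n_symbols = SYMBOLS_PER_SLOT[cp_type]
    slots_per_frame = int(FRAME_US / SLOT_US)  # 20 slots in a 10 ms frame
    useful_us = n_symbols * SYMBOL_US
    return {
        "symbols_per_slot": n_symbols,
        "symbols_per_frame": n_symbols * slots_per_frame,
        "useful_us": useful_us,
        # Whatever is left of the slot is shared by the cyclic prefixes.
        "cp_total_us": SLOT_US - useful_us,
    }


if __name__ == "__main__":
    for cp in SYMBOLS_PER_SLOT:
        print(cp, slot_budget(cp))
```

Under these numbers, the normal configuration leaves roughly 33 µs of cyclic prefix per slot and the extended configuration roughly 100 µs, which is why the extended configuration fits one fewer symbol.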
Frequency Domain: In the frequency domain, the OFDM symbol contains a series of subcarriers with a frequency interval of $\Delta f = 15$ kHz, as in Figure 2(b). The commonly used bandwidths for LTE signals are 5, 10 and 20 MHz, which consist of 300, 600, and 1200 subcarriers, respectively.

Figure 2. Illustration of the time-frequency grid of LTE reference signals. (a) Time domain: frames, subframes, slots and symbols. (b) Frequency domain: subcarriers and resource blocks. (c) Each slot contains $N_{RB}^{DL}$ RBs; each RB spans 12 subcarriers in the frequency domain and 0.5 ms in the time domain. (d) CRS (shown as small dots) and PSS/SSS for a commercial TDD base station (subcarriers around the DC subcarrier).

Time-Frequency Grid: The radio resources in LTE are scheduled in units called Resource Blocks (RBs), each of which consists of $N_{SC}^{RB} = 12$ subcarriers in the frequency domain and lasts one slot (0.5 ms) in the time domain, as in Figure 2(c). The LTE BS transmits the Cell-Specific Reference Signal (CRS) in all downlink RBs. The CRS is transmitted at four different locations in each RB, with two CRS separated by six subcarriers in each of the two predefined symbols, as in Figure 2(c). Therefore, the CRS forms a dense time-frequency grid at fixed time and frequency intervals. For example, a Time Division Duplex (TDD) base station that has $N_{RB}^{DL} = 100$ RBs (20 MHz bandwidth) will transmit CRS at 200 different subcarriers on two symbols in each slot (0.5 ms). Figure 2(d) shows the CRS grid captured from a commercial TDD base station. Note that for TDD, there are some time slots reserved for uplink so that the BS does not transmit in these slots.
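As a rough sanity check of the grid dimensions described above, the sketch below derives the number of RBs from the subcarrier count and then counts CRS subcarriers per symbol and CRS-bearing symbols per second. The function names and the `downlink_slots_per_frame` parameter are hypothetical; the constants are the ones stated in the text (12 subcarriers per RB, two CRS subcarriers per RB per CRS symbol, two CRS symbols per slot, 20 slots per 10 ms frame).

```python
SUBCARRIERS_PER_RB = 12      # N_SC^RB
CRS_SUBCARRIERS_PER_RB = 2   # two CRS per RB in each CRS-bearing symbol
CRS_SYMBOLS_PER_SLOT = 2     # two predefined symbols per slot carry CRS
SLOTS_PER_FRAME = 20         # 10 ms frame / 0.5 ms slot
FRAMES_PER_SECOND = 100      # 1 s / 10 ms


def resource_blocks(n_subcarriers: int) -> int:
    """Number of RBs that a given subcarrier count corresponds to."""
    if n_subcarriers % SUBCARRIERS_PER_RB:
        raise ValueError("subcarrier count is not a whole number of RBs")
    return n_subcarriers // SUBCARRIERS_PER_RB


def crs_grid(n_rb: int, downlink_slots_per_frame: int = SLOTS_PER_FRAME):
    """Return (CRS subcarriers per symbol, CRS symbols per second)."""
    crs_subcarriers = n_rb * CRS_SUBCARRIERS_PER_RB
    crs_symbols_per_second = (
        FRAMES_PER_SECOND * downlink_slots_per_frame * CRS_SYMBOLS_PER_SLOT
    )
    return crs_subcarriers, crs_symbols_per_second


if __name__ == "__main__":
    for n_sc in (300, 600, 1200):  # 5, 10 and 20 MHz
        n_rb = resource_blocks(n_sc)
        print(n_sc, "subcarriers ->", n_rb, "RBs ->", crs_grid(n_rb))
```

With all 20 slots used for downlink, a 100-RB cell yields a 200 × 4,000 grid per second; passing a smaller `downlink_slots_per_frame` models a TDD cell that reserves some slots for uplink.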
In our experiments, the BS transmits in 14 of the 20 slots of each frame, so the CRS is sent in 2,800 symbols (100 frames × 14 slots × 2 symbols) per second, and on 200 subcarriers (100 RBs × 2 subcarriers) per symbol.

C. CRS as a Side Channel

In LTE systems, the User Equipment (UE), e.g., a mobile phone, uses the CRS to estimate the Channel Frequency Response (CFR) of the downlink channel. The transmitted value of the CRS is predefined in the LTE protocol [39] and is determined by the Physical Cell ID (PCI) and the slot number. Suppose that the BS transmits S(f, t) on a given subcarrier f at a given time t.
If the received signal at the UE is R(f, t), the CFR can be calculated by:

$$H(f, t) = \frac{R(f, t)}{S(f, t)}. \tag{1}$$

The signal received by the antenna is a superposition of the transmitted signal arriving over multiple paths [40]. Suppose a radio signal arrives at the receiving antenna through K different paths; then the CFR can be given as:

$$H(f, t) = \sum_{k=1}^{K} a_k(f, t)\, e^{-j 2\pi f \tau_k(t)}, \tag{2}$$

where $a_k(f, t)$ represents the attenuation and initial phase offset of the k-th path, $e^{-j 2\pi f \tau_k(t)}$ is the phase shift on the k-th path, and $\tau_k(t)$ is the path delay. With this model, nearby human movements are reflected as fluctuations in the CFR measurements, following a model similar to that used for Wi-Fi systems [25], [27], [41].

Figure 3. System overview of SpiderMon. (Blocks: CRS Logger, Preprocessing, Keystroke Inference. Data rates: LTE baseband at 30.72M samples per second, CFR at 200*4k samples per second, reduced CFR at 10*4k samples per second.)

Figure 4. Omnidirectional and directional antenna comparison. (CFR amplitude versus time (s) for the omni antenna and the directional antenna.)

IV. SYSTEM DESIGN

The structure of SpiderMon is shown in Figure 3. The LTE baseband signal is captured by a USRP B210 software radio front-end and transferred to a host workstation over a USB 3.0 interface. We use the standard 30.72 MHz sampling rate, where each sample is a complex number with two-byte real and imaginary parts. We then use the CRS Logger, implemented in C++, to extract the CRS and CFR estimations at a rate of 4000 × 200 complex samples per second. Finally, the CFR estimations are passed to a Data Preprocessing module, which uses MATLAB to analyze and visualize the CRS in real time.

A. CRS Logger

The CRS Logger consists of three components: synchronization, CFO/SFO calibration, and CRS extraction.

Synchronization: The first step of synchronization is to find the carrier frequency of a nearby LTE BS and tune the USRP to it. This can be done by scanning the entire LTE frequency band or by using a smartphone in engineering mode to get the U-ARFCN codes, which indicate the carrier frequencies used by neighboring BSs. The second step is to search for the Primary Synchronization Signal (PSS) to find the boundaries of the subframes and symbols. This step uses a computationally intensive cross-correlation over the whole frame to match the PSS, but it is only performed at the searching stage. After the first searching stage, we only need to perform cross-correlation within five samples of the expected PSS location to keep track of the PSS. The third step is to search for the Secondary Synchronization Signal (SSS) and extract the Physical Cell ID (PCI). We use the location of the detected PSS to capture the SSS and calculate the PCI using both the PSS and the SSS.

CFO/SFO Calibration: As the transmitting BS and the receiver run on different clocks, there are both a Carrier Frequency Offset (CFO) and a Sampling Frequency Offset (SFO) in the received baseband signal [42], [43]. If we do not calibrate these frequency offsets, they may accumulate and the system will lose synchronization after several minutes of continuous monitoring. We first use a high-quality clock source (OctoClock CDA-2990) with a frequency accuracy of 25 ppb to keep the CFO between our receiver and the transmitting BS below 100 Hz. However, there are still considerable residual phase offsets caused by the CFO and the SFO in the CFR. The residual phase offset at a subcarrier f can be modeled as [42], [43]:

$$\phi(f, t) = \theta(f, t) + 2\pi t\, \delta_{CFO} + 2\pi t\, \frac{f - f_c}{f_s}\, \delta_{SFO}, \tag{3}$$

where $\theta(f, t)$ is the combination of the initial phase and the phase change caused by the activity.
The last two components are the phase offsets caused by the CFO and the SFO.

For the CFO calibration, we use the phase of the center subcarrier to estimate $\delta_{CFO}$, since the SFO-induced phase offset is always zero there [42]. We apply linear regression to the phase history of the center subcarrier over a duration of one second to estimate the current $\delta_{CFO}$. We then use an Exponential Moving Average (EMA) to further smooth the CFO estimate over consecutive seconds. After that, we compensate for the CFO on all baseband samples by multiplying the baseband signal by the phase shift derived from the smoothed CFO estimate.

The SFO is caused by sampling clock differences between the transmitter and the receiver. For the SFO calibration, we track the SFO by locating the cross-correlation peak of the PSS. To correct the sampling offset, we either skip a single sample or duplicate a sample so that the sampling point of the receiver moves by one sample in the opposite direction. With CFO/SFO calibration, we can keep the system synchronized for a long duration (several hours).

CRS Extraction: Based on the PCI obtained from the synchronization step, we can calculate on which subcarriers the CRS is transmitted as well as the value of the CRS [39].
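The CFO calibration described above (a one-second linear regression on the center-subcarrier phase, EMA smoothing, then multiplication by a compensating phase ramp) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' C++ implementation: the function names, the EMA factor `alpha`, the noise level, and the assumption that the center-subcarrier CFR is sampled uniformly at 4,000 samples per second are all hypothetical choices made for the example.

```python
import numpy as np

def estimate_cfo_hz(center_cfr, sample_rate_hz):
    """Estimate the CFO (Hz) from the slope of the center-subcarrier phase.

    At f = f_c the SFO term of Eq. (3) vanishes, so the unwrapped phase
    grows as 2*pi*t*delta_CFO plus the activity-related phase.
    """
    t = np.arange(len(center_cfr)) / sample_rate_hz
    phase = np.unwrap(np.angle(center_cfr))
    slope, _intercept = np.polyfit(t, phase, 1)  # least-squares line fit
    return slope / (2.0 * np.pi)

def ema_update(previous, current, alpha=0.3):
    """Exponential moving average; alpha is an assumed smoothing factor."""
    if previous is None:
        return current
    return alpha * current + (1.0 - alpha) * previous

def compensate_cfo(samples, cfo_hz, sample_rate_hz):
    """Remove the CFO by multiplying with the conjugate phase ramp."""
    n = np.arange(len(samples))
    return samples * np.exp(-2j * np.pi * cfo_hz * n / sample_rate_hz)

# Synthetic check: three one-second windows with a slowly drifting CFO.
rng = np.random.default_rng(0)
rate_hz = 4000                      # assumed uniform CFR sampling rate
t = np.arange(rate_hz) / rate_hz
smoothed = None
for true_cfo_hz in (40.0, 42.0, 41.0):
    noise = 0.05 * (rng.standard_normal(rate_hz)
                    + 1j * rng.standard_normal(rate_hz))
    center = np.exp(2j * np.pi * true_cfo_hz * t) + noise
    raw_estimate = estimate_cfo_hz(center, rate_hz)
    smoothed = ema_update(smoothed, raw_estimate)

# After compensation with the last raw estimate, the residual CFO is near zero.
residual = estimate_cfo_hz(compensate_cfo(center, raw_estimate, rate_hz), rate_hz)
```

Phase unwrapping only works while the phase advances by less than π between consecutive samples, which is why keeping the residual CFO small (below 100 Hz in the text) matters before the regression is applied.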
After that, we calculate the CFR estimation for each symbol and subcarrier based on Eq. (1).

Figure 5. Performance of the block PCA algorithm. (a) CFR signals in different subcarriers, from top to bottom: #1∼#5, #81∼#85, and #181∼#185 (CFR amplitude versus time (s)). (b) Block PCA results. The first principal components correspond to subcarriers #1∼#20, #81∼#100, and #181∼#200 (block PCA versus time (s)).

B. Data Preprocessing

The Data Preprocessing module takes the CFR values and performs the following two steps: noise removal and block principal component analysis.

Noise Removal: We first reduce the impact of multipath interference by using directional antennas. Compared to omnidirectional antennas, directional antennas amplify signals in the beam direction and reject signals from other directions. Figure 4 compares the CFR captured by a directional antenna and an omnidirectional antenna on one of the 200 subcarriers at a distance of 10 meters. Due to the high noise level, the keystroke movements are submerged in the noisy signal collected by the omnidirectional antenna. With the directional antenna, however, we can easily identify the CFR variations corresponding to each keystroke event.

The raw signals captured by directional antennas are still distorted by high-frequency noise. As the hand/finger movements in keystroke input induce CRS variations with frequencies between 2 and 30 Hz [12], we use a moving-average filter to remove the high-frequency noise. Figure 5(a) shows the signal after the low-pass filter at selected subcarriers.

Block Principal Component Analysis: Most of the CFR samples are redundant, so they introduce unnecessary computational cost in the keystroke recognition stage. We use Principal Component Analysis (PCA) to extract the leading principal components from the raw CFR signals.
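As an illustration of the moving-average low-pass step in Noise Removal above, the sketch below filters a synthetic signal made of a slow 5 Hz component (standing in for keystroke-induced variation) and a 500 Hz component (standing in for high-frequency noise). The window length is an assumption, not a value from the paper: for an N-point moving average the −3 dB point is roughly 0.443·fs/N, so N = 59 at 4,000 samples per second places it near 30 Hz.

```python
import numpy as np

def moving_average(signal, window):
    """Boxcar (moving-average) low-pass filter along the time axis."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

sample_rate_hz = 4000   # CFR samples per second per subcarrier (from the text)
window = 59             # assumed: -3 dB point near 30 Hz (0.443 * fs / window)

t = np.arange(2 * sample_rate_hz) / sample_rate_hz
slow = np.sin(2 * np.pi * 5 * t)            # stand-in for keystroke variation
fast = 0.5 * np.sin(2 * np.pi * 500 * t)    # stand-in for high-frequency noise
filtered = moving_average(slow + fast, window)

# Compare away from the edges, where "same" mode implicitly zero-pads.
core = slice(window, -window)
slow_error = np.max(np.abs(filtered[core] - slow[core]))
```

An odd window keeps the kernel symmetric around each sample, so the filter adds no delay to the slow component. A longer window suppresses more noise but also attenuates the upper end of the 2 to 30 Hz band.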
Figure 5(a) shows the waveforms of different LTE subcarriers; we can clearly observe that signals on distant subcarriers are less correlated. Based on this observation, we first divide the 200 subcarriers into 10 blocks, then perform PCA on each block and take its first principal component. Thus, the block PCA algorithm outputs ten principal components. Figure 5(b) shows an example of block PCA results for three blocks, where we can clearly observe the keystroke events. Compared to traditional PCA performed directly on all 200 subcarriers, block PCA can preserve more representative information while reducing the data size. Figure 15(b) shows that block PCA yields about an 8% performance improvement over traditional PCA.

Figure 6. Keystroke detection with smooth variance of the block PCAs. (a) Principal components (normalized amplitude versus time (s); PCA #2, #4, #6, #8, #10). (b) Keystroke detection result (moving variance versus time (s); variance, smooth variance, keystroke time, start point, end point).

V. KEYSTROKE MONITORING

In the keystroke monitoring attack, the adversary points the antenna towards the victim (ensuring that the target is within the receiving angle of the directional antenna) while the victim is typing, in order to intercept the typed content. We focus on attacking keystrokes entered on a numeric keypad, as shown in Figure 10, which is widely used on ATMs and doors for entering PINs. The attack consists of two steps: keystroke detection and keystroke recognition.

A. Keystroke detection

In the keystroke detection step, we use a moving-variance algorithm to detect each keystroke event. Figure 6 shows the keystroke detection process. We first calculate the variance of the block PCA results. Once the variance exceeds an empirically determined threshold, the system detects a keystroke event.
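The block PCA reduction and the variance-threshold detector described so far can be sketched together on synthetic data. This is only an illustration under stated assumptions: the function names are hypothetical, PCA is computed by SVD on real-valued CFR amplitudes, the moving variances of the ten components are summed into one detection signal, and the 0.1-second window, the threshold, and the reduced 200 Hz sampling rate are hand-picked for the demo rather than taken from the paper.

```python
import numpy as np

def block_pca(cfr, n_blocks=10):
    """First principal component of each block of adjacent subcarriers.

    cfr: real array of shape (n_samples, n_subcarriers).
    Returns an array of shape (n_samples, n_blocks).
    """
    components = []
    for block in np.array_split(cfr, n_blocks, axis=1):
        centered = block - block.mean(axis=0)
        # Rows of vt are principal directions, ordered by singular value.
        _u, _s, vt = np.linalg.svd(centered, full_matrices=False)
        components.append(centered @ vt[0])
    return np.column_stack(components)

def moving_variance(x, window):
    """Sliding-window variance, summed across the columns of x (assumed)."""
    kernel = np.ones(window) / window
    total = np.zeros(x.shape[0])
    for column in x.T:
        mean = np.convolve(column, kernel, mode="same")
        mean_sq = np.convolve(column ** 2, kernel, mode="same")
        total += np.maximum(mean_sq - mean ** 2, 0.0)
    return total

def detect_events(variance, threshold):
    """Return (start, end) index pairs where variance exceeds threshold.

    start is inclusive and end is exclusive.
    """
    above = np.concatenate(([False], variance > threshold, [False]))
    edges = np.flatnonzero(np.diff(above.astype(int)))
    return list(zip(edges[0::2], edges[1::2]))

# Synthetic CFR amplitude: two 0.3 s bursts of motion shared by all subcarriers.
rng = np.random.default_rng(1)
rate_hz, n_sub = 200, 200                  # assumed reduced rate for a quick demo
t = np.arange(10 * rate_hz) / rate_hz
burst = ((t >= 3.0) & (t < 3.3)) | ((t >= 6.0) & (t < 6.3))
motion = np.where(burst, np.sin(2 * np.pi * 10 * t), 0.0)
gains = 0.5 * rng.uniform(0.5, 1.0, n_sub)
cfr = 8.0 + np.outer(motion, gains) + 0.01 * rng.standard_normal((len(t), n_sub))

reduced = block_pca(cfr)                                   # 200 -> 10 columns
variance = moving_variance(reduced, window=rate_hz // 10)  # 0.1 s window
events = detect_events(variance, threshold=1.0)            # hand-picked threshold
```

In this toy setup each block's first component follows the shared motion, so the summed variance rises sharply during the two bursts and stays near the noise floor elsewhere. Real CFR data would need the threshold tuned empirically, as the text notes.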
Sometimes one keystroke movement introduces multiple separate variation peaks; we treat these peaks as one keystroke if the time interval between them is less than 0.1 second. The keystroke detection result is shown in Figure 6(b). The vertical red lines are the ground truth of the keystroke time points provided by a key logger, and the green/red dots are the start/end time points of the detected keystrokes.

After detecting a keystroke movement with its start and end points, we calculate the midpoint of these two points and take the data within a period of time around the midpoint (typically two seconds in our experiments) as the waveform of the keystroke. Our keystroke detection works well when there is no interference nearby. However, it can hardly detect a keystroke when there are objects moving around the victim. In the future, we plan to use more antennas to separate nearby objects.

B. Keystroke recognition

Existing works treat each keystroke separately by assuming that the user always returns to a given posture after each keystroke [6], [10]. In the case of continuous typing, our key observation is that the CFR measurements indicate the hand/finger movements between keys, rather than the key presses themselves. We model the process as a Hidden Markov Model (HMM) to infer the transitions between subsequent keystrokes. Note that existing works that use HMM methods to reveal text input, such as Zhuang et al. [7], are based on language models, which is significantly different from our method, and it can not