Thru-the-wall Eavesdropping on Loudspeakers via RFID by Capturing Sub-mm Level Vibration

CHUYU WANG, State Key Laboratory for Novel Software Technology, Nanjing University, China
LEI XIE∗, State Key Laboratory for Novel Software Technology, Nanjing University, China
YUANCAN LIN, State Key Laboratory for Novel Software Technology, Nanjing University, China
WEI WANG, State Key Laboratory for Novel Software Technology, Nanjing University, China
YINGYING CHEN, Electrical and Computer Engineering, Rutgers University, USA
YANLING BU, State Key Laboratory for Novel Software Technology, Nanjing University, China
KAI ZHANG, State Key Laboratory for Novel Software Technology, Nanjing University, China
SANGLU LU, State Key Laboratory for Novel Software Technology, Nanjing University, China

The unprecedented success of speech recognition methods has stimulated the wide usage of intelligent audio systems, which provides new attack opportunities for stealing user privacy by eavesdropping on loudspeakers. Effective eavesdropping methods either employ a high-speed camera, which relies on line of sight (LOS) to measure object vibrations, or utilize a WiFi MIMO antenna array, which requires a quiet environment. In this paper, we explore the possibility of eavesdropping on the loudspeaker based on COTS RFID tags, which are prevalently deployed in many corners of our daily lives. We propose Tag-Bug, which focuses on the human voice with its complex frequency bands and performs thru-the-wall eavesdropping on the loudspeaker by capturing sub-mm level vibration. Tag-Bug extracts sound characteristics through two means: (1) the vibration effect, where a tag directly vibrates due to the sound; (2) the reflection effect, where a tag does not vibrate but senses the reflection signals from nearby vibrating objects. To amplify the influence of vibration signals, we design a new signal feature referred to as the Modulated Signal Difference (MSD) to reconstruct the sound from RF signals. To improve the quality of the reconstructed sound for human voice recognition, we apply a Conditional Generative Adversarial Network (CGAN) to recover the full frequency band from the partial frequency band of the reconstructed sound. Extensive experiments on the USRP platform show that Tag-Bug can successfully capture a monotone sound when the loudness is larger than 60dB. Tag-Bug can recognize spoken numbers with 95.3%, 85.3% and 87.5% precision in free-space eavesdropping, thru-the-brick-wall eavesdropping and thru-the-insulating-glass eavesdropping, respectively. Tag-Bug can also recognize spoken letters with 87% precision in free-space eavesdropping.

CCS Concepts: • Networks → Cyber-physical networks; • Security and privacy → Mobile and wireless security.

∗ Lei Xie is the corresponding author.

Authors' addresses: Chuyu Wang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, chuyu@nju.edu.cn; Lei Xie, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, lxie@nju.edu.cn; Yuancan Lin, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, yclin@smail.nju.edu.cn; Wei Wang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, ww@nju.edu.cn; Yingying Chen, Electrical and Computer Engineering, Rutgers University, New Brunswick, USA, yingche@scarletmail.rutgers.edu; Yanling Bu, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, yanling@smail.nju.edu.cn; Kai Zhang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, mg1933091@smail.nju.edu.cn; Sanglu Lu, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, sanglu@nju.edu.cn.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2021 Association for Computing Machinery. 2474-9567/2021/12-ART182 $15.00
https://doi.org/10.1145/3494975

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 5, No. 4, Article 182. Publication date: December 2021.
[Fig. 1. Thru-the-wall eavesdropping via RFID tags. (a) Application scenario: a voice assistant system surrounded by RFID-tagged objects. (b) Tag-Bug: acoustic thru-the-wall eavesdropping. The attacker's RFID antennas (TX/RX) behind the wall emit ① a continuous wave; ② sounds from the loudspeaker cause ③ the vibration effect or the reflection effect at the RFID tag; ④ the backscattered signal and the leakage signal are received, and signal features (in-phase/quadrature) are extracted.]

Additional Key Words and Phrases: Eavesdropping, RFID, Sub-mm Level Vibration

ACM Reference Format:
Chuyu Wang, Lei Xie, Yuancan Lin, Wei Wang, Yingying Chen, Yanling Bu, Kai Zhang, and Sanglu Lu. 2021. Thru-the-wall Eavesdropping on Loudspeakers via RFID by Capturing Sub-mm Level Vibration. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5, 4, Article 182 (December 2021), 25 pages. https://doi.org/10.1145/3494975

1 INTRODUCTION

Acoustic eavesdropping is one of the most significant security concerns, because voice communication between people is an unencrypted transmission channel from which sensitive information is easy to obtain. Traditional acoustic eavesdropping methods, which employ hidden or tampered microphones [8, 23], can be prevented by using soundproof insulation. Because of such insulation, users may overlook the risk of acoustic eavesdropping in these scenarios, which makes the loudspeaker a potential eavesdropping target. In particular, benefiting from the unprecedented advances in speech recognition, intelligent audio systems have been widely integrated into our daily life, which greatly extends the usage of loudspeakers and brings new attack opportunities. For example, Google Home may play back passwords when the user has activated the 'Remember' function to record private information. Private information, e.g., daily schedules, passwords and even lifestyle, may then be leaked. Another example is online meetings, which brought great convenience to many companies and employees working from home during COVID-19. However, all these meetings rely heavily on loudspeakers, which may lead to severe leakage of personal and corporate proprietary information.
Due to these severe consequences, there have been active research efforts on eavesdropping on loudspeakers. Davis et al. leverage a high-speed camera to capture the vibrations of objects (e.g., a glass of water or a potted plant) caused by the loudspeaker and thereby perceive the sound [10], which relies on the existence of a line-of-sight path. Sensors such as gyroscopes embedded in a smartphone have also been exploited to capture the sound from the loudspeaker [26]. This approach works only when the sensor shares a common medium with the loudspeaker, so it does not work for thru-the-wall eavesdropping. It is also limited by the battery power of mobile devices. ART eavesdropper uses wireless signals to perceive the vibration of the loudspeaker diaphragm based on a specific MIMO antenna array [37]. This solution requires relatively costly hardware (i.e., a MIMO antenna array) and works mostly in quiet environments: any nearby vibration, e.g., a spinning fan, can affect the received signal. Some recent work has shown that Ultra High Frequency (UHF) RFID tags can capture tiny vibrations. TagSound [20] perceives monotone sound vibration by using harmonic signals, and others [40, 41] capture ambient vibrations from the phase variation by using compressive sensing. However, the harmonic signals are too weak for thru-the-wall eavesdropping, and compressive sensing cannot be used to extract the human voice, whose frequency bands are non-sparse.
In this paper, we explore the possibility of eavesdropping on the human voice played by the loudspeaker based on the surrounding COTS RFID tags, which could be attached to many everyday objects as shown in Figure 1(a). On one hand, many daily products purchased online, such as water bottles, delivery packages, hang tags, envelopes, books, etc., come with RFID tags. This greatly increases the chance of RFID tags appearing in our lives and makes the tags easy to overlook. On the other hand, the adversary can even intentionally hide battery-less and lightweight RFID tags beside the loudspeaker, e.g., under the table, where they are hard to detect and can eavesdrop over the long term. As shown in Figure 1(b), we develop Tag-Bug, an effective system that performs thru-the-wall eavesdropping on the loudspeaker based on the received physical-layer signals. Similar to previous attacks [6, 17, 26], we consider the loudspeaker, which is widely used in voice assistant systems such as Google Home and Amazon Alexa, as the sound source, rather than live human speech. The reason is that live human speech mainly produces air flow from the mouth with only a small vibration of the vocal cords, while the loudspeaker mainly produces diaphragm vibration. Thus, in the extracted sound, human speech can be drowned out by the vibration due to the air flow. In particular, Tag-Bug can extract the sound from the loudspeaker in two ways: (1) the vibration effect, where the tag itself vibrates due to the sound, e.g., a tag attached to a delivery package vibrates directly due to the playing sound; (2) the reflection effect, where the tag does not vibrate but senses the reflection signals from nearby objects that vibrate due to the sound, e.g., the tag captures the reflection signal from a cup of water, which vibrates due to the playing sound. To extract the tiny vibration caused by the sound, we build a model to decompose the received signals and extract the Modulated Signal Difference (MSD) as the vibration indicator. Since the RFID tag is more sensitive to low-frequency sound due to its larger sound energy, we leverage a Conditional Generative Adversarial Network (CGAN) to recover the high-frequency band by referring to the low-frequency band, so as to improve the quality of the recovered human voice.

There are three main challenges in performing eavesdropping via RFID tags. The first challenge is to detect the sub-mm level vibration caused by the sound. The vibration of a loudspeaker diaphragm is usually smaller than 1mm [16]. However, such a tiny vibration results in a phase change below 0.04 radians, which is close to the noise level [39]. To address this challenge, we build a transmitting model and extract amplified vibration features from the received signal. Specifically, we extract the Modulated Signal Difference (MSD) as the difference between the signals in the ON and OFF modulation states. The phase change of the MSD indicates the tag displacement due to the vibration. Furthermore, we propose the amplified MSD, obtained by subtracting the average signal of the OFF states in a time window. The amplified MSD can extract the sound from either the vibration effect or the reflection effect. In this way, Tag-Bug can extract the sub-mm level vibration when either the tag itself or a nearby object vibrates due to the sound wave.

The second challenge is to reduce the interference of the periodic commands sent by the RFID reader. In RFID systems, the periodic reader signal, e.g., the QUERY and ACK commands, is much stronger than the backscattered signal from the tag. Even if the reader signal does not overlap with the tag signal in the time domain, the periodic reader signal introduces large noise in the frequency band when received by the antenna. To address this challenge, we randomize the tag response mechanism based on the C1G2 protocol. In particular, we randomly set the frame size of each query cycle and let the tag randomly retransmit its EPC. The noise due to the periodic commands can then be significantly reduced.

The third challenge is to refine the recovered human voice extracted from the amplified MSD. The human voice is the main target of the eavesdropping. However, limited by the inherent material characteristics of RFID tags, the high-frequency bands are very weak in the sound extracted from the amplified MSD, so the recovered sound is too unclear for recognition. To address this challenge, we investigate the correlation of signals with different frequencies and find that high-frequency signals are usually harmonics of low-frequency signals. To efficiently capture the correlation among different frequency bands, we develop a CGAN to recover the full frequency band by referring to multiple low frequencies. In this way, the refined sound has a more complete frequency band and can be recognized more accurately.
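To make the first challenge concrete, the sketch below works through the phase arithmetic and the MSD idea in a toy model. It is only an illustration under stated assumptions, not the transmitting model built later in the paper: the 920 MHz carrier, the static leakage term `LEAKAGE`, the backscatter amplitude `TAG_GAIN`, the 1 m tag distance and all function names are hypothetical choices made for this example. The received sample is modelled as a strong static leakage plus, in the ON state only, a weak tag term whose phase rotates by 4πd/λ over the round trip. Subtracting the window-averaged OFF-state sample from each ON-state sample isolates the tag term, so its phase tracks the displacement directly.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0      # m/s
CARRIER_HZ = 920e6                  # assumed carrier inside the UHF RFID band
WAVELENGTH = SPEED_OF_LIGHT / CARRIER_HZ
LEAKAGE = 0.30 - 0.10j              # assumed static leakage/clutter term
TAG_GAIN = 0.01                     # assumed (weak) backscatter amplitude


def backscatter_phase(distance_m):
    """Phase rotation accumulated over the round trip reader -> tag -> reader."""
    return -4.0 * math.pi * distance_m / WAVELENGTH


def received_sample(distance_m, tag_on):
    """Toy baseband sample: leakage always present, tag term only in the ON state."""
    sample = LEAKAGE
    if tag_on:
        sample += TAG_GAIN * cmath.exp(1j * backscatter_phase(distance_m))
    return sample


def amplified_msd(on_samples, off_samples):
    """Subtract the mean OFF-state sample of the window from every ON-state sample."""
    off_mean = sum(off_samples) / len(off_samples)
    return [s - off_mean for s in on_samples]


def simulate(amplitude_m=0.5e-3, tone_hz=200.0, sample_rate=4000.0, n=400):
    """Return (raw ON-state phases, MSD phases) for a tag vibrating around 1 m."""
    distances = [1.0 + amplitude_m * math.sin(2 * math.pi * tone_hz * k / sample_rate)
                 for k in range(n)]
    on = [received_sample(d, True) for d in distances]
    off = [received_sample(d, False) for d in distances]
    raw_phases = [cmath.phase(s) for s in on]
    msd_phases = [cmath.phase(s) for s in amplified_msd(on, off)]
    return raw_phases, msd_phases


def peak_to_peak(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


raw, msd = simulate()
print(f"raw ON-state phase swing: {peak_to_peak(raw):.5f} rad")
print(f"MSD phase swing:          {peak_to_peak(msd):.5f} rad")
```

With these assumed values, a tag moving ±0.5 mm (1 mm peak to peak) yields an MSD phase swing of 4π · 1 mm / λ, which stays just under the 0.04 radians quoted above, while the phase of the raw ON-state sample moves far less because the leakage dominates it.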
This paper makes three contributions. First, we show the possibility of using low-cost and easily overlooked RFID tags to effectively perform thru-the-wall eavesdropping, pushing the limit of RFID sensing capability to the sub-mm level. In particular, Tag-Bug can extract the sound vibration from either the vibration effect or the reflection effect, which improves the applicability of our system. Second, we build a signal transmitting model to extract the vibration from the amplified Modulated Signal Difference (MSD) by removing the strong interference, and we design a CGAN-based method to improve the quality of the recovered human voice. Third, we implement Tag-Bug on the USRP platform. Real-world experiments show that Tag-Bug can successfully capture a monotone sound when the loudness is larger than 60dB. Tag-Bug can recognize spoken numbers with 95.3%, 85.3% and 87.5% precision in free-space eavesdropping, thru-the-brick-wall eavesdropping and thru-the-insulating-glass eavesdropping, respectively. Tag-Bug can also recognize spoken letters with 87% precision in free-space eavesdropping.

2 PROBLEM FORMULATION

In this paper, we consider the novel problem of launching side-channel eavesdropping on the loudspeaker by leveraging the vibration of ambient RFID tags due to the sound. Our attack mainly focuses on the sound played by the loudspeaker, rather than the voice of live human speech, because live human speech mainly produces air flow rather than the air vibration due to the sound. As a result, for live speech, the vibration extracted from the tag signal is related to the air flow instead of the human voice. In this paper, we use the USRP platform to extract the sound because it offers convenient access to the physical-layer signal.
2.1 Attack Model

We assume a victim user with a loudspeaker and some surrounding objects that carry passive RFID tags. Since RFID tags are widely used to identify objects in both online shopping and unmanned supermarkets, any tagged object can be a potential threat to user privacy. For example, the labeling tags on delivery packages or the hang tags of clothes bought online may all open up a window of opportunity for eavesdropping. Besides, the adversary can even intentionally hide battery-less and lightweight RFID tags beside the loudspeaker, e.g., under the table, where they are hard to detect and can eavesdrop over the long term. In this paper, we mainly focus on private information made up of numbers or letters, e.g., a social security number, a password, a credit card number, etc.

The adversary leverages an RFID system that can interrogate the RFID tags, even in a thru-the-wall scenario, and further extracts the sound from the RF signal and deduces the private information. Once any tag is placed beside the loudspeaker, the RF signal backscattered by the tag can capture the sound vibration. In particular, the tag can be directly vibrated by the sound (the vibration effect) or affected by a nearby vibrating object (the reflection effect). The adversary continuously collects the RF signals and extracts the sound information while the loudspeaker is playing audio, e.g., a conversation during an online meeting. By analyzing the spectrogram energy distribution, the adversary can extract the sound from the RF signals to deduce the private information, even if the adversary is outside the victim's room.
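As a rough illustration of what analyzing the spectrogram energy distribution can mean in practice, the following minimal sketch computes a short-time energy spectrogram of a sampled vibration signal and reports the frequency bin that carries the most energy. It is not the paper's processing pipeline: the Hann window, the frame length of 64, the hop of 32, the 1600 Hz sample rate, the synthetic 200 Hz test tone and all function names are assumptions chosen for this example, and a naive DFT is used so that the block needs only the Python standard library.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def frame_energy(frame, window):
    """Energy per non-negative frequency bin of one windowed frame (naive DFT)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, window)]
    energies = []
    for b in range(n // 2 + 1):
        acc = sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
        energies.append(abs(acc) ** 2)
    return energies


def spectrogram(signal, frame_len=64, hop=32):
    """List of per-frame energy spectra computed over overlapping frames."""
    window = hann(frame_len)
    return [frame_energy(signal[s:s + frame_len], window)
            for s in range(0, len(signal) - frame_len + 1, hop)]


def dominant_frequency(spec, sample_rate, frame_len=64):
    """Frequency (Hz) of the bin with the largest energy summed over all frames."""
    totals = [sum(column) for column in zip(*spec)]
    peak_bin = max(range(len(totals)), key=totals.__getitem__)
    return peak_bin * sample_rate / frame_len


SAMPLE_RATE = 1600.0  # assumed sampling rate of the extracted vibration signal
tone = [math.sin(2 * math.pi * 200.0 * k / SAMPLE_RATE) for k in range(256)]
spec = spectrogram(tone)
print(dominant_frequency(spec, SAMPLE_RATE))
```

For speech, the energy spreads over many bins and changes from frame to frame, so the per-frame spectra rather than a single dominant bin would be the quantity of interest.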
2.2 Eavesdropping Scenarios

The side-channel attack described in this paper can be launched via three different means: medium-based, aerial-based and reflection-based eavesdropping. Medium-based eavesdropping means that the tag is directly attached to the vibrating medium, e.g., the loudspeaker. Hence, the sound transmission leads to a tiny vibration of both the medium and the tag. Aerial-based eavesdropping means that the tag vibrates due to the aerial sound played by the loudspeaker; recent work [6, 17, 26] has already shown the feasibility of capturing aerial sound using motion sensors. Both medium-based and aerial-based eavesdropping methods leverage the vibration effect
182:4 • Wang et al. This paper makes three contributions. First, we show the possibility of using low-cost and easily-overlooked RFID tags to e!ectively perform the thru-the-wall eavesdropping, pushing the limit of RFID sensing capability to the sub-mm level. Particularly, Tag-Bug can extract the sound vibration either from the vibration e!ect or the re"ection e!ect, improving the applicability of our system. Second, we build a signal transmitting model to extract the vibration from the ampli$ed Modulated Signal Di!erence (MSD) by removing the strong interference. A CGAN based method is designed to improve the quality of the recovered human voice. Third, we implemented our system Tag-Bug on the USRP platform. Real-world experiments show that Tag-Bug can successfully capture the monotone sound when the loudness is larger than 60dB. Tag-Bug can e#ciently recognize the numbers of human voice with 95.3%, 85.3% and 87.5% precision in the free-space eavesdropping, thru-the-brick-wall eavesdropping and thru-the-insulating-glass eavesdropping, respectively. Tag-Bug can also accurately recognize the letters with 87% precision in the free-space eavesdropping. 2 PROBLEM FORMULATION In this paper, we consider the novel problem of launching the side-channel eavesdropping on the loudspeaker by leveraging the vibration of ambient RFID tag due to the sound. Our attack mainly focuses on the sound played by the loudspeaker, rather than the voice of live human speech, because the live human speech mainly leads to the air "ow instead of the air vibration due to the sound. As a result, the vibration extracted from the tag signal is related to the air "ow, instead of the human voice. In this paper, we use the USRP platform to extract the sound due to the convenient access to the physical-layer signal. 2.1 A!ack Model We assume a victim user with a loudspeaker and some surrounding objects, which are attached with the passive RFID tags. 
Since RFID tags are widely used to identify objects in both online shopping and unmanned supermarkets, any tagged object can be a potential threat to user privacy. For example, the labeling tags on delivery packages or the hang tags of clothes bought online may open up a window of opportunity for eavesdropping. Besides, the adversary can even intentionally hide battery-less and lightweight RFID tags beside the loudspeaker, e.g., under the table, where they are hard to detect and can eavesdrop over the long term. In this paper, we mainly focus on private information that is made up of numbers or letters, e.g., a social security number, a password, a credit card number, etc. The adversary leverages an RFID system that can interrogate the RFID tags, even in the thru-the-wall scenario, and further extracts the sound from the RF-signal to deduce the private information. Once any tag is placed beside the loudspeaker, the RF-signal backscattered by the tag can capture the sound vibration. Particularly, the tag can be directly vibrated by the sound due to the vibration effect, or affected by a nearby vibrating object due to the reflection effect. The adversary continuously collects the RF-signals and extracts the sound information when the loudspeaker is playing an audio sound, e.g., a conversation during an online meeting. By analyzing the spectrogram energy distribution, the adversary can extract the sound from the RF-signals to deduce the private information, even if the adversary is outside the victim's room.

2.2 Eavesdropping Scenarios

The side-channel attack described in this paper can be launched via three different means: medium-based, aerial-based and reflection-based eavesdropping. Medium-based eavesdropping means the tag is directly attached to the vibration medium, e.g., the loudspeaker. Hence, the sound transmission can lead to the tiny vibration of the medium and the tag.
Aerial-based eavesdropping means that the tag is vibrated by the aerial sound played by the loudspeaker; recent work [6, 17, 26] has already shown the feasibility of capturing aerial sound using motion sensors. Both medium-based and aerial-based eavesdropping leverage the vibration effect
to extract the sound information. Reflection-based eavesdropping means that a tag does not vibrate itself, but is instead affected by the vibration of a nearby object, e.g., a cup of water.

Fig. 2. Signal analysis of USRP signal for vibration sensing. Panels: (a) Experiment setup for empirical study (ImpinJ reader, USRP TX/RX, tag, loudspeaker); (b) Signal in time domain (amplitude and phase versus time (s), USRP and COTS at 100Hz and 300Hz; annotations: "100Hz wave", "300Hz jitters"); (c) Signal in frequency domain (annotations: "100Hz can be detected", "Only USRP detects 300Hz", "Noise").

Online meeting. One possible attack scenario is that the victim is using a loudspeaker to talk in an online meeting, a setting that has been used frequently during the COVID-19 period. The adversary can leverage the surrounding RFID tags to eavesdrop on the sound played by the loudspeaker. As a result, the sensitive information discussed during the online meeting can be obtained by the adversary, which may threaten the victim's personal life and property safety.

Voice assistant system. With the success of AI techniques in speech recognition, intelligent voice assistant systems, e.g., Google Home and Amazon Echo Dot, are widely used due to their convenience. The voice assistants may replay messages that include private information, e.g., Google Home can remember passwords or a social security number with the 'Remember' function and replay them when needed. Such replayed sounds from the loudspeaker open up the possibility of the adversary eavesdropping on the private information.

3 FEASIBILITY STUDY

In this section, we use several experiments to study the feasibility of extracting the sound vibrations via RFID tags.
Particularly, we focus on the mono-tone sound vibration to study the sensitivity of the RFID tags, which can be extended to the human voice.

3.1 COTS RFID Reader vs. USRP Reader

We first compare the COTS RFID reader with the USRP reader in terms of sensing the tag vibration. We place the tag in front of the loudspeaker, as shown in Figure 2(a). We study the impact of mono-tone sounds with frequencies of 100Hz and 300Hz. By default, the COTS ImpinJ Speedway R420 RFID reader [3] has a sampling rate of 228Hz, while the USRP reader based on an open project [4] has a sampling rate of 2MHz.
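To illustrate why the sampling rate matters for these two tones, the following minimal Python sketch assumes ideal uniform sampling of a pure sine. Real COTS readers may not sample uniformly, so this is only an approximation. The helper names `apparent_frequency` and `dominant_frequency` are hypothetical and are not part of Tag-Bug. Under this assumption, a 228Hz sampling rate gives a Nyquist limit of 114Hz, so a 100Hz tone is preserved while a 300Hz tone would fold down to a lower frequency; at 2MHz both tones lie far below the Nyquist limit.

```python
import cmath
import math


def apparent_frequency(f_tone: float, fs: float) -> float:
    """Frequency (Hz) at which a real tone of f_tone appears after sampling at fs.

    Assumes ideal uniform sampling (an assumption of this sketch).
    """
    f = f_tone % fs          # sampling cannot distinguish f from f + k*fs
    return min(f, fs - f)    # a real signal folds around the Nyquist frequency fs/2


def dominant_frequency(f_tone: float, fs: float, duration: float = 1.0) -> float:
    """Sample a unit sine at fs and return the frequency of its strongest DFT bin."""
    n = int(round(fs * duration))
    samples = [math.sin(2 * math.pi * f_tone * i / fs) for i in range(n)]
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # skip DC, scan bins up to the Nyquist bin
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * fs / n


if __name__ == "__main__":
    for tone in (100, 300):
        print(tone, "Hz tone at 228 Hz sampling appears at",
              dominant_frequency(tone, 228), "Hz")
        print(tone, "Hz tone at 2 MHz sampling appears at",
              apparent_frequency(tone, 2_000_000), "Hz")
```

In this idealized model, the 300Hz tone sampled at 228Hz shows up at 72Hz rather than at 300Hz, which is consistent with the annotation in Figure 2(c) that only the USRP reader detects the 300Hz tone. The naive DFT is used only to keep the sketch dependency-free; it is run at the low rate, while the 2MHz case is evaluated with the closed-form folding rule.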