Smartphone Privacy Leakage of Social Relationships and Demographics from Surrounding Access Points

Chen Wang∗, Chuyu Wang∗†, Yingying Chen∗, Lei Xie† and Sanglu Lu†
∗Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, USA
{cwang42, yingying.chen}@stevens.edu
†State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China
wangcyu217@dislab.nju.edu.cn, {lxie, sanglu}@nju.edu.cn

Abstract: While mobile users enjoy anytime, anywhere Internet access by connecting their mobile devices through Wi-Fi services, the increasing deployment of access points (APs) has raised a number of privacy concerns. This paper explores the potential of smartphone privacy leakage caused by surrounding APs. In particular, we study to what extent users’ personal information, such as social relationships and demographics, could be revealed by leveraging simple signal information from APs without examining the Wi-Fi traffic. Our approach utilizes users’ activities at daily visited places, derived from the surrounding APs, to infer users’ social interactions and individual behaviors. Furthermore, we develop two new mechanisms: the Closeness-based Social Relationships Inference algorithm captures how closely people interact with each other by evaluating their physical closeness and derives fine-grained social relationships, whereas the Behavior-based Demographics Inference method differentiates various individual behaviors via the extracted activity features (e.g., activeness and time slots) at each daily place to reveal users’ demographics. Extensive experiments conducted with 21 participants’ real daily lives, including 257 different places in three cities over a 6-month period, demonstrate that the simple signal information from surrounding APs has a high potential to reveal people’s social relationships and infer demographics with over 90% accuracy when using our approach.

I. INTRODUCTION

Wi-Fi networks are becoming increasingly pervasive, to the point where public Wi-Fi access is readily in place in numerous cities [1]. The number of public Wi-Fi Access Points (APs) is expected to hit 340 million globally by 2018, resulting in one public Wi-Fi AP for every twenty people worldwide [2]. More commonly, retail stores, offices, universities and homes are Wi-Fi enabled to provide high-bandwidth and cost-effective Internet connectivity for mobile users. While mobile users enjoy anytime, anywhere Internet access by connecting their mobile devices (e.g., smartphones) to Wi-Fi networks, the surrounding APs have raised a number of privacy concerns. For example, mobile users could be located and tracked based on the ubiquitous APs, such as by using Google location service [3].

In this work, we study the potential of privacy leakage caused by surrounding APs and explore to what extent personal information, in particular users’ social relationships and demographics, could be derived. Prior work on demographics inference based on Wi-Fi networks mainly relies on the context information obtained from passively sniffed users’ Wi-Fi traffic [4], [5]. For example, Cheng et al. examine users’ Internet browsing activities by collecting their in-the-air traffic in public hotspots [4], whereas Huaxin et al. infer user demographic information by passively sniffing the Wi-Fi traffic meta-data [5]. These methods need to examine the Wi-Fi traffic and are thus not scalable to a large number of users due to the high deployment overhead involved. Existing work on social relationships inference primarily depends on the encounter events detected by either Bluetooth [6], Wi-Fi SSID list [7], or GPS locations [8]. These approaches can only perform coarse-grained social relationships inference by examining whether users have interactions or not, instead of studying users’ behaviors and how closely they interact with each other.
They can neither provide fine-grained social relationships (such as advisor-student, colleagues, friends, husband-wife, neighbors) nor identify the specific role of the user in the relationship.

It is known that GPS, motion sensors and contact lists on mobile devices can leak private information, but how much of a user’s privacy could be leaked from the ubiquitous access points is unclear. In this work, we demonstrate that by examining the simple signal features of the surrounding APs it is possible to infer users’ fine-grained social relationships and demographics without sniffing any Wi-Fi traffic. Specifically, mobile devices periodically scan for the availability of surrounding Wi-Fi APs, because their default system behavior is to optimize network service by continuously seeking better Wi-Fi signals and remembered APs [9], [10], and accessing such information only requires a common permission, which is considered low-risk [11]. Signal features such as the time-series of BSSIDs (i.e., MAC addresses) and Received Signal Strength (RSS) are then extracted from these scanned APs and analyzed to derive users’ activities at daily visited places. Our system exploits the rich information of users’ daily interactions and
behaviors embedded in these derived activities and discloses fine-grained social relationships (including advisor-student, supervisor-employee, colleagues, friends, husband-wife and neighbors) as well as demographic information (such as occupation, gender, religion and marital status).

Our approach of using simple signal features of APs can be easily applied to a large number of users. For example, advertisers or third-party companies could mine users’ personal information for targeted advertising or service recommendation. However, such an approach could cause significant privacy leakage if it is utilized by advertisers with aggressive business intentions, who could simply publish free apps that actively collect users’ surrounding AP information and send it back to a server to derive users’ social relationships and demographics.

In particular, we describe people’s daily places in three dimensions (i.e., temporal, spatial and contextual) to infer people’s activities at each place. For users performing activities at the same place, we calculate the physical closeness of the users (e.g., whether they stay in the same room, in adjacent rooms or inside the same building) and extract users’ activeness (e.g., walking around or sitting) together with other features (e.g., time slots and duration) to characterize their activities at daily places. We then develop the Closeness-based Social Relationships Inference algorithm to capture where, when and how closely people interact, in order to derive fine-grained social relationships. We design the Behavior-based Demographics Inference method to capture individual behavior based on users’ various daily activities and to reveal demographic information including occupation, gender, religion and marital status.
We conduct extensive experiments with 21 participants carrying their smartphones to collect surrounding Wi-Fi AP information in their real daily lives across three cities over 6 months, and study to what extent we can derive these participants’ social relationships and demographic information.

We summarize our main contributions as follows:

• We demonstrate that simple signal information (e.g., time-series of MAC addresses and RSS) from users’ surrounding Wi-Fi APs can reveal private information including both social relationships and demographics.
• We develop statistical methods to detect and characterize users’ daily visited places based on the AP signal information, and further infer the context of daily places by deriving users’ activity features (e.g., activeness, time slots and duration).
• We design a closeness-based social relationships inference algorithm to analyze when, where and how closely users interact with each other and to reveal users’ detailed social relationships (e.g., advisor-student, supervisor-employee, colleagues, friends, husband-wife, customer relationship and neighbors).
• We further abstract people’s various behaviors (e.g., home, working and leisure behaviors) to infer their demographic information such as occupation, gender, religion and marital status.
• We show with an experimental study of 21 participants that by using our system one can achieve over 91% accuracy in inferring social relationships and over 90% accuracy in deriving demographic information by examining the simple signal features from surrounding APs.

II. RELATED WORK

In this work, we aim to understand the privacy leakage of smartphone users, in particular discovering users’ social relationships and demographics, by analyzing only the availability of surrounding APs without sniffing any Wi-Fi traffic. Obtaining such information requires only limited permission, unlike turning on GPS or accessing contact lists.
Our work is related to the research efforts in using various information collected from Wi-Fi networks and/or smartphones for meaningful places extraction [12]–[15], social relationships inference [6], [7], [16]–[18], and demographics derivation [4], [5], [19].

As the contextual location can be used for learning a person’s interests and providing content-aware applications, there have been active studies on extracting the contextual meaning of the locations people visit. For example, Kang et al. design a cluster-based method to extract meaningful places from traces of location coordinates collected from GPS and a Wi-Fi-based indoor location system [12]. Kim et al. propose SensLoc, which utilizes a combination of acceleration, Wi-Fi, and GPS sensors to find semantic places, detect user movements, and track travel paths [13]. These existing methods, however, focus only on individual users’ visited locations without analyzing the interactions between users. Besides, the obtained meaningful places may not be sufficient to infer higher-level personal information, such as fine-grained social relationships and demographics, due to the lack of information about the users’ daily behaviors and social interactions.

Information in Wi-Fi networks and smartphones has been used in the literature to infer users’ social relationships. For example, Wiese et al. [16] use the smartphone contact list to mine personal relationships. Moreover, the similarity of smartphones’ SSID lists is used to reveal users’ social relationships [7]. These methods can only derive coarse-grained social relationships without analyzing the behaviors and interactions among people. Vicinity detection via Bluetooth or Wi-Fi signals opens opportunities for social interaction analysis, and the strength of friendship ties can be inferred from such wireless signals [6], [18].
However, these vicinity detection methods only consider the relative interaction between people without interaction context (e.g., place context and behaviors). They are unable to differentiate the specific types of social relationships, such as family members and friends. Our previous work focuses on extracting social relationships from information leaked by smartphone Apps, such as GPS location, IMEI and network location [20]. It could only derive social relationships in a coarse-grained manner. In this paper, we take a closer look and study the privacy leakage from the surrounding APs alone, deriving people’s activities and various closeness levels of social interactions to infer detailed relationships and demographic information.
More recently, Wi-Fi traffic monitoring and smartphone Apps have been used to infer users’ demographic information. For example, Cheng et al. examine the user’s Internet browsing activities (e.g., domain name querying, web browsing) by collecting their Wi-Fi traffic in public hotspots [4]. They are able to reveal the travelers’ identities, locations or social privacy. Huaxin et al. design an approach to infer user demographic information by sniffing the Wi-Fi traffic meta-data [5]. Seneviratne et al. design a system to predict various user traits by analyzing a snapshot of installed Apps [19]. Unlike the above work, we study the capability of the simple signal information of surrounding APs to derive demographic information without sniffing any Wi-Fi traffic or examining the installed Apps.

III. SYSTEM DESIGN

A. Preliminaries

Environment-Behavior research reveals that an individual’s activities, such as work-related, household and leisure activities, are related to the places they visit [21]. Such activities at daily visited places can be analyzed and mined to infer users’ personal information such as social relationships and demographics [22]. Thus, by leveraging the users’ activities at daily places as a bridge, we could start from the non-contextual surrounding AP information to infer users’ social relationships and demographics. This connection is depicted in Figure 1(a). The surrounding Wi-Fi APs reflect users’ surrounding wireless environments, which can be utilized to determine users’ daily visited places and activities. The daily places in our work refer to the abstract locations that users visit in their daily lives, such as home, workplace, restaurants, stores and churches. By analyzing users’ activities at daily places, we could derive the social interactions between users and abstract individuals’ behaviors. Such information is then further utilized to mine users’ social relationships and demographics.
Note that, contrary to the existing work in social relationships and demographics inference, we only utilize the simple signal information of the available surrounding APs, without sniffing any Wi-Fi traffic contents.

To study how the surrounding APs can be utilized to detect a user’s daily places and activities, we conduct preliminary experiments by recording the APs on the user’s smartphone at a regular rate of one scan per 15 seconds, because a Wi-Fi device usually scans every 5 to 15 seconds to provide the user with uninterrupted Wi-Fi connectivity as the user changes places [23], [24]. Figure 1(b) shows the recorded time-series of a user’s surrounding APs (differentiated by BSSIDs) for one day, as well as the ground truth of visited places. As an AP index is assigned to each unique AP in sequence, a later-observed AP has a larger index. The observation is that the detected AP lists have large overlaps when the user stays at the same place, while the AP lists are distinct when the user moves to a different daily place. This suggests that we may utilize the changes of the observed AP list to detect the user’s daily visited places as well as the entrance/departure time and the staying duration. Moreover, the user’s activities at daily places (e.g., the user’s mobility at work and during leisure time) can be derived to reflect individual demographics. Furthermore, we observe that the same place or places in the neighborhood may share some APs (e.g., office and restaurant 1). Their physical closeness may be obtained by checking how many surrounding APs they share, which is useful for analyzing social interactions.

[Fig. 1. Preliminary studies. (a) Connection from surrounding APs to social relationships & demographics. (b) Illustration of observed APs by a user’s smartphone in one day.]

B. Challenges

Robust Daily Places and Activity Detection Using APs.
Lacking prior knowledge of the AP deployment, accurate and robust detection of daily places and activities from ubiquitous APs is challenging, and the widespread presence of unstable and mobile APs adds to the difficulty. Additionally, the daily places need to be abstracted with sufficient spatial resolution (e.g., differentiating rooms and floors) for further deriving users' mobility and their physical closeness during interaction.

Determining the Context of Daily Places. Deriving the context of a user's daily visited places from non-contextual AP signal information is challenging. Moreover, a place may exhibit different contexts to different users. For example, stores are leisure places to most people but the workplace of the store staff. This requires us to look for the deeper implication behind an individual's activities at the place instead of relying on the traditional place context based on the place's function.

Fine-grained Social Relationships Inference. Fine-grained relationships inference needs information not only on who interacts with whom but also on how closely they interact. Our system needs the capability to define multiple levels of closeness between users. Furthermore, specifying the role of each user in a relationship (e.g., husband or wife) may need the assistance of demographic information (e.g., gender).

Demographics Inference without Context. Inferring a user's demographics from the non-contextual simple signal information of surrounding APs is challenging. Different from previous work relying on content obtained by monitoring Wi-Fi traffic, our system explores the possibility of abstracting users' behaviors from their various activities at daily places for demographics inference.

C. System Overview

The basic idea of our system is to analyze users' activities at daily routine-based places, which are derived from users' surrounding APs, for fine-grained social relationships and demographics inference.
The proposed system takes as inputs the information of users’ surrounding APs perceived by their smartphones at each scan, including the list of AP MAC
addresses and RSS, to infer fine-grained social relationships and demographics. Figure 2 presents our system flow.

Fig. 2. Wi-Fi AP distribution-based social relationships and demographics inference framework.

First, the Staying Segment Detection and Grouping component detects and characterizes users' daily visited places in three steps. AP List-based Staying/Traveling Segmentation analyzes the overlap of the AP lists over consecutive scans and divides the time series into staying and traveling periods. Staying Segment Characterization estimates the significance of each surrounding AP by calculating its appearance rate within the staying segment. It then categorizes the APs by their significance to describe the spatial information of each staying segment. The spatially close-by staying segments are then grouped together as one unique place by using Closeness-based Staying Segment Grouping.

The next component derives the activities at daily places, which is an important building block of social relationships and demographics inference. It is carried out by Daily Place and Activity Inference, which involves Daily Routine-based Staying Segment Group Categorization and Daily Activity Feature Extraction and Fine-grained Place Context Inference. Daily Routine-based Staying Segment Group Categorization classifies the grouped staying segments (i.e., unique places) into three contextual categories (i.e., home, leisure and workplace) based on people's daily routines. Then, Daily Activity Feature Extraction and Fine-grained Place Context Inference derives people's activity features, including the staying time slots, duration and activeness, and assigns detailed contextual information to these places by leveraging the derived activity features and geo-information, such as restaurants or stores among leisure places, and campus or office buildings among workplaces.
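As a rough illustration of the first step, the sketch below segments a scan time series by AP-list overlap. It assumes each scan is a (timestamp in seconds, set of BSSIDs) pair; the window grows scan by scan while at least one AP is seen in every scan inside it, and a window is kept only if it lasts longer than a minimum duration (6 minutes, the example threshold given in Section IV-A). The function name, the data layout, and the choice to end a segment at the last scan that still shares an AP are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Set, Tuple

# Hypothetical data layout: one scan = (timestamp in seconds, set of BSSIDs seen).
Scan = Tuple[float, Set[str]]


def staying_segments(scans: List[Scan],
                     min_duration_s: float = 360.0) -> List[Tuple[float, float]]:
    """Return (start, end) times of staying segments found by AP-list overlap."""
    segments = []
    i = 0
    while i < len(scans):
        common = set(scans[i][1])  # APs seen in every scan of the current window
        j = i
        # Expand the searching window while some AP is still shared by all scans.
        while j + 1 < len(scans) and common & scans[j + 1][1]:
            common &= scans[j + 1][1]
            j += 1
        start, end = scans[i][0], scans[j][0]
        # Short windows are treated as traveling (or false staying) periods.
        if end - start > min_duration_s:
            segments.append((start, end))
        i = j + 1
    return segments


if __name__ == "__main__":
    # Two long stays separated by a one-minute walk, scanned every 15 seconds.
    demo = ([(15.0 * k, {"a", "b"}) for k in range(30)]
            + [(15.0 * k, {"c"}) for k in range(30, 35)]
            + [(15.0 * k, {"d", "e"}) for k in range(35, 65)])
    print(staying_segments(demo))
```

Requiring an AP to be present in every scan of the window is the strictest reading of "overlapped APs"; a more tolerant variant could allow occasional missed scans.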
Finally, our system infers users' social relationships and demographics based on the derived activities at daily places. In particular, it first calculates the physical closeness of the interactions between users. It then uses Interaction Segment Characterization and Closeness-based Social Relationships Classification to infer when, where and how closely people interact with each other, in order to infer their possible relationships, such as family, neighbors, colleagues, and friends. To derive a user's demographics, Behavior-based Demographics Inference applies Daily Activity-based Behavior Derivation to abstract people's various behaviors, including working behaviors, home behaviors and leisure behaviors, based on the activities at daily places. It then utilizes a Behavior-based Decision Rule to infer users' demographic information (e.g., occupation, gender, marriage and religion) based on the behavior abstraction. Lastly, Associate Reasoning can be applied to social relationships and demographics to improve the accuracy of the inference results, such as identifying the specific role of the user in a relationship (e.g., husband-wife and advisor-student).

Fig. 3. Staying/traveling segmentation leveraging dynamic searching windows to analyze the overlapped AP lists over consecutive scans.

IV. STAYING SEGMENT GROUP DETECTION AND CHARACTERIZATION

A. AP List-based Staying/Traveling Segmentation

As observed in the preliminary study of Figure 1(b), the discovered AP BSSID lists of consecutive scans have large overlaps when the user stays at the same place, while the similarity of the AP lists diminishes rapidly when the user moves to a different place. We thus take advantage of the AP list similarity (i.e., BSSID list similarity) in consecutive scans to detect the staying and traveling segments.
We define a staying segment as the Wi-Fi AP-list time-series segment that captures the temporal and spatial information when the user stays at a location. We analyze the overlap of the AP lists within a dynamic searching window of consecutive scans to perform staying segmentation. In particular, Figure 3 illustrates the proposed AP List-based Staying/Traveling Segmentation in identifying the staying segment n. The dynamic searching window starts at $t_1$ and iteratively expands to the next scan. In each iteration, we analyze the overlapped APs of all the scans within the searching window. The number of solid dots at each scanning time $t_i$ ($i = 1, 2, \ldots$) indicates the number of overlapped APs that are found within the window from $t_1$ to $t_i$. When the searching window iteratively expands to the next scan, the number of overlapped APs may decrease. When no overlapped AP is found in the expanded searching window (e.g., the window from $t_1$ to $t_m$), the searching window is identified as one possible staying segment. We note that because it may take several scans to travel out of an AP's range, this approach can
detect short staying segments even when the user is traveling. We next check whether the segment duration $T_s = t_m - t_1$ is greater than a threshold $\tau$ (e.g., $\tau = 6$ minutes) to confirm valid staying segments and filter out false ones. Meanwhile, the user's entrance/departure time and the corresponding staying duration are also obtained.

Fig. 4. AP appearance rate distribution-based staying segment characterization. (a) Appearance rates and significance of the APs in a staying segment. (b) Four kinds of closeness between staying segments A and B.

B. AP Appearance Rate Distribution-based Staying Segment Characterization

We next characterize the visited places by deriving the Wi-Fi AP appearance distribution in the detected staying segments. The discovered AP BSSID list can be used to describe the wireless environment of the user in the staying segment. However, not all APs have the same significance for characterizing the spatial information. Some APs may appear only in a few scans due to weak Wi-Fi signals, while others are more stable and appear in almost every scan. We calculate the appearance rate of each discovered AP to represent its significance, and then classify the APs into different categories based on their significance. In particular, the appearance rate of an AP is defined as $R = N_a / N$, where $N_a$ is the number of scans in which this AP appears and $N$ is the total number of scans in the detected staying segment. The appearance rates together with the BSSIDs of the discovered APs are used to characterize the spatial information of the staying segment, which has the potential both to differentiate places with good resolution and to measure people's physical closeness.

We empirically divide the APs of a staying segment into three layers $l_i$, $i = 1, 2, 3$ (i.e., lists of significant APs, secondary APs and peripheral APs) according to their appearance rate. As shown in Figure 4(a), the significant APs are those with an appearance rate larger than 80%, the peripheral APs are those with an appearance rate less than 20%, and the remaining APs are secondary APs. The spatial information of the staying segment can then be characterized by the AP set vector $L = (l_1, l_2, l_3)$, which can tolerate the noise generated by unstable APs, mobile APs or even missing AP scans.

C. Estimating Physical Closeness between Staying Segments

Measuring the physical closeness between different users' staying segments can capture how closely people interact with each other. It can also be used to group the same user's staying segments that are close to each other as one place. In particular, we leverage the AP set vector to measure the physical closeness between staying segments. Given two staying segments A and B and their AP set vectors $L_A$ and $L_B$, we calculate the closeness matrix $M$ as follows:

\[
M = L_A^{-1} L_B =
\begin{pmatrix}
r_{11} & r_{12} & r_{13} \\
r_{21} & r_{22} & r_{23} \\
r_{31} & r_{32} & r_{33}
\end{pmatrix},
\tag{1}
\]

where $r_{ij}$ is the overlapping rate between subsets $l_{Ai}$ and $l_{Bj}$ of the AP set vectors $L_A$ and $L_B$, respectively. The overlapping rate $r_{ij}$ can be obtained by

\[
r_{ij} = \frac{\mathrm{OverlapApNum}(l_{Ai}, l_{Bj})}{\min(\mathrm{Num}(l_{Ai}), \mathrm{Num}(l_{Bj}))}, \quad i, j = 1, 2, 3.
\tag{2}
\]

Based on the statistical analysis of 431 staying segments collected from 167 places in 3 cities, we empirically quantify the physical closeness expressed by the closeness matrix $M$ into five levels:

\[
\begin{cases}
C_0 = \{M : \sum_{i,j=1}^{3} r_{ij} = 0\}; & \text{(Completely separated)} \\
C_1 = \{M : r_{33} > 0 \text{ and } \sum_{i,j=1}^{3} r_{ij} - r_{33} = 0\}; & \text{(Same street block)} \\
C_2 = \{M : \sum_{i,j=1}^{3} r_{ij} - r_{33} - r_{11} > 0 \text{ and } r_{11} = 0\}; & \text{(Same building)} \\
C_3 = \{M : 0 < r_{11} < 0.6\}; & \text{(Adjacent rooms)} \\
C_4 = \{M : r_{11} \geq 0.6\}, & \text{(Same room)}
\end{cases}
\tag{3}
\]

where $C_1, C_2, C_3, C_4$ are four mutually exclusive closeness sets with increasing closeness level, as shown in Figure 4(b), representing the same street block, the same building, adjacent rooms and the same room, respectively. $C_0 = \overline{C_1 \cup C_2 \cup C_3 \cup C_4}$, the complement of their union, means that two staying segments are completely separated. We use level-$i$ closeness to express closeness in set $C_i$.

D. Physical Closeness-based Staying Segments Grouping

We note that multiple staying segments of the same user may correspond to the same place, as the user may revisit it multiple times. We thus combine these staying segments by checking whether there is level-4 closeness between them, and keep all the time slots. The grouped staying segments represent non-redundant places visited by the user and contain the user's activities. We can then characterize the user's activities at each unique place.

V. DAILY PLACE AND ACTIVITY INFERENCE

In this section, we explore to what extent we can understand the contextual information of the places visited by people and their activities at these places, which facilitates the social relationships and demographics inference.

A. Daily Routine-based Place Inference

Compared to the physical information (e.g., longitude and latitude), the contextual information (e.g., name and type) of a place contains more meaningful information related to people's social relationships and demographics.
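Looking back at Sections IV-B and IV-C, the layering of APs by appearance rate and the closeness levels of Eq. (3) can be sketched as follows. This is a minimal illustration under stated assumptions: a staying segment is given as a list of per-scan BSSID sets, an empty layer is assumed to contribute an overlapping rate of 0 (the paper does not say how the zero denominator in Eq. (2) is handled), and all function names are hypothetical. The 80%/20% and 0.6 cut-offs are the empirical values quoted in the text.

```python
from collections import Counter
from typing import List, Set, Tuple

Layers = Tuple[Set[str], Set[str], Set[str]]  # (significant, secondary, peripheral)


def ap_layers(scans: List[Set[str]]) -> Layers:
    """Split the APs of one staying segment into three layers by appearance rate."""
    n = len(scans)
    counts = Counter(bssid for scan in scans for bssid in scan)
    significant, secondary, peripheral = set(), set(), set()
    for bssid, seen in counts.items():
        rate = seen / n  # R = N_a / N
        if rate > 0.8:
            significant.add(bssid)
        elif rate < 0.2:
            peripheral.add(bssid)
        else:
            secondary.add(bssid)
    return significant, secondary, peripheral


def closeness_matrix(a: Layers, b: Layers) -> List[List[float]]:
    """3x3 matrix of overlapping rates r_ij between layer i of A and layer j of B."""
    def rate(x: Set[str], y: Set[str]) -> float:
        if not x or not y:
            return 0.0  # assumption: an empty layer shares nothing
        return len(x & y) / min(len(x), len(y))
    return [[rate(la, lb) for lb in b] for la in a]


def closeness_level(m: List[List[float]]) -> int:
    """Map a closeness matrix to level 0..4 following the five sets of Eq. (3)."""
    r11, r33 = m[0][0], m[2][2]
    # Sum of every entry except r11 and r33.
    rest = sum(m[i][j] for i in range(3) for j in range(3)
               if (i, j) not in ((0, 0), (2, 2)))
    if r11 >= 0.6:
        return 4  # same room
    if r11 > 0:
        return 3  # adjacent rooms
    if rest > 0:
        return 2  # same building (r11 == 0 here)
    if r33 > 0:
        return 1  # same street block
    return 0      # completely separated
```

Under the same assumptions, the grouping of Section IV-D would merge two staying segments of one user whenever `closeness_level(closeness_matrix(ap_layers(s1), ap_layers(s2)))` equals 4.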
To obtain such information, we exploit the simple signal information of surrounding APs (i.e., BSSIDs and RSSs) that is readily available in most mobile devices, to determine the daily place meanings of staying segments based on people’s daily routines. 1) Daily Routine-based Places: Recent reports [25], [26] indicate that people’s daily routines mainly consist of three categories of activities: 1) working and work-related activities (working activities); 2) sleeping and household activities (home activities); and 3) leisure activities. Based on the understanding of people’s daily routines, we define three categories of daily routine-based places, namely Workplace (e.g., office buildings and universities), Home, and Leisure Place (e.g., stores, restaurants, and churches), to describe contextual