System architecture (Wireless Internet, Fall 2020)
[Figure: overview of the DeepCachNet architecture. Mobile user terminals (accelerometer, GPS) feed a raw data collection module; the data are preprocessed and stored in a distributed database; a deep learning module extracts user features and content features and builds the feature-based user-content interaction (popularity) matrix; an estimation algorithm performs content popularity estimation at the core network of the mobile network operator (MNO); the small base station (SBS) performs proactive cache placement using that estimate. The SBS reaches the core network over the backhaul link and the mobile user terminals over the uplink/downlink.]
[Paper excerpt: IEEE Network, May/June 2019, p. 132]
... because they are gathered in real time. Second, the collected data span diverse kinds of sensor data, variable user preferences, location data, and user-content relationships. Finally, the quantity of data to be handled is likely to be very high, for example, on the order of terabytes. To facilitate pattern matching and location queries, a high-performance relational database is used to store certain data efficiently, whereas a distributed database is used, for scalability, to hold data that need only simple queries.

FEATURE EXTRACTION
Feature extraction is the key element of the proposed framework. Two types of features have to be extracted from the collected raw data, which are described as follows:
User Features: The popularity of content in proactive caching might vary across a user population, since different users might prefer different content. The content preference of a user might be associated with a variety of their features, as shown in Fig. 2a. User features involve the following:
• Personal characteristics of the user, such as demographic information (for example, gender and age), mood, or personality.
• The explicit context, that is, data accumulated under circumstances that are properly described, for example, the weather conditions.
• The latent context, which involves the extraction and collection of hidden or latent contextual patterns from mobile sensors to represent the user context.
Adding many types of user features increases the dimensionality of the feature vector, which in turn means that large amounts of training data have to be processed. Therefore, an auto-encoder is used to determine the relationships between the diverse features and extract them in a low-dimensional representation.
Typically, the user feature extraction procedure consists of three phases:
• Raw data is accumulated from the available data sources, for example, mobile sensors such as WiFi, GPS, microphone, active applications, and accelerometer.
• A set of features is extracted from the raw data by feature engineering, which consists of computing statistics such as dominant values, entropy, standard deviation, average, and so on.
• An unsupervised technique called an auto-encoder is applied to the extracted features to determine hidden patterns in the raw data.
An auto-encoder is an unsupervised learning method that sets the target values equal to the inputs and is trained by back-propagation. For example, given an unlabelled training dataset $\{a_1, a_2, a_3, \ldots, a_i\}$, where $a_i \in \mathbb{R}^n$, the aim of the auto-encoder is to produce an output $\hat{a}_i$ such that $\hat{a}_i = a_i$. In order to extract the long-term patterns of the users and the latent context, we use a simple auto-encoder, which is based on an existing net...
FIGURE 1. Overview architecture of the DeepCachNet framework.
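The three phases can be illustrated with a small NumPy sketch. It is not the paper's implementation: the histogram-based "dominant value", the single tanh hidden layer, the linear decoder, the learning rate, and all function names (`window_stats`, `train_autoencoder`, `encode`, `reconstruct`) are assumptions chosen for brevity. The only property taken from the text is that the training target equals the input and that the weights are fitted by back-propagation.

```python
import numpy as np

def window_stats(x, bins=8):
    """Phase 2 (assumed form): hand-crafted statistics for one window of raw readings."""
    counts, edges = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    entropy = -(p * np.log2(p)).sum()            # Shannon entropy of the histogram, in bits
    k = counts.argmax()
    dominant = 0.5 * (edges[k] + edges[k + 1])   # center of the most populated bin
    return np.array([x.mean(), x.std(), dominant, entropy])

def train_autoencoder(A, hidden_dim, lr=0.05, epochs=3000, seed=0):
    """Phase 3: fit a one-hidden-layer auto-encoder to the rows of A.

    The target is the input itself; the loss is the mean squared
    reconstruction error, minimized by full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W1 = rng.normal(0.0, 0.1, (n, hidden_dim))
    b1 = np.zeros(hidden_dim)
    W2 = rng.normal(0.0, 0.1, (hidden_dim, n))
    b2 = np.zeros(n)
    for _ in range(epochs):
        H = np.tanh(A @ W1 + b1)                 # encoder: low-dimensional code
        err = (H @ W2 + b2) - A                  # reconstruction minus target (= input)
        dH = (err @ W2.T) * (1.0 - H ** 2)       # back-propagate through tanh
        W2 -= lr * (H.T @ err) / m
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (A.T @ dH) / m
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def encode(A, params):
    """Map feature vectors to their low-dimensional representation."""
    W1, b1, _, _ = params
    return np.tanh(A @ W1 + b1)

def reconstruct(A, params):
    """Decode the low-dimensional code back to the input space."""
    _, _, W2, b2 = params
    return encode(A, params) @ W2 + b2
```

In use, each user's raw sensor windows would be turned into a feature vector with `window_stats`, the vectors stacked into a matrix `A`, and `encode(A, params)` would give the compact user representation passed on to the popularity estimation step.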
Simulation results
Simulation setup
• 6 SBSs; 60 users; 220 contents.
• Schemes compared: arbitrary caching [black line]; ground truth [green line]; collaborative filtering [red line]; the paper's feature-based collaborative filtering [blue line].
• Metrics: user satisfaction ratio [higher is better] and backhaul load [lower is better].
• The effects of backhaul capacity [a], cache capacity [b], and traffic intensity [c] are studied separately.
• The conclusion is the same in every case: the paper's method comes closest to the ground-truth performance.
FIGURE 4. DeepCachNet performance in terms of user satisfaction ratio and backhaul load with variation in the: a) backhaul capacity; b) storage ratio; c) traffic intensity.
[Paper excerpt: IEEE Network, May/June 2019, p. 136]
Then, the rest of the ratings in matrix P and their features are used as a testing set and are predicted by using SVDFeature [15], and the most popular content is stored accordingly.
Caching at the SBS: In this segment, the content estimated to be popular at the core network is greedily stored at the SBS until no space remains, as in [3].
The performance of the proposed framework is measured in terms of the user satisfaction ratio and the backhaul load. The user satisfaction ratio is the fraction of the content delivered at a given target rate, and the backhaul load is the fraction of the traffic carried by the backhaul links over the total possible amount of traffic generated by the content requests. The detailed procedure to determine the backhaul load and the satisfaction ratio can be found in [3]. Variations of the backhaul load and the user satisfaction ratio with respect to backhaul capacity, storage size, and traffic intensity are shown in Fig. 4 and discussed as follows.
Variation in the Backhaul Capacity: The total capacity of the backhaul links is considered to be less than the capacity of the wireless links. Increasing the backhaul capacity clearly raises the satisfaction ratio in all cases, since any content that is not present in the caches of the SBS is delivered over the backhaul. Con...
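The greedy placement and the backhaul-load metric can be sketched in a few lines of Python. This is a simplified illustration, not the procedure of [3]: per-content sizes, the rule of skipping an item that does not fit and trying the next one, and a backhaul load computed purely from cache misses (ignoring link capacities and target rates) are all assumptions, and the function names are hypothetical. The user satisfaction ratio is left out because it depends on the rate model defined in [3].

```python
def greedy_cache(popularity, sizes, capacity):
    """Fill the cache with contents in decreasing order of estimated popularity.

    popularity: dict content -> estimated popularity score
    sizes:      dict content -> size (same unit as capacity)
    An item that does not fit in the remaining space is skipped (assumption).
    """
    cached = set()
    free = capacity
    for content in sorted(popularity, key=popularity.get, reverse=True):
        if sizes[content] <= free:
            cached.add(content)
            free -= sizes[content]
    return cached

def backhaul_load(requests, sizes, cached):
    """Fraction of the requested traffic that must be fetched over the backhaul."""
    total = sum(sizes[c] for c in requests)
    missed = sum(sizes[c] for c in requests if c not in cached)
    return missed / total if total else 0.0
```

For example, with popularity scores `{"a": 0.5, "b": 0.3, "c": 0.2}`, sizes `{"a": 2, "b": 2, "c": 1}`, and a capacity of 3, the sketch caches `a` and `c`; a better popularity estimate moves more of the requested traffic into the cache and lowers the backhaul load.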
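Summary
Let us summarize the contributions of this paper:
• Using the popularity matrix between users and contents to assist proactive caching.
  ◆ Already existed.
• Using collaborative filtering to predict (not merely tabulate) the popularity matrix.
  ◆ Already existed.
• Compressing/refining the user and content feature vectors appropriately.
  ◆ Not hard to think of: an engineering necessity.
• Using AE/SDAE to perform the dimensionality reduction/feature refinement.
  ◆ The only contribution of this paper.
TAKEAWAY: extract the "patterns" in the data in an unsupervised way and express them as low-dimensional vectors.
  ◆ If you have such a need, consider using DL.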
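CASE#2
[Yang2019] Jiachen Yang, Jipeng Zhang, et al., "Deep Learning-Based Edge Caching for Multi-Cluster Heterogeneous Networks", Neural Computing and Applications, 2019, https://doi.org/10.1007/s00521-019-04040-z
This is also edge caching, but with the following differences:
• Different system scenario: multiple base stations are considered jointly.
• Different decision variables: the content popularity is assumed known, and the content placement problem is solved.
• Different modeling: the problem is formulated as an optimization/decision problem over two time scales.
• Different solution method: besides ordinary convex optimization, supervised learning is used.
  ◆ This shows another way of applying DL.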
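System model
Densely deployed HetNet
[Figure legend: user device; tier-1 BS; tier-2 BS]
• E clusters; K tiers (types) of base stations.
• The users are spatially distributed as a Poisson point process (PPP),
  ◆ with density $\lambda_{ue}$, $1 \le e \le E$.
• Each tier of base stations is also distributed as a PPP,
  ◆ with density $\lambda^{total}_{ke}$, $1 \le k \le K$; $1 \le e \le E$.
• A tier-k base station has communication radius $r_k$ and cache capacity $C_k$, $1 \le k \le K$.
• To save energy, every tier of base stations has an active mode and a sleep mode. [The energy consumption is $\alpha_k$ in active mode and $\beta_k$ in sleep mode, $1 \le k \le K$.]
• The paper controls the system energy consumption by adjusting the decision variable $\lambda_{ke}$ [the density of active tier-k base stations in cluster e].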
[Paper excerpt: Neural Computing and Applications]
... non-uniformity and coupling in the user and BS locations, which inspires us to consider a PCP-based system model to formulate our optimization problems.

3 System model and problem description
3.1 Multi-cluster heterogeneous network model
We consider an E-cluster K-tier HetNet model as in Fig. 1. The heterogeneous network comprises K different types of BSs in each cluster. A k-tier BS has storage capacity $C_k$ and effective radius $r_k$, $1 \le k \le K$, where the effective radius means that a user can connect to a k-tier BS if and only if the distance between them is less than or equal to $r_k$. The K-tier HetNet in cluster e is modeled as K independent homogeneous Poisson point processes (PPPs) with deployment densities $\lambda^{total}_{ke}$, $1 \le k \le K$, $1 \le e \le E$. Likewise, the user distribution on the 2D plane is assumed to be a homogeneous PPP with density $\lambda_{ue}$, $1 \le e \le E$. We assume that different clusters have different user preferences and are independent of each other, while users in the same cluster share the same user preference and hence the same content popularity distribution.

3.2 Base station sleeping
Because the user density evolves in space and time, we introduce base station sleeping to adapt to the time- and space-varying user requests, avoiding wasted base station resources at idle times and a shortage of base station resources at rush times [44]. We consider BSs running in two modes: active mode and sleep mode. The k-tier BSs in cluster e running in active mode have distribution density $\lambda_{ke}$, which is the variable we should optimize. Hence, the base stations running in sleep mode have distribution density $\lambda^{total}_{ke} - \lambda_{ke}$. This BS distribution density is closely related to the user density in each cluster.
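To make the spatial model concrete, the following NumPy sketch samples a homogeneous PPP on a rectangle, splits the deployed BSs of one tier into active and sleeping sets, and applies the effective-radius connection rule. The rectangular region, the independent random choice of which BSs stay active (with probability equal to the ratio of active to total density), and the function names are assumptions made for illustration; the excerpt specifies only the densities and the radius rule.

```python
import numpy as np

def sample_ppp(density, width, height, rng):
    """Sample a homogeneous PPP with the given density on [0, width] x [0, height].

    The number of points is Poisson with mean density * area; given that
    number, the points are independent and uniform on the region.
    """
    n = rng.poisson(density * width * height)
    return rng.uniform([0.0, 0.0], [width, height], size=(n, 2))

def split_active_sleeping(stations, active_density, total_density, rng):
    """Keep each BS active independently with probability active/total (assumption)."""
    keep = rng.random(len(stations)) < active_density / total_density
    return stations[keep], stations[~keep]

def can_connect(users, stations, radius):
    """Boolean matrix: entry [i, j] is True iff user i is within radius of BS j."""
    diff = users[:, None, :] - stations[None, :, :]
    return np.linalg.norm(diff, axis=2) <= radius
```

Under the independent-thinning assumption, the active BSs again form a homogeneous PPP with the active density, which is consistent with treating that density as the optimization variable.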
The energy consumption costs of the k-tier BSs running in active mode and sleep mode are denoted as $\alpha_k$ and $\beta_k$. We define $t_k = \alpha_k - \beta_k$, for $1 \le k \le K$.

3.3 Dynamic file library and cache refreshing model
We assume there is a dynamic file library which contains F files of normalized unit size and whose content popularity distribution evolves through time. We denote the f-th most popular file at time slot t in cluster e as $c^e_f(t)$, and its ...

Fig. 1 A multi-cluster two-tier HetNet model, with users and BSs distributed as independent homogeneous PPPs in each cluster
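One way to see why the difference between the active and sleep costs is worth naming is to write out the energy consumed per unit area in one cluster. The aggregation below (density of BSs in each mode times the per-BS cost of that mode, summed over tiers) is an assumption made for illustration and is not quoted from the paper; the function names are hypothetical. Under that assumption the energy is linear in the active density of each tier, with slope equal to the cost difference, plus a constant that does not depend on the decision variable.

```python
import math

def energy_density(active, total, alpha, beta):
    """Energy per unit area: active BSs cost alpha[k], sleeping BSs cost beta[k]."""
    return sum(a * al + (tot - a) * be
               for a, tot, al, be in zip(active, total, alpha, beta))

def energy_density_via_t(active, total, alpha, beta):
    """Same quantity rewritten with t[k] = alpha[k] - beta[k].

    active[k] * t[k] is the part controlled by the decision variable;
    total[k] * beta[k] is a constant floor paid even if every BS sleeps.
    """
    return sum(a * (al - be) + tot * be
               for a, tot, al, be in zip(active, total, alpha, beta))
```

With active densities `[1.0, 4.0]`, total densities `[2.0, 10.0]`, active costs `[100.0, 10.0]`, and sleep costs `[20.0, 2.0]`, both forms give 172.0.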