Introduction Big Data Machine Learning o Definition:perform machine learning from big data. o Role:key for big data Ultimate goal of big data processing is to mine value from data. Machine learning provides fundamental theory and computational techniques for big data mining and analysis. 日卡三4元,互Q0 Li (http://cs.nju.edu.cn/lwj) Big Leaming CS.NJU 6/79
Introduction Big Data Machine Learning Definition: perform machine learning from big data. Role: key for big data Ultimate goal of big data processing is to mine value from data. Machine learning provides fundamental theory and computational techniques for big data mining and analysis. Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 6 / 79
Introduction Challenge oStorage:memory and disk Computation:CPU o Communication:network 日卡三4元,互Q0 Li (http://cs.nju.edu.cn/lvj) Big Leamning CS.NJU 7/79
Introduction Challenge Storage: memory and disk Computation: CPU Communication: network Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 7 / 79
Introduction Our Contribution 。Learning to hash(哈希学习):memory/disk/cpu/communication 。Distributed learning(分布式学习)memory/disk/cpu; but increase communication cost ●Stochastic learning(随机学习):memory/disk/cpu 日卡三4元,互Q0 Li (http://cs.nju.edu.cn/lvj) Big Learning CS.NJU 8/79
Introduction Our Contribution Learning to hash (MFÆS): memory/disk/cpu/communication Distributed learning (©Ÿ™ÆS): memory/disk/cpu; but increase communication cost Stochastic learning (ëÅÆS): memory/disk/cpu Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 8 / 79
Learning to Hash Outline Introduction ② Learning to Hash o Isotropic Hashing Supervised Hashing with Latent Factor Models o Supervised Multimodal Hashing with SCM Multiple-Bit Quantization Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction Distributed Power-Law Graph Computing Stochastic Learning Distributed Stochastic ADMM for Matrix Factorization ⑤Conclusion 4日卡0,·工·4元,至)及0 Li (http://cs.nju.edu.cn/lvj) Big Leaming CS.NJU 9/79
Learning to Hash Outline 1 Introduction 2 Learning to Hash Isotropic Hashing Supervised Hashing with Latent Factor Models Supervised Multimodal Hashing with SCM Multiple-Bit Quantization 3 Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction Distributed Power-Law Graph Computing 4 Stochastic Learning Distributed Stochastic ADMM for Matrix Factorization 5 Conclusion Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 9 / 79
Learning to Hash Nearest Neighbor Search (Retrieval) oGiven a query point g,return the points closest(similar)to g in the database(e.g.images). o Underlying many machine learning,data mining,information retrieval problems Challenge in Big Data Applications: o Curse of dimensionality Storage cost ●Query speed 日卡*2元至)风0 Li (http://cs.nju.edu.cn/lwj) Big Leaming CS.NJU 10/79
Learning to Hash Nearest Neighbor Search (Retrieval) Given a query point q, return the points closest (similar) to q in the database(e.g. images). Underlying many machine learning, data mining, information retrieval problems Challenge in Big Data Applications: Curse of dimensionality Storage cost Query speed Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 10 / 79