Big Data Machine Learning 大数据机器学习 李武军 LAMDA Group 南京大学计算机科学与技术系 软件新技术国家重点实验室 liwujun@nju.edu.cn 0ct31,2014 日卡*24元至)风0 Li (http://cs.nju.edu.cn/lwj) Big Learning CS.NJU 1/79
Big Data Machine Learning åÍ‚ÅÏÆS o… LAMDA Group HÆåÆOéÅâÆÜE‚X ^á#E‚I[:¢ø liwujun@nju.edu.cn Oct 31, 2014 Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 1 / 79
Outline Introduction ②Learning to Hash o Isotropic Hashing o Supervised Hashing with Latent Factor Models o Supervised Multimodal Hashing with SCM Multiple-Bit Quantization Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction Distributed Power-Law Graph Computing ④Stochastic Learning o Distributed Stochastic ADMM for Matrix Factorization Conclusion 日卡40三4元,重只0 Li (http://cs.nju.edu.cn/lvj) Big Leaming CS.NJU 2/79
Outline 1 Introduction 2 Learning to Hash Isotropic Hashing Supervised Hashing with Latent Factor Models Supervised Multimodal Hashing with SCM Multiple-Bit Quantization 3 Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction Distributed Power-Law Graph Computing 4 Stochastic Learning Distributed Stochastic ADMM for Matrix Factorization 5 Conclusion Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 2 / 79
Introduction Outline 0 Introduction ②Learning to Hash Isotropic Hashing o Supervised Hashing with Latent Factor Models Supervised Multimodal Hashing with SCM o Multiple-Bit Quantization Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction o Distributed Power-Law Graph Computing Stochastic Learning Distributed Stochastic ADMM for Matrix Factorization ⑤Conclusion 4日卡0,·三·4元至)及0 Li (http://cs.nju.edu.cn/lvj) Big Leaming CS.NJU 3/79
Introduction Outline 1 Introduction 2 Learning to Hash Isotropic Hashing Supervised Hashing with Latent Factor Models Supervised Multimodal Hashing with SCM Multiple-Bit Quantization 3 Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction Distributed Power-Law Graph Computing 4 Stochastic Learning Distributed Stochastic ADMM for Matrix Factorization 5 Conclusion Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 3 / 79
Introduction Big Data Big data has attracted much attention from both academic and industry. Facebook:750 Million users Flickr:6 Billion photos Wal-Mart:267 Million items/day;4PB data warehouse oSloan Digital Sky Survey:New Mexico telescope captures 200 GB image data/day Science FOURTH PARADIGM data Li (http://cs.nju.edu.cn/lwj) Big Leaming CS.NJU 4/79
Introduction Big Data Big data has attracted much attention from both academic and industry. Facebook: 750 Million users Flickr: 6 Billion photos Wal-Mart: 267 Million items/day; 4PB data warehouse Sloan Digital Sky Survey: New Mexico telescope captures 200 GB image data/day Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 4 / 79
Introduction Definition of Big Data o Gartner(2012):"Big data is high volume,high velocity,and/or high variety information assets that require new forms of processing to enable enhanced decision making,insight discovery and process optimization."("3Vs") International Data Corporation (IDC)(2011):"Big data technologies describe a new generation of technologies and architectures,designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture,discovery,and/or analysis."("4Vs") McKinsey Global Institute(MGI)(2011):"Big data refers to datasets whose size is beyond the ability of typical database software tools to capture,store, manage,and analyze.' Why not hot until recent years? 。Big data:金矿 。Cloud computing:采矿技术 。Big data machine learning:治金技术 日卡*24元至)风0 Li (http://cs.nju.edu.cn/lwj) Big Leaming CS.NJU 5/79
Introduction Definition of Big Data Gartner (2012): “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” (“3Vs”) International Data Corporation (IDC) (2011): “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.” (“4Vs”) McKinsey Global Institute (MGI) (2011): “Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” Why not hot until recent years? Big data: 7¶ Cloud computingµÊ¶E‚ Big data machine learningµé7E‚ Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 5 / 79