Big Data Machine Learning
Wu-Jun Li (李武军)
LAMDA Group
Department of Computer Science and Technology, Nanjing University
State Key Laboratory for Novel Software Technology
liwujun@nju.edu.cn
Nov 26, 2015
Li (http://cs.nju.edu.cn/lwj), Big Learning, CS, NJU
Outline
1 Introduction
2 Learning to Hash
   - Isotropic Hashing
   - Scalable Graph Hashing with Feature Transformation
   - Supervised Hashing with Latent Factor Models
   - Column Sampling based Discrete Supervised Hashing
   - Deep Supervised Hashing with Pairwise Labels
   - Supervised Multimodal Hashing with SCM
   - Multiple-Bit Quantization
3 Distributed Learning
   - Coupled Group Lasso for Web-Scale CTR Prediction
   - Distributed Power-Law Graph Computing
4 Stochastic Learning
   - Fast Asynchronous Parallel Stochastic Gradient Descent
   - Distributed Stochastic ADMM for Matrix Factorization
5 Conclusion
1 Introduction
Big Data
Big data has attracted much attention from both academia and industry:
- Facebook: 750 million users
- Flickr: 6 billion photos
- Wal-Mart: 267 million items/day; 4 PB data warehouse
- Sloan Digital Sky Survey: the New Mexico telescope captures 200 GB of image data per day
Definition of Big Data
- Gartner (2012): “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” (“3Vs”)
- International Data Corporation (IDC) (2011): “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.” (“4Vs”)
- McKinsey Global Institute (MGI) (2011): “Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

Why has big data become a hot topic only in recent years?
- Big data: the gold mine
- Cloud computing: the mining technology
- Big data machine learning: the smelting (metallurgy) technology