Learning to Hash with its Application to Big Data Retrieval and Mining Wu-Jun Li Department of Computer Science and Engineering Shanghai Jiao Tong University Shanghai,China Joint work with Weihao Kong and Minyi Guo Jan18,2013 日卡三4元,互Q0 i (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 1/45
Learning to Hash with its Application to Big Data Retrieval and Mining Wu-Jun Li Department of Computer Science and Engineering Shanghai Jiao Tong University Shanghai, China Joint work with Weihao Kong and Minyi Guo Jan 18, 2013 Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 1 / 45
Outline ①Introduction o Problem Definition Existing Methods Motivation and Contribution ②Isotropic Hashing Model o Learning o Experimental Results Multiple-Bit Quantization Double-Bit Quantization Manhattan Quantization ④ Conclusion ⑤Reference 日卡回24元,互Q0 Li(http://www.cs.sjtu.edu.cn/-liwujun) Learning to Hash CSE,SJTU 2 /45
Outline 1 Introduction Problem Definition Existing Methods Motivation and Contribution 2 Isotropic Hashing Model Learning Experimental Results 3 Multiple-Bit Quantization Double-Bit Quantization Manhattan Quantization 4 Conclusion 5 Reference Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 2 / 45
Introduction Outline ①Introduction o Problem Definition Existing Methods Motivation and Contribution Isotropic Hashing Model o Learning o Experimental Results Multiple-Bit Quantization Double-Bit Quantization Manhattan Quantization Conclusion Reference 日卡回24元,互Q0 Li (http://www.cs.sjtu.edu.cn/-livujun) Learning to Hash CSE,SJTU 3 /45
Introduction Outline 1 Introduction Problem Definition Existing Methods Motivation and Contribution 2 Isotropic Hashing Model Learning Experimental Results 3 Multiple-Bit Quantization Double-Bit Quantization Manhattan Quantization 4 Conclusion 5 Reference Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 3 / 45
Introduction Problem Definition Nearest Neighbor Search(Retrieval) oGiven a query point g,return the points closest(similar)to g in the database(e.g.images). o Underlying many machine learning,data mining,information retrieval problems Challenge in Big Data Applications: o Curse of dimensionality Storage cost ●Query speed 日卡三4元,互Q0 Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 4 /45
Introduction Problem Definition Nearest Neighbor Search (Retrieval) Given a query point q, return the points closest (similar) to q in the database(e.g. images). Underlying many machine learning, data mining, information retrieval problems Challenge in Big Data Applications: Curse of dimensionality Storage cost Query speed Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 4 / 45
Introduction Problem Definition Similarity Preserving Hashing h (dog)= h(Napoleon)= h (Napoleon)= 10001010 01100001 011001Q1 flipped bit Should be very different Should be similar 0Q0 Li(http://wwv.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 5/45
Introduction Problem Definition Similarity Preserving Hashing Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 5 / 45