Learning to Hash with its Application to Big Data Retrieval and Mining 李武军 Department of Computer Science and Engineering Shanghai Jiao Tong University Shanghai,China Joint work with孔维吴,张东擎,过敏意 Dec21,2013 日卡三4元,互Q0 i (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 1/49
Learning to Hash with its Application to Big Data Retrieval and Mining o… Department of Computer Science and Engineering Shanghai Jiao Tong University Shanghai, China Joint work with öëh, ‹¿ô, LØø Dec 21, 2013 Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 1 / 49
Outline Introduction Problem Definition ●Existing Methods ② Isotropic Hashing Model ●Learning o Experiment Multiple-Bit Quantization Double-Bit Quantization Manhattan Quantization ④ Conclusion Reference +日卡+得三元互)Q0¥ Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 2 /49
Outline 1 Introduction Problem Definition Existing Methods 2 Isotropic Hashing Model Learning Experiment 3 Multiple-Bit Quantization Double-Bit Quantization Manhattan Quantization 4 Conclusion 5 Reference Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 2 / 49
Introduction Outline Introduction Problem Definition ●Existing Methods Isotropic Hashing o Model e Learning o Experiment Multiple-Bit Quantization Double-Bit Quantization o Manhattan Quantization Conclusion Reference +日卡+得三元互)Q0¥ Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 3 /49
Introduction Outline 1 Introduction Problem Definition Existing Methods 2 Isotropic Hashing Model Learning Experiment 3 Multiple-Bit Quantization Double-Bit Quantization Manhattan Quantization 4 Conclusion 5 Reference Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 3 / 49
Introduction Problem Definition Nearest Neighbor Search(Retrieval) oGiven a query point g,return the points closest(similar)to g in the database(e.g.images). o Underlying many machine learning,data mining,information retrieval problems Challenge in Big Data Applications: o Curse of dimensionality Storage cost ●Query speed 日卡三4元,互Q0 Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 4 /49
Introduction Problem Definition Nearest Neighbor Search (Retrieval) Given a query point q, return the points closest (similar) to q in the database(e.g. images). Underlying many machine learning, data mining, information retrieval problems Challenge in Big Data Applications: Curse of dimensionality Storage cost Query speed Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 4 / 49
Introduction Problem Definition Similarity Preserving Hashing h(Statue of Liberty)= h(Napoleon)= h (Napoleon)= 10001010 01100001 011001Q1 flipped bit Should be very different Should be similar 0Q0 f(http:/aw.cs:sjtu.edu.cn/-1iu时jun】 Learning to Hash CSE,SJTU 5/49
Introduction Problem Definition Similarity Preserving Hashing Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 5 / 49