BIT FUNNEL:REVISITING SIGNATURES FOR SEARCH 信息检索与数据挖掘研讨报告 凌华泽 20190521 Goodwin B,Hopcroft M,Luu D,et al.BitFunnel:Revisiting signatures for search[C]//Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval.ACM,2017:605-614
-----信息检索与数据挖掘研讨报告 凌华泽 20190521 Goodwin B, Hopcroft M, Luu D, et al. BitFunnel: Revisiting signatures for search[C]//Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2017: 605-614
INTRODUCTION -BitFunnel Signature-based approach Bloom filter Four Challenges ·查询词项时需要查看每个文件的signature higher rank rows ·存在一定的误报率 frequency-conscious signature ·文件大小存在差异,signatur需要适应最大文件 shard the corpus ·基于签名的方案不是众所周知的问题 a cost model
BitFunnel Signature-based approach Bloom filter Four Challenges 查询词项时需要查看每个文件的signature higher rank rows 存在一定的误报率 frequency-conscious signature 文件大小存在差异,signatur需要适应最大文件 shard the corpus 基于签名的方案不是众所周知的问题 a cost model
Q A B C D E M N O P 0 0 BACK GROUND 2 1 00 00100 00 3 3 4 5 .Bit-Sliced Signature 1 6 0 6 10 0 ·这种方法将签名向量从行转换成列以方便 8 0 8888 同时查询多个文档 9 1 9 10 0 10 ·每个文档对应保存16位签名的列,每行对 0 11 对应于文档签名中的位 0 12 13 0 13 ·文档集C={A..) 14 00 150 15100 000 ·查询项Q A B O P .Bit-Sliced Blocked Signature 2 5 ·多个文件公用一列,以减小行的长度 9 01 2n5n9 0100010000000000 A B C D E FG H IJ K L M N O P
Bit-Sliced Signature 这种方法将签名向量从行转换成列以方便 同时查询多个文档 每个文档对应保存16位签名的列,每行对 对应于文档签名中的位 文档集C={A….P} 查询项Q Bit-Sliced Blocked Signature 多个文件公用一列,以减小行的长度
BITFUNNEL INNOVATIONS -Higher Rank Rows Rank00100010010001000 0123456789101112131415 Rank r: Rank111001100 01234567 89101112131415 ir +(io mod2) Rank 2 1100 0123 4567 .W:machine word log2 891011 12131415
Higher Rank Rows Rank r: W : machine word 的log2
noISE 0123456789101112131415 0123456789101112131415 R2COUOSOUOS OUOCOUO R200S00CU000G00SU0 RCUOOS OOU S UOOC O U R:0Us00C000Ug00s00 RoOUUOS OUOS UO OUOUO R 00 S U O C OO 00 C UO S OO Results o 000 S O 0 O S O 0 O UO 00 RRR00500C0000C00S00 0123456789101112131415 ■S:signal bit ·三个rank-l,得到对应的rank-0 ·U:rank-2导致的noise bit .Correlated noise: ■C:rank-0导致的noise bit n0-nr=sr-s0=1-(1-50)2'-s0
S : signal bit U:rank-2导致的 noise bit C: rank-0导致的 noise bit 三 个 rank - 1,得到对应的rank - 0 Correlated noise :