Computing Scores in a Complete Search System Efficient Ranking Computing cosine scores CosineSCORE(q float Scores[N=0 2 float Length[N] 3 for each query term t 4 do calculate Wt g and fetch postings list for t or each pair(d, tft. d )in postings list lo Scores[d+=Wd×Wtq 7 Read the array Length 8 for each d 9 do Scores d= Scores[d/Length[d 10 return Top K components of Scores
Computing Scores in a Complete Search System Computing cosine scores Efficient Ranking 6
Computing Scores in a Complete Search System Efficient Ranking Efficient cosine rankin Find the k docs in the collection "nearest to the query =K largest query-doc cosines ■ Efficient ran king: Computing a single cosine efficiently Choosing the k largest cosine values efficiently Can we do this without computing all n cosines?
Computing Scores in a Complete Search System Efficient cosine ranking ▪ Find the K docs in the collection “nearest” to the query K largest query-doc cosines. ▪ Efficient ranking: ▪ Computing a single cosine efficiently. ▪ Choosing the K largest cosine values efficiently. ▪ Can we do this without computing all N cosines? Efficient Ranking 7
Computing Scores in a Complete Search System Efficient Ranking Efficient cosine rankin What we re doing in effect solving the k-nearest neighbor problem for a query vector In general, we do not know how to do this efficiently for high-dimensional spaces But it is solvable for short queries and standard indexes support this well
Computing Scores in a Complete Search System Efficient cosine ranking ▪ What we’re doing in effect: solving the K-nearest neighbor problem for a query vector ▪ In general, we do not know how to do this efficiently for high-dimensional spaces ▪ But it is solvable for short queries, and standard indexes support this well Efficient Ranking 8
Computing Scores in a Complete Search System Efficient Ranking Special case -unweighted queries No weighting on query terms Assume each query term occurs only once Then for ranking, dont need to normalize query vector Slight simplification of agorithm from Lecture 7
Computing Scores in a Complete Search System Special case – unweighted queries ▪ No weighting on query terms ▪ Assume each query term occurs only once ▪ Then for ranking, don’t need to normalize query vector ▪ Slight simplification of algorithm from Lecture 7 Efficient Ranking 9
Computing Scores in a Complete Search System Efficient Ranking Faster cosine: unweighted query FASTCOSINESCORE(Q 1 float Scores[N=0 2 for each d 3 do Initialize Length[d] to the length of doc d 4 for each query term t 5 do calculate wtg and fetch postings list for t 6 for each pair(d, tft, d )in postings list 7 do add wf, d to Scores[d] 8 Read the array Length[dI g for each d 10 do Divide Scoresld] by Length[dI 11 return Top K components of Scoresll Figure 7.1 A faster algorithm for vector space scores
Computing Scores in a Complete Search System Faster cosine: unweighted query Efficient Ranking 10