Computing Scores in a Complete Search System Efficient Ranking Computing the k largest cosines selection vs sorting Typically we want to retrieve the top k docs in the cosine ranking for the query) not to tota lly order all docs in the collection Can we pick off docs with k highest cosines? et j= number of docs with nonzero cosines We seek the k best of these
Computing Scores in a Complete Search System Computing the K largest cosines: selection vs. sorting ▪ Typically we want to retrieve the top K docs (in the cosine ranking for the query) ▪ not to totally order all docs in the collection ▪ Can we pick off docs with K highest cosines? ▪ Let J = number of docs with nonzero cosines ▪ We seek the K best of these J Efficient Ranking 11
Computing Scores in a Complete Search System Efficient Ranking Use heap for selecting top k Binary tree in which each node's value> the values of children Takes 2J operations to construct, then each of K winners read off in 2log j steps For j=1m, k=100, this is about 10% of the cost of sorting. (1Q 3 ③、③8
Computing Scores in a Complete Search System Use heap for selecting top K ▪ Binary tree in which each node’s value > the values of children ▪ Takes 2J operations to construct, then each of K “winners” read off in 2log J steps. ▪ For J=1M, K=100, this is about 10% of the cost of sorting. 10 .9 .3 .3 .8 .1 .1 Efficient Ranking 12
Computing Scores in a Complete Search System Efficient Ranking Bottlenecks Primary computational bottleneck in scoring: cosine computation Can we avoid all this computation? Yes but may sometimes get it wrong a doc not in the top k may creep into the list of K output docs Is this such a bad thing
Computing Scores in a Complete Search System Bottlenecks ▪ Primary computational bottleneck in scoring: cosine computation ▪ Can we avoid all this computation? ▪ Yes, but may sometimes get it wrong ▪ a doc not in the top K may creep into the list of K output docs ▪ Is this such a bad thing? Efficient Ranking 13
Computing Scores in a Complete Search System Efficient Ranking Cosine similarity is only a proxy User has a task and a query formulation Cosine matches docs to query Thus cosine is anyway a proxy for user happiness If we get a list of k docs "close" to the top k by cosine measure, should be ok
Computing Scores in a Complete Search System Cosine similarity is only a proxy ▪ User has a task and a query formulation ▪ Cosine matches docs to query ▪ Thus cosine is anyway a proxy for user happiness ▪ If we get a list of K docs “close” to the top K by cosine measure, should be ok Efficient Ranking 14
Computing Scores in a Complete Search System Efficient Ranking Generic approach Find a set a of contenders, with K< JA/<< N a does not necessarily contain the top k, but has many docs from among the top k Return the top k docs in a Think of a as pruning non-contenders The same approach is also used for other (non cosine) scoring functions Will look at several schemes following this approach
Computing Scores in a Complete Search System Generic approach ▪ Find a set A of contenders, with K < |A| << N ▪ A does not necessarily contain the top K, but has many docs from among the top K ▪ Return the top K docs in A ▪ Think of A as pruning non-contenders ▪ The same approach is also used for other (noncosine) scoring functions ▪ Will look at several schemes following this approach Efficient Ranking 15