Computing Scores in a Complete Search System
Web Search and Mining
Lecture 8: Scoring and results assembly
Recap: tf-idf weighting
▪ The tf-idf weight of a term is the product of its tf weight and its idf weight:
$$w_{t,d} = \left(1 + \log_{10} \mathrm{tf}_{t,d}\right) \cdot \log_{10}\left(N / \mathrm{df}_t\right)$$
  where N is the number of documents in the collection and df_t is the number of documents that contain term t.
▪ Best-known weighting scheme in information retrieval
▪ Increases with the number of occurrences within a document
▪ Increases with the rarity of the term in the collection
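The weight above can be computed directly. What follows is a minimal Python sketch, not the lecture's own code. It assumes base-10 logs and follows the common convention that a term absent from the document (tf = 0) gets weight 0. The function name `tf_idf_weight` is hypothetical.

```python
import math


def tf_idf_weight(tf: int, df: int, n_docs: int) -> float:
    """Return w_{t,d} = (1 + log10 tf_{t,d}) * log10(N / df_t).

    tf     -- occurrences of term t in document d (tf_{t,d})
    df     -- number of documents containing t (df_t)
    n_docs -- number of documents in the collection (N)
    """
    # Assumed convention: an absent term contributes nothing
    # (this also avoids log10(0)).
    if tf <= 0 or df <= 0:
        return 0.0
    log_tf = 1 + math.log10(tf)         # grows with occurrences in the document
    idf = math.log10(n_docs / df)       # grows with rarity in the collection
    return log_tf * idf


if __name__ == "__main__":
    # Term occurring 10 times, in 1,000 of 1,000,000 documents: (1 + 1) * 3
    print(tf_idf_weight(10, 1000, 1_000_000))
```

For tf = 10, df = 1,000 and N = 1,000,000 the weight is (1 + 1) × 3 = 6. Doubling tf raises it only logarithmically. Lowering df makes the term rarer and raises it.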
Recap: Queries as vectors
▪ Key idea 1: Do the same for queries as for documents: represent them as vectors in the same space
▪ Key idea 2: Rank documents according to their proximity to the query in this space
▪ proximity = similarity of vectors
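Key idea 1 can be made concrete by mapping the documents and the query onto one shared term space. The Python sketch below is illustrative and rests on several assumptions:

- Documents arrive as token lists.
- The vocabulary is taken from the documents only, so query terms that appear in no document are dropped.
- The query is weighted with the same tf-idf formula as the documents.

The name `build_vectors` is hypothetical.

```python
import math
from collections import Counter


def build_vectors(docs: list[list[str]], query: list[str]):
    """Place documents and query in one tf-idf vector space.

    Returns (vocab, doc_vectors, query_vector); every vector has one
    coordinate per vocabulary term, in the order of `vocab`.
    """
    # Assumption: the space is spanned by terms seen in the collection.
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    # df_t: number of documents containing t (count each doc at most once).
    df = Counter(t for d in docs for t in set(d))

    def to_vector(tokens: list[str]) -> list[float]:
        tf = Counter(tokens)
        # Same weighting for documents and query: (1 + log10 tf) * log10(N/df).
        return [
            (1 + math.log10(tf[t])) * math.log10(n_docs / df[t]) if tf[t] > 0 else 0.0
            for t in vocab
        ]

    return vocab, [to_vector(d) for d in docs], to_vector(query)


if __name__ == "__main__":
    docs = [["car", "insurance", "auto"], ["car", "car", "repair"], ["best", "insurance"]]
    vocab, doc_vecs, q = build_vectors(docs, ["car", "insurance", "cheap"])
    print(vocab)  # "cheap" is not in any document, so it has no coordinate
    print(q)
```

Treating the query as a short document is what puts it in the same space. Proximity to each document can then be measured with any vector similarity.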
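Recap: cosine(query, document)
$$\cos(\vec q, \vec d) = \frac{\vec q \cdot \vec d}{|\vec q|\,|\vec d|} = \frac{\vec q}{|\vec q|} \cdot \frac{\vec d}{|\vec d|} = \frac{\sum_{i=1}^{|V|} q_i d_i}{\sqrt{\sum_{i=1}^{|V|} q_i^2}\,\sqrt{\sum_{i=1}^{|V|} d_i^2}}$$
▪ Dot product: the numerator q · d. Unit vectors: q/|q| and d/|d|, the query and document normalized to length 1.
▪ q_i is the tf-idf weight of term i in the query, d_i is the tf-idf weight of term i in the document, and the sums run over the |V| terms of the vocabulary V.
▪ cos(q, d) is the cosine similarity of q and d … or, equivalently, the cosine of the angle between q and d.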
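The formula above translates directly into code, and the sketch also checks that its two forms agree. It is a minimal Python sketch over dense weight vectors and is not taken from the lecture. It defines the similarity with a zero vector as 0, which is an assumption, since the formula is undefined there. The names `cosine`, `cosine_via_unit_vectors` and `rank` are hypothetical.

```python
import math


def norm(v: list[float]) -> float:
    """Euclidean length |v| = sqrt(sum of v_i^2)."""
    return math.sqrt(sum(x * x for x in v))


def cosine(q: list[float], d: list[float]) -> float:
    """Dot-product form: sum(q_i * d_i) / (|q| |d|)."""
    nq, nd = norm(q), norm(d)
    if nq == 0 or nd == 0:
        return 0.0  # assumption: similarity with a zero vector is 0
    dot = sum(qi * di for qi, di in zip(q, d))
    return dot / (nq * nd)


def cosine_via_unit_vectors(q: list[float], d: list[float]) -> float:
    """Unit-vector form: (q / |q|) . (d / |d|)."""
    nq, nd = norm(q), norm(d)
    if nq == 0 or nd == 0:
        return 0.0
    return sum((qi / nq) * (di / nd) for qi, di in zip(q, d))


def rank(q: list[float], docs: list[list[float]]) -> list[int]:
    """Document indices ordered from most to least similar to q."""
    return sorted(range(len(docs)), key=lambda i: cosine(q, docs[i]), reverse=True)


if __name__ == "__main__":
    q = [3.0, 4.0]
    d = [4.0, 3.0]
    # Both forms compute the same quantity: 24 / (5 * 5) = 0.96
    print(cosine(q, d), cosine_via_unit_vectors(q, d))
    print(rank([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]))
```

Normalizing to unit vectors is why a long document does not outscore a short one merely by repeating terms. Only the direction of the vector matters.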
This lecture
▪ Speeding up vector space ranking
▪ Putting together a complete search system
▪ Will require learning about a number of miscellaneous topics and heuristics