Web Search and Mining
Lecture 11: Probabilistic Information Retrieval
Recap of the last lecture
▪ Improving search results
  ▪ Especially for high recall, e.g., a search for aircraft should also match plane; thermodynamic should also match heat
▪ Options for improving results…
  ▪ Global methods
    ▪ Query expansion
      ▪ Thesauri
      ▪ Automatic thesaurus generation
    ▪ Global indirect relevance feedback
  ▪ Local methods
    ▪ Relevance feedback
    ▪ Pseudo relevance feedback
Probabilistic relevance feedback
▪ Rather than reweighting in a vector space…
▪ If the user has marked some documents as relevant and some as irrelevant, we can build a probabilistic classifier, such as a Naive Bayes model:
  ▪ P(tk|R) = |Drk| / |Dr|
  ▪ P(tk|NR) = |Dnrk| / |Dnr|
  ▪ tk is a term; Dr is the set of known relevant documents; Drk is the subset of Dr that contains tk; Dnr is the set of known irrelevant documents; Dnrk is the subset of Dnr that contains tk
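The two estimates P(tk|R) and P(tk|NR) above are simple document-count ratios. The following is a minimal Python sketch, not part of the original slides. It assumes each judged document is represented as a set of terms; the function name `estimate_term_probabilities` and the toy documents are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def estimate_term_probabilities(
    relevant_docs: Iterable[Set[str]],
    nonrelevant_docs: Iterable[Set[str]],
) -> Dict[str, Tuple[float, float]]:
    """Return {term: (P(tk|R), P(tk|NR))} estimated from judged documents.

    Each document is assumed to be a set of terms (an illustrative choice).
    """
    d_r = list(relevant_docs)       # Dr: known relevant documents
    d_nr = list(nonrelevant_docs)   # Dnr: known irrelevant documents
    if not d_r or not d_nr:
        raise ValueError("need at least one relevant and one irrelevant document")

    vocabulary = set().union(*d_r, *d_nr)
    estimates: Dict[str, Tuple[float, float]] = {}
    for term in vocabulary:
        d_rk = sum(1 for doc in d_r if term in doc)    # |Drk|
        d_nrk = sum(1 for doc in d_nr if term in doc)  # |Dnrk|
        estimates[term] = (d_rk / len(d_r), d_nrk / len(d_nr))
    return estimates


# Toy judgments (hypothetical data, for illustration only).
relevant = [{"aircraft", "engine"}, {"aircraft", "wing"}]
irrelevant = [{"engine", "car"}, {"car", "road"}, {"aircraft", "model"}]

probs = estimate_term_probabilities(relevant, irrelevant)
p_r, p_nr = probs["aircraft"]
print(f"P(aircraft|R) = {p_r:.2f}, P(aircraft|NR) = {p_nr:.2f}")
```

Here "aircraft" occurs in both relevant documents and in one of the three irrelevant ones, so the estimates are 2/2 and 1/3. These are the raw ratios from the slide; with few judged documents many estimates will be exactly 0 or 1, which is why practical systems usually smooth the counts.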
Why probabilities in IR?
[Figure: the user's information need is turned into a query representation, and documents into document representations; the two representations must be matched ("How to match?"). Annotations: understanding of user need is uncertain; uncertain guess of whether document has relevant content.]
▪ In traditional IR systems, matching between each document and query is attempted in a semantically imprecise space of index terms.
▪ Probabilities provide a principled foundation for uncertain reasoning.
▪ Can we use probabilities to quantify our uncertainties?
Probabilistic IR topics
▪ Classical probabilistic retrieval model
  ▪ Probability ranking principle, etc.
  ▪ Binary independence model
▪ Bayesian networks for text retrieval
▪ Language model approach to IR
  ▪ An important emphasis in recent work
▪ Probabilistic methods are among the oldest topics in IR and also among the currently hottest.
  ▪ Traditionally: neat ideas, but they've never won on performance. It may be different now.