Web Search Overview & Crawling: IR

Dimensions of IR

Content        Applications        Tasks
Text           Web search          Ad hoc search
Images         Vertical search     Filtering
Video          Enterprise search   Classification
Scanned docs   Desktop search      Question answering
Audio          Forum search
Music          P2P search
               Literature search

IR Tasks
▪ Ad-hoc search
  ▪ Find relevant documents for an arbitrary text query
▪ Filtering
  ▪ Identify relevant user profiles for a new document
▪ Classification
  ▪ Identify relevant labels for documents
▪ Question answering
  ▪ Give a specific answer to a question

Big Issues in IR
▪ Relevance
  ▪ What is it?
  ▪ Simple (and simplistic) definition: A relevant document contains the information that a person was looking for when they submitted a query to the search engine
  ▪ Many factors influence a person’s decision about what is relevant: e.g., task, context, novelty, style
  ▪ Topical relevance (same topic) vs. user relevance (everything else)

Big Issues in IR
▪ Relevance
  ▪ Retrieval models define a view of relevance
  ▪ Ranking algorithms used in search engines are based on retrieval models
  ▪ Most models describe statistical properties of text rather than linguistic ones
    ▪ i.e., counting simple text features such as words instead of parsing and analyzing the sentences
    ▪ The statistical approach to text processing started with Luhn in the 1950s
    ▪ Linguistic features can be part of a statistical model
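
As a rough illustration of the statistical view (not a retrieval model taken from these slides), the sketch below scores each document by counting how often the query words occur in it. The function names, the toy documents, and the raw-count scoring are all assumptions made for this example; practical ranking algorithms weight and normalize such counts rather than summing them directly.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency_score(query, document):
    """Sum how often each distinct query word occurs in the document."""
    counts = Counter(tokenize(document))
    return sum(counts[term] for term in set(tokenize(query)))


def rank(query, documents):
    """Return (score, doc_id) pairs, highest score first, ties by doc_id."""
    scored = [
        (term_frequency_score(query, text), doc_id)
        for doc_id, text in documents.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical three-document collection, used only for illustration.
docs = {
    "d1": "Tropical fish live in warm water. Tropical fish need a heated tank.",
    "d2": "Fishing boats leave the harbour at dawn.",
    "d3": "Aquarium care: feed your fish twice a day.",
}

ranking = rank("tropical fish", docs)
print(ranking)
```

Note that d2 scores zero because "fishing" is counted as a different word from "fish": nothing here parses or analyzes the sentences. A linguistic feature such as stemming could be added as one more counted feature, which is the sense in which linguistic features can be part of a statistical model.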

Big Issues in IR
▪ Evaluation
  ▪ Experimental procedures and measures for comparing system output with user expectations
    ▪ Originated in the Cranfield experiments in the 1960s
  ▪ Typically uses a test collection of documents, queries, and relevance judgments
    ▪ The most commonly used are the TREC collections
  ▪ Recall and precision are two examples of effectiveness measures
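
A minimal sketch of the two measures named above, using their standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The function name and the document IDs are hypothetical; in a test collection, the relevant set would come from the relevance judgments for one query.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for a single query.

    retrieved: document IDs returned by the system
    relevant:  document IDs judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical system output and relevance judgments for one query.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d7", "d8", "d9"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4). The two measures typically trade off: returning more documents tends to raise recall while lowering precision.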