08: Evaluating Search Engines
8.1 Why Evaluate
8.2 The Evaluation Corpus
8.3 Logging
8.4 Effectiveness Metrics
8.5 Efficiency Metrics
8.6 Training, Testing, and Statistics
Query Logs
• Used for both tuning and evaluating search engines
  – also for various techniques such as query suggestion
• Typical contents
  – User identifier or user session identifier
  – Query terms, stored exactly as the user entered them
  – List of URLs of results, their ranks on the result list, and whether they were clicked on
  – Timestamp(s), recording the time of user events such as query submission and clicks
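The typical contents listed above can be pictured as one record per query submission. The following is a minimal sketch only: the class names, field names, and example values (`ResultEntry`, `QueryLogRecord`, `session_id`, the example.com URLs) are hypothetical and not taken from any particular search engine's log format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResultEntry:
    url: str               # URL of one result shown to the user
    rank: int              # 1-based position on the result list
    clicked: bool = False  # whether the user clicked this result


@dataclass
class QueryLogRecord:
    session_id: str             # user or user-session identifier (hypothetical name)
    query: str                  # query terms, stored exactly as entered
    results: List[ResultEntry]  # results with their ranks and click flags
    query_time: float           # timestamp of query submission (epoch seconds)
    click_times: List[float] = field(default_factory=list)  # timestamps of clicks

    def clicked_urls(self) -> List[str]:
        """Return the URLs the user clicked, in rank order."""
        ordered = sorted(self.results, key=lambda r: r.rank)
        return [r.url for r in ordered if r.clicked]


# Illustrative record; all values are made up.
record = QueryLogRecord(
    session_id="s-001",
    query="tropical fish",
    results=[
        ResultEntry("http://example.com/a", rank=1),
        ResultEntry("http://example.com/b", rank=2, clicked=True),
    ],
    query_time=0.0,
    click_times=[4.2],
)
```

Keeping the raw query string and per-result ranks in the same record is what makes later steps, such as generating preferences from clicks, possible without joining against other data.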
Query Logs
• Clicks are not relevance judgments
  – although they are correlated
  – biased by a number of factors such as rank on result list
• Can use clickthrough data to predict preferences between pairs of documents
  – appropriate for tasks with multiple levels of relevance, focused on user relevance
  – various “policies” used to generate preferences
Example Click Policy
• Skip Above and Skip Next
  – click data
  – generated preferences
[Figure: click data for a ranked list in which d3 is clicked; the generated preferences include d3 > d4]
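A rough sketch of how such a policy could be coded follows. It assumes the usual reading of the two rules: a clicked document is preferred over every unclicked document ranked above it (Skip Above) and over the unclicked document immediately after it (Skip Next). The function name and the four-document ranking are illustrative assumptions, not the slide's own data.

```python
from typing import List, Set, Tuple


def skip_above_and_next(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs under the assumed policy."""
    prefs: List[Tuple[str, str]] = []
    for i, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # Skip Above: the user passed over these unclicked results to reach doc.
        for above in ranking[:i]:
            if above not in clicked:
                prefs.append((doc, above))
        # Skip Next: the unclicked result right after doc was presumably seen too.
        if i + 1 < len(ranking) and ranking[i + 1] not in clicked:
            prefs.append((doc, ranking[i + 1]))
    return prefs


# Hypothetical ranked list in which only d3 was clicked.
prefs = skip_above_and_next(["d1", "d2", "d3", "d4"], {"d3"})
# prefs == [("d3", "d1"), ("d3", "d2"), ("d3", "d4")]
```

Note that the output is a set of pairwise preferences, not relevance labels: nothing is inferred about how d1 compares with d2, or about documents further down the list.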
Query Logs
• Click data can also be aggregated to remove noise
• Click distribution information
  – can be used to identify clicks that have a higher frequency than would be expected
  – high correlation with relevance
  – e.g., using click deviation to filter clicks for preference-generation policies
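One way to make "higher frequency than would be expected" concrete is to compare the observed click frequency of a document at a given rank with the click frequency expected at that rank across all queries. The slide gives no formula, so the definition used below, deviation = observed minus expected, is an assumption, as are the function name, the tuple layout, and the toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (query, document, rank, clicked): one row per time a result was shown (assumed layout).
Impression = Tuple[str, str, int, bool]


def click_deviation(impressions: Iterable[Impression]) -> Dict[Tuple[str, str, int], float]:
    """Observed click rate of (query, doc, rank) minus the average click rate at that rank."""
    rank_shown: Dict[int, int] = defaultdict(int)
    rank_clicked: Dict[int, int] = defaultdict(int)
    item_shown: Dict[Tuple[str, str, int], int] = defaultdict(int)
    item_clicked: Dict[Tuple[str, str, int], int] = defaultdict(int)

    for query, doc, rank, clicked in impressions:
        key = (query, doc, rank)
        rank_shown[rank] += 1
        item_shown[key] += 1
        if clicked:
            rank_clicked[rank] += 1
            item_clicked[key] += 1

    deviations: Dict[Tuple[str, str, int], float] = {}
    for key, shown in item_shown.items():
        rank = key[2]
        observed = item_clicked[key] / shown            # click rate for this item
        expected = rank_clicked[rank] / rank_shown[rank]  # click rate for this rank overall
        deviations[key] = observed - expected
    return deviations


# Toy aggregated log; all values are made up.
impressions = [
    ("q1", "a", 1, True), ("q1", "a", 1, True),
    ("q1", "b", 2, True), ("q1", "b", 2, True),
    ("q2", "c", 1, True), ("q2", "c", 1, False),
    ("q2", "d", 2, False), ("q2", "d", 2, False),
]
cd = click_deviation(impressions)
```

In this toy data, document b at rank 2 is clicked every time while rank 2 is clicked only half the time overall, so it gets a positive deviation. A preference-generation policy could then keep only clicks whose deviation exceeds some threshold, with the threshold itself being a tuning choice.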