Evaluation Corpus
• Test collections consisting of documents, queries, and relevance judgments, e.g.,
  – CACM: Titles and abstracts from the Communications of the ACM from 1958-1979. Queries and relevance judgments generated by computer scientists.
  – AP: Associated Press newswire documents from 1988-1990 (from TREC disks 1-3). Queries are the title fields from TREC topics 51-150. Topics and relevance judgments generated by government information analysts.
  – GOV2: Web pages crawled from websites in the .gov domain during early 2004. Queries are the title fields from TREC topics 701-850. Topics and relevance judgments generated by government analysts.
Test Collections

Collection   Number of documents   Size     Average number of words/doc
CACM         3,204                 2.2 MB   64
AP           242,918               0.7 GB   474
GOV2         25,205,179            426 GB   1073

Collection   Number of queries   Average number of words/query   Average number of relevant docs/query
CACM         [missing]           13.0                            16
AP           100                 4.3                             [missing]
GOV2         150                 3.1                             180
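The columns reported for each test collection are plain averages over its documents, queries, and relevance judgments. A minimal sketch of how such figures could be computed, assuming whitespace tokenization (the actual collections may count words differently) and using invented toy data; the names `docs`, `queries`, `qrels`, and `collection_stats` are hypothetical:

```python
from statistics import mean

# Toy data, invented for illustration only (not drawn from CACM, AP, or GOV2).
docs = {
    "d1": "pet therapy programs in hospitals",
    "d2": "laws governing animal assisted therapy",
    "d3": "stock market report",
}
queries = {"q1": "pet therapy", "q2": "market report today"}
# qrels: query id -> set of document ids judged relevant
qrels = {"q1": {"d1", "d2"}, "q2": {"d3"}}


def collection_stats(docs, queries, qrels):
    """Return the summary statistics usually reported for a test collection."""
    return {
        "num_docs": len(docs),
        # Assumption: a "word" is any whitespace-separated token.
        "avg_words_per_doc": mean(len(text.split()) for text in docs.values()),
        "num_queries": len(queries),
        "avg_words_per_query": mean(len(q.split()) for q in queries.values()),
        # Queries with no judged-relevant documents count as zero.
        "avg_relevant_per_query": mean(len(qrels.get(qid, ())) for qid in queries),
    }


stats = collection_stats(docs, queries, qrels)
```

Short title queries give small words/query averages, which is consistent with the AP and GOV2 rows being much lower than the CACM row.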
TREC Topic Example
<top>
<num> Number: 794
<title> pet therapy
<desc> Description:
How are pets or animals used in therapy for humans and what are the benefits?
<narr> Narrative:
Relevant documents must include details of how pet- or animal-assisted therapy is or has been used. Relevant details include information about pet therapy programs, descriptions of the circumstances in which pet therapy is used, the benefits of this type of therapy, the degree of success of this therapy, and any laws or regulations governing it.
</top>
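Because the test collections above use the title field of each topic as the query, it helps to see how a topic can be split into its fields. The sketch below is one possible approach, not an official TREC tool: it assumes that fields are opened by a tag and run until the next tag, and the names `parse_topic`, `FIELD_RE`, and `LABEL_RE` are hypothetical. The narrative is shortened here to keep the example small.

```python
import re

TOPIC = """<top>
<num> Number: 794
<title> pet therapy
<desc> Description:
How are pets or animals used in therapy for humans and what are the benefits?
<narr> Narrative:
Relevant documents must include details of how pet- or animal-assisted
therapy is or has been used.
</top>"""

# Fields have an opening tag but no closing tag, so each field body
# runs until the next field tag or the closing </top>.
FIELD_RE = re.compile(
    r"<(num|title|desc|narr)>(.*?)(?=<(?:num|title|desc|narr)>|</top>)", re.S
)
# Human-readable labels that follow some tags, e.g. "Number:".
LABEL_RE = re.compile(r"^\s*(Number|Description|Narrative):\s*", re.I)


def parse_topic(text):
    """Parse one <top>...</top> block into a dict of field name -> text."""
    fields = {}
    for name, body in FIELD_RE.findall(text):
        body = LABEL_RE.sub("", body)
        fields[name] = " ".join(body.split())  # collapse line breaks and spaces
    return fields


topic = parse_topic(TOPIC)
query = topic["title"]  # the short title field serves as the query
```

The description and narrative fields are not part of the query in this setup; they tell the relevance judges what counts as relevant.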
Relevance Judgments
• Obtaining relevance judgments is an expensive, time-consuming process
  – who does it?
  – what are the instructions?
  – what is the level of agreement?
• TREC judgments
  – depend on the task being evaluated
  – generally binary
  – agreement good because of “narrative”
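One way to put a number on "level of agreement" for binary judgments is to compare two judges on the same documents. The sketch below computes raw agreement and Cohen's kappa, a common chance-corrected measure; the slides do not say which measure TREC uses, so this choice is an assumption, and the judgments and the function name `agreement` are invented for illustration.

```python
def agreement(judge_a, judge_b):
    """Return (raw agreement, Cohen's kappa) for two binary judgment lists."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(judge_a)
    # Fraction of documents on which both judges gave the same label.
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Chance agreement, from each judge's own rate of saying "relevant".
    p_a = sum(judge_a) / n
    p_b = sum(judge_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Kappa is undefined when both judges use a single identical label;
        # returning 1.0 here is a convention chosen for this sketch.
        return observed, 1.0
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Invented judgments for ten documents (1 = relevant, 0 = not relevant).
a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
observed, kappa = agreement(a, b)
```

Here the judges agree on 8 of 10 documents, but since each marks half the documents relevant, half of that agreement would be expected by chance, which is why kappa is lower than the raw rate.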
Pooling
• Exhaustive judgments for all documents in a collection are not practical
• Pooling technique is used in TREC
  – top k results (for TREC, k varied between 50 and 200) from the rankings obtained by different search engines (or retrieval algorithms) are merged into a pool
  – duplicates are removed
  – documents are presented in some random order to the relevance judges
• Produces a large number of relevance judgments for each query, although still incomplete
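The pooling steps above can be sketched in a few lines. This is a minimal illustration for a single query, not TREC's actual tooling: the rankings are invented, and the names `build_pool`, `rankings`, and the `seed` parameter are hypothetical.

```python
import random


def build_pool(rankings, k, seed=None):
    """Merge the top-k documents of each ranking into one deduplicated, shuffled pool."""
    pool = set()  # a set removes duplicates across systems
    for ranking in rankings.values():
        pool.update(ranking[:k])  # keep only the top k results of each system
    ordered = sorted(pool)  # fixed starting order so a given seed is reproducible
    # Shuffle so judges cannot infer a document's rank or source system.
    random.Random(seed).shuffle(ordered)
    return ordered


# Invented rankings from three hypothetical systems for one query.
rankings = {
    "system_a": ["d3", "d1", "d7", "d2"],
    "system_b": ["d1", "d4", "d3", "d9"],
    "system_c": ["d5", "d3", "d1", "d8"],
}
pool = build_pool(rankings, k=2, seed=0)
```

With k=2 the pool holds four distinct documents (d1, d3, d4, d5) even though six results were contributed. Documents ranked below k by every system, such as d7, are never shown to a judge, which is why the resulting judgments remain incomplete.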