Structured Text Collection General question:How do we search such a collection? A general topic Subtopic 1 Subtopic k CCF-ADL at Zhengzhou University, June25-27,2010
Structured Text Collection ... Subtopic 1 Subtopic k A general topic General question: How do we search such a collection? 7 CCF-ADL at Zhengzhou University, June 25-27, 2010
Exploiting Intra-document Structures [Ogilvie Callan 2003] D Intuitively,we want to combine all the parts. but give more weights to some parts Title D Abstract +D2 Think about query-likelihood model... Select D and generate a Body-Part1 +D3 query word using D; Body-Part2 p(QID.R)-TIp(w.ID.R) -TIEs(D,ID.R)pOID.R i=1j=1 D "part selection"prob.Serves as weight for D Can be trained using EM Anchor text can be treated as a "part"of a document CCF-ADL at Zhengzhou University,June 25-27,2010
Exploiting Intra-document Structures [Ogilvie & Callan 2003] Title Abstract Body-Part1 Body-Part2 … D D1 D2 D3 Dk Intuitively, we want to combine all the parts, but give more weights to some parts Think about query-likelihood model… 1 1 1 ( | , ) ( | , ) ( | , ) ( | , ) n i i n k j i j i j p Q D R p w D R s D D R p w D R = = = = = “part selection” prob. Serves as weight for Dj Can be trained using EM Select Dj and generate a query word using Dj Anchor text can be treated as a “part” of a document 8 CCF-ADL at Zhengzhou University, June 25-27, 2010
Exploiting Inter-document Structures Document collection has links (e.g.,Web, citations of literature) ·Query:text query Results:ranked list of documents Challenge:how to exploit links to improve ranking? CCF-ADL at Zhengzhou University, 9 June25-27,2010
Exploiting Inter-document Structures • Document collection has links (e.g., Web, citations of literature) • Query: text query • Results: ranked list of documents • Challenge: how to exploit links to improve ranking? 9 CCF-ADL at Zhengzhou University, June 25-27, 2010
Exploiting Inter-Document Links "Extra text"/summary for a doc Description (“anchor text") Links indicate the utility of a doc Hub What does a link tell us? Authority CCF-ADL at Zhengzhou University,June 25-27,2010 10
Exploiting Inter-Document Links Description (“anchor text”) Hub Authority “Extra text”/summary for a doc Links indicate the utility of a doc What does a link tell us? 10 CCF-ADL at Zhengzhou University, June 25-27, 2010
PageRank:Capturing Page "Popularity" [Page Brin 98] 。Intuitions Links are like citations in literature -A page that is cited often can be expected to be more useful in general ·PageRank is essentially“citation counting”,but improves over simple counting -Consider "indirect citations"(being cited by a highly cited paper counts a lot...) Smoothing of citations(every page is assumed to have a non-zero citation count) PageRank can also be interpreted as random surfing (thus capturing popularity) CCF-ADL at Zhengzhou University, June25-27,2010
PageRank: Capturing Page “Popularity” [Page & Brin 98] • Intuitions – Links are like citations in literature – A page that is cited often can be expected to be more useful in general • PageRank is essentially “citation counting”, but improves over simple counting – Consider “indirect citations” (being cited by a highly cited paper counts a lot…) – Smoothing of citations (every page is assumed to have a non-zero citation count) • PageRank can also be interpreted as random surfing (thus capturing popularity) 11 CCF-ADL at Zhengzhou University, June 25-27, 2010