AskMSR | Step 3: Mining N-Grams
• Simple: enumerate all N-grams (N = 1, 2, 3, say) in all retrieved snippets
• Weight of an n-gram: its occurrence count, each occurrence weighted by the “reliability” (weight) of the query rewrite that fetched the document
• Example: “Who created the character of Scrooge?”
  • Dickens - 117
  • Christmas Carol - 78
  • Charles Dickens - 75
  • Disney - 72
  • Carl Banks - 54
  • A Christmas - 41
  • Christmas Carol - 45
  • Uncle - 31
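A minimal sketch of this mining step, assuming snippets arrive as (text, rewrite_weight) pairs; the function name `mine_ngrams` and this data layout are illustrative, not from the paper:

```python
import re
from collections import Counter

def mine_ngrams(snippets, max_n=3):
    """Score all 1- to max_n-grams across retrieved snippets.

    Each occurrence of an n-gram contributes the reliability weight
    of the query rewrite that fetched its snippet.
    """
    scores = Counter()
    for text, rewrite_weight in snippets:
        tokens = re.findall(r"\w+", text)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngram = " ".join(tokens[i:i + n])
                scores[ngram] += rewrite_weight
    return scores

# Toy usage: two snippets fetched by rewrites of different reliability
snippets = [("Charles Dickens created Scrooge", 5.0),
            ("Scrooge appears in A Christmas Carol by Dickens", 1.0)]
for ngram, score in mine_ngrams(snippets).most_common(5):
    print(ngram, score)
```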
AskMSR | Step 4: Filtering N-Grams
• Each question type is associated with one or more “data-type filters” = regular expressions
  • When… → Date
  • Where… → Location
  • What… / Who… → Person
• Boost the score of n-grams that match the regular expression
• Lower the score of n-grams that don’t match the regular expression
• Details omitted from paper
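Illustrative filters in the same spirit; the paper omits the actual regular expressions, so these patterns (and the boost/penalty factors) are assumptions:

```python
import re

# Hypothetical data-type filters keyed by question word (not from the paper)
FILTERS = {
    "when":  re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),        # Date: a four-digit year
    "where": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b"),  # Location: capitalized name
    "who":   re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),       # Person: capitalized full name
}
FILTERS["what"] = FILTERS["who"]  # What/Who both accept Person-type answers here

def filter_ngrams(scores, question_word, boost=2.0, penalty=0.5):
    """Boost n-grams matching the question's data-type filter; demote the rest."""
    pattern = FILTERS[question_word.lower()]
    return {ng: s * (boost if pattern.search(ng) else penalty)
            for ng, s in scores.items()}

print(filter_ngrams({"Charles Dickens": 75, "Christmas": 41}, "who"))
# "Charles Dickens" is boosted (matches the Person pattern); "Christmas" is demoted
```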
AskMSR | Step 5: Tiling the Answers
[Figure: tiling example]
• Candidate n-grams with scores: Dickens (20), Charles Dickens (15), Mr Charles (10)
• Tile the highest-scoring n-gram with overlapping n-grams; merged result: Mr Charles Dickens (score 45), with the old n-grams discarded
• Repeat until no more overlap
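A greedy-tiling sketch matching the figure's example; the overlap rule (suffix of one n-gram equals prefix of another) and the score combination (summing on merge) are assumptions where the slide doesn't spell them out:

```python
def tile(a, b):
    """Merge two n-grams if one's suffix equals the other's prefix; else None."""
    ta, tb = a.split(), b.split()
    for x, y in ((ta, tb), (tb, ta)):
        for k in range(min(len(x), len(y)), 0, -1):
            if x[-k:] == y[:k]:
                return " ".join(x + y[k:])
    return None

def tile_answers(scores):
    """Repeatedly tile the highest-scoring n-gram with an overlapping one,
    merging their scores and discarding the old n-grams."""
    scores = dict(scores)
    merged_something = True
    while merged_something:
        merged_something = False
        best = max(scores, key=scores.get)
        for other in list(scores):
            if other != best and (merged := tile(best, other)) is not None:
                new_score = scores.pop(best) + scores.pop(other)
                scores[merged] = max(scores.get(merged, 0), new_score)
                merged_something = True
                break
    return scores

print(tile_answers({"Dickens": 20, "Charles Dickens": 15, "Mr Charles": 10}))
# -> {'Mr Charles Dickens': 45}, as in the slide's example
```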
AskMSR | Issues
• In many scenarios (e.g., monitoring an individual’s email…) we only have a small set of documents
• Works best/only for “Trivial Pursuit”-style fact-based questions
• Limited/brittle repertoire of
  • question categories
  • answer data types/filters
  • query rewriting rules
Outline
1. Motivation/History
2. The SQuAD dataset
3. The Stanford Attentive Reader model
4. BiDAF
5. Recent, more advanced architectures