
Key Technologies of Web Search and Mining (Web Search and Mining): course teaching resources (PPT lecture slides), Lecture 12: Language Models


Document contents (about 43 pages)

Language Models

Using Language Models in IR
▪ Treat each document as the basis for a model (e.g., unigram sufficient statistics)
▪ Rank document d based on P(d | q)
▪ P(d | q) = P(q | d) × P(d) / P(q)
▪ P(q) is the same for all documents, so ignore it
▪ P(d) [the prior] is often treated as the same for all d
▪ But we could use criteria like authority, length, genre
▪ P(q | d) is the probability of q given d's model
▪ Very general formal approach
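
The ranking rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `rank_by_posterior`, the document ids, and all probability values are made up for the example. Since P(q) is the same for every document, the sketch scores each document by P(q | d) × P(d) and shows how a non-uniform prior can change the order.

```python
def rank_by_posterior(likelihoods, priors=None):
    """Rank document ids by P(q | d) * P(d), which is proportional to P(d | q)."""
    if priors is None:
        # Uniform prior: every document gets the same P(d).
        priors = {d: 1.0 / len(likelihoods) for d in likelihoods}
    scores = {d: likelihoods[d] * priors[d] for d in likelihoods}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical values of P(q | d) for two documents.
likelihoods = {"d1": 0.02, "d2": 0.01}

# With a uniform prior the ranking follows P(q | d) alone.
uniform_ranking = rank_by_posterior(likelihoods)

# A made-up prior that favours d2, e.g. because it is judged more authoritative.
prior_ranking = rank_by_posterior(likelihoods, {"d1": 0.2, "d2": 0.8})

print(uniform_ranking, prior_ranking)
```

With these invented numbers, d1 ranks first under the uniform prior, while the prior favouring d2 reverses the order.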

The fundamental problem of LMs
▪ Usually we don't know the model M
▪ But have a sample of text representative of that model
▪ Estimate a language model from a sample
▪ Then compute the observation probability P(observation | M(sample))
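
A minimal sketch of the two steps above, assuming a unigram model estimated by relative frequency (maximum likelihood). The slide does not name an estimator, so that choice, the helper names, and the toy sample are assumptions made for illustration. Exact fractions are used so the arithmetic is easy to check by hand.

```python
from collections import Counter
from fractions import Fraction


def estimate_unigram_model(sample_tokens):
    """Estimate P(w | M) as the relative frequency of w in the sample."""
    counts = Counter(sample_tokens)
    total = sum(counts.values())
    return {w: Fraction(c, total) for w, c in counts.items()}


def observation_probability(model, observation_tokens):
    """Probability of the observation as a product of unigram probabilities."""
    prob = Fraction(1)
    for w in observation_tokens:
        # Words never seen in the sample get probability 0 under this estimator.
        prob *= model.get(w, Fraction(0))
    return prob


sample = "the cat sat on the mat".split()  # toy sample, assumed for the example
model = estimate_unigram_model(sample)
p = observation_probability(model, ["the", "cat"])
print(p)  # (2/6) * (1/6) = 1/18
```

Any observation containing a word absent from the sample gets probability 0 under this estimator, which is why the choice of estimator matters in practice.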

Query Likelihood Model

Language Models for IR
▪ Language Modeling Approaches
  ▪ Attempt to model query generation process
  ▪ Documents are ranked by the probability that a query would be observed as a random sample from the respective document model
  ▪ Multinomial approach: $P(Q \mid M_D) = \prod_{w} P(w \mid M_D)^{q_w}$, where $q_w$ is the number of times $w$ occurs in $Q$
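
As a worked instance of the multinomial approach, take the formula to be $P(Q \mid M_D) = \prod_{w} P(w \mid M_D)^{q_w}$, with $q_w$ the count of word $w$ in the query. The query and the two probabilities below are invented for illustration and do not come from the slides. For the query "web web search", with $P(\text{web} \mid M_D) = 0.2$ and $P(\text{search} \mid M_D) = 0.1$:

```latex
\[
P(Q \mid M_D)
  = \prod_{w} P(w \mid M_D)^{q_w}
  = P(\text{web} \mid M_D)^{2} \times P(\text{search} \mid M_D)^{1}
  = 0.2^{2} \times 0.1
  = 0.004
\]
```

Words that do not occur in the query have $q_w = 0$ and contribute a factor of 1, so the product effectively runs over the query terms only.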

Retrieval based on probabilistic LM
▪ Treat the generation of queries as a random process.
▪ Approach
  ▪ Infer a language model for each document.
  ▪ Estimate the probability of generating the query according to each of these models.
  ▪ Rank the documents according to these probabilities.
  ▪ Usually a unigram estimate of words is used
  ▪ Some work on bigrams, paralleling van Rijsbergen
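
The three-step approach above (infer a model per document, score the query under each model, rank) could look roughly like this in Python. It is a sketch under stated assumptions: whitespace tokenisation, unsmoothed relative-frequency unigram estimates, and a toy collection are all choices made for the example, and the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Step 1: infer a unigram model from one document (relative frequencies)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def query_log_likelihood(model, query_tokens):
    """Step 2: log-probability of generating the query from the document model."""
    score = 0.0
    for w in query_tokens:
        p = model.get(w, 0.0)
        if p == 0.0:
            # An unseen query term makes the whole query impossible under this model.
            return float("-inf")
        score += math.log(p)
    return score


def rank_documents(docs, query):
    """Step 3: rank documents by query log-likelihood, best first."""
    query_tokens = query.lower().split()
    scores = {
        doc_id: query_log_likelihood(unigram_model(text.lower().split()), query_tokens)
        for doc_id, text in docs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy collection, invented for the example.
docs = {
    "d1": "language models for information retrieval",
    "d2": "information retrieval with language models and more language models",
    "d3": "operating systems and deadlock",
}
ranking = rank_documents(docs, "language models")
for doc_id, score in ranking:
    print(doc_id, score)
```

Log-probabilities are summed instead of multiplying raw probabilities to avoid floating-point underflow on longer queries; the ranking is the same either way.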

Retrieval based on probabilistic LM
▪ Intuition
  ▪ Users …
    ▪ Have a reasonable idea of terms that are likely to occur in documents of interest.
    ▪ They will choose query terms that distinguish these documents from others in the collection.
  ▪ Collection statistics …
    ▪ Are integral parts of the language model.
    ▪ Are not used heuristically as in many other approaches.
    ▪ In theory. In practice, there's usually some wiggle room for empirically set parameters
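
One way collection statistics can enter the model directly, with an empirically set parameter, is linear interpolation between the document model and a collection-wide model. This slide does not name a specific scheme, so the interpolation below, the weight `lam`, the function name, and the toy data are all assumptions used purely to illustrate the idea.

```python
from collections import Counter


def smoothed_query_likelihood(query_tokens, doc_tokens, collection_tokens, lam=0.5):
    """P(q | d) with each term probability mixed from document and collection.

    lam is the weight on the document model; (1 - lam) goes to the collection
    model. Its value is a tunable assumption, not something fixed by the theory.
    """
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(collection_tokens)
    prob = 1.0
    for w in query_tokens:
        p_doc = doc_counts[w] / len(doc_tokens)
        p_coll = coll_counts[w] / len(collection_tokens)
        prob *= lam * p_doc + (1 - lam) * p_coll
    return prob


# Toy two-document collection, invented for the example.
d1 = "language models".split()
d2 = "retrieval models".split()
collection = d1 + d2

query = ["language", "retrieval"]
p_mixed = smoothed_query_likelihood(query, d1, collection, lam=0.5)
p_doc_only = smoothed_query_likelihood(query, d1, collection, lam=1.0)
print(p_mixed, p_doc_only)
```

With `lam=1.0` the collection is ignored and d1 scores 0 because it lacks "retrieval"; with `lam=0.5` the collection statistics keep the score non-zero. How to set `lam` is exactly the kind of empirical wiggle room the slide mentions.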
