Probabilistic Information Retrieval 11 Probability Ranking Principle ▪ More complex case: retrieval costs. ▪ Let d be a document ▪ C – cost of retrieval of a relevant document ▪ C′ – cost of retrieval of a non-relevant document ▪ Probability Ranking Principle: if C·p(R|d) + C′·(1 − p(R|d)) ≤ C·p(R|d′) + C′·(1 − p(R|d′)) for all d′ not yet retrieved, then d is the next document to be retrieved ▪ We won't further consider loss/utility from now on
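The cost-based ranking rule above can be sketched in a few lines: rank each unretrieved document by its expected retrieval cost C·p(R|d) + C′·(1 − p(R|d)) and retrieve the minimum. This is a minimal sketch; the probabilities, cost values, and document ids are invented for illustration.

```python
# Sketch of the cost-based Probability Ranking Principle.
# The p(R|d) estimates and the cost values C, C_prime are hypothetical.

def next_document(candidates, C=-1.0, C_prime=1.0):
    """Return the not-yet-retrieved document with lowest expected cost.

    candidates: dict mapping document id -> estimated p(R|d).
    C: cost of retrieving a relevant document (negative = a benefit).
    C_prime: cost of retrieving a non-relevant document.
    """
    def expected_cost(p):
        # C * p(R|d) + C' * (1 - p(R|d)), as in the rule above
        return C * p + C_prime * (1 - p)

    return min(candidates, key=lambda d: expected_cost(candidates[d]))

docs = {"d1": 0.9, "d2": 0.3, "d3": 0.6}
print(next_document(docs))  # d1: highest p(R|d) minimizes expected cost
```

With C negative (relevant retrieval is a benefit) and C′ positive, minimizing expected cost reduces to ranking by decreasing p(R|d), which is the simple form of the PRP.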
Probabilistic Information Retrieval 12 Probability Ranking Principle ▪ How do we compute all those probabilities? ▪ Do not know exact probabilities, have to use estimates ▪ Binary Independence Retrieval (BIR) – which we discuss later today – is the simplest model ▪ Questionable assumptions ▪ “Relevance” of each document is independent of relevance of other documents. ▪ Really, it’s bad to keep on returning duplicates ▪ Boolean model of relevance
Probabilistic Information Retrieval 13 Probabilistic Retrieval Strategy ▪ Estimate how terms contribute to relevance ▪ How do things like tf, df, and length influence your judgments about document relevance? ▪ One answer is the Okapi formulae (S. Robertson) ▪ Combine to find document relevance probability ▪ Order documents by decreasing probability
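The Okapi approach mentioned above combines tf, df, and document length into a single term weight. Below is a sketch of one common form of the Okapi BM25 scoring function; the parameter defaults (k1 = 1.2, b = 0.75), the smoothed idf variant, and the toy corpus are assumptions for illustration, not taken from the source.

```python
import math

# Sketch of an Okapi BM25 score: combines tf, df, and document length.
# k1, b, the idf smoothing, and the toy corpus are illustrative choices.

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one document (a list of terms) against a query."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)  # document frequency
        if df == 0:
            continue
        # Smoothed idf (the +1 keeps the weight non-negative)
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        tf = doc.count(t)
        # Length-normalized tf saturation
        score += idf * tf * (k1 + 1) / (
            tf + k1 * (1 - b + b * len(doc) / avgdl)
        )
    return score

corpus = [["probabilistic", "retrieval"], ["boolean", "retrieval"], ["ranking"]]
print(bm25_score(["probabilistic"], corpus[0], corpus))
```

Documents are then ordered by decreasing score, matching the "order documents by decreasing probability" step above.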
Probabilistic Information Retrieval 14 Probabilistic Ranking Basic concept: "For a given query, if we know some documents that are relevant, terms that occur in those documents should be given greater weighting in searching for other relevant documents. By making assumptions about the distribution of terms and applying Bayes Theorem, it is possible to derive weights theoretically." Van Rijsbergen
Probabilistic Information Retrieval 15 Binary Independence Model ▪ Traditionally used in conjunction with PRP ▪ “Binary” = Boolean: documents are represented as binary incidence vectors of terms (cf. lecture 1): x = (x₁, …, xₙ) ▪ xᵢ = 1 iff term i is present in document x. ▪ “Independence”: terms occur in documents independently ▪ Different documents can be modeled as the same vector ▪ Bernoulli Naive Bayes model (cf. text categorization!)
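The binary incidence representation above is easy to make concrete: each document becomes a 0/1 vector over the vocabulary, with 1 iff the term occurs. The vocabulary and documents below are toy examples, not from the source.

```python
# Toy illustration of binary incidence vectors.
# Vocabulary and documents are invented for the example.

vocab = ["antony", "brutus", "caesar"]

def incidence_vector(doc_terms, vocab):
    """Map a document (set of terms) to its 0/1 incidence vector:
    position i is 1 iff vocab[i] occurs in the document."""
    return [1 if t in doc_terms else 0 for t in vocab]

print(incidence_vector({"caesar", "brutus"}, vocab))  # [0, 1, 1]

# Two different documents can collapse to the same vector: term
# frequency and order are discarded, only presence/absence survives.
print(incidence_vector({"brutus", "caesar", "caesar"}, vocab))  # [0, 1, 1]
```

This also shows the "different documents can be modeled as the same vector" point: any two documents with the same term set are indistinguishable under this model.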