Web Search and Mining (key technologies of web search and mining): course teaching resources (PPT lecture slides), Lecture 08: Scoring and results assembly

Document contents (about 48 pages)

Computing Scores in a Complete Search System: Efficient Ranking

Computing the K largest cosines: selection vs. sorting
▪ Typically we want to retrieve the top K docs (in the cosine ranking for the query),
  not to totally order all docs in the collection
▪ Can we pick off the docs with the K highest cosines?
▪ Let J = number of docs with nonzero cosines
▪ We seek the K best of these J
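The contrast between selection and sorting can be sketched in a few lines of Python. This is only an illustration: the doc IDs and cosine values are made up, and the helper names `top_k_by_sorting` and `top_k_by_selection` are hypothetical, not taken from the lecture.

```python
import heapq


def top_k_by_sorting(scores, k):
    """Totally order all J scored docs, then keep the first k."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def top_k_by_selection(scores, k):
    """Pick off the k best docs without fully ordering the rest."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical docs with nonzero cosines (J = 5), for illustration only.
scores = {"d1": 0.12, "d2": 0.87, "d3": 0.45, "d4": 0.91, "d5": 0.33}

by_sorting = top_k_by_sorting(scores, 2)
by_selection = top_k_by_selection(scores, 2)
```

Both functions return the same K docs here; the difference is that selection never has to order the J - K docs that are not returned.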

Use heap for selecting top K
▪ Binary tree in which each node’s value > the values of its children
▪ Takes 2J operations to construct; then each of the K “winners” is read off in 2 log J steps
▪ For J = 1M, K = 100, this is about 10% of the cost of sorting
[Figure: example heap of scores]
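A minimal sketch of heap-based top-K selection follows, using Python's standard `heapq` module. Since `heapq` implements a min-heap, the sketch stores negated scores to get max-heap behaviour. The function names and sample scores are hypothetical. The cost estimate uses the slide's figures (2J to build, 2 log J per winner) and assumes, for comparison, that sorting costs about J log J operations with logarithms base 2; that sorting cost is an assumption of this sketch, not a figure stated on the slide.

```python
import heapq
import math


def heap_top_k(cosines, k):
    """Build a max-heap over all J scores, then read off k winners."""
    # Negate scores so that the smallest heap entry is the highest cosine.
    heap = [(-score, doc_id) for doc_id, score in cosines.items()]
    heapq.heapify(heap)  # linear-time construction
    winners = []
    for _ in range(min(k, len(heap))):
        neg_score, doc_id = heapq.heappop(heap)  # O(log J) per winner
        winners.append((doc_id, -neg_score))
    return winners


def estimated_cost_ratio(j, k):
    """Heap cost (2J + K * 2 log2 J) relative to an assumed sort cost (J log2 J)."""
    heap_cost = 2 * j + k * 2 * math.log2(j)
    sort_cost = j * math.log2(j)
    return heap_cost / sort_cost


# Hypothetical cosine scores, for illustration only.
cosines = {"d1": 0.9, "d2": 0.3, "d3": 0.8, "d4": 0.1}
winners = heap_top_k(cosines, 2)
ratio = estimated_cost_ratio(1_000_000, 100)
```

Under these assumptions the ratio for J = 1M and K = 100 comes out at roughly one tenth, which is consistent with the "about 10%" figure above. The construction term 2J dominates; the K read-offs contribute only a few thousand operations.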

Bottlenecks
▪ Primary computational bottleneck in scoring: cosine computation
▪ Can we avoid all this computation?
▪ Yes, but we may sometimes get it wrong
  ▪ a doc not in the top K may creep into the list of K output docs
▪ Is this such a bad thing?

Cosine similarity is only a proxy
▪ User has a task and a query formulation
▪ Cosine matches docs to query
▪ Thus cosine is anyway a proxy for user happiness
▪ If we get a list of K docs “close” to the top K by cosine measure, that should be OK

Generic approach
▪ Find a set A of contenders, with K < |A| << N
  ▪ A does not necessarily contain the top K, but has many docs from among the top K
▪ Return the top K docs in A
▪ Think of A as pruning non-contenders
▪ The same approach is also used for other (non-cosine) scoring functions
▪ Will look at several schemes following this approach
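The generic approach can be sketched as a two-step function: first prune to a contender set A with some cheap test, then compute the expensive score only for docs in A. The pruning rule is deliberately left as a parameter, since the slides go on to cover several schemes. Everything concrete below is an assumption made for illustration: the function name, the `cheap_signal` field, the threshold, and the sample values.

```python
import heapq


def top_k_among_contenders(all_docs, is_contender, score, k):
    """Score only the contender set A and return its k best docs."""
    contenders = [doc for doc in all_docs if is_contender(doc)]  # the set A
    scored = ((doc, score(doc)) for doc in contenders)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical collection: a cheap per-doc signal plus the "true" cosine.
docs = {
    "d1": {"cheap_signal": 3, "cosine": 0.9},
    "d2": {"cheap_signal": 1, "cosine": 0.8},
    "d3": {"cheap_signal": 2, "cosine": 0.5},
    "d4": {"cheap_signal": 2, "cosine": 0.4},
}

result = top_k_among_contenders(
    docs,
    is_contender=lambda doc: docs[doc]["cheap_signal"] >= 2,  # assumed rule
    score=lambda doc: docs[doc]["cosine"],
    k=2,
)
```

In this made-up data the exact top 2 by cosine are d1 and d2, but d2 fails the cheap test and is pruned, so d3 creeps into the output instead. That is the kind of error the approach accepts in exchange for computing fewer cosines.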
