Web Search and Mining Techniques course materials (PPT lecture slides), Lecture 1: Web Search Overview & Web Crawling

File format: PPT, file size: 763 KB, price: 14 yuan

Document contents (about 50 pages)

Dimensions of IR
Content          Applications          Tasks
Text             Web search            Ad hoc search
Images           Vertical search       Filtering
Video            Enterprise search     Classification
Scanned docs     Desktop search        Question answering
Audio            Forum search
Music            P2P search
                 Literature search

IR Tasks
▪ Ad hoc search
  ▪ Find relevant documents for an arbitrary text query
▪ Filtering
  ▪ Identify relevant user profiles for a new document
▪ Classification
  ▪ Identify relevant labels for documents
▪ Question answering
  ▪ Give a specific answer to a question
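The first two tasks mirror each other. In ad hoc search the collection is fixed and the query varies. In filtering the user profiles are fixed and a new document arrives. A minimal Python sketch of that contrast follows. It assumes a naive whitespace tokenizer and uses "number of shared terms" as a toy relevance score. The function names `ad_hoc_search` and `filter_document` are hypothetical and do not come from the lecture.

```python
def tokenize(text: str) -> set[str]:
    # Naive tokenizer (assumption): lowercase, split on whitespace, keep distinct terms.
    return set(text.lower().split())


def overlap(a: str, b: str) -> int:
    # Toy relevance score (assumption): number of distinct terms the two texts share.
    return len(tokenize(a) & tokenize(b))


def ad_hoc_search(query: str, docs: dict[str, str]) -> list[str]:
    # Ad hoc search: the collection is fixed, the query is arbitrary.
    # Return matching document IDs, best score first (ties broken by ID).
    scored = [(overlap(query, text), doc_id) for doc_id, text in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


def filter_document(doc: str, profiles: dict[str, str], min_overlap: int = 1) -> list[str]:
    # Filtering: the user profiles are fixed, a new document arrives.
    # Return the users whose profile shares at least `min_overlap` terms with it.
    return sorted(user for user, profile in profiles.items()
                  if overlap(doc, profile) >= min_overlap)


if __name__ == "__main__":
    docs = {
        "d1": "web crawling and indexing",
        "d2": "music recommendation",
        "d3": "web search engines",
    }
    print(ad_hoc_search("web search", docs))  # ['d3', 'd1']

    profiles = {"alice": "web crawling", "bob": "jazz music"}
    print(filter_document("a new paper on focused web crawling", profiles))  # ['alice']
```

Both functions share the same scoring routine. Only the roles of the fixed side and the incoming side change between them.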

Big Issues in IR
▪ Relevance
  ▪ What is it?
  ▪ Simple (and simplistic) definition: a relevant document contains the information that a person was looking for when they submitted a query to the search engine
  ▪ Many factors influence a person's decision about what is relevant: e.g., task, context, novelty, style
  ▪ Topical relevance (same topic) vs. user relevance (everything else)

Big Issues in IR
▪ Relevance
  ▪ Retrieval models define a view of relevance
  ▪ Ranking algorithms used in search engines are based on retrieval models
  ▪ Most models describe statistical properties of text rather than linguistic ones
    ▪ i.e., counting simple text features such as words instead of parsing and analyzing the sentences
  ▪ The statistical approach to text processing started with Luhn in the 1950s
  ▪ Linguistic features can be part of a statistical model
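The sketch below illustrates "counting simple text features such as words". It scores documents with TF-IDF, which is one common statistical weighting scheme picked here as an example; the lecture does not name a specific model. Each document is reduced to a bag of word counts, with no sentence parsing. A query term's contribution grows with how often the term occurs in the document and shrinks with how many documents contain it. The tokenizer and the exact formula (raw count times log(N/df)) are assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def tf_idf_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    # Bag of words: each document becomes a table of term counts.
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())

    scores = {}
    for doc_id, term_counts in counts.items():
        score = 0.0
        for term in tokenize(query):
            tf = term_counts[term]  # Counter returns 0 for unseen terms
            if tf:
                # Weight = tf * log(N / df); a term found in every document gets log(1) = 0.
                score += tf * math.log(n_docs / df[term])
        scores[doc_id] = score
    return scores


if __name__ == "__main__":
    docs = {
        "d1": "web crawler crawler",
        "d2": "web search",
        "d3": "music",
    }
    ranked = sorted(tf_idf_scores("crawler web", docs).items(),
                    key=lambda item: -item[1])
    for doc_id, score in ranked:
        print(doc_id, round(score, 3))
```

Nothing in the code understands language. The score comes entirely from counts. Linguistic features, such as phrases found by a parser, could still be added as extra "terms" without changing this counting machinery.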

Big Issues in IR
▪ Evaluation
  ▪ Experimental procedures and measures for comparing system output with user expectations
  ▪ Originated in the Cranfield experiments in the 1960s
  ▪ Typically uses a test collection of documents, queries, and relevance judgments
  ▪ The most commonly used are the TREC collections
  ▪ Recall and precision are two examples of effectiveness measures
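Suppose a test collection supplies the relevance judgments for one query. Precision and recall for a system's result list can then be computed as follows:

- Precision = |retrieved ∩ relevant| / |retrieved|
- Recall = |retrieved ∩ relevant| / |relevant|

The function name and the toy document IDs below are hypothetical. Returning 0.0 for empty inputs is an assumed convention.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    # Treat the result list as a set so a duplicated ID is not counted twice.
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # documents both retrieved and judged relevant

    # Precision: fraction of retrieved documents that are relevant.
    # Empty inputs return 0.0 (an assumed convention).
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    retrieved = ["d1", "d2", "d3", "d4"]   # system output for one query
    relevant = {"d1", "d3", "d7"}          # relevance judgments for that query
    p, r = precision_recall(retrieved, relevant)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Returning every document in the collection drives recall to 1.0 while precision usually drops. This trade-off is why the two measures are reported together.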

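Documents you may be interested in

Programming Languages course PPT slides (chapter outline)
Changchun University College of Tourism: Computer Networks and Network Security course materials (PPT slides), Chapter 6: Computer Networks and Network Security
JavaScript Programming Basics (JavaScript syntax rules)
Object-Oriented Programming course PPT slides: Chapter 1, Visual Basic Overview (lecturer: Gao Hui)
Xidian University: Operating-System Structures (PPT lecture slides)
University of Electronic Science and Technology of China, School of Computer Science: Modern Cryptography course PPT slides (Cryptography Fundamentals), Chapter 1: Introduction
Shandong University: Microcomputer Principles and Microcontroller Interface Technology course materials (PPT lecture slides), Chapter 9: Analog-to-Digital and Digital-to-Analog Converters
Hong Kong Baptist University: Data Communications and Networking course materials (PPT lecture slides), Chapter 10: Circuit Switching and Packet Switching
Hangzhou Dianzi University: Introduction to Computers, the Internet and the World Wide Web teaching materials (PPT slides), Chapter 01: C++ Programming Basics
E-commerce 2014 (PPT lecture slides), Chapter 5: E-commerce Security and Payment Systems
Web Technology Development teaching materials (PPT lecture slides): HTML AND CSS
E-commerce 2014 (PPT lecture slides), Chapter 12: B2B E-commerce: Supply Chain Management and Collaborative Commerce
Compiler Principles course materials (PPT lecture slides), Chapter 4: Syntax Analysis: Top-Down Parsing
Gannan Normal University: Computer Network Technology course materials (PPT lecture slides), Chapter 10: Internet Overview
Java Object-Oriented Programming: Java Interfaces (PPT lecture slides)
Implementing a Dynamic Memory Allocator (lab PPT slides)
Southeast University: Data Structures course materials (PPT lecture slides), Randomized Algorithms (lecturer: Fang Xiaolin)
University of Science and Technology of China: Modern Cryptography: Theory and Practice course materials (PPT lecture slides), Chapter 1: Introduction (lecturer: Miao Fuyou)
Design and Analysis of Algorithms course PPT slides: Tutorial 10
C Programming course PPT lecture notes: Chapter 1, Overview
Nanjing University: Embedded Cyber-Physical Systems course materials (PPT lecture slides): Timed Automata
PowerPoint course PPT slides: Chapter 6, Creating Presentations with PowerPoint
Hong Kong University of Science and Technology: Web-log Mining: from Pages to Relations
University of Science and Technology of China, School of Computer Science: Advanced Operating System course materials (PPT slides), Chapter 4: Distributed Process and Processor Management (Distributed Processor Allocation Algorithms)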