Peking University: Information Retrieval course teaching resources (PPT lecture slides): Web Search

Outline
• Overview of web search
• Next generation search engines

CCF-ADL at Zhengzhou University, June 25-27, 2010

Characteristics of Web Information
• “Infinite” size (surface vs. deep Web)
  – Surface = static HTML pages
  – Deep = dynamically generated HTML pages (DB)
• Semi-structured
  – Structured = HTML tags, hyperlinks, etc.
  – Unstructured = text
• Different formats (pdf, word, ps, …)
• Multimedia (text, audio, images, …)
• High variance in quality (much junk)
• “Universal” coverage (can be about any content)

General Challenges in Web Information Management
• Handling the size of the Web
  – How to ensure completeness of coverage?
  – Efficiency issues
• Dealing with or tolerating errors and low-quality information
• Addressing the dynamics of the Web
  – Some pages may disappear permanently
  – New pages are constantly created
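The dynamics listed above can be illustrated by comparing two crawl snapshots. This is a minimal sketch, assuming each snapshot is just a set of URLs; the function name `diff_snapshots` and the example URLs are hypothetical and do not come from the slides.

```python
def diff_snapshots(old_urls, new_urls):
    """Compare two crawl snapshots, each given as a collection of URLs."""
    old_urls, new_urls = set(old_urls), set(new_urls)
    disappeared = old_urls - new_urls  # seen before, missing now
    created = new_urls - old_urls      # not seen before, present now
    return disappeared, created


# Hypothetical snapshots from two crawls of the same site.
old = {"http://example.com/a", "http://example.com/b"}
new = {"http://example.com/b", "http://example.com/c"}

disappeared, created = diff_snapshots(old, new)
print(sorted(disappeared))
print(sorted(created))
```

A URL missing from one snapshot is not necessarily gone for good (the server may have been temporarily unreachable), so a real system would likely wait for several consecutive misses before dropping a page.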

“Free text” vs. “Structured text”
• So far, we’ve assumed “free text”
  – Document = word sequence
  – Query = word sequence
  – Collection = a set of documents
  – Minimal structure …
• But we may have structures on text (e.g., title, hyperlinks)
  – Can we exploit the structures in retrieval?
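One simple way to exploit structure in retrieval is to weight matches by the field in which they occur. The sketch below is illustrative only: the field names, the weights in `FIELD_WEIGHTS`, and the whitespace tokenizer are assumptions, not a method given in the slides.

```python
from collections import Counter

# Hypothetical field weights: a title match counts more than a body match.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def score(query, doc, weights=FIELD_WEIGHTS):
    """Sum the weighted term frequencies of the query words over each field."""
    query_terms = tokenize(query)
    total = 0.0
    for field, weight in weights.items():
        counts = Counter(tokenize(doc.get(field, "")))
        total += weight * sum(counts[term] for term in query_terms)
    return total


# Two hypothetical documents with the same query words in different fields.
docs = {
    "d1": {"title": "Web search engines", "body": "An overview of crawling and indexing."},
    "d2": {"title": "Course notes", "body": "Notes that mention web search once."},
}

ranked = sorted(docs, key=lambda d: score("web search", docs[d]), reverse=True)
print(ranked)
```

Under the free-text view both documents contain each query word once and would tie on raw term frequency; with field weights, the document that matches in its title ranks first.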

Examples of Document Structures
• Intra-doc structures (= relations of components)
  – Natural components: title, author, abstract, sections, references, …
  – Annotations: named entities, subtopics, markups, …
• Inter-doc structures (= relations between documents)
  – Topic hierarchy
  – Hyperlinks/citations (hypertext)
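Both kinds of structure can be sketched as a small data model. This is a minimal illustration, assuming a hypothetical `Document` class whose fields stand for intra-document components and whose `out_links` list stands for inter-document hyperlinks or citations; none of these names come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Intra-document structure: natural components of a single document.
    title: str
    author: str = ""
    sections: list = field(default_factory=list)
    # Inter-document structure: ids of documents this one links to or cites.
    out_links: list = field(default_factory=list)


def inlink_counts(collection):
    """Count, for each document id, how many links point to it."""
    counts = {doc_id: 0 for doc_id in collection}
    for doc in collection.values():
        for target in doc.out_links:
            counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical three-document collection.
collection = {
    "A": Document(title="Overview of web search", out_links=["B", "C"]),
    "B": Document(title="Crawling", out_links=["C"]),
    "C": Document(title="Link analysis"),
}

counts = inlink_counts(collection)
print(counts)
```

In-link counts are only the simplest signal that can be read off the link graph; they are shown here to make the notion of inter-document structure concrete, not as a ranking method.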
