"E-business" course reading: Learning Algorithms for Keyphrase Extraction

Turney
Learning Algorithms for Keyphrase Extraction

1. Introduction

Many journals ask their authors to provide a list of keywords for their articles. We call these keyphrases, rather than keywords, because they are often phrases of two or more words, rather than single words. We define a keyphrase list as a short list of phrases (typically five to fifteen noun phrases) that capture the main topics discussed in a given document. This paper is concerned with the automatic extraction of keyphrases from text.

Keyphrases are meant to serve multiple goals. For example, (1) when they are printed on the first page of a journal article, the goal is summarization. They enable the reader to quickly determine whether the given article is in the reader's fields of interest. (2) When they are printed in the cumulative index for a journal, the goal is indexing. They enable the reader to quickly find a relevant article when the reader has a specific need. (3) When a search engine form has a field labelled keywords, the goal is to enable the reader to make the search more precise. A search for documents that match a given query term in the keyword field will yield a smaller, higher quality list of hits than a search for the same term in the full text of the documents. Keyphrases can serve these diverse goals and others, because the goals share the requirement for a short list of phrases that captures the main topics of the documents.

We define automatic keyphrase extraction as the automatic selection of important, topical phrases from within the body of a document. Automatic keyphrase extraction is a special case of the more general task of automatic keyphrase generation, in which the generated phrases do not necessarily appear in the body of the given document. Section 2 discusses criteria for measuring the performance of automatic keyphrase extraction algorithms. In the experiments in this paper, we measure the performance by comparing machine-generated keyphrases with human-generated keyphrases. In our document collections, an average of about 75% of the author's keyphrases appear somewhere in the body of the corresponding document. Thus, an ideal keyphrase extraction algorithm could (in principle) generate
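
The statistic quoted above (the share of author keyphrases that occur in the document body) can be illustrated with a minimal Python sketch. The matching rule used here (lowercasing, collapsing punctuation, whole-phrase matching) is an assumption made for illustration; the excerpt does not say how matches were determined. The names `normalize` and `extractable_fraction`, and the sample text, are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumeric characters to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def extractable_fraction(body: str, author_keyphrases: list[str]) -> float:
    """Share of author keyphrases found as whole phrases in the body.

    Assumption: a keyphrase "appears" if its normalized form occurs as a
    whole-word sequence in the normalized body. The excerpt does not
    specify the matching rule the authors used.
    """
    padded_body = f" {normalize(body)} "
    phrases = [normalize(p) for p in author_keyphrases]
    phrases = [p for p in phrases if p]  # ignore empty entries
    if not phrases:
        return 0.0
    found = sum(1 for p in phrases if f" {p} " in padded_body)
    return found / len(phrases)


# Hypothetical sample data, not taken from the paper's document collections.
body = (
    "This paper is concerned with the automatic extraction of keyphrases "
    "from text. We treat it as supervised learning."
)
keyphrases = [
    "keyphrases",
    "supervised learning",
    "genetic algorithm",
    "Automatic Extraction",
]
print(extractable_fraction(body, keyphrases))
```

Under this rule, a pure extraction algorithm can match no more than this fraction of the author's list for a given document, since it can only select phrases present in the body.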
