Copyright © 2014 Pearson Education, Inc.

Slide 5-11: Text Mining Terminology
▪ Unstructured or semi-structured data
▪ Corpus (and corpora)
▪ Terms
▪ Concepts
▪ Stemming
▪ Stop words (and include words)
▪ Synonyms (and polysemes)
▪ Tokenizing
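To make three of these terms concrete (tokenizing, stop words, stemming), here is a minimal Python sketch. The stop-word list, the suffix rules, and the sample corpus are illustrative assumptions and not part of the slides; the suffix stripper is a deliberately naive stand-in, not a real stemming algorithm such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to"}

# Naive suffixes to strip; a real stemmer uses far more careful rules.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


# A two-document toy corpus (hypothetical).
corpus = [
    "The miners are mining text in the documents.",
    "Text mining extracts patterns.",
]
terms = [preprocess(doc) for doc in corpus]
print(terms)
```

With these rules "miners" becomes "miner" while "mining" becomes "min", which shows why crude suffix stripping can fail to map related words to the same stem.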
Slide 5-12: Text Mining Terminology
▪ Term dictionary
▪ Word frequency
▪ Part-of-speech tagging
▪ Morphology
▪ Term-by-document matrix
▪ Occurrence matrix
▪ Singular value decomposition
▪ Latent semantic indexing
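As a rough illustration of word frequency and the term-by-document matrix, the sketch below counts how often each term occurs in each document. The toy corpus and the function name `build_term_document_matrix` are assumptions made for this example; rows are terms and columns are documents, though the transposed layout is also common.

```python
from collections import Counter

# Hypothetical toy corpus: each document is already tokenized.
docs = {
    "d1": ["text", "mining", "text"],
    "d2": ["data", "mining"],
    "d3": ["text", "data", "data"],
}


def build_term_document_matrix(docs):
    """Return (terms, doc_ids, matrix); matrix[i][j] counts term i in doc j."""
    doc_ids = sorted(docs)
    # Word frequency per document.
    counts = {d: Counter(docs[d]) for d in doc_ids}
    # The term dictionary: every distinct term in the corpus, sorted.
    terms = sorted({t for tokens in docs.values() for t in tokens})
    matrix = [[counts[d][t] for d in doc_ids] for t in terms]
    return terms, doc_ids, matrix


terms, doc_ids, matrix = build_term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(f"{term:<8}", row)
```

Singular value decomposition and latent semantic indexing would take a matrix like this as input and reduce its dimensionality; that step is not shown here because it needs a numerical library.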
Slide 5-13: Application Case 5.2, Text Mining for Patent Analysis
▪ What is a patent?
    ▪ “exclusive rights granted by a country to an inventor for a limited period of time in exchange for a disclosure of an invention”
▪ How do we do patent analysis (PA)?
▪ Why do we need to do PA?
    ▪ What are the benefits?
    ▪ What are the challenges?
▪ How does text mining help in PA?
Slide 5-14: Natural Language Processing (NLP)
▪ Structuring a collection of text
    ▪ Old approach: bag-of-words
    ▪ New approach: natural language processing
▪ NLP is …
    ▪ a very important concept in text mining
    ▪ a subfield of artificial intelligence and computational linguistics
    ▪ the study of "understanding" natural human language
▪ Syntax-based versus semantics-based text mining
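One way to see why bag-of-words counts as the older, more limited approach: it discards word order, so sentences with opposite meanings can look identical. The sketch below is a minimal illustration; the function name and the example sentences are assumptions for this example, not taken from the slides.

```python
from collections import Counter


def bag_of_words(text):
    """Represent text as unordered word counts, ignoring word order."""
    return Counter(text.lower().split())


a = "dog bites man"
b = "man bites dog"

# Both sentences contain the same words once each, so the bags are equal
# even though the meanings differ.
same_bag = bag_of_words(a) == bag_of_words(b)
print(same_bag)  # True
```

Telling these two sentences apart requires at least syntactic information (which word is the subject and which the object), which is what NLP-based approaches try to recover.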
Slide 5-15: Natural Language Processing (NLP)
▪ What is “understanding”?
    ▪ Humans understand; what about computers?
    ▪ Natural language is vague and context-driven
    ▪ True understanding requires extensive knowledge of a topic
▪ Can or will computers ever understand natural language in the same way, and as accurately, as we do?