Introduction to Text Mining
Thanks to Hongning Wang@UVa for the slides from his Text Mining course. Slides are slightly modified by Lei Chen.
What is “Text Mining”?
• “Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text.” - Wikipedia
• “Another way to view text data mining is as a process of exploratory data analysis that leads to heretofore unknown information, or to answers for questions for which the answer is not currently known.” - Hearst, 1999
CS@UVa CS6501: Text Mining
Two different definitions of mining
• Goal-oriented (effectiveness driven)
– Any process that generates useful results that are non-obvious is called “mining”
– Keywords: “useful” + “non-obvious”
– Data isn’t necessarily massive
• Method-oriented (efficiency driven)
– Any process that involves extracting information from massive data is called “mining”
– Keywords: “massive” + “pattern”
– Patterns aren’t necessarily useful
Text mining around us
• Sentiment analysis
Text mining around us
• Document summarization