Copyright © 2018 Pearson Education, Inc.

• Natural language understanding
• Machine translation
• Foreign language reading
• Foreign language writing
• Speech recognition
• Text-to-speech
• Text proofing
• Optical character recognition

Section 5.4 Review Questions

1. List and briefly discuss some of the text mining applications in marketing.

Text mining can be used to increase cross-selling and up-selling by analyzing the unstructured data generated by call centers. Text mining has become invaluable for customer relationship management. Companies can use text mining to analyze rich sets of unstructured text data, combined with the relevant structured data extracted from organizational databases, to predict customer perceptions and subsequent purchasing behavior.

2. How can text mining be used in security and counterterrorism?

Students may use the introductory case in this answer. In 2007, EUROPOL developed an integrated system capable of accessing, storing, and analyzing vast amounts of structured and unstructured data sources in order to track transnational organized crime. Another security-related application of text mining is in the area of deception detection.

3. What are some promising text mining applications in biomedicine?

As in any other experimental approach, it is necessary to analyze the vast amount of data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research.
Section 5.5 Review Questions

1. What are the main steps in the text mining process?

See Figure 5.6 (p. 222). Text mining entails three tasks:
• Establish the Corpus: Collect and organize the domain-specific unstructured data
• Create the Term–Document Matrix: Introduce structure to the corpus
• Extract Knowledge: Discover novel patterns from the T-D matrix

2. What is the reason for normalizing word frequencies? What are the common methods for normalizing word frequencies?

The raw indices need to be normalized in order to have a more consistent TDM for further analysis. Common methods are log frequencies, binary frequencies, and inverse document frequencies.

3. What is SVD? How is it used in text mining?

Singular value decomposition (SVD), which is closely related to principal components analysis, reduces the overall dimensionality of the input matrix (number of input documents by number of extracted terms) to a lower dimensional space, where each consecutive dimension represents the largest degree of variability (between words and documents) possible.

4. What are the main knowledge extraction methods from corpus?

The main categories of knowledge extraction methods are classification, clustering, association, and trend analysis.

Section 5.6 Review Questions

1. What is sentiment analysis? How does it relate to text mining?

Sentiment analysis tries to answer the question, “What do people feel about a certain topic?” by digging into opinions of many using a variety of automated tools. It is also known as opinion mining, subjectivity analysis, and appraisal extraction. Sentiment analysis shares many characteristics and techniques with text mining. However, unlike text mining, which categorizes text by conceptual taxonomies of
topics, sentiment classification generally deals with two classes (positive versus negative), a range of polarity (e.g., star ratings for movies), or a range in strength of opinion.

2. What are the most popular application areas for sentiment analysis? Why?

Customer relationship management (CRM) and customer experience management are popular “voice of the customer (VOC)” applications. Other application areas include “voice of the market (VOM)” and “voice of the employee (VOE).”

3. What would be the expected benefits and beneficiaries of sentiment analysis in politics?

Opinions matter a great deal in politics. Because political discussions are dominated by quotes, sarcasm, and complex references to persons, organizations, and ideas, politics is one of the most difficult, and potentially fruitful, areas for sentiment analysis. By analyzing the sentiment on election forums, one may predict who is more likely to win or lose. Sentiment analysis can help understand what voters are thinking and can clarify a candidate’s position on issues. Sentiment analysis can help political organizations, campaigns, and news analysts to better understand which issues and positions matter the most to voters. The technology was successfully applied by both parties to the 2008 and 2012 American presidential election campaigns.

4. What are the main steps in carrying out sentiment analysis projects?

The first step when performing sentiment analysis of a text document is called sentiment detection, during which text data is differentiated between fact and opinion (objective vs. subjective). This is followed by negative-positive (N-P) polarity classification, where a subjective text item is classified on a bipolar range. Following this comes target identification (identifying the person, product, event, etc. that the sentiment is about). Finally come collection and aggregation, in which the overall sentiment for the document is calculated based on the calculations of sentiments of individual phrases and words from the first three steps.

5. What are the two common methods for polarity identification? Explain.

Polarity identification can be done via a lexicon (as a reference library) or by using a collection of training documents and inductive machine learning algorithms. The lexicon approach uses a catalog of words, their synonyms, and their meanings, combined with numerical ratings indicating the position on the N-P polarity associated with these words. In this way, affective, emotional, and attitudinal phrases can be classified according to their degree of positivity or negativity. By contrast, the training-document approach uses statistical analysis and machine learning algorithms, such as neural networks, clustering approaches,
and decision trees to ascertain the sentiment for a new text document based on patterns from previous “training” documents with assigned sentiment scores.

Section 5.7 Review Questions

1. What are some of the main challenges the Web poses for knowledge discovery?

• The Web is too big for effective data mining.
• The Web is too complex.
• The Web is too dynamic.
• The Web is not specific to a domain.
• The Web has everything.

2. What is Web mining? How does it differ from regular data mining or text mining?

Web mining is the discovery and analysis of interesting and useful information from the Web and about the Web, usually through Web-based tools. Text mining is less structured because it’s based on words instead of numeric data.

3. What are the three main areas of Web mining?

The three main areas of Web mining are Web content mining, Web structure mining, and Web usage (or activity) mining.

4. What is Web content mining? How can it be used for competitive advantage?

Web content mining refers to the extraction of useful information from Web pages. The documents may be extracted in some machine-readable format so that automated techniques can generate some information about the Web pages. Collecting and mining Web content can be used for competitive intelligence (collecting intelligence about competitors’ products, services, and customers), which can give your organization a competitive advantage.

5. What is Web structure mining? How does it differ from Web content mining?

Web structure mining is the process of extracting useful information from the links embedded in Web documents. By contrast, Web content mining involves analysis of the specific textual content of Web pages. So, Web structure mining is more related to navigation through a website, whereas Web content mining is more related to text mining and the document hierarchy of a particular Web page.
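
Instructors who want a hands-on illustration of the distinction in Question 5 might use a small sketch like the one below. It is not from the textbook. It assumes Python with only the standard library, and the names PageMiner, mine_page, and SAMPLE_HTML are hypothetical, chosen for this illustration. The same pass over a page yields two different outputs: the embedded links (the raw material of Web structure mining) and the visible text (the raw material of Web content mining).

```python
from html.parser import HTMLParser


class PageMiner(HTMLParser):
    """Hypothetical helper: gathers links (structure) and visible text (content)."""

    def __init__(self):
        super().__init__()
        self.links = []        # href values found in <a> tags
        self.text_parts = []   # visible text fragments
        self._skip_depth = 0   # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def mine_page(html):
    """Return (links, text) for one HTML document."""
    miner = PageMiner()
    miner.feed(html)
    miner.close()
    return miner.links, " ".join(miner.text_parts)


# Made-up sample page, used only for illustration.
SAMPLE_HTML = """
<html><body>
<h1>Product news</h1>
<p>Our new phone ships in May. See the <a href="/specs.html">specs</a>
and <a href="https://example.com/reviews">reviews</a>.</p>
</body></html>
"""

links, text = mine_page(SAMPLE_HTML)
print("Structure (links):", links)
print("Content (text):", text)
```

In a fuller exercise, the link lists from many pages could be combined into a graph for structure mining, while the extracted text could be passed to the text mining steps covered in Section 5.5.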