
Business Intelligence, Analytics, and Data Science: A Managerial Perspective. Teaching resources (Instructor's Manual, 4th edition), Chapter 04, Predictive Analytics I: Data Mining Process, Methods, and Algorithms



Copyright © 2018 Pearson Education, Inc.

3. What do you think is the most prominent application area for data mining? Why?
Students’ answers will differ depending on which of the applications (most likely banking, retailing and logistics, manufacturing and production, government, healthcare, medicine, or homeland security) they think is most in need of greater certainty. Their reasons for selection should relate to the application area’s need for better certainty and its ability to pay for the investments in data mining.

4. Can you think of other application areas for data mining not discussed in this section? Explain.
Students should be able to identify an area that can benefit from greater prediction or certainty. Answers will vary depending on their creativity.

Section 4.4 Review Questions

1. What are the major data mining processes?
Similar to other information systems initiatives, a data mining project must follow a systematic project management process to be successful. Several data mining processes have been proposed: CRISP-DM, SEMMA, and KDD.

2. Why do you think the early phases (understanding of the business and understanding of the data) take the longest in data mining projects?
Students should explain that the early steps are the most unstructured phases because they involve learning. Those phases (learning/understanding) cannot be automated. Extra time and effort are needed upfront because any mistake in understanding the business or data will most likely result in a failed BI project.

3. List and briefly define the phases in the CRISP-DM process.
CRISP-DM provides a systematic and orderly way to conduct data mining projects. This process has six steps. First, an understanding of the business issues to be addressed and an understanding of the data are developed concurrently. Next, the data are prepared for modeling, the models are built, the model results are evaluated, and the models are deployed for regular use.

4. What are the main data preprocessing steps? Briefly describe each step and provide relevant examples.
Data preprocessing is essential to any successful data mining study. Good data leads to good information; good information leads to good decisions. Data preprocessing includes four main steps (listed in Table 4.1 on page 167):
• data consolidation: access, collect, select, and filter data


• data cleaning: handle missing data, reduce noise, fix errors
• data transformation: normalize the data, aggregate data, construct new attributes
• data reduction: reduce the number of attributes and records; balance skewed data

5. How does CRISP-DM differ from SEMMA?
The main difference between CRISP-DM and SEMMA is that CRISP-DM takes a more comprehensive approach to data mining projects, one that includes understanding the business and the relevant data, whereas SEMMA implicitly assumes that the data mining project’s goals and objectives, along with the appropriate data sources, have already been identified and understood.

Section 4.5 Review Questions

1. Identify at least three of the main data mining methods.
Classification learns patterns from past data (a set of information, such as traits, variables, and features, on characteristics of the previously labeled items, objects, or events) in order to place new instances (with unknown labels) into their respective groups or classes. The objective of classification is to analyze the historical data stored in a database and automatically generate a model that can predict future behavior.
Cluster analysis is an exploratory data analysis tool for solving classification problems. The objective is to sort cases (e.g., people, things, events) into groups, or clusters, so that the degree of association is strong among members of the same cluster and weak among members of different clusters.
Association rule mining is a popular data mining method that is commonly used to explain to a technologically less savvy audience what data mining is and what it can do. Association rule mining aims to find interesting relationships (affinities) between variables (items) in large databases.

2. Give examples of situations in which classification would be an appropriate data mining technique. Give examples of situations in which regression would be an appropriate data mining technique.
Students’ answers will differ, but should be based on the following issues. Classification is for prediction that can be based on historical data and relationships, such as predicting the weather, product demand, or a student’s success in a university. If what is being predicted is a class label (e.g., “sunny,” “rainy,” or “cloudy”), the prediction problem is called classification, whereas if it is a numeric value (e.g., a temperature such as 68°F), the prediction problem is called regression.
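The distinction between predicting a class label and predicting a numeric value can be illustrated with a small sketch. The data, the feature choice (humidity and cloud cover), and the nearest-neighbor approach below are all hypothetical and chosen only for brevity; they are not taken from the textbook. The same historical records feed both functions, and only the type of output differs.

```python
from collections import Counter

# Hypothetical toy history: (humidity %, cloud cover %), observed label, temperature in °F.
history = [
    ((30, 10), "sunny", 75.0),
    ((35, 20), "sunny", 72.0),
    ((80, 90), "rainy", 58.0),
    ((85, 95), "rainy", 55.0),
    ((60, 70), "cloudy", 64.0),
    ((65, 75), "cloudy", 62.0),
]


def nearest(features, k=3):
    """Return the k historical records closest to `features` (squared Euclidean distance)."""
    def dist(record):
        return sum((a - b) ** 2 for a, b in zip(record[0], features))
    return sorted(history, key=dist)[:k]


def classify(features, k=3):
    """Classification: predict a class label by majority vote among the neighbors."""
    labels = [label for _, label, _ in nearest(features, k)]
    return Counter(labels).most_common(1)[0][0]


def regress(features, k=3):
    """Regression: predict a numeric value as the mean temperature of the neighbors."""
    temps = [temp for _, _, temp in nearest(features, k)]
    return sum(temps) / len(temps)


new_day = (82, 92)
print(classify(new_day))  # a label such as "sunny", "rainy", or "cloudy"
print(regress(new_day))   # a numeric temperature estimate
```

Calling `classify` returns one of the labels seen in the history, while `regress` returns a number on a continuous scale, which mirrors the classification/regression split described in the answer.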

3. List and briefly define at least two classification techniques.
• Decision tree analysis. Decision tree analysis (a machine-learning technique) is arguably the most popular classification technique in the data mining arena.
• Statistical analysis. Statistical classification techniques include logistic regression and discriminant analysis, both of which assume that the relationships between the input and output variables are linear in nature, that the data are normally distributed, and that the variables are not correlated and are independent of each other.
• Case-based reasoning. This approach uses historical cases to recognize commonalities in order to assign a new case to the most probable category.
• Bayesian classifiers. This approach uses probability theory to build classification models, based on past occurrences, that are capable of placing a new instance into the most probable class (or category).
• Genetic algorithms. This approach uses the analogy of natural evolution to build directed search-based mechanisms to classify data samples.
• Rough sets. This method takes into account the partial membership of class labels in predefined categories when building models (collections of rules) for classification problems.

4. What are some of the criteria for comparing and selecting the best classification technique?
• The amount and availability of historical data
• The types of data (categorical, interval, ratio, etc.)
• What is being predicted (a class or a numeric value)
• The purpose or objective

5. Briefly describe the general algorithm used in decision trees.
A general algorithm for building a decision tree is as follows:
1. Create a root node and assign all of the training data to it.
2. Select the best splitting attribute.
3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive (non-overlapping) subsets along the lines of the specific split and move to the branches.
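The first three steps of the decision tree algorithm can be sketched in a few lines. This is a minimal illustration under stated assumptions: the excerpt does not say how the "best" splitting attribute is chosen, so information gain (entropy reduction) is assumed here, and the training records and function names are hypothetical. Only the root-level split is shown, since the remaining steps are not part of this excerpt.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def split_by(rows, attribute):
    """Partition rows into mutually exclusive subsets, one per attribute value."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row)
    return dict(subsets)


def information_gain(rows, attribute, target):
    """Entropy of the parent node minus the weighted entropy of its children."""
    parent = entropy([row[target] for row in rows])
    children = sum(
        len(subset) / len(rows) * entropy([row[target] for row in subset])
        for subset in split_by(rows, attribute).values()
    )
    return parent - children


def build_root(rows, attributes, target):
    """Steps 1-3: the root holds all rows, picks the best attribute, and branches on its values."""
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    return {"split_on": best, "branches": split_by(rows, best)}


# Hypothetical training data; "play" is the class label.
training = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
root = build_root(training, ["outlook", "windy"], "play")
print(root["split_on"])
```

In this toy data, `outlook` separates the classes perfectly while `windy` carries no information, so the root splits on `outlook`. Every training record lands in exactly one branch, which is the "mutually exclusive subsets" requirement in step 3.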
