Copyright © 2018, 2014, 2011 Pearson Education, Inc. All Rights Reserved

Data Mining Characteristics & Objectives
• The source of data for DM is often a consolidated data warehouse (not always!).
• The DM environment is usually a client-server or a Web-based information systems architecture.
• Data is the most critical ingredient for DM, and it may include soft/unstructured data.
• The miner is often an end user.
• Striking it rich requires creative thinking.
• Data mining tools' capabilities and ease of use are essential (Web, parallel processing, etc.).
How Data Mining Works
• DM extracts patterns from data
– Pattern? A mathematical (numeric and/or symbolic) relationship among data items
• Types of patterns
– Association
– Prediction
– Cluster (segmentation)
– Sequential (or time series) relationships
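To make "a mathematical relationship among data items" concrete, the following is a minimal sketch of one pattern type, association, measured with the standard support and confidence ratios. The transactions and the item names are hypothetical toy data, not taken from the slides, and the helper names `support` and `confidence` are chosen here for illustration only.

```python
# Hypothetical toy transactions (assumption: not from the source slides).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0  # rule never applies, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / antecedent_support


# Rule under test: bread -> milk
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here the rule "bread implies milk" is a numeric relationship among items: it holds in half of all baskets and in two thirds of the baskets that contain bread.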
Application Case 4.2
Dell Is Staying Agile and Effective with Analytics in the 21st Century
Questions for Discussion
1. What was the challenge Dell was facing that led to their analytics journey?
2. What solution did Dell develop and implement? What were the results?
3. As an analytics company itself, Dell has used its service offerings for its own business. Do you think it is easier or harder for a company to taste its own medicine? Explain.
A Taxonomy for Data Mining
• Figure 4.2 A Simple Taxonomy for Data Mining Tasks, Methods, and Algorithms
Each entry lists: task or method, data mining algorithms, learning type.
• Prediction
– Classification: Decision Trees, Neural Networks, Support Vector Machines, kNN, Naïve Bayes, GA (Supervised)
– Regression: Linear/Nonlinear Regression, ANN, Regression Trees, SVM, kNN, GA (Supervised)
– Time Series: Autoregressive Methods, Averaging Methods, Exponential Smoothing, ARIMA (Supervised)
• Association
– Market-basket: Apriori, OneR, ZeroR, Eclat, GA (Unsupervised)
– Link analysis: Expectation Maximization, Apriori Algorithm, Graph-based Matching (Unsupervised)
– Sequence analysis: Apriori Algorithm, FP-Growth, Graph-based Matching (Unsupervised)
• Segmentation
– Clustering: K-means, Expectation Maximization (EM) (Unsupervised)
– Outlier analysis: K-means, Expectation Maximization (EM) (Unsupervised)
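As an illustration of the unsupervised branch of the taxonomy, the sketch below runs k-means on one-dimensional data in plain Python. No labels are supplied; the groups emerge from the values alone. The function name `kmeans_1d`, the sample values, and the starting centroids are assumptions made for this example, and a real project would more likely rely on an established library.

```python
def kmeans_1d(values, initial_centroids, iterations=10):
    """Very small k-means for 1-D data: assign, then re-average, repeatedly."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical measurements with two visible groups (assumption).
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(values, initial_centroids=[0.0, 5.0])
print(centroids, clusters)
```

A supervised method such as classification would instead need a known target label for every training record, which is the distinction the "Learning Type" column draws.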
Other Data Mining Patterns/Tasks
• Time-series forecasting
– Part of the sequence or link analysis?
• Visualization
– Another data mining task?
– Covered in Chapter 3
• Data Mining versus Statistics
– Are they the same?
– What is the relationship between the two?
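For the time-series forecasting item above, one of the simplest methods is simple exponential smoothing, where each smoothed value is a weighted blend of the newest observation and the previous smoothed value. The sketch below is a minimal version under stated assumptions: the function name, the smoothing weight `alpha`, and the demand figures are hypothetical and chosen only for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    smoothed = [series[0]]  # initialize with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical demand figures for three periods (assumption).
demand = [10.0, 12.0, 14.0]
smoothed = exponential_smoothing(demand, alpha=0.5)
next_period_forecast = smoothed[-1]  # one-step-ahead forecast
print(smoothed, next_period_forecast)
```

The last smoothed value serves as the one-step-ahead forecast. A larger `alpha` reacts faster to recent changes, while a smaller one gives a smoother, slower-moving series.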