Copyright © 2018 Pearson Education, Inc.

CHAPTER 2
Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization

Learning Objectives for Chapter 2
▪ Understand the nature of data as it relates to business intelligence (BI) and analytics
▪ Learn the methods used to make real-world data analytics ready
▪ Describe statistical modeling and its relationship to business analytics
▪ Learn about descriptive and inferential statistics
▪ Define business reporting, and understand its historical evolution
▪ Understand the importance of data/information visualization
▪ Learn different types of visualization techniques
▪ Appreciate the value that visual analytics brings to business analytics
▪ Know the capabilities and limitations of dashboards

CHAPTER OVERVIEW
In the age of Big Data and business analytics in which we are living, the importance of data is undeniable. The newly coined phrases like “data is the oil,” “data is the
new bacon,” “data is the new currency,” and “data is the king” are further stressing the renewed importance of data. But what type of data are we talking about? Obviously, not just any data. The “garbage in, garbage out” (GIGO) concept/principle applies to today’s “Big Data” phenomenon more so than any data definition that we have had in the past. To live up to its promise, its value proposition, and its ability to turn into insight, data has to be carefully created/identified, collected, integrated, cleaned, transformed, and properly contextualized for use in accurate and timely decision making. Data is the main theme of this chapter. Accordingly, the chapter starts with a description of the nature of data: what it is, what different types and forms it can come in, and how it can be preprocessed and made ready for analytics. The first few sections of the chapter are dedicated to a deep yet necessary understanding and processing of data. The next few sections describe the statistical methods used to prepare data as input to produce both descriptive and inferential measures. Following the statistics sections are sections on reporting and visualization. A report is a communication artifact prepared with the specific intention of converting data into information and knowledge and relaying that information in an easily understandable/digestible format. Nowadays, these reports are more visually oriented, often using colors and graphical icons that collectively look like a dashboard to enhance the information content. Therefore, the latter part of the chapter is dedicated to subsections that present the design, implementation, and best practices for information visualization, storytelling, and information dashboards.

CHAPTER OUTLINE
2.1 Opening Vignette: SiriusXM Attracts and Engages a New Generation of Radio Consumers with Data-Driven Marketing
2.2 The Nature of Data
2.3 A Simple Taxonomy of Data
2.4 The Art and Science of Data Preprocessing
2.5 Statistical Modeling for Business Analytics
2.6 Regression Modeling for Inferential Statistics
2.7 Business Reporting
2.8 Data Visualization
2.9 Different Types of Charts and Graphs
2.10 The Emergence of Visual Analytics
2.11 Information Dashboards

ANSWERS TO END OF SECTION REVIEW QUESTIONS

Section 2.1 Review Questions

1. What does SiriusXM do? In what type of market does it conduct its business?
SiriusXM is a provider of satellite radio. It primarily provides its service in automobiles.

2. What were the challenges? Comment on both technology and data-related challenges.
The company had several challenges. The first was the changing demographics of car owners. As cars were resold on the secondary market, it became more difficult for the company to identify new potential customers. Additionally, the company had a technical challenge because of an acquisition. There was uncertainty about its ability to use all of the technology available through the acquisition.

3. What were the proposed solutions?
The company felt that it would be able to maintain a strategic advantage if it began working towards being a data-driven marketing company. This would allow it to target current and potential customers more precisely.

4. How did they implement the proposed solutions? Did they face any implementation challenges?
The company decided to bring all marketing work in-house. It determined that it was important to clean the data and manage it in a central repository. To do this, it partnered with Teradata. There were challenges with the implementation due to the variability in the data itself and the complexity of the task.

5. What were the results and benefits? Were they worth the effort/investment?
The company has been able to progress significantly in its goal of becoming a data-driven marketing organization. With the new systems in place, it is possible to move campaigns faster with better visibility.

6. Can you think of other companies facing similar challenges that can potentially benefit from similar data-driven marketing solutions?
Most companies that market directly to end users could use a similar approach to managing and leveraging data in their marketing activities.

Section 2.2 Review Questions

1. How do you describe the importance of data in analytics? Can we think of analytics without data?
Data is the main ingredient in all forms of analytics. You cannot have analytics without data.

2. Considering the new and broad definition of business analytics, what are the main inputs and outputs to the analytics continuum?
Because of the broader definition of business analytics, almost any data from almost any source can be considered an input. In the same way, after analytics has been performed, the output can take a wide variety of forms depending on the specific business purpose.

3. Where does the data for business analytics come from?
Data can come from a wide variety of sources. Examples include business processes and systems, the Internet and social media, and machines or the Internet of Things.

4. In your opinion, what are the top three data-related challenges for better analytics?
Opinions will vary, but examples of challenges include data reliability, accuracy, accessibility, security, richness, consistency, timeliness, granularity, validity, and relevance.

5. What are the most common metrics that make for analytics-ready data?
The data must be relevant to the problem at hand and meet the quality/quantity requirements. It also has to have a certain data structure in place, with key fields/variables and properly normalized values, and it must conform to organizational definitions.

Section 2.3 Review Questions

1. What is data? How does data differ from information and knowledge?
Data refers to a collection of facts usually obtained as the result of experiments, observations, transactions, or experiences. Data may consist of numbers, letters, words, images, voice recordings, and so on, as measurements of a set of variables. Data is a raw commodity and does not become information or knowledge until after it is processed.

2. What are the main categories of data? What types of data can we use for BI and analytics?
The main categories of data are structured data and unstructured data. Both of these types of data can be used for business intelligence and analytics, although it is easier and more expedient to use structured data.

3. Can we use the same data representation for all analytics models? Why, or why not?
No. Other data types, including textual, spatial, imagery, video, and voice, need to be converted into some form of categorical or numeric representation before they can be processed by analytics methods.

4. What is a 1-of-N data representation? Why and where is it used in analytics?
Nominal or ordinal variables are converted into numeric representations using some type of 1-of-N pseudo variables (e.g., a categorical variable with three unique values can be transformed into three pseudo variables with binary values, 1 or 0). This allows the variable to be used in predictive analytics.

Section 2.4 Review Questions

1. Why is the original/raw data not readily usable by analytics tasks?
It is often dirty, misaligned, overly complex, and inaccurate.

2. What are the main data preprocessing steps?
The main data preprocessing steps include data consolidation, data cleaning, data transformation, and data reduction.

3. What does it mean to clean/scrub the data? What activities are performed in this phase?
In this step, problematic values in the data set are identified and dealt with. The analyst identifies noisy values in the data and smooths them out, and also addresses any missing values.

4. Why do we need data transformation? What are the commonly used data transformation tasks?
Data transformation is often needed to ensure that data is in a format in which it can be used for analysis. During data transformation, the data is normalized and discretized, and new attributes are created.

5. Data reduction can be applied to rows (sampling) and/or columns (variable selection). Which is more challenging?
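
As an illustration only (not part of the textbook's answers), the 1-of-N representation from Section 2.3 and the cleaning and transformation tasks from Section 2.4 can be sketched in plain Python. The function names, the sample values, and the choice of mean imputation, min-max scaling, and fixed cut points are all assumptions made for this sketch. Other techniques would serve equally well.

```python
from bisect import bisect_right
from statistics import mean


def one_of_n(values):
    """Encode a categorical column as 1-of-N binary pseudo variables.

    Returns the sorted category names and one 0/1 row per input value.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, cut_points, labels):
    """Data transformation: map numeric values to ordered bins.

    cut_points must be sorted ascending; labels needs one more entry
    than cut_points.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("labels must have len(cut_points) + 1 entries")
    return [labels[bisect_right(cut_points, v)] for v in values]


if __name__ == "__main__":
    # Hypothetical sample data, invented for this illustration.
    marital_status = ["single", "married", "divorced", "married"]
    income = [30000, None, 50000, 70000]

    categories, encoded = one_of_n(marital_status)
    print(categories)
    print(encoded)

    cleaned = fill_missing(income)
    scaled = min_max_normalize(cleaned)
    print(scaled)
    print(discretize(scaled, [0.33, 0.66], ["low", "medium", "high"]))
```

A categorical variable with three unique values yields three pseudo variables, each row containing exactly one 1, which matches the description in the answer to question 4 of Section 2.3.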