Copyright © 2018, 2014, 2011 Pearson Education, Inc. All Rights Reserved
Application Case 2.2 (1 of 6)
Improving Student Retention with Data-Driven Analytics
Questions for Discussion
1. What is student attrition, and why is it an important problem in higher education?
2. What were the traditional methods to deal with the attrition problem?
3. List and discuss the data-related challenges within the context of this case study.
4. What was the proposed solution, and what were the results?
Application Case 2.2 (2 of 6)
• Student retention – Freshmen class
• Why is it important?
• What are the common techniques to deal with student attrition?
• Analytics versus theoretical approaches to the student retention problem
[Figure: analytics workflow for student retention. Freshmen class data → data preprocessing (collecting, merging, cleaning, balancing, transforming) → building and testing models (decision trees, ensembles/boosting, logistic regression, support vector machines) → model assessment (confusion matrix, accuracy)]
Application Case 2.2 (3 of 6)
• Data imbalance problem
[Figure: the data imbalance problem. Input data → model building, testing, and validating → model assessment by (Accuracy, Precision+, Precision−) from a confusion matrix of TP, FP, FN, TN. A model built on unbalanced data (80% No, 20% Yes) scores (90%, 100%, 50%). A model built on balanced data (50% No, 50% Yes) scores (80%, 80%, 80%). Which one is better? *Yes: dropped out, No: persisted]
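One common way to balance such data is random undersampling. It keeps every minority-class record and draws an equal-sized random sample from the majority class. This slide names the imbalance problem without stating which balancing method the study used. The sketch below therefore assumes undersampling, which is at least consistent with the equal class totals of 3090 in the balanced results. The record layout and the `dropped_out` field name are hypothetical.

```python
import random
from collections import Counter

def undersample(records, label_key="dropped_out", seed=42):
    """Randomly undersample every class down to the size of the smallest class.

    records: list of dicts; label_key: hypothetical field holding "Yes" or "No".
    The smallest class is kept whole (sampling all of it just reorders it).
    """
    rng = random.Random(seed)            # seeded so the sample is reproducible
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))  # sample without replacement
    rng.shuffle(balanced)                # mix the classes back together
    return balanced

# Toy data mirroring the figure's 80% "No" (persisted) / 20% "Yes" (dropped out) split.
students = [{"id": i, "dropped_out": "Yes" if i % 5 == 0 else "No"} for i in range(1000)]
balanced = undersample(students)
print(Counter(r["dropped_out"] for r in balanced))
```

Undersampling discards majority-class records. The alternative, oversampling the minority class, keeps all the data but repeats minority records.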
Application Case 2.2 (4 of 6)
Table 2.2 Prediction Results for the Original/Unbalanced Dataset
Confusion Matrix     ANN(MLP)          DT(C5)            SVM               LR
                     No       Yes      No       Yes      No       Yes      No       Yes
No                   1494     384      1518     304      1478     255      1438     376
Yes                  1596     11142    1572     11222    1612     11271    1652     11150
SUM                  3090     11526    3090     11526    3090     11526    3090     11526
Per-Class Accuracy   48.35%   96.67%   49.13%   97.36%   47.83%   97.79%   46.54%   96.74%
Overall Accuracy     86.45%            87.16%            87.23%            86.12%

Table 2.3 Prediction Results for the Balanced Data Set
Confusion Matrix     ANN(MLP)          DT(C5)            SVM               LR
                     No       Yes      No       Yes      No       Yes      No       Yes
No                   2309     464      2311     417      2313     386      2125     626
Yes                  781      2626     779      2673     777      2704     965      2464
SUM                  3090     3090     3090     3090     3090     3090     3090     3090
Per-Class Accuracy   74.72%   84.98%   74.79%   86.50%   74.85%   87.51%   68.77%   79.74%
Overall Accuracy     79.85%            80.65%            81.18%            74.26%
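The per-class and overall accuracies in Tables 2.2 and 2.3 follow directly from the confusion-matrix counts. Each SUM row matches the class totals, so the columns appear to be actual classes and the rows predicted classes. The sketch below assumes that layout. Per-class accuracy divides a diagonal cell by its column total, and overall accuracy divides the diagonal sum by the grand total. The class-wise precision measures (Precision+, Precision−) instead divide a diagonal cell by its row total. The function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(matrix):
    """Accuracy measures for a 2x2 confusion matrix.

    Assumed layout: rows = predicted class, columns = actual class,
    both ordered ("No", "Yes"), so each column sums to a class total.
    """
    (nn, ny), (yn, yy) = matrix   # e.g. ny = predicted "No", actually "Yes"
    total_no = nn + yn            # actual "No" cases (column sum)
    total_yes = ny + yy           # actual "Yes" cases (column sum)
    return {
        "per_class_no": nn / total_no,
        "per_class_yes": yy / total_yes,
        "overall": (nn + yy) / (total_no + total_yes),
        # precision: share of a row's predictions that are correct
        "precision_no": nn / (nn + ny),
        "precision_yes": yy / (yn + yy),
    }

ann_unbalanced = [[1494, 384], [1596, 11142]]  # ANN(MLP) counts, Table 2.2
ann_balanced = [[2309, 464], [781, 2626]]      # ANN(MLP) counts, Table 2.3
for name, m in [("unbalanced", ann_unbalanced), ("balanced", ann_balanced)]:
    print(name, {k: f"{v:.2%}" for k, v in confusion_metrics(m).items()})
```

On the ANN(MLP) counts, this reproduces the reported 48.35% and 96.67% per-class accuracy for the unbalanced data, with 86.45% overall. For the balanced data, it reproduces 74.72% and 84.98%, with 79.85% overall. Balancing trades some overall accuracy for much better detection of the minority class.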
Application Case 2.2 (5 of 6)
Table 2.4 Prediction Results for the Three Ensemble Models
Confusion Matrix     Boosting          Bagging           Information Fusion
                     (Boosted Trees)   (Random Forest)   (Weighted Average)
                     No       Yes      No       Yes      No       Yes
No                   2242     375      2327     362      2335     351
Yes                  848      2715     763      2728     755      2739
SUM                  3090     3090     3090     3090     3090     3090
Per-Class Accuracy   72.56%   87.86%   75.31%   88.28%   75.57%   88.64%
Overall Accuracy     80.21%            81.80%            82.10%
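Information fusion by weighted average combines several models' outputs into a single score. Below is a minimal sketch under stated assumptions. It assumes each model outputs a probability that the student drops out ("Yes"). It also assumes the fusion draws on the four individual models. The case material here does not say which models or weights the fusion used. The weights below are hypothetical, set near the balanced-data overall accuracies in Table 2.3. The per-student scores are made up for illustration.

```python
def weighted_average_fusion(scores, weights, threshold=0.5):
    """Fuse per-model P("Yes") scores by a weighted average.

    scores, weights: dicts keyed by model name (hypothetical names below).
    Dividing by the total weight keeps the result on the 0..1 scale.
    """
    total_weight = sum(weights[name] for name in scores)
    fused = sum(scores[name] * weights[name] for name in scores) / total_weight
    label = "Yes" if fused >= threshold else "No"   # 0.5 cutoff is an assumption
    return fused, label

# Hypothetical weights, roughly the balanced-data overall accuracies.
weights = {"ANN": 0.80, "DT": 0.81, "SVM": 0.81, "LR": 0.74}
# Hypothetical P(dropped out) scores for one student from each model.
student_scores = {"ANN": 0.62, "DT": 0.55, "SVM": 0.48, "LR": 0.30}
score, label = weighted_average_fusion(student_scores, weights)
print(f"fused P(Yes) = {score:.3f} -> predicted {label}")
```

Weighting by accuracy lets stronger models count for more. Here, the four models score fairly close together, so the weighted result stays close to a plain average.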