Classification—A Two-Step Process
◼ Model construction: describing a set of predetermined classes
    ◼ Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
    ◼ The set of tuples used for model construction is the training set
    ◼ The model is represented as classification rules, decision trees, or mathematical formulae
◼ Model usage: for classifying future or unknown objects (see the sketch after this list)
    ◼ Estimate the accuracy of the model
        ◼ The known label of each test sample is compared with the classified result from the model
        ◼ The accuracy rate is the percentage of test set samples that are correctly classified by the model
        ◼ The test set is independent of the training set (otherwise overfitting occurs)
    ◼ If the accuracy is acceptable, use the model to classify new data
◼ Note: if the test set is used to select models, it is called a validation (test) set
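The following minimal Python sketch walks through the two steps; the dataset (Iris), the 70/30 split, and the choice of a decision tree are illustrative assumptions, not part of the slides:

# Minimal sketch of the two-step process (any classifier would do).
from sklearn.datasets import load_iris          # stand-in dataset (assumption)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# The test set must be independent of the training set (otherwise overfitting).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Step 1: model construction on the training set.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: model usage -- compare known test labels with the classified results.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy rate: {accuracy:.2%}")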
Illustrating Classification Task

[Figure: a learning algorithm performs induction on the training set to learn a model; the model is then applied to the test set (deduction).]

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
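As a sketch of induction and deduction on the tables above (the numeric encoding of the categorical attributes and the use of scikit-learn are assumptions, not part of the slides):

# Induction on the training set, deduction on the test set.
from sklearn.tree import DecisionTreeClassifier

a1 = {"No": 0, "Yes": 1}                    # Attrib1 encoding (assumption)
a2 = {"Small": 0, "Medium": 1, "Large": 2}  # Attrib2 encoding, ordinal for brevity

# Training set (Tids 1-10) from the table: (Attrib1, Attrib2, Attrib3, Class).
train = [("Yes","Large",125,"No"), ("No","Medium",100,"No"), ("No","Small",70,"No"),
         ("Yes","Medium",120,"No"), ("No","Large",95,"Yes"), ("No","Medium",60,"No"),
         ("Yes","Large",220,"No"), ("No","Small",85,"Yes"), ("No","Medium",75,"No"),
         ("No","Small",90,"Yes")]
X = [[a1[r[0]], a2[r[1]], r[2]] for r in train]
y = [r[3] for r in train]

model = DecisionTreeClassifier(random_state=0).fit(X, y)  # induction: learn model

# Test set (Tids 11-15) has unknown class labels ("?").
test = [("No","Small",55), ("Yes","Medium",80), ("Yes","Large",110),
        ("No","Small",95), ("No","Large",67)]
print(model.predict([[a1[t[0]], a2[t[1]], t[2]] for t in test]))  # deduction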
Process (1): Model Construction

[Figure: classification algorithms are applied to the training data to produce a classifier (model).]

Training Data:
NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (Model):
IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’
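Since the learned model here is a single rule, it can be written directly as a Python function; the assert loop is an illustrative check (not from the slides) that the rule is consistent with the training data:

# The learned classifier (model) from the slide, as a plain Python rule:
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.
def tenured(rank: str, years: int) -> str:
    return "yes" if rank == "professor" or years > 6 else "no"

# Sanity check against the training data (NAME, RANK, YEARS, TENURED).
training_data = [("Mike", "assistant prof", 3, "no"),
                 ("Mary", "assistant prof", 7, "yes"),
                 ("Bill", "professor",      2, "yes"),
                 ("Jim",  "associate prof", 7, "yes"),
                 ("Dave", "assistant prof", 6, "no"),
                 ("Anne", "associate prof", 3, "no")]
for name, rank, years, label in training_data:
    assert tenured(rank, years) == label  # the rule fits every training tuple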
Process (2): Using the Model in Prediction

[Figure: the classifier is evaluated on the testing data, then applied to unseen data.]

Testing Data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen Data: (Jeff, Professor, 4) → Tenured? Yes
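Continuing the sketch, we can estimate accuracy on the testing data and then classify the unseen tuple; this worked example is an illustration, not from the slides:

# Reuses the tenured() rule from Process (1); redefined so the snippet runs standalone.
def tenured(rank: str, years: int) -> str:
    return "yes" if rank == "professor" or years > 6 else "no"

# Estimate accuracy on the testing data (NAME, RANK, YEARS, known TENURED).
testing_data = [("Tom",     "assistant prof", 2, "no"),
                ("Merlisa", "associate prof", 7, "no"),
                ("George",  "professor",      5, "yes"),
                ("Joseph",  "assistant prof", 7, "yes")]
correct = sum(tenured(rank, years) == label for _, rank, years, label in testing_data)
print(f"accuracy rate: {correct}/{len(testing_data)}")  # 3/4: Merlisa is misclassified

# If the accuracy is acceptable, classify the unseen data (Jeff, Professor, 4).
print(tenured("professor", 4))  # -> 'yes', since rank = 'professor' matches the rule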
Algorithm for Decision Tree Induction
◼ Basic algorithm (a greedy algorithm); see the sketch after this list
    ◼ Tree is constructed in a top-down recursive divide-and-conquer manner
    ◼ At start, all the training examples are at the root
    ◼ Attributes are categorical (if continuous-valued, they are discretized in advance)
    ◼ Examples are partitioned recursively based on selected attributes
    ◼ Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
◼ Conditions for stopping partitioning
    ◼ All samples for a given node belong to the same class
    ◼ There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf
    ◼ There are no samples left
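A minimal Python sketch of this basic algorithm, assuming categorical attributes and information gain as the selection measure (the function names and the dict-based tree representation are illustrative choices):

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(samples, labels, attr):
    """Reduction in entropy from partitioning the samples on `attr`."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(s[attr] for s in samples):
        subset = [l for s, l in zip(samples, labels) if s[attr] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

def build_tree(samples, labels, attributes):
    """Top-down recursive divide-and-conquer induction.

    `samples` is a list of dicts mapping attribute name -> categorical value.
    """
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no remaining attributes -> classify the leaf by majority voting.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: select the test attribute with the highest information gain.
    best = max(attributes, key=lambda a: info_gain(samples, labels, a))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    # Partition the examples recursively on the selected attribute. Branching
    # only on values actually present means no branch is ever empty; the
    # textbook version labels an empty branch with the parent's majority class.
    for value in set(s[best] for s in samples):
        part = [(s, l) for s, l in zip(samples, labels) if s[best] == value]
        tree[best][value] = build_tree([s for s, _ in part],
                                       [l for _, l in part], remaining)
    return tree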