
Introduction to Data Mining: course teaching resources (PPT lecture slides), Data Mining Classification (Basic Concepts, Decision Trees, and Model Evaluation)


Document contents (about 101 pages)

Tree Induction

Greedy strategy:
- Split the records based on an attribute test that optimizes a certain criterion.

Issues:
- Determine how to split the records
  ◆ How to specify the attribute test condition?
  ◆ How to determine the best split?
- Determine when to stop splitting

© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
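The greedy strategy on this slide can be sketched as a short recursive procedure. This is a minimal illustration, not the authors' algorithm. It assumes categorical attributes, multiway splits, the Gini index as the criterion, and a stopping rule of "node is pure, no attributes remain, or the best split does not reduce impurity". The names `build_tree`, `split_impurity`, and the toy records are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels (assumed impurity criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(records, attr, target):
    """Weighted Gini of the partitions produced by a multiway split on attr."""
    groups = {}
    for r in records:
        groups.setdefault(r[attr], []).append(r[target])
    n = len(records)
    return sum(len(g) / n * gini(g) for g in groups.values())

def build_tree(records, attrs, target="label"):
    """Greedy induction: pick the locally best attribute, split, recurse."""
    labels = [r[target] for r in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping rule (an assumption): node is pure or no attributes remain.
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Greedy step: choose the attribute with the lowest weighted impurity.
    best = min(attrs, key=lambda a: split_impurity(records, a, target))
    # Also stop if the best split does not reduce impurity.
    if split_impurity(records, best, target) >= gini(labels):
        return majority
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in sorted({r[best] for r in records}):
        subset = [r for r in records if r[best] == value]
        children[value] = build_tree(subset, remaining, target)
    return (best, children)

# Hypothetical toy data, not taken from the slides.
toy = [
    {"own_car": "yes", "student": "no", "label": 0},
    {"own_car": "yes", "student": "yes", "label": 0},
    {"own_car": "no", "student": "no", "label": 1},
    {"own_car": "no", "student": "yes", "label": 1},
]
tree = build_tree(toy, ["own_car", "student"])
print(tree)
```

The choice made at each node is never revisited, which is what makes the procedure greedy: it optimizes the criterion locally rather than searching over whole trees.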

How to determine the Best Split

Before splitting: 10 records of class 0, 10 records of class 1

Own Car?
  Yes: C0: 6, C1: 4
  No: C0: 4, C1: 6

Car Type?
  Family: C0: 1, C1: 3
  Sports: C0: 8, C1: 0
  Luxury: C0: 1, C1: 7

Student ID?
  c1: C0: 1, C1: 0
  ...
  c10: C0: 1, C1: 0
  c11: C0: 0, C1: 1
  ...
  c20: C0: 0, C1: 1

Which test condition is the best?

How to determine the Best Split

Greedy approach:
- Nodes with homogeneous class distribution are preferred

Need a measure of node impurity:
- C0: 5, C1: 5 (non-homogeneous, high degree of impurity)
- C0: 9, C1: 1 (homogeneous, low degree of impurity)

Measures of Node Impurity

- Gini index
- Entropy
- Misclassification error
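The slide names the three measures without giving formulas. The sketch below uses the usual textbook definitions, which are an assumption here since they do not appear in this excerpt: Gini = 1 - sum(p_i^2), entropy = -sum(p_i log2 p_i), and misclassification error = 1 - max(p_i), where p_i is the fraction of records in class i at the node. It evaluates them on the two example nodes from the previous slide, (C0: 5, C1: 5) and (C0: 9, C1: 1).

```python
import math

def class_probs(counts):
    """Convert per-class record counts at a node into class fractions."""
    n = sum(counts)
    return [c / n for c in counts]

def gini(counts):
    """Gini index: 1 - sum of squared class fractions."""
    return 1.0 - sum(p * p for p in class_probs(counts))

def entropy(counts):
    """Entropy in bits; empty classes contribute 0 by convention."""
    return -sum(p * math.log2(p) for p in class_probs(counts) if p > 0)

def misclassification_error(counts):
    """Error made by predicting the majority class at the node."""
    return 1.0 - max(class_probs(counts))

for counts in [(5, 5), (9, 1)]:
    print(
        counts,
        round(gini(counts), 3),
        round(entropy(counts), 3),
        round(misclassification_error(counts), 3),
    )
```

All three measures are largest for the evenly mixed node and shrink as one class dominates, matching the "high" and "low" impurity labels on the slide.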

How to Find the Best Split

Before splitting: C0: N00, C1: N01, with impurity M0

Split on A?
  Yes: Node N1 (C0: N10, C1: N11), impurity M1
  No: Node N2 (C0: N20, C1: N21), impurity M2
  M12: combined impurity of N1 and N2

Split on B?
  Yes: Node N3 (C0: N30, C1: N31), impurity M3
  No: Node N4 (C0: N40, C1: N41), impurity M4
  M34: combined impurity of N3 and N4

Gain = M0 - M12 vs. M0 - M34
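The gain comparison can be illustrated with the class counts from the earlier "Own Car? / Car Type? / Student ID?" slide. This sketch makes two assumptions the excerpt does not state: the impurity measure M is the Gini index, and the combined impurity of the children (M12, M34) is the average of the child impurities weighted by the number of records in each child. The helper names are hypothetical.

```python
def gini(counts):
    """Gini index of a node given its per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted_impurity(children, measure=gini):
    """Combined child impurity, weighted by child size (assumed form of M12)."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * measure(c) for c in children)

def gain(parent, children, measure=gini):
    """Gain = impurity before the split minus combined impurity after it."""
    return measure(parent) - weighted_impurity(children, measure)

# Class counts (C0, C1) taken from the candidate splits on the earlier slide.
parent = (10, 10)
splits = {
    "Own Car?": [(6, 4), (4, 6)],
    "Car Type?": [(1, 3), (8, 0), (1, 7)],
    "Student ID?": [(1, 0)] * 10 + [(0, 1)] * 10,
}
gains = {name: gain(parent, children) for name, children in splits.items()}
for name, g in gains.items():
    print(f"{name}: gain = {g:.4f}")
```

Under these assumptions the gains are about 0.02 for Own Car?, about 0.34 for Car Type?, and 0.5 for Student ID?. Student ID? scores highest only because every partition holds a single record, which shows that raw impurity gain favors splits with many small partitions.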
