Statistical Learning Theory and Applications
Lecture 7: Nonlinear Classification Models – Ensemble Methods
Authors: 文泉 (Wen Quan), 陈娟 (Chen Juan)
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Contents
1. Basic Principles
2. Combining Multiple Classifiers
3. Bagging
4. Boosting
   - The AdaBoost algorithm
   - Another interpretation of AdaBoost
7.1 Basic Principles
- In any application, we can use several learning algorithms.
- The No Free Lunch Theorem: no single learning algorithm induces the most accurate learner in every domain.
- Try many algorithms and choose the one with the best cross-validation results.
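A minimal sketch of "try many and pick the best by cross-validation", using only the standard library. The two "learners" here are hypothetical fixed-threshold rules standing in for real training algorithms, and the data set is synthetic:

```python
import random

def kfold_accuracy(learner, data, k=5):
    """Mean accuracy of `learner` over k cross-validation folds."""
    folds = [data[i::k] for i in range(k)]
    accs = []
    for i in range(k):
        test = folds[i]
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        model = learner(train)  # `learner` returns a predict function
        accs.append(sum(model(x) == y for x, y in test) / len(test))
    return sum(accs) / k

# Two toy "algorithms" (hypothetical): each ignores the training set
# and classifies by a fixed threshold on the single feature x.
learner_a = lambda train: (lambda x: int(x > 0.5))
learner_b = lambda train: (lambda x: int(x > 0.8))

random.seed(0)
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(100))]

scores = {name: kfold_accuracy(fn, data)
          for name, fn in [("A", learner_a), ("B", learner_b)]}
best = max(scores, key=scores.get)  # model selection by CV accuracy
```

Since the synthetic labels are generated with threshold 0.5, cross-validation selects learner A here; with a different data-generating process the choice could flip, which is exactly the point of trying several learners.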
Rationale
- On the other hand …
  - Each learning model comes with a set of assumptions and thus a bias.
  - Learning is an ill-posed problem (finite data): each model converges to a different solution and fails under different circumstances.
- Why not combine multiple learners intelligently, which may lead to improved results?
- Why does it work?
  - Suppose there are 25 base classifiers, each with error rate ε = 0.35.
  - If the base classifiers are identical, and thus dependent, the ensemble will misclassify exactly the same examples the base classifiers predict incorrectly.
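The effect of dependence can be checked with a quick Monte Carlo sketch (parameters from the slide: ε = 0.35, 25 voters; trial count is an arbitrary choice). Identical classifiers all err on the same examples, so majority voting buys nothing:

```python
import random

EPS, N_CLF, TRIALS = 0.35, 25, 20000
random.seed(1)

def majority_error(independent: bool) -> float:
    """Fraction of trials in which a majority of the N_CLF votes are wrong."""
    wrong = 0
    for _ in range(TRIALS):
        if independent:
            # each classifier errs independently with probability EPS
            votes = [random.random() < EPS for _ in range(N_CLF)]
        else:
            # identical classifiers: one draw, every classifier copies it
            votes = [random.random() < EPS] * N_CLF
        wrong += sum(votes) > N_CLF // 2
    return wrong / TRIALS

err_identical = majority_error(False)   # stays near the base rate 0.35
err_independent = majority_error(True)  # drops well below 0.35
```

With dependent (identical) classifiers the estimated ensemble error hovers around 0.35; with independent errors it falls to roughly 0.06, matching the analysis that follows.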
Rationale
- Assume the classifiers are independent, i.e., their errors are uncorrelated. Then a majority-vote ensemble makes a wrong prediction only if more than half of the base classifiers predict incorrectly.
- Probability that the ensemble classifier makes a wrong prediction:

  P(error) = Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^{25−i} ≈ 0.06

  Note: this is the binomial tail P(X ≥ 13) with n = 25, p = 0.35.
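The tail sum above can be evaluated directly with the standard library (`math.comb`, Python 3.8+):

```python
from math import comb

def ensemble_error(n: int, eps: float) -> float:
    """P(X >= n//2 + 1) for X ~ Binomial(n, eps): the probability that
    a majority of the n independent base classifiers is wrong."""
    k = n // 2 + 1  # minimum number of wrong votes for a majority
    return sum(comb(n, i) * eps**i * (1 - eps)**(n - i)
               for i in range(k, n + 1))

p = ensemble_error(25, 0.35)  # ≈ 0.06, versus 0.35 for a single classifier
```

The exact value is about 0.0605, so majority voting over 25 independent classifiers cuts the error rate from 35% to roughly 6%.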