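Bootstrapping

Based on "bootstrap sampling", also called "sampling with replacement" or "repeatable sampling".

D' is built by drawing m samples from the initial dataset D with replacement, so D and D' both contain m samples. The probability that a given sample is never drawn is $(1 - 1/m)^m$, and $\lim_{m \to \infty} (1 - 1/m)^m = 1/e \approx 0.368$. About 36.8% of the samples in the initial dataset D therefore do not appear in the sampled dataset D'.

"Out-of-bag estimation": D' is used as the training set and D \ D' as the test set. The roughly 36.8% of samples that do not appear in D' are in D \ D'.

• The training set has the same size as the original sample set.
• The data distribution is changed.

Bootstrapping is useful when the dataset is small and hard to split effectively into training and test sets. Producing multiple different training sets from the initial dataset is also very helpful for ensemble learning. Because it changes the data distribution, it may introduce estimation bias; when enough data is available, the hold-out method and cross validation are more commonly used.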
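The 36.8% figure can be checked empirically. The following is a minimal sketch, not part of the original slides: the dataset (integer IDs), the seed, and the helper name `bootstrap_split` are all assumptions made for illustration.

```python
import random

def bootstrap_split(data, rng):
    """Draw len(data) samples with replacement; return D' and the out-of-bag remainder."""
    m = len(data)
    picked = [rng.randrange(m) for _ in range(m)]  # m indices sampled with replacement
    picked_set = set(picked)
    d_prime = [data[i] for i in picked]            # D': same size as D, may contain repeats
    oob = [data[i] for i in range(m) if i not in picked_set]  # samples never drawn
    return d_prime, oob

rng = random.Random(0)      # fixed seed, an arbitrary choice for reproducibility
D = list(range(10_000))     # hypothetical dataset: integer IDs stand in for samples
D_prime, D_oob = bootstrap_split(D, rng)

oob_fraction = len(D_oob) / len(D)
# D' has m samples; the out-of-bag fraction should be close to 1/e
print(len(D_prime), round(oob_fraction, 3))
```

In out-of-bag estimation, `D_prime` would be the training set and `D_oob` the test set.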
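"Parameter tuning" and the final model

Parameters of the algorithm: usually set by hand; also called "hyperparameters".
Parameters of the model: usually determined by learning.

The two tuning processes are similar: first produce several candidate models, then select among them using some evaluation method.

How well the parameters are tuned has a critical effect on final performance.

Distinction: training set vs. test set vs. validation set.

Once the algorithm parameters are chosen, retrain the final model on "training set + validation set".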
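As a hedged illustration of the workflow above, the sketch below picks a hyperparameter on the validation set and then retrains on training set + validation set. The model (one-dimensional ridge regression), the synthetic data, the candidate grid, and all names are assumptions chosen for brevity; they do not come from the slides.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Learn the model parameter w for y ~ w * x with L2 penalty lam (closed form)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
# Hypothetical data: y = 2x plus Gaussian noise
xs = [rng.uniform(-1, 1) for _ in range(300)]
ys = [2 * x + rng.gauss(0, 0.1) for x in xs]

# Three disjoint parts: training set, validation set, test set
train_x, val_x, test_x = xs[:150], xs[150:225], xs[225:]
train_y, val_y, test_y = ys[:150], ys[150:225], ys[225:]

# lam is the hyperparameter (set by hand); w is the model parameter (learned)
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
val_errors = {lam: mse(fit_ridge_1d(train_x, train_y, lam), val_x, val_y)
              for lam in candidates}
best_lam = min(val_errors, key=val_errors.get)

# With lam fixed, retrain the final model on training set + validation set
final_w = fit_ridge_1d(train_x + val_x, train_y + val_y, best_lam)
test_error = mse(final_w, test_x, test_y)  # the test set is touched only once, at the end
print(best_lam, final_w, test_error)
```

The test set plays no part in choosing `best_lam`, which is what keeps `test_error` an honest estimate.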
Cross Validation

[Figure: the Training Set is split into a Training Set and a Validation set; the Testing Set has a public part and a private part. On the validation set, Model 1 has Err = 0.9, Model 2 has Err = 0.7, and Model 3 has Err = 0.5. The model with Err = 0.5 then gives Err > 0.5 on both the public and the private Testing Set.]

Not recommended: using the results on the public testing data to tune your model. Doing so makes the result on the public set look better than the result on the private set.
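A small simulation suggests why selecting a model on the public testing set is discouraged. This is a sketch under strong assumptions: every "model" guesses binary labels at random (true error 0.5), and the counts and seed are arbitrary.

```python
import random

rng = random.Random(0)  # fixed seed, an arbitrary choice
n_models, n_public, n_private = 200, 100, 1000

def random_guess_error(n_samples):
    """Error rate of a model that guesses binary labels uniformly at random."""
    wrong = sum(rng.random() < 0.5 for _ in range(n_samples))
    return wrong / n_samples

# Each model has a true error of 0.5; measured errors fluctuate around it
public_err = [random_guess_error(n_public) for _ in range(n_models)]
private_err = [random_guess_error(n_private) for _ in range(n_models)]

# "Tuning" on the public set: keep whichever model scored best there
best = min(range(n_models), key=lambda i: public_err[i])
print(public_err[best], private_err[best])
```

The selected model's public error sits well below 0.5 purely through selection, while its private error stays near 0.5. A validation set carved out of the training data avoids this.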
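Cross Validation: N-fold Cross Validation

The Training Set is divided into three parts. Each part serves once as the validation part (Val) while the other two are used for training (Train).

Split             Model 1     Model 2     Model 3
Train Train Val   Err = 0.2   Err = 0.4   Err = 0.4
Train Val Train   Err = 0.4   Err = 0.5   Err = 0.5
Val Train Train   Err = 0.3   Err = 0.6   Err = 0.3
Avg Err           0.3         0.5         0.4

The Testing Set (public part and private part) is shown separately from the Training Set.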
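The averaging in the table can be reproduced directly. The sketch below also includes a generic fold generator; the name `k_fold_splits` and its contiguous, unshuffled folds are illustrative assumptions, not something the slides specify. The per-fold errors are the ones in the table.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, val_indices) per fold; folds are contiguous, no shuffling."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

# Per-fold validation errors taken from the table
fold_errors = {
    "Model 1": [0.2, 0.4, 0.3],
    "Model 2": [0.4, 0.5, 0.6],
    "Model 3": [0.4, 0.5, 0.3],
}

# Average over folds, then pick the model with the lowest average error
avg_errors = {name: sum(errs) / len(errs) for name, errs in fold_errors.items()}
best_model = min(avg_errors, key=avg_errors.get)
print(best_model)

# Example: three folds over nine samples
for train_idx, val_idx in k_fold_splits(9, 3):
    print(train_idx, val_idx)
```

With these numbers the averages are 0.3, 0.5, and 0.4, so Model 1 is selected.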
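Model selection

Three key questions:
▣ How do we obtain test results? → Evaluation methods
▣ How do we judge how good the performance is? → Performance measures
▣ How do we tell whether a difference is substantive? → Comparison tests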