Therefore, to estimate the best threshold $\theta_t^*$ (and hence $\sigma_t^2$), we only need to compute the distance from $t$ to each positive training bag. This procedure can be done very efficiently.

Ideally, disambiguation should identify all the true positive instances in the positive bags. However, the only thing we know is that each positive bag contains at least one true positive instance; we do not know the exact number of true positive instances in any specific positive bag. Different positive bags may have different numbers of positive instances, and furthermore, the discriminative ability of the true positive instances from different positive bags might also differ. Hence, although we know that in general the true positive instances have a higher ability to discriminate the training bags than the negative instances do, it is not easy to find a simple rule that identifies all the true positive instances in the positive bags. Actually, from our experiments (cf. Section 4.3), we find that identifying all the true positive instances might exceed what is needed for common MIL tasks, and consequently incur unnecessary cost. Here, we design a simple disambiguation method that identifies just one positive instance from each positive bag, which is enough for most MIL tasks based on our experiments.²

2. The extension of our simple version of the disambiguation method to some complex tasks will be discussed in Section 3.5.1.

Since the MIL formulation requires that each positive bag contains at least one true positive instance, it is always possible to find a true positive instance in a positive bag. In our algorithm, we select from each positive training bag the instance with the largest $P^*$ value as a candidate true positive instance. After true positive instance selection, the disambiguation process is completed. Algorithm 1 summarizes the disambiguation method presented above.

Algorithm 1 Disambiguation Method
Input: all training bags $B_1^+, \ldots, B_{n^+}^+, B_1^-, \ldots, B_{n^-}^-$
Initialize: $X = \{x_1, x_2, \ldots, x_p\}$ is the set of re-indexed instances from all positive training bags; $T^* = \emptyset$ (the empty set), where $T^*$ keeps the set of selected true positive instances
for $k = 1$ to $p$ do
    Compute $P^*(x_k)$ according to (7)
end for
for $i = 1$ to $n^+$ do
    $t_i^* = \arg\max_{B_{ij}^+ \in B_i^+} P^*(B_{ij}^+)$
    Add $t_i^*$ to $T^*$
end for
Output: $T^*$
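To make Algorithm 1 concrete, the following Python sketch mirrors its two loops. Since Eqs. (6) and (7) are not reproduced in this excerpt, the helper `discrimination_score` is only an illustrative stand-in for $P^*$: it scores an instance by how well a single threshold on its bag distances separates the positive from the negative training bags. The function names, the minimum-distance definition of $d(\cdot,\cdot)$, and the data layout (each bag as a list of feature vectors) are assumptions made for this sketch, not the paper's exact definitions.

```python
import numpy as np

def bag_distance(x, bag):
    """Distance d(x, B) from an instance x to a bag B, taken here as the
    minimum Euclidean distance from x to any instance of the bag (assumed
    definition; the paper's own distance is defined elsewhere)."""
    return min(float(np.linalg.norm(np.asarray(x) - np.asarray(inst))) for inst in bag)

def discrimination_score(x, pos_bags, neg_bags):
    """Illustrative stand-in for P*(x) of Eq. (7): the best fraction of
    training bags that a single threshold on d(x, .) labels correctly
    (positive bags close to x, negative bags far away). A true positive
    instance tends to score high; a negative instance inside a positive
    bag tends to score near 0.5."""
    scored = sorted(
        [(bag_distance(x, b), +1) for b in pos_bags] +
        [(bag_distance(x, b), -1) for b in neg_bags]
    )
    n, n_pos = len(scored), len(pos_bags)
    best, pos_seen = 0.0, 0
    for k in range(n + 1):                 # predict the k nearest bags positive
        if k > 0:
            pos_seen += (scored[k - 1][1] == +1)
        correct = pos_seen + (n - k) - (n_pos - pos_seen)
        best = max(best, correct / n)
    return best

def disambiguate(pos_bags, neg_bags):
    """Algorithm 1: from every positive bag, keep the instance with the
    largest score as the candidate true positive instance (the set T*)."""
    return [max(bag, key=lambda x: discrimination_score(x, pos_bags, neg_bags))
            for bag in pos_bags]
```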
3.2.3 Comparison with Other Disambiguation Methods

APR (axis-parallel rectangle) [1] and DD (diverse density) [10] are two well-known disambiguation methods. Compared with them, our disambiguation method is more efficient and more robust. A quantitative empirical comparison will be performed in the next section; here we first give a qualitative comparison with respect to the robustness property.

APR represents the target concept by an axis-parallel rectangle which includes at least one instance from each positive bag but excludes all instances from the negative bags. Figure 1 gives a toy example to show how APR works. There are altogether nine bags ($B_1^+, B_2^+, B_3^+, B_4^+, B_5^-, B_6^-, B_7^-, B_8^-, B_9^-$) in the two-dimensional feature space. If there is no noise, APR will choose the area in the red (smaller) rectangle as the true positive area, which is a very reasonable choice for this noise-free case. However, if we mislabel even just one negative bag, say $B_5^-$, to be positive, APR can no longer find a rectangle that includes at least one instance from each positive bag but excludes all instances from the negative bags. If $B_6^-$ is not in the training set, then the algorithm will choose the blue (bigger) rectangle, which is obviously not the real true positive area. Hence, APR is very sensitive to labeling noise. Furthermore, in many applications such as image classification, it is very difficult or even impossible for APR to find a rectangle that satisfies all the constraints.

As for DD [10], it finds a single point in the feature space, as well as the best feature weights corresponding to that point, to represent the concept, and then decides the label of each bag based on the distance from the bag to the concept point. If the weighted distance from the concept point to any instance of a bag is below a threshold, the bag will be labeled as positive. Hence, the true positive area of DD is an ellipse (or, more generally, a hyperellipsoid). In the toy example in Figure 1, we can assume that both features have equal weights, and hence the computed true positive area is a circle. When there is no noise, DD finds the red point as the computed concept point and the red (smaller) circle as the true positive area. This result is quite reasonable. However, DD is also very sensitive to labeling noise, as has been pointed out in [11]. We can also observe this phenomenon easily from the toy example: if we mislabel only one positive bag, say $B_1^+$, to be negative, then the concept point will move to the blue point and the true positive area will be the blue (bigger) circle, which is also not the real true positive area.
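For comparison, here is a minimal sketch of the DD decision rule just described: a bag is labeled positive when the weighted distance from the learned concept point to at least one of its instances falls below a threshold. The weighted-Euclidean form, the parameter names, and the toy values are assumptions for illustration; learning the concept point and weights (the diverse-density maximization of [10]) is not shown.

```python
import numpy as np

def dd_label_bag(bag, concept, weights, threshold):
    """DD-style bag labeling: positive iff some instance of the bag lies
    within `threshold` (in weighted Euclidean distance) of the concept
    point. Concept point, feature weights, and threshold are assumed to
    have been learned already; only the decision rule is sketched here."""
    dists = [np.sqrt(np.sum(weights * (np.asarray(x) - concept) ** 2)) for x in bag]
    return min(dists) < threshold

# Toy usage: with equal feature weights the positive region is a circle
# around the concept point, as in the Figure 1 example.
concept = np.array([0.0, 0.0])
bag = [np.array([0.3, 0.1]), np.array([2.0, 2.0])]
print(dd_label_bag(bag, concept, np.ones(2), threshold=0.5))  # True
```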
Fig. 1. A toy example to show that APR and DD are sensitive to labeling noise. $B_{ij}^+$ and $B_{ik}^+$ are two instances from the same positive bag $B_i^+$, while $B_{ij}^-$ and $B_{ik}^-$ are two instances from the same negative bag $B_i^-$.

On the contrary, our disambiguation method is much more robust. In the empirical precision measure defined in (6), negating a small number of labels will not change the overall precision value significantly. This makes our method more robust towards labeling noise. To illustrate this, let us assume that $x^+$ is a true positive instance from a positive training bag and $x^-$ is a negative instance (false positive instance) from the same bag. The precisions $P^*(x^+)$ and $P^*(x^-)$ are computed from (7). Since negative instances also exist in positive bags, the distances from $x^-$ to the positive bags are as random as those to the negative bags. Hence, $P^*(x^-)$ may approach 50% while $P^*(x^+)$ is relatively high, because $d(x^+, B_i^+)$ is expected to be smaller than $d(x^+, B_j^-)$. If we add noise by changing the labels of $d\%$ of the positive bags and $d\%$ of the negative bags, then $P^*(x^-)$ may still be about 50% but $P^*(x^+)$ may decrease by up to $d\%$. However, as long as $P^*(x^+) - d\% > P^*(x^-)$, the selected true positive instance is still $x^+$. In real applications, $P^*(x^+)$ can be very high, even 100%, if all the true positive instances form a