Chapter 2
Two-Class Support Vector Machines

In training a classifier, we usually try to maximize the classification performance for the training data. But if the classifier fits the training data too closely, the classification ability for unknown data, i.e., the generalization ability, is degraded. This phenomenon is called overfitting; namely, there is a trade-off between the generalization ability and fitting to the training data. Various methods have been proposed to prevent overfitting [1, pp. 11–15], [2, pp. 86–91], [3].¹

¹ One of the main ideas is, like support vector machines, to add a regularization term, which controls the generalization ability, to the objective function.

For a two-class problem, a support vector machine is trained so that the direct decision function maximizes the generalization ability [4, pp. 127–151], [5, pp. 92–129]. Namely, the m-dimensional input space x is mapped into the l-dimensional (l ≥ m) feature space z. Then in z the quadratic programming problem is solved to separate the two classes by the optimal separating hyperplane.

In this chapter we discuss support vector machines for two-class problems. First, we discuss hard-margin support vector machines, in which the training data are linearly separable in the input space. Then we extend hard-margin support vector machines to the case where the training data are not linearly separable, mapping the input space into a high-dimensional feature space to enhance linear separability. The characteristics of support vector machines are then studied theoretically and by computer experiments.

2.1 Hard-Margin Support Vector Machines

Let the M m-dimensional training inputs x_i (i = 1, ..., M) belong to Class 1 or 2 and the associated labels be y_i = 1 for Class 1 and y_i = −1 for Class 2. If these data are linearly separable, we can determine the decision function
$$D(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b, \qquad (2.1)$$

where w is an m-dimensional vector, b is a bias term, and for i = 1, ..., M

$$\mathbf{w}^{\top}\mathbf{x}_i + b \;\begin{cases} > 0 & \text{for } y_i = 1, \\ < 0 & \text{for } y_i = -1. \end{cases} \qquad (2.2)$$

Because the training data are linearly separable, no training data satisfy $\mathbf{w}^{\top}\mathbf{x} + b = 0$. Thus, to control separability, instead of (2.2), we consider the following inequalities:

$$\mathbf{w}^{\top}\mathbf{x}_i + b \;\begin{cases} \ge 1 & \text{for } y_i = 1, \\ \le -1 & \text{for } y_i = -1. \end{cases} \qquad (2.3)$$

Here, 1 and −1 on the right-hand sides of the inequalities can be constants a (> 0) and −a, respectively. But by dividing both sides of the inequalities by a, (2.3) is obtained. Equation (2.3) is equivalent to

$$y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 \quad \text{for } i = 1, \ldots, M. \qquad (2.4)$$

The hyperplane

$$D(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b = c \quad \text{for } -1 < c < 1 \qquad (2.5)$$

forms a separating hyperplane that separates x_i (i = 1, ..., M). When c = 0, the separating hyperplane is in the middle of the two hyperplanes with c = 1 and c = −1. The distance between the separating hyperplane and the training data sample nearest to the hyperplane is called the margin. Assuming that the hyperplanes D(x) = 1 and D(x) = −1 include at least one training data sample, the hyperplane D(x) = 0 has the maximum margin for −1 < c < 1. The region {x | −1 ≤ D(x) ≤ 1} is the generalization region for the decision function.

Figure 2.1 shows two decision functions that satisfy (2.4). Thus there are an infinite number of decision functions that satisfy (2.4), which are separating hyperplanes. The generalization ability depends on the location of the separating hyperplane, and the hyperplane with the maximum margin is called the optimal separating hyperplane (see Fig. 2.1). Assume that no outliers are included in the training data and that unknown test data will obey the same distribution as that of the training data. Then it is intuitively clear that the generalization ability is maximized if the optimal separating hyperplane is selected as the separating hyperplane.

[Fig. 2.1 Optimal separating hyperplane in a two-dimensional space]
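As a concrete illustration of constraint (2.4) and of the fact that many separating hyperplanes exist but differ in margin, here is a minimal numerical sketch. The toy data set, the two candidate (w, b) pairs, and the helper name margin_if_separating are assumptions made only for this example; the distance $|D(\mathbf{x})|/\|\mathbf{w}\|$ used to measure the margin is the one derived in the next paragraph.

```python
import numpy as np

# Toy linearly separable training data (illustrative values only).
X = np.array([[1.0, 1.0], [2.0, 2.0], [2.0, 0.5],       # Class 1
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, 0.0]])  # Class 2
y = np.array([1, 1, 1, -1, -1, -1])

def margin_if_separating(w, b):
    """Return the margin of w^T x + b = 0 if constraint (2.4) holds
    for all training data, and None otherwise."""
    D = X @ w + b                      # decision values D(x_i)
    if np.all(y * D >= 1.0):           # y_i (w^T x_i + b) >= 1
        # Distance of the nearest sample: min_i |D(x_i)| / ||w||.
        return np.min(np.abs(D)) / np.linalg.norm(w)
    return None

# Two candidate hyperplanes that both satisfy (2.4) but have different
# margins; the optimal separating hyperplane maximizes this value.
for w, b in [(np.array([1.0, 1.0]), 0.0),
             (np.array([2.0, 0.5]), 0.0)]:
    print("w =", w, "b =", b, "margin =", margin_if_separating(w, b))
```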
Now consider determining the optimal separating hyperplane. The Euclidean distance from a training data sample x to the separating hyperplane is given by $|D(\mathbf{x})|/\|\mathbf{w}\|$. This can be shown as follows. Because the vector w is orthogonal to the separating hyperplane, the line that goes through x and that is orthogonal to the hyperplane is given by $a\,\mathbf{w}/\|\mathbf{w}\| + \mathbf{x}$, where |a| is the Euclidean distance from x to the hyperplane. It crosses the hyperplane at the point where

$$D\!\left(\frac{a\,\mathbf{w}}{\|\mathbf{w}\|} + \mathbf{x}\right) = 0 \qquad (2.6)$$

is satisfied. Solving (2.6) for a, we obtain $a = -D(\mathbf{x})/\|\mathbf{w}\|$. Then all the training data must satisfy

$$\frac{y_k\,D(\mathbf{x}_k)}{\|\mathbf{w}\|} \ge \delta \quad \text{for } k = 1, \ldots, M, \qquad (2.7)$$

where δ is the margin.

Now if (w, b) is a solution, (aw, ab) is also a solution, where a is a scalar. Thus we impose the following constraint:

$$\delta\,\|\mathbf{w}\| = 1. \qquad (2.8)$$

From (2.7) and (2.8), to find the optimal separating hyperplane, we need to find w with the minimum Euclidean norm that satisfies (2.4). Therefore, the optimal separating hyperplane can be obtained by solving the following minimization problem for w and b:

$$\text{minimize} \quad Q(\mathbf{w}, b) = \tfrac{1}{2}\,\|\mathbf{w}\|^2 \qquad (2.9)$$

$$\text{subject to} \quad y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 \quad \text{for } i = 1, \ldots, M. \qquad (2.10)$$

Here, the square of the Euclidean norm $\|\mathbf{w}\|$ in (2.9) is to make the optimization problem quadratic programming.
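To show that (2.9)–(2.10) is an ordinary quadratic program, the sketch below solves it for a small linearly separable data set using the cvxpy modeling library; the data values and the choice of cvxpy are assumptions for illustration, not something prescribed by the text.

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable training data (illustrative values only).
X = np.array([[1.0, 1.0], [2.0, 2.0], [2.0, 0.5],       # Class 1
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, 0.0]])  # Class 2
y = np.array([1, 1, 1, -1, -1, -1])

M, m = X.shape
w = cp.Variable(m)   # weight vector
b = cp.Variable()    # bias term

# minimize (1/2)||w||^2 subject to y_i (w^T x_i + b) >= 1, i = 1, ..., M
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("margin delta = 1 / ||w|| =", 1.0 / np.linalg.norm(w.value))
```

Any quadratic programming solver would serve equally well here; the printed margin equals $1/\|\mathbf{w}\|$, consistent with the normalization (2.8).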
The assumption of linear separability