x = (x1, x2, ..., xn)^T (2.4)

for a particular object. The feature vectors can be plotted as points in feature space (Fig. 2.5). For n features, the feature space is n-dimensional with each feature constituting a dimension. Objects from the same class should cluster together in feature space (reliability), and be well separated from different classes (discriminating). In classification, our goal is to assign each feature vector to one of the set of classes {ωi}. If the different features have different scales, it would be prudent to normalize each by its standard deviation (Fig. 2.6).

Fig. 2.5 Three-dimensional feature space containing two classes of features, class 1 (in gray) and class 2 (in black)

Fig. 2.6 Scaling of features
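The normalization by standard deviation illustrated in Fig. 2.6 can be sketched as follows (a minimal illustration; the helper name `standardize` is ours, not the book's):

```python
import math

def standardize(vectors):
    """Scale each feature to zero mean and unit standard deviation,
    so that var(x'_1) = var(x'_2) as in Fig. 2.6.
    Assumes no feature is constant across the objects."""
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(dim)]
    sds = [math.sqrt(sum((v[i] - means[i]) ** 2 for v in vectors) / n)
           for i in range(dim)]
    return [[(v[i] - means[i]) / sds[i] for i in range(dim)] for v in vectors]
```

After this rescaling, a feature measured in millimeters no longer dominates one measured in meters, so distances in feature space treat all features comparably.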
The classification stage assigns objects to certain categories (or classes) based on the feature information. How many features should we measure? And which are the best? The problem is that the more we measure, the higher the dimension of feature space, and the more complicated the classification will become (not to mention the added requirements for computation time and storage). This is referred to as the "curse of dimensionality." In our search for a simple, yet efficient, classifier we are frequently drawn to using the minimum number of "good" features that will be sufficient to do the classification adequately (for which we need a measure of the performance of the classifier) for a particular problem. This follows the heuristic principle known traditionally as Occam's razor (viz., the simplest solution is the best) or referred to as KISS (Keep It Simple, Stupid) in more contemporary language; while it may not be true in all situations, we will adopt a natural bias towards simplicity.

The prudent approach is to err on the side of measuring more features per object than might be necessary, and then reduce the number of features by either (1) feature selection, choosing the most informative subset of features and removing as many irrelevant and redundant features as possible (Yu and Liu 2004), or (2) feature extraction, combining the existing feature set into a smaller set of new, more informative features (Markovitch and Rosenstein 2002). The most well-known feature extraction method is Principal Component Analysis (PCA), which we will consider fully in Chap. 6.

One paradigm for classification is the learning from examples approach. If a sample of labeled objects (called the training set) is randomly selected, and their feature vectors plotted in feature space, then it may be possible to build a classifier which separates the two (or more) classes adequately using a decision boundary or decision surface.
A linear classifier results in a decision surface which is a hyperplane (Fig. 2.7). Again, the decision boundary should be as simple as possible, consistent with doing an adequate job of classifying. The use of a labeled training set, in which it is known to which class the sample objects belong, constitutes supervised learning.

Fig. 2.7 Linear classification, using labeled training sets and two features, results in a linear decision boundary
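A hyperplane decision surface can be sketched in a few lines (the weights and bias below are illustrative placeholders; in practice they would be determined by training on the labeled set):

```python
def linear_classify(x, w, b):
    """Assign a feature vector x to one of two classes according to which
    side of the hyperplane w . x + b = 0 it lies on (cf. Fig. 2.7).
    w and b are illustrative values; training would normally set them."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "class 1" if score >= 0 else "class 2"
```

For example, with w = (1, 1) and b = -5 the decision boundary is the line x1 + x2 = 5: points above it are assigned to class 1, points below it to class 2.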
2.3 Training and Learning

A typical classification problem comprises the following task: given example (or instance) objects typical of a number of classes (the training set), classify other objects (the test set) into one of these classes. Features need to be identified such that the in-class variations are less than the between-class variations.

If the classes to which the objects belong are known, the process is called supervised learning, and if they are unknown, in which case the most appropriate classes must be found, it is called unsupervised learning. With unsupervised learning, the hope is to discover the unknown, but useful, classes of items (Jain et al. 2000).

The process of using data to determine the best set of features for a classifier is known as training the classifier. The most effective methods for training classifiers involve learning from examples. A performance metric for a set of features, based on the classification errors it produces, should be calculated in order to evaluate the usefulness of the features.

Learning (aka machine learning or artificial intelligence) refers to some form of adaptation of the classification algorithm to achieve a better response, i.e., to reduce the classification error on a set of training data. This would involve feedback to earlier steps in the process in an iterative manner until some desired level of accuracy is achieved. Ideally this would result in a monotonically increasing performance (Fig. 2.8), although this is often difficult to achieve.

Fig. 2.8 Idealized learning curve

In reinforcement learning (Barto and Sutton 1997), the output of the system is a sequence of actions to best reach the goal. The machine learning program must discover the best sequence of actions to take to yield the best reward. A robot
navigating in an environment in search of a particular location is an example of reinforcement learning. After a number of trials, it should learn the correct sequence of moves to reach the location as quickly as possible without hitting any of the obstacles. A task may require multiple agents to interact to accomplish a common goal, such as with a team of robots playing soccer.

2.4 Supervised Learning and Algorithm Selection

Supervised learning is an inductive reasoning process, whereby a set of rules is learned from instances (examples in a training set) and a classifier algorithm is chosen or created that can apply these rules successfully to new instances. The process of applying supervised learning to a real-world problem is outlined in Fig. 2.9.

Fig. 2.9 The process of supervised learning
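The iterative, error-feedback view of learning described in Sect. 2.3 can be made concrete with a perceptron-style training loop (a sketch of one simple learner, not the only possibility): the weights of a linear decision boundary are adapted whenever a training object is misclassified, and the accuracy recorded after each pass over the training set is an empirical analogue of the idealized learning curve of Fig. 2.8.

```python
def train_perceptron(X, y, epochs=100, lr=0.1):
    """Learning from labeled examples: adapt the weights of a linear
    decision boundary whenever a training object is misclassified
    (error feedback), recording the accuracy after each pass.
    Labels in y are +1 or -1; epochs and lr are illustrative choices."""
    w, b = [0.0] * len(X[0]), 0.0
    curve = []
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, y):
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * s <= 0:                        # misclassified: adapt
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                mistakes += 1
        curve.append(1.0 - mistakes / len(y))     # accuracy on this pass
        if mistakes == 0:                         # converged on training set
            break
    return w, b, curve
```

On linearly separable training data this loop is guaranteed to converge, and `curve` typically rises toward 1.0, though (as the text notes) monotonic improvement is not guaranteed in general.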
The choice of which specific learning algorithm to use is a critical step. We should choose an algorithm, apply it to a training set, and then evaluate it before adopting it for general use. The evaluation is most often based on prediction accuracy, i.e., the number of correct predictions divided by the total number of predictions. There are at least three techniques which are used to calculate the accuracy of a classifier:

1. Split the training set by using two-thirds for training and the other third for estimating performance.
2. Divide the training set into mutually exclusive and equal-sized subsets and, for each subset, train the classifier on the union of all the other subsets. The average of the error rate of each subset is then an estimate of the error rate of the classifier. This is known as cross-validation.
3. Leave-one-out validation is a special case of cross-validation, with all test subsets consisting of a single instance. This type of validation is, of course, more expensive computationally, but useful when the most accurate estimate of a classifier's error rate is required.

We will consider ways to estimate the performance and accuracy of classifiers in Chap. 8.

2.5 Approaches to Classification

There are a variety of approaches to classification:

1. Statistical approaches (Chaps. 4 and 5) are characterized by their reliance on an explicit underlying probability model. The features are extracted from the input data (object) and are used to assign each object (described by a feature vector) to one of the labeled classes. The decision boundaries are determined by the probability distributions of the objects belonging to each class, which must either be specified or learned. A priori probabilities (i.e., probabilities before measurement, described by probability density functions) are converted into a posteriori (or class-/measurement-conditioned) probabilities (i.e., probabilities after measurement).
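This prior-to-posterior conversion is Bayes' rule, P(ωi | x) = p(x | ωi) P(ωi) / Σj p(x | ωj) P(ωj). A minimal numeric sketch (the probability values below are illustrative, not taken from the text):

```python
def posteriors(priors, likelihoods):
    """Bayes' rule: convert a priori class probabilities P(w_i) into
    a posteriori probabilities P(w_i | x), given the class-conditional
    likelihoods p(x | w_i) evaluated at the measurement x."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # p(x|w_i) P(w_i)
    evidence = sum(joint)                                 # p(x)
    return [j / evidence for j in joint]
```

With equal priors the posteriors simply follow the likelihoods; a strong prior, on the other hand, can dominate an uninformative measurement.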
Bayesian networks (e.g., Jensen 1996) are the most well-known representative of statistical learning algorithms.

In a discriminant analysis-based approach, a parametric form of the decision boundary (e.g., linear or quadratic) is specified, and then the best decision boundary of this form is found based on the classification of training objects. Such boundaries can be constructed using, for example, a mean squared error criterion.

In maximum entropy techniques, the overriding principle is that when nothing is known, the distribution should be as uniform as possible, i.e., have maximal entropy. Labeled training data are used to derive a set of constraints for