the movement of the pen-tip, is captured, while in the off-line case only an image of the text is available. Because of the greater ease of extracting relevant features, online recognition generally yields better results [1]. Another crucial division is that between the recognition of isolated characters or words, and the recognition of whole lines of text. Unsurprisingly, the latter is substantially harder, and the excellent results that have been obtained for e.g. digit and character recognition [2], [3] have never been matched for complete lines. Lastly, handwriting recognition can be split into cases where the writing style is constrained in some way (for example, only hand-printed characters are allowed) and the more challenging scenario where it is unconstrained. Despite more than 40 years of handwriting recognition research [2], [3], [4], [5], developing a reliable, general-purpose system for unconstrained text line recognition remains an open problem.

1.1 State-of-the-Art

A well known testbed for isolated handwritten character recognition is the UNIPEN database [6]. Systems that have been found to perform well on UNIPEN include: a writer-independent approach based on hidden Markov models [7]; a hybrid technique called cluster generative statistical dynamic time warping (CSDTW) [8], which combines dynamic time warping with HMMs and embeds clustering and statistical sequence modeling in a single feature space; and a support vector machine with a novel Gaussian dynamic time warping kernel [9]. Typical error rates on UNIPEN range from 3% for digit recognition to about 10% for lower-case character recognition.

Similar techniques can be used to classify isolated words, and this has given good results for small vocabularies (e.g., a writer-dependent word error rate of about 4.5% for 32 words [10]). However, an obvious drawback of whole-word classification is that it does not scale to large vocabularies.

For large vocabulary recognition tasks, such as those considered in this chapter, the usual approach is to recognize individual characters and map them onto complete words using a dictionary. Naively, we could do this by presegmenting words into characters and classifying each segment. However, segmentation is difficult for cursive or unconstrained text, unless the words have already been recognized. This creates a circular dependency between segmentation and recognition that is often referred to as Sayre's paradox [11]. Nonetheless, approaches have been proposed where segmentation is carried out before recognition. Some techniques for character segmentation, based on unsupervised learning and data-driven methods, are given in [3]. Other strategies first segment the text into basic strokes, rather than characters. The stroke boundaries may be defined in various ways, such as the minima of the velocity, the minima of the y-coordinates, or the points of maximum curvature. For example, one online approach first segments the data at the minima of the y-coordinates and then applies self-organizing maps [12]. Another, offline, approach [13] uses the minima of the vertical histogram for an initial estimation of the character boundaries and then applies various heuristics to improve the segmentation.
A more satisfactory solution to Sayre's paradox would be to segment and recognize at the same time. Hidden Markov models (HMMs) are able to do this, which is one reason for their popularity for unconstrained handwriting [14], [15], [16], [17], [18], [19]. The idea of applying HMMs to handwriting recognition was originally motivated by their success in speech recognition [20], where a similar conflict exists between recognition and segmentation. Over the years, numerous refinements of the basic HMM approach have been proposed, such as the writer-independent system considered in [7], which combines point-oriented and stroke-oriented input features.

However, HMMs have several well-known drawbacks. One of these is that they assume the probability of each observation depends only on the current state, which makes contextual effects difficult to model. Another is that HMMs are generative, whereas discriminative models generally give better performance on labeling and classification tasks.

Recurrent neural networks (RNNs) do not suffer from these limitations, and would therefore seem a promising alternative to HMMs. However, the application of RNNs alone to handwriting recognition has so far been limited to isolated character recognition (e.g. [21]). One reason for this is that the traditional neural network objective functions require a separate training signal for every point in the input sequence, which in turn requires presegmented data.

A more successful use of neural networks for handwriting recognition has been to combine them with HMMs in the so-called hybrid approach [22], [23]. A variety of network architectures have been tried for hybrid handwriting recognition, including multilayer perceptrons [24], [25], time delay neural networks (TDNNs) [18], [26], [27], and RNNs [28], [29], [30]. However, although hybrid models alleviate the difficulty of introducing context to HMMs, they still suffer from many of the drawbacks of HMMs, and they do not realize the full potential of RNNs for sequence modeling.

1.2 Contribution

This chapter describes a recently introduced alternative approach, in which a single RNN is trained directly for sequence labeling. The network uses connectionist temporal classification (CTC) combined with the bidirectional Long Short-Term Memory (BLSTM) architecture, which provides access to long-range input context in both directions. A further enhancement which allows the network to work in multiple dimensions will also be presented in this chapter. The so-called Multidimensional LSTM (MDLSTM) is very successful even on raw pixel data.

The rest of this chapter is organized as follows. Section 2 presents the handwritten data and the feature extraction techniques. Subsequently, Section 3 describes the novel neural network classifier. Experimental results are presented in Section 4. Finally, Section 5 concludes this chapter.
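To make the CTC/BLSTM combination of Section 1.2 more concrete, the following is a minimal sketch in PyTorch. It is not the authors' implementation, and the dimensions (25 input features, 100 hidden units, 80 character classes) are illustrative assumptions; the key point is that the CTC objective only needs the target character sequence for each line, not a per-frame segmentation.

```python
# Minimal BLSTM + CTC sketch (illustrative, not the authors' code).
import torch
import torch.nn as nn

class BLSTMCTC(nn.Module):
    def __init__(self, n_features=25, n_hidden=100, n_classes=80):
        super().__init__()
        # Bidirectional LSTM gives access to context from both directions.
        self.blstm = nn.LSTM(n_features, n_hidden,
                             bidirectional=True, batch_first=True)
        # Map concatenated forward/backward states to the character classes
        # plus one extra CTC "blank" label.
        self.fc = nn.Linear(2 * n_hidden, n_classes + 1)

    def forward(self, x):                     # x: (batch, time, n_features)
        h, _ = self.blstm(x)
        return self.fc(h).log_softmax(dim=2)  # per-frame log-probabilities

model = BLSTMCTC()
ctc = nn.CTCLoss(blank=80)                    # blank index = last class
x = torch.randn(4, 200, 25)                   # 4 lines, 200 time steps each
targets = torch.randint(0, 80, (4, 30))       # 4 transcriptions, 30 chars each
logp = model(x).permute(1, 0, 2)              # CTCLoss expects (time, batch, classes)
loss = ctc(logp, targets,
           input_lengths=torch.full((4,), 200),
           target_lengths=torch.full((4,), 30))
loss.backward()                               # gradients for unsegmented training
```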
Fig. 1 Processing steps of the handwriting recognition system

2 Data Processing

As stated above, handwritten data can be acquired in two formats, online and offline. In this section, typical preprocessing and feature extraction techniques are presented. These techniques have been applied in our experiments.
The online and offline databases used are the IAM-OnDB [31] and the IAM-DB [32], respectively. Note that these do not correspond to the same handwriting samples: the IAM-OnDB was acquired from a whiteboard, while the IAM-DB consists of scanned images of handwritten forms.¹

¹ The databases and benchmark tasks are available at http://www.iam.unibe.ch/fki/databases

2.1 General Processing Steps

A recognition system for unconstrained Roman script is usually divided into consecutive units which iteratively process the handwritten input data to finally obtain the transcription. The main units are illustrated in Fig. 1 and summarized in this section. Certainly, there are differences between offline and online processing, but the principles are the same; only the methodology for performing the individual steps differs.

First, preprocessing steps are applied to reduce noise in the raw data. The input is raw handwritten data and the output usually consists of extracted text lines. The amount of effort that needs to be invested into the preprocessing depends on the given data. If the data have been acquired from a system that does not produce any noise and only single words have been recorded, there is nothing to do in this step. But usually the data contain noise which needs to be removed to improve the quality of the handwriting, e.g., by means of image enhancement. The offline images are furthermore binarized, and the online data, which usually contain noisy points and gaps within strokes, are processed with some heuristics to recover from these artifacts. These operations are described in Ref. [33]. The cleaned text data is then automatically divided into lines using some simple heuristics.

Next, the data is normalized, i.e., it is attempted to remove writer-specific characteristics of the handwriting to make writings from different authors look more similar to each other. This is a very important step in any handwriting recognition system, because the writing styles of the writers differ with respect to skew, slant, height, and width of the characters. In the literature there is no standard way of normalizing the data, but many systems use similar techniques. First, the text line is corrected with regard to its skew, i.e., it is rotated so that the baseline is parallel to the x-axis. Then, slant correction is performed so that the slanted writing becomes upright. The next important step is the computation of the baseline and the corpus line. These two lines divide the text into three areas: the upper area, which mainly contains the ascenders of the letters; the middle area, where the corpus of the letters is present; and the lower area, with the descenders of some letters. These three areas are normalized to predefined heights. Often, some additional normalization steps are performed, depending on the domain. In offline recognition, thinning and binarization may be applied. In online recognition the delayed strokes, e.g., the crossing of a "t" or the dot of an "i", are usually removed, and equidistant resampling is applied.
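As an illustration of the skew-correction step, the following is a minimal sketch for online data. It assumes the skew angle is estimated with a simple least-squares line fit through the sampled points; the method actually used in a given system (e.g., fitting only local minima of the trajectory) may differ.

```python
# Illustrative skew correction for an online text line (a common approach,
# not necessarily the exact method used in the system described here).
import numpy as np

def deskew(points):
    """points: array of shape (n, 2) holding (x, y) pen coordinates."""
    x, y = points[:, 0], points[:, 1]
    slope, _ = np.polyfit(x, y, 1)       # least-squares estimate of the skew
    angle = -np.arctan(slope)            # rotate by the negative skew angle
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return points @ rot.T                # baseline now roughly parallel to x-axis

line = np.array([[0.0, 0.0], [10.0, 1.0], [20.0, 2.1], [30.0, 2.9]])
print(deskew(line))                      # y-values become approximately constant
```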
Subsequently, features are extracted from the normalized data. This particular step is needed because the recognizers need numerical data as their input. However, no standard method for computing the features exists in the literature. One common method in offline recognition of handwritten text lines is the use of a sliding window moving in the writing direction over the text. Features are extracted at every window position, resulting in a sequence of feature vectors. In the case of online recognition the points are already available in a time-ordered sequence, which makes it easier to obtain a sequence of feature vectors in writing order. If there is a fixed size of the input pattern, such as in character or word recognition, one feature vector of a constant size can be extracted for each pattern.

Fig. 2 Features of the vicinity

2.2 Our Online System

In the system described in this chapter, state-of-the-art feature extraction methods are applied to extract the features from the preprocessed data. The feature set input to the online recognizer consists of 25 features which utilize information from both the real online data, stored in XML format, and pseudo-offline information automatically generated from the online data. For each (x, y)-coordinate recorded by the acquisition device a set of 25 features is extracted, resulting in a sequence of 25-dimensional vectors for each given text line. These features can be divided into two classes. The first class consists of features extracted for each point by considering the neighbors with respect to time. The second class takes the offline matrix representation into account, i.e., it is based on spatial information. The features of the first class are the following:

• pen-up/pen-down: a boolean variable indicating whether the pen-tip touches the board or not. Consecutive strokes are connected with straight lines, for which this feature has the value false.