3.3 Bidirectional Recurrent Neural Networks

For many tasks it is useful to have access to future as well as past context. In handwriting recognition, for example, the identification of a given letter is helped by knowing the letters both to the right and left of it. Bidirectional Recurrent Neural Networks (BRNNs) [35] are able to access context in both directions along the input sequence. BRNNs contain two separate hidden layers, one of which processes the inputs forwards, while the other processes them backwards. Both hidden layers are connected to the output layer, which therefore has access to all past and future context of every point in the sequence. Combining BRNNs and LSTM gives bidirectional LSTM (BLSTM) [42].

3.4 Connectionist Temporal Classification (CTC)

Standard RNN objective functions require a presegmented input sequence with a separate target for every segment. This has limited the applicability of RNNs in domains such as cursive handwriting recognition, where segmentation is difficult to determine. Moreover, because the outputs of a standard RNN are a series of independent, local classifications, some form of post-processing is required to transform them into the desired label sequence. Connectionist Temporal Classification (CTC) [36, 34] is an RNN output layer specifically designed for sequence labeling tasks. It does not require the data to be presegmented, and it directly outputs a probability distribution over label sequences. CTC has been shown to outperform RNN-HMM hybrids in a speech recognition task [36].

A CTC output layer contains as many units as there are labels in the task, plus an additional 'blank' or 'no label' unit. The output activations are normalized with the softmax function, so that they sum to 1 and each lie in the range (0, 1):

    y_k^t = \frac{e^{a_k^t}}{\sum_{k'} e^{a_{k'}^t}},

where a_k^t is the unsquashed activation of output unit k at time t, and y_k^t is the activation of the same unit after the softmax function is applied. The above activations are used to estimate the conditional probability p(k, t | x) of observing the label (or blank) with index k at time t in the input sequence x:

    y_k^t = p(k, t | x).
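As a concrete illustration of this normalization step, the following minimal NumPy sketch applies the softmax independently at each time step of a CTC output layer. It is not taken from RNNLIB; the (T, K+1) array layout and the max-subtraction for numerical stability are our own choices.

```python
import numpy as np

def ctc_output_probabilities(a):
    """Softmax over a CTC output layer.

    a: array of shape (T, K+1) holding the unsquashed activations a_k^t
       for K labels plus the 'blank' unit at each of T time steps.
    Returns y of the same shape, where y[t, k] estimates p(k, t | x).
    """
    a = a - a.max(axis=1, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

# Example: 4 time steps, 2 labels ('a', 'b') plus the blank unit.
rng = np.random.default_rng(0)
y = ctc_output_probabilities(rng.normal(size=(4, 3)))
assert np.allclose(y.sum(axis=1), 1.0)     # the outputs at each time step sum to 1
```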
The conditional probability p(\pi | x) of observing a particular path \pi through the lattice of label observations is then found by multiplying together the label and blank probabilities at every time step:

    p(\pi | x) = \prod_{t=1}^{T} p(\pi_t, t | x) = \prod_{t=1}^{T} y_{\pi_t}^t,

where \pi_t is the label observed at time t along path \pi. Paths are mapped onto label sequences l ∈ L^{≤T}, where L^{≤T} denotes the set of all strings on the alphabet L of length ≤ T, by an operator B that removes first the repeated labels, then the blanks. For example, both B(a, −, a, b, −) and B(−, a, a, −, −, a, b, b) yield the labeling (a, a, b). Since the paths are mutually exclusive, the conditional probability of a given labeling l ∈ L^{≤T} is the sum of the probabilities of all the paths corresponding to it:

    p(l | x) = \sum_{\pi \in B^{-1}(l)} p(\pi | x).

The above step is what allows the network to be trained with unsegmented data. The intuition is that, because we do not know where the labels within a particular transcription will occur, we sum over all the places where they could occur.

In general, a large number of paths will correspond to the same label sequence, so a naïve calculation of the equation above is infeasible. However, it can be efficiently evaluated using a graph-based algorithm, similar to the forward-backward algorithm for HMMs. More details about the CTC forward-backward algorithm appear in [39].
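To make the collapsing operator B and the path sum above concrete, here is a small brute-force sketch. It is purely illustrative: it enumerates all paths, which is exactly the naïve calculation described as infeasible above, whereas a real implementation would use the CTC forward-backward recursion from [39]. The integer label encoding, the choice of index 0 for the blank, and the function names are our own assumptions.

```python
import itertools
import numpy as np

BLANK = 0  # index of the 'blank' unit (an arbitrary convention for this sketch)

def collapse(path):
    """The operator B: remove repeated labels first, then the blanks."""
    no_repeats = [k for i, k in enumerate(path) if i == 0 or k != path[i - 1]]
    return tuple(k for k in no_repeats if k != BLANK)

def label_probability(y, label_seq):
    """Brute-force p(l | x): sum p(pi | x) over every path pi with B(pi) = l.

    y: array of shape (T, K+1) of per-timestep output probabilities y_k^t.
    Feasible only for tiny T; the forward-backward algorithm computes the
    same quantity efficiently.
    """
    T, num_units = y.shape
    total = 0.0
    for path in itertools.product(range(num_units), repeat=T):
        if collapse(path) == tuple(label_seq):
            total += np.prod([y[t, k] for t, k in enumerate(path)])
    return total

# Example with T = 4 time steps and labels {1: 'a', 2: 'b'} plus blank 0.
y = np.full((4, 3), 1.0 / 3)            # uniform outputs, just for illustration
print(collapse((1, 0, 1, 2, 0)))        # -> (1, 1, 2), i.e. B(a,-,a,b,-) = (a,a,b)
print(label_probability(y, (1, 2)))     # probability of the labeling (a, b)
```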
3.5 Multidimensional Recurrent Neural Networks

Ordinary RNNs are designed for time series and other data with a single spatio-temporal dimension. However, the benefits of RNNs (such as robustness to input distortion, and flexible use of surrounding context) are also advantageous for multidimensional data, such as images and video sequences.

Multidimensional recurrent neural networks (MDRNNs) [43, 34], a special case of Directed Acyclic Graph RNNs [44], generalize the basic structure of RNNs to multidimensional data. Rather than having a single recurrent connection, MDRNNs have as many recurrent connections as there are spatio-temporal dimensions in the data. This allows them to access previous context information along all input directions.

Multidirectional MDRNNs are the generalization of bidirectional RNNs to multiple dimensions. For an n-dimensional data sequence, 2^n different hidden layers are used to scan through the data in all directions. As with bidirectional RNNs, all the layers are connected to a single output layer, which therefore has access to context information in both directions along all dimensions. Multidimensional LSTM (MDLSTM) is the generalization of bidirectional LSTM to multidimensional data.

3.6 Hierarchical Subsampling Recurrent Neural Networks

Hierarchical subsampling is a common technique in computer vision [45] and other domains with large input spaces. The basic principle is to iteratively re-represent the data at progressively lower resolutions, using a hierarchy of feature extractors. The features extracted at each level are subsampled and used as input to the next level. The number and complexity of the features typically increase as one climbs the hierarchy. This is much more efficient for high-resolution data than a single 'flat' feature extractor, since most of the computations are carried out on low-resolution feature maps, rather than, for example, raw pixels.

A well-known connectionist hierarchical subsampling architecture is Convolutional Neural Networks [46]. Hierarchical subsampling is also possible with RNNs, and hierarchies of MDLSTM layers have been applied to offline handwriting recognition [47]. Hierarchical subsampling with LSTM is equally useful for long 1D sequences, such as raw speech data or online handwriting trajectories with a high sampling rate.

From the point of view of handwriting recognition, the most interesting aspect of hierarchical subsampling RNNs is that they can be applied directly to the raw input data (offline images or online point-sequences) without any normalization or feature extraction.
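The sketch below illustrates the hierarchical subsampling principle only, not the actual MDLSTM hierarchy of [47]: each level applies a feature extractor (here a mere stand-in, a random 1x1 projection followed by tanh) and then averages non-overlapping blocks to halve the spatial resolution, while the number of feature channels grows from level to level. The pooling scheme, the projection, and the channel counts are arbitrary choices made for illustration.

```python
import numpy as np

def subsample_blocks(feature_map, block=(2, 2)):
    """Pool non-overlapping blocks of a 2-D feature map (average pooling),
    halving the resolution along both spatial dimensions for block=(2, 2)."""
    H, W, C = feature_map.shape
    bh, bw = block
    H2, W2 = H // bh, W // bw
    x = feature_map[:H2 * bh, :W2 * bw].reshape(H2, bh, W2, bw, C)
    return x.mean(axis=(1, 3))

def toy_feature_extractor(x, out_channels, rng):
    """Stand-in for the per-level feature extractor (MDLSTM layers in [47]):
    a random 1x1 projection followed by tanh."""
    W = rng.normal(scale=0.1, size=(x.shape[-1], out_channels))
    return np.tanh(x @ W)

rng = np.random.default_rng(0)
image = rng.random((64, 64, 1))                # raw line image, no handcrafted features
level1 = subsample_blocks(toy_feature_extractor(image, 6, rng))    # -> (32, 32, 6)
level2 = subsample_blocks(toy_feature_extractor(level1, 20, rng))  # -> (16, 16, 20)
print(level1.shape, level2.shape)
```

Most of the computation in such a hierarchy is spent on the small, low-resolution maps at the upper levels, which is what makes the approach efficient for high-resolution inputs.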
4 Experiments

The experiments have been performed with the freely available RNNLIB tool by Alex Graves (http://sourceforge.net/projects/rnnl/). This tool implements the network architecture and furthermore provides examples for the recognition of several scripts.

4.1 Comparison with HMMs on the IAM Databases

The aim of the first experiments was to evaluate the performance of the complete RNN handwriting recognition system, illustrated in Figure 6, for both online and offline handwriting. In particular, we wanted to see how it compared to an HMM-based system. The online and offline databases used were the IAM-OnDB and the IAM-DB respectively (see above). Note that these do not correspond to the same handwriting samples: the IAM-OnDB was acquired from a whiteboard, while the IAM-DB consists of scanned images of handwritten forms.

Fig. 6 Complete RNN handwriting recognition system (here applied to offline Arabic data)

To make the comparisons fair, the same online and offline preprocessing was used for both the HMM and RNN systems. In addition, the same dictionaries and language models were used for the two systems.

For all the experiments, the task was to transcribe the text lines in the test set, using the words in the dictionary. The basic performance measure was the word accuracy:

    100 \cdot \left( 1 - \frac{\text{insertions} + \text{substitutions} + \text{deletions}}{\text{number of words in transcription}} \right),

where the number of word insertions, substitutions, and deletions is summed over the whole test set. For the RNN system, we also recorded the character accuracy, defined as above except with characters instead of words.
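To spell out how this measure is computed, the sketch below aligns each recognized word sequence with its reference transcription using a standard edit-distance (Levenshtein) dynamic program, accumulates the insertion, substitution, and deletion counts over the test set, and applies the formula above. The function names and the whitespace tokenization are our own choices; the original evaluation code is not shown in this chapter.

```python
def edit_operations(ref, hyp):
    """Count (substitutions, insertions, deletions) needed to turn the
    reference word list into the hypothesis, via dynamic programming."""
    # d[i][j] = (total cost, subs, ins, dels) for ref[:i] vs hyp[:j]
    d = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        d[i][0] = (i, 0, 0, i)                 # delete all remaining reference words
    for j in range(1, len(hyp) + 1):
        d[0][j] = (j, 0, j, 0)                 # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                d[i][j] = d[i - 1][j - 1]      # words match: no edit needed
            else:
                diag, left, up = d[i - 1][j - 1], d[i][j - 1], d[i - 1][j]
                sub = (diag[0] + 1, diag[1] + 1, diag[2], diag[3])
                ins = (left[0] + 1, left[1], left[2] + 1, left[3])
                dele = (up[0] + 1, up[1], up[2], up[3] + 1)
                d[i][j] = min(sub, ins, dele)  # keep the cheapest alignment
    return d[len(ref)][len(hyp)][1:]           # (subs, ins, dels)

def word_accuracy(references, hypotheses):
    """100 * (1 - (ins + subs + dels) / number_of_words_in_transcription),
    with the error counts summed over the whole test set."""
    total_errors, total_words = 0, 0
    for ref, hyp in zip(references, hypotheses):
        subs, ins, dels = edit_operations(ref.split(), hyp.split())
        total_errors += subs + ins + dels
        total_words += len(ref.split())
    return 100.0 * (1.0 - total_errors / total_words)

# One substitution and one insertion against a 4-word reference -> 50.0
print(word_accuracy(["the quick brown fox"], ["the quik brown fox jumps"]))
```

Replacing words with characters in the same computation gives the character accuracy mentioned above.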
Table 1 Main results for online data

    System         Word Accuracy   Character Accuracy
    HMM            65.0%           -
    CTC (BLSTM)    79.7%           88.5%

Table 2 Main results for offline data

    System         Word Accuracy   Character Accuracy
    HMM            64.5%           -
    CTC (BLSTM)    74.1%           81.8%

As can be seen from Tables 1 and 2, the RNN substantially outperformed the HMM on both databases. To put these results in perspective, the Microsoft tablet PC handwriting recognizer [37] gave a word accuracy score of 71.32% on the online test set. This result is not directly comparable to our own, since the Microsoft system was trained on a different training set, and uses considerably more sophisticated language modeling than the HMM and RNN systems we implemented. However, it indicates that the RNN-based recognizer is competitive with the best commercial systems for unconstrained handwriting.

4.2 Recognition Performance of MDLSTM on Contest Data

The MDLSTM system participated in three handwriting recognition contests at ICDAR 2009 (see the proceedings in [38]). The recognition tasks were based on different scripts. In all cases, the systems had to recognize handwriting from unknown writers.

Table 3 Summarized results from the online Arabic handwriting recognition competition

    System            Word Accuracy   Time/Image
    REGIM HMM         52.67%          6402.24 ms
    Vision Objects    98.99%          69.41 ms
    CTC (BLSTM)       95.70%          1377.22 ms

Table 4 Summarized results from the offline Arabic handwriting recognition competition

    System              Word Accuracy   Time/Image
    Arab-Reader HMM     76.66%          2583.64 ms
    Multi-Stream HMM    74.51%          143269.81 ms
    CTC (MDLSTM)        81.06%          371.61 ms