Advanced Artificial Intelligence Lecture 7: Recurrent Neural Network
Outline
▪ Recurrent Neural Network
  ▪ Vanilla RNNs
  ▪ Some RNN Variants
  ▪ Backpropagation through time
  ▪ Gradient Vanishing / Exploding
▪ Long Short-term Memory
  ▪ LSTM Neuron
  ▪ Multiple-layer LSTM
  ▪ Backpropagation through time in LSTM
▪ Time-Series Prediction
Vanilla RNNs
▪ Sequential data
  So far, we have assumed that the data points (x, y) in a dataset are i.i.d. (independent and identically distributed)
  This does not hold in many applications
  Sequential data: data points come in order, and successive points may be dependent, e.g.,
    Letters in a word
    Words in a sentence/document
    Phonemes in a spoken word utterance
    Page clicks in a Web session
    Frames in a video, etc.
Vanilla RNNs
▪ Sequence Modeling
  How to model sequential data?
  Recurrent neural networks (vanilla RNNs): the output c(t) depends on x(1), ···, x(t)
  The output a(L,t) depends on the hidden activations (bias terms omitted):
    a(k,t) = act(z(k,t)) = act(U(k) a(k,t−1) + W(k) a(k−1,t))
  a(·,t) summarizes x(t), ···, x(1); earlier points are less important
Source of slide: https://www.youtube.com/watch?v=2btuy_-Fw3c&list=PLlPcwHqLqJDkVO0zHMqswX1jA9Xw7OSOK
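To make the recurrence concrete, below is a minimal NumPy sketch of a single-layer vanilla RNN forward pass. This is an illustration, not code from the lecture: it assumes tanh as the activation act, omits bias terms as the slide does, and the function name rnn_forward and all dimensions are hypothetical.

```python
import numpy as np

def rnn_forward(xs, U, W):
    """Run a single-layer vanilla RNN over a sequence.

    xs : list of input vectors x(1), ..., x(T)
    U  : recurrent weight matrix (hidden -> hidden)
    W  : input weight matrix (input -> hidden)
    Returns the hidden activations a(1), ..., a(T).
    """
    a = np.zeros(U.shape[0])      # a(0): initial hidden state
    activations = []
    for x in xs:
        z = U @ a + W @ x         # z(t) = U a(t-1) + W x(t)  (bias omitted)
        a = np.tanh(z)            # a(t) = act(z(t))
        activations.append(a)
    return activations

# Toy usage: T = 5 steps, 3-dim inputs, 4-dim hidden state
rng = np.random.default_rng(0)
xs = [rng.standard_normal(3) for _ in range(5)]
U = rng.standard_normal((4, 4)) * 0.1
W = rng.standard_normal((4, 3)) * 0.1
acts = rnn_forward(xs, U, W)
print(acts[-1])  # a(T) summarizes x(T), ..., x(1)
```

Note how a(T) is the only state carried forward, which is why it summarizes the whole prefix and why earlier inputs have progressively less influence on it.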
Vanilla RNNs
▪ Sequence Modeling
  a(k,t) = act(z(k,t)) = act(U(k) a(k,t−1) + W(k) a(k−1,t))
  Weights are shared across time instances (W(k))
  Assumes that the “transition functions” are time-invariant (U(k))
  Our goal is to learn U(k) and W(k) for k = 1, ···, L
Source of slide: https://www.youtube.com/watch?v=2btuy_-Fw3c&list=PLlPcwHqLqJDkVO0zHMqswX1jA9Xw7OSOK
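The sketch below shows how the same per-layer weights U(k) and W(k) are reused at every time step in a multi-layer vanilla RNN, which is exactly the weight sharing / time-invariance assumption above. Again this is illustrative rather than lecture code: tanh is assumed for act, a(0,t) is taken to be the input x(t), and the function name and sizes are hypothetical.

```python
import numpy as np

def deep_rnn_forward(xs, Us, Ws):
    """Multi-layer vanilla RNN: the same U(k), W(k) are applied at every
    time step t (time-invariant transitions), for layers k = 1, ..., L.

    xs : list of input vectors x(1), ..., x(T)  (treated as a(0,t))
    Us : list of L recurrent matrices U(1), ..., U(L)
    Ws : list of L input matrices     W(1), ..., W(L)
    Returns the top-layer activations a(L,1), ..., a(L,T).
    """
    L = len(Us)
    prev = [np.zeros(U.shape[0]) for U in Us]   # a(k,0) = 0 for all k
    top = []
    for x in xs:
        below = x                               # a(0,t) = x(t)
        for k in range(L):
            # a(k,t) = act(U(k) a(k,t-1) + W(k) a(k-1,t))
            prev[k] = np.tanh(Us[k] @ prev[k] + Ws[k] @ below)
            below = prev[k]
        top.append(prev[-1])
    return top

# Toy usage: L = 2 layers, 3-dim inputs, hidden size 4 in each layer
rng = np.random.default_rng(1)
xs = [rng.standard_normal(3) for _ in range(6)]
Us = [rng.standard_normal((4, 4)) * 0.1, rng.standard_normal((4, 4)) * 0.1]
Ws = [rng.standard_normal((4, 3)) * 0.1, rng.standard_normal((4, 4)) * 0.1]
print(deep_rnn_forward(xs, Us, Ws)[-1])  # a(L,T)
```

Only the 2L matrices in Us and Ws are learned, no matter how long the sequence is; this is what makes the parameter count independent of T.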