RNN LM Training
1. Obtain a large corpus formed as word sequences $x^{(1)}, x^{(2)}, \ldots, x^{(T)}$ (other forms need to be pre-processed).
2. Feed the sequence into the RNN and compute the output probability distribution $\hat{y}^{(t)}$ over the vocabulary at each step $t$ (i.e., for each word).
3. Use the cross entropy between the predicted distribution and the ground-truth next word as the per-step loss:
   $J^{(t)}(\theta) = CE(y^{(t)}, \hat{y}^{(t)}) = -\sum_{w \in V} y_w^{(t)} \log \hat{y}_w^{(t)} = -\log \hat{y}_{x_{t+1}}^{(t)}$
4. Compute the overall training loss as the average of the per-step losses over the whole sequence (a code sketch follows below):
   $J(\theta) = \frac{1}{T} \sum_{t=1}^{T} J^{(t)}(\theta) = \frac{1}{T} \sum_{t=1}^{T} -\log \hat{y}_{x_{t+1}}^{(t)}$
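Steps 1-4 can be expressed compactly in code. The following is a minimal sketch, assuming PyTorch; the model class, vocabulary size, and embedding/hidden dimensions are illustrative and not from the slides. It feeds a toy word sequence through an RNN, obtains the per-step next-word distributions $\hat{y}^{(t)}$ (step 2), and averages the cross-entropy loss over the $T$ steps (steps 3-4).

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 128, 256  # illustrative sizes

class RNNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)   # logits over V at each step

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))                 # (batch, T, hidden_dim)
        return self.out(h)                             # (batch, T, vocab_size)

model = RNNLM()
tokens = torch.randint(0, vocab_size, (1, 21))         # toy "corpus": x(1)..x(T+1)
inputs, targets = tokens[:, :-1], tokens[:, 1:]        # predict x(t+1) from x(1..t)

logits = model(inputs)                                 # step 2: scores for y_hat(t) at each t
loss = nn.functional.cross_entropy(                    # steps 3-4: (1/T) * sum_t -log y_hat_{x_{t+1}}
    logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                        # gradients for the parameter update

In practice the loss is computed over mini-batches of sentences rather than one sequence at a time, but the averaged per-step cross entropy is the same quantity as $J(\theta)$ above.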
Outlines 1. RNN based LM 2. Seq2seq Model 3. Attention Mechanism