RNN LM: Word Embedding
Example input: "the girl opened her", represented by the word vectors x(1), x(2), x(3), x(4).
• Word vector: x(t) ∈ ℝ^|V| (one-hot, or a distributed representation).
• Word embedding: e(t) = E x(t), where E is the embedding matrix.
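To make the lookup concrete, here is a minimal NumPy sketch of e(t) = E x(t). The toy vocabulary, the embedding size d, and the randomly initialized E are placeholder assumptions for illustration, not values from the slides.

```python
import numpy as np

# Toy setup (hypothetical): small vocabulary and random embedding matrix E.
vocab = ["the", "girl", "opened", "her", "books", "laptops", "a", "zoo"]
V = len(vocab)                      # vocabulary size |V|
d = 5                               # embedding dimension (arbitrary here)

rng = np.random.default_rng(0)
E = rng.normal(size=(d, V))         # embedding matrix, one column per word

def one_hot(word):
    """One-hot word vector x(t) in R^|V|."""
    x = np.zeros(V)
    x[vocab.index(word)] = 1.0
    return x

# e(t) = E x(t): multiplying E by a one-hot vector just selects a column of E.
x1 = one_hot("the")
e1 = E @ x1
assert np.allclose(e1, E[:, vocab.index("the")])
```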
RNN LM: Hidden Layer
A recurrent hidden layer is stacked on top of the word embeddings e(t) = E x(t):
• Hidden state: h(t) = σ(W_h h(t-1) + W_e e(t) + b_1)
• h(0) is the initial hidden state.
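Below is a minimal sketch of one recurrence step, assuming a sigmoid for σ; the hidden size H and the random parameters are placeholders.

```python
import numpy as np

# Hypothetical sizes and randomly initialized parameters.
d, H = 5, 8                          # embedding size, hidden size
rng = np.random.default_rng(1)
W_h = rng.normal(size=(H, H))        # hidden-to-hidden weights (shared over t)
W_e = rng.normal(size=(H, d))        # embedding-to-hidden weights (shared over t)
b_1 = np.zeros(H)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(h_prev, e_t):
    """h(t) = sigmoid(W_h h(t-1) + W_e e(t) + b_1)."""
    return sigmoid(W_h @ h_prev + W_e @ e_t + b_1)

h0 = np.zeros(H)                     # h(0): the initial hidden state
e1 = rng.normal(size=d)              # stand-in for e(1) = E x(1)
h1 = step(h0, e1)                    # h(1) combines h(0) and e(1)
```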
RNN LM: Output Layer
An output layer on top of the hidden state turns it into a distribution over the vocabulary:
• Output: ŷ(t) = softmax(U h(t) + b_2) ∈ ℝ^|V|
• For the example, ŷ(4) = P(x(5) | the girl opened her): the predicted distribution over the next word ("books", "laptops", "a", "zoo", ...).
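A minimal sketch of the output layer, with a random U, b_2, and hidden state standing in for the real ones:

```python
import numpy as np

# Hypothetical sizes and randomly initialized output-layer parameters.
V, H = 8, 8
rng = np.random.default_rng(2)
U = rng.normal(size=(V, H))
b_2 = np.zeros(V)

def softmax(z):
    z = z - z.max()                  # shift for numerical stability
    ez = np.exp(z)
    return ez / ez.sum()

h4 = rng.normal(size=H)              # stand-in for h(4)
y4 = softmax(U @ h4 + b_2)           # ŷ(4): distribution over the next word x(5)
assert np.isclose(y4.sum(), 1.0)     # a valid probability distribution
```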
RNN LM: Advantages
• It can process variable-length sentences.
• In theory, step t can use information from all previous steps.
• The size of the model does not grow as the input gets longer.
• The same weights W are reused at every step, which saves computation.
RNN LM: Disadvantages
• Recurrent computation is slow: each step depends on the previous one, so steps cannot run in parallel.
• It is difficult to transmit information from earlier steps completely; long-range information tends to be lost.
The end-to-end sketch below ties the three layers together and illustrates both the weight sharing and the sequential loop.
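This is a minimal forward pass over "the girl opened her", combining the three layers above. The same E, W_h, W_e, and U are reused at every step (the weight-sharing advantage), while the loop over t is inherently sequential (the slowness disadvantage). All parameter values are random placeholders.

```python
import numpy as np

# Hypothetical toy model: vocabulary, sizes, and random parameters.
vocab = ["the", "girl", "opened", "her", "books", "laptops", "a", "zoo"]
V, d, H = len(vocab), 5, 8
rng = np.random.default_rng(3)
E   = rng.normal(size=(d, V))        # embedding matrix
W_h = rng.normal(size=(H, H))        # hidden-to-hidden weights
W_e = rng.normal(size=(H, d))        # embedding-to-hidden weights
b_1 = np.zeros(H)
U   = rng.normal(size=(V, H))        # output-layer weights
b_2 = np.zeros(V)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    ez = np.exp(z - z.max())
    return ez / ez.sum()

h = np.zeros(H)                                  # h(0)
for word in ["the", "girl", "opened", "her"]:    # sequential: h(t) needs h(t-1)
    x = np.zeros(V)
    x[vocab.index(word)] = 1.0                   # x(t), one-hot
    e = E @ x                                    # e(t) = E x(t)
    h = sigmoid(W_h @ h + W_e @ e + b_1)         # h(t)

y = softmax(U @ h + b_2)             # ŷ(4) = P(x(5) | the girl opened her)
print(vocab[int(np.argmax(y))])      # most probable next word (arbitrary with random weights)
```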