Outline
1. What is NLG?
2. Formalizing NLG
3. Decoding from NLG models
4. Training NLG models
5. Evaluating NLG Systems
6. Ethical Considerations
Formalizing NLG
Basics of natural language generation
• In autoregressive text generation models, at each time step $t$, our model takes in a sequence of tokens of text as input $y_{<t}$ and outputs a new token $y_t$.
[Figure: a text generation model reads the preceding tokens $y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}$ and emits a new token $\hat{y}_t$; the predicted token is fed back as input to produce $\hat{y}_{t+1}$, $\hat{y}_{t+2}$, and so on.]
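To make the autoregressive loop concrete, here is a minimal sketch in Python/NumPy. The `model` function is a hypothetical stand-in for the real scoring function $f(\cdot)$ (a trained neural network in practice; here it returns random scores), and the greedy `argmax` choice of the next token is just one possible decoding rule.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 8  # toy vocabulary size, for illustration only

def model(prefix):
    """Hypothetical stand-in for f(y_<t; theta): returns one score per
    vocabulary token. A real NLG model would condition on the prefix."""
    return rng.normal(size=VOCAB_SIZE)

def generate(prefix, num_steps):
    """Autoregressive generation: at each step t, score the vocabulary
    given y_<t, pick the next token y_t, and append it to the input."""
    tokens = list(prefix)
    for _ in range(num_steps):
        scores = model(tokens)               # S = f(y_<t; theta)
        next_token = int(np.argmax(scores))  # greedy choice of y_t
        tokens.append(next_token)            # y_t becomes part of y_<t+1
    return tokens

print(generate([3, 1, 4], num_steps=5))
```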
Formalizing NLG
A look at a single step
• At each time step $t$, our model computes a vector of scores for each token in our vocabulary, $S \in \mathbb{R}^{|V|}$:
  $$S = f(y_{<t}; \theta)$$
  where $f(\cdot)$ is your model.
• Then, we compute a probability distribution $P$ over $w \in V$ using these scores:
  $$P(y_t = w \mid y_{<t}) = \frac{\exp(S_w)}{\sum_{w' \in V} \exp(S_{w'})}$$
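The distribution above is the standard softmax over the score vector $S$. A minimal sketch of this single step, with toy scores standing in for real model output; subtracting $\max(S)$ before exponentiating is the usual numerical-stability trick and leaves the distribution unchanged:

```python
import numpy as np

def token_distribution(scores):
    """Softmax over the score vector S:
    P(y_t = w | y_<t) = exp(S_w) / sum_{w'} exp(S_{w'})."""
    shifted = scores - np.max(scores)   # stability shift; cancels in the ratio
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

scores = np.array([2.0, 1.0, 0.1])      # toy scores for a 3-token vocabulary
probs = token_distribution(scores)
print(probs, probs.sum())               # probabilities summing to 1
```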