Outline
1. What is NLG?
2. Formalizing NLG
3. Decoding from NLG models
4. Training NLG models
5. Evaluating NLG Systems
6. Ethical Considerations
Decoding from NLG models

Decoding: what is it all about?
• At each time step t, our model computes a vector of scores for each token in our vocabulary, S ∈ ℝ^V:

      S = f(y_{<t})        f(·) is your model

• Then, we compute a probability distribution P over these scores (usually with a softmax function):

      P(y_t = w | y_{<t}) = exp(S_w) / Σ_{w' ∈ V} exp(S_{w'})
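The score-then-softmax step can be sketched as follows (the vocabulary size and score values are toy examples, not from any real model):

```python
import numpy as np

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    shifted = scores - np.max(scores)
    exp_s = np.exp(shifted)
    return exp_s / exp_s.sum()

# Toy score vector S ∈ R^V for a single time step (V = 4 here).
S = np.array([2.0, 1.0, 0.5, -1.0])

# P(y_t = w | y_{<t}) for each token w in the vocabulary.
P = softmax(S)
```

Softmax preserves the ranking of the scores, so the highest-scoring token also receives the highest probability.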
• Our decoding algorithm defines a function to select a token from this distribution:

      ŷ_t = g(P(y_t | y_{<t}))        g(·) is your decoding algorithm
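Two simple choices of g(·) can be sketched like this (function names are illustrative; the toy distribution stands in for a model's softmax output):

```python
import numpy as np

def greedy_g(P):
    # g(·): deterministically pick the highest-probability token index.
    return int(np.argmax(P))

def sample_g(P, rng=np.random.default_rng(0)):
    # An alternative g(·): sample a token index according to P.
    return int(rng.choice(len(P), p=P))

# Toy P(y_t | y_{<t}) over a 4-token vocabulary.
P = np.array([0.5, 0.3, 0.15, 0.05])
y_t = greedy_g(P)
```

The choice of g(·) is independent of the model f(·): the same trained model can be decoded greedily, by sampling, or with more elaborate search.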
Decoding from NLG models

Decoding: what is it all about?
• Our decoding algorithm defines a function to select a token from this distribution:

      ŷ_t = g(P(y_t | {y*}, ŷ_{<t}))

[Figure: an autoregressive Text Generation Model. Starting from <START> and the input tokens {y*}, the model outputs one token ŷ_t per step; each selected token is fed back in as input, producing ŷ_1, ŷ_2, …, ŷ_T until <END> is emitted.]
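The loop in the figure can be sketched end to end. Here `f` is a stand-in scorer over a 4-token vocabulary (a real model would condition on the whole prefix); the token IDs and stopping bonus are illustrative only:

```python
import numpy as np

END = 3  # toy <END> token id

def f(prefix):
    # Stand-in model: base scores over a 4-token vocabulary,
    # with <END> growing more likely as the prefix gets longer.
    scores = np.array([1.0, 0.5, 0.2, -1.0])
    scores[END] += 0.8 * len(prefix)
    return scores

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def generate(prompt, g, max_steps=10):
    prefix = list(prompt)  # {y*} followed by generated ŷ_{<t}
    out = []
    for _ in range(max_steps):
        P = softmax(f(prefix))  # P(y_t | {y*}, ŷ_{<t})
        y_t = g(P)              # ŷ_t = g(P(...))
        if y_t == END:
            break
        out.append(y_t)
        prefix.append(y_t)      # feed the selected token back in
    return out

tokens = generate(prompt=[0, 1], g=lambda P: int(np.argmax(P)))
```

The key structural point is the feedback edge: each selected ŷ_t becomes part of the conditioning context for the next step.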
Decoding from NLG models

Greedy methods
• Argmax Decoding
  • Selects the highest-probability token in P(y_t | y_{<t}):

        ŷ_t = argmax_{w ∈ V} P(y_t = w | y_{<t})

• Beam Search
  • A greedy algorithm, but with a wider search over candidates
  • Hyperparameter: beam size k. At each time step, the k candidates with the highest conditional probability are kept
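A minimal beam-search sketch, assuming a stand-in scorer `f` over a 3-token vocabulary (function names, vocabulary, and scores are illustrative). At each step every beam is expanded by every token, and only the k partial sequences with the highest cumulative log-probability survive:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def beam_search(f, prompt, k=2, steps=3):
    # Each beam is a (sequence, cumulative log-probability) pair.
    beams = [(list(prompt), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            P = softmax(f(seq))
            for w, p in enumerate(P):
                # Extend this beam by token w, accumulating log P.
                candidates.append((seq + [w], logp + np.log(p)))
        # Keep only the k highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:k]
    return beams[0][0]  # best sequence found

def f(seq):
    # Stand-in scorer: alternates which token it prefers by prefix length.
    return np.array([0.1, 2.0, 0.5]) if len(seq) % 2 else np.array([2.0, 0.1, 0.5])

best = beam_search(f, prompt=[0], k=2, steps=3)
```

With k = 1 this reduces exactly to argmax decoding; larger k lets the search recover sequences whose early tokens are individually less likely but lead to a higher overall probability.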