Pre-Neural Machine Translation

1990s–2010s: Statistical Machine Translation (SMT)
• Core idea: Learn a probabilistic model from data
• Suppose we're translating French → English.
• We want to find the best English sentence y, given the French sentence x:
      argmax_y P(y|x)
• Use Bayes' Rule to break this down into two components to be learned separately:
      = argmax_y P(x|y) P(y)
  • Translation Model P(x|y): models how words and phrases should be translated (fidelity). Learnt from parallel data.
  • Language Model P(y): models how to write good English (fluency). Learnt from monolingual data.
1990s–2010s: Statistical Machine Translation (SMT)
• Question: How to learn the translation model P(x|y)?
• First, need a large amount of parallel data (e.g., pairs of human-translated French/English sentences)
  [Figure: the Rosetta Stone, bearing the same text in Ancient Egyptian hieroglyphs, Demotic, and Ancient Greek]
Learning alignment for SMT
• Question: How to learn the translation model P(x|y) from the parallel corpus?
• Break it down further: Introduce a latent variable a into the model: P(x, a|y),
  where a is the alignment, i.e. the word-level correspondence between source sentence x and target sentence y
  [Figure: word alignment between German "Morgen fliege ich nach Kanada zur Konferenz" and English "Tomorrow I will fly to the conference in Canada"]
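To see what the latent-alignment decomposition buys us computationally, here is a minimal sketch assuming IBM Model 1 (from the Brown et al. paper the slides cite). Under Model 1's uniform-alignment assumption, the marginal P(x|y) = Σ_a P(x, a|y) collapses into a product over source words of summed lexical translation probabilities. The t(f|e) table below is hand-set for illustration, not learned.

```python
# IBM Model 1-style marginalisation over alignments.
# t[f][e] approximates P(source word f | target word e); values are toy numbers.
t = {
    "maison": {"house": 0.8, "the": 0.1, "NULL": 0.1},
    "la":     {"house": 0.1, "the": 0.8, "NULL": 0.1},
}

def model1_likelihood(source_words, target_words, eps=1.0):
    """P(x|y) = sum_a P(x, a|y) under Model 1's uniform-alignment assumption.

    The sum over all alignments factorises: each source word independently
    sums its translation probability over every target position.
    """
    targets = ["NULL"] + target_words   # NULL position absorbs spurious words
    prob = eps / (len(targets) ** len(source_words))
    for f in source_words:
        prob *= sum(t[f].get(e, 0.0) for e in targets)
    return prob

p = model1_likelihood(["la", "maison"], ["the", "house"])
```

The NULL pseudo-word foreshadows the next slide's point that some words have no counterpart: they can still be "aligned" to NULL.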
What is alignment?
Alignment is the correspondence between particular words in the translated sentence pair.
• Typological differences between languages lead to complicated alignments!
• Note: Some words have no counterpart ("spurious" words)
  [Figure: word alignment between French "Le Japon secoué par deux nouveaux séismes" and English "Japan shaken by two new quakes"; the French "Le" has no English counterpart]
Examples from: "The Mathematics of Statistical Machine Translation: Parameter Estimation", Brown et al., 1993. http://www.aclweb.org/anthology/J93-2003
Alignment is complex
• Alignment can be many-to-one
  [Figure: word alignment between French "Le reste appartenait aux autochtones" and English "The balance was the territory of the aboriginal people", showing many-to-one alignments]
Examples from: "The Mathematics of Statistical Machine Translation: Parameter Estimation", Brown et al., 1993. http://www.aclweb.org/anthology/J93-2003
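One common way to represent such alignments (a sketch of the IBM-style convention, not notation from the slides): give each target-side word a single source position, reserving index 0 for a NULL word so that spurious words still receive an alignment. Many-to-one then simply means several target words sharing the same source index. The specific links below are an illustrative guess, not the paper's exact alignment.

```python
# Alignment as one source index per target word (0 = NULL word).
source = ["NULL", "Le", "reste", "appartenait", "aux", "autochtones"]
target = ["The", "balance", "was", "the", "territory",
          "of", "the", "aboriginal", "people"]

# alignment[j] = source position that target word j aligns to.
# Repeated indices (e.g. three 3's) encode a many-to-one alignment.
alignment = [1, 2, 3, 3, 3, 4, 4, 5, 5]

def aligned_pairs(source, target, alignment):
    """List the (source word, target word) pairs the alignment induces."""
    return [(source[a_j], target[j]) for j, a_j in enumerate(alignment)]

pairs = aligned_pairs(source, target, alignment)
```

This representation cannot express one-to-many from the target side, which is one reason the IBM models are directional and alignments are often computed in both directions and combined.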