Pre-Neural Machine Translation

Alignment is complex

• Alignment can be one-to-many: a single word may align to several words on the other side. We call this a fertile word.

[Figure: word alignment of the French sentence "Le programme a été mis en application" with the English sentence "And the program has been implemented". The word "implemented" aligns to the three words "mis en application" (one-to-many alignment); "And" aligns to no French word.]

Examples from: "The Mathematics of Statistical Machine Translation: Parameter Estimation", Brown et al., 1993. http://www.aclweb.org/anthology/J93-2003
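To make this concrete, here is a minimal Python sketch (my own illustration, not from the lecture) of a word alignment stored as (source index, target index) pairs; the exact index pairs are one reading of the Brown et al. figure. A target word's fertility is simply the number of source words aligned to it:

```python
from collections import Counter

# One-to-many alignment from the Brown et al. example, as
# (source index, target index) pairs; the pairs are my reading of the figure.
x = "Le programme a été mis en application".split()   # French source
y = "And the program has been implemented".split()    # English target
align = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 5), (6, 5)]

# Fertility of a target word = number of source words aligned to it.
fertility = Counter(j for _, j in align)
for j, n in sorted(fertility.items()):
    print(f"{y[j]}: fertility {n}")
# "implemented" has fertility 3: it covers "mis en application".
```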
Alignment is complex

• Alignment can be many-to-many (phrase-level): groups of words align to each other as units.

[Figure: word alignment of the French sentence "Les pauvres sont démunis" with the English sentence "The poor don't have any money". The phrase "sont démunis" aligns to "don't have any money" as a unit (many-to-many, phrase alignment).]

Examples from: "The Mathematics of Statistical Machine Translation: Parameter Estimation", Brown et al., 1993. http://www.aclweb.org/anthology/J93-2003
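Phrase-level alignment is naturally represented with span pairs instead of word pairs. A minimal sketch, again my own illustration with spans read off the figure:

```python
# Many-to-many (phrase-level) alignment as pairs of [start, end) spans;
# the spans are my reading of the Brown et al. figure.
x = "Les pauvres sont démunis".split()        # French source
y = "The poor don't have any money".split()   # English target
phrase_align = [
    ((0, 1), (0, 1)),  # "Les"          <-> "The"
    ((1, 2), (1, 2)),  # "pauvres"      <-> "poor"
    ((2, 4), (2, 6)),  # "sont démunis" <-> "don't have any money"
]
for (i0, i1), (j0, j1) in phrase_align:
    print(" ".join(x[i0:i1]), "<->", " ".join(y[j0:j1]))
```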
Learning alignment for SMT

• We learn P(x, a | y) as a combination of many factors, including:
  • the probability of particular words aligning (which also depends on their positions in the sentence)
  • the probability of particular words having a particular fertility (number of corresponding words)
  • etc.
• Alignments a are latent variables: they aren't explicitly specified in the data!
• This requires special learning algorithms, like Expectation-Maximization, for learning the parameters of distributions with latent variables (see the sketch below).
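The latent-variable point can be illustrated with IBM Model 1, the simplest of the Brown et al. models: word-translation probabilities t(x|y) are learned by EM, where the E-step sums over the latent alignments. A minimal sketch on a hypothetical toy corpus (the corpus and iteration count are my own choices):

```python
from collections import defaultdict

# Hypothetical toy parallel corpus (my own illustration, not from the lecture).
corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
]

# Initialize the word-translation table t(x|y) uniformly.
src_vocab = {x for xs, _ in corpus for x in xs}
t = defaultdict(lambda: 1.0 / len(src_vocab))

for _ in range(20):  # EM iterations
    count = defaultdict(float)  # expected counts of (x, y) alignments
    total = defaultdict(float)  # expected counts of y being aligned to
    for xs, ys in corpus:
        for x in xs:
            # E-step: posterior over the latent alignment of x, i.e.
            # which target word y it aligns to under the current t.
            z = sum(t[(x, y)] for y in ys)
            for y in ys:
                p = t[(x, y)] / z
                count[(x, y)] += p
                total[y] += p
    # M-step: re-estimate t(x|y) from the expected counts.
    for (x, y), c in count.items():
        t[(x, y)] = c / total[y]

print(t[("maison", "house")])  # rises toward 1.0 as EM resolves the alignments
```

Note that the alignments are never observed: EM infers a soft posterior over them in the E-step and re-estimates the parameters from those expected counts in the M-step.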
Decoding for SMT

• We want the best translation y* = argmax_y P(x|y) P(y), where P(x|y) is the Translation Model and P(y) is the Language Model.
• Question: how do we compute this argmax?
• We could enumerate every possible y and calculate the probability? → Too expensive!
• Answer: impose strong independence assumptions in the model and use dynamic programming to find globally optimal solutions (e.g. the Viterbi algorithm), as in the sketch below.
• This process is called decoding.
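As an illustration of the "strong independence assumptions + dynamic programming" idea (a simplified stand-in, not the lecture's own decoder): assume a monotone word-for-word translation model and a bigram language model, so the score decomposes over adjacent target words and Viterbi search is exact. All probability tables here are hypothetical toy values:

```python
import math

# Toy word-level translation model P(x|y) and bigram LM P(y_i | y_{i-1}).
# All values are made-up toy numbers for illustration.
trans = {
    "la": {"the": 0.9},
    "maison": {"house": 0.8, "home": 0.6},
}
bigram = {
    ("<s>", "the"): 0.5,
    ("the", "house"): 0.3,
    ("the", "home"): 0.1,
}

def viterbi_decode(src):
    """Monotone word-for-word decoding: source word i is translated by
    target word i (a strong independence assumption), so the best output
    can be found by dynamic programming over the last target word."""
    # best[y] = (log-prob of the best partial output ending in y, that output)
    best = {"<s>": (0.0, [])}
    for x in src:
        new_best = {}
        for y, p_t in trans[x].items():
            for prev, (score, out) in best.items():
                p_lm = bigram.get((prev, y), 1e-6)  # tiny floor for unseen bigrams
                s = score + math.log(p_t) + math.log(p_lm)
                if y not in new_best or s > new_best[y][0]:
                    new_best[y] = (s, out + [y])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]

print(viterbi_decode(["la", "maison"]))  # -> ['the', 'house']
```

Real phrase-based decoders relax the monotone assumption and must resort to approximate search (beam search with pruning), since reordering makes exact decoding intractable.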
Decoding for SMT

[Figure: the decoding search space for the German sentence "er geht ja nicht nach hause" ("he does not go home"). Each word has many candidate translations (e.g. "geht" → "goes", "go", "is"; "nicht" → "not", "do not", "does not"; "hause" → "house", "home", "chamber"), and the decoder must search over combinations of these translation options.]

Source: "Statistical Machine Translation", Chapter 6, Koehn, 2009. https://www.cambridge.org/core/books/statistical-machine-translation/94EADF9F680558E13BE759997553CDE5
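The combinatorial point behind this figure can be seen with a quick count: even with only a handful of translation options per word (a few of the options shown in the figure; the real table lists more, including multi-word phrases), the number of candidate outputs multiplies rapidly:

```python
# A few of the per-word translation options shown in the Koehn figure.
options = {
    "er": ["he", "it"],
    "geht": ["goes", "go", "is"],
    "ja": ["yes", "of course", "after all"],
    "nicht": ["not", "do not", "does not"],
    "nach": ["after", "to", "according to"],
    "hause": ["house", "home", "chamber"],
}
n = 1
for word in "er geht ja nicht nach hause".split():
    n *= len(options[word])
print(n)  # 2 * 3 * 3 * 3 * 3 * 3 = 486 word-for-word candidates already
```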