Phrase-based statistical machine translation
• Phrase translation model: phrases, treated as latent structure, are the basic translation unit
• Source sentence: 布什 与 沙龙 举行 了 会谈
• Phrase segmentation: 布什 | 与 沙龙 | 举行 了 会谈
• Phrase translation: Bush | with Sharon | held a talk
• Phrase reordering: Bush | held a talk | with Sharon
• Output: Bush held a talk with Sharon
(Koehn et al., 2003)
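The segment, translate, reorder pipeline on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the model of Koehn et al. (2003): the phrase table, the greedy longest-match segmentation, and the hand-supplied `order` argument are all assumptions made for the example. A real decoder searches over segmentations and orderings and scores them with learned models.

```python
# Toy phrase table (hypothetical entries, chosen only to cover the slide's example).
PHRASE_TABLE = {
    "布什": "Bush",
    "与 沙龙": "with Sharon",
    "举行 了 会谈": "held a talk",
}


def segment(words, table):
    """Greedy longest-match segmentation of a word list into known phrases."""
    phrases, i = [], 0
    while i < len(words):
        # Try the longest span starting at position i first.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in table:
                phrases.append(candidate)
                i = j
                break
        else:
            raise KeyError(f"no phrase covers {words[i]!r}")
    return phrases


def translate(sentence, table, order):
    """Segment, translate each phrase, then emit the phrases in the given order."""
    source_phrases = segment(sentence.split(), table)
    target_phrases = [table[p] for p in source_phrases]
    return " ".join(target_phrases[k] for k in order)


# Monotone order keeps the source phrase order; the second call swaps phrases 1 and 2.
monotone = translate("布什 与 沙龙 举行 了 会谈", PHRASE_TABLE, order=[0, 1, 2])
reordered = translate("布什 与 沙龙 举行 了 会谈", PHRASE_TABLE, order=[0, 2, 1])
print(monotone)
print(reordered)
```

The monotone order reproduces the word-for-word rendering from the slide, and swapping the last two phrases gives the fluent one. Choosing that order automatically is the part a real system has to learn.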
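Pros and cons of statistical machine translation
• Pros
  • The latent structures are highly interpretable
  • Local features and dynamic programming make the exponentially large space of structures tractable
• Cons
  • Linear models struggle with cases that are not linearly separable in high-dimensional space
  • Human experts must design the latent structures and the corresponding translation process
  • Human experts must design the features
  • Discrete representations cause severe data sparsity
  • Long-distance dependencies are hard to handle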
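For reference, the "linear model" mentioned above is commonly written as a weighted sum of feature functions. The notation below is an assumption for illustration and does not appear on the slide: $f$ is the source sentence, $e$ a candidate translation, $d$ a latent derivation (for example a phrase segmentation with its reordering), $h_m$ the hand-designed features, and $\lambda_m$ their weights.

```latex
\hat{e} = \operatorname*{arg\,max}_{e} \; \max_{d} \; \sum_{m=1}^{M} \lambda_m \, h_m(f, e, d)
```

When each $h_m$ decomposes over local pieces of $d$, such as individual phrase pairs, dynamic programming can search the exponentially large set of derivations. The same linear form is what limits the model on cases that are not linearly separable.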
Challenge: long-distance reordering
Bush President held a talk with Israeli Prime Minister Sharon at the White House
How can the words above be assembled into a sensible translation?
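To get a feel for why this is hard, one can count the candidate word orders. The sketch below assumes the slide's word set is exactly the 14 distinct words listed in `words`, and that every permutation counts as a candidate; real decoders restrict this space, for instance with distortion limits.

```python
import math

# The bag of target words from the slide (assumed to be exactly these 14 distinct words).
words = (
    "Bush President held a talk with Israeli Prime Minister Sharon "
    "at the White House"
).split()

# With all words distinct, the number of possible orderings is n!.
n_orderings = math.factorial(len(words))
print(len(words), n_orderings)
```

Even for this short sentence the count exceeds 87 billion, so exhaustive enumeration is not an option.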
Statistical machine translation example
Chinese: 美国总统布什昨天在白宫与以色列总理沙龙就中东局势举行了一个小时的会谈。
English: Yesterday, U.S. President George W. Bush at the White House with Israeli Prime Minister Ariel sharon on the situation in the middle east held a one-hour talks
Deep learning brings new ideas
"Overall, this process generates sequences of French words according to a probability distribution that depends on the English sentence. This rather naive way of performing machine translation has quickly become competitive with the state-of-the-art, and this raises serious doubts about whether understanding a sentence requires anything like the internal symbolic expressions that are manipulated by using inference rules. It is more compatible with the ..."
Yann LeCun, Yoshua Bengio, Geoffrey Hinton (LeCun et al., 2015)
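The "probability distribution that depends on the English sentence" in the quoted passage is usually written as a left-to-right factorization. The symbols below are assumptions for illustration and are not taken from the slide: $x$ is the English source sentence, and $y_1, \dots, y_T$ are the French target words.

```latex
P(y_1, \dots, y_T \mid x) = \prod_{t=1}^{T} P\left(y_t \mid y_1, \dots, y_{t-1}, x\right)
```

Each factor is computed by a neural network, so no hand-designed latent structure or feature set is needed. This is the contrast with the statistical approach described on the earlier slides.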