The Second Question: WHAT IS A DOMAIN-SPECIFIC SMT SYSTEM?
Typical SMT vs. Domain-Specific SMT

o Typical SMT systems are trained on a large, broad corpus (i.e., general-domain) and translate texts without regard to their domain.
o Performance depends heavily on the quality and quantity of the training data.
o Outputs preserve the semantics of the source side but lack morphological and syntactic correctness.
o Translation quality: understandable.

BBC News Example [5]
Input: Hollywood actor Jackie Chan has apologised over his son's arrest on drug-related charges, saying he feels "ashamed" and "sad".
Google Output: 好萊塢影星成龍已經道歉了他兒子的被捕與毒品有關的指控,說他感覺“羞恥”和“悲傷”。

[5] Available at http://www.bbc.com/news/world-asia-china-28871698 (BBC News, 20 August 2014).
Typical SMT vs. Domain-Specific SMT

o Domain-Specific SMT systems are trained on a small but relevant corpus (i.e., in-domain) and translate texts from one specific domain.
o They take into account the relevance between the training data and what we want to translate (the test data).
o Outputs preserve the semantics of the source side as well as morphological and syntactic correctness.
o Translation quality: publishable.

Patent Document Example [6]
Input: 本发明涉及新的tetramic酸型化合物,它从CCR-5活性复合物中分离出来,在控制条件下通过将生物纯的微生物培养液(球毛壳霉Kunze SCH 1705 ATCC 74489)发酵来制备复合物。[5]
ICONIC Translator Output: Novel tetramic acid-type compounds isolated from a CCR-5 active complex produced by fermentation under controlled conditions of a biologically pure culture of the microorganism, Chaetomium globosum Kunze SCH 1705, ATCC 74489, pharmaceutical compositions containing the compounds.

[6] Chinese Patent WO01/74772《受体拮抗剂趋化因子》.
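The relevance between training data and test data can be made concrete with a simple proxy. The sketch below is an illustration only, not a method taken from these slides: it computes the out-of-vocabulary (OOV) rate, i.e. the share of test tokens that never occur in the training corpus. The function name `oov_rate`, the whitespace tokenisation, and the toy sentences are all assumptions made for the example.

```python
def oov_rate(train_sentences, test_sentences):
    """Return the fraction of test tokens that never occur in the training data.

    Hypothetical helper for illustration: tokens are lower-cased and split on
    whitespace, which is far cruder than a real SMT preprocessing pipeline.
    """
    vocab = {tok for sent in train_sentences for tok in sent.lower().split()}
    test_tokens = [tok for sent in test_sentences for tok in sent.lower().split()]
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Toy corpora (invented for this example, not real training data).
general_corpus = [
    "the actor apologised for the arrest",
    "he feels sad about the news",
]
in_domain_corpus = [
    "the compounds are isolated from the complex",
    "a culture of the microorganism is fermented",
]
test_data = ["the compounds are produced by a culture of the microorganism"]

general_oov = oov_rate(general_corpus, test_data)
in_domain_oov = oov_rate(in_domain_corpus, test_data)
print(f"general-domain OOV rate: {general_oov:.2f}")
print(f"in-domain OOV rate:      {in_domain_oov:.2f}")
```

On these toy sentences the in-domain corpus leaves far fewer test tokens unseen than the general one, even though both are the same size. A lower OOV rate is only a rough indicator of relevance; it says nothing about word order or phrase coverage.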
The Third Question: WHAT IS THE DOMAIN-SPECIFIC TRANSLATION CHALLENGE?