Machine Translation for Conversational Texts
Dr Xiaojun Zhang
University of Stirling, UK
Machine Translation
Speech Translation
Statistical Machine Translation
Why?
Why is Chinese MT worse than non-Chinese MT?
IWSLT2015:
Chinese makes extensive use of different tones.
Shishi (“试试/实施/事实/实时/时时/石狮/事事/史诗/适时/时事/湿湿/逝世/世事/石室/十世/誓师/施食/失事/诗史/施事/史实/诗诗/师师/十时/师士/矢石/嗜食/事势/实事/失实/…”)
Chinese has no spaces between words.
江南大学 or 江南/大学?
Chinese lacks inflection.
- “吃了没?” - Did you eat?
- “吃了.” - Yes, I did.
- “还想吃吗?” - Do you want to eat more?
- “明天再吃.” - No, I’ll eat it tomorrow.
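The segmentation ambiguity above (江南大学 or 江南/大学) can be made concrete with a small sketch. It enumerates every way to split a string into words from a lexicon. The function name `segmentations` and the three-entry toy lexicon are assumptions for illustration, not part of the talk or of any real segmenter.

```python
from typing import Iterator, List, Set


def segmentations(text: str, lexicon: Set[str]) -> Iterator[List[str]]:
    """Yield every way to split `text` into words found in `lexicon`."""
    if not text:
        yield []  # nothing left to split: one valid (empty) segmentation
        return
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this prefix as a word and segment the remainder.
            for rest in segmentations(text[end:], lexicon):
                yield [prefix] + rest


# Toy lexicon (an assumption for illustration, not a real dictionary).
toy_lexicon = {"江南", "大学", "江南大学"}
results = list(segmentations("江南大学", toy_lexicon))
for words in results:
    print("/".join(words))
```

With this toy lexicon both readings are valid, so a real system needs extra evidence (for example, word frequencies or context) to choose between them.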
Dropped Pronoun
Test set: Baseline 18.76; Oracle 22.98
How?
How to predict the dropped pronouns?
How to generate the dropped pronouns?
How to translate the dropped pronouns?
Data Sets
DP Annotation 我 DP-annotated Chinese Corpus
DP Generation
Two steps: Detection and Prediction
Detection: DP position detection -> sequence labeling (RNN)
Prediction: specific DP prediction -> word class classification (feed-forward NN with 4 layers)
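To illustrate how the two steps could be framed as learning problems, here is a minimal data-preparation sketch. It turns a DP-annotated, pre-tokenized sentence into position labels (for the detection step) and a list of pronoun classes (for the prediction step). The `(pronoun)` annotation format, the `DP`/`N` label names, and the `END` token are all assumptions for illustration; the talk does not specify the actual format, and no neural model is trained here.

```python
from typing import List, Tuple

# Hypothetical end-of-sentence token, so a sentence-final DP has a slot.
END = "</s>"


def to_labeling_example(
    annotated: List[str],
) -> Tuple[List[str], List[str], List[str]]:
    """Split a DP-annotated token list into tokens, position labels, pronouns.

    A token written as "(X)" is treated as an annotated dropped pronoun X.
    Each remaining token gets label "DP" if a dropped pronoun directly
    precedes it, otherwise "N".
    """
    tokens: List[str] = []
    labels: List[str] = []
    pronouns: List[str] = []
    pending = False  # True when a DP was seen and not yet attached to a token
    for tok in annotated + [END]:
        if tok.startswith("(") and tok.endswith(")") and len(tok) > 2:
            pronouns.append(tok[1:-1])
            pending = True
        else:
            tokens.append(tok)
            labels.append("DP" if pending else "N")
            pending = False
    return tokens, labels, pronouns


tokens, labels, pronouns = to_labeling_example(["(你)", "吃", "了", "没", "?"])
print(list(zip(tokens, labels)), pronouns)
```

Under this framing, a sequence labeler would be trained to predict `labels` from `tokens`, and a classifier would be trained to predict each entry of `pronouns` at the positions labeled `DP`.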
DP Translation
Subjective personal pronoun
DP-inserted input (DP-ins. TM)
Train a translation model on a parallel corpus whose source side is DP-annotated.
DP-generated input (DP-gen. Input)
Pre-process the input sentence by inserting possible DPs with the DP generation model.
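The pre-processing step for DP-generated input can be sketched as follows. The function `insert_dps` and the `predicted` mapping (token index -> pronoun) are hypothetical stand-ins for the output of a DP generation model; the example prediction is hand-written, not produced by any real system.

```python
from typing import Dict, List


def insert_dps(tokens: List[str], predicted: Dict[int, str]) -> List[str]:
    """Return a copy of `tokens` with each predicted pronoun inserted
    before the token at its index (indices refer to the original list)."""
    out = list(tokens)
    # Insert from right to left so earlier indices stay valid.
    for pos in sorted(predicted, reverse=True):
        out.insert(pos, predicted[pos])
    return out


source = ["明天", "再", "吃", "."]
# Hand-written prediction for illustration: a dropped "我" before token 0.
dp_inserted = insert_dps(source, {0: "我"})
print(" ".join(dp_inserted))
```

The resulting token sequence would then be passed to the translation model in place of the original input.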
Evaluation
Results Analysis
Thanks xiaojun.zhang@stir.ac.uk