1
Machine Translation for Conversational Texts
Dr Xiaojun Zhang University of Stirling, UK
2
Machine Translation
3
Speech Translation
4
Statistical Machine Translation
5
Why? Why is Chinese MT worse than non-Chinese MT? IWSLT2015:
6
Chinese makes extensive use of different tones.
Shishi (“试试/实施/事实/实时/时时/石狮/事事/史诗/适时/时事/湿湿/逝世/世事/石室/十世/誓师/施食/失事/诗史/施事/史实/诗诗/师师/十时/师士/矢石/嗜食/事势/实事/失实/…”)
Chinese has no spaces between words.
江南大学 or 江南/大学?
Chinese lacks inflection.
- “吃了没?” - Did you eat?
- “吃了.” - Yes, I did.
- “还想吃吗?” - Do you want to eat more?
- “明天再吃.” - No, I’ll eat it tomorrow.
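The segmentation ambiguity above (江南大学 or 江南/大学?) can be illustrated with a minimal sketch that enumerates every split of an unspaced string into lexicon words. The function name `segmentations` and the three-entry toy lexicon are assumptions made for illustration; they are not taken from the presentation, and a real segmenter would also score the candidates.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words that all appear in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this prefix as a word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Toy lexicon (hypothetical): both the compound and its parts are valid words.
toy_lexicon = {"江南", "大学", "江南大学"}
candidates = segmentations("江南大学", toy_lexicon)
for words in candidates:
    print("/".join(words))
```

Both readings come back as valid, so the choice between them has to be made from context rather than from the characters alone.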
7
Dropped Pronoun
Test Set: Baseline 18.76, Oracle 22.98
8
How?
How to predict the dropped pronouns?
How to generate the dropped pronouns?
How to translate the dropped pronouns?
9
Data Sets
10
DP Annotation
DP-annotated Chinese Corpus (example DP: 我)
11
DP Generation
Detection: DP position detection -> sequence labeling (RNN)
Prediction: specific DP prediction -> word class classification (feed-forward NN with 4 layers)
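As a rough sketch of how the two stages could be framed as learning problems, the code below turns a DP-annotated token sequence into per-token position labels (for the sequence labeler) and pronoun classes (for the classifier). The `<DP>…</DP>` markup, the label names `DP`/`O`, and the function name are assumptions for illustration only; the presentation does not specify the annotation format, and the example sentence is made up from the dialogue on an earlier slide.

```python
def to_labeling_example(annotated_tokens, open_tag="<DP>", close_tag="</DP>"):
    """Split a DP-annotated token list into tokens, position labels and pronoun classes.

    A token such as "<DP>我</DP>" (hypothetical markup) marks a pronoun that was
    dropped before the next real token. A DP marker with no following token is
    ignored in this simplified version.
    """
    tokens, position_labels, pronoun_labels = [], [], []
    pending = None  # pronoun waiting to be attached to the next real token
    for tok in annotated_tokens:
        if tok.startswith(open_tag) and tok.endswith(close_tag):
            pending = tok[len(open_tag):-len(close_tag)]
            continue
        tokens.append(tok)
        # Stage 1 target: is a pronoun dropped before this token?
        position_labels.append("DP" if pending is not None else "O")
        # Stage 2 target: which pronoun it is (None where nothing is dropped).
        pronoun_labels.append(pending)
        pending = None
    return tokens, position_labels, pronoun_labels


tokens, positions, pronouns = to_labeling_example(["<DP>我</DP>", "明天", "再", "吃"])
print(tokens)     # surface sentence without the pronoun
print(positions)  # targets for DP position detection
print(pronouns)   # targets for specific DP prediction
```

Keeping the two label sequences separate mirrors the split on the slide: detection only needs to know where a pronoun is missing, while prediction chooses among pronoun classes at the detected positions.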
12
DP Translation
Subjective personal pronoun
DP-inserted input (DP-ins. TM): Train a translation model on a parallel corpus whose source side is DP-annotated.
DP-generated input (DP-gen. Input): Pre-process the input sentence by inserting possible DPs with the DP generation model.
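The pre-processing step for DP-generated input can be sketched as follows, assuming the DP generation model returns a mapping from token index to the pronoun predicted before that token. The function `insert_dps`, the dictionary format, and the example prediction are hypothetical; the presentation does not describe the model's output interface.

```python
def insert_dps(tokens, predicted_dps):
    """Insert predicted dropped pronouns into a tokenized input sentence.

    `predicted_dps` maps a token index to the pronoun to insert before that
    token (assumed output format of a DP generation model).
    """
    output = []
    for i, tok in enumerate(tokens):
        if i in predicted_dps:
            output.append(predicted_dps[i])  # restore the pronoun first
        output.append(tok)
    return output


# Illustrative prediction: the subject 我 is missing before the first token.
restored = insert_dps(["明天", "再", "吃"], {0: "我"})
print(" ".join(restored))
```

The restored sentence is then passed to the translation system in place of the original input.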
13
Evaluation
14
Results Analysis
15
Thanks