PRIMT: A Pick-Revise Framework for Interactive Machine Translation Shanbo Cheng, Shujian Huang, Huadong Chen, Xinyu Dai and Jiajun Chen Nanjing University By Jiawei Ling
The Pick-Revise IMT Framework. Introduction: IMT; Traditional IMT and Pick-Revise framework. The Pick-Revise IMT Framework: Pick; Revise; Decoder and Model Adaption. Automatic Suggestion Models: PSM; RSM. Experiments. Example Analysis. Conclusion.
IMT Human translators usually have to modify the output of a machine translation (MT) system, which often needs many modifications; this is time-consuming. To speed up the process, interactive machine translation (IMT) has been proposed, which instantly updates the translation result after every human action. Because the translation quality can improve after every update, IMT is expected to generate high-quality translations with fewer human actions. 1. Early machine translation workflows commonly relied on post-editing, i.e. the process of "refining a machine-generated translation with a small amount of manual modification".
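Traditional IMT Typical IMT systems use a left-to-right sentence-completion framework in which users process the translation from the beginning of the sentence and interact with the system at the left-most error. This makes it difficult to modify critical translation errors at the end of a sentence. Critical translation errors are errors that have a large impact on the translation of other words or phrases; they are often caused by the inherent difficulty of translating certain source phrases. 1. A typical IMT system uses a left-to-right sentence translation framework: translation starts at the beginning of the sentence, and the user interacts with the system at the left-most translation error. The part from the beginning of the sentence up to the modified position, called the "prefix", is assumed to be correct, and the system generates a new translation after the given prefix. Revising from left to right also delays the correction of ambiguous points, which lowers the efficiency of the interaction.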
Introduction to Pick-Revise Framework Pick: a wrongly-translated phrase is selected from the whole sentence. Revise: the correct translation is selected from the translation table (or manually added) to replace the original one. Our system then re-translates the sentence and searches for the best translation using previous modifications as constraints. We propose two automatic suggestion models that can predict the wrongly-translated phrases and select the revised translation. Pick: select the wrongly-translated phrase from the whole sentence. Revise: choose a more correct translation from the translation table (or add one manually) to replace the previous (wrong) translation. 3. The sentence is then re-translated, and the system searches for the best translation using the earlier modifications as constraints.
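Difference between PR and L2R In the left-to-right system, the left-most error "to discuss" is selected and revised to "discuss". This does not make the translation much better, so more human-machine interactions are needed to improve the translation quality. In the pick-revise system, suppose we pick "反恐" (anti-terrorism) as the most critical translation error and revise its translation from "the" to "anti-terrorism". The sentence is then re-translated, which not only produces the correct translation of that phrase but also improves the quality of the whole translation.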
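[Figure: PR-cycle flowchart. Start -> Constrained Decoder (source S1,…,Sn to translation E1,…,En) -> Acceptable? Yes: Stop. No: Picking (s[i..j], t) -> Revising (s[i..j], t′) -> Model Adaption -> Constrained Decoder.] The framework uses a constrained decoder to generate translations; the constraints come from the previous picking and revising steps. The picking and revising results are also fed into model adaptation. The whole process loops until the user accepts the translation.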
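The loop described on this slide can be sketched in a few lines of Python. This is only a minimal illustration: `pr_cycle`, `decode`, and `pick_and_revise` are hypothetical names that do not come from the slides, the callbacks stand in for the constrained decoder and the human user, and the model adaptation step is left out.

```python
from typing import Callable, List, Optional, Tuple

# A pick-revise pair (PRP): source span (i, j) and its revised translation t'.
PRP = Tuple[Tuple[int, int], str]


def pr_cycle(
    source: List[str],
    decode: Callable[[List[str], List[PRP]], str],
    pick_and_revise: Callable[[List[str], str], Optional[PRP]],
    max_cycles: int = 10,
) -> Tuple[str, List[PRP]]:
    """Run PR-cycles until the user accepts the translation.

    `decode` and `pick_and_revise` are hypothetical callbacks standing in
    for the constrained decoder and the human (or a suggestion model).
    `pick_and_revise` returns None when the translation is acceptable.
    Model adaptation is omitted from this sketch.
    """
    constraints: List[PRP] = []
    translation = decode(source, constraints)
    for _ in range(max_cycles):
        prp = pick_and_revise(source, translation)
        if prp is None:  # "Acceptable?" -> Yes
            break
        constraints.append(prp)  # every earlier revision stays a constraint
        translation = decode(source, constraints)
    return translation, constraints
```

Keeping all earlier pairs in `constraints` mirrors the statement that the decoder searches with the previous pick and revise results as constraints.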
Pick In the picking step, the users pick the wrongly-translated phrase, (s[i..j], t). The aim is to find critical errors in the translation, which are caused by errors in the translation table or by inherent translation ambiguities. To make the picking step easier to integrate into the MT system, we limit the selection of translation errors to phrases in the output of the previous PR-cycle. For more convenient user interaction, critical errors in our PRIMT system can be picked from either the source or the target side with a single mouse click. 1. s[i..j] is the phrase covering positions i to j of the source sentence, and it is translated as t. The more critical the error, the more the translation quality improves when it is corrected, because critical translation errors strongly affect the translation of the rest of the text.
Revise The users revise the translation of s[i..j] by selecting the correct translation t′ from the translation table, or by manually adding one if the translation table contains no correct translation. Whether to select or to add depends on the quality of the translation table. When the translation system is trained on enough parallel data, the quality of the translation table is usually high enough to offer the correct translation. 2. In addition, for the picked phrase, the translation options in the phrase table are shown to the user as a list; the user only needs to click the correct translation to complete the revision, or type a new translation into an input field.
Decoder and Model Adaption We use a constrained decoder to search for the best translation with the previous pick-revise pairs (PRPs) as constraints. The decoder makes an extra comparison between each translation option and the previous PRPs, and ignores every phrase that overlaps with the source side of a PRP. This makes the search space much smaller than in standard decoding. 0. In each pick-revise cycle, the pick-revise pair (s[i..j], t′) is added to the decoder.
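The extra comparison made by the constrained decoder can be illustrated with a small filter over translation options. This is a sketch under assumptions: `Option`, `overlaps`, and `filter_options` are hypothetical names, spans are inclusive source positions, and re-adding each PRP as the only option for its span is one plausible reading of the slide, not a confirmed detail of the actual decoder.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Option:
    start: int   # first source position covered (inclusive)
    end: int     # last source position covered (inclusive)
    target: str  # target-side phrase


def overlaps(a: Option, b: Option) -> bool:
    """True if the two source spans share at least one position."""
    return a.start <= b.end and b.start <= a.end


def filter_options(options: List[Option], prps: List[Option]) -> List[Option]:
    """Drop options overlapping the source side of any PRP, then add the PRPs.

    Assumption: the revised pair itself stays available, so it becomes
    the only option covering its source span.
    """
    kept = [o for o in options if not any(overlaps(o, p) for p in prps)]
    return kept + list(prps)
```

Because every overlapping option is removed before search starts, the decoder has fewer hypotheses to expand, which matches the smaller search space mentioned on the slide.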
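The Picking Suggestion Model (PSM) The goal of PSM is to automatically recognize phrases that might be wrongly translated, and to suggest that users pick these phrases. Among all the phrases of a source sentence, we need to separate the wrongly-translated phrases from the correctly-translated ones. We use the translation quality gain after the revising action as the measurement. To further reduce human actions, we use automatic suggestion models that offer the user suggestions for the picking and revising steps. Because both steps choose among a large number of candidates, we model both with classification-based methods. Next we describe how picking and revising are defined as classification tasks and which features are selected to build the models. 3. Because translation errors lower the translation quality, we use the quality gain after the revising action as the measurement: phrases whose revision improves the translation quality are treated as wrongly-translated phrases, and phrases whose revision degrades the translation quality are treated as correctly-translated phrases.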
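The labeling rule in the note above can be written as a short function. This is a minimal sketch: `label_picking_instances` is a hypothetical name, the gain values would come from a quality metric such as BLEU measured before and after revising, and skipping phrases with zero gain is an assumption because the slide does not say how they are handled.

```python
from typing import Dict


def label_picking_instances(gains: Dict[str, float]) -> Dict[str, int]:
    """Label phrases for PSM training from the quality gain after revising.

    1 = wrongly translated (revising it improved the translation),
    0 = correctly translated (revising it degraded the translation).
    Phrases with zero gain are skipped; that choice is an assumption.
    """
    labels: Dict[str, int] = {}
    for phrase, gain in gains.items():
        if gain > 0:
            labels[phrase] = 1
        elif gain < 0:
            labels[phrase] = 0
    return labels
```

A call such as `label_picking_instances({"phrase_a": 3.2, "phrase_b": -0.5})` would mark `phrase_a` as a picking target and `phrase_b` as correctly translated; the numbers here are purely illustrative.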
The Picking Suggestion Model (PSM) Modeling the picking step requires two kinds of information: (1) whether the phrase is difficult to translate; (2) whether the current translation option is correct.
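We use features from the translation model, the language model, the lexicalized reordering model, counts, part-of-speech tags, and lexical (word) features.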
The Revising Suggestion Model (RSM) The goal of RSM is to predict the correct translation and to suggest that users replace the wrong translation with the predicted one. We use two criteria to distinguish correct translation options from wrong ones: (1) the correct translation option should be a substring of the reference; (2) the correct translation option should be consistent with the pretrained word alignment on the translated sentence pair. With these criteria, we select all correct translation options as positive instances for the revising step, and randomly sample the same number of wrong translation options as negative instances. 1. For a single phrase, the phrase table offers many translation options, and we need to separate them into correct and wrong options (users are not asked to label these translations). 2. The first criterion ensures the validity of the option itself; the second ensures that the option does not cover a translation of anything outside the source phrase. 3. In particular, the translation options used by the baseline system are treated as negative instances.
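The two criteria and the balanced sampling can be sketched as follows. Several things here are assumptions made for illustration: the names `match_starts`, `is_correct_option`, and `build_instances` are hypothetical, matching is done on whitespace tokens, and alignment consistency is simplified to "every matched reference position is aligned to the source phrase". The real consistency check may differ.

```python
import random
from typing import List, Sequence, Set, Tuple


def match_starts(option: Sequence[str], reference: Sequence[str]) -> List[int]:
    """Start positions where `option` occurs contiguously in `reference`."""
    n = len(option)
    if n == 0:
        return []
    return [k for k in range(len(reference) - n + 1)
            if list(reference[k:k + n]) == list(option)]


def is_correct_option(option: Sequence[str], reference: Sequence[str],
                      aligned: Set[int]) -> bool:
    """Criterion 1: the option is a substring of the reference.
    Criterion 2 (simplified assumption): at some match, all matched
    reference positions are aligned to the source phrase."""
    return any(set(range(k, k + len(option))) <= aligned
               for k in match_starts(option, reference))


def build_instances(options: List[str], reference: List[str],
                    aligned: Set[int], seed: int = 0
                    ) -> Tuple[List[str], List[str]]:
    """Return (positives, negatives) with at most as many negatives as positives."""
    pos = [o for o in options if is_correct_option(o.split(), reference, aligned)]
    pool = [o for o in options if o not in pos]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    neg = rng.sample(pool, min(len(pos), len(pool)))
    return pos, neg
```

Sampling the same number of negatives as positives keeps the training set balanced, as the slide describes.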
The Revising Suggestion Model (RSM) For the translations of a given source phrase, there is no need to compare source-side information, because these translation options share the same source phrase and context. The features therefore focus mainly on estimating the translation quality of a given translation option.
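Experiments in ideal environment We run experiments on the sentences whose references can be produced by our current MT system using forced decoding. Forced decoding forces the decoder to generate a translation nearly identical to the reference, which means that no new words need to be typed to produce the correct translation. We simulate the human revising action only as selecting the best translation from the phrase table, which guarantees that the phrase table contains the correct translation of every phrase. The first PR-cycle, which corrects the most critical error, brings a large improvement in translation quality as measured by BLEU (an automatic evaluation metric for machine translation). KSMR (Keystroke and Mouse Action Ratio) is the number of keystrokes plus mouse actions needed to reach the correct translation, divided by the length of the correct translation. Compared with post-editing, the PRIMT framework lets users obtain better translations with fewer interactions, which demonstrates its efficiency.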
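The KSMR definition given above translates directly into a small function. This is a sketch only: `ksmr` is a hypothetical helper name, and measuring the length of the correct translation in characters is an assumption, since the note does not state the unit.

```python
def ksmr(keystrokes: int, mouse_actions: int, reference: str) -> float:
    """Keystroke and Mouse Action Ratio.

    (keystrokes + mouse actions) divided by the length of the correct
    translation. Length is counted in characters here, which is an
    assumption of this sketch.
    """
    if not reference:
        raise ValueError("reference translation must be non-empty")
    return (keystrokes + mouse_actions) / len(reference)
```

In a PR-cycle where the user only clicks (one click to pick, one click to revise), the keystroke count is zero and the ratio depends on mouse actions alone; typing a new translation adds keystrokes.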
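Experiments in general environment Translation quality also improves significantly in the general environment. Because of the limitations of the MT system itself, for some sentences the translation table may not contain the correct translation of a source phrase. Although the BLEU gain in the general case is smaller than in the ideal environment, it is still substantial.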
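Using Automatic Suggestion Models We show the effectiveness of the automatic suggestion models through both classification performance and translation performance. Because only options predicted as correct are used by the IMT system, the precision (number of correct items extracted / number of items extracted), recall (number of correct items extracted / number of items in the sample), and F-score (the harmonic mean of precision and recall) in the left table are computed on the correct translation options. Since correct translations are hard to identify automatically, the translation is left unchanged when RSM classifies all options as wrong. The feed-forward neural network brings some improvement, and the F-scores of both PSM and RSM are around 0.60-0.70, which is reasonably accurate.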
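The three measures defined above can be computed for the positive class as follows. This is a generic sketch, not the evaluation script behind the reported numbers: `precision_recall_f` is a hypothetical name, and labels are assumed to be 1 for the positive class and 0 otherwise.

```python
from typing import Sequence, Tuple


def precision_recall_f(predicted: Sequence[int],
                       gold: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F-score for the positive class (label 1)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    true_pos = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    pred_pos = sum(1 for p in predicted if p == 1)
    gold_pos = sum(1 for g in gold if g == 1)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # The F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

Returning 0.0 when a denominator is empty is a convention chosen for this sketch so that the function never divides by zero.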
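Using Automatic Suggestion Models We also measure the improvement in translation quality when the models are applied in the PR framework. Random picking brings almost no improvement, whereas picking with the PSM yields a certain BLEU gain, which shows that the BLEU improvement in the revising step does not simply come from longer translation matches. Random revising does not bring a significant BLEU gain either, and revising with the RSM yields a gain of less than 2 BLEU points. In general, using either PSM or RSM alone still improves translation quality. Compared with the fully simulated results, however, the gain is relatively small, which indicates that human involvement remains important for improving translation quality. Better models or more data could substantially improve the quality of the automatic suggestion models.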
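Example Analysis 1. In the first cycle, "第六" (sixth) is picked and revised from "the" to "the 6th", which causes "confirmed" to change to "confirms". In the second cycle, "病例" (case) is revised from "cases" to "case", and "禽流感死亡病例" in turn becomes "death case from the bird flu". 2. The first cycle picks "需要 一定", which changes the translation of "通常" from "is" to "usually". The second cycle revises "过程" from "process" to "course", which changes "," to ", and" and at the same time moves "course". In the last cycle, "很难" is revised from "it" to "it cannot be"; however, "一蹴而就" has no suitable translation option, so the human translator has to add one, after which the reference translation is generated. 3. The first cycle picks "无法" as the critical error; as a related change, "充分扫除" becomes "fully clear". The second cycle picks "以色列", which changes "回答" from "response" to "reply". The translation still differs from the reference, because the language model and the lexicalized reordering model prefer the wrong phrase order and place "the us" at the end of the sentence. This is a problem of the MT system itself and cannot be solved within the framework.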
Conclusion By correcting the critical error instead of the left-most one, our framework can improve the translation quality more quickly and efficiently. By using automatic suggestion models, we can reduce human interaction to a single type, either picking or revising. The performance of the current framework still depends on the underlying MT system. Further improvement could be achieved by supporting other types of interaction, such as reordering operations, or by building the system on stronger statistical models. 3. The framework is still built on top of the machine translation system.
Q&A Thank you~