Rule-Based Extraction for Time Expression Recognition - English III
Gao Guanji (高冠吉)
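Thoughts on the TE3 dataset
Relaxed Match on the TE3 test set has already reached 98.55, so adding a post-processing step after automatic recognition is feasible.
Use the syntax tree: a time expression should be a complete NP, ADVP, or CD.
Further organize the token-level rules.
(Rough framework: extraction rules extract candidate time expressions -> the syntax tree corrects their boundaries -> merge)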
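The rough framework can be sketched as a three-stage function. This is only an illustration under stated assumptions: `recognize`, `extract`, `correct`, and `merge_spans` are hypothetical names, spans are token offsets with an exclusive end, and the merge step is assumed to join spans that overlap or touch, which the slides do not spell out.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # token offsets, end exclusive


def merge_spans(spans: List[Span]) -> List[Span]:
    """Join spans that overlap or touch (assumed meaning of the merge step)."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def recognize(
    tokens: List[str],
    extract: Callable[[List[str]], List[Span]],
    correct: Callable[[List[str], Span], Span],
) -> List[Span]:
    """Rules propose candidates, a corrector fixes boundaries, then spans are merged."""
    candidates = extract(tokens)
    corrected = [correct(tokens, span) for span in candidates]
    return merge_spans(corrected)


# Toy stand-ins for the real stages (hypothetical, for illustration only).
def toy_extract(tokens: List[str]) -> List[Span]:
    return [(i, i + 1) for i, tok in enumerate(tokens) if tok in {"seven", "years"}]


def identity_correct(tokens: List[str], span: Span) -> Span:
    return span


tokens = "for almost seven years".split()
print(recognize(tokens, toy_extract, identity_correct))  # [(2, 4)]
```

Passing the extractor and the corrector as callables keeps the token rules and the syntax-tree step independent, so either can be swapped out without touching the merge logic.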
Syntax tree fixes over-long recognition
This comes just over a week before the start of British Summer Time.
This flu season started in early December, a month earlier than usual.
Syntax tree fixes partial recognition
Leon worked in Texas, a position he had held for almost seven years.
China's current economic policies would result in an enormous surge in coal consumption and automobile sales over the next decade.
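A minimal sketch of how a constituency tree could repair a partial match, using the phrase "for almost seven years" from the example above. The tree is hand-written rather than parser output, the candidate span "seven years" is a hypothetical rule result, and `constituents`, `is_complete`, and `expand` are illustrative helper names. The policy shown (grow to the smallest covering NP, ADVP, or CD) is an assumption about how the boundary correction might work.

```python
# Constituent labels a time expression is expected to match (from the slides).
ALLOWED = {"NP", "ADVP", "CD"}

# Hand-written parse of "for almost seven years" (an assumption, not parser output).
# A node is (label, child, ...); a preterminal is (POS, word).
TREE = ("PP",
        ("IN", "for"),
        ("NP",
         ("QP", ("RB", "almost"), ("CD", "seven")),
         ("NNS", "years")))


def constituents(node, start=0):
    """Return (end, spans); spans lists (label, start, end) per node, end exclusive."""
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return start + 1, [(label, start, start + 1)]
    spans, pos = [], start
    for child in children:
        pos, child_spans = constituents(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))
    return pos, spans


def is_complete(tree, start, end):
    """True if [start, end) is exactly one NP, ADVP, or CD constituent."""
    _, spans = constituents(tree)
    return any(lab in ALLOWED and (s, e) == (start, end) for lab, s, e in spans)


def expand(tree, start, end):
    """Grow [start, end) to the smallest allowed constituent that covers it."""
    _, spans = constituents(tree)
    covering = [(e - s, s, e) for lab, s, e in spans
                if lab in ALLOWED and s <= start and end <= e]
    if not covering:
        return start, end  # nothing suitable: leave the candidate unchanged
    _, s, e = min(covering)
    return s, e


tokens = ["for", "almost", "seven", "years"]
candidate = (2, 4)  # hypothetical partial match: "seven years"
print(is_complete(TREE, *candidate))  # False
fixed = expand(TREE, *candidate)
print(" ".join(tokens[fixed[0]:fixed[1]]))  # almost seven years
```

The over-long case could plausibly be handled in the mirror-image way, by keeping only the allowed constituents that lie fully inside the candidate, but the slides do not state the exact policy, so it is not shown here.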
Hand-built token rules (generalization, classification)
SynTime's token rules can be reused, then corrected and extended.
Organize the categories hierarchically to handle time expressions that can be partly generalized.
Example: Late last July
[MONTH_REGEX]
Januarys?('s)?|Februarys?('s)?|Marchs?('s)?|Aprils?('s)?|Mays?('s)?|Junes?('s)?|Julys?('s)?|Augusts?('s)?|Septembers?('s)?|Octobers?('s)?|Novembers?('s)?|Decembers?('s)?
[PREFIX_REGEX_1]
the|this|that|these|those|next|following|consecutive|previous|latter|last|late(st)?|initial|universal|mid(dle)?|final|coming|upcoming|past|future|current|recent|ides|early|each|every|other|alternate|alternating|another|about|around|almost|some|whole|few|several|of|more|less|than|near(ly)?|right
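As a rough illustration, the two regexes above can be used to tag the tokens of "Late last July". The patterns are copied from the slide. Everything else is an assumption made for this sketch: the parent category names `TIME_TOKEN` and `MODIFIER`, whitespace tokenization, whole-token matching, and case-insensitive matching (which also lets words such as "may" or "march" match as months).

```python
import re

# Patterns copied from the slide.
MONTH_REGEX = (
    r"Januarys?('s)?|Februarys?('s)?|Marchs?('s)?|Aprils?('s)?|Mays?('s)?|"
    r"Junes?('s)?|Julys?('s)?|Augusts?('s)?|Septembers?('s)?|Octobers?('s)?|"
    r"Novembers?('s)?|Decembers?('s)?"
)
PREFIX_REGEX_1 = (
    r"the|this|that|these|those|next|following|consecutive|previous|latter|last|"
    r"late(st)?|initial|universal|mid(dle)?|final|coming|upcoming|past|future|"
    r"current|recent|ides|early|each|every|other|alternate|alternating|another|"
    r"about|around|almost|some|whole|few|several|of|more|less|than|near(ly)?|right"
)

# Two-level hierarchy: (hypothetical parent category, token type, pattern).
TOKEN_TYPES = [
    ("TIME_TOKEN", "MONTH", MONTH_REGEX),
    ("MODIFIER", "PREFIX", PREFIX_REGEX_1),
]
COMPILED = [
    (parent, name, re.compile(pattern, re.IGNORECASE))
    for parent, name, pattern in TOKEN_TYPES
]


def tag(token):
    """Return (parent category, token type) for the first matching rule, else None."""
    for parent, name, pattern in COMPILED:
        if pattern.fullmatch(token):  # the whole token must match
            return parent, name
    return None


def tag_tokens(text):
    """Tag each whitespace-separated token of the text."""
    return [(tok, tag(tok)) for tok in text.split()]


for tok, label in tag_tokens("Late last July"):
    print(tok, label)
# Late ('MODIFIER', 'PREFIX')
# last ('MODIFIER', 'PREFIX')
# July ('TIME_TOKEN', 'MONTH')
```

With tags at the parent level, a single pattern such as "one or more MODIFIER tokens followed by a TIME_TOKEN" would cover "Late last July" and similar phrases without listing each combination.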