Sentence Decomposition
丁文韬
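Sentence decomposition
Process
  Split the sentence into parts based on its syntax tree
  Identify the relations between the resulting parts and build a new tree
Principles
  Each part produced by a split should serve as input for finer-grained understanding
  Each part produced by a split should express sufficient meaning on its own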
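How to split a sentence
Splitting criteria (syntax tree structure)
Every part produced by a split must contain one of the following structures:
  Complete statement: S | (WH-word | NP) … VP
  Statement with omitted subject: CC … VP | to VP | V+“ing” …
  Isolated item: (“,”|“and”|“or”) NP (“,”|“.”|“;”)
  Parenthesized content: “-LRB-” … “-RRB-”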
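The four patterns above can be sketched as simple predicates. This is only an illustration under stated assumptions: a segment is flattened into a list of (label, text) pairs for its top-level constituents, the labels follow Penn Treebank conventions, and every name (`Node`, `classify`, the `is_*` helpers) is hypothetical rather than taken from the original system.

```python
from typing import List, Optional, Tuple

# Assumption: a segment is a flat list of (label, covered text) pairs,
# one pair per top-level constituent or token of the segment.
Node = Tuple[str, str]

OPENERS = {",", "and", "or"}   # tokens that may introduce an isolated item
CLOSERS = {",", ".", ";"}      # tokens that may close an isolated item


def is_complete_statement(seg: List[Node]) -> bool:
    """S | (WH-word | NP) ... VP"""
    labels = [label for label, _ in seg]
    for i, label in enumerate(labels):
        if label == "S":
            return True
        # A WH constituent or an NP followed somewhere later by a VP.
        if (label == "NP" or label.startswith("WH")) and "VP" in labels[i + 1:]:
            return True
    return False


def is_subjectless_statement(seg: List[Node]) -> bool:
    """CC ... VP | to VP | V+"ing" ..."""
    labels = [label for label, _ in seg]
    if not labels:
        return False
    if labels[0] == "CC":
        return "VP" in labels[1:]
    if labels[0] == "TO":
        return len(labels) > 1 and labels[1] == "VP"
    return labels[0] == "VBG"  # gerund / present participle


def is_isolated_item(seg: List[Node]) -> bool:
    """(","|"and"|"or") NP (","|"."|";")"""
    if len(seg) != 3:
        return False
    (_, first), (middle, _), (_, last) = seg
    return first.lower() in OPENERS and middle == "NP" and last in CLOSERS


def is_parenthetical(seg: List[Node]) -> bool:
    """"-LRB-" ... "-RRB-\""""
    return len(seg) >= 2 and seg[0][0] == "-LRB-" and seg[-1][0] == "-RRB-"


def classify(seg: List[Node]) -> Optional[str]:
    """Return the name of the first matching pattern, or None."""
    checks = [
        ("complete statement", is_complete_statement),
        ("subjectless statement", is_subjectless_statement),
        ("isolated item", is_isolated_item),
        ("parenthesized content", is_parenthetical),
    ]
    for name, check in checks:
        if check(seg):
            return name
    return None
```

For instance, `classify([("WHADVP", "when"), ("NP", "he"), ("VP", "was eight")])` matches the complete-statement pattern, while a bare `[("NP", "video games")]` matches none of them and so could not stand as a part on its own.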
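Splitting method
  First, scan from back to front for all isolated items and attach each one to the NP preceding it
  Traverse the tree top-down with BFS
    Enumerate contiguous spans of text from front to back (smaller left boundary first) and check whether the corresponding tree structure matches a pattern
    If it does, check whether what remains after cutting out that span also matches a pattern
    If both match, perform the split, stop the traversal, and call the splitting algorithm on the two new results
  When cutting off a subtree, the root node goes with the half that contains its leftmost subtree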
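The core loop of the method (enumerate a span, check the span and the remainder, split, recurse) can be sketched as follows. This is a deliberately reduced version, not the original implementation: it works on one flat level of (label, text) pairs instead of running BFS over the full tree, it omits the isolated-item pre-pass and the root-assignment rule, and it checks only the complete-statement pattern. The names `has_statement`, `split_once`, and `decompose` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Assumption: one tree level flattened to (label, covered text) pairs.
Node = Tuple[str, str]
Pattern = Callable[[List[Node]], bool]


def has_statement(seg: List[Node]) -> bool:
    """Simplified pattern check: S | (WH-word | NP) ... VP."""
    labels = [label for label, _ in seg]
    for i, label in enumerate(labels):
        if label == "S":
            return True
        if (label == "NP" or label.startswith("WH")) and "VP" in labels[i + 1:]:
            return True
    return False


def split_once(nodes: List[Node], matches: Pattern) -> Optional[Tuple[List[Node], List[Node]]]:
    """Find the first contiguous span such that both the span and the
    remainder match the pattern. Spans are tried front to back, with
    smaller left boundaries first."""
    n = len(nodes)
    for i in range(n):
        for j in range(i + 1, n + 1):
            if i == 0 and j == n:
                continue  # cutting out everything is not a split
            segment = nodes[i:j]
            rest = nodes[:i] + nodes[j:]
            if matches(segment) and matches(rest):
                return segment, rest
    return None


def decompose(nodes: List[Node], matches: Pattern) -> List[List[Node]]:
    """Split recursively until no further valid split exists."""
    found = split_once(nodes, matches)
    if found is None:
        return [list(nodes)]
    segment, rest = found
    # Both halves are strictly shorter than `nodes`, so recursion terminates.
    return decompose(segment, matches) + decompose(rest, matches)


if __name__ == "__main__":
    sentence = [
        ("WHADVP", "when"), ("NP", "he"), ("VP", "was eight"), (",", ","),
        ("NP", "he"),
        ("VP", "figured out how to add up all the numbers from 1 to 100"),
    ]
    for part in decompose(sentence, has_statement):
        print(" ".join(text for _, text in part))
```

On the clause from Example 1, this sketch cuts after "when he was eight" and leaves ", he figured out how to add up all the numbers from 1 to 100" as the second part; neither part can be split further because no smaller span leaves a remainder that still contains a statement.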
Example 1 Carl_Friedrich_Gauss
A contested story relates that, when he was eight, he figured out how to add up all the numbers from 1 to 100.
Example 1 Carl_Friedrich_Gauss
when he was eight, he figured out how to add up all the numbers from 1 to 100
  when he was eight
  , he figured out how to add up all the numbers from 1 to 100
Example 2 Mass_media
For example, it is controversial whether to include cell phones, computer games (such as MMORPGs), and video games in the definition.
Example 2 Mass_media
For example, it is controversial whether to include {cell phones, computer games, video games in the definition}.
For example, it is controversial whether to include …
Q & A Thanks for listening