Exploring Segment Representations for Neural Segmentation Models

Exploring Segment Representations for Neural Segmentation Models
Yijia Liu, Wanxiang Che, Jiang Guo, Bing Qin, and Ting Liu
Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
Good afternoon, everyone. I am Yijia Liu from Harbin Institute of Technology, and the title of our paper is "Exploring Segment Representations for Neural Segmentation Models".

Problem: NLP Segmentation Problem
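This work focuses on segmentation problems in natural language processing. Many NLP tasks can be modeled as segmentation problems, for example Chinese word segmentation (CWS) and named entity recognition (NER).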

Problem: NLP Segmentation Problem
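The input is a sequence of elements.
A segmentation is a sequence of segments $S = (s_1, s_2, \ldots, s_p)$.
A segment is a tuple $s = (u, v, y)$, where
  $u$ is the beginning position,
  $v$ is the ending position,
  $y$ is the label associated with the segment (optional).
Consecutive segments are constrained by $v_i + 1 = u_{i+1}$.
Formally, we define a segmentation as a sequence of contiguous segments. Each segment is a triple $(u, v, y)$, where $u$ is where the segment begins, $v$ is where it ends, and $y$ is its label.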
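The definition above can be sketched as a small data structure. This is only an illustration: the names `Segment`, `segments_from_words`, and `is_valid_segmentation` are hypothetical, positions are assumed to be 0-indexed and inclusive, and the check that a segmentation covers the whole input is an added assumption (the slide states only the contiguity constraint).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    u: int                    # beginning position (inclusive, 0-indexed by assumption)
    v: int                    # ending position (inclusive)
    y: Optional[str] = None   # optional label


def segments_from_words(words: List[str]) -> List[Segment]:
    """Turn a list of words into unlabeled segments over the character sequence."""
    segs, start = [], 0
    for w in words:
        segs.append(Segment(start, start + len(w) - 1))
        start += len(w)
    return segs


def is_valid_segmentation(segs: List[Segment], n: int) -> bool:
    """Check v_i + 1 == u_{i+1}; full coverage of 0..n-1 is an extra assumption."""
    if not segs:
        return n == 0
    if segs[0].u != 0 or segs[-1].v != n - 1:
        return False
    if any(s.u > s.v for s in segs):
        return False
    return all(a.v + 1 == b.u for a, b in zip(segs, segs[1:]))


segs = segments_from_words(["浦东", "开发", "与", "建设"])
```

For the example sentence used later in the talk, `segs` holds four contiguous segments over the seven characters of “浦东开发与建设”.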

Motivating: Can we use word embedding in CWS?
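浦东开发与建设 → 浦东 / 开发 / 与 / 建设 (Pudong / development / and / construction)
The main motivation of this work is the following question: can we use word embeddings in Chinese word segmentation? This is really a chicken-and-egg problem: word embeddings are indexed by words, but the words are exactly what segmentation is supposed to produce.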

To achieve this goal, we need to
access the segment (the potential word) during inference
represent the segment
To achieve this goal, the model must be able, during decoding, both to obtain the potential words and to represent them.

To achieve this goal, we need to
access the segment (the potential word) during inference
represent the segment
Structure prediction: in “浦东开发与建设”, “浦东” is a potential word.
Segment representation: “浦东”: [0.5, 0.3, 0.6, …], “虹桥”: [0.5, 0.2, 0.5, …]; they have similar syntactic/semantic functions.
These two models should interact roughly as follows. For the sentence “浦东开发与建设”, the structure prediction model tells us that “浦东” is a potential word, and the segment representation model represents it. Ideally, this representation also gives us information such as contextual similarity.

To achieve this goal, we need to
access the segment (the potential word) during inference
represent the segment
Semi-Markov CRF: in “浦东开发与建设”, “浦东” is a potential word.
Deep learning: “浦东”: [0.5, 0.3, 0.6, …], “虹桥”: [0.5, 0.2, 0.5, …]; they have similar syntactic/semantic functions.
In this work, the structure prediction model is a semi-Markov CRF, and we use deep learning models to model the segment representation.

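Refresh on semi-CRF
The semi-CRF models the conditional probability of $S$ as
$$p(S \mid X) = \frac{1}{Z} \exp\big( W \cdot \Phi(S, X) \big)$$
By restricting the features to a single segment, $\Phi(S, X)$ can be decomposed as
$$\Phi(S, X) = \sum_{i=1}^{p} \phi(s_i, X)$$
The core problem in achieving good segmentation performance is representing $\phi(s_i, X)$.
The semi-CRF directly models the probability of the output segmentation given the input. If the information considered during decoding is restricted to a single segment, $\Phi$ can be written as a sum over segments. The key question then becomes how to represent $\phi$, and that is the focus of our work.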
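Because the score decomposes over segments, the best segmentation can be found by dynamic programming. The following is a minimal sketch, not the authors' decoder: it assumes a zeroth-order semi-Markov model (no label-transition term), a maximum segment length `max_len`, and a hypothetical `score(u, v, y)` that stands in for $W \cdot \phi(s_i, X)$. The lexicon-based `toy_score` is made up purely for the demonstration.

```python
from typing import Callable, List, Tuple

Seg = Tuple[int, int, str]  # (u, v, y) with inclusive, 0-indexed positions


def semi_markov_viterbi(
    n: int,
    labels: List[str],
    max_len: int,
    score: Callable[[int, int, str], float],
) -> Tuple[float, List[Seg]]:
    """Return the highest-scoring segmentation of positions 0..n-1."""
    # best[j] is the best total score of a segmentation covering 0..j-1.
    best = [float("-inf")] * (n + 1)
    back: List[Seg] = [(-1, -1, "")] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for length in range(1, min(max_len, j) + 1):
            u = j - length
            for y in labels:
                cand = best[u] + score(u, j - 1, y)
                if cand > best[j]:
                    best[j] = cand
                    back[j] = (u, j - 1, y)
    # Follow back-pointers from the end of the input to recover the segments.
    segs: List[Seg] = []
    j = n
    while j > 0:
        seg = back[j]
        segs.append(seg)
        j = seg[0]
    segs.reverse()
    return best[n], segs


# Toy scorer (hypothetical): reward segments found in a tiny lexicon.
sentence = "浦东开发与建设"
lexicon = {"浦东", "开发", "与", "建设"}


def toy_score(u: int, v: int, y: str) -> float:
    word = sentence[u : v + 1]
    return float(len(word)) if word in lexicon else -1.0


best_score, best_segs = semi_markov_viterbi(len(sentence), ["W"], 4, toy_score)
```

With a cap on segment length, the loop visits each end position, each allowed length, and each label once, so the cost grows linearly in the input length for fixed `max_len` and label set.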

(Old-school) $\phi(s_i, X)$ representation
CRF-styled features: input-unit-level information, e.g. the character
semi-CRF-styled features: segment-level information, e.g. the length of the segment
Both suffer from sparsity and cannot efficiently use unlabeled data.
Traditionally, $\phi$ is represented as a sparse, discrete feature vector. Some of these features come from the input level, such as characters. Others are segment-level features, but these require hand-designing features that generalize well.

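Neuralized $\phi(s_i, X)$ representation
Neural CRF-styled features: $\mathrm{SCOMP}_i$
  composes the representations of the input units into a vector
  handles the variable-length nature of segments
Neural semi-CRF-styled features: $\mathrm{SEMB}_i$
  embeds the entire segment
  learns from labeled/unlabeled data
In recent years, representation learning with neural networks has been a hot research topic, mainly for two reasons: network structures can model the compositional nature of natural language, and neural networks can learn distributed representations from large-scale data. Our use of neural networks to represent segments follows the same two directions. One uses a neural network to model the input units and compose them into a single vector. The other represents the segment directly through an embedding.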

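Composing Input Units
$\mathrm{SCOMP}_i = \mathrm{Net}(x_u, x_{u+1}, \ldots, x_v)$
Net: SRNN, SCNN, SCONCATE
To compose the input units we tried three network structures: an RNN, a CNN, and simple concatenation. Because segments vary in length, the network must be able to model variable-length input. RNNs and CNNs both handle this well. For simple concatenation, we rely on the maximum segment length that is usually set in semi-CRF decoding; with that bound, zero padding turns variable-length input into fixed-length input. This gives us a representation of the composed input units.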
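The zero-padding idea behind SCONCATE can be sketched in a few lines. This is an illustrative assumption about the mechanics only: the function name `sconcate` and its arguments are hypothetical, padding is placed on the right, and whatever layer the authors apply to the padded vector afterwards is not shown.

```python
from typing import List, Sequence


def sconcate(
    units: Sequence[Sequence[float]], u: int, v: int, max_len: int
) -> List[float]:
    """Concatenate unit vectors x_u..x_v and zero-pad to max_len * dim."""
    dim = len(units[0])
    length = v - u + 1
    if not 1 <= length <= max_len:
        raise ValueError("segment length must be between 1 and max_len")
    out: List[float] = []
    for x in units[u : v + 1]:
        out.extend(x)
    # Right-pad with zeros so every segment yields a vector of the same size.
    out.extend([0.0] * ((max_len - length) * dim))
    return out


# Toy 2-dimensional unit vectors (made up for the example).
units = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
padded = sconcate(units, 0, 1, 3)
```

Every segment, whatever its length, maps to a vector of `max_len * dim` numbers, which is what lets a fixed-size layer consume it.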

Embedding Entire Segment
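$\mathrm{SEMB}_i = \mathrm{lookup}(x_u x_{u+1} \ldots x_v)$
Problem: where does the embedding come from?
Answer 1: learn it from the training data [overfitting]
Answer 2: learn it from unlabeled but auto-segmented data
Auto-segmented data: homogeneous or heterogeneous?
To represent the whole segment, we use a lookup-table segment embedding. This brings us back to our original question: how to use word embeddings during word segmentation, and where those embeddings should come from. One option is to obtain them from the training data, but this leads to severe overfitting. The other option, common in semi-supervised learning, is to run a baseline model over large-scale raw text and use its automatic output as features. So our second way of obtaining segment embeddings is to learn them directly from automatically segmented large-scale text.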
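A lookup-table segment embedding amounts to a dictionary keyed by the segment's surface string. The sketch below is hypothetical: the function name `lookup`, the fallback vector `unk` for segments missing from the table, and the 3-dimensional toy vectors (the first three components shown on the earlier slide) are assumptions, since the slides do not say how unseen segments are handled.

```python
from typing import Dict, List, Sequence


def lookup(
    table: Dict[str, List[float]],
    units: Sequence[str],
    u: int,
    v: int,
    unk: List[float],
) -> List[float]:
    """Embed the whole segment x_u..x_v by its surface string."""
    key = "".join(units[u : v + 1])
    # Assumption: segments absent from the table share one fallback vector.
    return table.get(key, unk)


table = {"浦东": [0.5, 0.3, 0.6], "虹桥": [0.5, 0.2, 0.5]}
unk = [0.0, 0.0, 0.0]
chars = list("浦东开发与建设")

known = lookup(table, chars, 0, 1, unk)    # segment "浦东"
unknown = lookup(table, chars, 1, 2, unk)  # segment "东开", not in the table
```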

Final Model
Our final segment representation combines the input-unit composition network with the segment embedding.

Experiments
Two typical NLP segmentation tasks: NER and CWS
Baselines:
  sparse-feature CRF
  neural sequence labeling
  neural CRF
We use three baselines: a traditional sparse-feature model, neural sequence labeling, and a neural CRF.

w/ Input Units Composition only
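Structured prediction models outperform classification,
but the difference among structured models is not significant.
These are the results with the input-unit composition network only. Using only input-unit information, the neural semi-CRF outperforms nn-label but performs about the same as nn-crf. We believe this is mainly because both are structured prediction models and neither uses information about complete segments.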

w/ Segment Embedding: Learning from the Training data?
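Severe overfitting
Initializing with pretrained embeddings solves this problem.
In the next experiments we add the segment representation to the model. We first tried obtaining segments from the training data; unsurprisingly, this model overfit. But when we initialize with segment embeddings learned from automatically segmented data, performance returns to normal.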

w/ Segment Embedding: Auto-segmented data from Homo- or Hetero- baseline
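Generally, they all help.
The hetero- baseline is a little better than the homo- baseline,
which is consistent with boosting in machine learning.
We then used different baseline models to produce the automatically segmented data, including the traditional sparse-feature model and our neural semi-CRF baseline. We found that the improvement is larger when the auto-segmented data comes from the heterogeneous sparse-feature model.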

Final Result
Using the segment-level representation greatly improves performance.
Finally, we add the segment embedding to our model. With it, NER improves by more than 0.7 points and CWS by close to 2 points.

Final Result (compare w/ NER SOTA)
achieve comparable performance without domain-specific knowledge

Final Result (compare w/ CWS SOTA)
achieve SOTA on two datasets

Conclusion
We thoroughly study how to represent the segment in a neural semi-CRF.
SCONCATE is comparable with SRNN but runs faster.
Segment embedding greatly improves the performance.
Our code can be found at: nn-semicrf

Thanks and Questions!
