Exploiting Coarse-to-Fine Task Transfer for Aspect-level Sentiment Classification

Good afternoon, everyone. My name is Zheng Li, from the Hong Kong University of Science and Technology. Today I will present our AAAI work, "Exploiting Coarse-to-Fine Task Transfer for Aspect-level Sentiment Classification."
Outline

- Aspect-level Sentiment Classification: Background & Motivation
- Problem Definition: "coarse-to-fine task transfer"
- Multi-Granularity Alignment Network (MGAN)
  - Coarse-to-Fine Attention (reduce task discrepancy)
  - Contrastive Feature Alignment (reduce feature distribution discrepancy)
- Experiment Settings & Comparative Study
- Future work

This presentation is divided into four parts. First, we introduce the background and motivation of the aspect-level sentiment classification task, leading to our new problem definition: "coarse-to-fine task transfer". Then we present the Multi-Granularity Alignment Network, whose two core components, coarse-to-fine attention and contrastive feature alignment, address the task discrepancy and the feature distribution discrepancy, respectively. Finally, we describe the experimental settings, report a comparative study, and outline future work.
Background

Aspect-level Sentiment Classification (ASC) aims to infer the overall opinion/sentiment of a user review towards a given aspect, i.e., P(y | a, x).

The input comes from two sources:
- Aspect: the phrase of the opinion entity (a).
- Context: the original review sentence (x).
The output is the sentiment prediction (y).

Note that an aspect can behave as:
- an aspect category (AC): a general category of entities that appears implicitly in the sentence;
- an aspect term (AT): a specific entity that occurs explicitly in the sentence.

Example: "The salmon is delicious but the waiter is very rude."
- AC-level task: <food seafood fish, +>, <service, ->
- AT-level task: <salmon, +>, <waiter, ->

For instance, in this review the user speaks positively and negatively about the two aspect categories "food seafood fish" and "service", respectively. In contrast, an aspect term describes a specific entity that occurs explicitly in the sentence: in the same review, the aspect terms are "salmon" and "waiter", towards which the user expresses positive and negative sentiment.
Motivation – Why Transfer?

Current solutions for aspect-level sentiment analysis: RNN (sequential patterns) + attention mechanism (aspect-specific context features).
- Data-driven, dependent on large corpora.
- Due to the scarcity of datasets for this task, the state-of-the-art methods still cannot achieve satisfactory results.

Aspect-level sentiment analysis, which provides polarity detection at a finer granularity, is intuitively more suitable for commercial applications such as targeted recommendation and advertising. Besides, existing domain adaptation work for sentiment analysis focuses on traditional sentiment classification without considering the aspect.

The mainstream approach is RNN + attention: the RNN captures sequential patterns, and the attention locates aspect-specific context features. These methods are data-driven and rely on large corpora, but given the lack of data for this task, existing methods cannot achieve satisfactory results. We study transfer learning on this task both because fine-grained sentiment is more valuable for commercial applications, and because existing sentiment domain adaptation concentrates on traditional sentiment classification without considering the aspect.
Motivation – What to Transfer? (A? -> B?)

AT-level dataset (B):
- Aspect terms must be comprehensively labeled by hand, or extracted from the sentences by sequence labeling algorithms.
- Low-resource and expensive to annotate, which limits the potential of neural models.

AC-level dataset (A):
- Aspect categories can be pre-defined as a small set.
- Rich-resource, beneficial as auxiliary source domains.
- Easier to collect: commercial services can define a set of valuable aspect categories for products or events in a particular domain (e.g., "food", "service", "speed", and "price" in the Restaurant domain).

We observe that AT-level datasets require aspect terms to be manually annotated or extracted by sequence labeling algorithms; such data are scarce and expensive to label, which greatly limits the potential of neural networks. AC-level datasets are much easier to obtain, because aspect categories are few and can be pre-defined: a commercial service can define a set of aspect categories about its products or events in a particular domain. Collecting large amounts of user preference data over different aspect categories is therefore realistic.
Problem Definition

Coarse-to-Fine Task Transfer (a new transfer setting): both the domains and the task granularities are different.

Source domain (Restaurant), AC-level task with coarse-grained aspects:
"The salmon is delicious but the waiter is very rude." → <food seafood fish, +>, <service, ->

Target domain (Laptop), AT-level task with fine-grained aspects:
"Screen is crystal clear but the system is quite slow." → <screen, +>, <system, ->

We therefore propose a new transfer setting: transferring from the AC-level task, which has abundant data, to the AT-level task, which has only limited data. Since aspect categories are coarse-grained while aspect terms are fine-grained, we call this new problem coarse-to-fine task transfer.
Challenges

1. Task discrepancy: inconsistent aspect granularity between tasks.
- Source aspects are coarse-grained aspect categories, which lack a priori position information in the context.
- Target aspects are fine-grained aspect terms, which have accurate position information.

2. Feature distribution discrepancy: the distribution shift for both the aspects and their contexts, since the two tasks reside in different domains.

Example:
- Restaurant domain: "tasty" and "delicious" are used to express positive sentiment towards the aspect category "food".
- Laptop domain: "lightweight" and "responsive" indicate positive sentiment towards the aspect term "mouse".
How to Transfer?

Multi-Granularity Alignment Network (MGAN):
- Source network for the AC task (BiLSTM + C2A + C2F + PaS): one more coarse2fine attention layer.
- Target network for the AT task (BiLSTM + C2A + PaS): a simple, common, attention-based model.

To enable the transfer, we propose a Multi-Granularity Alignment Network. The source network has one more coarse-to-fine attention layer than the target network, so that the two tasks can be modeled at the same fine-grained level. The aspect-specific features output by the two networks are then aligned by a contrastive feature alignment method. The target network itself is a simple, commonly used attention-based model.
Model Overview

The proposed MGAN consists of the following five components:
- Bi-directional LSTM for memory building: generates contextualized word representations.
- Context2Aspect (C2A) attention: measures the importance of the aspect words with regard to each context word, and generates the aspect representation.
- Coarse2Fine (C2F) attention (tackles the task discrepancy): guided by an auxiliary task, it helps the AC task be modeled at the same fine-grained level as the AT task.
- Position-aware Sentiment (PaS) attention: introduces the position information of the aspect to detect the most salient sentiment features more accurately.
- Contrastive Feature Alignment (CFA) (tackles the feature distribution discrepancy): fully utilizes the limited target labeled data to semantically align representations across domains.

A code sketch of the C2A attention is given below.
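To make the components concrete, here is a minimal PyTorch-style sketch of the C2A attention. The additive scoring form, the mean-pooled context summary, and all module/tensor names (C2AAttention, proj, score) are our illustrative assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class C2AAttention(nn.Module):
    """Context-to-Aspect attention (sketch): scores each aspect word against
    a summary of the context memory and pools the aspect memory into one
    aspect representation."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, aspect_mem, context_mem):
        # aspect_mem:  (batch, m, hidden) -- BiLSTM states of the aspect words
        # context_mem: (batch, n, hidden) -- BiLSTM states of the context words
        ctx = context_mem.mean(dim=1, keepdim=True)          # context summary
        ctx = ctx.expand(-1, aspect_mem.size(1), -1)
        z = self.score(torch.tanh(self.proj(
            torch.cat([aspect_mem, ctx], dim=-1)))).squeeze(-1)
        alpha = F.softmax(z, dim=-1)                         # weights over aspect words
        h_a = torch.bmm(alpha.unsqueeze(1), aspect_mem).squeeze(1)
        return h_a                                           # aspect representation
```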
Align aspect granularity - Coarse2Fine (C2F) Attention

Target task (AT-level): position information is effective for better locating the salient sentiment features. The C2A attention generates the aspect representation (e.g., for "Tech at HP" in "Tech at HP is very professional but the product is quite insensitive.", where the head word "Tech" matters more than the modifier "at HP"), and the PaS attention then uses it to find the corresponding sentiment features in the context. Because an aspect term has a concrete position, and this position helps the model locate the salient sentiment features, we incorporate the aspect position into the sentiment attention.

Source task (AC-level): the aspect is coarse-grained and has no position information. Moreover, many samples share the same aspect category, so learning the aspect representation from the literal category words alone may not be enough; as observed in (Chen et al., 2017; Li and Lam, 2017), this is especially true in the hard cases where a sentence contains multiple aspects with different polarities. The underlying entities can also behave diversely in different contexts, e.g., "food seafood fish" can be instantiated as "salmon", "tuna", "taste", etc. We therefore propose a C2F attention that captures more specific semantics of the aspect category and its position information conditioned on its context (e.g., attending from "food seafood fish" into "The salmon is delicious but the waiter is very rude." before the PaS attention), so that the discrepancy between the two tasks becomes as small as possible.

A sketch of the PaS attention for the target task follows.
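Below is a hedged PyTorch sketch of the PaS attention, assuming the position relevance p_i is precomputed from the known aspect-term span on the target side (or from the C2F attention on the source side, as described next). Rescaling the softmax scores by the relevance and renormalizing is our simplification of the mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PaSAttention(nn.Module):
    """Position-aware sentiment attention (sketch): the aspect vector attends
    over the context memory, with scores reweighted by position relevance so
    that words near the aspect are up-weighted."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, context_mem, aspect_vec, pos_relevance):
        # context_mem:   (batch, n, hidden)
        # aspect_vec:    (batch, hidden) -- from C2A (and C2F on the source side)
        # pos_relevance: (batch, n)      -- e.g. 1 - |i - aspect_pos| / n
        a = aspect_vec.unsqueeze(1).expand(-1, context_mem.size(1), -1)
        z = self.score(torch.tanh(self.proj(
            torch.cat([context_mem, a], dim=-1)))).squeeze(-1)   # (batch, n)
        gamma = F.softmax(z, dim=-1) * pos_relevance             # position reweighting
        gamma = gamma / gamma.sum(dim=-1, keepdim=True)          # renormalize
        return torch.bmm(gamma.unsqueeze(1), context_mem).squeeze(1)
```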
Align aspect granularity - Coarse2Fine (C2F) Attention

The C2F attention layer consists of three parts.

(1) Learning the coarse-to-fine process via an auxiliary self-prediction task: in the manner of auto-encoders, reconstructing the input becomes predicting the input, so no additional labeling is needed. In this attention mechanism, the source aspect $a^s$ is regarded not only as a sequence of aspect words but also as a pseudo-label $y^c$ (the category of the aspect), where $c \in C$ and $C$ is the set of aspect categories:

$$z_i^f = \mathbf{u}_f^{T} \tanh\big(\mathbf{W}_f [\mathbf{h}_i; \mathbf{h}_s^a] + \mathbf{b}_f\big), \quad \beta_i^f = \frac{\exp(z_i^f)}{\sum_{i'=1}^{n} \exp(z_{i'}^f)}, \quad \mathbf{v}_a = \sum_{i=1}^{n} \beta_i^f \mathbf{h}_i.$$

Auxiliary loss:

$$\mathcal{L}_{aux} = -\frac{1}{n_s} \sum_{k=1}^{n_s} \sum_{c \in C} y_k^c \log(\hat{y}_k^c).$$

This C2F attention works like an auto-encoder: it is learned through a self-prediction auxiliary task that requires no extra annotation. The source aspect is treated both as a sequence of aspect words and as a pseudo-label. For "food seafood fish", the generated aspect representation attends over the context, and the attended feature is then used to predict the aspect's pseudo-label. If the context contains an aspect term closely related to "food seafood fish", such as "salmon", the aspect can be predicted well.

A hedged code sketch of this C2F step follows.
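A minimal PyTorch sketch implementing the formulas above; the pseudo-label classifier head (a single linear layer) is our assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class C2FAttention(nn.Module):
    """Coarse-to-Fine attention with the self-prediction auxiliary task:
    z_i = u^T tanh(W [h_i; h_a] + b), beta = softmax(z), v_a = sum_i beta_i h_i;
    v_a is then asked to predict the aspect-category pseudo-label."""

    def __init__(self, hidden_dim: int, num_categories: int):
        super().__init__()
        self.W = nn.Linear(2 * hidden_dim, hidden_dim)
        self.u = nn.Linear(hidden_dim, 1, bias=False)
        self.classifier = nn.Linear(hidden_dim, num_categories)  # assumed head

    def forward(self, context_mem, h_a, category_label=None):
        # context_mem: (batch, n, hidden), h_a: (batch, hidden) from C2A
        a = h_a.unsqueeze(1).expand(-1, context_mem.size(1), -1)
        z = self.u(torch.tanh(self.W(
            torch.cat([context_mem, a], dim=-1)))).squeeze(-1)
        beta = F.softmax(z, dim=-1)                     # C2F weights beta^f
        v_a = torch.bmm(beta.unsqueeze(1), context_mem).squeeze(1)
        aux_loss = None
        if category_label is not None:  # self-prediction: recover the pseudo-label
            aux_loss = F.cross_entropy(self.classifier(v_a), category_label)
        return v_a, beta, aux_loss
```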
Align aspect granularity - Coarse2Fine (C2F) Attention

(2) Dynamically incorporating the more specific information $\mathbf{v}_a$ into the aspect category. Basic idea: there may be no corresponding aspect term when the context expresses a sentiment toward the aspect category only implicitly. We use a fusion gate $\mathbf{F}$, similar to a highway connection (Jozefowicz et al. 2015):

$$\mathbf{F} = \mathrm{sigmoid}\big(\mathbf{W}[\mathbf{v}_a; \mathbf{h}_s^a] + \mathbf{b}\big), \quad \mathbf{r}_s^a = \mathbf{F} \odot \mathbf{h}_s^a + (\mathbf{1} - \mathbf{F}) \odot \mathbf{v}_a.$$

(3) Exploiting position information with the aid of the C2F attention $\beta_i^f$. Basic idea: up-weight the words close to the aspect and down-weight those far away (e.g., "great food but the service is dreadful."). The C2F attention $\boldsymbol{\beta}^f$ helps establish the position relevance with the aid of a location matrix $\mathbf{L}$: the $i$-th context word closer to a possible aspect term (a position with a large value in $\boldsymbol{\beta}^f$) receives a larger position relevance $p_i^s$:

$$L_{ii'} = 1 - \frac{|i - i'|}{n}, \quad i, i' \in [1, n], \qquad p_i^s = \mathbf{L}_i \boldsymbol{\beta}^f.$$

We adopt the gate to dynamically fuse the more specific aspect information, similar to a highway connection, because the context may contain no corresponding aspect term. Meanwhile, the C2F attention also provides aspect position information, which, through the transformations above, is fed into the subsequent sentiment attention layer.

A sketch of the fusion gate and position relevance follows.
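The fusion gate and the position-relevance computation translate almost directly into code. A minimal sketch, assuming a single batch of length-n contexts and a gate_linear module of our own naming:

```python
import torch

def fuse_and_position(h_a, v_a, beta, gate_linear):
    """Fusion gate: F = sigmoid(W[v_a; h_a] + b), r = F*h_a + (1-F)*v_a.
    Position relevance: p_i = L_i . beta, with L_{ii'} = 1 - |i - i'| / n."""
    # h_a, v_a: (batch, hidden); beta: (batch, n)
    # gate_linear: an nn.Linear(2 * hidden, hidden) owned by the caller
    F_gate = torch.sigmoid(gate_linear(torch.cat([v_a, h_a], dim=-1)))
    r_a = F_gate * h_a + (1.0 - F_gate) * v_a            # dynamic fusion
    n = beta.size(1)
    idx = torch.arange(n, dtype=beta.dtype, device=beta.device)
    L = 1.0 - (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() / n  # symmetric (n, n)
    p = beta @ L            # p[:, i] = sum_i' L[i, i'] * beta[:, i']
    return r_a, p           # fused aspect repr. and position relevance for PaS
```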
Align aspect-specific representation - Contrastive Feature Alignment (CFA)

Existing unsupervised domain adaptation methods may be impractical here: their effectiveness depends on enormous unlabeled target data, which is expensive to obtain since even unlabeled data requires annotating all aspect terms in the sentences.

Contrastive feature alignment (CFA):
- Semantic alignment: ensures that distributions from different domains but the same class are similar.
- Semantic separation: forces distributions from different domains and different classes to be far apart.

We resort to a point-wise formulation:

$$\mathcal{L}_{cfa} = \sum_{k, k'} \omega\big(g_s(\mathbf{x}_k^s, \mathbf{a}_k^s),\ g_t(\mathbf{x}_{k'}^t, \mathbf{a}_{k'}^t)\big),$$

where $g_s$ and $g_t$ are the source and target networks, and the contrastive function is

$$\omega(\mathbf{u}, \mathbf{v}) = \begin{cases} \|\mathbf{u} - \mathbf{v}\|^2 & \text{if } y_k^s = y_{k'}^t, \\ \max\big(0,\ D - \|\mathbf{u} - \mathbf{v}\|^2\big) & \text{if } y_k^s \neq y_{k'}^t, \end{cases}$$

with $D$ the degree of separation.

For aligning the aspect-specific features, existing unsupervised domain adaptation methods may not apply because they rely on large amounts of unlabeled target data, which is hard to obtain in our setting. We therefore use contrastive feature alignment to fully exploit the small amount of labeled target data: semantic alignment pulls same-class feature distributions across domains together, while semantic separation pushes different-class distributions apart. Given the scarcity of target data, we adopt the point-wise alignment loss above.

A sketch of the contrastive function follows.
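The point-wise contrastive function also translates directly; a minimal sketch, taking pre-paired source/target features and the margin D as written above:

```python
import torch

def contrastive_loss(u, v, same_class, D=1.0):
    """omega(u, v) = ||u - v||^2              if y_s == y_t  (semantic alignment)
                   = max(0, D - ||u - v||^2)  otherwise      (semantic separation)"""
    # u, v: (batch, feat) aspect-specific features from the source / target networks
    # same_class: (batch,) bool tensor; D: degree of separation (margin)
    dist2 = ((u - v) ** 2).sum(dim=-1)
    align = dist2                               # pull same-class pairs together
    separate = torch.clamp(D - dist2, min=0.0)  # push different-class pairs apart
    return torch.where(same_class, align, separate).mean()
```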
Alternating Training

Training objective: simultaneously align the aspect granularity and the aspect-specific representations.

For the source domain: $\mathcal{L}_{src} = \mathcal{L}_{sen}^s + \mathcal{L}_{aux} + \mathcal{L}_{cfa} + \mathcal{L}_{reg}^s$ (sentiment loss + auxiliary loss + transfer loss + regularization).

For the target domain: $\mathcal{L}_{tar} = \mathcal{L}_{sen}^t + \mathcal{L}_{cfa} + \mathcal{L}_{reg}^t$ (sentiment loss + transfer loss + regularization).

These are the training objectives of the two networks. The source network has one extra auxiliary loss to reduce the discrepancy between tasks, and the contrastive feature alignment serves as the transfer loss that reduces the discrepancy between the domain feature distributions.

A sketch of the alternating optimization follows.
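Finally, a speculative sketch of the alternating optimization, reusing contrastive_loss from the previous sketch. The network output signatures, the batch-pairing strategy, and folding the L_reg terms into the optimizers' weight decay are all our assumptions, not the paper's exact schedule:

```python
def train_epoch(src_loader, tgt_loader, src_net, tgt_net, opt_src, opt_tgt):
    """Alternate between a source step (sentiment + auxiliary + CFA) and a
    target step (sentiment + CFA); L_reg is handled via weight decay."""
    for (src_x, src_y), (tgt_x, tgt_y) in zip(src_loader, tgt_loader):
        # source step: L_src = L_sen^s + L_aux + L_cfa (+ L_reg^s via weight decay)
        sen_s, aux_s, feat_s = src_net(src_x, src_y)     # assumed output signature
        feat_t = tgt_net(tgt_x, tgt_y)[2].detach()       # freeze the other network
        loss_src = sen_s + aux_s + contrastive_loss(feat_s, feat_t, src_y == tgt_y)
        opt_src.zero_grad(); loss_src.backward(); opt_src.step()

        # target step: L_tar = L_sen^t + L_cfa (+ L_reg^t via weight decay)
        sen_t, _, feat_t = tgt_net(tgt_x, tgt_y)
        feat_s = src_net(src_x, src_y)[2].detach()
        loss_tar = sen_t + contrastive_loss(feat_s, feat_t, src_y == tgt_y)
        opt_tgt.zero_grad(); loss_tar.backward(); opt_tgt.step()
```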
Experiment Setup

Datasets:
- Source (AC-level): a new corpus, YelpAspect. Multi-domain (Restaurant, Hotel, Beautyspa), large-scale (100K samples per domain). The dataset is available on GitHub: https://github.com/hsqmlzno1/MGAN
- Target (AT-level): public benchmarks. Multi-domain: the SemEval 2014 ABSA challenge (Kiritchenko et al., 2014) with Laptop and Restaurant, plus a Twitter dataset collected by (Dong et al., 2014). Small-scale (1K-3K samples per domain).

Baselines:
- Non-transfer: AE-LSTM, ATAE-LSTM (Wang et al. 2016), TD-LSTM (Tang et al. 2015), IAN (Ma et al. 2017), MemNet (Tang, Qin, and Liu 2016), RAM (Chen et al. 2017).
- Transfer: SO (source only), FT (fine-tuning), M-DAN: multi-adversarial NN (Ganin et al. 2016), M-MMD: multi-MMD (Gretton et al. 2012).

For the experiments, we constructed YelpAspect, a large-scale, multi-domain, AC-level corpus, as the source domain for transfer; the cleaned dataset is released on GitHub. For the target domains, we use public benchmarks: the SemEval ABSA challenge datasets and a grammatically irregular Twitter dataset. We compare against existing state-of-the-art non-transfer models as well as a series of supervised domain adaptation methods.
Comparison with Non-Transfer Methods

Compared with non-transfer methods, our approach achieves the best results, from which we draw two conclusions.

Conclusion 1: Even with a simple model for the target task, our method achieves better performance than all existing non-transfer methods.

Conclusion 2: The C2F module can effectively reduce the aspect granularity gap between tasks, so that more useful knowledge can be distilled to facilitate the target task.
Comparison with Transfer Methods

Conclusion 3: When it is hard to obtain enormous unlabeled target data, CFA can effectively utilize the few labeled data by considering the inter- and intra-class relations between domains.
Visualization

We compare the model with and without the C2F module, visualizing the attention on examples that contain multiple sentiments. With the C2F module, the model can discover more specific aspect semantics in the source domain, so that the sentiment attention locates the sentiment features that truly describe the aspect. For "food food cheese", C2F first finds "ricotta cheese" and the sentiment attention then finds "real" and "not the fake junk"; for "restaurant cuisine", it first finds the specific aspect term "Italian place" and then the sentiment words "fake junk", rather than "not the fake junk". Benefiting from the knowledge learned in the source domain, the target domain can also handle such complex multi-sentiment cases well.
Visualization
Contributions

- To the best of our knowledge, this is the first transfer setting across both domain and aspect granularity proposed for aspect-level sentiment analysis.
- A new large-scale, multi-domain AC-level dataset is constructed as the source domain for transfer.
- A novel coarse2fine attention is proposed to effectively reduce the aspect granularity gap between tasks.
Future Work

- Transfer between different aspect categories across domains.
- Transfer to an AT-level task where the aspect terms are not given and must first be identified (joint aspect term extraction and aspect sentiment prediction).
Thank You! Questions?