
1 Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval
Xiaodong Liu, Jianfeng Gao, Xiaodong He, Li Deng, Kevin Duh and Ye-Yi Wang
2015/09/01 Ming-Han Yang

2 Outline
Abstract
Introduction
Multi-Task Representation Learning
Experiments
Related Work
Conclusion

3 Abstract
Methods of deep neural networks (DNNs) have recently demonstrated superior performance on a number of natural language processing tasks. However, in most previous work, the models are learned based on either:
unsupervised objectives, which do not directly optimize the desired task, or
single-task supervised objectives, which often suffer from insufficient training data.
We develop a multi-task DNN for learning representations across multiple tasks, not only leveraging large amounts of cross-task data, but also benefiting from a regularization effect that leads to more general representations to help tasks in new domains.
Our multi-task DNN approach combines tasks of multiple-domain classification (for query classification) and information retrieval (ranking for web search), and demonstrates significant gains over strong baselines in a comprehensive set of domain adaptation experiments.
Notes: DNN methods have recently achieved strong results in NLP, and these models can be trained with unsupervised or supervised learning. We propose a multi-task DNN to learn representations shared across different tasks. It combines query classification and information retrieval, and improves substantially over the baselines.

4 Introduction
Recent advances in deep neural networks (DNNs) have demonstrated the importance of learning vector-space representations of text, e.g., words and sentences, for a number of natural language processing tasks.
Our contributions are two-fold:
First, we propose a multi-task deep neural network for representation learning, in particular focusing on semantic classification (query classification) and semantic information retrieval (ranking for web search) tasks. Our model learns to map arbitrary text queries and documents into semantic vector representations in a low-dimensional latent space.
Second, we demonstrate strong results on query classification and web search. Our multi-task representation learning consistently outperforms state-of-the-art baselines. Meanwhile, we show that our model is not only compact but also enables agile deployment into new domains. This is because the learned representations allow domain adaptation with substantially fewer in-domain labels.
Notes: In NLP, there has been growing work in recent years on using DNNs to map words or sentences into vector-space representations. Our contributions are two: 1) we propose a multi-task DNN for representation learning, in particular for semantic classification (query classification) and semantic information retrieval (ranking for web search); our model maps queries and documents into a low-dimensional semantic vector space; 2) our approach outperforms the baselines, and the model is not only compact but can also be flexibly deployed to new domains.

5 Multi-Task Representation Learning (1/3)
Our multi-task model combines classification and ranking tasks. For concreteness, throughout this paper we will use query classification as the classification task and web search as the ranking task.
Query Classification: Given a search query Q, the model classifies in a binary fashion whether it belongs to one of the domains of interest.
Web Search: Given a search query Q and a document list L, the model ranks the documents in order of relevance.
Briefly, our proposed model maps any arbitrary query Q or document D into a fixed low-dimensional vector representation using DNNs. These vectors can then be used to perform query classification or web search. Our model learns these representations using multi-task objectives.
Notes: Our multi-task model combines a classification problem and a ranking problem; in this paper the two tasks are query classification and web search ranking. For query classification, given a search query Q, we want the classifier to decide which domain the query belongs to (a binary decision per domain). For example, for the query "Denver sushi" and four domains (Restaurant, Airport, Hotel, Nightclub), the classifier should assign the query to Restaurant; a query may belong to more than one domain. For web search, given a query Q and a document list L, the model ranks the documents by relevance. The proposed model uses DNNs to map any query or document into a low-dimensional vector, and learns these representations with multi-task objectives.

6 Multi-Task Representation Learning (2/3)
The architecture of our multi-task DNN model is shown in Figure 1. The lower layers are shared across different tasks, whereas the top layers represent task-specific outputs.
The word cat is hashed as the bag of letter trigrams { #-c-a, c-a-t, a-t-# }, where # is a boundary symbol.
Notes: The input X (a query or a document) is initially represented as a bag of words and is mapped to the 300-dimensional vector ℓ2, which is the semantic representation learned by our multi-task training. The roles of ℓ1 and ℓ2 are as follows:
Word Hashing Layer (ℓ1): Traditionally each word is given a one-hot representation, but with a large vocabulary the vectors become very long and learning the model is expensive. They therefore use word hashing, which maps the one-hot vector into a letter-trigram space (roughly 500K dimensions down to 50K); for example, cat is hashed into its letter trigrams. Word hashing compensates for two weaknesses of one-hot representations: 1) out-of-vocabulary words can still be expressed with letter trigrams, and 2) different spellings of the same word map to nearby points in the letter-trigram space.
Semantic-Representation Layer (ℓ2): This layer learns a representation shared across the different tasks; it maps the letter-trigram input into a 300-dimensional vector.
Figure 1
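To make the word-hashing step concrete, here is a minimal Python sketch of turning a word into its letter trigrams and a piece of text into a sparse ℓ1 vector; the function names and the trigram_index lookup are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of letter-trigram word hashing (illustrative, not the authors' implementation).
def letter_trigrams(word):
    """Return the bag of letter trigrams for a word, with # as the boundary symbol."""
    padded = "#" + word + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def hash_text(text, trigram_index):
    """Map a query or document into a sparse letter-trigram count vector (the ℓ1 input).

    trigram_index: dict from letter trigram to column id
    (on the order of 50K entries in the paper's setting).
    """
    counts = {}
    for word in text.lower().split():
        for tri in letter_trigrams(word):
            col = trigram_index.get(tri)
            if col is not None:
                counts[col] = counts.get(col, 0) + 1
    return counts

# Example: letter_trigrams("cat") -> ["#ca", "cat", "at#"]
```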

7 Multi-Task Representation Learning (3/3)
This is a shared representation learned across different tasks. This layer maps the letter-trigram inputs into a 300-dimensional vector by:
ℓ2 = f(W1 · ℓ1)
For each task, a nonlinear transformation maps the 300-dimensional semantic representation ℓ2 into the 128-dimensional task-specific representation by:
ℓ3 = f(W2ᵗ · ℓ2)
Query Classification Output: Suppose Q_C1 ≡ ℓ3 = f(W2^(t=C1) · ℓ2) is the 128-dimensional task-specific representation for a query Q. The probability that Q belongs to class C1 is predicted by a logistic regression.
Web Search Output: For the web search task, both the query Q and the document D are mapped into 128-dimensional task-specific representations. Then the relevance score is computed by cosine similarity.
Notes: W1 is a 50K × 300 matrix and f is the tanh function.
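As a concrete illustration of these equations, the following NumPy sketch wires the shared and task-specific layers together; the dimensions follow the slide (≈50K letter trigrams → 300 → 128), while the variable names, matrix orientations, and logistic-regression parameters are my own assumptions rather than the authors' code.

```python
# Sketch of the MT-DNN forward pass described above (shapes and names are assumptions).
import numpy as np

def forward_shared(l1, W1):
    """ℓ2 = f(W1 · ℓ1): shared 300-dim semantic representation (f = tanh)."""
    return np.tanh(W1 @ l1)                      # W1 maps the ~50K letter-trigram dims to 300

def forward_task(l2, W2_t):
    """ℓ3 = f(W2^t · ℓ2): 128-dim task-specific representation."""
    return np.tanh(W2_t @ l2)                    # W2_t maps 300 dims to 128

def query_classification_prob(l3_q, w_out, b_out):
    """P(C1 | Q): logistic regression over the query's task-specific vector."""
    return 1.0 / (1.0 + np.exp(-(w_out @ l3_q + b_out)))

def web_search_score(l3_q, l3_d):
    """Relevance score: cosine similarity between query and document vectors."""
    return float(l3_q @ l3_d) / (np.linalg.norm(l3_q) * np.linalg.norm(l3_d) + 1e-12)
```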

8 The Training Procedure
In order to learn the parameters of our model, we use mini-batch-based stochastic gradient descent (SGD). In each iteration, a task t is selected at random, and the model is updated according to the task-specific objective.
Additional training details:
Model parameters are initialized from a uniform distribution.
Momentum methods and AdaGrad speed up convergence but gave similar results to plain SGD.
The SGD learning rate is fixed at ε = 0.1/1024.
We run Algorithm 1 for 800K iterations, taking 13 hours on an NVidia K20 GPU.
Notes: Training details: 1) model parameters are initialized from a uniform distribution; 2) AdaGrad speeds up convergence and gives results similar to plain SGD; 3) we train on a GPU for 800K iterations, taking 13 hours.
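A minimal sketch of this alternating training loop (Algorithm 1 as summarized above) is shown below; the task interface (sample_minibatch, gradients) is hypothetical and only illustrates the random task selection and the shared-versus-task-specific SGD updates.

```python
# Sketch of multi-task SGD training with random task selection (task interface is hypothetical).
import random

def train_multitask(tasks, shared_params, task_params, iterations=800_000, lr=0.1 / 1024):
    """tasks: dict of name -> task object exposing sample_minibatch() and gradients(...)."""
    for _ in range(iterations):
        name = random.choice(list(tasks))                 # pick a task t at random
        batch = tasks[name].sample_minibatch()            # mini-batch for the selected task
        g_shared, g_task = tasks[name].gradients(batch, shared_params, task_params[name])
        for p, g in zip(shared_params, g_shared):
            p -= lr * g                                   # shared layers updated by every task
        for p, g in zip(task_params[name], g_task):
            p -= lr * g                                   # task-specific head updated only here
    return shared_params, task_params
```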

9 An Alternative View of the Multi-Task Model
Our proposed multi-task DNN (Figure 1) can be viewed as a combination of a standard DNN for classification and a Deep Structured Semantic Model (DSSM) for ranking.
Figure 3 shows an alternative multi-task architecture, where only the query part is shared among all tasks and the DSSM retains independent parameters for computing the document representations.
Notes: Our proposed MT-DNN (Figure 1) can also be viewed as a DNN for classification plus a DSSM for ranking, where only the query part is shared. We trained this alternative architecture with Algorithm 1 and found that query classification performed better but web search ranking performed worse. This may be caused by an unbalanced number of updates; for example, the query parameters are usually updated more often than the document parameters. This implies that deciding how much to share across tasks is an important design choice.
Figure 2; Figure 3

10 Experiments (1/5)
We employ large-scale, real data sets in our evaluation.
The test data for query classification were sampled from one-year log files of a commercial search engine with labels (yes or no) judged by humans.
The evaluation metric for query classification is the Area under the Receiver Operating Characteristic (ROC) curve (AUC) score (Bradley, 1997). For web search, we employ the Normalized Discounted Cumulative Gain (NDCG) (Jarvelin and Kekalainen, 2000).
Notes: In the test set, the web search data contains 12,071 English queries, and each query-document pair has a human-judged relevance label on a 5-level scale (bad, fair, good, excellent, and perfect). Evaluation: 1) query classification uses the area under the ROC curve (AUC); 2) web search uses NDCG.
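For reference, here is a minimal sketch of NDCG@k with graded relevance labels (bad..perfect mapped to 0..4); this is the common textbook formulation and is only assumed to match the exact variant used in the paper.

```python
# Sketch of NDCG@k for graded relevance labels (0 = bad ... 4 = perfect); generic formulation.
import math

def dcg(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels of a ranking."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg(ranked_relevances, k=10):
    """DCG of the produced ranking normalized by the DCG of the ideal ranking."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True), k)
    return dcg(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: documents ranked with labels perfect, good, bad -> ndcg([4, 2, 0], k=3)
```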

11 Experiments (2/5) - Query Classification
Table 2 summarizes the AUC scores for query classification, comparing the following classifiers:
SVM-Word: an SVM model with unigram, bigram and trigram surface-form word features.
SVM-Letter: an SVM model with letter-trigram features.
DNN: single-task deep neural net.
MT-DNN: our multi-task proposal.
The results show that the proposed MT-DNN performs best in all four domains.

12 Experiments (3/5) – Web Search
Table 3 summarizes the NDCG results on web search, comparing the following models:
Popular baselines in the web search literature, e.g. BM25, Language Model, PLSA.
DSSM: single-task ranking model.
MT-DNN: our multi-task proposal.
Again, we observe that MT-DNN performs best.

13 Experiments (4/5) – Model Compactness and Domain Adaptation
Comparison of features using SVM classifiers

14 Experiments (5/5) – Model Compactness and Domain Adaptation
Comparison of different DNNs
Notes: DNN1 = W1 initialized randomly; DNN2 = W1 trained jointly on the two tasks. SVM performs best when there are only a few thousand training examples, while DNN2 performs best in the middle range.

15 Conclusion
In this work, we propose a robust and practical representation learning algorithm based on multi-task objectives. Our multi-task DNN model successfully combines tasks as disparate as classification and ranking, and the experimental results demonstrate that the model consistently outperforms strong baselines in various query classification and web search tasks. Meanwhile, we demonstrated the compactness of the model and the utility of the learned query/document representations for domain adaptation.
Our model can be viewed as a general method for learning semantic representations beyond the word level. Beyond query classification and web search, we believe there are many other knowledge sources (e.g. sentiment, paraphrase) that can be incorporated either as classification or ranking tasks. A comprehensive exploration will be pursued as future work.
Notes: We propose a multi-task DNN for representation learning. The multi-task DNN successfully combines classification and ranking, and the experiments show it outperforms the baselines. We also showed that the model can learn representations suited to different domains and learns semantic representations beyond the word level. Besides query classification and web search, other knowledge sources such as sentiment or paraphrase could also be handled as classification or ranking tasks; we will pursue this direction in future work.
