Presentation transcript:

Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval
Xiaodong Liu, Jianfeng Gao, Xiaodong He, Li Deng, Kevin Duh and Ye-Yi Wang
Presented by Ming-Han Yang, 2015/09/01

Outline
- Abstract
- Introduction
- Multi-Task Representation Learning
- Experiments
- Related Work
- Conclusion

Abstract
Methods based on deep neural networks (DNNs) have recently demonstrated superior performance on a number of natural language processing tasks. However, in most previous work the models are learned with either:
- unsupervised objectives, which do not directly optimize the desired task, or
- single-task supervised objectives, which often suffer from insufficient training data.
We develop a multi-task DNN for learning representations across multiple tasks, not only leveraging large amounts of cross-task data, but also benefiting from a regularization effect that leads to more general representations that help tasks in new domains. Our multi-task DNN approach combines tasks of multiple-domain classification (for query classification) and information retrieval (ranking for web search), and demonstrates significant gains over strong baselines in a comprehensive set of domain adaptation experiments.
Speaker notes: DNN methods have recently achieved strong results in NLP, and these models can be trained with unsupervised or supervised learning. We propose a multi-task DNN that learns representations shared across different tasks. Our multi-task DNN combines the query classification and information retrieval tasks and improves substantially over the baselines.

Introduction
Recent advances in deep neural networks (DNNs) have demonstrated the importance of learning vector-space representations of text, e.g., words and sentences, for a number of natural language processing tasks.
Our contributions are two-fold:
- First, we propose a multi-task deep neural network for representation learning, in particular focusing on semantic classification (query classification) and semantic information retrieval (ranking for web search) tasks. Our model learns to map arbitrary text queries and documents into semantic vector representations in a low-dimensional latent space.
- Second, we demonstrate strong results on query classification and web search. Our multi-task representation learning consistently outperforms state-of-the-art baselines. Meanwhile, we show that our model is not only compact but also enables agile deployment into new domains, because the learned representations allow domain adaptation with substantially fewer in-domain labels.
Speaker notes: In NLP there is a growing body of work that uses DNNs to map words and sentences into vector-space representations. Our contributions are two-fold: (1) we propose a multi-task DNN for representation learning, in particular for semantic classification (query classification) and semantic information retrieval (ranking for web search); the model maps queries and documents into low-dimensional semantic vector representations; (2) our method outperforms the baselines, and the model is not only compact but can also be deployed flexibly to new domains.

Multi-Task Representation Learning (1/3)
Our multi-task model combines classification and ranking tasks. For concreteness, throughout this paper we use query classification as the classification task and web search as the ranking task.
- Query Classification: given a search query Q, the model decides, as a binary classification for each domain of interest, whether Q belongs to that domain.
- Web Search: given a search query Q and a document list L, the model ranks the documents in order of relevance.
Briefly, our proposed model maps arbitrary queries Q or documents D into fixed low-dimensional vector representations using DNNs. These vectors can then be used to perform query classification or web search. The model learns these representations using multi-task objectives.
Speaker notes: Our multi-task model combines a classification problem and a ranking problem; in this paper the two tasks are query classification and web search ranking. For query classification, given a search query Q, we want the classifier to decide (as a binary decision) which domain the query belongs to. For example, if the query is "Denver sushi" and there are four domains (Restaurant, Airport, Hotel, Nightlife), we want the classifier to assign it to Restaurant; a query may belong to more than one domain. For web search, given a query Q and a document list L, the model ranks the documents by relevance. The proposed model uses DNNs to map any query or document into a low-dimensional vector, and learns the representation with the multi-task approach.

Multi-Task Representation Learning (2/3)
The architecture of our multi-task DNN model is shown in Figure 1. The lower layers are shared across different tasks, whereas the top layers represent task-specific outputs.
The word "cat" is hashed as the bag of letter trigrams { #-c-a, c-a-t, a-t-# }, where # is a boundary symbol.
[Figure 1: the multi-task DNN architecture, with the Word Hashing layer (ℓ1) and the Semantic-Representation layer (ℓ2) shared across tasks, and task-specific output layers on top.]
Speaker notes: The input X (a query or a document) is initially represented as a bag of words and mapped to the ℓ2 vector (300 dimensions); ℓ2 is the semantic representation learned by multi-task training. The roles of ℓ1 and ℓ2 are as follows. Word Hashing layer (ℓ1): traditionally each word is represented as a one-hot vector, but when the vocabulary is large the vectors become very long and training the model is expensive, so word hashing converts the one-hot vector into the letter-trigram space (roughly 500K dimensions down to 50K); for example, "cat" is hashed into its letter trigrams. Word hashing compensates for one-hot representations in two ways: (1) out-of-vocabulary words can still be expressed by their letter trigrams, and (2) different spellings of the same word map to nearby points in letter-trigram space. Semantic-Representation layer (ℓ2): this layer learns a representation shared across tasks, mapping the letter-trigram input into a 300-dimensional vector.
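To make the "cat" example concrete, here is a minimal sketch of letter-trigram word hashing, assuming the trigram counts are later used as the sparse input vector ℓ1; the function names are illustrative, not the authors' code.

```python
from collections import Counter

def letter_trigrams(word):
    """Hash one word into its bag of letter trigrams, e.g. 'cat' -> {#ca, cat, at#}."""
    padded = "#" + word.lower() + "#"              # '#' marks the word boundaries
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def hash_text(text):
    """Represent a query or document as the summed letter-trigram counts of its words."""
    bag = Counter()
    for word in text.split():
        bag.update(letter_trigrams(word))
    return bag

print(letter_trigrams("cat"))       # Counter({'#ca': 1, 'cat': 1, 'at#': 1})
print(hash_text("denver sushi"))    # sparse bag of trigrams for a whole query
```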

Multi-Task Representation Learning (3/3)
Semantic-Representation layer (ℓ2): this is a shared representation learned across different tasks. This layer maps the letter-trigram inputs into a 300-dimensional vector by
$\ell_2 = f(\mathbf{W}_1 \cdot \ell_1)$
For each task t, a nonlinear transformation maps the 300-dimensional semantic representation ℓ2 into a 128-dimensional task-specific representation by
$\ell_3 = f(\mathbf{W}_2^{t} \cdot \ell_2)$
Query Classification Output: suppose $Q_{C_1} \equiv \ell_3 = f(\mathbf{W}_2^{t=C_1} \cdot \ell_2)$ is the 128-dimensional task-specific representation for a query Q. The probability that Q belongs to class $C_1$ is predicted by logistic regression.
Web Search Output: for the web search task, both the query Q and the document D are mapped into 128-dimensional task-specific representations. The relevance score is then computed by cosine similarity.
Speaker notes: W1 is a 50K x 300 matrix, and f is the tanh function.
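A minimal numpy sketch of the forward pass described above, assuming a 50K-dimensional letter-trigram input, a 300-dimensional shared layer, and 128-dimensional task-specific layers; the weight names (W1, W2_qc, W2_ws, w_qc) and the random initialization are illustrative, not the authors' released code.

```python
import numpy as np

rng = np.random.default_rng(0)
D_TRI, D_SEM, D_TASK = 50_000, 300, 128            # letter-trigram, shared, task-specific dims

W1    = rng.uniform(-0.1, 0.1, (D_SEM, D_TRI))     # shared: word hashing -> semantic layer
W2_qc = rng.uniform(-0.1, 0.1, (D_TASK, D_SEM))    # task-specific: query classification
W2_ws = rng.uniform(-0.1, 0.1, (D_TASK, D_SEM))    # task-specific: web search ranking
w_qc  = rng.uniform(-0.1, 0.1, D_TASK)             # logistic-regression weights for one domain

def semantic_rep(l1):                  # l2 = f(W1 . l1), with f = tanh
    return np.tanh(W1 @ l1)

def classify(query_l1):                # P(C1 | Q): logistic regression on the task-specific l3
    l3 = np.tanh(W2_qc @ semantic_rep(query_l1))
    return 1.0 / (1.0 + np.exp(-w_qc @ l3))

def relevance(query_l1, doc_l1):       # cosine similarity of query/document task-specific reps
    q = np.tanh(W2_ws @ semantic_rep(query_l1))
    d = np.tanh(W2_ws @ semantic_rep(doc_l1))
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))

# toy usage with hypothetical sparse letter-trigram vectors
q = np.zeros(D_TRI); q[[10, 42, 4242]] = 1.0
d = np.zeros(D_TRI); d[[42, 4242, 99]] = 1.0
print(classify(q), relevance(q, d))
```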

The Training Procedure
To learn the parameters of our model, we use mini-batch-based stochastic gradient descent (SGD). In each iteration, a task t is selected randomly, and the model is updated according to the task-specific objective.
Additional training details:
- Model parameters are initialized from a uniform distribution.
- Momentum methods and AdaGrad training speed up convergence but gave results similar to plain SGD.
- The SGD learning rate is fixed at ε = 0.1/1024.
- We run Algorithm 1 for 800K iterations, taking 13 hours on an NVidia K20 GPU.
Speaker notes: Training details: (1) model parameters are initialized from a uniform distribution; (2) AdaGrad speeds up convergence but gives results similar to SGD; (3) we train on a GPU for 800K iterations, taking 13 hours in total.
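The task-alternating update the slide describes (Algorithm 1 in the paper) can be sketched as below; each task supplies its own mini-batch sampler and gradient function, and those callables as well as the parameter layout are assumptions for illustration, not the paper's code.

```python
import random

LEARNING_RATE = 0.1 / 1024     # fixed SGD learning rate from the slide
NUM_ITERATIONS = 800_000

def train(tasks, params):
    """Task-alternating mini-batch SGD.

    tasks  : dict mapping task name -> (sample_minibatch_fn, loss_grad_fn)
    params : dict mapping parameter name -> numpy array (shared and task-specific weights)
    """
    for step in range(NUM_ITERATIONS):
        name = random.choice(list(tasks))          # 1. pick a task t at random
        sample_minibatch, loss_grad = tasks[name]
        batch = sample_minibatch()                 # 2. draw one mini-batch for task t
        grads = loss_grad(batch, params)           # 3. gradients of task t's objective
        for p, g in grads.items():                 # 4. SGD step on shared + task-specific weights
            params[p] -= LEARNING_RATE * g
    return params
```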

An Alternative View of the Multi-Task Model
Our proposed multi-task DNN (Figure 1) can be viewed as a combination of a standard DNN for classification and a Deep Structured Semantic Model (DSSM) for ranking (Figure 2).
Figure 3 shows an alternative multi-task architecture, in which only the query part is shared among all tasks and the DSSM retains independent parameters for computing the document representations.
Speaker notes: Our proposed MT-DNN (Figure 1) can also be viewed as a DNN for classification combined with a DSSM for ranking (Figure 2); in the alternative architecture (Figure 3) only the query part is shared. We tried training this alternative architecture with Algorithm 1 and found that query classification improved while web search ranking degraded. This is probably caused by an imbalance in the number of updates; for example, the query parameters are usually updated more often than the document parameters. This implies that deciding how much to share in a multi-task model is an important design choice.

Experiments (1/5)
We employ large-scale, real data sets in our evaluation.
The test data for query classification were sampled from one-year log files of a commercial search engine, with labels (yes or no) judged by humans.
The evaluation metric for query classification is the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) score (Bradley, 1997). For web search, we employ the Normalized Discounted Cumulative Gain (NDCG) (Järvelin and Kekäläinen, 2000).
Speaker notes: In the test set, the web search data contain 12,071 English queries, and each query-document pair is manually labeled with one of five relevance grades (bad, fair, good, excellent and perfect). Evaluation: (1) query classification is measured by the area under the ROC curve, i.e. the AUC score; (2) web search is measured by NDCG.
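For reference, a small sketch of how NDCG@k is commonly computed from graded relevance labels (mapping, say, bad=0 through perfect=4); this is the standard exponential-gain formulation, not code from the paper, and the grade mapping is an assumption.

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain over the top-k ranked items."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg(ranked_relevances, k=10):
    """NDCG@k: DCG of the produced ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True), k)
    return dcg(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# relevance grades of the documents in ranked order, e.g. bad=0, ..., perfect=4
print(ndcg([4, 2, 0, 3, 1], k=3))
```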

Experiments (2/5) – Query Classification
Table 2 summarizes the AUC scores for query classification, comparing the following classifiers:
- SVM-Word: an SVM model with unigram, bigram and trigram surface-form word features.
- SVM-Letter: an SVM model with letter-trigram features.
- DNN: a single-task deep neural network.
- MT-DNN: our multi-task proposal.
The results show that the proposed MT-DNN performs best in all four domains.

Experiments (3/5) – Web Search
Table 3 summarizes the NDCG results on web search, comparing the following models:
- Popular baselines from the web search literature, e.g. BM25, Language Model, PLSA.
- DSSM: a single-task ranking model.
- MT-DNN: our multi-task proposal.
Again, we observe that MT-DNN performs best.

Experiments (4/5) – Model Compactness and Domain Adaptation
Comparison of features using SVM classifiers.

Experiments (5/5) – Model Compactness and Domain Adaptation
Comparison of different DNNs.
Speaker notes: DNN1 uses a randomly initialized W1; DNN2 uses a W1 trained jointly on the two tasks. SVM performs best when there are a few thousand training examples, while DNN2 performs best in the intermediate range.

Conclusion
In this work, we propose a robust and practical representation learning algorithm based on multi-task objectives. Our multi-task DNN model successfully combines tasks as disparate as classification and ranking, and the experimental results demonstrate that the model consistently outperforms strong baselines on various query classification and web search tasks.
Meanwhile, we demonstrated the compactness of the model and the utility of the learned query/document representations for domain adaptation. Our model can be viewed as a general method for learning semantic representations beyond the word level.
Beyond query classification and web search, we believe there are many other knowledge sources (e.g. sentiment, paraphrase) that can be incorporated either as classification or ranking tasks. A comprehensive exploration will be pursued as future work.
Speaker notes: We propose a multi-task DNN method for representation learning. The multi-task DNN successfully combines classification and ranking, and the experiments show it outperforms the baselines. We also show that the model can learn representations that adapt to different domains and that it learns semantic representations beyond the word level. Besides query classification and web search, other knowledge sources such as sentiment or paraphrase could be incorporated as classification or ranking tasks; we will pursue this direction in future work.
