A Survey of Multitask Learning

A Survey of Multitask Learning 2015/09/22 Ming-Han Yang

Outline
An overview of multitask learning
The history of multitask learning
Speaker notes: Multitask learning carries the idea of collective intelligence, of tasks learning together.

What is Multitask learning?
Multitask learning (MTL) is a machine learning technique that aims at improving the generalization performance of a learning task by jointly learning multiple related tasks. The key to the successful application of MTL is that the tasks need to be related. Here, related does not mean the tasks are similar; instead, it means that at some level of abstraction these tasks share part of the representation. If the tasks are indeed similar, learning them together can help transfer knowledge among tasks, since it effectively increases the amount of training data for each task.
Speaker notes: As the Yu and Deng textbook puts it, MTL is a machine learning technique whose goal is to improve generalization performance by learning several related tasks together. The key to applying MTL successfully is that the jointly trained tasks must be related to one another; related does not mean the tasks look alike, but that at some level of abstraction they share part of the representation. If the tasks really are similar, learning them together helps them pass information to each other and effectively increases the training data available to each task.
D. Yu and L. Deng (2014). "Automatic speech recognition - a deep learning approach", Springer, 219-220.
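The shared-representation idea above can be made concrete with a hard-parameter-sharing network: one shared trunk feeds a separate output head per task, and the task losses are summed so every task's gradient shapes the shared layers. Below is a minimal PyTorch sketch; the layer sizes, task count, and random data are illustrative assumptions, not anything from the surveyed papers.

```python
import torch
import torch.nn as nn

class SharedMTLNet(nn.Module):
    """Hard parameter sharing: one shared trunk, one output head per task."""
    def __init__(self, in_dim=40, hidden=128, task_out_dims=(10, 10)):
        super().__init__()
        # Shared representation learned jointly by all tasks
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        # One small head per task on top of the shared features
        self.heads = nn.ModuleList([nn.Linear(hidden, d) for d in task_out_dims])

    def forward(self, x):
        h = self.trunk(x)                        # shared hidden representation
        return [head(h) for head in self.heads]  # one set of logits per task

# Joint training: the gradients of every task flow into the shared trunk.
model = SharedMTLNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 40)                               # toy batch of inputs
ys = [torch.randint(0, 10, (32,)) for _ in range(2)]  # one label set per task

logits = model(x)
loss = sum(loss_fn(lg, y) for lg, y in zip(logits, ys))
opt.zero_grad(); loss.backward(); opt.step()
```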

Speaker notes: The SDM 2012 tutorial slides point out that the difference between multi-task learning and single-task learning at training time is that, in the multi-task case, the tasks are trained together so as to capture the internal relationships among them.
Jiayu Zhou, Jianhui Chen and Jieping Ye, Multi-Task Learning: Theory, Algorithms, and Applications, SIAM International Conference on Data Mining, 2012

Learning Methods
Speaker notes: The same SDM 2012 tutorial notes that multi-task learning also subsumes multi-label learning, and multi-label learning in turn subsumes multi-class learning. Some people also regard multi-task learning as a subset of transfer learning, but that view comes from informal write-ups rather than published papers, so it carries less weight.
Jiayu Zhou, Jianhui Chen and Jieping Ye, Multi-Task Learning: Theory, Algorithms, and Applications, SIAM International Conference on Data Mining, 2012

How to do Multitask learning?
Multi-task learning is a technique wherein a primary learning task is solved jointly with additional related tasks using a shared input representation. If these secondary tasks are chosen well, the shared structure serves to improve the generalization of the model and its accuracy on an unseen test set. In multi-task learning, the key aspect is choosing appropriate secondary tasks for the network to learn. When choosing secondary tasks for multi-task learning, one should select a task that is related to the primary task but gives more information about the structure of the problem.
Speaker notes: An ICASSP 2013 paper from Microsoft describes multi-task learning as a technique in which related auxiliary tasks share a representation with the primary task. If the auxiliary tasks are chosen well, the shared structure improves the model's generalization and gives good accuracy on an unseen test set. The key question in MTL is how to choose appropriate auxiliary tasks: they should be related to the primary task while providing additional information about the structure of the problem. In this paper the primary task is phone recognition; three auxiliary tasks are evaluated: predicting the current phone label, predicting the preceding and following state labels, and predicting the preceding and following phone labels. Predicting the preceding and following phone labels works best.
M. L. Seltzer and J. Droppo (2013). Multi-task learning in deep neural networks for improved phoneme recognition, ICASSP.
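A common way to realize "primary task plus auxiliary tasks" is to down-weight the auxiliary losses so they shape the shared representation without dominating the primary objective. A hedged sketch reusing the SharedMTLNet example from the earlier slide; the 0.3 weight and the choice of auxiliary target are illustrative assumptions, not values from the Seltzer and Droppo paper.

```python
# Assume head 0 predicts the primary targets (e.g., the current phone label)
# and head 1 predicts an auxiliary target (e.g., a neighboring label).
AUX_WEIGHT = 0.3  # illustrative; in practice tuned on a development set

logits = model(x)
primary_loss = loss_fn(logits[0], ys[0])
aux_loss = loss_fn(logits[1], ys[1])
loss = primary_loss + AUX_WEIGHT * aux_loss   # the auxiliary task only assists

opt.zero_grad(); loss.backward(); opt.step()
# At test time only the primary head is used; the auxiliary head is discarded.
```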

The Beginning of Multitask learning
Multitask learning has many names and incarnations, including learning-to-learn, meta-learning, lifelong learning, and inductive transfer.
[1] J. Baxter. Learning internal representations. In Proceedings of the International ACM Workshop on Computational Learning Theory, 1995.
[2] S. Thrun and L.Y. Pratt. Learning to Learn. Kluwer Academic, 1997.
[3] R. Caruana. Multitask learning. Machine Learning, 28:41–75, 1997.
[4] S. Thrun. Is learning the n-th thing any easier than learning the first?, NIPS, 1995.
Early implementations of multitask learning primarily investigated neural network or nearest neighbor learners [1][3][4]. In addition to neural approaches, Bayesian methods have been explored that implement multitask learning by assuming dependencies between the various models and tasks [5][6].
[5] T. Heskes. Solving a huge number of similar tasks: A combination of multi-task learning and a hierarchical Bayesian approach. ICML, 1998.
[6] T. Heskes. Empirical Bayes for learning to learn. ICML, 2004.
Speaker notes: Papers with multi-task ideas appeared gradually from around 1993, and in 1997 two works consolidated the field; everyone cites these two 1997 publications (Learning to Learn and Multitask Learning). Research on multi-task learning therefore goes back roughly twenty years. Single-task learning ignores the connections between tasks, yet real-world learning tasks are usually closely intertwined, for example multi-label image classification or face recognition, which can be decomposed into several sub-tasks. The strength of multi-task learning is that it can uncover the relationships among these sub-tasks while still distinguishing the differences between them. Multitask learning has many names, such as learning to learn, meta-learning, lifelong learning, and inductive transfer. The earlier works [1][3][4] mostly used neural networks or nearest-neighbor learners; besides neural approaches, Bayesian methods have also been used, which assume dependencies between the various models and tasks.
T. Jebara (2011). Multitask Sparsity via Maximum Entropy Discrimination. Journal of Machine Learning Research, (12):75-110.

1997 Multitask learning (1)
Multitask Learning is an approach to inductive transfer that improves learning for one task by using the information contained in the training signals of other related tasks. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better. A task will be learned better if we can leverage the information contained in the training signals of other related tasks during learning.
Speaker notes: This is Caruana's PhD thesis and the earliest MTL paper; work on multi-task neural networks usually cites it. MTL is a form of inductive transfer that improves a task by exploiting the information contained in the training signals of other related tasks. It does so by learning several tasks in parallel with a shared representation, so that what is learned for each task can help the other tasks be learned better. A task is learned better if, during learning, we can leverage the information contained in the training signals of the related tasks.
R. Caruana (1997). Multitask learning. Machine Learning, 28(1), 41–75.

1997 Multitask learning (2)
This paper reviews prior work on MTL and presents new evidence that MTL in backprop nets discovers task relatedness without the need for supervisory signals. We present an algorithm and results for multitask learning with case-based methods like k-nearest neighbor and kernel regression, and sketch an algorithm for multitask learning in decision trees.
Speaker notes: The paper reviews earlier MTL research and shows that MTL in backprop networks can discover the relatedness between tasks without being told explicitly. The authors also apply MTL to case-based methods such as k-nearest neighbor and kernel regression, and sketch an algorithm for using MTL in decision trees.
Figure 2. Multitask Backpropagation (MTL) of four tasks with the same inputs.
R. Caruana (1997). Multitask learning. Machine Learning, 28(1), 41–75.

1997 Multitask learning (3): Learning Rate in Backprop MTL
Usually better performance is obtained in backprop MTL when all tasks learn at similar rates and reach their best performance at roughly the same time. If the main task trains long before the extra tasks, it cannot benefit from what has not yet been learned for the extra tasks. If the main task trains long after the extra tasks, it cannot shape what is learned for the extra tasks. Moreover, if the extra tasks begin to overtrain, they may cause the main task to overtrain too, because of the overlap in the hidden-layer representation.
Speaker notes: The key issue is the learning rate in backprop MTL. The best results are usually obtained when all tasks learn at roughly the same rate and reach their best performance at about the same time. If the main task finishes training long before the extra tasks, it cannot benefit from what the extra tasks have not yet learned; if it finishes long after them, it cannot shape what the extra tasks learn. Moreover, if the extra tasks start to overtrain, they can drag the main task into overtraining as well, because the tasks share the hidden-layer representation. A simple remedy is to first train all tasks with the same learning rate, check which tasks converge faster, lower their learning rates, and train again; repeating this a few times lets all tasks converge at roughly the same time.
R. Caruana (1997). Multitask learning. Machine Learning, 28(1), 41–75.
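The balancing heuristic described in the notes can be sketched as a per-task scale on each task's loss, which acts roughly like a per-task learning rate for that task's gradients: train with equal scales, then reduce the scale of whichever task converges first and train again. This continues the SharedMTLNet sketch from the earlier slide; the scale values and epoch counts are illustrative assumptions.

```python
# Roughly Caruana's heuristic: start with equal rates, then slow down the
# tasks that converge first so all tasks peak at about the same time.
task_scale = [1.0, 1.0]            # start with equal per-task scales

def run_epochs(task_scale, n_epochs=5):
    for _ in range(n_epochs):
        logits = model(x)
        loss = sum(s * loss_fn(lg, y)
                   for s, lg, y in zip(task_scale, logits, ys))
        opt.zero_grad(); loss.backward(); opt.step()

run_epochs(task_scale)
# Suppose validation curves show task 1 converging much earlier than task 0:
task_scale[1] *= 0.5               # slow the fast task down and retrain
run_epochs(task_scale)
```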

1997 Learning to Learn
Given a family of tasks, training experience for each of these tasks, and a family of performance measures (e.g., one for each task), an algorithm is said to learn to learn if its performance at each task improves with experience and with the number of tasks. Put differently, a learning algorithm whose performance does not depend on the number of learning tasks, and which hence would not benefit from the presence of other learning tasks, is not said to learn to learn. For an algorithm to fit this definition, some kind of transfer must occur between the multiple tasks, and that transfer must have a positive impact on expected task performance.
Speaker notes: This book defines learning to learn: given a family of related tasks, training experience for each task, and a way to measure performance, an algorithm learns to learn if each task's performance improves through the experience gained on the other tasks. Note that the performance gain is not simply proportional to the number of related tasks. If an algorithm fits this definition, some kind of transfer is happening between the tasks, and that transfer has a positive effect. Example: face recognition. Unless all faces look the same, a model that recognizes one person cannot be used directly to recognize another. In practice, however, we can assume that all face-recognition tasks share certain invariances, such as different expressions of the same person, different head poses, or different lighting directions. If this invariant information can be shared across learning tasks, recognition accuracy can be improved.
S. Thrun and L. Pratt (1997). Learning to Learn. Norwell, MA, USA: Kluwer.

2004 Regularized multi-task learning (1)
Past empirical work has shown that learning multiple related tasks from data simultaneously can be advantageous in terms of predictive performance, relative to learning these tasks independently. In this paper we present an approach to multi-task learning based on the minimization of regularization functionals similar to existing ones, such as the one for Support Vector Machines (SVMs), that have been used successfully in the past for single-task learning. Our approach allows us to model the relation between tasks in terms of a novel kernel function that uses a task-coupling parameter.
Speaker notes: Multi-task work based on SVMs usually cites this paper. Past experience shows that learning several related tasks from data simultaneously can beat learning them independently. The paper formulates MTL as the minimization of a regularization functional and, taking the SVM as the example, derives a multi-task SVM, connects it to the single-task SVM, and gives the detailed solution and the relationship between the two; the experiments also confirm the advantage of the multi-task SVM. The most important assumption is that the decision hyperplanes of all tasks share a common central hyperplane and are obtained by shifting it; the offsets together with the central hyperplane determine each task's hyperplane. The relation between tasks is modeled through a new kernel function with a task-coupling parameter.
T. Evgeniou and M. Pontil (2004). Regularized multi-task learning. In Proc. of the 10th SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining.
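The "central hyperplane plus task-specific offset" assumption described in the notes corresponds to an objective of roughly the following form (a reconstruction in standard SVM notation; the exact weighting of the terms may differ from the paper):

```latex
w_t = w_0 + v_t \quad (t = 1,\dots,T), \qquad
\min_{w_0,\;\{v_t\},\;\{\xi_{it}\}}\;
  \sum_{t=1}^{T}\sum_{i=1}^{m}\xi_{it}
  \;+\; \frac{\lambda_1}{T}\sum_{t=1}^{T}\lVert v_t\rVert^{2}
  \;+\; \lambda_2\lVert w_0\rVert^{2}
\quad\text{s.t.}\quad
  y_{it}\,(w_0 + v_t)\cdot x_{it} \ge 1-\xi_{it},\;\; \xi_{it}\ge 0 .
```

A large λ2 relative to λ1 pulls all task hyperplanes toward a single shared hyperplane, while a small λ2 lets each task behave almost like an independent SVM; this trade-off is what the task-coupling parameter in the kernel formulation controls.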

2004 Regularized multi-task learning (2)
When there are relations between the tasks to learn, it can be advantageous to learn all tasks simultaneously instead of following the more traditional approach of learning each task independently of the others. There has been a lot of experimental work showing the benefits of such multi-task learning relative to individual task learning when the tasks are related; see:
B. Bakker and T. Heskes. Task clustering and gating for Bayesian multi-task learning. JMLR, 4:83-99, 2003.
R. Caruana. Multi-Task Learning. Machine Learning, 28, pp. 41-75, 1997.
T. Heskes. Empirical Bayes for learning to learn. Proceedings of ICML-2000, ed. Langley, P., pp. 367-374, 2000.
S. Thrun and L. Pratt. Learning to Learn. Kluwer Academic Publishers, 1997.
In this paper we develop methods for multi-task learning that are natural extensions of existing kernel-based learning methods for single-task learning, such as Support Vector Machines (SVMs). To the best of our knowledge, this is the first generalization of regularization-based methods from single-task to multi-task learning.
T. Evgeniou and M. Pontil (2004). Regularized multi-task learning. In Proc. of the 10th SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining.

2004 Regularized multi-task learning (3)
A statistical learning theory based approach to multi-task learning has been developed in [1-3].
[1] J. Baxter. A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling. Machine Learning, 28, pp. 7-39, 1997.
[2] J. Baxter. A Model for Inductive Bias Learning. Journal of Artificial Intelligence Research, 12, pp. 149-198, 2000.
[3] S. Ben-David and R. Schuller. Exploiting Task Relatedness for Multiple Task Learning. COLT, 2003.
The problem of multi-task learning has also been studied in the statistics literature [4-5].
[4] L. Breiman and J.H. Friedman. Predicting Multivariate Responses in Multiple Linear Regression. Royal Statistical Society Series B, 1998.
[5] P.J. Brown and J.V. Zidek. Adaptive Multivariate Ridge Regression. The Annals of Statistics, Vol. 8, No. 1, pp. 64-74, 1980.
Finally, a number of approaches for learning multiple tasks or for learning to learn are Bayesian, where a probability model capturing the relations between the different tasks is estimated simultaneously with the model parameters for each of the individual tasks [6-9].
[6] G.M. Allenby and P.E. Rossi. Marketing Models of Consumer Heterogeneity. Journal of Econometrics, 89, pp. 57-78, 1999.
[7] N. Arora, G.M. Allenby, and J. Ginter. A Hierarchical Bayes Model of Primary and Secondary Demand. Marketing Science, 17(1), pp. 29-44, 1998.
[8] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multi-task learning. JMLR, 4:83-99, 2003.
[9] T. Heskes. Empirical Bayes for learning to learn. Proceedings of ICML-2000, ed. Langley, P., pp. 367-374, 2000.
Speaker notes: So what, essentially, is the VC dimension? It is a notion of degrees of freedom: how many features w we can include, how many hypotheses are in H, and ultimately the classification capacity d_VC. Roughly speaking, d_VC captures the classification capacity of the hypothesis set H, and it can also be understood in terms of the number of features or the number of hypotheses, since these quantities all grow together.
T. Evgeniou and M. Pontil (2004). Regularized multi-task learning. In Proc. of the 10th SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining.

2008 Convex multitask feature learning
We study the problem of learning data representations that are common across multiple related supervised learning tasks. This is a problem of interest in many research areas. In this paper, we present a novel method for learning sparse representations common across many supervised learning tasks. In particular, we develop a novel non-convex multi-task generalization of the 1-norm regularization, known to provide sparse variable selection in the single-task case. Our method learns a few features common across the tasks using a novel regularizer which both couples the tasks and enforces sparsity.
For example, in computer vision the problem of detecting a specific object in images is treated as a single supervised learning task. Images of different objects may share a number of features that are different from the pixel representation of images.
A. Argyriou, T. Evgeniou and M. Pontil. Convex multitask feature learning. Machine Learning, 73(3):243-272, 2008.
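A regularizer that "both couples the tasks and enforces sparsity" is typically a mixed (2,1)-type norm on the d-by-T matrix W whose columns are the task weight vectors: the inner 2-norm ties each feature's weights across tasks together, and the outer sum acts like a 1-norm over features, driving entire rows (features) to zero so that only a few features are used by all tasks. The general form is sketched below; the paper additionally learns a feature transformation before applying this penalty, which is omitted here.

```latex
\Omega(W) \;=\; \lVert W \rVert_{2,1}
          \;=\; \sum_{j=1}^{d} \Bigl(\sum_{t=1}^{T} W_{jt}^{2}\Bigr)^{1/2},
\qquad W \in \mathbb{R}^{d\times T}.
```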

2008 Clustered multi-task learning: A convex formulation
In multi-task learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multi-task learning.
L. Jacob, F. Bach, and J. Vert. Clustered multi-task learning: A convex formulation. NIPS, 2008.

2010 Multi-Task Learning for Boosting with Application to Web Search Ranking
Multi-task learning algorithms aim to improve the performance of several learning tasks through shared models. Previous work focused primarily on neural networks, k-nearest neighbors [2] and support vector machines [1]. In this paper, we introduce a novel multi-task learning algorithm for gradient boosting.
[1] T. Evgeniou and M. Pontil. Regularized multi-task learning. In KDD, pages 109-117, 2004.
[2] R. Caruana. Multitask learning. In Machine Learning, pages 41-75, 1997.
Figure 1: (Multitask ε-boosting) A layout of 4 ranking tasks that are learned jointly. The four countries symbolize the different ranking functions that need to be learned, where β1, ..., β4 are the parameter vectors that store the specifics of each individual task. The various tasks interact through the joint model, symbolized as a globe with parameter vector β0.
O. Chapelle, P. Shivaswamy and S. Vadrevu. Multi-Task Learning for Boosting with Application to Web Search Ranking. ACM, 2010.
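One simple way to mimic the "globe plus per-country models" layout of Figure 1 is to boost one shared model on the pooled data of all tasks and then boost a small task-specific model on each task's residuals. This is only a rough stand-in for the joint ε-boosting algorithm of the paper; the toy data, tree depths, and tree counts below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy ranking-style data for 4 "countries" (tasks) that share most of the signal
tasks = {}
for t in range(4):
    X = rng.normal(size=(200, 10))
    y = X[:, 0] + 0.2 * t * X[:, 1] + 0.1 * rng.normal(size=200)  # mostly shared signal
    tasks[t] = (X, y)

# Shared model (the "globe", beta_0): fit on the pooled data of all tasks
X_all = np.vstack([X for X, _ in tasks.values()])
y_all = np.concatenate([y for _, y in tasks.values()])
shared = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X_all, y_all)

# Task-specific models (beta_t): fit on each task's residuals from the shared model
specific = {}
for t, (X, y) in tasks.items():
    residual = y - shared.predict(X)
    specific[t] = GradientBoostingRegressor(n_estimators=30, max_depth=2).fit(X, residual)

def predict(t, X):
    # Final score for task t = shared prediction plus the task-specific correction
    return shared.predict(X) + specific[t].predict(X)
```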

2011 Multitask sparsity via maximum entropy discrimination
A multitask learning framework is developed for discriminative classification and regression, where multiple large-margin linear classifiers are estimated for different prediction problems. Most machine learning approaches take a single-task perspective, where one large homogeneous repository of uniformly collected iid (independent and identically distributed) samples is given and labeled consistently. A more realistic, multitask learning approach is to combine data from multiple smaller sources and synergistically leverage heterogeneous labeling or annotation efforts.
Settings covered: feature selection, kernel selection, adaptive pooling and graphical model structure.
Speaker notes: This 2011 JMLR paper develops an MTL framework for discriminative classification and regression. Most machine learning methods take a single-task view: the samples are assumed to be independent and identically distributed (iid) and consistently labeled. A more realistic, multi-task view combines data from several smaller sources and jointly leverages their heterogeneous labels. The article can be read as a fairly comprehensive survey; it covers four settings, namely feature selection, kernel selection, adaptive pooling, and graphical model structure, and presents a multi-task learning method for each. (Aside on iid: iid means the observations come from identical distributions and are mutually independent. Independence means that how the next observation is drawn does not change because of the observations already drawn. For example, in a draw where tickets are not returned to the box, whether earlier draws won changes the conditional probability for later draws; if tickets are put back, or everyone draws from a separate box, what earlier people drew does not affect later draws, and the results are independent.)
T. Jebara (2011). Multitask Sparsity via Maximum Entropy Discrimination. Journal of Machine Learning Research, (12):75-110.

2012 Learning task grouping and overlap in multi-task learning (1)
The key aspect in all multi-task learning methods is the introduction of an inductive bias in the joint hypothesis space of all tasks that reflects our prior beliefs about the task relatedness structure. Assumptions that task parameters lie close to each other in some geometric sense [1], share a common prior [2][3][4], or lie in a low-dimensional subspace [1] or on a manifold [5], are some examples of introducing an inductive bias in the hope of achieving better generalization.
[1] A. Argyriou, T. Evgeniou and M. Pontil. Convex multitask feature learning. Machine Learning, 73(3):243-272, 2008.
[2] K. Yu, V. Tresp, and A. Schwaighofer. Learning Gaussian Processes from Multiple Tasks. In ICML, 2005.
[3] S.I. Lee, V. Chatalbashev, D. Vickrey, and D. Koller. Learning a meta-level prior for feature relevance from multiple related tasks. In ICML, 2007.
[4] H. Daumé III. Bayesian Multitask Learning with Latent Hierarchies. In UAI, 2009.
[5] A. Agarwal, H. Daumé III, and S. Gerber. Learning Multiple Tasks using Manifold Regularization. In NIPS, 2010.
A major challenge in multi-task learning is how to selectively screen the sharing of information so that unrelated tasks do not end up influencing each other.
Speaker notes: The key in all MTL methods is a joint hypothesis space: related tasks effectively share a common basis, and different offsets adapt it to each task. Different works make different assumptions about the task parameters; for instance, [1] assumes the task parameters lie close to each other in a low-dimensional subspace, while [2][3][4] assume the parameters share a common prior. The biggest challenge is how to selectively screen the sharing of information so that unrelated tasks do not end up influencing each other.
A. Kumar and H. Daumé III (2012). Learning Task Grouping and Overlap in Multi-Task Learning. In the 29th International Conference on Machine Learning.

2012 Learning task grouping and overlap in multi-task learning (2)
Sharing information between two unrelated tasks can worsen the performance of both tasks. This phenomenon is also known as negative transfer. We propose a framework for multi-task learning that enables one to selectively share information across tasks. We assume that each task parameter vector is a linear combination of a finite number of underlying basis tasks. Our model is based on the assumption that task parameters within a group lie in a low-dimensional subspace, but allows the tasks in different groups to overlap with each other in one or more bases.
Speaker notes: Sharing information between two unrelated tasks can be worse than training them separately; this phenomenon is called negative transfer. The authors propose an MTL framework that lets tasks share information selectively: each task's parameter vector is assumed to be a linear combination of a finite number of underlying basis tasks. The model rests on the assumption that the task parameters within a group lie in a low-dimensional subspace, while tasks in different groups may overlap in one or more bases.
A. Kumar and H. Daumé III (2012). Learning Task Grouping and Overlap in Multi-Task Learning. In the 29th International Conference on Machine Learning.
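The "linear combination of a few basis tasks" assumption can be written as W = L S, where the columns of L are basis task parameter vectors and column t of the sparse matrix S holds the combination weights for task t; tasks that reuse the same columns of L form a group, and a column of L used by several groups is exactly the allowed overlap. A small numpy sketch of the composition follows; the dimensions and sparsity pattern are illustrative, and actually learning L and S requires the optimization described in the paper.

```python
import numpy as np

d, k, T = 20, 4, 6          # feature dim, number of basis tasks, number of tasks
rng = np.random.default_rng(1)

L = rng.normal(size=(d, k))           # basis (latent) task parameter vectors
S = np.zeros((k, T))                  # sparse combination weights, one column per task
S[:2, :3] = rng.normal(size=(2, 3))   # tasks 0-2 built from bases 0,1 -> one group
S[1:3, 3:] = rng.normal(size=(2, 3))  # tasks 3-5 built from bases 1,2 -> overlap on basis 1

W = L @ S                             # column t of W is task t's parameter vector
print(W.shape)                        # (20, 6): d-dimensional weights for 6 tasks
```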

2015 Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition
It is well known in machine learning that multitask learning (MTL) can help improve the generalization performance of individual learning tasks if the tasks being trained in parallel are related, especially when the amount of training data is relatively small. In this paper, we investigate the estimation of triphone acoustic models in parallel with the estimation of trigrapheme acoustic models under the MTL framework using a deep neural network (DNN).
Speaker notes: This paper does multitask learning with a DNN, splitting the work into a triphone task and a trigrapheme task.
D. Chen and C. Leung. Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition. ICASSP, 2015.

THANK YOU!