DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning. 周天烁, 2018.04.04
Outline: Introduction | Reinforcement Learning recap | Methodology: Modeling and Training | Experiment | Conclusion
Introduction: knowledge graph reasoning answers two kinds of queries over triples (h, r, t): fact prediction (h, ?, t), which infers the missing relation between a known entity pair, and link prediction (h, r, ?), which infers the missing tail entity.
Reinforcement Learning recap: a Markov Decision Process (MDP) is a tuple M = <S, A, T, R>, where S is the state space, A the action space, T the transition function, and R the reward function.
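A minimal sketch of the four-tuple as Python data, just to fix notation; the container and the toy chain are illustrative, not from DeepPath's code:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class MDP:
    states: Sequence        # S: state space
    actions: Sequence       # A: action space
    transition: Callable    # T(s, a) -> next state (deterministic here)
    reward: Callable        # R(s, a) -> float

# Toy two-state chain: action 0 stays put, action 1 moves right.
chain = MDP(
    states=[0, 1],
    actions=[0, 1],
    transition=lambda s, a: min(s + a, 1),
    reward=lambda s, a: 1.0 if s + a >= 1 else 0.0,
)
```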
Methodology —— Modeling: M = <?, ?, ?, ?>. How should each component of the MDP be instantiated for reasoning over a knowledge graph?
Methodology —— Modeling: M = <S, A, T, R>. States: the agent's position in the graph, encoded from pretrained entity embeddings as the current entity plus its offset to the target. Actions: choosing a relation to extend the reasoning path. Transition: deterministically moving along the chosen relation to the next entity. Reward: a combination of global accuracy (+1 for reaching the target, -1 otherwise), path efficiency (shorter paths are rewarded), and path diversity (paths similar to already-found ones are penalized).
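A sketch of the state encoding and the three reward terms described above; the plain-NumPy helpers and their names are mine, not the authors' released code (a path is represented here by an embedding, e.g. the sum of its relation embeddings):

```python
import numpy as np

def state_vector(e_current, e_target):
    # s_t = (e_t, e_target - e_t): the current entity's embedding plus
    # the translation still needed to reach the target entity.
    return np.concatenate([e_current, e_target - e_current])

def reward_terms(success, path_len, path_emb, found_path_embs):
    r_global = 1.0 if success else -1.0   # global accuracy
    r_efficiency = 1.0 / path_len         # shorter paths score higher
    r_diversity = 0.0
    if found_path_embs:                   # penalize similarity to earlier paths
        cos = [path_emb @ q / (np.linalg.norm(path_emb) * np.linalg.norm(q))
               for q in found_path_embs]
        r_diversity = -float(np.mean(cos))
    return r_global, r_efficiency, r_diversity
```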
Methodology —— Training. Target function: the expected total reward, J(θ) = E_{π_θ}[Σ_t R(s_t, a_t)], maximized with policy gradients (REINFORCE). Training proceeds in two phases: supervised policy learning, which pretrains the policy on successful paths found by breadth-first search, and retraining with rewards, which fine-tunes the policy using the reward signal above.
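A compact REINFORCE sketch for a linear softmax policy; supervised pretraining on BFS-found paths can reuse the same update with the return fixed to +1. This illustrates the policy-gradient step only and is not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_action = 8, 4
W = rng.normal(scale=0.1, size=(n_state, n_action))  # policy parameters

def policy(s):
    logits = s @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reinforce_update(episode, total_reward, lr=0.01):
    # Monte-Carlo policy gradient: step along grad log pi(a|s), scaled by the return.
    global W
    for s, a in episode:
        probs = policy(s)
        grad = np.outer(s, -probs)  # softmax part of d log pi(a|s) / dW
        grad[:, a] += s             # chosen-action term
        W += lr * total_reward * grad
```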
Experiment —— setup. Datasets: FB15K-237, NELL-995. Tasks: link prediction (h, r, ?) and fact prediction (h, ?, t). Metric: MAP (Mean Average Precision), ranging over [0, 1]. Worked example: suppose query 1 has 4 relevant documents and query 2 has 5. A system retrieves all four of query 1's relevant documents at ranks 1, 2, 4, 7, and three of query 2's at ranks 1, 3, 5. The average precision for query 1 is (1/1 + 2/2 + 3/4 + 4/7)/4 = 0.83; for query 2 it is (1/1 + 2/3 + 3/5 + 0 + 0)/5 = 0.45. MAP = (0.83 + 0.45)/2 = 0.64.
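The worked example above, reproduced in a few lines of Python (a generic MAP helper, not tied to the paper's evaluation scripts):

```python
def average_precision(ranks, n_relevant):
    # ranks: 1-based ranks of the retrieved relevant items; misses contribute 0.
    return sum((i + 1) / r for i, r in enumerate(sorted(ranks))) / n_relevant

queries = [([1, 2, 4, 7], 4), ([1, 3, 5], 5)]
aps = [average_precision(r, n) for r, n in queries]
print(sum(aps) / len(aps))  # ≈ 0.64 (MAP)
```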
Experiment —— result
Experiment —— example reasoning paths
Conclusion. Pros: a novel approach; the code is publicly available. Cons: the experiments are selective and do not cover the full datasets; the baselines are dated; training is time-consuming.
References
Wenhan Xiong, Thien Hoang, and William Yang Wang. 2017. DeepPath: A reinforcement learning method for knowledge graph reasoning. In EMNLP 2017.
Fan Yang, Zhilin Yang, and William W. Cohen. 2017. Differentiable learning of logical rules for knowledge base reasoning. In NIPS 2017.
http://www.cs.cmu.edu/~christos/courses/826.F11/FOILS-pdf/992_rwr.pdf
https://medium.com/machine-learning-for-humans/reinforcement-learning-6eacf258b265
Q & A