Learn Question Focus and Dependency Relations from Web Search Results for Question Classification
Hello, professors. This is the title of the paper I am presenting today. ... The parentheses in the title are there because we recently submitted a paper to AIRS, and a reviewer raised objections to the model's name, so later I would like to ask you for some suggestions.
Wen-Hsiang Lu (盧文祥) whlu@mail.ncku.edu.tw
Web Mining and Multilingual Knowledge System Laboratory, Department of Computer Science and Information Engineering, National Cheng Kung University

Research Interests
Web Mining, Natural Language Processing, Information Retrieval

Research Issues
Unknown Term Translation & Cross-Language Information Retrieval
- A Multi-Stage Translation Extraction Method for Unknown Terms Using Web Search Results
Question Answering & Machine Translation
- Using Web Search Results to Learn Question Focus and Dependency Relations for Question Classification
- Using Phrase and Fluency to Improve Statistical Machine Translation
User Modeling & Web Search
- Learning Question Structure based on Website Link Structure to Improve Natural Language Search
- Improving Short-Query Web Search based on User Goal Identification
Cross-Language Medical Information Retrieval
- MMODE: http://mmode.no-ip.org/
Let's move straight to the introduction.

雅各氏症候群 (Jacobs syndrome)

Outline
Introduction, Related Work, Approach, Experiment, Conclusion, Future Work
This is today's outline.


Question Answering (QA) System
1. Question Analysis: question classification, keyword extraction.
2. Document Retrieval: retrieve related documents.
3. Answer Extraction: extract an exact answer.
When a question enters the system, question analysis first yields some information about it. The second step, document retrieval, then retrieves related documents from the collection. Finally, the third step, answer extraction, extracts the final answer from those documents. A toy sketch of this pipeline follows.
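To make the three stages concrete, here is a minimal, hypothetical sketch of such a pipeline. Every rule and helper is a toy stand-in of my own, not the system described in this talk.

```python
# Toy three-stage QA pipeline; all rules below are hypothetical stand-ins.

def classify_question(q):
    # Stage 1 (question analysis): toy classification by question word.
    q = q.lower()
    if "who" in q:
        return "Person"
    if "when" in q:
        return "Date"
    return "Unknown"

def retrieve_documents(keywords, docs):
    # Stage 2 (document retrieval): keep documents containing any keyword.
    return [d for d in docs if any(k in d for k in keywords)]

def extract_answer(docs, q_type):
    # Stage 3 (answer extraction): placeholder that returns the first hit.
    return docs[0] if docs else None

docs = ["Alexander Graham Bell patented the telephone in 1876."]
q = "Who invented the telephone?"
print(extract_answer(retrieve_documents(["telephone"], docs),
                     classify_question(q)))
```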

Motivation (1/3)
Importance of Question Classification: Moldovan published an error analysis of QA systems [Moldovan 2000].
Why did we choose this topic? In 2000, Moldovan presented a report analyzing QA systems; among the errors that lead to wrong final answers, question classification accounted for as much as 36.4%. So although question classification is only the first step of a QA system, it is very important.

Motivation (2/3)
Rule-based Question Classification: manual and unrealistic.
Machine Learning-based Question Classification, e.g., Support Vector Machines (SVM):
- needs a large amount of training data;
- too many features may introduce noise.
We also reviewed the methods past researchers proposed for question classification. The most traditional is rule-based classification: it requires manually analyzing a large number of questions and hand-compiling many rules, which is time- and labor-intensive and not very practical. In recent years the most common approach we have seen is machine learning, e.g., SVM. An SVM can indeed achieve good classification performance, but it relies on a large number of questions as training data; if the training data is insufficient, the results may suffer. Moreover, SVM-based question classification usually extracts as many features from the question as possible, and too many features can introduce noise.

Motivation (3/3)
A new method for question classification: observe which features of a question are useful, and solve the problem of insufficient training data.
So we want to observe which features in a question are actually useful, propose a classification technique different from previous ones, and also solve the problem of insufficient training data.

Idea of Approach (1/4)
Many questions have ambiguous question words, hence the importance of the Question Focus (QF): we use QF identification for question classification.
We divide questions into two kinds. One kind can be classified by rules alone; such questions contain question words that directly indicate the type. But many questions contain ambiguous question words, as in these two examples... So the QF plays a very important role in question classification, and we want to solve question classification through the QF.

Idea of Approach (2/4)
What if we do not have enough information to identify the type of the QF? Besides the QF, a question provides other usable features linking it to its type: the Dependency Verb (DV), Dependency Quantifier (DQ), and Dependency Noun (DN).
We may not have enough information to identify the QF's type on its own. In that case, we believe a question contains other useful features besides the QF that can assist classification, so we define these three classes of dependency words. (The original slide diagram labeled the dependency features, the question type, and the unigram vs. bigram semantic dependency relations.)

Idea of Approach (3/4)
Example. Take this example...

Idea of Approach (4/4)
Use the QF and the dependency features to classify questions.
Learn the QF and the other dependency features from the Web.
Propose a Semantic Dependency Relation Model (SDRM).
We want to solve question classification using the QF together with the three dependency features, and to train these features from abundant Web resources, thereby avoiding the shortage of training data that many previous methods face.

Outline
Introduction, Related Work, Approach, Experiment, Conclusion, Future Work

Rule-based Question Classification [Sutcliffe 2005][Kwok 2005][Riloff 2000]
5W (Who, When, Where, What, Why):
Who → Person. When → Time. Where → Location. What → difficult (ambiguous) type. Why → Reason.
Rule-based question classification is the most traditional method: it simply classifies with a few patterns, for example...

Machine Learning-based Question Classification
Several methods are based on SVMs [Zhang, 2003; Suzuki, 2003; Day, 2005].
Question → KDAG Kernel → Feature Vector → SVM → Question Type
Machine learning is currently the most widely used approach to question classification. Taking the method Jun Suzuki proposed in 2003 as an example, they used an off-the-shelf syntactic parser to extract the question's syntactic structure as features for the SVM, for example...

Web-based Question Classification
Use a Web search engine to identify the question type [Solorio, 2004]. "Who is the President of the French Republic?"
The Web-based question classification method was proposed by Thamar Solorio in 2004. They too ultimately use an SVM for classification, but they use the Web to extract the features fed to the SVM. Take this example...

Statistics-based Question Classification
A language model for question classification [Li, 2002]. Too many features may introduce noise.
Finally, this statistics-based approach is rather different from the previous methods: it uses a language model (LM) for classification, but it has a few drawbacks...

Outline
Introduction, Related Work, Approach, Experiment, Conclusion, Future Work

Architecture of Question Classification
This is the classification architecture we propose in this paper. There are two main stages: the first is ... and the second is ...

Question Type
Six types of questions: Person, Location, Organization, Number, Date, Artifact.

Basic Classification Rules
We define 17 basic rules for simple questions.
Stage one applies these basic classification rules. Since they are not the focus of this paper, we simply defined 17 rules, for example... (a toy illustration follows).
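The 17 rules themselves are not listed on the slide, so the following is only a hypothetical illustration of what such stage-one rules might look like: an unambiguous question word maps directly to a type, and everything else falls through to the second stage.

```python
# Hypothetical stage-1 rules; the paper's actual 17 rules are not shown here.
RULES = {
    "誰":   "Person",    # "who"
    "何時": "Date",      # "when"
    "何處": "Location",  # "where"
    "多少": "Number",    # "how many / how much"
}

def classify_by_rules(question):
    """Return a question type if an unambiguous question word matches,
    otherwise None (the question goes on to stage 2, the SDRM)."""
    for pattern, qtype in RULES.items():
        if pattern in question:
            return qtype
    return None

print(classify_by_rules("誰發明了電話?"))  # -> Person
```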

Learning Semantic Dependency Features (1/3)
Architecture for learning dependency features; algorithm for extracting dependency features.
Next, we describe how we learn the features we use. This part covers the learning architecture and a simple algorithm we propose for extracting the dependency features.

Learning Semantic Dependency Features (2/3)
Architecture for Learning Dependency Features
This is our learning architecture. Starting from a few seeds, we retrieve search results from the Web and then extract the features we need (QF, DV, DQ, DN) from those results. The dashed box marks the part we handle by bootstrapping. For Person, for instance, we start with only a few person names as seeds, extract many QFs from the Web, and then use those QFs to retrieve more person names from Google, so that we keep collecting more QFs, DVs, DQs, and DNs. A sketch of this loop follows.
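A minimal sketch of this bootstrapping loop, under stated assumptions: `search` stands in for a Web search API, and the two `extract_*` callables stand in for the extraction algorithm on the next slide; none of these are the authors' actual interfaces.

```python
# Hedged sketch of the bootstrapping loop, shown for the Person type.
# `search`, `extract_qfs`, and `extract_names` are hypothetical callables;
# the extract_* functions are assumed to return sets of strings.

def bootstrap(seed_names, search, extract_qfs, extract_names, rounds=3):
    names, qfs = set(seed_names), set()
    for _ in range(rounds):
        # Use the current names to collect search results and harvest QFs
        # (DV/DQ/DN would be harvested the same way; omitted for brevity).
        for name in names:
            for snippet in search(name):
                qfs |= extract_qfs(snippet, name)
        # Use the harvested QFs to find more names for the next round.
        for qf in qfs:
            for snippet in search(qf):
                names |= extract_names(snippet, qf)
    return names, qfs
```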

Learning Semantic Dependency Features (3/3)
Extracting Dependency Features Algorithm

Question Focus Identification Algorithm (1/2)
As mentioned earlier, we believe the QF plays a very important role in question classification. So here we propose an algorithm for identifying the QF, to extract the QF of a question as accurately as possible. (A hypothetical sketch follows.)
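The algorithm itself appeared as a figure on the slide, so the sketch below is only a hypothetical heuristic in the same spirit: locate an ambiguous question word and take the first noun attached to it as the QF. The question-word list and POS conventions are assumptions, not the paper's algorithm.

```python
# Hypothetical QF-identification heuristic (not the paper's algorithm).
QUESTION_WORDS = ("哪位", "哪個", "什麼", "何")  # assumed ambiguous markers

def identify_qf(segmented_question):
    """segmented_question: list of (word, POS) pairs from a segmenter."""
    for i, (word, _) in enumerate(segmented_question):
        if any(qw in word for qw in QUESTION_WORDS):
            # Return the first noun after the question word as the QF.
            for w, pos in segmented_question[i + 1:]:
                if pos.startswith("N"):
                    return w
    return None

# identify_qf([("哪位","Nh"), ("選手","Na"), ("奪得","VC"), ("金牌","Na")])
# -> "選手" (player)
```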

Question Focus Identification Algorithm (2/2)
Example

Semantic Dependency Relation Model (SDRM) (1/12)
Unigram-SDRM and Bigram-SDRM.
Next we introduce the model we propose: the SDRM.

Semantic Dependency Relation Model (SDRM) (2/12)
Unigram-SDRM: P(C|Q), where Q is the question and C the question type. P(C|Q) needs many questions to train.
We first want to infer the type from the question, so we write the probability as P(C|Q). But estimating it directly requires a large number of training questions, which we do not have. We therefore rewrite it with Bayes' rule; since we assume a uniform prior over types we drop P(C), and since P(Q) is the same for every type we drop it too, arriving at the form below.
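The slide's formula was an image; reconstructed from the notes, the transformation is the standard Bayes step, dropping the uniform prior $P(C)$ and the type-independent $P(Q)$:

$$
C^{*} \;=\; \arg\max_{C} P(C \mid Q)
      \;=\; \arg\max_{C} \frac{P(Q \mid C)\, P(C)}{P(Q)}
      \;=\; \arg\max_{C} P(Q \mid C).
$$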

Semantic Dependency Relation Model (SDRM) (3/12)
Unigram-SDRM: C → DC → Q (question type → Web search results → question).
P(DC|C): collect related search results for every type. P(Q|DC): use DC to determine the question type.
Because we learn the question features from Web resources, we introduce the Web search results DC here.

Semantic Dependency Relation Model (SDRM) (4/12)
Unigram-SDRM
Since DC is obtained from C, we assume DC subsumes the information in C, so we drop C. A reconstruction of this step is shown below.
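Again reconstructed from the notes (the slide formula was an image): introducing the search results $D_C$ and then dropping $C$ under the assumption that $D_C$ subsumes it,

$$
P(Q \mid C) \;=\; \sum_{D_C} P(D_C \mid C)\, P(Q \mid D_C, C)
            \;\approx\; \sum_{D_C} P(D_C \mid C)\, P(Q \mid D_C).
$$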

Semantic Dependency Relation Model (SDRM) (5/12)
Unigram-SDRM: Q = {QF, QD}, QD = {DV, DQ, DN}, where DV is the dependency verb, DQ the dependency quantifier, and DN the dependency noun.

Semantic Dependency Relation Model (SDRM) (6/12)
Unigram-SDRM: DV = {dv1, dv2, ⋯, dvi}, DQ = {dq1, dq2, ⋯, dqj}, DN = {dn1, dn2, ⋯, dnk}. The full decomposition is reconstructed below.
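Putting these two slides together, a plausible reconstruction of the Unigram-SDRM (the original formula was an image) is a naive-Bayes-style decomposition in which the QF and every dependency term are conditionally independent given $D_C$:

$$
P(Q \mid D_C) \;=\; P(\mathit{QF} \mid D_C)
  \prod_{i} P(dv_i \mid D_C)
  \prod_{j} P(dq_j \mid D_C)
  \prod_{k} P(dn_k \mid D_C).
$$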

Semantic Dependency Relation Model (SDRM) (7/12)
Parameter Estimation of Unigram-SDRM: P(DC|C); P(QF|DC), P(dv|DC), P(dq|DC), P(dn|DC).
N(QF): the number of occurrences of the QF. NQF(DC): the total number of QFs collected from the search results.
First, since in this paper we use only one resource, Web search results, as the training corpus, we assume P(DC|C) is the same for every type and drop it. The estimator is reconstructed below.
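With the counts defined on the slide, the estimate is presumably the relative frequency (and analogously for dv, dq, and dn):

$$
P(\mathit{QF} \mid D_C) \;=\; \frac{N(\mathit{QF})}{N_{\mathit{QF}}(D_C)}.
$$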

Semantic Dependency Relation Model (SDRM) (8/12)
Parameter Estimation of Unigram-SDRM

Semantic Dependency Relation Model (SDRM) (9/12)
Bigram-SDRM

Semantic Dependency Relation Model (SDRM) (10/12)
Bigram-SDRM

Semantic Dependency Relation Model (SDRM) (11/12)
Parameter Estimation of Bigram-SDRM: P(DC|C) and P(QF|DC) are the same as in Unigram-SDRM; P(dV|QF,DC), P(dQ|QF,DC), P(dN|QF,DC).
Nsentence(dv, QF): the number of sentences containing both dv and the QF. Nsentence(QF): the total number of sentences containing the QF. The estimator is reconstructed below.
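From the sentence counts defined on the slide, the bigram parameters are evidently estimated as conditional relative frequencies, e.g. for a dependency verb:

$$
P(dv \mid \mathit{QF}, D_C) \;=\;
\frac{N_{\mathrm{sentence}}(dv, \mathit{QF})}{N_{\mathrm{sentence}}(\mathit{QF})}.
$$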

Semantic Dependency Relation Model (SDRM) (12/12)
Parameter Estimation of Bigram-SDRM

Outline
Introduction, Related Work, Approach, Experiment, Conclusion, Future Work

Experiment
SDRM Performance Evaluation
- Unigram-SDRM vs. Bigram-SDRM
- Combination with different weights
SDRM vs. Language Model
- Use questions as training data
- Use the Web as training data
- Questions vs. Web

Experimental Data
Questions collected from NTCIR-5 CLQA; 4-fold cross-validation.

Unigram-SDRM vs. Bigram-SDRM (1/2)
Result

Unigram-SDRM vs. Bigram-SDRM (2/2)
Example. For the unigram model, "人" (person), "創下" (set a record), and "駕駛" (drive) are trained successfully. For the bigram model, "人_創下" is not trained successfully.

Combination with Different Weights (1/3)
Different weights for different features: α is the weight of QF, β the weight of dV, γ the weight of dQ, δ the weight of dN.

Combination with Different Weights (2/3)
Comparison of the 4 dependency features

Combination with Different Weights (3/3)
16 experiments. Best weighting: 0.23 QF, 0.29 DV, 0.48 DQ.
The weights are derived mathematically from each feature's individual performance. Example with QF and DV, where α is the weight of QF and β the weight of DV:
α = (1-0.77) / [(1-0.77)+(1-0.71)], β = (1-0.71) / [(1-0.77)+(1-0.71)].
The arithmetic is worked out below.
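Worked out, the two-feature example on the slide gives

$$
\alpha = \frac{1-0.77}{(1-0.77)+(1-0.71)} = \frac{0.23}{0.52} \approx 0.44,
\qquad
\beta  = \frac{1-0.71}{(1-0.77)+(1-0.71)} = \frac{0.29}{0.52} \approx 0.56.
$$

That is, each weight is the normalized complement of the feature's individual accuracy. The reported best three-feature weighting (0.23, 0.29, 0.48) would be consistent with the same scheme if the individual accuracies of QF, DV, and DQ were 0.77, 0.71, and 0.52, since those three complements sum to exactly 1; the slide does not state the third accuracy, so this is only an inference.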

Use Questions as Training Data (1/2)
Result

Use Questions as Training Data (2/2)
Example. For the LM, "網球選手" (tennis player) and "選手為" are not trained successfully. For the SDRM, "選手" (player) and "奪得" (win) are trained successfully.

Use Web Search Results as Training Data (1/2)

Use Web Search Results as Training Data (2/2)
Example. For the LM, "何國" (which country) is not trained successfully. For the SDRM, "國" (country) and "設於" (located in) are trained successfully.

Questions vs. Web (1/3)
Result. Trained question: the LM has been trained on the question's QF. Untrained question: the LM has not been trained on the question's QF.

Questions vs. Web (2/3)
Example of a trained question. For the LM, "何地" (where) is trained successfully. For the SDRM, "地" (place) and "舉行" (hold) are trained successfully, but these terms are also trained for other types.

Questions vs. Web (3/3)
Example of an untrained question. For the LM, "女星" (actress) and "獲得" (win) are not trained successfully. For the SDRM, "女星" and "獲得" are trained successfully.

Discussion and Conclusion
Discussion: We need to enhance our learning method and its performance, and we need a better smoothing method.
Conclusion: We propose a new model, the SDRM, which uses the question focus and dependency features for question classification, and we use Web search results as training data to solve the problem of insufficient training data.

Future Work
Enhance the performance of the learning method.
Consider the importance of each feature in the question.
The question focus and dependency features may also be useful in other processing steps of question answering systems.

Thank You