Learn Question Focus and Dependency Relations from Web Search Results for Question Classification
Hello, professors. This is the title of the paper I am presenting today. ... The parentheses in the title are there because we recently submitted a paper to AIRS, and a reviewer raised objections to the model's name, so later I would like to ask you for some suggestions.
Wen-Hsiang Lu (盧文祥) whlu@mail.ncku.edu.tw
Web Mining and Multilingual Knowledge System Laboratory, Department of Computer Science and Information Engineering, National Cheng Kung University

Research Interests
Web Mining, Natural Language Processing, Information Retrieval

Research Issues
Unknown Term Translation & Cross-Language Information Retrieval
- A Multi-Stage Translation Extraction Method for Unknown Terms Using Web Search Results
Question Answering & Machine Translation
- Using Web Search Results to Learn Question Focus and Dependency Relations for Question Classification
- Using Phrase and Fluency to Improve Statistical Machine Translation
User Modeling & Web Search
- Learning Question Structure based on Website Link Structure to Improve Natural Language Search
- Improving Short-Query Web Search based on User Goal Identification
Cross-Language Medical Information Retrieval
- MMODE: http://mmode.no-ip.org/
Let's move straight to the introduction.

雅各氏症候群 (Jacobs syndrome)

Outline
Introduction, Related Work, Approach, Experiment, Conclusion, Future Work
This is today's outline.


Question Answering (QA) System
1. Question Analysis: question classification, keyword extraction.
2. Document Retrieval: retrieve related documents.
3. Answer Extraction: extract an exact answer.
When a question enters the system, question analysis first yields some information about it. The second step, document retrieval, then retrieves related documents from the collection. Finally, the third step, answer extraction, extracts the final answer from those documents. A toy sketch of this pipeline follows.
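To make the three stages concrete, here is a minimal, hypothetical sketch of such a pipeline. Every rule and helper is a toy stand-in of my own, not the system described in this talk.

```python
# Toy three-stage QA pipeline; all rules below are hypothetical stand-ins.

def classify_question(q):
    # Stage 1 (question analysis): toy classification by question word.
    q = q.lower()
    if "who" in q:
        return "Person"
    if "when" in q:
        return "Date"
    return "Unknown"

def retrieve_documents(keywords, docs):
    # Stage 2 (document retrieval): keep documents containing any keyword.
    return [d for d in docs if any(k in d for k in keywords)]

def extract_answer(docs, q_type):
    # Stage 3 (answer extraction): placeholder that returns the first hit.
    return docs[0] if docs else None

docs = ["Alexander Graham Bell patented the telephone in 1876."]
q = "Who invented the telephone?"
print(extract_answer(retrieve_documents(["telephone"], docs),
                     classify_question(q)))
```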

Motivation (1/3)
Importance of Question Classification: Moldovan published an error analysis of QA systems [Moldovan 2000].
Why did we choose this topic? In 2000, Moldovan presented a report analyzing QA systems; among the errors that lead to wrong final answers, question classification accounted for as much as 36.4%. So although question classification is only the first step of a QA system, it is very important.

Motivation (2/3)
Rule-based Question Classification: manual and unrealistic.
Machine Learning-based Question Classification, e.g., Support Vector Machines (SVM):
- needs a large amount of training data;
- too many features may introduce noise.
We also reviewed the methods past researchers proposed for question classification. The most traditional is rule-based classification: it requires manually analyzing a large number of questions and hand-compiling many rules, which is time- and labor-intensive and not very practical. In recent years the most common approach we have seen is machine learning, e.g., SVM. An SVM can indeed achieve good classification performance, but it relies on a large number of questions as training data; if the training data is insufficient, the results may suffer. Moreover, SVM-based question classification usually extracts as many features from the question as possible, and too many features can introduce noise.

Motivation (3/3)
A new method for question classification: observe which features of a question are useful, and solve the problem of insufficient training data.
So we want to observe which features in a question are actually useful, propose a classification technique different from previous ones, and also solve the problem of insufficient training data.

Idea of Approach (1/4)
Many questions have ambiguous question words, hence the importance of the Question Focus (QF): we use QF identification for question classification.
We divide questions into two kinds. One kind can be classified by rules alone; such questions contain question words that directly indicate the type. But many questions contain ambiguous question words, as in these two examples... So the QF plays a very important role in question classification, and we want to solve question classification through the QF.

Idea of Approach (2/4)
What if we do not have enough information to identify the type of the QF? Besides the QF, a question provides other usable features linking it to its type: the Dependency Verb (DV), Dependency Quantifier (DQ), and Dependency Noun (DN).
We may not have enough information to identify the QF's type on its own. In that case, we believe a question contains other useful features besides the QF that can assist classification, so we define these three classes of dependency words. (The original slide diagram labeled the dependency features, the question type, and the unigram vs. bigram semantic dependency relations.)

Idea of Approach (3/4)
Example. Take this example...

Idea of Approach (4/4)
Use the QF and the dependency features to classify questions.
Learn the QF and the other dependency features from the Web.
Propose a Semantic Dependency Relation Model (SDRM).
We want to solve question classification using the QF together with the three dependency features, and to train these features from abundant Web resources, thereby avoiding the shortage of training data that many previous methods face.

Outline
Introduction, Related Work, Approach, Experiment, Conclusion, Future Work

Rule-based Question Classification [Sutcliffe 2005][Kwok 2005][Riloff 2000]
5W (Who, When, Where, What, Why):
Who → Person. When → Time. Where → Location. What → difficult (ambiguous) type. Why → Reason.
Rule-based question classification is the most traditional method: it simply classifies with a few patterns, for example...

Machine Learning-based Question Classification
Several methods are based on SVMs [Zhang, 2003; Suzuki, 2003; Day, 2005].
Question → KDAG Kernel → Feature Vector → SVM → Question Type
Machine learning is currently the most widely used approach to question classification. Taking the method Jun Suzuki proposed in 2003 as an example, they used an off-the-shelf syntactic parser to extract the question's syntactic structure as features for the SVM, for example...

Web-based Question Classification
Use a Web search engine to identify the question type [Solorio, 2004]. "Who is the President of the French Republic?"
The Web-based question classification method was proposed by Thamar Solorio in 2004. They too ultimately use an SVM for classification, but they use the Web to extract the features fed to the SVM. Take this example...

Statistics-based Question Classification
A language model for question classification [Li, 2002]. Too many features may introduce noise.
Finally, this statistics-based approach is rather different from the previous methods: it uses a language model (LM) for classification, but it has a few drawbacks...

Outline
Introduction, Related Work, Approach, Experiment, Conclusion, Future Work

Architecture of Question Classification
This is the classification architecture we propose in this paper. There are two main stages: the first is ... and the second is ...

Question Type
Six types of questions: Person, Location, Organization, Number, Date, Artifact.

Basic Classification Rules
We define 17 basic rules for simple questions.
Stage one applies these basic classification rules. Since they are not the focus of this paper, we simply defined 17 rules, for example... (a toy illustration follows).
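The 17 rules themselves are not listed on the slide, so the following is only a hypothetical illustration of what such stage-one rules might look like: an unambiguous question word maps directly to a type, and everything else falls through to the second stage.

```python
# Hypothetical stage-1 rules; the paper's actual 17 rules are not shown here.
RULES = {
    "誰":   "Person",    # "who"
    "何時": "Date",      # "when"
    "何處": "Location",  # "where"
    "多少": "Number",    # "how many / how much"
}

def classify_by_rules(question):
    """Return a question type if an unambiguous question word matches,
    otherwise None (the question goes on to stage 2, the SDRM)."""
    for pattern, qtype in RULES.items():
        if pattern in question:
            return qtype
    return None

print(classify_by_rules("誰發明了電話?"))  # -> Person
```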

Learning Semantic Dependency Features (1/3)
Architecture for learning dependency features; algorithm for extracting dependency features.
Next, we describe how we learn the features we use. This part covers the learning architecture and a simple algorithm we propose for extracting the dependency features.

Learning Semantic Dependency Features (2/3)
Architecture for Learning Dependency Features
This is our learning architecture. Starting from a few seeds, we retrieve search results from the Web and then extract the features we need (QF, DV, DQ, DN) from those results. The dashed box marks the part we handle by bootstrapping. For Person, for instance, we start with only a few person names as seeds, extract many QFs from the Web, and then use those QFs to retrieve more person names from Google, so that we keep collecting more QFs, DVs, DQs, and DNs. A sketch of this loop follows.
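A minimal sketch of this bootstrapping loop, under stated assumptions: `search` stands in for a Web search API, and the two `extract_*` callables stand in for the extraction algorithm on the next slide; none of these are the authors' actual interfaces.

```python
# Hedged sketch of the bootstrapping loop, shown for the Person type.
# `search`, `extract_qfs`, and `extract_names` are hypothetical callables;
# the extract_* functions are assumed to return sets of strings.

def bootstrap(seed_names, search, extract_qfs, extract_names, rounds=3):
    names, qfs = set(seed_names), set()
    for _ in range(rounds):
        # Use the current names to collect search results and harvest QFs
        # (DV/DQ/DN would be harvested the same way; omitted for brevity).
        for name in names:
            for snippet in search(name):
                qfs |= extract_qfs(snippet, name)
        # Use the harvested QFs to find more names for the next round.
        for qf in qfs:
            for snippet in search(qf):
                names |= extract_names(snippet, qf)
    return names, qfs
```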

Learning Semantic Dependency Features (3/3)
Extracting Dependency Features Algorithm

Question Focus Identification Algorithm (1/2)
As mentioned earlier, we believe the QF plays a very important role in question classification. So here we propose an algorithm for identifying the QF, to extract the QF of a question as accurately as possible. (A hypothetical sketch follows.)
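The algorithm itself appeared as a figure on the slide, so the sketch below is only a hypothetical heuristic in the same spirit: locate an ambiguous question word and take the first noun attached to it as the QF. The question-word list and POS conventions are assumptions, not the paper's algorithm.

```python
# Hypothetical QF-identification heuristic (not the paper's algorithm).
QUESTION_WORDS = ("哪位", "哪個", "什麼", "何")  # assumed ambiguous markers

def identify_qf(segmented_question):
    """segmented_question: list of (word, POS) pairs from a segmenter."""
    for i, (word, _) in enumerate(segmented_question):
        if any(qw in word for qw in QUESTION_WORDS):
            # Return the first noun after the question word as the QF.
            for w, pos in segmented_question[i + 1:]:
                if pos.startswith("N"):
                    return w
    return None

# identify_qf([("哪位","Nh"), ("選手","Na"), ("奪得","VC"), ("金牌","Na")])
# -> "選手" (player)
```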

Question Focus Identification Algorithm (2/2)
Example

Semantic Dependency Relation Model (SDRM) (1/12)
Unigram-SDRM and Bigram-SDRM.
Next we introduce the model we propose: the SDRM.

Semantic Dependency Relation Model (SDRM) (2/12)
Unigram-SDRM: P(C|Q), where Q is the question and C the question type. P(C|Q) needs many questions to train.
We first want to infer the type from the question, so we write the probability as P(C|Q). But estimating it directly requires a large number of training questions, which we do not have. We therefore rewrite it with Bayes' rule; since we assume a uniform prior over types we drop P(C), and since P(Q) is the same for every type we drop it too, arriving at the form below.
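The slide's formula was an image; reconstructed from the notes, the transformation is the standard Bayes step, dropping the uniform prior $P(C)$ and the type-independent $P(Q)$:

$$
C^{*} \;=\; \arg\max_{C} P(C \mid Q)
      \;=\; \arg\max_{C} \frac{P(Q \mid C)\, P(C)}{P(Q)}
      \;=\; \arg\max_{C} P(Q \mid C).
$$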

Semantic Dependency Relation Model (SDRM) (3/12)
Unigram-SDRM: C → DC → Q (question type → Web search results → question).
P(DC|C): collect related search results for every type. P(Q|DC): use DC to determine the question type.
Because we learn the question features from Web resources, we introduce the Web search results DC here.

Semantic Dependency Relation Model (SDRM) (4/12)
Unigram-SDRM
Since DC is obtained from C, we assume DC subsumes the information in C, so we drop C. A reconstruction of this step is shown below.
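Again reconstructed from the notes (the slide formula was an image): introducing the search results $D_C$ and then dropping $C$ under the assumption that $D_C$ subsumes it,

$$
P(Q \mid C) \;=\; \sum_{D_C} P(D_C \mid C)\, P(Q \mid D_C, C)
            \;\approx\; \sum_{D_C} P(D_C \mid C)\, P(Q \mid D_C).
$$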

Semantic Dependency Relation Model (SDRM) (5/12)
Unigram-SDRM: Q = {QF, QD}, QD = {DV, DQ, DN}, where DV is the dependency verb, DQ the dependency quantifier, and DN the dependency noun.

Semantic Dependency Relation Model (SDRM) (6/12)
Unigram-SDRM: DV = {dv1, dv2, ⋯, dvi}, DQ = {dq1, dq2, ⋯, dqj}, DN = {dn1, dn2, ⋯, dnk}. The full decomposition is reconstructed below.
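Putting these two slides together, a plausible reconstruction of the Unigram-SDRM (the original formula was an image) is a naive-Bayes-style decomposition in which the QF and every dependency term are conditionally independent given $D_C$:

$$
P(Q \mid D_C) \;=\; P(\mathit{QF} \mid D_C)
  \prod_{i} P(dv_i \mid D_C)
  \prod_{j} P(dq_j \mid D_C)
  \prod_{k} P(dn_k \mid D_C).
$$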

Semantic Dependency Relation Model (SDRM) (7/12)
Parameter Estimation of Unigram-SDRM: P(DC|C); P(QF|DC), P(dv|DC), P(dq|DC), P(dn|DC).
N(QF): the number of occurrences of the QF. NQF(DC): the total number of QFs collected from the search results.
First, since in this paper we use only one resource, Web search results, as the training corpus, we assume P(DC|C) is the same for every type and drop it. The estimator is reconstructed below.
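With the counts defined on the slide, the estimate is presumably the relative frequency (and analogously for dv, dq, and dn):

$$
P(\mathit{QF} \mid D_C) \;=\; \frac{N(\mathit{QF})}{N_{\mathit{QF}}(D_C)}.
$$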

Semantic Dependency Relation Model (SDRM) (8/12)
Parameter Estimation of Unigram-SDRM

Semantic Dependency Relation Model (SDRM) (9/12)
Bigram-SDRM

Semantic Dependency Relation Model (SDRM) (10/12)
Bigram-SDRM

Semantic Dependency Relation Model (SDRM) (11/12)
Parameter Estimation of Bigram-SDRM: P(DC|C) and P(QF|DC) are the same as in Unigram-SDRM; P(dV|QF,DC), P(dQ|QF,DC), P(dN|QF,DC).
Nsentence(dv, QF): the number of sentences containing both dv and the QF. Nsentence(QF): the total number of sentences containing the QF. The estimator is reconstructed below.
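From the sentence counts defined on the slide, the bigram parameters are evidently estimated as conditional relative frequencies, e.g. for a dependency verb:

$$
P(dv \mid \mathit{QF}, D_C) \;=\;
\frac{N_{\mathrm{sentence}}(dv, \mathit{QF})}{N_{\mathrm{sentence}}(\mathit{QF})}.
$$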

Semantic Dependency Relation Model (SDRM) (12/12)
Parameter Estimation of Bigram-SDRM

Outline
Introduction, Related Work, Approach, Experiment, Conclusion, Future Work

Experiment
SDRM Performance Evaluation
- Unigram-SDRM vs. Bigram-SDRM
- Combination with different weights
SDRM vs. Language Model
- Use questions as training data
- Use the Web as training data
- Questions vs. Web

Experimental Data
Questions collected from NTCIR-5 CLQA; 4-fold cross-validation.

Unigram-SDRM vs. Bigram-SDRM (1/2)
Result

Unigram-SDRM vs. Bigram-SDRM (2/2)
Example. For the unigram model, "人" (person), "創下" (set a record), and "駕駛" (drive) are trained successfully. For the bigram model, "人_創下" is not trained successfully.

Combination with Different Weights (1/3)
Different weights for different features: α is the weight of QF, β the weight of dV, γ the weight of dQ, δ the weight of dN.

Combination with Different Weights (2/3)
Comparison of the 4 dependency features

Combination with Different Weights (3/3)
16 experiments. Best weighting: 0.23 QF, 0.29 DV, 0.48 DQ.
The weights are derived mathematically from each feature's individual performance. Example with QF and DV, where α is the weight of QF and β the weight of DV:
α = (1-0.77) / [(1-0.77)+(1-0.71)], β = (1-0.71) / [(1-0.77)+(1-0.71)].
The arithmetic is worked out below.
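Worked out, the two-feature example on the slide gives

$$
\alpha = \frac{1-0.77}{(1-0.77)+(1-0.71)} = \frac{0.23}{0.52} \approx 0.44,
\qquad
\beta  = \frac{1-0.71}{(1-0.77)+(1-0.71)} = \frac{0.29}{0.52} \approx 0.56.
$$

That is, each weight is the normalized complement of the feature's individual accuracy. The reported best three-feature weighting (0.23, 0.29, 0.48) would be consistent with the same scheme if the individual accuracies of QF, DV, and DQ were 0.77, 0.71, and 0.52, since those three complements sum to exactly 1; the slide does not state the third accuracy, so this is only an inference.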

Use Questions as Training Data (1/2)
Result

Use Questions as Training Data (2/2)
Example. For the LM, "網球選手" (tennis player) and "選手為" are not trained successfully. For the SDRM, "選手" (player) and "奪得" (win) are trained successfully.

Use Web Search Results as Training Data (1/2)

Use Web Search Results as Training Data (2/2)
Example. For the LM, "何國" (which country) is not trained successfully. For the SDRM, "國" (country) and "設於" (located in) are trained successfully.

Questions vs. Web (1/3)
Result. Trained question: the LM has been trained on the question's QF. Untrained question: the LM has not been trained on the question's QF.

Questions vs. Web (2/3)
Example of a trained question. For the LM, "何地" (where) is trained successfully. For the SDRM, "地" (place) and "舉行" (hold) are trained successfully, but these terms are also trained for other types.

Questions vs. Web (3/3)
Example of an untrained question. For the LM, "女星" (actress) and "獲得" (win) are not trained successfully. For the SDRM, "女星" and "獲得" are trained successfully.

Discussion and Conclusion
Discussion: We need to enhance our learning method and its performance, and we need a better smoothing method.
Conclusion: We propose a new model, the SDRM, which uses the question focus and dependency features for question classification, and we use Web search results as training data to solve the problem of insufficient training data.

Future Work
Enhance the performance of the learning method.
Consider the importance of each feature in the question.
The question focus and dependency features may also be useful in other processing steps of question answering systems.

Thank You