Some Effective Techniques for Naive Bayes Text Classification

Some Effective Techniques for Naive Bayes Text Classification
Advisor: Dr. Hsu
Presenter: Ai-Chen Liao
Authors: Sang-Bum Kim, Kyoung-Soo Han, Hae-Chang Rim, and Sung Hyon Myaeng
Source: TKDE, 2006, pp. 1457-1466

Outline
  Motivation
  Objective
  About Naive Bayes
  Method
    A per-document length normalization approach
    Weight-enhancing method
  Experimental Results
  Conclusion
  Personal Opinions

Motivation
While naive Bayes is quite effective in various data mining tasks, its results in automatic text classification are disappointing. Observing how naive Bayes behaves on natural language text, we found a serious problem in the parameter estimation process that causes poor results in the text classification domain.

Objective
We propose methods that address these problems.

About Naive Bayes
Multivariate Bernoulli naive Bayes: a document is represented as a binary feature vector indicating whether each word is present or absent. This model cannot make use of term frequencies in documents.
Multinomial model: it has two serious problems: (1) rough parameter estimation and (2) the handling of rare categories.
Naive Bayes is highly efficient and easy to implement, and it can be compared against other learning algorithms. Traditionally, however, naive Bayes has not performed as well as other statistical methods such as SVMs, boosting, or nearest-neighbor classifiers, so we hope to improve it.
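To make the contrast between the two event models concrete, here is a minimal Python sketch of both on a hypothetical toy corpus. TRAIN and all function names are illustrative, not from the paper, and the Laplace smoothing is an assumption. The Bernoulli model only sees whether a word is present, so repeating a word leaves its score unchanged; the multinomial model multiplies in each occurrence.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (tokens, label) pairs; not data from the paper.
TRAIN = [
    (["ball", "goal", "goal", "team"], "sports"),
    (["goal", "match", "team"], "sports"),
    (["vote", "election", "party"], "politics"),
    (["party", "vote", "vote", "policy"], "politics"),
]

def train_bernoulli(docs):
    """Multivariate Bernoulli NB: P(word present | class), Laplace-smoothed."""
    vocab = sorted({w for toks, _ in docs for w in toks})
    priors, probs = {}, {}
    for c in sorted({lab for _, lab in docs}):
        class_docs = [set(toks) for toks, lab in docs if lab == c]
        priors[c] = len(class_docs) / len(docs)
        probs[c] = {w: (sum(w in d for d in class_docs) + 1) / (len(class_docs) + 2)
                    for w in vocab}
    return vocab, priors, probs

def score_bernoulli(model, tokens):
    vocab, priors, probs = model
    present = set(tokens)  # term frequencies are discarded here
    scores = {}
    for c in priors:
        s = math.log(priors[c])
        for w in vocab:  # every vocabulary word contributes, present or absent
            p = probs[c][w]
            s += math.log(p if w in present else 1 - p)
        scores[c] = s
    return scores

def train_multinomial(docs):
    """Multinomial NB: P(word | class) from pooled term counts, Laplace-smoothed."""
    vocab = sorted({w for toks, _ in docs for w in toks})
    priors, probs = {}, {}
    for c in sorted({lab for _, lab in docs}):
        counts = Counter(w for toks, lab in docs if lab == c for w in toks)
        total = sum(counts.values())
        priors[c] = sum(lab == c for _, lab in docs) / len(docs)
        probs[c] = {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}
    return vocab, priors, probs

def score_multinomial(model, tokens):
    vocab, priors, probs = model
    scores = {}
    for c in priors:
        s = math.log(priors[c])
        for w, f in Counter(tokens).items():
            if w in probs[c]:  # out-of-vocabulary words are ignored
                s += f * math.log(probs[c][w])  # each occurrence counts
        scores[c] = s
    return scores
```

With these definitions, score_bernoulli gives identical scores for ["vote"] and ["vote", "vote", "vote"], while score_multinomial does not; both rank "sports" highest for ["goal", "team"].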

About Naive Bayes

Method: Multivariate Poisson Model for Text Classification
λ denotes the average number of times an event occurs within a given interval. A document is assumed to be generated by a multivariate Poisson model.
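For reference, the standard form of such a model can be written as below, assuming terms are conditionally independent given the class. The symbols are ours and not necessarily the paper's: f_i is the frequency of term i in document d, λ_ci is the Poisson rate of term i in class c, and V is the vocabulary.

```latex
P(f \mid \lambda) = \frac{e^{-\lambda}\,\lambda^{f}}{f!}

P(d \mid c) = \prod_{i=1}^{|V|} \frac{e^{-\lambda_{ci}}\,\lambda_{ci}^{f_i}}{f_i!}

\log P(c \mid d) = \log P(c) + \sum_{i=1}^{|V|} \left( f_i \log \lambda_{ci} - \lambda_{ci} \right) + \text{const}
```

The constant collects the -Σ log f_i! terms and -log P(d), none of which depends on c, so it can be dropped when choosing the most probable class. Note that terms absent from the document (f_i = 0) still contribute -λ_ci.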

Method: A per-document length normalization approach
A document is assumed to be generated by a multivariate Poisson model. The term frequencies in each document are normalized according to that document's length.
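A minimal sketch of Poisson naive Bayes with per-document length normalization follows. The slide does not spell out the exact normalization, so the details here are assumptions: each training document's counts are rescaled to the mean training-document length before averaging, a hypothetical floor alpha keeps every rate positive, and rates are rescaled to the test document's length at scoring time. The corpus and function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (tokens, label) pairs; not data from the paper.
TRAIN = [
    (["ball", "goal", "goal", "team"], "sports"),
    (["goal", "match", "team"], "sports"),
    (["vote", "election", "party"], "politics"),
    (["party", "vote", "vote", "policy"], "politics"),
]

def train_poisson_nb(docs, alpha=0.1):
    """Estimate class priors and per-term Poisson rates for each class.

    Assumptions: training documents are non-empty; each document's counts are
    rescaled to the mean training-document length before averaging; alpha is
    a hypothetical additive floor that keeps every rate strictly positive.
    """
    vocab = sorted({w for toks, _ in docs for w in toks})
    mean_len = sum(len(toks) for toks, _ in docs) / len(docs)
    model = {}
    for c in sorted({lab for _, lab in docs}):
        class_docs = [Counter(toks) for toks, lab in docs if lab == c]
        rates = {}
        for w in vocab:
            # Per-document length normalization, then average over the class.
            norm = [d[w] * mean_len / sum(d.values()) for d in class_docs]
            rates[w] = sum(norm) / len(norm) + alpha
        model[c] = (len(class_docs) / len(docs), rates)
    return vocab, mean_len, model

def score_poisson_nb(trained, tokens):
    """Return log P(c) + sum_i (f_i log lam_ci - lam_ci) per class, constants dropped."""
    vocab, mean_len, model = trained
    counts = Counter(tokens)
    scale = len(tokens) / mean_len  # rescale rates to this document's length
    scores = {}
    for c, (prior, rates) in model.items():
        s = math.log(prior)
        for w in vocab:  # absent words still contribute -lambda
            lam = rates[w] * scale
            if counts[w]:
                s += counts[w] * math.log(lam)
            s -= lam
        scores[c] = s
    return scores
```

Because every normalized training document contributes exactly mean_len tokens' worth of rate, a single long document cannot dominate its class's estimates. With this toy data, score_poisson_nb(train_poisson_nb(TRAIN), ["goal", "team"]) ranks "sports" highest.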

Method: Feature Weighting Scheme

Experimental Results
DS1: Reuters-21578 (21,578 news articles)
DS2: 20 Newsgroups (19,997 Usenet articles collected from 20 different newsgroups)

Experimental Results [results figure not recoverable from the transcript]

Experimental Results

Conclusion
We propose a Poisson naive Bayes text classification model with a weight-enhancing method. We suggest per-document term frequency normalization to estimate the Poisson parameter, whereas the traditional multinomial classifier estimates its parameters by treating all the training documents of a class as a single huge training document.
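The difference between the two estimation strategies in the conclusion shows up clearly on a hypothetical class containing one long document. The sketch below compares relative-frequency estimates rather than full Poisson rates (under a common target length, the per-document rate is simply this value times that length); the data and function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical class: one long document and two short ones (not paper data).
CLASS_DOCS = [
    ["stock"] * 18 + ["market", "rally"],  # long document, 20 tokens
    ["market", "rate"],
    ["market", "bank"],
]

def pooled_estimate(docs, word):
    """Multinomial-style: treat all class documents as one huge document."""
    pooled = Counter(w for d in docs for w in d)
    return pooled[word] / sum(pooled.values())

def per_document_estimate(docs, word):
    """Average each document's length-normalized frequency of the word."""
    return sum(Counter(d)[word] / len(d) for d in docs) / len(docs)
```

Here "stock" gets 0.75 from the pooled estimate because the 20-token document supplies most of the counts, but only about 0.3 per document; "market", which occurs in every document, gets 0.125 pooled and about 0.35 per document.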

Personal Opinions
  Advantage …
  Drawback …
  Application … Text classification …