Some Effective Techniques for Naive Bayes Text Classification

1 Some Effective Techniques for Naive Bayes Text Classification
Advisor: Dr. Hsu
Presenter: Ai-Chen Liao
Authors: Sang-Bum Kim, Kyoung-Soo Han, Hae-Chang Rim, and Sung Hyon Myaeng
TKDE . Page(s) :

2 Outline
Motivation
Objective
About Naïve Bayes
Method
  A per-document length normalization approach
  Weight-enhancing method
Experimental Result
Conclusion
Personal Opinions

3 Motivation
While naïve Bayes is quite effective in various data mining tasks, it gives disappointing results on the automatic text classification problem. Based on observations of naïve Bayes applied to natural language text, we found a serious problem in the parameter estimation process, which causes poor results in the text classification domain.

4 Objective
We aim to propose methods that address these problems.

5 About Naive Bayes
Multivariate Bernoulli naïve Bayes
A document is treated as a binary feature vector that records whether each word is present or absent. It is not equipped to use term frequencies in documents.
Multinomial model
Two serious problems: (1) rough parameter estimation (2) handling rare categories
Naive Bayes is very efficient and easy to implement, and it can be compared with other learning algorithms. However, traditional naïve Bayes has not performed as well as other statistical methods such as SVM, boosting, or nearest-neighbor classifiers, so we hope to improve it.
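The difference between the two document representations can be illustrated with a small sketch. The vocabulary, the example tokens, and the function names below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter


def bernoulli_features(tokens, vocabulary):
    """Binary vector: 1 if the word occurs in the document, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def count_features(tokens, vocabulary):
    """Count vector: how many times each vocabulary word occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical toy vocabulary and document.
vocabulary = ["ball", "game", "vote"]
tokens = "game ball game game".split()

binary_vector = bernoulli_features(tokens, vocabulary)
count_vector = count_features(tokens, vocabulary)

print(binary_vector)  # presence/absence only
print(count_vector)   # term frequencies are kept
```

The binary vector cannot distinguish a word that occurs once from a word that occurs three times, which is the limitation of the Bernoulli representation noted on this slide.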

6 About Naive Bayes

7 Method ─ Multivariate Poisson Model for Text Classification
λ denotes the average number of times an event occurs within a given interval.
Assume that a document is generated by a multivariate Poisson model.
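As a rough sketch of what "generated by a multivariate Poisson model" can mean, the code below scores a document as a sum of independent per-term Poisson log-probabilities. The independence across terms, the λ values, and the class names are assumptions made for illustration; the exact formulation used by the authors is not shown on this slide.

```python
import math


def poisson_log_pmf(k, lam):
    """log P(X = k) for X ~ Poisson(lam), with lam > 0."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def document_log_likelihood(term_counts, lambdas):
    """Sum of independent per-term Poisson log-probabilities.

    Terms that do not occur in the document contribute with count 0.
    """
    return sum(
        poisson_log_pmf(term_counts.get(term, 0), lam)
        for term, lam in lambdas.items()
    )


# Hypothetical per-class Poisson means (average occurrences per document).
lambdas_sports = {"ball": 2.0, "vote": 0.1}
lambdas_politics = {"ball": 0.1, "vote": 2.0}

doc = {"ball": 3}
score_sports = document_log_likelihood(doc, lambdas_sports)
score_politics = document_log_likelihood(doc, lambdas_politics)

print(score_sports > score_politics)
```

Because `math.lgamma` accepts real arguments, the same function also accepts non-integer counts, which is convenient if term frequencies are rescaled before scoring.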

8 Method ─ A per-document length normalization approach
Assume that a document is generated by a multivariate Poisson model.
Normalize the term frequencies within each document according to that document's length.
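One simple way to normalize by document length is to rescale raw counts so that every document has the same nominal length. This is a minimal sketch, not the authors' formula: the function name and the `target_length` scaling constant are hypothetical.

```python
def normalize_term_frequencies(term_counts, target_length=100.0):
    """Rescale raw counts so they sum to target_length.

    target_length is a hypothetical scaling constant, not a value
    taken from the paper.
    """
    doc_length = sum(term_counts.values())
    if doc_length == 0:
        return {}
    return {
        term: target_length * count / doc_length
        for term, count in term_counts.items()
    }


# Two hypothetical documents with the same word proportions
# but very different lengths.
short_doc = {"ball": 2, "game": 2}
long_doc = {"ball": 20, "game": 20}

normalized_short = normalize_term_frequencies(short_doc)
normalized_long = normalize_term_frequencies(long_doc)

print(normalized_short)
print(normalized_long)
```

After normalization the two documents look identical, so a long document no longer carries more weight than a short one simply because it contains more tokens.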

9 Method ─ Feature Weighting Scheme

10 Experimental Results
DS1: Reuters21578 (consists of 21,578 news articles)
DS2: 20Newsgroups (consists of 19,997 Usenet articles collected from 20 different newsgroups)

11 Experimental Results

12 Experimental Results

13 Conclusion
We propose a Poisson naive Bayes text classification model with a weight-enhancing method. We suggest per-document term frequency normalization to estimate the Poisson parameter, whereas the traditional multinomial classifier estimates its parameters by treating all the training documents as a single huge training document.
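The contrast between pooling all training documents and normalizing per document can be seen in a toy example. This is only a sketch under stated assumptions: it uses unsmoothed relative frequencies, assumes every document is non-empty, and the documents and function names are hypothetical rather than the estimators defined in the paper.

```python
def pooled_estimate(docs, term):
    """Treat all documents as one large document:
    total count of the term divided by total number of tokens."""
    total_term = sum(d.get(term, 0) for d in docs)
    total_length = sum(sum(d.values()) for d in docs)
    return total_term / total_length


def per_document_estimate(docs, term):
    """Average of each document's own relative frequency of the term."""
    return sum(d.get(term, 0) / sum(d.values()) for d in docs) / len(docs)


# Hypothetical training documents of one class.
docs = [
    {"ball": 90, "game": 10},  # long document, 100 tokens
    {"ball": 1, "game": 9},    # short document, 10 tokens
]

pooled = pooled_estimate(docs, "ball")
per_doc = per_document_estimate(docs, "ball")

print(pooled, per_doc)
```

The pooled estimate is dominated by the long document, while the per-document estimate gives both documents an equal vote.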

14 Personal Opinions Advantage Drawback Application …
Text classification…