A neural network based approach for sentiment classification in the blogosphere Long-Sheng Chen, Cheng-Hsiang Liu, Hui-Ju Chiu Journal of Informetrics 5 (2011) 313–322 Report: Yi-Hsiang Hsieh
Outline Introduction Methodology Experiments Conclusion
Introduction(1/3) Recognizing emotion is extremely important for a text-based communication tool such as a blog. On commercial blogs, the evaluation comments by bloggers of a product can spread at an explosive rate in cyberspace. Lately, researchers have been paying much attention to sentiment classification. Semantic orientation indexes and machine learning methods are usually employed to achieve this goal.
Introduction(2/3) To combine the advantages of these two methods, this study proposed a neural network (NN) based approach. The proposed NN based method combines the BPN and SO indexes to classify bloggers' sentiment. The NN based method can reduce training time when classifying textual data. According to the experimental results, the NN based method outperforms the traditional sentiment classification methods, namely the BPN and the SO indexes.
Introduction(3/3) Our method uses the results of the SO indexes as the inputs for the BPN. Several cases collected from real world blogs or databases are provided to demonstrate the effectiveness of our method. The experimental results indicate that our method can efficiently increase the performance of sentiment classification and save a substantial amount of training time compared with traditional IR and ML techniques, respectively.
Methodology(1/8) Back-propagation neural networks Step 1. For each training pattern (presented in random order): Step 1.1. Apply the inputs to the network. Step 1.2. Calculate the output for every neuron from the input layer, through the hidden layer(s), to the output layer. Step 1.3. Calculate the error at the outputs. Step 1.4. Use the output error to compute error signals for pre-output layers. Step 1.5. Use the error signals to compute weight adjustments. Step 1.6. Apply the weight adjustments. Step 2. Periodically evaluate the network performance.
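The training steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the class name `TinyBPN`, the single hidden layer, the sigmoid activation, the learning rate, and the toy OR data are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBPN:
    """One-hidden-layer back-propagation network (illustrative sketch)."""

    def __init__(self, n_in, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Each weight vector carries one extra entry for the bias term.
        self.w_hidden = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [self.rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        # Step 1.2: input layer -> hidden layer -> output layer.
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_epoch(self, patterns, lr=0.5):
        order = patterns[:]
        self.rng.shuffle(order)                  # Step 1: random order
        for x, target in order:
            hidden, out = self.forward(x)        # Steps 1.1 and 1.2
            err = target - out                   # Step 1.3: output error
            delta_out = err * out * (1.0 - out)
            # Step 1.4: error signals for the hidden (pre-output) layer.
            delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            # Steps 1.5 and 1.6: compute and apply the weight adjustments.
            for j, v in enumerate(hidden + [1.0]):
                self.w_out[j] += lr * delta_out * v
            for j, row in enumerate(self.w_hidden):
                for i, v in enumerate(x + [1.0]):
                    row[i] += lr * delta_hidden[j] * v

    def mse(self, patterns):
        return sum((t - self.forward(x)[1]) ** 2
                   for x, t in patterns) / len(patterns)


def demo():
    # Toy data (logical OR), chosen only to keep the example small.
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
    net = TinyBPN(n_in=2, n_hidden=3)
    before = net.mse(data)
    for _ in range(2000):
        net.train_epoch(data)
    after = net.mse(data)                        # Step 2: evaluate performance
    return before, after
```

Calling `demo()` returns the mean squared error before and after training, which is one simple way to carry out the periodic evaluation in Step 2.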
Methodology(2/8) Semantic orientation indexes The general SO index infers semantic orientation from semantic association (SO-A). Under the SO-A index defined in Eq. (1), a word is classified as having a positive (negative) semantic orientation when SO-A(word) is positive (negative). The magnitude (absolute value) of SO-A(word) can be considered the strength of the semantic orientation.
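The equation itself is not preserved in the slide text. As a hedged reconstruction, the usual form of the SO-A index in Turney and Littman (2003) is given below, and Eq. (1) is assumed to match it. Here Pwords and Nwords are the sets of positive and negative paradigm words, and A(·,·) is any measure of association between two words.

```latex
\[
\text{SO-A}(word) \;=\; \sum_{pword \in Pwords} A(word, pword)
\;-\; \sum_{nword \in Nwords} A(word, nword)
\]
```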
Methodology(3/8) The second index calculates the semantic orientation from the PMI and is called the SO-PMI index. It is extended from SO-A and is widely used in practice (Abbasi et al., 2008; Chaovalit & Zhou, 2005; Turney, 2002; Turney & Littman, 2003). Unlike the SO-A, the SO-PMI uses PMI-IR (pointwise mutual information and information retrieval) to estimate the semantic orientation of a phrase (Church & Hanks, 1989; Turney, 2002). The SO-PMI is calculated from the PMI between two words, word1 and word2.
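The two formulas are not preserved in the slide text. The standard definitions from Church and Hanks (1989) and Turney and Littman (2003) are sketched below on the assumption that the slide showed the same forms. Here p(·) denotes an occurrence probability, and SO-PMI is SO-A with PMI used as the association measure.

```latex
\[
\mathrm{PMI}(word_1, word_2) \;=\; \log_2
\frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}
\]
\[
\text{SO-PMI}(word) \;=\; \sum_{pword \in Pwords} \mathrm{PMI}(word, pword)
\;-\; \sum_{nword \in Nwords} \mathrm{PMI}(word, nword)
\]
```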
Methodology(4/8) Thus, using two different operators, we have two SO-PMI indexes in this study: SO-PMI(AND) and SO-PMI(NEAR). The last index is SO-LSA, which calculates the strength of the semantic association between words using latent semantic analysis (LSA).
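To make the difference between the two operators concrete, the following sketch estimates SO-PMI from co-occurrence counts in a tiny tokenized corpus. It is a sketch under stated assumptions rather than the procedure used in the paper: the function names, the toy corpus, the 10-word window for NEAR, and the 0.01 smoothing constant are all illustrative choices.

```python
import math


def cooccur(doc, w1, w2, operator, window=10):
    """True if w1 and w2 co-occur in one tokenized document."""
    if operator == "AND":
        # AND: both words appear anywhere in the same document.
        return w1 in doc and w2 in doc
    if operator == "NEAR":
        # NEAR: both words appear within `window` tokens of each other.
        pos1 = [i for i, tok in enumerate(doc) if tok == w1]
        pos2 = [i for i, tok in enumerate(doc) if tok == w2]
        return any(abs(i - j) <= window for i in pos1 for j in pos2)
    raise ValueError("unknown operator: " + operator)


def hits(corpus, w1, w2=None, operator="AND", window=10):
    """Number of documents matching one word, or a word pair."""
    if w2 is None:
        return sum(1 for doc in corpus if w1 in doc)
    return sum(1 for doc in corpus if cooccur(doc, w1, w2, operator, window))


def pmi(corpus, w1, w2, operator, window=10, smooth=0.01):
    # `smooth` avoids log(0) when a pair never co-occurs (assumed value).
    n = len(corpus)
    p12 = (hits(corpus, w1, w2, operator, window) + smooth) / n
    p1 = (hits(corpus, w1) + smooth) / n
    p2 = (hits(corpus, w2) + smooth) / n
    return math.log2(p12 / (p1 * p2))


def so_pmi(corpus, word, pwords, nwords, operator, window=10):
    """Sum of PMI with positive words minus sum of PMI with negative words."""
    pos = sum(pmi(corpus, word, p, operator, window) for p in pwords)
    neg = sum(pmi(corpus, word, n, operator, window) for n in nwords)
    return pos - neg


# Hypothetical four-document corpus, for illustration only.
CORPUS = [
    "the plot was superb and the acting excellent".split(),
    "superb film with excellent music".split(),
    "the ending was dull and poor".split(),
    "poor script and dull pacing".split(),
]
```

With this corpus, `so_pmi(CORPUS, "superb", ["excellent"], ["poor"], "AND")` is positive and the same call for `"dull"` is negative. Shrinking the NEAR window makes the pair count stricter than AND, which is why the two operators can give different index values for the same word.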
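Methodology(5/8) This section introduces the proposed NN based approach. As shown in Fig. 2, the implementation of our method can be divided into four steps, which are described as follows.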
Methodology(6/8) Step 1: prepare data
Methodology(7/8) Step 2: calculate the SO indexes In this study, we use four SO indexes, SO-A, SO-PMI(AND), SO-PMI(NEAR), and SO-LSA, as the input neurons of the BPN. Therefore, the second step of our method is to calculate these SO indexes. Step 3: train the neural network The experimental data set is divided into training and test sets. We systematically tried different proportions (50–90%) of all examples as the training data set. Then, we begin the training process of the BPN using the training data set.
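Steps 2 and 3 can be sketched as follows: each example is a vector of the four SO index values with a sentiment label, and the training proportion is varied from 50% to 90%. Several things here are assumptions made to keep the example short and runnable: the 10% step between proportions, the synthetic data, and the single sigmoid neuron that stands in for the full BPN.

```python
import math
import random

# Training proportions to try; the 10% step size is an assumption.
PROPORTIONS = [0.5, 0.6, 0.7, 0.8, 0.9]


def make_toy_examples(n=40, seed=1):
    """Synthetic ([SO-A, SO-PMI(AND), SO-PMI(NEAR), SO-LSA], label) pairs."""
    rng = random.Random(seed)
    examples = []
    for k in range(n):
        label = k % 2
        center = 1.0 if label else -1.0
        examples.append(([rng.gauss(center, 1.0) for _ in range(4)], label))
    return examples


def split(examples, train_fraction, seed=0):
    """Shuffle, then cut into a training set and a test set."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def train_neuron(train, epochs=200, lr=0.1):
    """Stand-in for the BPN: one sigmoid neuron trained by gradient descent."""
    weights = [0.0] * (len(train[0][0]) + 1)     # last entry is the bias
    for _ in range(epochs):
        for x, y in train:
            z = sum(w * v for w, v in zip(weights, x + [1.0]))
            out = 1.0 / (1.0 + math.exp(-z))
            for i, v in enumerate(x + [1.0]):
                weights[i] += lr * (y - out) * v
    return weights


def accuracy(weights, test):
    correct = 0
    for x, y in test:
        z = sum(w * v for w, v in zip(weights, x + [1.0]))
        correct += int((1 if z > 0 else 0) == y)
    return correct / len(test)


def best_split(examples):
    """Train once per proportion and return the best one with all scores."""
    results = {}
    for fraction in PROPORTIONS:
        train, test = split(examples, fraction)
        results[fraction] = accuracy(train_neuron(train), test)
    best = max(results, key=results.get)
    return best, results
```

`best_split(make_toy_examples())` returns the proportion with the highest test accuracy together with the accuracy for every proportion tried, mirroring the idea of trying several splits and keeping the best result.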
Methodology(8/8) Step 4: performance evaluation In this step, we use the test data to evaluate the performance of our NN based approach, the BPN, and the four SO indexes.
Experiments(1/8) Data preparation
Experiments(2/8) Performance evaluation Two performance evaluation metrics are used: overall accuracy (OA) and F1. In short, the common way of evaluating the performance of classifiers is based on the confusion matrix shown in Table 3.
Experiments(3/8) In general, the performance of a sentiment classifier is evaluated by the OA, the proportion of test cases that are classified correctly. Another popular index is F1, whose formula combines Precision and Recall. OA, F1, Precision, and Recall are all defined from the entries of the confusion matrix.
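The equations are not preserved in the slide text, so the sketch below uses the standard definitions of these metrics. The cell names `tp`, `fp`, `fn`, and `tn` (true positives, false positives, false negatives, true negatives) are assumed labels for the confusion matrix cells; Table 3 may name them differently.

```python
def overall_accuracy(tp, fp, fn, tn):
    """Share of all test cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of true positives that the classifier found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

For example, with 40 true positives, 10 false positives, 20 false negatives, and 30 true negatives, OA is 0.7, Precision is 0.8, Recall is about 0.667, and F1 is about 0.727.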
Experiments(4/8) Experimental results This section presents the results of the implementation. First, we compared the effectiveness of the four SO indexes: SO-A, SO-PMI(NEAR), SO-PMI(AND), and SO-LSA. Table 4 summarizes the results of these four indexes.
Experiments(5/8) However, the performance of SO-LSA is not good enough. Therefore, we next implemented the BPN and our method. To find the best performance of the BPN and our method, we systematically tried different proportions (50–90%) of all examples as the training data set, with the rest of the samples as the test set. After the experiments, we picked the best performance.
Experiments(6/8)
Experiments(7/8) From Fig. 3 and Table 5, we found that the proposed method, including quantitative and qualitative representation, has the best OAs in Movie-1, Movie-2, EC and Blog data sets. Compared with the original BPN, the NN based method can increase the classification performance by 4–6% in these 4 data sets.
Experiments(8/8) Table 6 summarizes the average processing time of the BPN and NN based methods.
Conclusion(1/2) This study proposed an NN based approach to classify sentiment in the blogosphere by combining the advantages of the BPN and SO indexes. Compared with traditional techniques such as the BPN and SO indexes, the proposed approach shows its superiority not only in classification accuracy but also in training time. To obtain better or more robust results, additional experiments using different ML approaches such as Support Vector Machines (SVM) and Naïve Bayes are necessary in future research.
Conclusion(2/2) It should also be noted that our proposed method is not specific to blogs; it can be employed to classify sentiment in any text-based communication tool. We used blogs only as an example in this study. Readers can apply the proposed method to any new media such as Twitter, Plurk, Facebook, and so on. However, to test the limitations of the proposed method, future work could use different data sets or data types.
Thanks for your attention