- Preparing data
- Vectorizing texts
- Training the model and predicting the test data
- Source code listing
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix
Preparing data
For this tutorial, I put together a small sentiment dataset. It contains made-up opinions, with positive opinions labeled '1' and negative opinions labeled '0'. A sample of the training data is shown below.
1,"I like it "
1,"like it a lot "
1,"It's really good "
1,"Recommend! I really enjoyed! "
1,"It's really good "
1,"recommend too "
1,"outstanding performance "
...
0,"it's mediocre! not recommend "
0,"Not good at all! "
0,"It is rude "
0,"I don't like this type "
0,"poor performance "
0,"Boring, not good at all! "
0,"not liked "
0,"I hate this type of things "
...
The full dataset is listed at the end of this post. Copy the text and save it as sentiments.csv in your target folder (the code below reads it from datasets/sentiments.csv).
Next, we'll load the sentiments.csv data and separate it into x and y parts.
df = pd.read_csv('datasets/sentiments.csv')
df.columns = ["label","text"]
x = df['text'].values
y = df['label'].values
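One detail worth noting: by default, read_csv treats the first line of a file as a header row. Since the listing in this post has no header line, the first record appears to be consumed as column names, which would explain why 48 of the 49 listed records show up in the shapes printed later (42 + 6). The sketch below illustrates the difference on a three-line in-memory sample. The sample string and the variable names are illustrative assumptions, not part of the original code.

```python
import io
import pandas as pd

# Three records in the same header-less format as sentiments.csv (illustrative sample)
sample = '1,"I like it "\n1,"like it a lot "\n0,"Not good at all! "\n'

# Default behaviour: the first line is consumed as the header row
df_default = pd.read_csv(io.StringIO(sample))
print(len(df_default))

# header=None keeps every line as data; names= supplies the column labels
df_all = pd.read_csv(io.StringIO(sample), header=None, names=["label", "text"])
print(len(df_all))
print(df_all["label"].tolist())
```

Switching the tutorial code to header=None would keep all 49 records, but the split and the results shown in this post would then no longer match exactly.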
To train the model and to predict new data, we'll split the data into train and test parts.
x_train, x_test, y_train, y_test = \
    train_test_split(x, y, test_size=0.12, random_state=121)
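To see what test_size=0.12 and random_state mean in practice, here is a minimal sketch on placeholder data. The 48 dummy strings are an assumption chosen to match the row count implied by the shapes printed in this post; they stand in for the real texts.

```python
from sklearn.model_selection import train_test_split

# 48 placeholder samples standing in for the loaded texts and labels
x = [f"text {i}" for i in range(48)]
y = [i % 2 for i in range(48)]

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.12, random_state=121)

# The test set size is rounded up: ceil(0.12 * 48) = 6, leaving 42 for training
print(len(x_train), len(x_test))

# Using the same random_state reproduces the same partition
_, x_test_again, _, _ = train_test_split(
    x, y, test_size=0.12, random_state=121)
print(x_test_again == x_test)
```

Fixing random_state makes the tutorial's results repeatable. Without it, each run would pick a different six test samples.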
Vectorizing texts
The CountVectorizer class converts text into vectors of token counts. We'll fit it on the training texts, then transform both the training and the test texts into count matrices.
vectorizer = CountVectorizer()
vectorizer.fit(x_train)
Xtrain = vectorizer.transform(x_train)
Xtest = vectorizer.transform(x_test)
print(Xtrain.shape)
(42, 67)
print(Xtest.shape)
(6, 67)
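Both matrices have 67 columns because the vocabulary is learned from the training texts only, and the test texts are mapped onto that same vocabulary. The following small sketch shows the idea on two made-up sentences (the sentences are illustrative, not taken from the split above). It relies on CountVectorizer's default settings, which lowercase the text and keep only tokens of two or more characters.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two illustrative training sentences
train_texts = ["I like it", "not good at all"]

vectorizer = CountVectorizer()
vectorizer.fit(train_texts)

# vocabulary_ maps each token to its column index; list the tokens in column order
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocab)  # ['all', 'at', 'good', 'it', 'like', 'not']

# One row per sentence, one column per vocabulary token
train_counts = vectorizer.transform(train_texts).toarray()
print(train_counts)

# "really" was never seen during fit, so it is ignored; only "good" is counted
new_counts = vectorizer.transform(["really good"]).toarray()
print(new_counts)  # [[0 0 1 0 0 0]]
```

Note that the single-character token "I" is dropped by the default tokenizer, and that words missing from the training vocabulary contribute nothing at prediction time.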
Training the model and predicting the test data
Next, we'll build the Gaussian Naive Bayes model and train it with the training data. GaussianNB expects dense input, so the sparse count matrix is converted with toarray() first.
model = GaussianNB().fit(Xtrain.toarray(), y_train)
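The toarray() call is needed because CountVectorizer returns a SciPy sparse matrix, while GaussianNB works on dense arrays. Here is a minimal, self-contained sketch of the same fit-and-predict step on four made-up phrases. The phrases and labels are assumptions for illustration only.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB

# Four illustrative phrases: 1 = positive, 0 = negative
texts = ["good show", "really good", "poor show", "really poor"]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X_sparse = vectorizer.fit_transform(texts)
print(issparse(X_sparse))  # the vectorizer output is a sparse matrix

# Convert to a dense NumPy array before fitting GaussianNB
X_dense = X_sparse.toarray()
model = GaussianNB().fit(X_dense, labels)

# New texts go through the same vectorizer, then the same dense conversion
X_new = vectorizer.transform(["good", "poor"]).toarray()
predictions = model.predict(X_new)
print(predictions)
```

For a vocabulary of a few dozen words the dense conversion is cheap. With large vocabularies it can use a lot of memory, which is worth keeping in mind when scaling this approach up.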
Finally, we'll predict the test data and check the accuracy.
ypred = model.predict(Xtest.toarray())
accuracy = accuracy_score(y_test, ypred)
cm = confusion_matrix(y_test, ypred)
print("Accuracy: ", accuracy)
Accuracy: 0.8333333333333334
print("Confusion matrix:")
print(cm)
Confusion matrix:
[[2 1]
 [0 3]]
result = zip(x_test, y_test, ypred)
for i in result:
    print(i)
("it's good! recommend! ", 1, 1)
('This is truly a good one! ', 1, 1)
('It is rude ', 0, 1)
('Nasty and horrible! ', 0, 0)
('waste of time, poor show ', 0, 0)
('exciting show ', 1, 1) 
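The accuracy and the confusion matrix can be cross-checked from the six (text, true label, predicted label) tuples printed above. In scikit-learn's confusion matrix, rows are true labels and columns are predicted labels, so the single off-diagonal entry corresponds to 'It is rude ' (true 0, predicted 1). The sketch below re-enters the labels by hand from that listing.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# True and predicted labels copied from the six printed tuples
y_test = [1, 1, 0, 0, 0, 1]
ypred = [1, 1, 1, 0, 0, 1]

cm = confusion_matrix(y_test, ypred)
print(cm)

# For binary labels 0/1, ravel() yields: true neg, false pos, false neg, true pos
tn, fp, fn, tp = (int(v) for v in cm.ravel())
print(tn, fp, fn, tp)  # 2 1 0 3

# Accuracy is the share of correct predictions: (2 + 3) / 6
manual_accuracy = (tn + tp) / (tn + fp + fn + tp)
print(manual_accuracy)
print(abs(manual_accuracy - accuracy_score(y_test, ypred)) < 1e-12)
```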
In this post, we've briefly covered sentiment classification in Python. The model reached 83 percent accuracy, but that figure comes from only six test samples (five of them classified correctly), so a larger dataset is needed both to improve the model and to measure its accuracy reliably.
The full source code is listed below.
Source code listing
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv('datasets/sentiments.csv')
df.columns = ["label","text"]
x = df['text'].values
y = df['label'].values

x_train, x_test, y_train, y_test = \
    train_test_split(x, y, test_size=0.12, random_state=121)

vectorizer = CountVectorizer()
vectorizer.fit(x_train)
Xtrain = vectorizer.transform(x_train)
Xtest = vectorizer.transform(x_test)
print(Xtrain.shape)
print(Xtest.shape)

model = GaussianNB().fit(Xtrain.toarray(), y_train)

ypred = model.predict(Xtest.toarray())
accuracy = accuracy_score(y_test, ypred)
cm = confusion_matrix(y_test, ypred)
print("Accuracy: ", accuracy)
print("Confusion matrix:")
print(cm)

result = zip(x_test, y_test, ypred)
for i in result:
    print(i)
sentiments.csv data
1,"I like it "
1,"like it a lot "
1,"It's really good "
1,"Recommend! I really enjoyed! "
1,"It's really good "
1,"recommend too "
1,"outstanding performance "
1,"it's good! recommend! "
1,"Great! "
1,"really good. Definitely, recommend! "
1,"It is fun "
1,"Exceptional! liked a lot! "
1,"highly recommend this "
1,"fantastic show "
1,"exciting, liked. "
1,"it's ok "
1,"exciting show "
1,"amazing performance "
1,"it is great! "
1,"I am excited a lot "
1,"it is terrific "
1,"Definitely good one "
1,"Excellent, very satisfied "
1,"Glad we went "
1,"Once again outstanding! "
1,"awesome! excellent show "
1,"This is truly a good one! "
0,"it's mediocre! not recommend "
0,"Not good at all! "
0,"It is rude "
0,"I don't like this type "
0,"poor performance "
0,"Boring, not good at all! "
0,"not liked "
0,"I hate this type of things "
0,"not recommend, not satisfied "
0,"not enjoyed, I don't recommend this. "
0,"disgusting movie "
0,"waste of time, poor show "
0,"feel tired after watching this "
0,"horrible performance "
0,"not so good "
0,"so boring I fell asleep "
0,"a bit strange "
0,"terrible! I did not expect. "
0,"This is an awful "
0,"Nasty and horrible! "
0,"Offensive, it is a crap! "
0,"Disappointing! not liked. "
 