DataTechNotes: Sentiment Classification Example with Keras in Python

Sentiment classification is the task of labeling text according to its tone, here either positive or negative. In this approach, we'll convert the text data into numeric vectors and train a model on them. We'll use the CountVectorizer class of the sklearn library to build the vectors. In this tutorial, we'll briefly learn how to classify sentiment data with a Keras sequential model. The tutorial covers:

Preparing the data
Vectorizing text
Building keras model
Predicting test data and the accuracy check
Source code listing

We'll start by loading the required libraries.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,confusion_matrix
from keras.models import Sequential
from keras import layers

Preparing the data

Here, I collected a simple sentiment dataset for this tutorial. The data contains imaginary opinions, with positive opinions labeled '1' and negative opinions labeled '0'. Below is a sample of the sentiment training data.

1,"I like it "
1,"like it a lot "
1,"It's really good "
1,"Recommend! I really enjoyed! "
1,"It's really good "
1,"recommend too "
1,"outstanding performance "
...
0,"it's mediocre! not recommend "
0,"Not good at all! "
0,"It is rude "
0,"I don't like this type "
0,"poor performance "
0,"Boring, not good at all! "
0,"not liked "
0,"I hate this type of things "
...

You can find the full list of the sentiment data at the end of this post. Copy the text and save it as sentiments.csv in your target folder (the code below reads it from datasets/sentiments.csv).

Next, we'll load the sentiments.csv data and separate it into x and y parts.

df = pd.read_csv('datasets/sentiments.csv')
df.columns = ["label","text"]
x = df['text'].values
y = df['label'].values
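One detail worth checking here: sentiments.csv as listed has no header row, and pd.read_csv treats the first line of a file as the header by default, so the first opinion would be consumed as column names before the columns are renamed. The sketch below uses a three-line in-memory sample (not the real file) to compare the default call with header=None. Whether to keep that first row is a choice for the reader; the outputs shown in this tutorial come from the code as written above.

```python
import io
import pandas as pd

# A three-line stand-in for sentiments.csv (hypothetical sample, no header row).
sample = '1,"I like it "\n1,"like it a lot "\n0,"Not good at all! "\n'

# Default behaviour: the first line is used as the header, so one row is lost.
df_default = pd.read_csv(io.StringIO(sample))
df_default.columns = ["label", "text"]

# header=None keeps every line as data; names= sets the column names directly.
df_all = pd.read_csv(io.StringIO(sample), header=None, names=["label", "text"])

print(len(df_default), len(df_all))
```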

To train the model and to predict new data, we'll split the data into train and test parts.

x_train, x_test, y_train, y_test = \
    train_test_split(x, y, test_size=0.12, random_state=123)
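To see how many rows land in each part, it helps to know that, as far as I know, scikit-learn rounds a fractional test_size up to a whole number of samples and gives the remainder to the training set. The sketch below reproduces that arithmetic only; split_sizes is a hypothetical helper, and the row count of 56 is an assumption chosen to match the 49 and 7 rows printed in the next section.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test part up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

# 56 rows is an assumed count, not read from the file.
print(split_sizes(56, 0.12))
```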

Vectorizing text

The CountVectorizer class builds word-count vectors from text data. We'll fit it on the training text only, then transform both the train and test text into matrices.

vectorizer = CountVectorizer()
vectorizer.fit(x_train)
Xtrain = vectorizer.transform(x_train)
Xtest = vectorizer.transform(x_test)
print(Xtrain.shape)

(49, 77)

print(Xtest.shape)

(7, 77)
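Each of the 77 columns corresponds to one word found in the training text, and each row holds the word counts for one opinion. The following pure-Python sketch illustrates the idea on two sentences. It mimics the CountVectorizer defaults as I understand them (lowercasing, tokens of two or more word characters, alphabetically ordered vocabulary) and is not the library's implementation; the function names are made up for illustration.

```python
import re
from collections import Counter

# Tokens of two or more word characters, similar to the vectorizer's default pattern.
TOKEN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    return TOKEN.findall(text.lower())

def fit_vocabulary(docs):
    """Map each distinct training word to a column index, in alphabetical order."""
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}

def transform(docs, vocab):
    """Count vocabulary words per document; words not in the vocabulary are ignored."""
    rows = []
    for d in docs:
        counts = Counter(tokenize(d))
        rows.append([counts.get(w, 0) for w in vocab])
    return rows

train = ["I like it ", "Not good at all! "]
vocab = fit_vocabulary(train)
print(list(vocab))
print(transform(["not like it a lot"], vocab))
```

Because the vocabulary comes from the training text alone, a test word such as "lot" that never appeared in training has no column and is dropped, which is why Xtrain and Xtest share the same width.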

Building keras model

Next, we'll build a Keras sequential model. It has a 32-unit dense layer with 'relu' activation that receives the vectorized input, followed by a single-unit output layer with 'sigmoid' activation.

model = Sequential()
model.add(layers.Dense(32, input_dim=Xtrain.shape[1], activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', 
              optimizer='adam', metrics=['accuracy'])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 32)                2496      
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 33        
=================================================================
Total params: 2,529
Trainable params: 2,529
Non-trainable params: 0
_________________________________________________________________
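The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A small sketch of that arithmetic, where dense_params is a hypothetical helper and 77 is the vocabulary size printed earlier:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(77, 32)   # first Dense layer: 77 inputs, 32 units
output = dense_params(32, 1)    # output layer: 32 inputs, 1 unit
print(hidden, output, hidden + output)   # 2496 33 2529
```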

Now we can train the model on the training data.

model.fit(Xtrain, y_train, epochs=50, 
    batch_size=32,verbose=False)

Then, we'll check the training accuracy.

loss, accTrain = model.evaluate(Xtrain, y_train, verbose=False)
print("Train accuracy:", round(accTrain, 2)," loss: ", round(loss, 2))

Train accuracy: 0.96  loss:  0.42
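The loss reported here is the binary cross-entropy the model was compiled with, averaged over the samples. The sketch below shows the formula in plain Python on made-up scores; the clipping constant eps is an assumption added to avoid log(0), and the helper names are hypothetical rather than Keras functions.

```python
import math

def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Made-up raw scores and labels, for illustration only.
probs = [sigmoid(z) for z in (2.0, -1.0, 0.5)]
labels = [1, 0, 1]
print(binary_crossentropy(labels, probs))
```

This also suggests why accuracy can be high while the loss stays noticeably above zero: a prediction of 0.6 for a positive sample counts as correct, but still contributes a fair amount of loss.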

Predicting test data and the accuracy check

Finally, we'll predict test data and check the prediction accuracy.

ypred=model.predict(Xtest)

ypred[ypred>0.5]=1 
ypred[ypred<=0.5]=0 
cm = confusion_matrix(y_test, ypred)
print(cm)

acc=accuracy_score(y_test,ypred)
print("Test accuracy:", acc)

[[2 1]
 [0 4]]

Test accuracy: 0.8571428571428571
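In the confusion matrix, rows are the true labels and columns are the predicted labels, so the single off-diagonal 1 is a negative opinion predicted as positive. As a quick check, the accuracy can be recomputed from the matrix values printed above:

```python
# Confusion matrix copied from the output above: rows = true, columns = predicted.
cm = [[2, 1],
      [0, 4]]

tn, fp = cm[0]   # true label 0: predicted 0, predicted 1
fn, tp = cm[1]   # true label 1: predicted 0, predicted 1

accuracy = (tn + tp) / (tn + fp + fn + tp)
print(accuracy)
```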

We can also check the original and predicted outputs in test data.

result=zip(x_test, y_test, ypred)
for i in result:
    print(i)

('I am excited a lot ', 1, array([1.], dtype=float32))
('exciting, liked. ', 1, array([1.], dtype=float32))
('terrible! I did not expect. ', 0, array([0.], dtype=float32))
('What a nice restaurant.', 1, array([1.], dtype=float32))
('not recommend, not satisfied ', 0, array([0.], dtype=float32))
('What a nice show.', 1, array([1.], dtype=float32))
('Offensive, it is a crap! ', 0, array([1.], dtype=float32))
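Each prediction prints as array([1.], dtype=float32) because model.predict returns a column of shape (n_samples, 1), and the in-place thresholding keeps that shape and dtype. A minimal sketch of an alternative, using made-up probabilities instead of real model output, thresholds in one step and flattens the result to plain integer labels:

```python
import numpy as np

# Hypothetical sigmoid outputs shaped (n_samples, 1), standing in for model.predict(Xtest).
probs = np.array([[0.91], [0.73], [0.08], [0.50], [0.64]], dtype=np.float32)

# Values above 0.5 become 1, everything else 0; ravel() turns the column into a 1-D array.
labels = (probs > 0.5).astype(int).ravel()
print(labels)
```

A 1-D integer array of this kind can be passed to confusion_matrix and accuracy_score in the same way as the thresholded ypred.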

In this tutorial, we've briefly learned how to classify sentiment data with a Keras deep learning model in Python. The dataset used here is very small, so a larger one is needed to improve both training and prediction accuracy.
The full source code is listed below.

Source code listing

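import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,confusion_matrix
from keras.models import Sequential
from keras import layers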

df = pd.read_csv('datasets/sentiments.csv')
df.columns = ["label","text"]
x = df['text'].values
y = df['label'].values

x_train, x_test, y_train, y_test = \
    train_test_split(x, y, test_size=0.12, random_state=123)

vectorizer = CountVectorizer()
vectorizer.fit(x_train)
Xtrain = vectorizer.transform(x_train)
Xtest = vectorizer.transform(x_test)
print(Xtrain.shape)
print(Xtest.shape)

model = Sequential()
model.add(layers.Dense(32, input_dim=Xtrain.shape[1], activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', 
              optimizer='adam', metrics=['accuracy'])

model.summary()

model.fit(Xtrain, y_train, epochs=50, 
    batch_size=32,verbose=False)

loss, accTrain = model.evaluate(Xtrain, y_train, verbose=False)
print("Train accuracy:", round(accTrain, 2)," loss: ", round(loss, 2))

ypred=model.predict(Xtest)

ypred[ypred>0.5]=1 
ypred[ypred<=0.5]=0 
cm = confusion_matrix(y_test, ypred)
print(cm)
acc=accuracy_score(y_test,ypred)
print("Test accuracy:", acc)

result=zip(x_test, y_test, ypred)
for i in result:
    print(i)

sentiments.csv data

1,"I like it "
1,"like it a lot "
1,"It's really good "
1,"Recommend! I really enjoyed! "
1,"It's really good "
1,"recommend too "
1,"outstanding performance "
1,"it's good! recommend! "
1,"Great! "
1,"really good. Definitely, recommend! "
1,"It is fun "
1,"Exceptional! liked a lot! "
1,"highly recommend this "
1,"fantastic show "
1,"exciting, liked. "
1,"it's ok "
1,"exciting show "
1,"amazing performance "
1,"it is great! "
1,"I am excited a lot "
1,"it is terrific "
1,"Definitely good one "
1,"Excellent, very satisfied "
1,"Glad we went "
1,"Once again outstanding! "
1,"awesome! excellent show "
1,"This is truly a good one! "
1,"What a nice restaurant."
1,"What a nice show."
1,"what a great place!"
1,"Great atmosphere"
1,"Definitely you should go"
1,"This is a great!"
1,"I really love it"
0,"it's mediocre! not recommend "
0,"Not good at all! "
0,"It is rude "
0,"I don't like this type "
0,"poor performance "
0,"Boring, not good at all! "
0,"not liked "
0,"I hate this type of things "
0,"not recommend, not satisfied "
0,"not enjoyed, I don't recommend this. "
0,"disgusting movie "
0,"waste of time, poor show "
0,"feel tired after watching this "
0,"horrible performance "
0,"not so good "
0,"so boring I fell asleep "
0,"a bit strange "
0,"terrible! I did not expect. "
0,"This is an awful "
0,"Nasty and horrible! "
0,"Offensive, it is a crap! "
0,"Disappointing! not liked. "
0,"The service is a nightmare"
