Sentiment Classification with Keras in Python

   Our next method for binary classification of sentiment data is a Keras model. Keras provides powerful tools for text classification. In this post, we'll briefly learn how to classify text data with a Keras Sequential model. We'll use the CountVectorizer class of the scikit-learn library to vectorize the text data. The post covers:
  • Preparing data
  • Vectorizing text
  • Building keras model
  • Predicting test data and the accuracy check

We'll start by loading the required libraries.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,confusion_matrix
from keras.models import Sequential
from keras import layers


Preparing data

   I prepared a simple sentiment dataset for this tutorial. It contains imaginary opinions: positive opinions are labeled '1' and negative opinions are labeled '0'. Below is a sample of the training data.

1,"I like it "
1,"like it a lot "
1,"It's really good "
1,"Recommend! I really enjoyed! "
1,"It's really good "
1,"recommend too "
1,"outstanding performance "
...
0,"it's mediocre! not recommend "
0,"Not good at all! "
0,"It is rude "
0,"I don't like this type "
0,"poor performance "
0,"Boring, not good at all! "
0,"not liked "
0,"I hate this type of things "
...

You can find the full list of the sentiment data at the end of this post. Copy the text and save it as sentiments.csv in your target folder (the code below reads it from datasets/sentiments.csv).

Next, we'll load the sentiments.csv data and separate it into the texts (x) and the labels (y).

df = pd.read_csv('datasets/sentiments.csv')
df.columns = ["label","text"]
x = df['text'].values
y = df['label'].values
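The sentiments.csv listing at the end of this post has no header line. With its default header=0, pd.read_csv takes the first record as column names, and assigning df.columns afterwards only renames them, so `1,"I like it "` never reaches df. This is consistent with the shapes printed below: 49 + 7 = 56 rows, while the listing has 57 records. A minimal sketch of the difference, using a three-row in-memory sample (io.StringIO stands in for the file):

```python
import io

import pandas as pd

# Three rows in the same format as sentiments.csv: no header line,
# the label first, then the quoted text.
csv_text = '''1,"I like it "
1,"like it a lot "
0,"Not good at all! "
'''

# Default behaviour (header=0): the first record becomes the column names.
df_default = pd.read_csv(io.StringIO(csv_text))
print(len(df_default), list(df_default.columns))

# header=None keeps every record as data; names= assigns the column labels.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["label", "text"])
print(len(df), list(df.columns))  # 3 ['label', 'text']
```

Loading the real file with `pd.read_csv('datasets/sentiments.csv', header=None, names=["label", "text"])` keeps all 57 records; the split, and therefore the numbers printed in this post, would then differ slightly.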

To train the model and evaluate it on unseen data, we'll split the data into train and test parts, holding out 12 percent of the rows for testing.

x_train, x_test, y_train, y_test = \
 train_test_split(x, y, test_size=0.12, random_state=123)
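With a fractional test_size, scikit-learn rounds the number of test samples up and gives the rest to the training set. A short sketch of that arithmetic, assuming the 56 loaded rows implied by the shapes printed in the next section; dummy indices stand in for the texts:

```python
import math

from sklearn.model_selection import train_test_split

n_samples = 56    # rows in df after loading (49 + 7 in the shapes printed below)
test_size = 0.12

# A fractional test size is rounded up: ceil(0.12 * 56) = ceil(6.72) = 7
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 49 7

# The same sizes come out of train_test_split applied to dummy indices.
train_idx, test_idx = train_test_split(list(range(n_samples)),
                                       test_size=test_size, random_state=123)
print(len(train_idx), len(test_idx))  # 49 7
```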


Vectorizing text

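The CountVectorizer class turns each text into a vector of word counts. We fit it on the training texts to learn the vocabulary, then transform both the train and test texts into document-term matrices with one row per text and one column per vocabulary word.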

vectorizer = CountVectorizer()
vectorizer.fit(x_train)
Xtrain = vectorizer.transform(x_train)
Xtest = vectorizer.transform(x_test)
print(Xtrain.shape)
(49, 77)
print(Xtest.shape)
(7, 77) 


Building keras model

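Next, we'll build a Keras Sequential model. The model is simple: a hidden Dense layer of 32 units with 'relu' activation, which receives the count vectors (input_dim equals the vocabulary size, 77), and an output Dense layer with a single unit and 'sigmoid' activation, which returns the probability that a text is positive. We compile it with the binary cross-entropy loss and the Adam optimizer.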
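Each Dense layer computes an affine transform followed by its activation. As an illustration only, here is that forward pass written out in NumPy; the weights are random placeholders (an assumption), not the ones Keras learns, and the two input rows are dummy count vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 77, 32  # vocabulary size and hidden units used in this post

# Random placeholder weights stand in for the trained ones.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X):
    """Dense(32, relu) followed by Dense(1, sigmoid), as plain matrix algebra."""
    hidden = relu(X @ W1 + b1)        # shape (n_samples, 32)
    return sigmoid(hidden @ W2 + b2)  # shape (n_samples, 1), values in (0, 1)


# Two dummy 0/1 count vectors in place of rows of Xtrain.
X = rng.integers(0, 2, size=(2, n_features)).astype(float)
probs = forward(X)
print(probs.shape)  # (2, 1)
```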

model = Sequential()
model.add(layers.Dense(32, input_dim=Xtrain.shape[1], activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', 
              optimizer='adam', metrics=['accuracy'])

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 32)                2496      
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 33        
=================================================================
Total params: 2,529
Trainable params: 2,529
Non-trainable params: 0
_________________________________________________________________ 
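The parameter counts in the summary follow from the layer sizes: a Dense layer has one weight per input-unit pair plus one bias per unit. A quick check of the numbers above:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_params(77, 32)  # 77 * 32 + 32
output = dense_params(32, 1)   # 32 * 1 + 1
print(hidden, output, hidden + output)  # 2496 33 2529
```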

We train the model on the training data for 50 epochs with a batch size of 32.

model.fit(Xtrain, y_train, epochs=50, 
    batch_size=32,verbose=False)
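The loss being minimized is binary cross-entropy, the mean of -[y log(p) + (1 - y) log(1 - p)] over the samples. A plain-Python sketch of that formula, with the probability clipped away from 0 and 1 to avoid log(0), plus the number of weight updates implied by the fit() arguments above:

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Two confident, correct predictions give a small loss.
loss = binary_crossentropy([1, 0], [0.9, 0.1])
print(round(loss, 4))  # 0.1054

# 49 training samples with batch_size=32 give ceil(49/32) = 2 updates
# per epoch, so 50 epochs make 100 updates.
steps_per_epoch = math.ceil(49 / 32)
print(steps_per_epoch, steps_per_epoch * 50)  # 2 100
```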

Then, we'll check the training accuracy.

loss, accTrain = model.evaluate(Xtrain, y_train, verbose=False)
print("Train accuracy:", round(accTrain, 2), " loss: ", round(loss, 2))
Train accuracy: 0.96  loss:  0.42 


Predicting test data and the accuracy check

Finally, we'll predict test data and check the prediction accuracy.

ypred=model.predict(Xtest)

ypred[ypred>0.5]=1 
ypred[ypred<=0.5]=0 
cm = confusion_matrix(y_test, ypred)
print(cm)
acc=accuracy_score(y_test,ypred)
print("Test accuracy:", acc)
[[2 1]
 [0 4]]
Test accuracy: 0.8571428571428571 
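The thresholding step can also be written in one line, and the accuracy can be read straight off the confusion matrix: in scikit-learn, rows are true labels and columns are predicted labels, so correct predictions sit on the diagonal. In this sketch the labels follow the test output shown below, while the probabilities are hypothetical values chosen only so the predictions match it:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = np.array([1, 1, 0, 1, 0, 1, 0])
# Hypothetical probabilities in the (n, 1) shape returned by model.predict().
probs = np.array([[0.91], [0.83], [0.12], [0.77], [0.30], [0.88], [0.64]])

# Threshold at 0.5 in one step: True/False -> 1/0, flattened to 1-D.
y_pred = (probs > 0.5).astype(int).ravel()

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1]
#  [0 4]]

# Accuracy = correct predictions (diagonal) / all predictions = 6 / 7.
acc_from_cm = np.trace(cm) / cm.sum()
print(acc_from_cm, accuracy_score(y_true, y_pred))
```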

We can also compare the original label and the predicted label for each test text.

result=zip(x_test, y_test, ypred)
for i in result:
 print(i)
('I am excited a lot ', 1, array([1.], dtype=float32))
('exciting, liked. ', 1, array([1.], dtype=float32))
('terrible! I did not expect. ', 0, array([0.], dtype=float32))
('What a nice restaurant.', 1, array([1.], dtype=float32))
('not recommend, not satisfied ', 0, array([0.], dtype=float32))
('What a nice show.', 1, array([1.], dtype=float32))
('Offensive, it is a crap! ', 0, array([1.], dtype=float32))
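With a larger test set, scanning every printed tuple becomes tedious. A short sketch that keeps only the misclassified texts; the arrays are the test values printed above, typed in by hand so the snippet runs without the trained model:

```python
import numpy as np

# Values copied from the test output above.
x_test = ['I am excited a lot ', 'exciting, liked. ', 'terrible! I did not expect. ',
          'What a nice restaurant.', 'not recommend, not satisfied ',
          'What a nice show.', 'Offensive, it is a crap! ']
y_test = np.array([1, 1, 0, 1, 0, 1, 0])
ypred = np.array([[1.], [1.], [0.], [1.], [0.], [1.], [1.]], dtype=np.float32)

# Flatten the (n, 1) predictions and keep the rows that disagree with the label.
wrong = [(text, int(label), int(pred))
         for text, label, pred in zip(x_test, y_test, ypred.ravel())
         if int(pred) != label]
print(wrong)  # [('Offensive, it is a crap! ', 0, 1)]
```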


   In this post, we've briefly learned how to classify sentiment text with a Keras model in Python. Although the test accuracy reached about 86 percent, the test set holds only seven texts, and the model needs a larger training dataset to improve its prediction accuracy.
   The full source code is listed below.

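import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,confusion_matrix
from keras.models import Sequential
from keras import layers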

df = pd.read_csv('datasets/sentiments.csv')
df.columns = ["label","text"]
x = df['text'].values
y = df['label'].values

x_train, x_test, y_train, y_test = \
 train_test_split(x, y, test_size=0.12, random_state=123)

vectorizer = CountVectorizer()
vectorizer.fit(x_train)
Xtrain = vectorizer.transform(x_train)
Xtest = vectorizer.transform(x_test)
print(Xtrain.shape)
print(Xtest.shape)

model = Sequential()
model.add(layers.Dense(32, input_dim=Xtrain.shape[1], activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', 
              optimizer='adam', metrics=['accuracy'])

model.summary()

model.fit(Xtrain, y_train, epochs=50, 
    batch_size=32,verbose=False)

loss, accTrain = model.evaluate(Xtrain, y_train, verbose=False)
print("Train accuracy:", round(accTrain, 2), " loss: ", round(loss, 2))

ypred=model.predict(Xtest)

ypred[ypred>0.5]=1 
ypred[ypred<=0.5]=0 
cm = confusion_matrix(y_test, ypred)
print(cm)
acc=accuracy_score(y_test,ypred)
print("Test accuracy:", acc)

result=zip(x_test, y_test, ypred)
for i in result:
 print(i)

sentiments.csv data

1,"I like it "
1,"like it a lot "
1,"It's really good "
1,"Recommend! I really enjoyed! "
1,"It's really good "
1,"recommend too "
1,"outstanding performance "
1,"it's good! recommend! "
1,"Great! "
1,"really good. Definitely, recommend! "
1,"It is fun "
1,"Exceptional! liked a lot! "
1,"highly recommend this "
1,"fantastic show "
1,"exciting, liked. "
1,"it's ok "
1,"exciting show "
1,"amazing performance "
1,"it is great! "
1,"I am excited a lot "
1,"it is terrific "
1,"Definitely good one "
1,"Excellent, very satisfied "
1,"Glad we went "
1,"Once again outstanding! "
1,"awesome! excellent show "
1,"This is truly a good one! "
1,"What a nice restaurant."
1,"What a nice show."
1,"what a great place!"
1,"Great atmosphere"
1,"Definitely you should go"
1,"This is a great!"
1,"I really love it"
0,"it's mediocre! not recommend "
0,"Not good at all! "
0,"It is rude "
0,"I don't like this type "
0,"poor performance "
0,"Boring, not good at all! "
0,"not liked "
0,"I hate this type of things "
0,"not recommend, not satisfied "
0,"not enjoyed, I don't recommend this. "
0,"disgusting movie "
0,"waste of time, poor show "
0,"feel tired after watching this "
0,"horrible performance "
0,"not so good "
0,"so boring I fell asleep "
0,"a bit strange "
0,"terrible! I did not expect. "
0,"This is an awful "
0,"Nasty and horrible! "
0,"Offensive, it is a crap! "
0,"Disappointing! not liked. "
0,"The service is a nightmare"


