Classification with Gaussian Naive Bayes model in Python

   Naive Bayes is a supervised learning technique for classification problems, based on Bayes' theorem. The model estimates the prior probability of each class and the conditional probability of the input features given each class, then assigns each sample to the most probable class. It is called 'naive' because it treats the features as conditionally independent given the class. In a Gaussian Naive Bayes model, the values of each feature within a class are assumed to follow a Gaussian (normal) distribution. In this post, we'll learn how to implement a Naive Bayes model in Python with the 'sklearn' library.
   The post covers:
  1. Creating sample dataset
  2. Splitting dataset into train and test parts
  3. Building Gaussian Naive Bayes model
  4. Predicting test data and checking the results
   First, we import the required libraries.

from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report
from sklearn.model_selection import train_test_split

Creating sample dataset

   For learning purposes, I will generate a simple dataset with the function below. You may use your own dataset too. Please make sure that it is split into a feature part (X) and a label part (Y).

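def CreateDataFrame(N):
    # Build N rows of three random integer features and a label derived from their sum.
    rows = []
    for i in range(N):
        a = np.random.randint(10)  # integer in [0, 10)
        b = np.random.randint(20)  # integer in [0, 20)
        c = np.random.randint(5)   # integer in [0, 5)
        total = a + b + c
        if total > 25:
            y = "high"
        elif total < 12:
            y = "low"
        else:
            y = "normal"
        rows.append([a, b, c, y])
    # Creating the frame once keeps a, b and c as integer columns.
    return pd.DataFrame(rows, columns=['a', 'b', 'c', 'y'])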

df = CreateDataFrame(200)
>>> df.head()
   a   b  c       y
0  8   7  0  normal
1  9   0  1     low
2  8  11  1  normal
3  6   8  4  normal
4  2  12  4  normal
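
The row-by-row loop is easy to follow, but it gets slow for large N. As a minimal sketch, assuming the same labelling rule, the dataset can also be built in one vectorized step with a seeded generator so that the rows are reproducible. The function name `create_data_frame_vectorized` and its `seed` parameter are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd


def create_data_frame_vectorized(n, seed=0):
    """Build the same kind of toy dataset as CreateDataFrame, without a Python loop."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducible rows
    a = rng.integers(0, 10, size=n)    # integers in [0, 10)
    b = rng.integers(0, 20, size=n)    # integers in [0, 20)
    c = rng.integers(0, 5, size=n)     # integers in [0, 5)
    total = a + b + c
    # Same rule as CreateDataFrame: sum > 25 is "high", sum < 12 is "low", else "normal".
    y = np.where(total > 25, "high", np.where(total < 12, "low", "normal"))
    return pd.DataFrame({"a": a, "b": b, "c": c, "y": y})


df_vec = create_data_frame_vectorized(200)
print(df_vec["y"].value_counts())
```

Because the generator is seeded, calling the function twice with the same `seed` returns identical rows, which makes later results easier to compare between runs.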

Splitting dataset into train and test parts

We will extract the feature part (X) and the label part (Y) from the dataset, and split them into train and test parts.

X = df[["a","b","c"]]
Y = df["y"]  # 1-D label column, the shape scikit-learn expects for targets
Xtrain, Xtest, ytrain, ytest = train_test_split(X, Y, random_state=0)
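
By default, `train_test_split` holds out 25% of the rows, so 200 samples give 150 training rows and 50 test rows. Because the "high" class is rare, a stratified split can keep the class proportions similar in both parts. The following is a small sketch on stand-in data; the `stratify` argument is an optional addition and is not used in the original code, and the names `demo`, `X_demo` and `y_demo` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data with the same columns as the post's dataset (values are illustrative).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "a": rng.integers(0, 10, size=200),
    "b": rng.integers(0, 20, size=200),
    "c": rng.integers(0, 5, size=200),
})
total = demo["a"] + demo["b"] + demo["c"]
demo["y"] = np.where(total > 25, "high", np.where(total < 12, "low", "normal"))

X_demo = demo[["a", "b", "c"]]
y_demo = demo["y"]  # 1-D Series of labels

# Default test_size is 0.25; stratify keeps class proportions similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, random_state=0, stratify=y_demo
)
print(X_tr.shape, X_te.shape)
print(y_tr.value_counts(normalize=True).round(2))
print(y_te.value_counts(normalize=True).round(2))
```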

Building Gaussian Naive Bayes model

   We will use the GaussianNB class from the 'sklearn.naive_bayes' module and fit the model on the training data (Xtrain, ytrain).

model = GaussianNB().fit(Xtrain, ytrain)  
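
To make the fitting step less of a black box, here is a rough from-scratch sketch of the Gaussian Naive Bayes calculation, compared against `GaussianNB`. It assumes per-class feature means and variances, class priors taken from class frequencies, and a small variance floor that mirrors scikit-learn's default `var_smoothing=1e-9`. The data and all variable names are illustrative, and this is not meant as a replacement for the library.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Illustrative data following the post's labelling rule (seed chosen arbitrarily).
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.integers(0, 10, size=300),
    rng.integers(0, 20, size=300),
    rng.integers(0, 5, size=300),
]).astype(float)
total = X.sum(axis=1)
y = np.where(total > 25, "high", np.where(total < 12, "low", "normal"))

classes = np.unique(y)
# Small variance floor so that no feature variance is exactly zero.
eps = 1e-9 * X.var(axis=0).max()

log_posteriors = []
for cls in classes:
    Xc = X[y == cls]
    prior = len(Xc) / len(X)      # P(class), estimated from class frequency
    mean = Xc.mean(axis=0)        # per-feature mean within the class
    var = Xc.var(axis=0) + eps    # per-feature variance within the class
    # Sum over features of log N(x_j | mean_j, var_j), for every sample at once.
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)
    # Unnormalized log posterior: log P(class) + log likelihood.
    log_posteriors.append(np.log(prior) + log_lik)

# Pick, for each sample, the class with the largest log posterior.
manual_pred = classes[np.argmax(np.column_stack(log_posteriors), axis=1)]
sk_pred = GaussianNB().fit(X, y).predict(X)
print("Agreement with GaussianNB:", np.mean(manual_pred == sk_pred))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow for datasets with many features.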

Now, we can check the attributes of our model.
Classes in the model:
>>> model.classes_
array(['high', 'low', 'normal'], dtype='<U6')

The number of training samples observed in each class:
>>> model.class_count_
array([13., 49., 88.])

The prior probability of each class (its share of the training samples):
>>> model.class_prior_
array([0.08666667, 0.32666667, 0.58666667])
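
The priors follow directly from the class counts. As a quick check, assuming no custom priors were passed to the model, dividing each count by the total number of training samples reproduces the values above:

```python
import numpy as np

# Training counts per class as reported by model.class_count_ above
# (order: high, low, normal).
class_count = np.array([13.0, 49.0, 88.0])

# With no user-supplied priors, each prior is the class's share of the training rows.
class_prior = class_count / class_count.sum()
print(class_prior)
```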


Predicting test data and checking the results

Finally, we'll predict test data and evaluate the results.

ypred = model.predict(Xtest)
accuracy = accuracy_score(ytest, ypred)
# Both functions take the true labels first, then the predictions.
report = classification_report(ytest, ypred)
cm = confusion_matrix(ytest, ypred)

print("Classification report:")
print("Accuracy: ",accuracy)
print(report)
print("Confusion matrix:")
print(cm)

Classification report:
Accuracy:  0.96
             precision    recall  f1-score   support

       high       0.00      0.00      0.00         1
        low       1.00      0.93      0.97        15
     normal       0.94      1.00      0.97        34

avg / total       0.94      0.96      0.95        50

Confusion matrix:
[[ 0  0  1]
 [ 0 14  1]
 [ 0  0 34]]
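
The accuracy and the per-class scores can be read off the confusion matrix directly. The sketch below recomputes them with NumPy from the matrix printed above, assuming rows are true labels and columns are predictions, both in the order high, low, normal. Treating the precision of a never-predicted class as 0 is a convention chosen here for illustration.

```python
import numpy as np

# Confusion matrix from the run above; rows are true labels, columns are
# predictions, both in the order high, low, normal.
cm = np.array([[0, 0, 1],
               [0, 14, 1],
               [0, 0, 34]])
labels = ["high", "low", "normal"]

true_pos = np.diag(cm).astype(float)
actual = cm.sum(axis=1).astype(float)     # row sums: samples per true class
predicted = cm.sum(axis=0).astype(float)  # column sums: samples per predicted class

accuracy = true_pos.sum() / cm.sum()
recall = np.divide(true_pos, actual, out=np.zeros_like(true_pos), where=actual > 0)
# A class that is never predicted gets precision 0 here by convention.
precision = np.divide(true_pos, predicted, out=np.zeros_like(true_pos), where=predicted > 0)

print("accuracy:", accuracy)
for name, p, r in zip(labels, precision, recall):
    print(f"{name:>6}  precision={p:.2f}  recall={r:.2f}")
```

The single "high" test sample was predicted as "normal", which is why that class scores zero: with so few "high" rows in the training data, the model has little to learn from.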

   In this post, we have learned how to classify data with a Gaussian Naive Bayes model in Python. Thank you for reading!

The full source code.

from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report
from sklearn.model_selection import train_test_split

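def CreateDataFrame(N):
    # Build N rows of three random integer features and a label derived from their sum.
    rows = []
    for i in range(N):
        a = np.random.randint(10)  # integer in [0, 10)
        b = np.random.randint(20)  # integer in [0, 20)
        c = np.random.randint(5)   # integer in [0, 5)
        total = a + b + c
        if total > 25:
            y = "high"
        elif total < 12:
            y = "low"
        else:
            y = "normal"
        rows.append([a, b, c, y])
    # Creating the frame once keeps a, b and c as integer columns.
    return pd.DataFrame(rows, columns=['a', 'b', 'c', 'y'])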

df = CreateDataFrame(200)

X = df[["a","b","c"]]
Y = df["y"]  # 1-D label column, the shape scikit-learn expects for targets
Xtrain, Xtest, ytrain, ytest = train_test_split(X, Y, random_state=0)

model = GaussianNB().fit(Xtrain, ytrain)  
ypred=model.predict(Xtest)
accuracy = accuracy_score(ytest,ypred)
report = classification_report(ytest, ypred)  # true labels first, then predictions
cm = confusion_matrix(ytest, ypred)

print("Classification report:")
print("Accuracy: ",accuracy)
print(report)
print("Confusion matrix:")
print(cm)

