The post covers:

- Creating sample dataset
- Splitting dataset into train and test parts
- Building Gaussian Naive Bayes model
- Predicting test data and checking the results

    from sklearn.naive_bayes import GaussianNB
    import pandas as pd
    import numpy as np
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
    from sklearn.model_selection import train_test_split

**Creating sample dataset**

For learning purposes, I will generate a simple dataset with the function below. You may use your own dataset instead; just make sure it is separated into a feature part (X) and a label part (Y).

    def CreateDataFrame(N):
        columns = ['a', 'b', 'c', 'y']
        df = pd.DataFrame(columns=columns)
        for i in range(N):
            # Three random integer features
            a = np.random.randint(10)
            b = np.random.randint(20)
            c = np.random.randint(5)
            # The label depends on the sum of the features
            y = "normal"
            if (a + b + c) > 25:
                y = "high"
            elif (a + b + c) < 12:
                y = "low"
            df.loc[i] = [a, b, c, y]
        return df

    df = CreateDataFrame(200)

    >>> df.head()
       a   b  c       y
    0  8   7  0  normal
    1  9   0  1     low
    2  8  11  1  normal
    3  6   8  4  normal
    4  2  12  4  normal
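Filling a DataFrame row by row is fine for 200 rows but becomes slow for larger N. As a rough sketch of an alternative, the same labeling rule can be applied to whole arrays at once. The function name `create_dataframe_vectorized` and the `seed` parameter are my own additions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd


def create_dataframe_vectorized(n, seed=0):
    """Build the same kind of dataset as CreateDataFrame, without a Python loop."""
    rng = np.random.default_rng(seed)  # seeded generator, so the data is reproducible
    a = rng.integers(0, 10, size=n)    # integers in 0..9
    b = rng.integers(0, 20, size=n)    # integers in 0..19
    c = rng.integers(0, 5, size=n)     # integers in 0..4
    total = a + b + c
    # Same thresholds as in the post: > 25 is "high", < 12 is "low", else "normal".
    y = np.select([total > 25, total < 12], ["high", "low"], default="normal")
    return pd.DataFrame({"a": a, "b": b, "c": c, "y": y})


df_fast = create_dataframe_vectorized(200)
print(df_fast["y"].value_counts())
```

Fixing the seed also makes the class counts repeatable between runs, which the loop version above does not guarantee.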

**Splitting dataset into train and test parts**

We will extract the feature part (X) and the label part (Y) from the dataset, then split both into train and test parts.

    X = df[["a", "b", "c"]]
    Y = df["y"]  # 1-D labels; a one-column DataFrame makes fit() emit a DataConversionWarning
    Xtrain, Xtest, ytrain, ytest = train_test_split(X, Y, random_state=0)
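With no `test_size` argument, `train_test_split` holds out 25% of the rows, so 200 samples become 150 for training and 50 for testing. Because the "high" class is rare, it may also be worth passing `stratify`, which keeps the class proportions similar in both parts. The sketch below uses made-up labels with fixed class counts (20/60/120) purely to make the effect easy to see; those counts are an assumption for illustration, not the post's data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "a": rng.integers(0, 10, size=200),
    "b": rng.integers(0, 20, size=200),
    "c": rng.integers(0, 5, size=200),
})
# Illustrative labels with fixed counts: 20 "high", 60 "low", 120 "normal".
y = pd.Series(np.repeat(["high", "low", "normal"], [20, 60, 120]), name="y")

# Default test_size is 0.25: 150 training rows and 50 test rows.
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)
print(Xtrain.shape, Xtest.shape)

# stratify=y preserves the 10% / 30% / 60% class mix in both parts.
Xtr_s, Xte_s, ytr_s, yte_s = train_test_split(X, y, random_state=0, stratify=y)
print(yte_s.value_counts())
```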

**Building Gaussian Naive Bayes model**

We will use the GaussianNB class from the sklearn.naive_bayes module and fit the model on the training data (Xtrain, ytrain).

    model = GaussianNB().fit(Xtrain, ytrain)
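To give a feel for what fitting a Gaussian Naive Bayes model involves, here is a small from-scratch sketch in NumPy. It estimates a prior, a mean and a variance per class and feature, then predicts the class with the highest log prior plus summed Gaussian log density. This is not scikit-learn's implementation: the function names, the toy data and the fixed `1e-9` variance floor are my own assumptions for illustration.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means and feature variances of each class."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Small constant keeps variances strictly positive (assumption of this sketch).
    variances = np.array([X[y == k].var(axis=0) for k in classes]) + 1e-9
    return classes, priors, means, variances


def predict_gaussian_nb(X, classes, priors, means, variances):
    """Pick the class with the largest log prior + summed Gaussian log density."""
    # Broadcasting gives arrays of shape (n_samples, n_classes, n_features).
    diff = X[:, None, :] - means[None, :, :]
    log_density = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                          + diff ** 2 / variances[None, :, :])
    # Naive independence assumption: per-feature log densities simply add up.
    log_joint = np.log(priors)[None, :] + log_density.sum(axis=2)
    return classes[np.argmax(log_joint, axis=1)]


rng = np.random.default_rng(0)
# Two well-separated toy clusters (illustrative data, not the post's dataset).
X_toy = np.vstack([rng.normal(0, 1, size=(50, 3)),
                   rng.normal(10, 1, size=(50, 3))])
y_toy = np.array(["low"] * 50 + ["high"] * 50)

params = fit_gaussian_nb(X_toy, y_toy)
pred = predict_gaussian_nb(X_toy, *params)
print("training accuracy:", np.mean(pred == y_toy))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow for datasets with more features.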

Now, we can check the attributes of our model.

The class labels known to the model:

    >>> model.classes_
    array(['high', 'low', 'normal'], dtype='<U6')

The number of training samples observed in each class:

    >>> model.class_count_
    array([13., 49., 88.])

The prior probability of each class, estimated from the training data:

    >>> model.class_prior_
    array([0.08666667, 0.32666667, 0.58666667])
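These attributes are related in a simple way: unless priors are passed explicitly, each prior is the class count divided by the total number of training samples (for example, 13 / 150 ≈ 0.0867). The sketch below checks this on stand-in data that reuses the class counts shown above; the random features and the names `X_demo`, `y_demo` and `demo` are placeholders of mine, not the post's dataset.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 3))  # placeholder features
# Labels with the same class counts as in the output above: 13, 49 and 88.
y_demo = np.repeat(["high", "low", "normal"], [13, 49, 88])

demo = GaussianNB().fit(X_demo, y_demo)

# The prior is the class count divided by the number of training samples.
manual_prior = demo.class_count_ / demo.class_count_.sum()
print(demo.class_prior_)
print(manual_prior)

# theta_ holds the per-class mean of each feature: one row per class.
print(demo.theta_.shape)
```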

**Predicting test data and checking the results**

Finally, we'll predict test data and evaluate the results.

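    ypred = model.predict(Xtest)

    accuracy = accuracy_score(ytest, ypred)
    # classification_report expects the true labels first, then the predictions
    report = classification_report(ytest, ypred)
    cm = confusion_matrix(ytest, ypred)

    print("Classification report:")
    print("Accuracy: ", accuracy)
    print(report)
    print("Confusion matrix:")
    print(cm)

    Classification report:
    Accuracy:  0.96
                 precision    recall  f1-score   support

           high       0.00      0.00      0.00         1
            low       1.00      0.93      0.97        15
         normal       0.94      1.00      0.97        34

    avg / total       0.94      0.96      0.95        50

    Confusion matrix:
    [[ 0  0  1]
     [ 0 14  1]
     [ 0  0 34]]

In the confusion matrix, rows are the true classes and columns are the predicted classes. The only "high" sample in the test set was predicted as "normal", which is why precision and recall for that class are 0.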
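The per-class scores can be recomputed by hand from the confusion matrix, reading rows as true labels and columns as predicted labels (the layout that `confusion_matrix(ytest, ypred)` produces). The sketch below hard-codes the matrix printed above; treating a never-predicted class as precision 0 is a convention I chose here to avoid dividing by zero.

```python
import numpy as np

labels = ["high", "low", "normal"]
cm = np.array([[0, 0, 1],
               [0, 14, 1],
               [0, 0, 34]])

correct = np.diag(cm)          # correctly classified samples per class
true_totals = cm.sum(axis=1)   # row sums: samples that truly belong to each class
pred_totals = cm.sum(axis=0)   # column sums: samples predicted as each class

accuracy = correct.sum() / cm.sum()
recall = correct / true_totals
# "high" was never predicted, so its precision is set to 0 instead of dividing by zero.
precision = np.divide(correct, pred_totals,
                      out=np.zeros(len(labels)), where=pred_totals > 0)

print("accuracy:", accuracy)
for name, p, r in zip(labels, precision, recall):
    print(f"{name:>6}  precision={p:.2f}  recall={r:.2f}")
```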

In this post, we have learned how to classify data with a Gaussian Naive Bayes model in Python. Thank you for reading!

The full source code is listed below.

    from sklearn.naive_bayes import GaussianNB
    import pandas as pd
    import numpy as np
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
    from sklearn.model_selection import train_test_split

    def CreateDataFrame(N):
        columns = ['a', 'b', 'c', 'y']
        df = pd.DataFrame(columns=columns)
        for i in range(N):
            a = np.random.randint(10)
            b = np.random.randint(20)
            c = np.random.randint(5)
            y = "normal"
            if (a + b + c) > 25:
                y = "high"
            elif (a + b + c) < 12:
                y = "low"
            df.loc[i] = [a, b, c, y]
        return df

    df = CreateDataFrame(200)

    X = df[["a", "b", "c"]]
    Y = df["y"]  # 1-D labels
    Xtrain, Xtest, ytrain, ytest = train_test_split(X, Y, random_state=0)

    model = GaussianNB().fit(Xtrain, ytrain)

    ypred = model.predict(Xtest)

    accuracy = accuracy_score(ytest, ypred)
    # True labels first, then predictions
    report = classification_report(ytest, ypred)
    cm = confusion_matrix(ytest, ypred)

    print("Classification report:")
    print("Accuracy: ", accuracy)
    print(report)
    print("Confusion matrix:")
    print(cm)
