The post covers:
- Creating sample dataset
- Splitting dataset into train and test parts
- Building Gaussian Naive Bayes model
- Predicting test data and checking the results
from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
Creating sample dataset
For learning purposes, I will generate a simple dataset with the function below. You may use your own dataset too. Please make sure that your data is shaped correctly: a feature part (X) and a label part (Y).
def CreateDataFrame(N):
    columns = ['a', 'b', 'c', 'y']
    df = pd.DataFrame(columns=columns)
    for i in range(N):
        # Three random integer features with different ranges
        a = np.random.randint(10)
        b = np.random.randint(20)
        c = np.random.randint(5)
        # The label depends on the sum of the features
        y = "normal"
        if (a + b + c) > 25:
            y = "high"
        elif (a + b + c) < 12:
            y = "low"
        df.loc[i] = [a, b, c, y]
    return df

df = CreateDataFrame(200)

>>> df.head()
   a   b  c       y
0  8   7  0  normal
1  9   0  1     low
2  8  11  1  normal
3  6   8  4  normal
4  2  12  4  normal
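Filling a DataFrame row by row works for 200 rows but gets slow for larger N. As a rough sketch (not the code used in this post), the same labelling rule can be applied to whole columns at once. The function name create_dataframe_vectorized and the seed parameter are my own assumptions, added only for illustration.

```python
import numpy as np
import pandas as pd


def create_dataframe_vectorized(n, seed=0):
    """Hypothetical vectorized variant of CreateDataFrame (name and seed are assumptions)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "a": rng.integers(0, 10, size=n),  # integers in 0..9
        "b": rng.integers(0, 20, size=n),  # integers in 0..19
        "c": rng.integers(0, 5, size=n),   # integers in 0..4
    })
    total = (df["a"] + df["b"] + df["c"]).to_numpy()
    # Same rule as in the loop version: sum > 25 is "high", sum < 12 is "low"
    df["y"] = np.select([total > 25, total < 12], ["high", "low"], default="normal")
    return df


df = create_dataframe_vectorized(200)
print(df.head())
```

A fixed seed also makes the dataset reproducible, so the numbers you see will stay the same between runs.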
Splitting dataset into train and test parts
We will extract the feature part (X) and the label part (Y) from the dataset, then split both into train and test parts.
X = df[["a", "b", "c"]]
Y = df["y"]  # 1-D labels; a single-column DataFrame makes fit() raise a DataConversionWarning
Xtrain, Xtest, ytrain, ytest = train_test_split(X, Y, random_state=0)
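The "high" class is rare in this kind of data, so a plain random split may leave only a handful of "high" samples in the test part. As a sketch of one possible remedy (not used in the rest of this post), the stratify argument of train_test_split keeps the class proportions the same in both parts. The toy arrays below are assumptions made up for the illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumed for illustration): 200 samples, 10% / 30% / 60% class shares
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(200, 3))
y = np.array(["high"] * 20 + ["low"] * 60 + ["normal"] * 120)

# stratify=y keeps the 10% / 30% / 60% shares in both the train and the test part
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

labels, counts = np.unique(y_test, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

With test_size=0.25 the test part holds 50 samples, split 5 / 15 / 30 across the three classes.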
Building Gaussian Naive Bayes model
We will use the GaussianNB class from the sklearn.naive_bayes module and fit the model on the training data (Xtrain, ytrain).
model = GaussianNB().fit(Xtrain, ytrain)
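To see what the fit actually learns, here is a minimal sketch that reproduces the Gaussian Naive Bayes posterior by hand: each feature is modelled as an independent normal distribution within a class, and the class prior is the share of training samples in that class. The synthetic data, the seed, and the helper name manual_predict_proba are assumptions for illustration. scikit-learn also adds a tiny variance smoothing term, so the two results should agree only up to a small tolerance.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic data in the spirit of this post (size and seed are assumptions)
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 10, size=2000),
    rng.integers(0, 20, size=2000),
    rng.integers(0, 5, size=2000),
]).astype(float)
total = X.sum(axis=1)
y = np.select([total > 25, total < 12], ["high", "low"], default="normal")

model = GaussianNB().fit(X, y)


def manual_predict_proba(X_train, y_train, X_new):
    """Hypothetical helper: Gaussian Naive Bayes posterior computed with NumPy only."""
    classes = np.unique(y_train)
    log_joint = []
    for k in classes:
        Xk = X_train[y_train == k]
        mean = Xk.mean(axis=0)
        var = Xk.var(axis=0)  # per-feature population variance within the class
        log_prior = np.log(len(Xk) / len(X_train))
        # Sum of per-feature Gaussian log-densities (the "naive" independence assumption)
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * var) + (X_new - mean) ** 2 / var, axis=1
        )
        log_joint.append(log_prior + log_lik)
    log_joint = np.column_stack(log_joint)
    # Normalise in a numerically stable way to get posterior probabilities
    log_joint -= log_joint.max(axis=1, keepdims=True)
    proba = np.exp(log_joint)
    proba /= proba.sum(axis=1, keepdims=True)
    return classes, proba


classes, proba = manual_predict_proba(X, y, X[:100])
print(np.allclose(proba, model.predict_proba(X[:100]), atol=1e-4))
```

The predicted class is simply the column with the largest posterior probability in each row.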
Now, we can check the attributes of our model.
Classes in the model:
>>> model.classes_
array(['high', 'low', 'normal'], dtype='<U6')
The number of training samples observed in each class:
>>> model.class_count_
array([13., 49., 88.])
The prior probability of each class, that is, its share of the training samples:
>>> model.class_prior_
array([0.08666667, 0.32666667, 0.58666667])
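The priors follow directly from the class counts. As a quick check with the counts printed above (150 training samples in total, which is 75% of the 200 rows under the default split), dividing each count by the total gives the same values:

```python
import numpy as np

# Class counts copied from model.class_count_ above
class_count = np.array([13.0, 49.0, 88.0])

# Each prior is the class count divided by the number of training samples
class_prior = class_count / class_count.sum()
print(class_prior)
```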
Predicting test data and checking the results
Finally, we'll predict test data and evaluate the results.
ypred = model.predict(Xtest)

accuracy = accuracy_score(ytest, ypred)
report = classification_report(ytest, ypred)  # true labels first, then predictions
cm = confusion_matrix(ytest, ypred)

print("Classification report:")
print("Accuracy: ", accuracy)
print(report)
print("Confusion matrix:")
print(cm)

Classification report:
Accuracy:  0.96
             precision    recall  f1-score   support

       high       0.00      0.00      0.00         1
        low       1.00      0.93      0.97        15
     normal       0.94      1.00      0.97        34

avg / total       0.94      0.96      0.95        50

Confusion matrix:
[[ 0  0  1]
 [ 0 14  1]
 [ 0  0 34]]

The only "high" sample in the test set was predicted as "normal", which is why that class scores zero precision and recall.
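The accuracy and the per-class scores can be read off the confusion matrix itself. The sketch below uses the matrix printed above, where rows are true classes and columns are predicted classes. Setting precision to 0 for a class that was never predicted is a convention I assume here to avoid dividing by zero.

```python
import numpy as np

# Confusion matrix from above: rows = true class, columns = predicted class
labels = ["high", "low", "normal"]
cm = np.array([[0, 0, 1],
               [0, 14, 1],
               [0, 0, 34]])

true_pos = np.diag(cm)       # correctly classified samples per class
actual = cm.sum(axis=1)      # row sums: true samples per class
predicted = cm.sum(axis=0)   # column sums: predicted samples per class

# Recall = TP / true samples; precision = TP / predicted samples (0 when undefined)
recall = np.divide(true_pos, actual, out=np.zeros(len(labels)), where=actual > 0)
precision = np.divide(true_pos, predicted, out=np.zeros(len(labels)), where=predicted > 0)
accuracy = true_pos.sum() / cm.sum()

for name, p, r in zip(labels, precision, recall):
    print(f"{name:>6}  precision={p:.2f}  recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Here 48 of the 50 test samples lie on the diagonal, which gives the accuracy of 0.96.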
In this post, we have learned how to classify data with a Gaussian Naive Bayes model in Python. Thank you for reading!
The full source code is listed below.

from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

def CreateDataFrame(N):
    columns = ['a', 'b', 'c', 'y']
    df = pd.DataFrame(columns=columns)
    for i in range(N):
        a = np.random.randint(10)
        b = np.random.randint(20)
        c = np.random.randint(5)
        y = "normal"
        if (a + b + c) > 25:
            y = "high"
        elif (a + b + c) < 12:
            y = "low"
        df.loc[i] = [a, b, c, y]
    return df

df = CreateDataFrame(200)

X = df[["a", "b", "c"]]
Y = df["y"]  # 1-D labels
Xtrain, Xtest, ytrain, ytest = train_test_split(X, Y, random_state=0)

model = GaussianNB().fit(Xtrain, ytrain)

ypred = model.predict(Xtest)

accuracy = accuracy_score(ytest, ypred)
report = classification_report(ytest, ypred)  # true labels first, then predictions
cm = confusion_matrix(ytest, ypred)

print("Classification report:")
print("Accuracy: ", accuracy)
print(report)
print("Confusion matrix:")
print(cm)