K-nearest neighbors (k-NN) is a commonly used and easy-to-apply classification method that classifies data by querying the k nearest neighbors of each input. It is an instance-based, non-parametric learning method: the classifier stores the instances in the training dataset and classifies new input by comparing it against those stored instances.
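To make the idea concrete, here is a minimal from-scratch sketch of the nearest-neighbor voting rule. It is illustrative only: the knn_predict function and the toy data are made up for this example, and scikit-learn's implementation is considerably more sophisticated.

```python
import numpy as np
from collections import Counter

def knn_predict(xtrain, ytrain, query, k=3):
    # Euclidean distance from the query point to every training instance
    dists = np.linalg.norm(xtrain - query, axis=1)
    # Indices of the k closest training instances
    nearest = np.argsort(dists)[:k]
    # Majority vote among the labels of the k nearest instances
    return Counter(ytrain[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters of two points each
xtrain = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
ytrain = np.array([0, 0, 1, 1])

print(knn_predict(xtrain, ytrain, np.array([0.05, 0.1]), k=3))  # 0
```

A query near the first cluster has two of its three nearest neighbors labeled 0, so the majority vote returns class 0.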
Scikit-learn provides the KNeighborsClassifier class to implement the k-nearest neighbors method for classification problems. In this tutorial, we'll briefly learn how to classify data by using the KNeighborsClassifier class in Python. The tutorial covers:
- Preparing the data
- Training the model
- Predicting and accuracy check
- Iris dataset classification example
- Source code listing
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report 
Preparing the data
First, we'll generate a random classification dataset with the make_classification() function. The dataset contains 4 classes, 10 features, and 10000 samples.
x, y = make_classification(n_samples=10000, n_features=10, 
                           n_classes=4, 
                           n_clusters_per_class=1)
Then, we'll split the data into train and test parts. Here, we'll hold out 15 percent of it as test data.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size = 0.15)
Training the model
Next, we'll define the classifier by using the KNeighborsClassifier class. The number of neighbors is an important parameter in this method: selecting the right number of neighbors yields more accurate results. Here, we'll set the n_neighbors parameter to 4.
knc = KNeighborsClassifier(n_neighbors=4)
print(knc)
 KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=4, p=2,
                     weights='uniform') 
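Since the result depends on the number of neighbors, a common way to choose it is cross-validation. Below is a sketch using GridSearchCV; the dataset and the search range 1-10 are assumptions for illustration, and the best k will vary with your data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

x, y = make_classification(n_samples=1000, n_features=10,
                           n_classes=4, n_clusters_per_class=1,
                           random_state=0)

# Search candidate neighbor counts with 5-fold cross-validation
grid = GridSearchCV(KNeighborsClassifier(),
                    {"n_neighbors": list(range(1, 11))},
                    cv=5)
grid.fit(x, y)
print(grid.best_params_)
```

grid.best_params_ reports the k that achieved the highest mean cross-validated accuracy, and grid.best_estimator_ is a classifier already refit with that k.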
   
We'll fit the model on the train data. After training the classifier, we'll check the model's accuracy score.
knc.fit(xtrain, ytrain)
score = knc.score(xtrain, ytrain)
print("Training score: ", score)

Training score:  0.8647058823529412
Predicting and accuracy check
Now, we can predict the test data by using the trained model. After the prediction, we'll check the accuracy level by using the confusion_matrix() function.
ypred = knc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)

[[342  19   2   3]
 [ 27 289  16  39]
 [ 16   9 318  46]
 [  5  62  59 248]]   
We can also create a classification report by using the classification_report() function on the predicted data to check the other accuracy metrics.
cr = classification_report(ytest, ypred)
print(cr)
              precision    recall  f1-score   support
           0       0.88      0.93      0.90       366
           1       0.76      0.78      0.77       371
           2       0.81      0.82      0.81       389
           3       0.74      0.66      0.70       374
    accuracy                           0.80      1500
   macro avg       0.80      0.80      0.80      1500
weighted avg       0.80      0.80      0.80      1500
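As a sanity check, the overall accuracy can also be read off the confusion matrix itself: correct predictions sit on the diagonal, so accuracy is the trace divided by the total count. A small illustration with a hypothetical 2x2 matrix:

```python
import numpy as np

# Hypothetical 2x2 confusion matrix (rows: true labels, cols: predicted)
cm = np.array([[40, 10],
               [ 5, 45]])

# Accuracy = correct predictions (diagonal) / all predictions
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.85
```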
Iris dataset classification example
We'll load the Iris dataset with the load_iris() function, extract the x and y parts, then split it into train and test parts.
# Iris dataset example
 
iris = load_iris()
x, y = iris.data, iris.target
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.10) 
Then, we'll use the same method mentioned above. 
 knc = KNeighborsClassifier(n_neighbors = 3)
print(knc)
knc.fit(xtrain, ytrain)
score = knc.score(xtrain, ytrain)
print("Score: ", score)
ypred = knc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
cr = classification_report(ytest, ypred)
print(cr)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')
Score:  0.9555555555555556
[[4 0 0]
 [0 8 0]
 [0 0 3]]
              precision    recall  f1-score   support
           0       1.00      1.00      1.00         4
           1       1.00      1.00      1.00         8
           2       1.00      1.00      1.00         3
    accuracy                           1.00        15
   macro avg       1.00      1.00      1.00        15
weighted avg       1.00      1.00      1.00        15

In this tutorial, we've briefly learned how to classify data by using Scikit-learn's KNeighborsClassifier class in Python. The full source code is listed below.
Source code listing
 from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
x, y = make_classification(n_samples=10000, n_features=10, 
                           n_classes=4, n_clusters_per_class=1)
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)
knc = KNeighborsClassifier(n_neighbors=4)
print(knc)
knc.fit(xtrain, ytrain)
score = knc.score(xtrain, ytrain)
print("Training score: ", score)
ypred = knc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
cr = classification_report(ytest, ypred)
print(cr)
# Iris dataset example
iris = load_iris()
x, y = iris.data, iris.target
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.10)
knc = KNeighborsClassifier(n_neighbors=3)
print(knc)
knc.fit(xtrain, ytrain)
score = knc.score(xtrain, ytrain)
print("Score: ", score)
ypred = knc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
cr = classification_report(ytest, ypred)
print(cr)  