Classification Example with RadiusNeighborsClassifier in Python

    RadiusNeighborsClassifier is a nearest-neighbor classification method that implements radius-based neighbor classification: each query point is classified from the training points that lie within a fixed radius of it.

    Nearest-neighbor classification is an instance-based learning method. In this type of learning, the algorithm keeps the training instances in memory and compares the test data against them.
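
    To make the idea concrete, here is a minimal from-scratch sketch of a radius-based vote written with NumPy. The helper name radius_vote and the toy points are made up for illustration; this is not how Scikit-learn implements the classifier internally, only the same idea in a few lines.

```python
from collections import Counter

import numpy as np


def radius_vote(x_train, y_train, query, radius=1.0):
    """Majority label among the training points within `radius` of `query`."""
    # Euclidean distance from the query to every stored training instance
    dists = np.linalg.norm(x_train - query, axis=1)
    inside = y_train[dists <= radius]
    if inside.size == 0:
        return None  # no training point falls inside the radius
    return Counter(inside.tolist()).most_common(1)[0][0]


# Toy data (hypothetical): one group near the origin, one near (3, 3)
x_train = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [3.0, 3.0], [3.5, 3.0]])
y_train = np.array([0, 0, 0, 1, 1])

near_zero = radius_vote(x_train, y_train, np.array([0.2, 0.2]))
near_one = radius_vote(x_train, y_train, np.array([3.2, 3.1]))
far_away = radius_vote(x_train, y_train, np.array([10.0, 10.0]))
print(near_zero, near_one, far_away)
```

    The last query has no training point within the radius, so the sketch returns None for it. This "no neighbors" case is the main practical difference from k-nearest-neighbor classification, where a query always has k neighbors.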

   In this tutorial, we'll briefly learn how to classify data by using Scikit-learn's RadiusNeighborsClassifier class in Python. The tutorial covers:
  1. Preparing the data
  2. Training the model
  3. Predicting and accuracy check
  4. Iris dataset classification example
  5. Source code listing
   We'll start by loading the required libraries and functions.

from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score

Preparing the data

    First, we'll generate a random classification dataset with the make_classification() function. The dataset contains 5000 samples with 5 features and 2 classes.

x, y = make_classification(n_samples=5000, n_features=5, 
                           n_classes=2, n_clusters_per_class=1)


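Then, we'll extract 15 percent of the dataset as test data and use all of x and y as training data. Here, we only take the test part from the split. With a fixed radius, an unseen test sample may have no training samples within the radius, and in that case predict() raises an error when no outlier_label is set. Training on all the data guarantees that every test sample is covered. Keep in mind that the test samples are then part of the training data too, so the test metrics below are optimistic.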

_, xtest, _, ytest = train_test_split(x, y, test_size=0.15)
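
For comparison, a more conventional hold-out split keeps the test rows out of the training data. The sketch below is an alternative to the split used in this tutorial, not a replacement for it; the random_state and stratify arguments and the xtrain/ytrain names are additions made here for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same dataset shape as in the tutorial; random_state is added for repeatability
x, y = make_classification(n_samples=5000, n_features=5,
                           n_classes=2, n_clusters_per_class=1,
                           random_state=0)

# Keep both halves of the split so the test rows stay unseen during training
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0, stratify=y)

print(xtrain.shape, xtest.shape)
```

With this kind of split, a test sample can end up with no training neighbors inside the radius, so the classifier would likely need an outlier_label or a larger radius.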
 


Training the model

    Next, we'll define the classifier by using the RadiusNeighborsClassifier class with its default parameters.

rnc = RadiusNeighborsClassifier()
print(rnc) 
 
RadiusNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, outlier_label=None,
p=2, radius=1.0, weights='uniform')  
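
Two of these parameters matter most in practice: radius and outlier_label. The small sketch below uses made-up one-dimensional points to show what happens to a query that has no training neighbors within the radius, first with the default outlier_label=None and then with outlier_label="most_frequent". The variable names are illustrative only.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

# Toy 1-D data (hypothetical): class 0 near 0, class 1 near 5
x_train = np.array([[0.0], [0.2], [0.4], [5.0], [5.2]])
y_train = np.array([0, 0, 0, 1, 1])

# The last query point is far from every training point
x_query = np.array([[0.1], [5.1], [50.0]])

# Default settings: a query without neighbors makes predict() fail
strict = RadiusNeighborsClassifier(radius=1.0).fit(x_train, y_train)
try:
    strict.predict(x_query)
    raised = False
except ValueError:
    raised = True

# With outlier_label set, such queries get the most frequent training label
lenient = RadiusNeighborsClassifier(radius=1.0, outlier_label="most_frequent")
lenient.fit(x_train, y_train)
labels = lenient.predict(x_query)

print(raised, labels)
```

A custom label such as -1 can also be passed as outlier_label, which makes the unclassified samples easy to spot afterwards.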

Then, we'll fit it on the x and y data. After training the classifier, we'll check the model accuracy score.

rnc.fit(x, y)

score = rnc.score(x, y)
print("Training score: ", score)

Training score:  0.9606
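
The training score is measured on the same rows the model has stored, so it tends to be optimistic. One way to get a less biased estimate is cross-validation with cross_val_score(), which is already imported above. The sketch below is an illustration under two assumptions that are not part of the tutorial's setup: random_state is fixed, and outlier_label="most_frequent" is set so that held-out samples without neighbors do not raise an error.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import RadiusNeighborsClassifier

x, y = make_classification(n_samples=5000, n_features=5,
                           n_classes=2, n_clusters_per_class=1,
                           random_state=0)

# outlier_label is an assumption added here; the tutorial uses the defaults
rnc = RadiusNeighborsClassifier(outlier_label="most_frequent")

# 5-fold cross-validation: each fold is scored on rows not used for fitting
scores = cross_val_score(rnc, x, y, cv=5)
print("CV accuracy: %.4f (+/- %.4f)" % (scores.mean(), scores.std()))
```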


Predicting and accuracy check

     Now, we can predict the test data by using the trained model. After the prediction, we'll check the accuracy level by using the confusion matrix function.

ypred = rnc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm) 
 
[[383  18]
 [ 26 323]]
 
We can also create a classification report by using the classification_report() function on the predicted data to check other accuracy metrics.

cr = classification_report(ytest, ypred)
print(cr)

              precision    recall  f1-score   support

           0       0.94      0.96      0.95       401
           1       0.95      0.93      0.94       349

    accuracy                           0.94       750
   macro avg       0.94      0.94      0.94       750
weighted avg       0.94      0.94      0.94       750
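
The numbers in the report follow directly from the confusion matrix printed earlier. As a small worked example, the sketch below recomputes them with NumPy from the four counts shown above (rows are true classes, columns are predicted classes).

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted
cm = np.array([[383, 18],
               [26, 323]])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all samples
precision = np.diag(cm) / cm.sum(axis=0)    # per class: correct / predicted as class
recall = np.diag(cm) / cm.sum(axis=1)       # per class: correct / true members
f1 = 2 * precision * recall / (precision + recall)

print("accuracy :", round(accuracy, 2))
print("precision:", precision.round(2))
print("recall   :", recall.round(2))
print("f1-score :", f1.round(2))
```

Rounded to two decimals, these values agree with the report: for example, the precision of class 0 is 383 / (383 + 26), which rounds to 0.94.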


The Area Under the Curve (AUC) for the predicted data is shown below.

auc_y = roc_auc_score(ytest, ypred)
print("ROC AUC y: %.4f" % auc_y)
 
ROC AUC y: 0.9403 
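
Note that roc_auc_score() is given hard class labels here. For a binary problem this reduces to the average of the two recall values, which is why the result is close to the accuracy. AUC is more commonly computed from scores or probabilities, which the classifier exposes through predict_proba(). The sketch below compares both; the hold-out split, random_state, and outlier_label="most_frequent" are assumptions added for this example and differ from the tutorial's setup.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import RadiusNeighborsClassifier

x, y = make_classification(n_samples=5000, n_features=5,
                           n_classes=2, n_clusters_per_class=1,
                           random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0)

# outlier_label avoids an error for test rows without neighbors in the radius
rnc = RadiusNeighborsClassifier(outlier_label="most_frequent")
rnc.fit(xtrain, ytrain)

# AUC from hard labels (as in the tutorial) vs. from class-1 probabilities
auc_labels = roc_auc_score(ytest, rnc.predict(xtest))
auc_proba = roc_auc_score(ytest, rnc.predict_proba(xtest)[:, 1])

print("ROC AUC (labels):        %.4f" % auc_labels)
print("ROC AUC (probabilities): %.4f" % auc_proba)
```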
 

Iris dataset classification example

    In this part of the tutorial, we'll apply the same method to classify the Iris dataset. First, we'll load the Iris dataset with the load_iris() function, extract the x and y parts, and then get the test data to predict.

print("Iris dataset classification")

iris = load_iris()
x, y = iris.data, iris.target
_, xtest, _, ytest = train_test_split(x, y, test_size=0.15)

Then, we'll fit the classifier, predict test data, and check the accuracy.

rnc = RadiusNeighborsClassifier()
print(rnc)

rnc.fit(x, y)
score = rnc.score(x, y)
print("Score: ", score)

ypred = rnc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)

Iris dataset classification
RadiusNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, outlier_label=None,
p=2, radius=1.0, weights='uniform')
Score: 0.9733333333333334
[[11  0  0]
 [ 0  5  0]
 [ 0  1  6]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       0.83      1.00      0.91         5
           2       1.00      0.86      0.92         7

    accuracy                           0.96        23
   macro avg       0.94      0.95      0.94        23
weighted avg       0.96      0.96      0.96        23
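
The default radius of 1.0 is measured in the units of the features, so a value that works for one dataset may not suit another. One possible way to choose it is a small grid search with cross-validation. The sketch below is only an illustration: the candidate radius values and the outlier_label="most_frequent" setting are assumptions picked for this example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import RadiusNeighborsClassifier

x, y = load_iris(return_X_y=True)

# Candidate radii are arbitrary illustration values
param_grid = {"radius": [0.5, 1.0, 1.5, 2.0]}

# outlier_label keeps small radii from failing on samples without neighbors
grid = GridSearchCV(
    RadiusNeighborsClassifier(outlier_label="most_frequent"),
    param_grid=param_grid,
    cv=5,
)
grid.fit(x, y)

print("Best radius:", grid.best_params_["radius"])
print("Best CV accuracy: %.4f" % grid.best_score_)
```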


    In this tutorial, we've briefly learned how to classify data by using Scikit-learn's RadiusNeighborsClassifier class in Python. The full source code is listed below.


Source code listing

from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score

x, y = make_classification(n_samples=5000, n_features=5, 
                           n_classes=2, n_clusters_per_class=1)

_, xtest, _, ytest=train_test_split(x, y, test_size=0.15)

rnc = RadiusNeighborsClassifier()
print(rnc)

rnc.fit(x, y)

score = rnc.score(x, y)
print("Training score: ", score)

ypred = rnc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)

auc_y = roc_auc_score(ytest, ypred)
print("ROC AUC y: %.4f" % auc_y)

# Iris dataset example
print("Iris dataset classification")
iris = load_iris()
x, y = iris.data, iris.target
_, xtest, _, ytest = train_test_split(x, y, test_size=0.15)

rnc = RadiusNeighborsClassifier()
print(rnc)

rnc.fit(x, y)
score = rnc.score(x, y)
print("Score: ", score)

ypred = rnc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr) 
 

