
### Classification Example with Nearest Centroid in Python

Nearest centroid is a simple classification algorithm that represents each class by the centroid (mean) of its training samples and assigns a new sample to the class with the closest centroid. It has no parameters that must be set before training. The Scikit-learn API provides the NearestCentroid class for this algorithm.
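To make the idea concrete, here is a minimal NumPy sketch of the rule described above. It is only an illustration under simple assumptions (Euclidean distance, no shrinkage), not Scikit-learn's implementation; the helper names `fit_centroids` and `predict_nearest` and the toy data are made up for this example.

```python
import numpy as np

def fit_centroids(x, y):
    """Return the class labels and the mean vector (centroid) of each class."""
    classes = np.unique(y)
    centroids = np.array([x[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_nearest(x, classes, centroids):
    """Assign each sample to the class whose centroid is closest (Euclidean)."""
    # distances has shape (n_samples, n_classes)
    distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]

# Hypothetical toy data: two well-separated groups in 2-D
x_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                   [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

classes, centroids = fit_centroids(x_demo, y_demo)
demo_pred = predict_nearest(np.array([[0.5, 0.5], [5.5, 5.5]]), classes, centroids)
print(centroids)
print(demo_pred)  # [0 1]
```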

In this tutorial, we'll briefly learn how to classify data by using Scikit-learn's NearestCentroid class in Python. The tutorial covers:
1. Preparing the data
2. Training the model
3. Predicting and accuracy check
4. Iris dataset classification example
5. Source code listing

```
from sklearn.neighbors import NearestCentroid
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
```

Preparing the data

First, we'll generate a random classification dataset with the make_classification() function. The dataset contains 5000 samples with 10 features, divided into 3 classes.

```
x, y = make_classification(n_samples=5000, n_features=10,
                           n_classes=3,
                           n_clusters_per_class=1)
```
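As a quick sanity check, we can look at the shape of the generated data and the number of samples per class. The sketch below uses separate variable names (`x_chk`, `y_chk`) and a fixed `random_state=0`; both are assumptions added here so the check is reproducible and does not overwrite the tutorial's variables.

```python
import numpy as np
from sklearn.datasets import make_classification

# random_state is an addition for reproducibility; the tutorial leaves it unset
x_chk, y_chk = make_classification(n_samples=5000, n_features=10,
                                   n_classes=3,
                                   n_clusters_per_class=1,
                                   random_state=0)

print(x_chk.shape)  # (5000, 10)

# Class labels and how many samples each class received
labels, counts = np.unique(y_chk, return_counts=True)
print(labels, counts)
```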

Then, we'll split the data into train and test parts. Here, we'll extract 15 percent of it as test data.

```xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
```
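Because the split is random, the scores shown in this tutorial will differ slightly from run to run. If repeatable results are needed, one option is to pass `random_state`, and `stratify` keeps the class proportions similar in both parts. The sketch below is a variation on the call above, not the tutorial's original code, and uses its own variable names.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

x_s, y_s = make_classification(n_samples=5000, n_features=10,
                               n_classes=3,
                               n_clusters_per_class=1,
                               random_state=0)

# random_state fixes the shuffle; stratify=y_s preserves class proportions
x_tr, x_te, y_tr, y_te = train_test_split(x_s, y_s, test_size=0.15,
                                          random_state=0, stratify=y_s)

print(x_tr.shape, x_te.shape)  # (4250, 10) (750, 10)
```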

Training the model

Next, we'll define the classifier with the NearestCentroid class and fit it on the training data.

```
nc = NearestCentroid()
nc.fit(xtrain, ytrain)
```
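After fitting, the learned centroids can be inspected through the `centroids_` and `classes_` attributes. The following sketch, on freshly generated data with a fixed seed (an assumption added here), checks that with the default settings each centroid equals the per-class mean of the training samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestCentroid

x_c, y_c = make_classification(n_samples=5000, n_features=10,
                               n_classes=3,
                               n_clusters_per_class=1,
                               random_state=0)

model = NearestCentroid().fit(x_c, y_c)
print(model.classes_)          # [0 1 2]
print(model.centroids_.shape)  # (3, 10): one centroid per class

# Recompute the per-class means by hand and compare
manual = np.array([x_c[y_c == c].mean(axis=0) for c in model.classes_])
same = np.allclose(model.centroids_, manual)
print(same)
```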

After training the classifier, we'll check the model's accuracy score on the training data.

```score = nc.score(xtrain, ytrain)
print("Score: ", score)

Score:  0.8296470588235294
```

We can also apply cross-validation to the model and check the average score across the folds.

```cv_scores = cross_val_score(nc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())
```
`CV average score: 0.83`
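The mean hides how much the score varies between folds. As a small sketch (again on separately generated data with a fixed seed, which is an assumption of this example), we can print the individual fold scores and their standard deviation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid

x_cv, y_cv = make_classification(n_samples=5000, n_features=10,
                                 n_classes=3,
                                 n_clusters_per_class=1,
                                 random_state=0)

# One accuracy value per fold
fold_scores = cross_val_score(NearestCentroid(), x_cv, y_cv, cv=10)
print(fold_scores.round(2))
print("CV mean: %.2f, std: %.2f" % (fold_scores.mean(), fold_scores.std()))
```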

Predicting and accuracy check

Now, we can predict the test data by using the trained model. After the prediction, we'll check the accuracy level by using the confusion matrix function.

```
ypred = nc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

[[212  41   0]
 [  2 212  45]
 [  0  44 194]]
```
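In the confusion matrix, rows are the true classes and columns are the predicted classes, so the diagonal holds the correctly classified samples. As a worked example, the overall accuracy can be recovered from the matrix printed above by dividing the diagonal sum by the total; the array below simply copies those numbers.

```python
import numpy as np

# Confusion matrix copied from the output above
cm_doc = np.array([[212,  41,   0],
                   [  2, 212,  45],
                   [  0,  44, 194]])

correct = np.trace(cm_doc)   # 212 + 212 + 194 = 618
total = cm_doc.sum()         # 750 test samples
accuracy = correct / total
print(round(accuracy, 3))    # 0.824
```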

We can also create a classification report with the classification_report() function to check other metrics such as precision, recall, and F1-score for each class.

```
cr = classification_report(ytest, ypred)
print(cr)

              precision    recall  f1-score   support

           0       0.99      0.84      0.91       253
           1       0.71      0.82      0.76       259
           2       0.81      0.82      0.81       238

    accuracy                           0.82       750
   macro avg       0.84      0.82      0.83       750
weighted avg       0.84      0.82      0.83       750
```
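The per-class values in the report follow directly from the confusion matrix: precision divides each diagonal entry by its column sum (all samples predicted as that class), and recall divides it by its row sum (all samples that truly belong to that class). The sketch below recomputes them from the matrix printed earlier, as an illustration of the arithmetic rather than of how Scikit-learn computes them internally.

```python
import numpy as np

# Confusion matrix copied from the earlier output (rows: true, columns: predicted)
cm_doc = np.array([[212,  41,   0],
                   [  2, 212,  45],
                   [  0,  44, 194]])

diag = np.diag(cm_doc)
precision = diag / cm_doc.sum(axis=0)   # column sums: predicted counts
recall = diag / cm_doc.sum(axis=1)      # row sums: true counts (support)
f1 = 2 * precision * recall / (precision + recall)

print(np.round(precision, 2))  # [0.99 0.71 0.81]
print(np.round(recall, 2))     # [0.84 0.82 0.82]
print(np.round(f1, 2))         # [0.91 0.76 0.81]
```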

Iris dataset classification example

We'll load the Iris dataset with the load_iris() function, extract the x and y parts, and then split them into train and test parts.

```
print("Iris dataset classification with NearestCentroid")
iris = load_iris()
x, y = iris.data, iris.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
```

Then, we'll apply the same steps as above.

```
nc = NearestCentroid()
print(nc)

nc.fit(xtrain, ytrain)
score = nc.score(xtrain, ytrain)
print("Score: ", score)

cv_scores = cross_val_score(nc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

ypred = nc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)
```

```
Iris dataset classification with NearestCentroid
NearestCentroid()
Score:  0.9212598425196851
CV average score: 0.92
[[ 6  0  0]
 [ 0 12  0]
 [ 0  0  5]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         6
           1       1.00      1.00      1.00        12
           2       1.00      1.00      1.00         5

    accuracy                           1.00        23
   macro avg       1.00      1.00      1.00        23
weighted avg       1.00      1.00      1.00        23
```
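Since Iris has only four features, the fitted centroids are easy to read. As an optional sketch, we can fit the classifier on the full dataset (not the train split used above, which is a simplification made for this example) and print one centroid per species next to the feature names.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestCentroid

iris_demo = load_iris()
iris_model = NearestCentroid().fit(iris_demo.data, iris_demo.target)

print(iris_demo.feature_names)
# One row of mean feature values per species
for name, centroid in zip(iris_demo.target_names, iris_model.centroids_):
    print(name, centroid.round(2))

full_score = iris_model.score(iris_demo.data, iris_demo.target)
print("Accuracy on the full data: %.2f" % full_score)
```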

In this tutorial, we've briefly learned how to classify data by using Scikit-learn's NearestCentroid class in Python. The full source code is listed below.

Source code listing

```
from sklearn.neighbors import NearestCentroid
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

# Synthetic dataset: 5000 samples, 10 features, 3 classes
x, y = make_classification(n_samples=5000, n_features=10,
                           n_classes=3,
                           n_clusters_per_class=1)

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)

nc = NearestCentroid()
nc.fit(xtrain, ytrain)

# Accuracy on the training data
score = nc.score(xtrain, ytrain)
print("Score: ", score)

cv_scores = cross_val_score(nc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

ypred = nc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)

# Iris dataset classification
print("Iris dataset classification with NearestCentroid")
iris = load_iris()
x, y = iris.data, iris.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)

nc = NearestCentroid()
nc.fit(xtrain, ytrain)

score = nc.score(xtrain, ytrain)
print("Score: ", score)

cv_scores = cross_val_score(nc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

ypred = nc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)
```
