DataTechNotes: Classification Example with Linear SVC in Python

The Linear Support Vector Classifier (LinearSVC) performs classification with a linear kernel and scales well to a large number of samples. Compared with the SVC model, LinearSVC offers additional parameters: the penalty norm ('l1' or 'l2') and the loss function. The kernel cannot be changed in LinearSVC, because the method is built around the linear kernel.

In this tutorial, we'll briefly learn how to classify data by using Scikit-learn's LinearSVC class in Python. The tutorial covers:

Preparing the data
Training the model
Predicting and accuracy check
Iris dataset classification example
Video tutorial
Source code listing

We'll start by loading the required libraries.

from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

Preparing the data

First, we'll generate a random classification dataset with the make_classification() function. The dataset contains 5000 samples with 10 features, divided into 3 classes.

x, y = make_classification(n_samples=5000, n_features=10, 
                           n_classes=3, 
                           n_clusters_per_class=1)

Then, we'll split the data into train and test parts. Here, we'll extract 15 percent of it as test data.

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
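Because both make_classification() and train_test_split() draw random numbers, the scores shown in this tutorial will differ from run to run. A minimal sketch of a reproducible variant follows; the seed value 42 and the stratify=y argument are assumptions added for illustration and are not part of the original code.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# random_state=42 is an arbitrary seed chosen for this sketch
x, y = make_classification(n_samples=5000, n_features=10,
                           n_classes=3,
                           n_clusters_per_class=1,
                           random_state=42)

# stratify=y keeps the class proportions similar in both parts
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, stratify=y, random_state=42)

print(xtrain.shape, xtest.shape)
```

With 5000 samples and test_size=0.15, the test part holds 750 rows and the training part 4250.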

Training the model

Next, we'll define the classifier by using the LinearSVC class. We can use the default parameters of the class, or change them to suit the data being classified.

lsvc = LinearSVC(verbose=0)
print(lsvc)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=1000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)
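The printed defaults show penalty='l2' and loss='squared_hinge'. As a rough illustration of changing them, the sketch below switches to the L1 penalty, which in scikit-learn has to be combined with dual=False. The values C=0.1 and max_iter=5000, the seed, and the smaller sample size are assumptions picked for the example. Note that recent scikit-learn versions print only the parameters that differ from the defaults, so get_params() is used here to list them all.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Small synthetic dataset; the seed is arbitrary
x, y = make_classification(n_samples=500, n_features=10,
                           n_classes=3,
                           n_clusters_per_class=1,
                           random_state=0)

# The L1 penalty is only supported with the squared hinge loss
# and the primal formulation (dual=False)
l1_svc = LinearSVC(penalty='l1', loss='squared_hinge',
                   dual=False, C=0.1, max_iter=5000)
l1_svc.fit(x, y)

# get_params() returns every parameter, including the defaults
params = l1_svc.get_params()
print(params['penalty'], params['loss'], params['dual'])

# One row of coefficients per class (one-vs-rest), one column per feature
print(l1_svc.coef_.shape)
```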

Then, we'll fit the model on the training data and check its accuracy score on that same data.

lsvc.fit(xtrain, ytrain)
score = lsvc.score(xtrain, ytrain)
print("Score: ", score)

Score:  0.8602352941176471
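For a classifier, score() returns the mean accuracy on the data passed to it, so the value above is the training accuracy. The sketch below checks that equivalence against accuracy_score() and also scores the held-out part for comparison; the seeds and max_iter=5000 are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

x, y = make_classification(n_samples=5000, n_features=10,
                           n_classes=3,
                           n_clusters_per_class=1,
                           random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0)

lsvc = LinearSVC(max_iter=5000)
lsvc.fit(xtrain, ytrain)

# score() on a classifier is the fraction of correctly predicted labels
train_score = lsvc.score(xtrain, ytrain)
manual_score = accuracy_score(ytrain, lsvc.predict(xtrain))

# Accuracy on data the model has not seen during fitting
test_score = lsvc.score(xtest, ytest)

print("Train score:", train_score)
print("Manual train score:", manual_score)
print("Test score:", test_score)
```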

We can also apply a cross-validation training method to the model and check the training score.

cv_scores = cross_val_score(lsvc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

CV average score: 0.86
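cross_val_score() returns one score per fold, and the average printed above hides how much the folds vary. A small sketch that prints the individual fold scores and their standard deviation is given below; the dataset seed and max_iter=5000 are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

x, y = make_classification(n_samples=2000, n_features=10,
                           n_classes=3,
                           n_clusters_per_class=1,
                           random_state=0)

lsvc = LinearSVC(max_iter=5000)

# cv=10 fits the model ten times, each time holding out a different fold
cv_scores = cross_val_score(lsvc, x, y, cv=10)

print("Fold scores:", cv_scores.round(3))
print("CV average score: %.2f" % cv_scores.mean())
print("CV score std: %.3f" % cv_scores.std())
```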

Predicting and accuracy check

Now, we can predict the test data by using the trained model. After the prediction, we'll check the accuracy with the confusion_matrix() function.

ypred = lsvc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

[[196  46  30]
 [  5 213  10]
 [ 26   7 217]]
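In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes, so the diagonal counts the correct predictions. As a worked example, the sketch below derives accuracy, recall, and precision from the matrix printed above using NumPy only.

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[196,  46,  30],
               [  5, 213,  10],
               [ 26,   7, 217]])

# Correct predictions lie on the diagonal
accuracy = np.trace(cm) / cm.sum()

# Recall: correct predictions divided by the row total (true class size)
recall = np.diag(cm) / cm.sum(axis=1)

# Precision: correct predictions divided by the column total (predicted class size)
precision = np.diag(cm) / cm.sum(axis=0)

print("Accuracy: %.2f" % accuracy)
print("Recall:", recall.round(2))
print("Precision:", precision.round(2))
```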

We can also create a classification report with the classification_report() function to check precision, recall, and F1-score for each class.

cr = classification_report(ytest, ypred)
print(cr)

              precision    recall  f1-score   support

           0       0.86      0.72      0.79       272
           1       0.80      0.93      0.86       228
           2       0.84      0.87      0.86       250

    accuracy                           0.83       750
   macro avg       0.84      0.84      0.83       750
weighted avg       0.84      0.83      0.83       750
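When the metrics are needed as numbers rather than as printed text, classification_report() accepts output_dict=True. The sketch below is an illustration only: since the original test labels are not available, it rebuilds equivalent label arrays from the confusion matrix shown earlier, which is an assumption of this example and not a step in the tutorial.

```python
import numpy as np
from sklearn.metrics import classification_report

# Confusion matrix from the tutorial output (rows: true, columns: predicted)
cm = np.array([[196,  46,  30],
               [  5, 213,  10],
               [ 26,   7, 217]])

# Rebuild label arrays: cell (i, j) holds samples of true class i predicted as j
ytrue = np.repeat([0, 1, 2], cm.sum(axis=1))
ypred = np.concatenate([np.repeat([0, 1, 2], row) for row in cm])

# With output_dict=True, class labels become string keys
report = classification_report(ytrue, ypred, output_dict=True)

print("Accuracy: %.2f" % report['accuracy'])
print("Class 1 recall: %.2f" % report['1']['recall'])
print("Macro avg f1-score: %.2f" % report['macro avg']['f1-score'])
```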

Iris dataset classification example

We'll load the Iris dataset with the load_iris() function, extract the x and y parts, and then split them into train and test parts.

print("Iris dataset classification with SVC")

iris = load_iris()
x, y = iris.data, iris.target
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)

Then, we'll use the same method mentioned above.

lsvc = LinearSVC(verbose=0)
print(lsvc)

lsvc.fit(xtrain, ytrain)
score = lsvc.score(xtrain, ytrain)
print("Score: ", score)

cv_scores = cross_val_score(lsvc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

ypred = lsvc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr) 


Iris dataset classification with SVC
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=1000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)
Score:  0.9763779527559056
CV average score: 0.95
[[7 0 0]
 [0 7 0]
 [0 1 8]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       0.88      1.00      0.93         7
           2       1.00      0.89      0.94         9

    accuracy                           0.96        23
   macro avg       0.96      0.96      0.96        23
weighted avg       0.96      0.96      0.96        23
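The class labels 0, 1, and 2 in the output above are indices into the dataset's target_names array. A short sketch of mapping predictions back to species names follows; the seed and max_iter=10000 are assumptions added for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

iris = load_iris()
x, y = iris.data, iris.target
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0)

# A higher max_iter gives the solver more room to converge on unscaled data
lsvc = LinearSVC(max_iter=10000)
lsvc.fit(xtrain, ytrain)
ypred = lsvc.predict(xtest)

# target_names is an array of species names indexed by class label
predicted_names = iris.target_names[ypred]
print(predicted_names[:5])
```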

In this tutorial, we've briefly learned how to classify data by using Scikit-learn's LinearSVC class in Python. The full source code is listed below.

Video tutorial

https://youtu.be/t3-0PU5LvNk

Source code listing

from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

x, y = make_classification(n_samples=5000, n_features=10, 
                           n_classes=3, 
                           n_clusters_per_class=1)

xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)

lsvc = LinearSVC()
print(lsvc)

lsvc.fit(xtrain, ytrain)
score = lsvc.score(xtrain, ytrain)
print("Score: ", score)

cv_scores = cross_val_score(lsvc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

ypred = lsvc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr) 


# Iris dataset classification
print("Iris dataset classification with SVC")
iris = load_iris()
x, y = iris.data, iris.target
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)

lsvc = LinearSVC()
print(lsvc)

lsvc.fit(xtrain, ytrain)
score = lsvc.score(xtrain, ytrain)
print("Score: ", score)

cv_scores = cross_val_score(lsvc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

ypred = lsvc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)

References:

Scikit learn API
