DataTechNotes: Anomaly Detection Example with One-Class SVM in Python

A One-class classification method is used to detect the outliers and anomalies in a dataset. Based on Support Vector Machines (SVM) evaluation, the One-class SVM applies a One-class classification method for novelty detection.
In this tutorial, we'll briefly learn how to detect anomaly in a dataset by using the One-class SVM method in Python. The Scikit-learn API provides the OneClassSVM class for this algorithm and we'll use it in this tutorial. The tutorial covers:

Preparing the data
Defining the model and prediction
Anomaly detection with scores
Source code listing

If you want to know other anomaly detection methods, please check out my A Brief Explanation of 8 Anomaly Detection Methods with Python tutorial.

We'll start by loading the required libraries for this tutorial.

from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt

Preparing the data

We'll create a random sample dataset for this tutorial by using the make_blob() function. We'll check the dataset by visualizing it in a plot.

random.seed(13)
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(8, 8))

plt.scatter(x[:,0], x[:,1])
plt.show()

Defining the model and prediction

We'll define the model by using the OneClassSVM class of Scikit-learn API. Here, we'll set RBF for kernel type and define the gamma and the 'nu' arguments.

svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.03)
print(svm)

OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma=0.001, kernel='rbf',
            max_iter=-1, nu=0.03, shrinking=True, tol=0.001, verbose=False)

We'll fit the model with x dataset and get the prediction data by using the fit() and predict() method.

svm.fit(x)
pred = svm.predict(x)

Next, we'll extract the negative outputs as the outliers.

anom_index = where(pred==-1)
values = x[anom_index]

Finally, we'll visualize the results in a plot by highlighting the anomalies with a color.

plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()

Anomaly detection with scores

We can find anomalies by using their scores. In this method, we'll define the model, fit it on the x data by using the fit_predict() method. We'll calculate the outliers according to the score value of each element.

svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.02)
print(svm)

Next, we'll fit the model on x dataset, then extract the samples score.

pred = svm.fit_predict(x)
scores = svm.score_samples(x)

Next, we'll obtain the threshold value from the scores by using the quantile function. Here, we'll get the lowest 3 percent of score values as the anomalies.

thresh = quantile(scores, 0.03)
print(thresh)

3.994389673293594

Next, we'll extract the anomalies by comparing the threshold value and identify the values of elements.

index = where(scores<=thresh)
values = x[index]

Finally, we can visualize the results in a plot by highlighting the anomalies with a color.

plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()

In this tutorial, we've learned how to detect the anomalies with the One-class SVM method by using the Scikit-learn's OneClassSVM class in Python. We've seen two types of outlier detection methods with OneClassSVM. The full source code is listed below.

Source code listing

from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt

random.seed(13)
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(8, 8))

plt.scatter(x[:,0], x[:,1])
plt.show()

svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.03)
print(svm)

svm.fit(x)
pred = svm.predict(x)
anom_index = where(pred==-1)
values = x[anom_index]

plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()

svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.02)
print(svm)

pred = svm.fit_predict(x)
scores = svm.score_samples(x)

thresh = quantile(scores, 0.03)
print(thresh)
index = where(scores<=thresh)
values = x[index]

plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()

References:

Scikit-learn One-class SVM

DataTechNotes

Pages

Anomaly Detection Example with One-Class SVM in Python

No comments:

Post a Comment