In this tutorial, we'll briefly look at how to detect anomalies in a dataset with the One-class SVM method in Python. Scikit-learn provides the OneClassSVM class for this algorithm, and we'll use it throughout. The tutorial covers:

- Preparing the data
- Defining the model and prediction
- Anomaly detection with scores
- Source code listing

We'll start by importing the required libraries and functions.

    from sklearn.svm import OneClassSVM
    from sklearn.datasets import make_blobs
    from numpy import quantile, where, random
    import matplotlib.pyplot as plt

**Preparing the data**

We'll create a random sample dataset for this tutorial with the make_blobs() function. Seeding NumPy's global random generator first keeps the sample reproducible. We'll then check the dataset by visualizing it in a scatter plot.

    random.seed(13)
    x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(8, 8))

    plt.scatter(x[:,0], x[:,1])
    plt.show()
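As an alternative to seeding the global generator, make_blobs() also accepts a random_state argument. The sketch below is only an illustration of that option: the names x_a and x_b are hypothetical, and the data it produces is not guaranteed to match the sample drawn with random.seed(13) above.

```python
from sklearn.datasets import make_blobs

# Two calls with the same random_state should return identical samples.
x_a, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                    center_box=(8, 8), random_state=13)
x_b, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                    center_box=(8, 8), random_state=13)

print(x_a.shape)            # 200 rows, 2 features
print((x_a == x_b).all())   # the two samples are element-wise equal
print(x_a.mean(axis=0))     # close to the single center at (8, 8)
```

Passing random_state keeps the reproducibility local to this call instead of relying on global state.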

**Defining the model and prediction**

We'll define the model with the OneClassSVM class of the Scikit-learn API. Here, we'll use the RBF kernel and set the gamma and nu arguments.

    svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.03)
    print(svm)

    OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma=0.001, kernel='rbf',
                max_iter=-1, nu=0.03, shrinking=True, tol=0.001, verbose=False)
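To get a feel for the nu argument, which Scikit-learn documents as an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, we can fit the model with a few values and compare how many samples get flagged. This is a rough sketch: the nu values and the names x_demo and outlier_fraction are chosen only for illustration, and the flagged fraction may deviate from nu in practice, especially with a very small gamma.

```python
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs

# Illustrative data, generated the same way as in the tutorial
# but with an explicit random_state (an assumption of this sketch).
x_demo, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                       center_box=(8, 8), random_state=13)

outlier_fraction = {}
for nu in (0.01, 0.03, 0.1):
    model = OneClassSVM(kernel='rbf', gamma=0.001, nu=nu)
    pred_demo = model.fit_predict(x_demo)
    # Share of training samples labeled -1 (outlier) for this nu.
    outlier_fraction[nu] = float((pred_demo == -1).mean())
    print(nu, outlier_fraction[nu])
```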

We'll fit the model on the x dataset with the fit() method and get the predictions with the predict() method.

    svm.fit(x)
    pred = svm.predict(x)

The predict() method returns 1 for inliers and -1 for outliers, so next we'll extract the samples with a negative output as the anomalies.

    anom_index = where(pred==-1)
    values = x[anom_index]
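The same selection can also be written with a boolean mask, which makes it easy to count the anomalies. The sketch below uses small hand-made arrays (pred_demo and x_demo are hypothetical stand-ins for the predict() output and the samples) so that the result is easy to follow.

```python
import numpy as np

# Hypothetical predict() output: 1 = inlier, -1 = outlier.
pred_demo = np.array([1, -1, 1, 1, -1])
# Hypothetical samples, one row per prediction.
x_demo = np.arange(10.0).reshape(5, 2)

is_outlier = pred_demo == -1          # boolean mask, True for outliers
outliers = x_demo[is_outlier]         # rows flagged as outliers
n_outliers = int(is_outlier.sum())    # number of flagged rows

print(n_outliers)   # 2
print(outliers)     # rows [2., 3.] and [8., 9.]
```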

Finally, we'll visualize the results in a plot by highlighting the anomalies with a color.

    plt.scatter(x[:,0], x[:,1])
    plt.scatter(values[:,0], values[:,1], color='r')
    plt.show()

**Anomaly detection with scores**

We can also find anomalies by using the score of each sample. In this approach, we'll define the model, fit it on the x data, and flag the samples with the lowest scores as outliers.

    svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.02)
    print(svm)

Next, we'll fit the model on the x dataset with the fit_predict() method, then extract the score of each sample with the score_samples() method.

    pred = svm.fit_predict(x)
    scores = svm.score_samples(x)
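It may help to see how these raw scores relate to the signed values returned by decision_function(). According to the Scikit-learn documentation, decision_function equals score_samples minus the fitted offset_ attribute. The following sketch checks that relation on illustrative data; x_demo, model, and the other names are assumptions of this example, not part of the tutorial code.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs

# Illustrative data with an explicit random_state (an assumption of this sketch).
x_demo, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                       center_box=(8, 8), random_state=13)

model = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.02).fit(x_demo)

raw_scores = model.score_samples(x_demo)     # unshifted scores
signed = model.decision_function(x_demo)     # negative values indicate outliers

# Documented relation: decision_function = score_samples - offset_
relation_holds = bool(np.allclose(signed, raw_scores - model.offset_))
print(relation_holds)
```

Because the raw scores are not centered at zero, they need an explicit threshold, which is what the quantile step that follows provides.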

Next, we'll obtain the threshold value from the scores with the quantile() function. Here, we'll treat the lowest 3 percent of score values as anomalies.

    thresh = quantile(scores, 0.03)
    print(thresh)

    3.994389673293594

Next, we'll extract the anomalies by comparing each score with the threshold and collecting the matching samples.

    index = where(scores<=thresh)
    values = x[index]
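The threshold and comparison steps can be wrapped in a small helper. The sketch below is a minimal illustration on made-up scores; lowest_fraction_mask is a hypothetical name, not a Scikit-learn or NumPy function.

```python
import numpy as np

def lowest_fraction_mask(scores, fraction):
    """Return a boolean mask marking scores at or below the given quantile."""
    thresh = np.quantile(scores, fraction)
    return scores <= thresh

# Made-up scores 0.0, 1.0, ..., 99.0 for illustration.
demo_scores = np.arange(100.0)
mask = lowest_fraction_mask(demo_scores, 0.05)
n_flagged = int(mask.sum())

print(n_flagged)           # 5: the threshold is 4.95, so scores 0..4 are flagged
print(demo_scores[mask])
```

With the tutorial's data, the equivalent call would be lowest_fraction_mask(scores, 0.03), and x[mask] would give the anomalous samples.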

Finally, we can visualize the results in a plot by highlighting the anomalies with a color.

    plt.scatter(x[:,0], x[:,1])
    plt.scatter(values[:,0], values[:,1], color='r')
    plt.show()

In this tutorial, we've learned how to detect anomalies with the One-class SVM method by using Scikit-learn's OneClassSVM class in Python. We've seen two ways to flag outliers with OneClassSVM: taking the negative labels returned by predict(), and thresholding the sample scores at a chosen quantile. The full source code is listed below.

**Source code listing**

    from sklearn.svm import OneClassSVM
    from sklearn.datasets import make_blobs
    from numpy import quantile, where, random
    import matplotlib.pyplot as plt

    # Prepare the data
    random.seed(13)
    x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(8, 8))

    plt.scatter(x[:,0], x[:,1])
    plt.show()

    # Detect anomalies from the predicted labels
    svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.03)
    print(svm)

    svm.fit(x)
    pred = svm.predict(x)
    anom_index = where(pred==-1)
    values = x[anom_index]

    plt.scatter(x[:,0], x[:,1])
    plt.scatter(values[:,0], values[:,1], color='r')
    plt.show()

    # Detect anomalies from the sample scores
    svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.02)
    print(svm)

    pred = svm.fit_predict(x)
    scores = svm.score_samples(x)

    thresh = quantile(scores, 0.03)
    print(thresh)
    index = where(scores<=thresh)
    values = x[index]

    plt.scatter(x[:,0], x[:,1])
    plt.scatter(values[:,0], values[:,1], color='r')
    plt.show()
