
### Anomaly Detection Example With OPTICS Method in Python

Ordering Points To Identify the Clustering Structure (OPTICS) is an algorithm that estimates the density-based clustering structure of a given dataset. Its approach is similar to that of the DBSCAN algorithm.

In this tutorial, we'll learn how to apply the OPTICS method to detect anomalies in a given dataset, using the OPTICS class of the Scikit-learn API. The tutorial covers:

1. Preparing the data
2. Anomaly detection with OPTICS
3. Source code listing

If you want to learn about other anomaly detection methods, please check out my tutorial.

We'll start by loading the required libraries and functions for this tutorial.

```python
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt
```

#### Preparing the data

We'll generate simple data for this tutorial by using the make_blobs() function and visualize it in a plot.

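```python
# Seed NumPy's global random state so the generated data is reproducible
random.seed(123)
# One blob of 350 two-dimensional points
x, _ = make_blobs(n_samples=350, centers=1, cluster_std=.4, center_box=(20, 5))

plt.scatter(x[:,0], x[:,1])
plt.grid(True)
plt.show()
```

#### Anomaly detection with OPTICS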

We'll define the model by using the OPTICS class with its default parameters, then fit it on the x data. You can check the parameters of the class and change them according to your analysis and target data.

```python
model = OPTICS().fit(x)
print(model)
```

```
OPTICS(algorithm='auto', cluster_method='xi', eps=None, leaf_size=30,
       max_eps=inf, metric='minkowski', metric_params=None,
       min_cluster_size=None, min_samples=5, n_jobs=None, p=2,
       predecessor_correction=True, xi=0.05)
```
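As a rough illustration of changing a parameter, the sketch below fits OPTICS twice with different `min_samples` values and compares the `core_distances_` attribute of each fitted model, which holds the distance at which each sample becomes a core point. The data and the names `x_demo`, `small`, and `large` are made up for this example, and `random_state=0` is an arbitrary choice. Because a larger `min_samples` asks for more neighbors around each point, the core distances can only stay the same or grow.

```python
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

# Illustrative data only (not the tutorial's x): one blob of 200 points.
x_demo, _ = make_blobs(n_samples=200, centers=1, cluster_std=0.4, random_state=0)

# Fit the same data with a small and a large neighborhood size.
small = OPTICS(min_samples=5).fit(x_demo)
large = OPTICS(min_samples=20).fit(x_demo)

print("mean core distance, min_samples=5: ", small.core_distances_.mean())
print("mean core distance, min_samples=20:", large.core_distances_.mean())
```

A larger `min_samples` smooths the scores, since each one then depends on a wider neighborhood.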

Next, we'll obtain the score of each sample in x by using the core_distances_ attribute of the model. A sample in a sparse region has a larger core distance, so a high score suggests an anomaly.

```python
scores = model.core_distances_
```
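To make this score more concrete, the sketch below recomputes it by hand. It assumes, based on how scikit-learn computes the attribute, that the core distance of a sample is the distance to its `min_samples`-th nearest neighbor when the sample counts as its own first neighbor, and that `max_eps` is left at its default of infinity. The data and the names `x_demo`, `model_demo`, and `manual` are illustrative.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Illustrative data only (not the tutorial's x).
x_demo, _ = make_blobs(n_samples=200, centers=1, cluster_std=0.4, random_state=0)

min_samples = 5
model_demo = OPTICS(min_samples=min_samples).fit(x_demo)

# Distance from each point to its min_samples-th nearest neighbor.
# Querying with the training data returns the point itself at distance 0
# as the first neighbor, so the last column is the value we want.
nn = NearestNeighbors(n_neighbors=min_samples).fit(x_demo)
distances, _ = nn.kneighbors(x_demo)
manual = distances[:, -1]

# Compare the hand-computed distances with the fitted attribute.
print(np.allclose(manual, model_demo.core_distances_))
```

Points far from their neighbors get a large value here, which is why the core distance can serve as an outlier score.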

Then, we'll extract the threshold value from the scores by using the quantile() function. You can set the quantile to your target percentage; in this example, we treat 98% of the data as normal and the remaining 2% as outliers.

```python
thresh = quantile(scores, .98)
print(thresh)
```

```
0.35064484877392416
```
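The effect of the quantile threshold is easier to see on made-up scores. The sketch below uses the integers 1 to 100 as stand-in scores (an assumption for illustration, not output from the model) and counts how many of them fall at or above the 0.98 quantile.

```python
from numpy import arange, quantile

# Stand-in scores: 1, 2, ..., 100 (not real core distances).
demo_scores = arange(1, 101)

# 98% of the scores lie at or below this value.
demo_thresh = quantile(demo_scores, .98)

# Samples at or above the threshold are treated as outliers.
n_flagged = int((demo_scores >= demo_thresh).sum())

print(demo_thresh, n_flagged)
```

With 100 distinct scores, the two largest are flagged. Applied to the 350 samples used in this tutorial, the same rule flags about 2% of them, which is 7 samples when no scores tie at the threshold.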

Using the threshold value, we'll find the samples whose scores are equal to or higher than it.

```python
index = where(scores >= thresh)
values = x[index]
print(values)
```

```
[[ 9.45071447 14.58847433]
 [ 8.500387   16.2113985 ]
 [ 9.56481939 16.89136015]
 [ 9.63176979 14.41548797]
 [ 8.43771706 15.07302741]
 [10.33672675 14.89789167]
 [10.43533425 16.58262441]]
```
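Called with a single condition, `where()` returns a tuple of index arrays, which is why `x[index]` selects rows directly. A boolean mask gives the same rows. The sketch below checks this on small made-up arrays; `demo_x`, `demo_scores`, and `demo_thresh` are illustrative names and values.

```python
from numpy import array, array_equal, where

# Four made-up samples and scores; only the third score exceeds the threshold.
demo_x = array([[0.0, 0.1], [0.2, 0.1], [3.0, 3.5], [0.1, 0.0]])
demo_scores = array([0.10, 0.12, 0.90, 0.11])
demo_thresh = 0.5

index = where(demo_scores >= demo_thresh)   # tuple holding one index array
by_index = demo_x[index]

mask = demo_scores >= demo_thresh           # boolean array, one entry per sample
by_mask = demo_x[mask]

print(index)
print(array_equal(by_index, by_mask))
```

The index form is handy when you need the positions of the outliers, while the mask form is shorter when you only need the rows.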

Finally, we'll visualize the results in a plot by highlighting the anomalies in red.

```python
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0],values[:,1], color='r')
plt.grid(True)
plt.show()
```
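The steps in this section can also be collected into one function. The sketch below is one way to do that; the function name `optics_outliers` and its `q` parameter are hypothetical and not part of scikit-learn, and the example data is only similar to the tutorial's data, not identical.

```python
from numpy import quantile
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs


def optics_outliers(data, q=.98, **optics_params):
    """Return a boolean mask marking samples whose core distance
    is at or above the q-th quantile of all core distances."""
    scores = OPTICS(**optics_params).fit(data).core_distances_
    return scores >= quantile(scores, q)


# Example data similar to the tutorial's (one blob, 350 points).
demo_x, _ = make_blobs(n_samples=350, centers=1, cluster_std=.4, random_state=123)
mask = optics_outliers(demo_x, q=.98)

print(mask.sum(), "of", len(demo_x), "samples flagged")
```

The returned mask can be used as `demo_x[mask]` to get the flagged samples, and extra keyword arguments such as `min_samples=10` are passed through to `OPTICS`.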

In this tutorial, we've briefly learned how to detect anomalies with the OPTICS method by using Scikit-learn's OPTICS class in Python. The full source code is listed below.

#### Source code listing

```python
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt

# Generate and plot the data
random.seed(123)
x, _ = make_blobs(n_samples=350, centers=1, cluster_std=.4, center_box=(20, 5))

plt.scatter(x[:,0], x[:,1])
plt.grid(True)
plt.show()

# Fit OPTICS with default parameters
model = OPTICS().fit(x)
print(model)

# Use the core distances as anomaly scores
scores = model.core_distances_

# Treat the top 2% of scores as outliers
thresh = quantile(scores, .98)
print(thresh)

index = where(scores >= thresh)
values = x[index]
print(values)

# Highlight the detected anomalies in red
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0],values[:,1], color='r')
plt.grid(True)
plt.show()
```