
### Anomaly Detection Example with DBSCAN in Python

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm. It finds core samples in dense areas and groups the samples around those core samples into clusters. Samples in low-density areas that end up in no cluster are treated as outliers, and this tutorial focuses on finding those outliers.
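The density idea described above can be sketched with plain NumPy. This is only a minimal illustration, not Scikit-learn's implementation: the toy points, the radius `eps`, and the threshold `min_samples` are made-up values, and the sketch marks core and noise points without building the clusters themselves.

```python
import numpy as np

# Toy data (made up for illustration): five tightly packed points and one far away.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [3.0, 3.0],
])

eps = 0.2          # neighborhood radius (illustrative value)
min_samples = 4    # points required within eps, counting the point itself

# Pairwise Euclidean distances between all points.
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Number of points within eps of each point (each point counts itself).
neighbor_counts = (dist <= eps).sum(axis=1)
is_core = neighbor_counts >= min_samples

# A point is noise if it is not a core point and has no core point within eps.
near_core = ((dist <= eps) & is_core[None, :]).any(axis=1)
is_noise = ~is_core & ~near_core

print(neighbor_counts)
print(is_noise)
```

Only the isolated point at (3, 3) has too few neighbors and no core point nearby, so it is the one flagged as noise.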

The Scikit-learn API provides the DBSCAN class for this algorithm and we'll use it in this tutorial. The tutorial covers:
1. Preparing the dataset
2. Defining the model and anomaly detection
3. Source code listing

If you want to learn about other anomaly detection methods, please check out my other tutorials.

```
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from numpy import random, where
import matplotlib.pyplot as plt
```

Preparing the dataset

We'll create a random sample dataset for this tutorial by using the make_blobs() function.

```
random.seed(7)
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(20, 5))
```

We'll check the dataset by visualizing it in a plot.

```
plt.scatter(x[:,0], x[:,1])
plt.show()
```
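Besides the plot, a quick numeric check can confirm what make_blobs() returned. The sketch below is an illustration under one assumption: it passes `random_state=7` to make_blobs() instead of calling random.seed(7), which keeps the seeding local to this call.

```python
import numpy as np
from sklearn.datasets import make_blobs

# random_state=7 is an illustrative choice used here in place of random.seed(7).
x, y = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(20, 5), random_state=7)

print(x.shape)        # number of samples and number of features
print(np.unique(y))   # blob labels; with centers=1 there is a single label
print(x.mean(axis=0)) # approximate location of the blob center
```

The second return value holds the blob membership of each sample. The tutorial discards it with `_` because DBSCAN is unsupervised and does not need labels.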

Defining the model and anomaly detection

We'll define the model by using the DBSCAN class of the Scikit-learn API and set the 'eps' and 'min_samples' arguments. The 'eps' argument is the maximum distance between two samples for one to be considered in the neighborhood of the other, and 'min_samples' is the number of samples (including the point itself) that a neighborhood must contain for a point to be considered a core sample.

```
dbscan = DBSCAN(eps = 0.28, min_samples = 20)
print(dbscan)
```

```
DBSCAN(algorithm='auto', eps=0.28, leaf_size=30, metric='euclidean',
metric_params=None, min_samples=20, n_jobs=None, p=None)
```
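To get a feel for how 'eps' influences the result, the following sketch fits the model with a few different values and counts the samples labeled as noise. The values 0.1 and 0.5 and the `random_state=7` argument are assumptions chosen for illustration; only eps=0.28 and min_samples=20 come from this tutorial.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Same kind of dataset as in the tutorial; random_state=7 is an illustrative choice.
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(20, 5), random_state=7)

noise_counts = {}
for eps in (0.1, 0.28, 0.5):
    labels = DBSCAN(eps=eps, min_samples=20).fit_predict(x)
    # DBSCAN marks noise samples with the label -1.
    noise_counts[eps] = int((labels == -1).sum())
    print(f"eps={eps}: {noise_counts[eps]} samples labeled as noise")
```

With min_samples fixed, a larger 'eps' can only enlarge each neighborhood, so the number of noise samples does not increase as 'eps' grows.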

We'll fit the model on the x dataset and get the cluster label of each sample with the fit_predict() method.

`pred = dbscan.fit_predict(x)`
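Before extracting the outliers, it can help to look at what fit_predict() returned. The sketch below summarizes the labels and reads two attributes of the fitted model, `labels_` and `core_sample_indices_`. As before, `random_state=7` is an assumption used here in place of random.seed(7).

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(20, 5), random_state=7)

dbscan = DBSCAN(eps=0.28, min_samples=20)
pred = dbscan.fit_predict(x)

# Count how many samples received each label; -1 means noise.
labels, counts = np.unique(pred, return_counts=True)
for label, count in zip(labels, counts):
    name = "noise" if label == -1 else f"cluster {label}"
    print(f"{name}: {count} samples")

# fit_predict() returns the same labels that the fitted model stores in labels_.
same_labels = np.array_equal(pred, dbscan.labels_)

# Indices of the samples that were identified as core samples.
n_core = len(dbscan.core_sample_indices_)
print(f"core samples: {n_core}")
```

Note that the DBSCAN class has no separate predict() method, so labels are only available for the data passed to fit() or fit_predict().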
DBSCAN assigns the label -1 to noise samples, so next we'll extract the samples with that label as the outliers.

```
anom_index = where(pred == -1)
values = x[anom_index]
```
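The where() call returns a tuple containing an array of indices, and indexing x with it selects the matching rows. The small sketch below uses made-up labels and points to show this, along with an equivalent boolean mask that skips where() entirely.

```python
import numpy as np

# Made-up labels and points, for illustration only.
pred = np.array([0, 0, -1, 0, -1])
x = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [0.9, 1.2], [-3.0, 4.0]])

anom_index = np.where(pred == -1)   # tuple holding one array of row indices
values = x[anom_index]              # rows of x whose label is -1

mask = pred == -1                   # boolean mask with one entry per sample
values_from_mask = x[mask]          # selects the same rows

print(anom_index[0])
print(values)
```

Both forms select the same rows. The index form is convenient when the positions of the outliers are needed, for example to look them up in another array.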

Finally, we'll visualize the results in a plot by highlighting the anomalies with a color.

```
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()
```

In this tutorial, we've learned how to detect anomalies with the DBSCAN method by using Scikit-learn's DBSCAN class in Python. The full source code is listed below.

In previous tutorials, we've covered several other anomaly detection methods in Python and R. Please check this blog to learn more about them.

Source code listing

```
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from numpy import random, where
import matplotlib.pyplot as plt

# Create a reproducible sample dataset with a single blob
random.seed(7)
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(20, 5))

# Visualize the dataset
plt.scatter(x[:,0], x[:,1])
plt.show()

# Define the model
dbscan = DBSCAN(eps = 0.28, min_samples = 20)
print(dbscan)

# Fit the model and extract the samples labeled -1 (noise) as anomalies
pred = dbscan.fit_predict(x)
anom_index = where(pred == -1)
values = x[anom_index]

# Highlight the anomalies in red
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()
```
