"The local outlier factor is based on a concept of a local density, where locality is given by nearest neighbors, whose distance is used to estimate the density. By comparing the local density of an object to the local densities of its neighbors, one can identify regions of similar density, and points that have a substantially lower density than their neighbors. These are considered to be outliers."

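To make the definition above concrete, here is a minimal from-scratch sketch in NumPy. It assumes Euclidean distance and computes all pairwise distances by brute force, which is fine for a few hundred points. The function name `local_outlier_factor` is our own, not a library API. It follows the usual steps: k-distance, reachability distance, local reachability density, then the ratio of the neighbors' densities to the point's own density. As far as we can tell, scikit-learn's `negative_outlier_factor_` is the negation of this value, up to tie-breaking among equal distances.

```python
import numpy as np

def local_outlier_factor(X, k=20):
    """Minimal LOF sketch; returns one LOF value per row of X (higher = more outlying)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; a point is not counted as its own neighbor
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    # k nearest neighbors of each point and their distances
    knn = np.argsort(dist, axis=1)[:, :k]
    knn_dist = np.take_along_axis(dist, knn, axis=1)
    # k-distance of each point: distance to its k-th nearest neighbor
    k_distance = knn_dist[:, -1]
    # Reachability distance of p from neighbor o: max(k-distance(o), d(p, o))
    reach = np.maximum(k_distance[knn], knn_dist)
    # Local reachability density: inverse of the mean reachability distance
    lrd = 1.0 / (reach.mean(axis=1) + 1e-10)
    # LOF: mean ratio of the neighbors' densities to the point's own density
    return lrd[knn].mean(axis=1) / lrd

# Usage: a tight 2-D cluster plus one injected far-away point at index 200
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(10, 0.3, size=(200, 2)), [[13.0, 13.0]]])
scores = local_outlier_factor(X, k=20)
print(scores.argmax())     # 200, the injected point
print(np.median(scores))   # close to 1 for points inside the cluster
```

Points inside the cluster have a density similar to their neighbors and score near 1, while the isolated point scores far above 1.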
In this tutorial, we'll learn how to detect anomalies in a dataset with the Local Outlier Factor (LOF) method in Python. Scikit-learn provides the LocalOutlierFactor class for this algorithm, and we'll use it throughout. The tutorial covers:

- Preparing the dataset
- Defining the model and prediction
- Anomaly detection with scores
- Source code listing

```python
from sklearn.neighbors import LocalOutlierFactor
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt
```

**Preparing the dataset**

We'll create a random sample dataset for this tutorial with the make_blobs() function.

```python
random.seed(1)
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(10,10))
```
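
As an alternative to seeding NumPy's global generator, make_blobs() accepts a random_state argument that makes this one call reproducible. The sketch below assumes that variant; it is not guaranteed to produce exactly the same points as random.seed(1). With centers=1 and center_box=(10,10), the single cluster center is placed at (10, 10), so the sample mean should land close to that point.

```python
from sklearn.datasets import make_blobs

# random_state seeds only this call, instead of NumPy's global generator
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(10, 10), random_state=1)

print(x.shape)          # (200, 2)
print(x.mean(axis=0))   # both coordinates close to 10
```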

We'll check the dataset by visualizing it in a plot.

```python
plt.scatter(x[:,0], x[:,1])
plt.show()
```

**Defining the model and prediction**

We'll define the model with the LocalOutlierFactor class from scikit-learn, setting the number of neighbors (n_neighbors) and the contamination value in the arguments. Contamination is the expected proportion of outliers in the dataset; here we assume 3 percent.

```python
lof = LocalOutlierFactor(n_neighbors=20, contamination=.03)
```

We'll fit the model on the x dataset and get the predicted labels with the fit_predict() method.

```python
y_pred = lof.fit_predict(x)
```
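
To see what contamination does under the hood, the sketch below compares the fitted model's offset_ attribute with the 3rd percentile of the training scores in negative_outlier_factor_. This reflects how recent scikit-learn versions compute the cut-off, as far as we know. It uses random_state=1 in make_blobs instead of the global seed, so the exact points may differ from the plots in this tutorial.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.datasets import make_blobs

x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(10, 10), random_state=1)

lof = LocalOutlierFactor(n_neighbors=20, contamination=.03)
y_pred = lof.fit_predict(x)

# With a numeric contamination, the cut-off (offset_) is the matching
# percentile of the training scores
scores = lof.negative_outlier_factor_
print(np.isclose(lof.offset_, np.percentile(scores, 3)))   # True

# fit_predict() labels a sample -1 when its score falls below offset_
print(np.array_equal(y_pred == -1, scores < lof.offset_))  # True
print((y_pred == -1).sum())                                # number of flagged samples
```

With 200 samples and contamination=.03, roughly 6 samples end up labeled -1.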

fit_predict() returns 1 for inliers and -1 for outliers, so we'll extract the samples labeled -1 as the outliers.

```python
lofs_index = where(y_pred==-1)
values = x[lofs_index]
```

Finally, we'll visualize the results in a plot, highlighting the anomalies in red.

```python
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0],values[:,1], color='r')
plt.show()
```

**Anomaly detection with scores**

In the second approach, we'll define the model without setting the contamination argument and choose the outliers ourselves from the scores.

```python
model = LocalOutlierFactor(n_neighbors=20)
```
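
Even without contamination, fit_predict() still returns labels. In recent scikit-learn releases (0.22 and later, as far as we know), the default is contamination='auto', which applies a fixed cut-off of -1.5 to negative_outlier_factor_ rather than a percentile. A quick check, assuming such a version and using random_state=1 for the data:

```python
from sklearn.neighbors import LocalOutlierFactor
from sklearn.datasets import make_blobs

x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(10, 10), random_state=1)

model = LocalOutlierFactor(n_neighbors=20)   # contamination left at its default
y_pred = model.fit_predict(x)

print(model.contamination)   # 'auto' in recent scikit-learn versions
print(model.offset_)         # -1.5: fixed cut-off used when contamination='auto'
print((y_pred == -1).sum())  # how many samples fall below that fixed cut-off
```

This is why the second approach ignores the labels and works directly with the scores instead.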

We'll fit the model on the x dataset, then extract the score of each sample from the negative_outlier_factor_ attribute.

```python
model.fit_predict(x)
lof = model.negative_outlier_factor_
```
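
The negative_outlier_factor_ attribute holds the LOF of each training sample with the sign flipped, so values near -1 indicate inliers and more negative values indicate stronger outliers. The sketch below (our own ranking code, not part of the tutorial's method, and using random_state=1 for the data) converts the scores back to LOF and lists the most anomalous samples:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.datasets import make_blobs

x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(10, 10), random_state=1)

model = LocalOutlierFactor(n_neighbors=20)
model.fit_predict(x)

# negative_outlier_factor_ is the LOF value with its sign flipped
lof_values = -model.negative_outlier_factor_
print(np.median(lof_values))        # typically close to 1 for inliers

# Indices of the five most anomalous samples, most extreme first
top5 = np.argsort(model.negative_outlier_factor_)[:5]
print(top5, lof_values[top5])
```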

Next, we'll derive a threshold from the scores with NumPy's quantile() function. Because lower scores mean more abnormal samples, the 0.03 quantile marks off the lowest 3 percent of scores as anomalies.

```python
thresh = quantile(lof, .03)
print(thresh)
```

```
-1.8191482960907037
```

We'll extract the anomalies by selecting the samples whose scores are at or below the threshold and retrieving their values from x.

```python
index = where(lof<=thresh)
values = x[index]
```

Finally, we'll visualize the results again, highlighting the anomalies in red.

```python
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()
```

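Both approaches flag the same samples here: with a numeric contamination, scikit-learn sets its cut-off at the same percentile of the scores that we computed with quantile(). You can use either one in your analysis. Lowering the contamination or quantile value flags fewer, more extreme samples; raising it flags more.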
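
A quick check of that agreement for several contamination values is sketched below. It uses random_state=1 rather than the global seed, so the flagged points may differ from the plots above. Method 1 uses a strict `<` against the cut-off and method 2 uses `<=`; the two only disagree if a score lands exactly on the threshold, which does not happen with interpolated quantiles of distinct scores.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.datasets import make_blobs

x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(10, 10), random_state=1)

# Scores do not depend on contamination, so compute them once
scores = LocalOutlierFactor(n_neighbors=20).fit(x).negative_outlier_factor_

results = []
for c in (0.01, 0.03, 0.05):
    # Method 1: let contamination pick the cut-off
    y_pred = LocalOutlierFactor(n_neighbors=20, contamination=c).fit_predict(x)
    # Method 2: threshold the scores at the same quantile
    by_quantile = scores <= np.quantile(scores, c)
    n_flagged = int((y_pred == -1).sum())
    same = np.array_equal(y_pred == -1, by_quantile)
    results.append((c, n_flagged, same))
    print(c, n_flagged, same)
```

The flagged count grows with the contamination value, and both methods select the same samples at each level.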

In this tutorial, we've learned how to detect anomalies with the Local Outlier Factor algorithm using scikit-learn's LocalOutlierFactor class in Python. The full source code is listed below.

**Source code listing**

```python
from sklearn.neighbors import LocalOutlierFactor
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt

# Prepare the dataset
random.seed(1)
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(10,10))
plt.scatter(x[:,0], x[:,1])
plt.show()

# Method 1: predict outliers with a fixed contamination
lof = LocalOutlierFactor(n_neighbors=20, contamination=.03)
y_pred = lof.fit_predict(x)
lofs_index = where(y_pred==-1)
values = x[lofs_index]
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0],values[:,1], color='r')
plt.show()

# Method 2: threshold the scores at the 3% quantile
model = LocalOutlierFactor(n_neighbors=20)
print(model)
model.fit_predict(x)
lof = model.negative_outlier_factor_
thresh = quantile(lof, .03)
print(thresh)
index = where(lof<=thresh)
values = x[index]
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0],values[:,1], color='r')
plt.show()
```
