DataTechNotes: Spectral Clustering Example in Python

Spectral clustering is a popular technique in machine learning and data analysis for clustering data points based on the relationships or similarities between them. It uses the spectrum (the eigenvalues and eigenvectors) of a similarity matrix to partition the data into clusters. Spectral clustering can be particularly useful for data that doesn't have a clear linear separation.
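To make the idea of "using the spectrum of a similarity matrix" more concrete, here is a minimal sketch of the usual steps written with NumPy. It is an illustration under stated assumptions, not the scikit-learn implementation: the function name spectral_clustering_sketch, the RBF similarity with gamma=1.0, and the toy two-group data are all chosen here for the example.

```python
import numpy as np
from sklearn.cluster import KMeans


def spectral_clustering_sketch(x, n_clusters, gamma=1.0, seed=0):
    """Illustrative spectral clustering (hypothetical helper, not sklearn's code)."""
    # 1. Similarity matrix: RBF kernel on squared Euclidean distances.
    #    The diagonal is kept, so every point has a degree of at least 1.
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    affinity = np.exp(-gamma * sq_dists)

    # 2. Symmetric normalized Laplacian: L = I - D^(-1/2) A D^(-1/2)
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # 3. Eigenvectors of the smallest eigenvalues form the embedding
    #    (eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(laplacian)
    embedding = eigvecs[:, :n_clusters]

    # 4. Normalize each row to unit length, then cluster the rows with k-means.
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.maximum(norms, 1e-12)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(embedding)


# Toy data (an assumption for this sketch): two tight, well-separated groups.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
sketch_labels = spectral_clustering_sketch(pts, n_clusters=2)
print(sketch_labels)
```

The dense eigendecomposition keeps the sketch short but only suits small datasets; it is meant to show the order of the steps rather than an efficient implementation.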

The Scikit-learn API provides the SpectralClustering class to implement the spectral clustering method in Python. The SpectralClustering class applies clustering to a projection of the normalized Laplacian. In this tutorial, we'll briefly learn how to cluster data with the SpectralClustering class in Python. The tutorial covers:

Preparing the data
Clustering with the SpectralClustering
Source code listing

We'll start by importing the required libraries and functions.

from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from numpy import random

Preparing the data

We'll prepare the target data for this tutorial by generating a simple dataset with the make_blobs() function and visualizing it in a plot.

random.seed(1)
x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5)

plt.scatter(x[:,0], x[:,1])
plt.show()
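As an alternative to seeding NumPy's global random generator, make_blobs() accepts a random_state parameter. The short sketch below, with variable names chosen only for illustration, shows that two calls with the same random_state produce identical data.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same settings as the tutorial, but seeded through random_state
x_a, y_a = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)
x_b, y_b = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

print(x_a.shape)                 # 400 samples with 2 features each
print(np.array_equal(x_a, x_b))  # identical arrays for the same seed
```

Passing the seed explicitly keeps the data generation reproducible even if other code draws from the global generator in between.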

Our task is to cluster the above data by using the spectral clustering method.

Clustering with the SpectralClustering

We'll define the model by using the SpectralClustering class, then we'll fit it on the x data. SpectralClustering requires the number of clusters, so we'll set the n_clusters parameter to 4. You can check the parameters of the class and change them according to your analysis and target data.

sc = SpectralClustering(n_clusters=4).fit(x)
print(sc)

SpectralClustering(affinity='rbf', assign_labels='kmeans', coef0=1, degree=3,
                   eigen_solver=None, eigen_tol=0.0, gamma=1.0,
                   kernel_params=None, n_clusters=4, n_components=None,
                   n_init=10, n_jobs=None, n_neighbors=10, random_state=None)
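The parameters printed above can also be read programmatically with get_params(), and any of them can be changed when the model is created. The sketch below is one possible variation, not a recommendation: it switches affinity to 'nearest_neighbors' and fixes random_state, both choices made here only for illustration.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs

x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

# Inspect the defaults without fitting
sc_default = SpectralClustering(n_clusters=4)
params = sc_default.get_params()
print(params["affinity"], params["assign_labels"])

# Build the similarity graph from the 10 nearest neighbors instead of the RBF kernel
sc_knn = SpectralClustering(n_clusters=4, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=1)
labels_knn = sc_knn.fit_predict(x)
print(labels_knn[:10])
```

With a nearest-neighbors graph, scikit-learn may warn that the graph is not fully connected when the groups are far apart; the fit still completes.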

Next, we'll visualize the clustered data in a plot. To color each point by its cluster, we'll extract the labels from the fitted model.

labels = sc.labels_

plt.scatter(x[:,0], x[:,1], c=labels)
plt.show()
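Because make_blobs() also returns the group each point was generated from, the clustering can be checked numerically as well as visually. The sketch below keeps those generated labels (discarded as _ in the tutorial code) and compares them with the predicted ones using adjusted_rand_score; the random_state values are assumptions added for reproducibility.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Keep the generated group of each point this time
x, y_true = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

labels = SpectralClustering(n_clusters=4, random_state=1).fit_predict(x)

# The score ignores how clusters are numbered; 1.0 means a perfect match
ari = adjusted_rand_score(y_true, labels)
print(round(ari, 3))
```

This kind of check is only possible for synthetic or labeled data; for real unlabeled data an internal measure is needed instead.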

We can also check the clustering result by changing the number of clusters.

f = plt.figure()
for i in range(2, 6):
    # Fit a model with i clusters and draw it in its own panel of the 2x2 grid
    sc = SpectralClustering(n_clusters=i).fit(x)
    f.add_subplot(2, 2, i-1)
    plt.scatter(x[:,0], x[:,1], s=5, c=sc.labels_, label="n_cluster-"+str(i))
    plt.legend()

plt.show()
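The plots above compare cluster counts by eye. One way to put a number on the comparison is the silhouette score from sklearn.metrics, sketched below. This is an addition to the tutorial, not part of the original workflow, and the names scores and best_k are chosen here for illustration.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

scores = {}
for k in range(2, 6):
    labels = SpectralClustering(n_clusters=k, random_state=1).fit_predict(x)
    # silhouette_score needs at least two distinct labels
    if len(set(labels)) > 1:
        scores[k] = silhouette_score(x, labels)

# Higher is better; the value always lies between -1 and 1
for k, score in scores.items():
    print(k, round(score, 3))

best_k = max(scores, key=scores.get)
print("best n_clusters by silhouette:", best_k)
```

The silhouette score favors compact, well-separated groups, so it may not suit the non-convex shapes that spectral clustering is often chosen for; treat it as one signal among several.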

In this tutorial, we've briefly learned how to cluster and visualize data by using the SpectralClustering class in Python. The full source code is listed below.

Source code listing

from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from numpy import random

random.seed(1)
x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5)
plt.scatter(x[:,0], x[:,1])
plt.show()

sc = SpectralClustering(n_clusters=4).fit(x)
print(sc)

labels = sc.labels_
plt.scatter(x[:,0], x[:,1], c=labels)
plt.show()

f = plt.figure()
for i in range(2, 6):
    # Fit a model with i clusters and draw it in its own panel of the 2x2 grid
    sc = SpectralClustering(n_clusters=i).fit(x)
    f.add_subplot(2, 2, i-1)
    plt.scatter(x[:,0], x[:,1], s=5, c=sc.labels_, label="n_cluster-"+str(i))
    plt.legend()

plt.show()

References:

Scikit-learn SpectralClustering
