Spectral Clustering Example in Python

    Spectral clustering is a technique that uses the spectrum (the eigenvalues and eigenvectors) of the data's similarity matrix to reduce dimensionality, and then clusters the data in that lower-dimensional space. It is a useful clustering method that is easy to implement.
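
    To make the idea concrete, the steps can be sketched by hand with NumPy: build a similarity matrix, form a normalized Laplacian, take the eigenvectors of its smallest eigenvalues as a low-dimensional embedding, and run k-means there. This is only an illustrative sketch and not scikit-learn's actual implementation; the helper name spectral_embedding_sketch, the RBF similarity with gamma=1.0, the demo dataset, and the row normalization step are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel


def spectral_embedding_sketch(x, n_components, gamma=1.0):
    """Hypothetical helper: embed x with eigenvectors of a normalized Laplacian."""
    # Similarity matrix: w[i, j] = exp(-gamma * ||x_i - x_j||^2)
    w = rbf_kernel(x, gamma=gamma)
    # Degree of each point (row sums); never zero because w[i, i] = 1
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    # Symmetric normalized Laplacian: L = I - D^(-1/2) W D^(-1/2)
    lap = np.eye(len(x)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the smallest ones
    _, eigvecs = np.linalg.eigh(lap)
    emb = eigvecs[:, :n_components]
    # Row-normalize the embedding (an assumed, commonly used extra step)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)


# Small demo dataset (assumed values, chosen only for illustration)
x_demo, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.6, random_state=0)
emb = spectral_embedding_sketch(x_demo, n_components=3)

# Cluster the embedded points instead of the raw coordinates
demo_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)
print(emb.shape, np.unique(demo_labels))
```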

    The Scikit-learn API provides the SpectralClustering class to implement the spectral clustering method in Python. SpectralClustering applies clustering to a projection of the normalized Laplacian. In this tutorial, we'll briefly learn how to cluster and visualize data with SpectralClustering in Python. The tutorial covers:

  1. Preparing the data
  2. Clustering with the SpectralClustering and visualizing
  3. Source code listing

We'll start by importing the required libraries and functions.

from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from numpy import random
  

Preparing the data

    We'll prepare the target data for this tutorial by generating a simple dataset with the make_blobs() function and visualizing it in a plot.
 
random.seed(1)
x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5)
 
plt.scatter(x[:,0], x[:,1])
plt.show()
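
If you prefer not to seed NumPy's global generator, make_blobs also accepts a random_state argument. The short sketch below, which uses the names x_a and x_b only for illustration, shows that two calls with the same random_state return the same data.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same parameters as above, but seeded through random_state
x_a, y_a = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)
x_b, y_b = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

print(x_a.shape)                 # number of samples and features
print(np.array_equal(x_a, x_b))  # whether both calls produced the same points
```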
 

The data is easy to understand, so we'll cluster it with the spectral clustering method.
 
 
Clustering with the SpectralClustering and visualizing

    We'll define the model with the SpectralClustering class and then fit it on the x data. SpectralClustering requires the number of clusters, so we'll set the n_clusters parameter to 4. You can check the parameters of the class and change them according to your analysis and target data.
 
sc = SpectralClustering(n_clusters=4).fit(x)
print(sc)
 
SpectralClustering(affinity='rbf', assign_labels='kmeans', coef0=1, degree=3,
eigen_solver=None, eigen_tol=0.0, gamma=1.0,
kernel_params=None, n_clusters=4, n_components=None,
n_init=10, n_jobs=None, n_neighbors=10, random_state=None) 
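
As an example of changing the parameters, the sketch below builds the similarity graph from nearest neighbors instead of the default RBF kernel, and fixes random_state so the label assignment is repeatable. The specific values (n_neighbors=10, random_state=0) and the name sc_nn are arbitrary choices for illustration, not recommendations.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs

# Regenerate the tutorial-style data so the snippet runs on its own
x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

sc_nn = SpectralClustering(
    n_clusters=4,
    affinity="nearest_neighbors",  # k-nearest-neighbor graph instead of 'rbf'
    n_neighbors=10,                # neighbors used to build that graph
    random_state=0,                # makes the result repeatable
).fit(x)

print(sc_nn.labels_[:10])
```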
   

Next, we'll visualize the clustered data in a plot. To give each cluster its own color, we'll extract the labels from the fitted model.

labels = sc.labels_

plt.scatter(x[:,0], x[:,1], c=labels)
plt.show()  
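
Besides plotting, the labels can be inspected numerically. The sketch below counts the points in each cluster and, since make_blobs also returns the true group of each sample, compares the clustering with those groups using adjusted_rand_score. The names y_true, sizes and ari are assumptions for the example, as is keeping the true labels at all.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Keep the true group labels this time instead of discarding them
x, y_true = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

sc = SpectralClustering(n_clusters=4, random_state=0).fit(x)
labels = sc.labels_

# Number of points assigned to each of the 4 clusters
sizes = np.bincount(labels, minlength=4)
# Agreement with the generated groups; 1.0 means a perfect match
ari = adjusted_rand_score(y_true, labels)
print(sizes, ari)
```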
 
We can also check the clustering result by changing the number of clusters.

f = plt.figure()
for i in range(2, 6):
    sc = SpectralClustering(n_clusters=i).fit(x)
    # Draw each result in its own cell of a 2x2 grid (cells 1 to 4)
    f.add_subplot(2, 2, i-1)
    plt.scatter(x[:,0], x[:,1], s=5, c=sc.labels_, label="n_cluster-"+str(i))
    plt.legend()

plt.show() 
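
The plots give a visual impression of each setting. One possible way to put a number on each choice is the silhouette score, where higher values indicate more compact, better separated clusters. The sketch below is a minimal example under assumed settings; the dictionary name scores and random_state=0 are choices made for illustration.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

scores = {}
for k in range(2, 6):
    # fit_predict fits the model and returns the cluster label of each point
    labels_k = SpectralClustering(n_clusters=k, random_state=0).fit_predict(x)
    scores[k] = silhouette_score(x, labels_k)

for k, score in scores.items():
    print(k, round(score, 3))
```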
 
    In this tutorial, we've briefly learned how to cluster and visualize data by using the SpectralClustering class in Python. The full source code is listed below.
 
 
 Source code listing

from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from numpy import random

random.seed(1)
x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5)
plt.scatter(x[:,0], x[:,1])
plt.show()

sc = SpectralClustering(n_clusters=4).fit(x)
print(sc)
 
labels = sc.labels_
plt.scatter(x[:,0], x[:,1], c=labels)
plt.show()

f = plt.figure()
for i in range(2, 6):
    sc = SpectralClustering(n_clusters=i).fit(x)
    # Draw each result in its own cell of a 2x2 grid (cells 1 to 4)
    f.add_subplot(2, 2, i-1)
    plt.scatter(x[:,0], x[:,1], s=5, c=sc.labels_, label="n_cluster-"+str(i))
    plt.legend()

plt.show() 
  

