Spectral Clustering Example in Python

Spectral clustering is a popular technique in machine learning and data analysis for clustering data points based on the relationships or similarities between them. It uses the spectrum (eigenvalues) of a similarity matrix to partition the data into clusters. Spectral clustering can be particularly useful for data that doesn't have a clear linear separation.
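To make the idea concrete, here is a minimal from-scratch sketch of the steps: build a similarity matrix, form the normalized Laplacian, take the eigenvectors of its smallest eigenvalues, and run k-means on them. The function name spectral_clustering_sketch, the RBF similarity, the gamma value, and the sample centers are all assumptions chosen for illustration; this is not scikit-learn's exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel


def spectral_clustering_sketch(data, n_clusters, gamma=1.0, seed=0):
    """Illustrative spectral clustering (hypothetical helper, not the library code)."""
    # 1. Similarity matrix from an RBF kernel, without self-similarity.
    affinity = rbf_kernel(data, gamma=gamma)
    np.fill_diagonal(affinity, 0.0)

    # 2. Symmetric normalized Laplacian: L = I - D^(-1/2) A D^(-1/2).
    degree = np.maximum(affinity.sum(axis=1), 1e-12)  # guard against zero degree
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(data)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # 3. eigh returns eigenvalues in ascending order, so the first columns
    #    are the eigenvectors of the smallest eigenvalues.
    _, eigenvectors = np.linalg.eigh(laplacian)
    embedding = eigenvectors[:, :n_clusters]

    # 4. Normalize each row to unit length, then cluster the rows with k-means.
    norms = np.maximum(np.linalg.norm(embedding, axis=1, keepdims=True), 1e-12)
    embedding = embedding / norms
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(embedding)


# Three well-separated blobs; the centers are arbitrary illustration values.
points, _ = make_blobs(n_samples=150, centers=[[-5, -5], [0, 5], [5, -5]],
                       cluster_std=0.5, random_state=0)
labels_sketch = spectral_clustering_sketch(points, n_clusters=3)
print(np.bincount(labels_sketch))
```

The sketch builds a dense n-by-n matrix, so it only suits small datasets; the rest of this tutorial uses the scikit-learn class instead.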

The scikit-learn API provides the SpectralClustering class to implement the spectral clustering method in Python. The SpectralClustering class applies clustering to a projection of the normalized Laplacian. In this tutorial, we'll briefly learn how to cluster data with the SpectralClustering class in Python. The tutorial covers:

  1. Preparing the data
  2. Clustering with the SpectralClustering
  3. Source code listing


We'll start by importing the required libraries and functions.

 
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from numpy import random
  

Preparing the data

We'll prepare the target data for this tutorial by generating a simple dataset with the make_blobs() function and visualize it in a plot.
 
 
random.seed(1)
x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5)
 
plt.scatter(x[:,0], x[:,1])
plt.show()
 

Our task is to cluster the above data by using the spectral clustering method.
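As an aside, make_blobs() also accepts a random_state argument, which keeps the generated data reproducible without relying on NumPy's global seed. The sketch below assumes a seed value of 1, and the names x_fixed, y_fixed, and x_again are made up for this example.

```python
from sklearn.datasets import make_blobs

# Passing random_state makes the call reproducible on its own;
# the seed value 1 is an arbitrary choice for this sketch.
x_fixed, y_fixed = make_blobs(n_samples=400, centers=4, cluster_std=1.5,
                              random_state=1)
x_again, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5,
                        random_state=1)

print(x_fixed.shape)               # number of samples and features
print((x_fixed == x_again).all())  # whether both calls produced the same points
```

The second return value holds the label of the blob each point was drawn from. The tutorial discards it because clustering is unsupervised, but it can be handy for a sanity check.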
 
 
Clustering with the SpectralClustering

We'll define the model by using the SpectralClustering class, then fit it on the x data. SpectralClustering requires the number of clusters, so we'll set the n_clusters parameter to 4. You can check the parameters of the class and change them according to your analysis and target data.
 
 
sc = SpectralClustering(n_clusters=4).fit(x)
print(sc)
 
SpectralClustering(affinity='rbf', assign_labels='kmeans', coef0=1, degree=3,
eigen_solver=None, eigen_tol=0.0, gamma=1.0,
kernel_params=None, n_clusters=4, n_components=None,
n_init=10, n_jobs=None, n_neighbors=10, random_state=None) 
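As a small illustration of changing these parameters, the sketch below switches the affinity from the default 'rbf' to 'nearest_neighbors' and clusters a two-moons dataset, a common example of data without a clear linear separation. The dataset, the noise level, and the seeds are assumptions chosen for this example, not part of the tutorial's data.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Two interleaved half-circles; the noise and seed values are arbitrary.
moons, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# Build the similarity graph from each point's nearest neighbors
# instead of the default RBF kernel.
sc_nn = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                           n_neighbors=10, random_state=0)
moon_labels = sc_nn.fit_predict(moons)

# get_params() returns the current parameter values as a dictionary.
print(sc_nn.get_params()["affinity"])
print(len(set(moon_labels)))
```

Depending on the data, scikit-learn may warn that the neighbor graph is not fully connected; raising n_neighbors is one way to address that.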
   

Next, we'll visualize the clustered data in a plot. To color each point by its cluster, we'll extract the labels from the fitted model.

 
labels = sc.labels_

plt.scatter(x[:,0], x[:,1], c=labels)
plt.show()  
 
We can also check the clustering result by changing the number of clusters.

 
f = plt.figure()
for i in range(2, 6):
    # Fit one model per cluster count and draw it in its own subplot.
    sc = SpectralClustering(n_clusters=i).fit(x)
    f.add_subplot(2, 2, i-1)
    plt.scatter(x[:,0], x[:,1], s=5, c=sc.labels_, label="n_cluster-"+str(i))
    plt.legend()

plt.show() 
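Besides comparing the plots by eye, the cluster counts can be compared numerically. The sketch below uses the silhouette score from sklearn.metrics as one possible measure; the choice of metric, the seeds, and the names data and scores are assumptions for illustration.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Same kind of data as in the tutorial, generated with a fixed seed.
data, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

scores = {}
for k in range(2, 6):
    labels_k = SpectralClustering(n_clusters=k, random_state=0).fit_predict(data)
    # The silhouette score lies in [-1, 1]; higher means tighter,
    # better-separated clusters.
    scores[k] = silhouette_score(data, labels_k)

for k, score in scores.items():
    print(f"n_clusters={k}: silhouette={score:.3f}")
```

A higher score suggests a better fit, but it is only a heuristic and is best read alongside the plots.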
 
In this tutorial, we've briefly learned how to cluster and visualize data by using the SpectralClustering class in Python. The full source code is listed below.
 
 
Source code listing

 
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from numpy import random

random.seed(1)
x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5)
plt.scatter(x[:,0], x[:,1])
plt.show()

sc = SpectralClustering(n_clusters=4).fit(x)
print(sc)
 
labels = sc.labels_
plt.scatter(x[:,0], x[:,1], c=labels)
plt.show()

f = plt.figure()
for i in range(2, 6):
    # Fit one model per cluster count and draw it in its own subplot.
    sc = SpectralClustering(n_clusters=i).fit(x)
    f.add_subplot(2, 2, i-1)
    plt.scatter(x[:,0], x[:,1], s=5, c=sc.labels_, label="n_cluster-"+str(i))
    plt.legend()

plt.show() 
  

