Spectral clustering is a technique that uses the spectrum of the similarity matrix of the data to reduce its dimensionality, and then clusters the data in that reduced space. It is a useful and easy-to-implement clustering method.
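To make the idea concrete, here is a minimal from-scratch sketch of normalized spectral clustering. It builds an RBF similarity matrix, forms the symmetric normalized Laplacian, embeds each point using the eigenvectors with the smallest eigenvalues, and runs k-means in that embedding. The helper name `spectral_clustering_sketch`, the value `gamma=1.0`, and the toy centers are illustrative assumptions. This is not scikit-learn's internal implementation, only an instance of the same general recipe.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel


def spectral_clustering_sketch(x, n_clusters, gamma=1.0, seed=0):
    """Hypothetical helper: normalized spectral clustering written out by hand."""
    # 1. Similarity (affinity) matrix: exp(-gamma * ||xi - xj||^2)
    A = rbf_kernel(x, gamma=gamma)
    np.fill_diagonal(A, 0.0)  # no self-loops

    # 2. Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(x)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    # 3. Spectrum: eigh returns eigenvalues in ascending order, so the first
    #    n_clusters columns are the eigenvectors of the smallest eigenvalues
    _, vecs = eigh(L)
    U = vecs[:, :n_clusters]

    # 4. Scale each embedded row to unit length, then cluster with k-means
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(U)


# Four well-separated blobs, so the expected grouping is unambiguous
x, y = make_blobs(n_samples=200, centers=[[-5, -5], [5, 5], [-5, 5], [5, -5]],
                  cluster_std=0.5, random_state=0)
labels = spectral_clustering_sketch(x, n_clusters=4)
print(np.bincount(labels))
```

The dense eigendecomposition is fine for a few hundred points; for larger data a sparse affinity graph and an iterative eigensolver would be the usual choice.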
The Scikit-learn API provides the SpectralClustering class to implement the spectral clustering method in Python. SpectralClustering applies clustering to a projection of the normalized Laplacian. In this tutorial, we'll briefly learn how to cluster and visualize data with SpectralClustering in Python. The tutorial covers:
- Preparing the data
- Clustering with the SpectralClustering and visualizing
- Source code listing
We'll start by importing the required libraries and functions.
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from numpy import random
Preparing the data
We'll prepare the target data for this tutorial by generating a simple dataset with the make_blobs() function and visualize it in a plot.
random.seed(1)
x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5)
plt.scatter(x[:,0], x[:,1])
plt.show()
The data is simple and easy to interpret, so we'll cluster it with the spectral clustering method.
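make_blobs() also returns the generated blob label of each sample (discarded as `_` above), and it accepts a `random_state` argument, so the data can be made reproducible without seeding NumPy's global generator. A small sketch, assuming the same sample count, number of centers, and spread as above; `random_state=1` is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Seed the generator through random_state instead of numpy.random.seed
x, y_true = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

print(x.shape)               # (400, 2): one row per sample, two features
print(np.bincount(y_true))   # [100 100 100 100]: samples are split evenly across blobs
```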
Clustering with the SpectralClustering and visualizing
We'll define the model with the SpectralClustering class and then fit it on the x data. SpectralClustering requires the number of clusters, so we'll set the n_clusters parameter to 4. You can check the other parameters of the class and change them according to your analysis and target data.
sc = SpectralClustering(n_clusters=4).fit(x)
print(sc)
SpectralClustering(affinity='rbf', assign_labels='kmeans', coef0=1, degree=3,
                   eigen_solver=None, eigen_tol=0.0, gamma=1.0,
                   kernel_params=None, n_clusters=4, n_components=None,
                   n_init=10, n_jobs=None, n_neighbors=10, random_state=None)
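As one example of changing the parameters, the sketch below builds a k-nearest-neighbors affinity graph instead of the default RBF kernel, uses the "discretize" label-assignment strategy instead of k-means, and fixes random_state so repeated runs give the same labels. The specific values (`n_neighbors=10`, `random_state=0`) are illustrative assumptions, not tuned for this data; `fit_predict` fits the model and returns the labels in one call.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs

x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

sc_knn = SpectralClustering(
    n_clusters=4,
    affinity="nearest_neighbors",  # sparse k-NN graph instead of a dense RBF kernel
    n_neighbors=10,                # neighbors per point when building the graph
    assign_labels="discretize",    # alternative to the default k-means assignment
    random_state=0,                # makes the randomized steps reproducible
)
labels_knn = sc_knn.fit_predict(x)
print(labels_knn[:10])
```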
Next, we'll visualize the clustered data in a plot. To color each cluster differently, we'll extract the labels from the fitted model.
labels = sc.labels_
plt.scatter(x[:,0], x[:,1], c=labels)
plt.show()
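The fitted labels are stored in the labels_ attribute, one integer cluster index per sample. A quick way to see how many points landed in each cluster is numpy.bincount. This sketch regenerates the data with `random_state=1` and fixes the model's `random_state=0` (both assumptions, replacing the global seed) so it runs on its own.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs

x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)
sc = SpectralClustering(n_clusters=4, random_state=0).fit(x)

labels = sc.labels_                         # cluster index (0..3) for each row of x
counts = np.bincount(labels, minlength=4)   # number of points in each cluster
print(counts)
```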
We can also check the clustering result by changing the number of clusters.
f = plt.figure()
for i in range(2, 6):
    sc = SpectralClustering(n_clusters=i).fit(x)
    # one subplot per cluster count, laid out in a 2 x 2 grid
    f.add_subplot(2, 2, i-1)
    plt.scatter(x[:,0], x[:,1], s=5, c=sc.labels_, label="n_clusters="+str(i))
    plt.legend()
plt.show()
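Comparing the plots by eye works here, but the same loop can also record a score for each number of clusters. The sketch below uses scikit-learn's `silhouette_score` (range -1 to 1, higher means better-separated clusters); choosing the silhouette criterion is our own assumption, not part of this tutorial, and other criteria could be used instead.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5, random_state=1)

scores = {}
for k in range(2, 6):
    labels = SpectralClustering(n_clusters=k, random_state=0).fit_predict(x)
    # mean silhouette coefficient over all samples for this clustering
    scores[k] = silhouette_score(x, labels)

for k, s in scores.items():
    print(f"n_clusters={k}: silhouette={s:.3f}")
print("highest silhouette at n_clusters =", max(scores, key=scores.get))
```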
In this tutorial, we've briefly learned how to cluster and visualize data by using the SpectralClustering class in Python. The full source code is listed below.
Source code listing
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
from numpy import random
random.seed(1)
x, _ = make_blobs(n_samples=400, centers=4, cluster_std=1.5)
plt.scatter(x[:,0], x[:,1])
plt.show()
sc = SpectralClustering(n_clusters=4).fit(x)
print(sc)
labels = sc.labels_
plt.scatter(x[:,0], x[:,1], c=labels)
plt.show()
f = plt.figure()
for i in range(2, 6):
    sc = SpectralClustering(n_clusters=i).fit(x)
    # one subplot per cluster count, laid out in a 2 x 2 grid
    f.add_subplot(2, 2, i-1)
    plt.scatter(x[:,0], x[:,1], s=5, c=sc.labels_, label="n_clusters="+str(i))
    plt.legend()
plt.show()