Hierarchical clustering groups observation data using either a "top-down" or a "bottom-up" approach. Agglomerative clustering is the "bottom-up" variant: each element starts in its own cluster, and clusters are progressively merged according to a chosen criterion until the target number of clusters remains.
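To make the bottom-up idea concrete, here is a minimal sketch on a toy dataset (the six points below are illustrative and not part of the tutorial's data): each point begins as its own cluster, and merging stops once two clusters remain.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six toy points forming two well-separated groups.
pts = np.array([[0, 0], [0, 1], [1, 0],
                [10, 10], [10, 11], [11, 10]])

# Each point starts as its own cluster; nearest clusters are
# merged repeatedly until only n_clusters=2 remain.
model = AgglomerativeClustering(n_clusters=2).fit(pts)
print(model.labels_)  # the first three points share one label, the last three the other
```

Because the two groups are far apart, every merge happens within a group before the groups themselves would ever be joined.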
Scikit-learn provides the AgglomerativeClustering class to implement this method. In this tutorial, we'll learn how to cluster data with AgglomerativeClustering in Python. The tutorial covers:
- Preparing the data
- Clustering with AgglomerativeClustering
- Source code listing
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np
Preparing the data
We'll create a sample dataset to implement clustering in this tutorial, using the make_blobs function to generate the data and visualizing it in a plot.
np.random.seed(1)
x, _ = make_blobs(n_samples=300, centers=5, cluster_std=.8)
plt.scatter(x[:, 0], x[:, 1])
plt.show()
Clustering with AgglomerativeClustering
Next, we'll define the model and fit it on the x data. Scikit-learn's AgglomerativeClustering class implements the agglomerative clustering algorithm and has several parameters to set. The linkage parameter defines the merge criterion, i.e. how the distance between sets of observations is computed; "ward", "complete", "average", and "single" are available. The affinity parameter (renamed to metric in recent scikit-learn releases) defines the distance metric used to compute the linkage. The number of clusters can be set with the n_clusters parameter.
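To see how the linkage choice plugs in, here is a short sketch comparing the four criteria on one dataset (the sample size, number of centers, and random_state are illustrative, not taken from the tutorial):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

# Three tight, well-separated blobs for an easy comparison.
x, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.5, random_state=1)

# Same data, same n_clusters -- only the merge criterion changes.
for linkage in ("ward", "complete", "average", "single"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(x)
    print(linkage, np.bincount(labels))  # cluster sizes under each linkage
```

On clearly separated blobs like these, all four criteria recover the same three groups; the choice matters more on elongated or noisy data. Note that "ward" only works with the Euclidean distance.

```
```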
Here, we'll set the n_clusters parameter and keep the other parameters at their defaults.
aggloclust = AgglomerativeClustering(n_clusters=5).fit(x)
print(aggloclust)
AgglomerativeClustering(affinity='euclidean', compute_full_tree='auto',
                        connectivity=None, linkage='ward', memory=None,
                        n_clusters=5)

We'll get the cluster labels:
labels = aggloclust.labels_
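The labels_ attribute holds one integer cluster id per input row, which we can check before plotting. A quick sketch (rebuilding the same dataset; random_state=1 is used here instead of seeding the global generator, which is an assumption for reproducibility):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

x, _ = make_blobs(n_samples=300, centers=5, cluster_std=.8, random_state=1)
labels = AgglomerativeClustering(n_clusters=5).fit(x).labels_

# One label per sample, drawn from n_clusters distinct ids.
print(labels.shape)       # (300,)
print(np.unique(labels))  # [0 1 2 3 4]
```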
Finally, we'll visualize the clustered points by separating them with different colors.
plt.scatter(x[:, 0], x[:, 1], c=labels)
plt.show()
We can also check the clustering results by changing the number of clusters.
f = plt.figure()
for i in range(2, 6):
    aggloclust = AgglomerativeClustering(n_clusters=i).fit(x)
    f.add_subplot(2, 2, i - 1)
    plt.scatter(x[:, 0], x[:, 1], s=5, c=aggloclust.labels_, label="n_cluster-" + str(i))
    plt.legend()
plt.show()
In this tutorial, we've briefly learned how to cluster data with the agglomerative clustering method in Python. The method is simple to apply and clusters this dataset well. The full source code is listed below.
Source code listing
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1)
x, _ = make_blobs(n_samples=300, centers=5, cluster_std=.8)
plt.scatter(x[:, 0], x[:, 1])
plt.show()

aggloclust = AgglomerativeClustering(n_clusters=5).fit(x)
print(aggloclust)

labels = aggloclust.labels_
plt.scatter(x[:, 0], x[:, 1], c=labels)
plt.show()

f = plt.figure()
for i in range(2, 6):
    aggloclust = AgglomerativeClustering(n_clusters=i).fit(x)
    f.add_subplot(2, 2, i - 1)
    plt.scatter(x[:, 0], x[:, 1], s=5, c=aggloclust.labels_, label="n_cluster-" + str(i))
    plt.legend()
plt.show()