K-means Data Clustering in R

The K-means algorithm partitions a dataset into k groups (clusters). Each cluster has a center point (centroid), which is the mean of all the data points in that cluster. Clustering is a useful technique for exploring a dataset, making initial observations, and separating the data into groups of observations with similar features. In R, we use the 'kmeans()' function to cluster a dataset with the K-means method. In its simplest form it is called as follows:

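 kmeans(x, centers)
       x       - a numeric matrix of data, or an object that can be coerced to one
                 (such as a numeric vector or a data frame with all numeric columns),
       centers - the number of clusters k (or a set of initial cluster centers)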


Please refer to the documentation (?kmeans) for the other options of the kmeans() function.
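To make the idea of "assign each point to the nearest center, then move each center to the mean of its points" concrete, here is a minimal Lloyd-style sketch in R. It is for illustration only: the function name simple_kmeans and its max_iter parameter are made up for this post, and it is not how kmeans() itself is implemented (kmeans() uses the Hartigan-Wong algorithm by default).

```r
# A minimal Lloyd-style K-means sketch (illustrative, not the kmeans() implementation).
# simple_kmeans and max_iter are hypothetical names used only in this example.
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Start from k distinct rows chosen at random
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (i in seq_len(max_iter)) {
    # Squared distance from every point to every center (n x k matrix)
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    # Index of the nearest center for each point
    new_cluster <- max.col(-d, ties.method = "first")
    if (all(new_cluster == cluster)) break  # assignments are stable, stop
    cluster <- new_cluster
    # Move each non-empty cluster's center to the mean of its points
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

set.seed(42)  # assumed seed, only to make the random sample repeatable
df <- data.frame(x = sample(1:800, 100), y = sample(1:500, 100))
res <- simple_kmeans(df, 3)
table(res$cluster)  # cluster sizes
res$centers         # final center points
```

In practice you would call kmeans() directly. The sketch is only meant to show what the cluster assignments and center points represent.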

Usage

Let's generate sample data to use.

> df=data.frame(x=sample(1:800,100),y=sample(1:500,100))
> head(df)
    x   y
1 485 448
2 292  37
3  46  67
4 582 293
5 218  63
6 580 196

Then, we cluster our 'df' data into 3 clusters.

> df.km=kmeans(df,3)
> df.km           # shows kmeans function results
K-means clustering with 3 clusters of sizes 39, 35, 26

Cluster means:
         x        y
1 654.8205 210.5385
2 191.0286 117.3143
3 313.1923 389.0769

Clustering vector:
  [1] 3 2 2 1 2 1 3 3 1 1 2 1 1 3 2 2 2 2 2 2 2 1 3 2 2 3 1 3 3 2 3 1 3
 [34] 3 1 3 2 2 3 2 2 3 1 1 1 3 1 1 1 3 1 2 1 3 1 1 1 3 3 3 2 2 1 2 1 2
 [67] 2 2 2 2 2 1 2 1 1 2 1 1 1 3 1 3 3 2 2 1 1 2 1 1 3 1 1 1 1 1 2 3 2
[100] 3

Within cluster sum of squares by cluster:
[1] 977327.4 681782.5 565713.9
 (between_SS / total_SS =  70.7 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"    
[5] "tot.withinss" "betweenss"    "size"         "iter"        
[9] "ifault"
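The components listed above can be read directly from the result object. The following is a small sketch of how they might be used. The seed value is an assumption added only for repeatability, so the numbers will differ from the output shown above.

```r
set.seed(1)  # assumed seed, not part of the original example
df <- data.frame(x = sample(1:800, 100), y = sample(1:500, 100))
df.km <- kmeans(df, 3)

df.km$size           # number of points in each cluster
df.km$centers        # center point (mean) of each cluster
head(df.km$cluster)  # cluster label of the first few rows

# Total within-cluster sum of squares is the sum of the per-cluster values
df.km$tot.withinss
sum(df.km$withinss)

# The percentage reported as between_SS / total_SS in the printed summary
100 * df.km$betweenss / df.km$totss

# Attach the labels to the data and recompute the cluster means by hand
df$cluster <- df.km$cluster
aggregate(cbind(x, y) ~ cluster, data = df, FUN = mean)
```

The last call should reproduce the values in df.km$centers, which is a quick way to confirm that each center is simply the mean of the points assigned to it.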


Visualizing in a graph

Next, we plot the data, coloring each point by the cluster it was assigned to in df.km.

> plot(df[c("x","y")],col=df.km$cluster) 

Finally, we add the center point of each cluster to the graph.
 
> points(df.km$centers,col=1:3,pch=c(6,7,8),cex=2) 
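The two plotting steps can also be combined into one repeatable script. This is a sketch under a few assumptions that are not part of the example above: the seed, the nstart = 25 setting (which reruns K-means from several random starts and keeps the best result), the plot title, and the legend are all illustrative choices.

```r
set.seed(123)  # assumed seed, only so the random sample is repeatable
df <- data.frame(x = sample(1:800, 100), y = sample(1:500, 100))

# nstart = 25 is an illustrative choice: try 25 random starts, keep the best
df.km <- kmeans(df, centers = 3, nstart = 25)

# Points colored by cluster label
plot(df[c("x", "y")], col = df.km$cluster, pch = 19,
     main = "K-means clustering with 3 clusters")

# Cluster centers drawn as larger star symbols in the matching colors
points(df.km$centers, col = 1:3, pch = 8, cex = 2, lwd = 2)

legend("topright", legend = paste("Cluster", 1:3), col = 1:3, pch = 19)
```

Because the rows of df.km$centers are in the same order as the cluster labels, col = 1:3 gives each center the same color as its points.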

In this post, we have learned how to use the kmeans() function to cluster a dataset and visualize the result in a plot.
Thank you for reading!
