### K-means Data Clustering in R

The K-means algorithm partitions a dataset into k groups (clusters). Each cluster has a center point, which is the mean of all the data points in that cluster. Clustering is a useful technique for exploring a dataset, making initial observations, and separating the data into groups based on similar features. In R, we use the `kmeans()` function to cluster a dataset with the K-means method. In its simplest form, it is called as follows:

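`kmeans(x, centers)`
x - numeric data: a matrix, or an object that can be coerced to one (such as a numeric vector or a data frame with all numeric columns),
centers - the number of clusters k, or a set of initial cluster centers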

Please refer to the documentation for the other options of the kmeans() function.
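
As a rough sketch of a few of those options: `centers` sets the number of clusters, `nstart` repeats the algorithm from several random starting points and keeps the best result, and `iter.max` caps the number of iterations. The seed, the variable names `pts` and `fit`, and the specific option values below are arbitrary choices for illustration.

```r
# Illustrative data only: 100 random points, seeded so the run is repeatable.
set.seed(42)
pts <- data.frame(x = sample(1:800, 100), y = sample(1:500, 100))

# centers = 3 asks for three clusters; nstart = 10 tries ten random starts
# and keeps the one with the lowest total within-cluster sum of squares.
fit <- kmeans(pts, centers = 3, nstart = 10, iter.max = 20)

fit$size          # number of points in each cluster
fit$tot.withinss  # total within-cluster sum of squares
```

Because the starting centers are chosen at random, calling `set.seed()` first is what makes the clustering repeatable between runs.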

Usage

Let's generate sample data to use.

```
> df=data.frame(x=sample(1:800,100),y=sample(1:500,100))
> head(df)
    x   y
1 485 448
2 292  37
3  46  67
4 582 293
5 218  63
6 580 196
```

Then, we cluster our 'df' data into 3 groups.

```
> df.km=kmeans(df,3)
> df.km           # shows kmeans function results
K-means clustering with 3 clusters of sizes 39, 35, 26

Cluster means:
         x        y
1 654.8205 210.5385
2 191.0286 117.3143
3 313.1923 389.0769

Clustering vector:
 3 2 2 1 2 1 3 3 1 1 2 1 1 3 2 2 2 2 2 2 2 1 3 2 2 3 1 3 3 2 3 1 3
 3 1 3 2 2 3 2 2 3 1 1 1 3 1 1 1 3 1 2 1 3 1 1 1 3 3 3 2 2 1 2 1 2
 2 2 2 2 2 1 2 1 1 2 1 1 1 3 1 3 3 2 2 1 1 2 1 1 3 1 1 1 1 1 2 3 2
 3

Within cluster sum of squares by cluster:
 977327.4 681782.5 565713.9
(between_SS / total_SS =  70.7 %)

Available components:

 "cluster"      "centers"      "totss"        "withinss"
 "tot.withinss" "betweenss"    "size"         "iter"
 "ifault"
```

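The components listed at the end of that output can be read directly from the result object. The sketch below regenerates comparable random data (the seed is an arbitrary assumption, so the numbers will differ from the output above) and checks two things: each row of `centers` is the mean of the points assigned to that cluster, and the percentage printed in the summary is `betweenss / totss`. The name `manual` is introduced here only for illustration.

```r
set.seed(1)  # arbitrary seed, for repeatability only
df <- data.frame(x = sample(1:800, 100), y = sample(1:500, 100))
df.km <- kmeans(df, 3)

# Per-cluster means computed by hand from the cluster assignments.
manual <- aggregate(df, by = list(cluster = df.km$cluster), FUN = mean)

# These should match the centers reported by kmeans().
all.equal(unname(as.matrix(manual[, c("x", "y")])), unname(df.km$centers))

# Share of the total sum of squares explained by the clustering, in percent.
round(100 * df.km$betweenss / df.km$totss, 1)
```
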
Visualizing in graph

Next, we plot the data, coloring each point by the cluster assigned to it in df.km.

`> plot(df[c("x","y")],col=df.km$cluster)`

Finally, we add the center point of each cluster to the graph.

`> points(df.km$centers,col=1:3,pch=c(6,7,8),cex=2)`

In this post, we have learned how to use the kmeans() function to cluster a dataset and visualize the result in a plot.