A z-score can be calculated with the formula below:

$$z = \frac{x - \mu}{\sigma}$$

where:

- x is each element of the vector x,
- μ is the mean of x,
- σ is the standard deviation of x (R's `sd()` computes the sample standard deviation, dividing by n - 1).
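The formula translates directly into a one-line R function. The name `zscore` is a hypothetical helper chosen for illustration, not a base R function; it is a minimal sketch that assumes a numeric vector without missing values.

```r
# Hypothetical helper (not part of base R): z = (x - mean) / sd
zscore = function(x) {
  (x - mean(x)) / sd(x)  # sd() uses the sample standard deviation (n - 1 denominator)
}

zscore(c(2, 4, 6))  # mean is 4 and sd is 2, so the z-scores are -1, 0, 1
```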

The normal distribution curve makes a z-score easy to interpret. Z-score values are laid out along the curve below: zero, at the center, is the mean, and each unit is one standard deviation away from it. The highest values are found at the far right of the curve and the lowest at the far left.
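As a rough sketch of that picture, base R's `dnorm()` and `pnorm()` describe the standard normal distribution (mean 0, standard deviation 1). The grid of z values and the plot settings below are arbitrary choices for illustration.

```r
# Density of the standard normal distribution over a grid of z-scores
z = seq(-3, 3, by = 0.5)
dens = dnorm(z)

z[which.max(dens)]  # the curve peaks at z = 0, the mean
pnorm(0)            # half of the area lies to the left of the mean

# Draw the curve: large positive z-scores sit in the right tail, large negative ones in the left
curve(dnorm(x), from = -3, to = 3, xlab = "z-score", ylab = "density")
abline(v = 0, lty = 2)  # dashed line marking the mean
```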

Let's generate some sample data and get its z-scores.

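```
set.seed(123)  # make the random sample reproducible
# 100 integers drawn from 1 to 50 with replacement
# (sample() changed its default algorithm in R 3.6.0, so older versions give different values)
x = sample(1:50, 100, replace = TRUE)
```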

Computing the z-scores with the formula:

```
m = mean(x)      # mean of x
s = sd(x)        # sample standard deviation of x
zs = (x - m)/s   # z-score of every element of x
```

```
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   16.00   25.50   26.21   36.25   50.00 
```

```
summary(zs)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-1.91447 -0.77536 -0.05392  0.00000  0.76245  1.80663 
```

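As the summary shows, the z-scores are centered on a mean of 0. In `zs`, the smallest value of x (1) becomes -1.91 and the largest (50) becomes 1.81, meaning they lie 1.91 standard deviations below and 1.81 standard deviations above the mean.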
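A quick sanity check, assuming the same sample data (regenerated here so the snippet runs on its own): standardized values should have mean 0 and standard deviation 1, up to floating-point error, and the ordering of the data is preserved.

```r
set.seed(123)
x = sample(1:50, 100, replace = TRUE)
zs = (x - mean(x)) / sd(x)

# Mean is 0 and standard deviation is 1 (up to floating-point error)
round(mean(zs), 10)
sd(zs)

# The transformation is increasing, so the smallest and largest x
# map to the smallest and largest z-scores
zs[which.min(x)]
zs[which.max(x)]
```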

In R, the built-in `scale()` function computes z-scores directly.

```
scale(x)
[,1]
[1,] 0.28781591
[2,] -0.69941543
[3,] -0.09188846
........
[98,] 0.51563852
[99,] -1.38288328
[100,] 0.21187503
attr(,"scaled:center")
[1] 26.21
attr(,"scaled:scale")
[1] 13.16814
```
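The two attributes printed at the end are the values `scale()` used: `scaled:center` is the mean and `scaled:scale` is the standard deviation. A short check, assuming the same sample data (regenerated here so the snippet is self-contained):

```r
set.seed(123)
x = sample(1:50, 100, replace = TRUE)
sc = scale(x)

# scale() stores the centering and scaling values as attributes of its result
attr(sc, "scaled:center")  # equals mean(x)
attr(sc, "scaled:scale")   # equals sd(x)
dim(sc)                    # a 100 x 1 matrix, not a plain vector
```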

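`scale()` returns a one-column matrix with the mean and standard deviation attached as attributes. To get a plain numeric vector of z-scores, take its first column: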

```
sc_zs = scale(x)[,1]
summary(sc_zs)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-1.91447 -0.77536 -0.05392  0.00000  0.76245  1.80663 
```

The summary shows that the result is the same as the one obtained with the formula.
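Rather than comparing summaries by eye, the two vectors can be compared element by element. This sketch uses `all.equal()` instead of `identical()` because `scale()` and `sd()` compute the standard deviation through slightly different arithmetic, so the last bits may differ. `as.vector()` is shown as an alternative way to drop the matrix structure.

```r
set.seed(123)
x = sample(1:50, 100, replace = TRUE)

zs = (x - mean(x)) / sd(x)     # formula
sc_zs = scale(x)[, 1]          # scale(), first column as a plain vector
sc_zs2 = as.vector(scale(x))   # alternative: as.vector() also drops dim and attributes

all.equal(zs, sc_zs)           # TRUE: same values within floating-point tolerance
all.equal(zs, sc_zs2)          # TRUE
```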

The `scale()` function is also often used to prepare data, for example to center a vector by subtracting its mean without dividing by the standard deviation (`scale(x, scale = FALSE)`).
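A sketch of two common uses: centering only, and standardizing every column of a matrix at once (a numeric data frame is also accepted, since `scale()` converts its input with `as.matrix()`). The column names `a` and `b` and their values are made up for illustration.

```r
# Center only: subtract the mean but do not divide by the standard deviation
set.seed(123)
x = sample(1:50, 100, replace = TRUE)
centered = scale(x, scale = FALSE)[, 1]
mean(centered)   # essentially 0
sd(centered)     # unchanged: still equal to sd(x)

# On a matrix, scale() standardizes each column separately
m = cbind(a = c(1, 2, 3), b = c(10, 20, 60))
ms = scale(m)
colMeans(ms)       # each column has mean 0
apply(ms, 2, sd)   # each column has standard deviation 1
```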
