First, we'll generate sample vector data for this tutorial. We can create it with the sample() command in R.

> set.seed(1234) # reproduces the same result

> x <- sample(-50:50, 100, replace = TRUE)

> x[sample(1:100, 80)] <- sample(-20:20, 80, replace = TRUE)

To check the content of the x vector, we'll visualize it in a plot.

> plot(x, type = "l", col = "blue")

Before turning to the standard deviation, we need to understand the mean value of the given vector data.

**Mean** *(μ) is a central value of elements in a numerical set. We can get the mean value of an x vector with the mean() command in R.*

> mean(x)

[1] -0.93

**Standard deviation** *is a measure of how much the elements of a set vary (differ) from the mean value of the set. It is denoted by the letter σ, std, or SD. To get the σ value, we'll use the sd() command in R.*

> sd(x)

[1] 16.07951

**Variance** *is the average of the squared deviations from the mean value of a set, which equals the square of the standard deviation. Variance can be obtained with the commands below.*

> var(x)

[1] 258.5506

> sd(x)^2

[1] 258.5506
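To make the three definitions concrete, here is a minimal sketch that recomputes them by hand. It regenerates x as above so that it runs on its own; the names n, m_manual, v_manual, and s_manual are illustrative only. Note that var() and sd() in R use the sample formula, with n - 1 in the denominator.

```r
set.seed(1234)
x <- sample(-50:50, 100, replace = TRUE)
x[sample(1:100, 80)] <- sample(-20:20, 80, replace = TRUE)

n <- length(x)
m_manual <- sum(x) / n                        # mean: sum of elements divided by their count
v_manual <- sum((x - m_manual)^2) / (n - 1)   # sample variance (n - 1 denominator, as in var())
s_manual <- sqrt(v_manual)                    # standard deviation: square root of the variance

# Each comparison returns TRUE
all.equal(m_manual, mean(x))
all.equal(v_manual, var(x))
all.equal(s_manual, sd(x))
```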

#### 68-95-99.7 rule

For normally distributed data, about 68%, 95%, and 99.7% of the values fall within ranges of 2σ, 4σ, and 6σ around the mean, respectively. The 68-95-99.7 rule takes its name from those percentages. Here, the 2σ range spans from -σ to σ around the mean and contains about 68% of the data; likewise, the 4σ range spans from -2σ to 2σ, and the 6σ range from -3σ to 3σ. We can check the x data and its sigma ranges by plotting a normal distribution curve.
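The percentages in the rule can be checked against the theoretical normal distribution with pnorm(), the cumulative distribution function in base R. This is a small sketch; the names k and coverage are illustrative only.

```r
k <- 1:3                             # number of standard deviations on each side of the mean
coverage <- pnorm(k) - pnorm(-k)     # probability of falling within k sigmas of the mean
round(100 * coverage, 2)
# [1] 68.27 95.45 99.73
```

The rule's 68, 95, and 99.7 are rounded versions of these values.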

> s <- sd(x)

> m <- mean(x)

> index <- seq(min(x), max(x), length = 100)

> dn <- dnorm(index, mean = m, sd = s)

> plot(index, dn, type = "l", lwd = 2)

> abline(v = m) # vertical line at the mean

> grid()

> text(m - 2, .02, "μ", pos = 3)

> abline(v = s, col = "green") # the mean is close to 0, so the sigma lines are drawn at ±s, ±2s, and ±3s

> abline(v = -s, col = "green")

> text(s + 2, .02, "σ", pos = 3, col = "darkgreen")

> text(-s + 2, .02, "-σ", pos = 3, col = "darkgreen")

> abline(v = -2 * s, col = "blue")

> abline(v = 2 * s, col = "blue")

> text(2 * s + 3, .02, "2σ", pos = 3, col = "blue")

> text(-2 * s + 3, .02, "-2σ", pos = 3, col = "blue")

> abline(v = 3 * s, col = "red")

> abline(v = -3 * s, col = "red")

> text(3 * s + 3, .02, "3σ", pos = 3, col = "red")

> text(-3 * s + 3, .02, "-3σ", pos = 3, col = "red")

Finally, we'll calculate the percentages of values in 2σ [-σ:σ], 4σ [-2σ:2σ], and 6σ [-3σ:3σ] ranges.

> sigma1 <- x <= s & x >= (-s)

> length(sigma1[sigma1 == TRUE])

[1] 67

> sigma2 <- x <= (s * 2) & x >= (-s * 2)

> length(sigma2[sigma2 == TRUE])

[1] 95

> sigma3 <- x <= (s * 3) & x >= (-s * 3)

> length(sigma3[sigma3 == TRUE])

[1] 99

The results are close to the expected values. Since x contains 100 elements, the counts can be read directly as percentages. For normally distributed data, increasing the number of samples brings the percentages closer to 68-95-99.7.
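The effect of sample size can be explored with a short simulation. This is a sketch under stated assumptions: the samples are drawn with rnorm(), so they are normally distributed (the x vector above is not generated this way), and within_sigma is a hypothetical helper that measures the ranges around the sample mean.

```r
# Hypothetical helper: share of values within k standard deviations of the mean
within_sigma <- function(v, k) {
  mean(abs(v - mean(v)) <= k * sd(v))
}

set.seed(1234)
for (n in c(100, 10000, 1000000)) {
  z <- rnorm(n, mean = 0, sd = 16)   # normally distributed sample of size n
  pct <- sapply(1:3, function(k) 100 * within_sigma(z, k))
  cat(n, ":", round(pct, 2), "\n")   # percentages within 1, 2, and 3 sigmas
}
```

Taking the mean of a logical vector gives the proportion of TRUE values, which replaces the explicit counting used earlier.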
