DataTechNotes: Introduction to T-test with R

A t-test compares the means of normally distributed data sets and tells us whether the difference between them is statistically significant. For example, the results of patients on a regular treatment and of patients who receive a new treatment form two data sets whose means can be compared with a t-test. The t-statistic was introduced by William Sealy Gosset under the pen name "Student", which is why the test is called Student's t-test.
In this post, we'll briefly learn how to run t-tests on given data sets in R. The tutorial covers:

The t.test command usage
Null hypothesis
T-distribution table

The t.test command usage

We can run a t-test in R with the t.test() function. A simple one-sample call looks like this:

t.test(rnorm(10)+5, mu = 4)

 One Sample t-test

data:  rnorm(10) + 5
t = 2.1038739, df = 9, p-value = 0.06471015
alternative hypothesis: true mean is not equal to 4
95 percent confidence interval:
 3.940511891 5.640899209
sample estimates:
 mean of x 
4.79070555

Here, we've run a one-sample test on 10 randomly generated numbers against a hypothesized mean of mu = 4. Since no seed is set, the numbers will differ on each run. The fields of the output are:

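    t - the value of the t statistic,
    df - the degrees of freedom,
    p-value - the probability of observing a t statistic at least this extreme if the null hypothesis is true (about 6.5% here),
    alternative hypothesis - a description of the alternative hypothesis being tested,
    95 percent confidence interval - the 95% confidence interval for the mean,
    sample estimates - the sample mean.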
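
The same fields can also be read from the object that t.test() returns instead of from the printed output. The following is a minimal sketch, assuming a fixed seed (42 is an arbitrary choice, not part of the original example) and the illustrative variable names x and res.

```r
# Fix the seed so the random sample is reproducible (arbitrary seed value).
set.seed(42)
x <- rnorm(10) + 5

# Store the test result instead of only printing it.
res <- t.test(x, mu = 4)

print(res$statistic)  # t statistic
print(res$parameter)  # degrees of freedom
print(res$p.value)    # p-value
print(res$conf.int)   # 95% confidence interval for the mean
print(res$estimate)   # sample mean
```

Keeping the result in a variable makes it easy to reuse single values, for example the p-value, in later calculations.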

Next, we'll generate two sets of data to compare.

set.seed(123)
a = rnorm(10)+10
print(a)
 [1]  9.439524353  9.769822511 11.558708314 10.070508391
 [5] 10.129287735 11.715064987 10.460916206  8.734938765
 [9]  9.313147148  9.554338030

b = rnorm(10)+11
print(b)
 [1] 12.224081797 11.359813827 11.400771451 11.110682716
 [5] 10.444158865 12.786913137 11.497850478  9.033382843
 [9] 11.701355902 10.527208592

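Now we compare a and b with the t.test() function. We set the var.equal (equal variances) parameter to TRUE so that the pooled-variance Student's t-test is used. By default, t.test() does not assume equal variances and runs Welch's t-test instead.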

t.test(a, b, var.equal = TRUE)

 Two Sample t-test

data:  a and b
t = -2.543782, df = 18, p-value = 0.02036269
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.0705694499 -0.1974231836
sample estimates:
  mean of x   mean of y 
10.07462564 11.20862196

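The t value of the two sets can also be calculated manually with the formula below. It uses the unpooled standard error, which gives the same value as the pooled (var.equal = TRUE) test here because both samples have the same size.

t = (mean(a)-mean(b))/sqrt(sd(a)^2/length(a)+sd(b)^2/length(b))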
print(t)
[1] -2.543781976

The result matches the t value reported by the t.test() function.
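
To see how the manual calculation relates to the two variants of t.test(), the sketch below computes the statistic once with the pooled variance and once with the unpooled standard error, and then repeats the comparison with unequal group sizes. The names sp2, t_pooled, t_unpooled and b2 are introduced here for illustration only, and shortening b to 8 values is an assumption made just to get different group sizes.

```r
set.seed(123)
a <- rnorm(10) + 10
b <- rnorm(10) + 11

na <- length(a)
nb <- length(b)

# Pooled variance: average of the two sample variances, weighted by degrees of freedom.
sp2 <- ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)

# t statistic of the equal-variance (Student) two-sample test.
t_pooled <- (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

# t statistic with the unpooled standard error.
t_unpooled <- (mean(a) - mean(b)) / sqrt(var(a) / na + var(b) / nb)

print(t_pooled)
print(t_unpooled)

# With unequal group sizes the two statistics generally differ.
b2 <- b[1:8]
t_welch_manual <- (mean(a) - mean(b2)) / sqrt(var(a) / na + var(b2) / length(b2))
print(t_welch_manual)
print(t.test(a, b2)$statistic)                    # Welch test (default)
print(t.test(a, b2, var.equal = TRUE)$statistic)  # pooled-variance test
```

With equal group sizes both versions agree, so the choice of formula only matters when the groups differ in size.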

Null hypothesis

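The null hypothesis is a central concept in statistical testing, and it is needed to interpret the result of a t-test. The null hypothesis, H₀, states that the means of the two populations are equal (or, in a one-sample test, that the population mean equals mu). The alternative hypothesis, H_A or H₁, states that they are not equal. When the p-value is below the chosen significance level, commonly 0.05, we reject H₀ in favor of the alternative. This is the alternative hypothesis that t.test() prints in its output.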
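
As a minimal sketch of this decision, the example below tests the two samples a and b from the previous section. The significance level alpha = 0.05 is a conventional choice assumed here, and the names alpha, res and decision are illustrative.

```r
set.seed(123)
a <- rnorm(10) + 10
b <- rnorm(10) + 11

alpha <- 0.05  # significance level (assumed, conventional value)
res <- t.test(a, b, var.equal = TRUE)

# Reject H0 when the p-value falls below the significance level.
if (res$p.value < alpha) {
  decision <- "reject H0: the means differ"
} else {
  decision <- "fail to reject H0"
}

print(res$p.value)
print(decision)
```

For these samples the p-value is about 0.02, as in the t.test() output above, so H₀ is rejected at the 5% level.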

T-distribution table

In R, we can get the values of a t-distribution table with the qt() function by specifying a probability and the degrees of freedom. The following call returns the one-tailed critical t value at the five percent level (the 0.95 quantile) for 10 degrees of freedom.

qt(0.95, df=10)
[1] 1.812461123

The same critical values for degrees of freedom from 1 to 20:

qt(0.95, df=1:20)
 [1] 6.313751515 2.919985580 2.353363435 2.131846786 2.015048373
 [6] 1.943180281 1.894578605 1.859548038 1.833112933 1.812461123
[11] 1.795884819 1.782287556 1.770933396 1.761310136 1.753050356
[16] 1.745883676 1.739606726 1.734063607 1.729132812 1.724718243
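
The table values connect back to the two-sample test from the first section. The sketch below, which assumes a two-tailed test at the 5% level, compares the t statistic of a and b with the critical value from qt() and recomputes the p-value with pt(), the distribution function of the t distribution. The names t_value, t_crit and p_manual are illustrative.

```r
set.seed(123)
a <- rnorm(10) + 10
b <- rnorm(10) + 11

res <- t.test(a, b, var.equal = TRUE)
t_value <- unname(res$statistic)
df <- unname(res$parameter)

# Two-tailed critical value at the 5% level: 2.5% in each tail.
t_crit <- qt(0.975, df = df)

# Two-tailed p-value computed from the t distribution function.
p_manual <- 2 * pt(-abs(t_value), df = df)

print(t_crit)
print(abs(t_value) > t_crit)  # TRUE means the statistic lies in the rejection region
print(p_manual)
```

Comparing |t| with the critical value and comparing the p-value with the significance level are two views of the same decision.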

In this tutorial, we've briefly learned the t-test with R. Thank you for reading!
