One Sample t-Test

Looking at data we are often tempted to make statements about it whose validity may be in doubt. To supplement the visual analysis we look at some basic statistical testing methods.

In testing we usually define a null hypothesis and calculate some test statistic. We then look up a critical value in a table for a given significance level and either accept or reject the null hypothesis.

Let's assume that the company announced a sales figure of 22 per person across all departments. It turns out that our department's sales were a little below that value:

In [1]:
%reload_ext rpy2.ipython
In [3]:
%%R
options(width=60)
sales <- c( 20, 17, 24, 19, 24, 24, 21, 29, 13,  9)
mean(sales)
[1] 20

One question to ask is whether the difference is 'significant', i.e. not just a result of the random fluctuation that is always present in data of this kind; in other words, how likely are we to get a sample mean of 20 when the population mean is 22?

By calculating the mean we are summing values, and by the central limit theorem we already know that in this case the distribution of the result is approximately normal. The division by $N$ does not matter in this respect.
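A minimal simulation sketch makes this visible (the exponential population here is just an assumed example, not part of the sales data):

In [ ]:
%%R
# Even for a clearly non-normal population the means of repeated
# samples are approximately normally distributed.
set.seed(1)
means <- replicate(10000, mean(rexp(30)))
hist(means, breaks = 50)  # roughly bell-shaped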

Therefore, in order to answer our question we can use a t-test, formulating the null hypothesis that the population mean is $m = 22$. We also choose a significance level, such as $\alpha = 0.05$.

The test statistic is $t = \frac{\bar{x} - m}{SE}$ with standard error $SE = \frac{s}{\sqrt{N}}$

where $s$ is the sample standard deviation and $N$ is the sample size.

Note that the sample standard deviation uses $N-1$ in the denominator (Bessel's correction):

$s_x = \sqrt{ \frac{ \sum (x-\bar{x})^2 }{N-1} }$

In [5]:
%%R
sqrt(sum((sales - mean(sales))^2) / (length(sales) - 1))
[1] 5.868939

This agrees with R's built-in sd():

In [6]:
%%R
sd(sales)
[1] 5.868939
In [7]:
%%R
(20 - 22) / (5.868939 / sqrt(10))
[1] -1.077632

In the old days, before cheap desktop computers, we would have consulted a table: look up the critical value for the given degrees of freedom (n-1) and the significance level, and based on that value and the test statistic either accept or reject the null hypothesis.

(first column: degrees of freedom; 1T/2T: one-/two-tailed significance level α)

1T  0.1     0.05    0.025   0.01    0.005
2T  0.2     0.1     0.05    0.02    0.01
----------------------------------------------
1   3.078   6.314  12.706  31.821  63.657
2   1.886   2.92    4.303   6.965   9.925
3   1.638   2.353   3.182   4.541   5.841
4   1.533   2.132   2.776   3.747   4.604
5   1.476   2.015   2.571   3.365   4.032
6   1.44    1.943   2.447   3.143   3.707
7   1.415   1.895   2.365   2.998   3.499
8   1.397   1.86    2.306   2.896   3.355
9   1.383   1.833   2.262   2.821   3.25
10  1.372   1.812   2.228   2.764   3.169
15  1.341   1.753   2.131   2.602   2.947
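Such a table is easy to reproduce with R's qt function; for the rows and columns shown above:

In [ ]:
%%R
# Critical values: rows are degrees of freedom, columns are the
# two-tailed significance levels from the table header.
alpha <- c(0.2, 0.1, 0.05, 0.02, 0.01)  # two-tailed
df    <- c(1:10, 15)
round(outer(df, alpha, function(d, a) qt(1 - a/2, d)), 3)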

There are n-1 = 9 degrees of freedom: if we want to keep a given mean fixed, we can change 9 of the values freely, but not the 10th, since this last value must compensate for (i.e. negate) our changes.

We are doing a two-tailed test, i.e. we are testing whether the mean differs from the given value in either direction. A one-tailed test would be appropriate if, for example, we only wanted to test whether the true mean is below the announced value; a one-tailed computation is sketched further below.

  • The critical value for df = 9 and α = 0.05 (two-tailed) is 2.262

  • $|-1.077632| = 1.077632 < 2.262$, i.e. the absolute value of the test statistic is smaller than the critical value

  • therefore we accept the null hypothesis

Today we can easily compute the p-value directly, to several decimal places:

In [8]:
%%R
t.test(sales, mu = 22)
	One Sample t-test

data:  sales
t = -1.0776, df = 9, p-value = 0.3092
alternative hypothesis: true mean is not equal to 22
95 percent confidence interval:
 15.80161 24.19839
sample estimates:
mean of x 
       20 

The p-value is well above $\alpha$ and therefore we accept the null hypothesis (which states that the true mean is 22).

Another way to read the t.test result: assuming the null hypothesis is true, the probability of arriving at a test statistic of -1.0776 (or one of larger absolute value) is 0.3092, i.e. such a result is quite likely.

  • The p-value is the probability of observing a test statistic at least as extreme as the one calculated, assuming the null hypothesis is true.

  • The p-value is NOT the probability that the null hypothesis is true!
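As a cross-check, the p-value can also be computed by hand from the t distribution with pt; a minimal sketch, including the one-tailed variant mentioned earlier:

In [ ]:
%%R
# Two-tailed p-value from first principles: the area in both tails of
# the t distribution (df = 9) beyond |t|.
t_stat <- (mean(sales) - 22) / (sd(sales) / sqrt(length(sales)))
2 * pt(-abs(t_stat), df = length(sales) - 1)  # 0.3092, as reported by t.test
# One-tailed version (t.test's alternative = "less"): lower tail only,
# which here is half the two-tailed value.
pt(t_stat, df = length(sales) - 1)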

For the confidence interval we compute the standard error

In [9]:
%%R
SE <- sd(sales) / sqrt(length(sales))
SE
[1] 1.855921

The t quantile for $\alpha = 0.05$ and $N-1 = 9$ degrees of freedom is not the familiar normal value of 1.96 but

In [10]:
%%R
qt(1-0.05/2, 9)
[1] 2.262157

which is of course the critical value from the table, and it yields the confidence interval reported by t.test:

In [13]:
%%R
20 - 2.262157 * 1.855921
[1] 15.80162
In [12]:
%%R
20 + 2.262157 * 1.855921
[1] 24.19838
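Equivalently, the whole interval in a single expression (reusing the SE computed above):

In [ ]:
%%R
# Lower and upper bound at once: mean ± t-quantile × SE.
mean(sales) + c(-1, 1) * qt(1 - 0.05/2, length(sales) - 1) * SE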

The value of m = 22 lies well within the confidence interval.

For a value just outside the confidence interval, such as m = 25, the result would have been different:

In [14]:
%%R
t.test(sales, mu = 25)
	One Sample t-test

data:  sales
t = -2.6941, df = 9, p-value = 0.02463
alternative hypothesis: true mean is not equal to 25
95 percent confidence interval:
 15.80161 24.19839
sample estimates:
mean of x 
       20 

In this case with $\alpha = 0.05$ we would reject the null hypothesis.
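This duality between the confidence interval and the test is easy to verify; a small sketch that extracts the p-value for a few hypothesized means (values chosen for illustration):

In [ ]:
%%R
# Means inside the 95% confidence interval give p > 0.05,
# means outside give p < 0.05.
sapply(c(16, 20, 24, 25), function(m) t.test(sales, mu = m)$p.value)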

Note that for $m = 0$ the t-test statistic becomes simply

$t = \frac{\bar{x}}{SE}$
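In fact mu = 0 is t.test's default, so calling it without mu tests exactly this; a quick check:

In [ ]:
%%R
# With the default mu = 0 the statistic is just the sample mean
# divided by the standard error (SE from above).
mean(sales) / SE
t.test(sales)$statistic  # same value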

Two Sample t-Test

The t.test function in R can also be used to perform a two-sample t-test, i.e. to compare the means of two samples:

In [15]:
%%R
t.test(c(2,3,7,11,13,17,19),c(7,11,13,17,19,23,29,31))
	Welch Two Sample t-test

data:  c(2, 3, 7, 11, 13, 17, 19) and c(7, 11, 13, 17, 19, 23, 29, 31)
t = -2.1649, df = 12.847, p-value = 0.04983
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -16.921242403  -0.007329025
sample estimates:
mean of x mean of y 
 10.28571  18.75000 

Here the null hypothesis is that the two population means are identical; with $\alpha = 0.05$ it would be rejected in this case, since the p-value is (slightly) below $\alpha$.
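Note that R performed Welch's test, which is t.test's default for two samples and does not assume equal variances. The classic pooled-variance version can be requested with var.equal = TRUE:

In [ ]:
%%R
# Classic (pooled-variance) two-sample t-test; R defaults to the
# Welch version, which drops the equal-variance assumption.
x <- c(2, 3, 7, 11, 13, 17, 19)
y <- c(7, 11, 13, 17, 19, 23, 29, 31)
t.test(x, y, var.equal = TRUE)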

Other Sources