Chi Square for Goodness of Fit

Let's assume that the two highest-paid people in our sales team have earned their position by being top performers for several years, while two people who are new to the team get the lowest pay, since they lack previous experience and probably sell less than the seniors. Or do they?

  sort(salary)
 [1] 15000 15000 18464 19658 20495 21914 22061 22423 23335 23552

The sort() function returns the sorted values themselves, which does not help here; the order() function, however, returns the indices that would sort the vector, and we can use it to arrange the sales figures by salary.

  salary
 [1] 20495 22061 18464 23335 19658 22423 23552 21914 15000 15000
  sales
 [1] 20 17 24 19 24 24 21 29 13  9
  order(salary)
 [1]  9 10  3  5  1  8  2  6  4  7
  sales[order(salary)]
 [1] 13  9 24 24 20 29 17 24 19 21

When sorted in this fashion, the data show that the two lowest-paid (junior) salespeople together sold noticeably less than the two highest-paid (senior) ones:

  z <- sales[order(salary)]
  jun <- z[1]+z[2]
  sen <- z[9]+z[10]
  jun
[1] 22
  sen
[1] 40
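For readers following along outside R, the same ordering and totals can be reproduced in Python (a sketch using 0-based indices; the variable names mirror the R session above):

```python
salary = [20495, 22061, 18464, 23335, 19658, 22423, 23552, 21914, 15000, 15000]
sales = [20, 17, 24, 19, 24, 24, 21, 29, 13, 9]

# Indices that would sort salary, like R's order(salary) but 0-based.
idx = sorted(range(len(salary)), key=lambda i: salary[i])
z = [sales[i] for i in idx]   # sales ordered from lowest to highest salary

jun = z[0] + z[1]             # combined sales of the two lowest-paid
sen = z[-2] + z[-1]           # combined sales of the two highest-paid
print(z)                      # [13, 9, 24, 24, 20, 29, 17, 24, 19, 21]
print(jun, sen)               # 22 40
```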

On the other hand, the difference is not really huge. One could argue that there is clearly an element of random chance in sales, and that this particular result is just a coincidence - it does not signify greater sales talent on the part of the seniors.

One way to tackle this question is to apply chisq.test() to one-dimensional count data; in this case it performs a goodness-of-fit test.

The Chi-square test is used here to test whether a sample of data came from a population with a specific distribution.

    c(jun, sen)
[1] 22 40

  chisq.test(c(jun,sen))

	Chi-squared test for given probabilities

data:  c(jun, sen) 
X-squared = 5.2258, df = 1, p-value = 0.02225

The p-value is the probability of obtaining a test statistic 'at least as extreme' as the one that was actually observed, assuming that the null hypothesis is true (in this case, that the population probabilities are equal).
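For one degree of freedom this probability can be computed directly: a chi-square variable with 1 df is the square of a standard normal Z, so P(X² >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x/2)). A quick check of R's result in Python (a sketch; only the standard math module is used):

```python
from math import erfc, sqrt

def chi2_sf_df1(x):
    """Upper-tail probability of the chi-square distribution with 1 degree
    of freedom: P(chi2_1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))."""
    return erfc(sqrt(x / 2.0))

# The statistic from the sales example: X-squared = 5.2258, df = 1.
print(chi2_sf_df1(5.2258))  # close to R's p-value = 0.02225
```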

The significance level is used to arrive at a decision: if the p-value is less than or equal to an (arbitrary!) significance level α, the null hypothesis is rejected and the outcome is said to be statistically significant at level α. The level α itself is the probability of making a type I error, i.e. of rejecting the null hypothesis when it is in fact true.

Traditionally, either the α = 0.05 level (5% level) or the α = 0.01 level (1% level) have been used. Obviously, α = 0.01 is much more conservative than α = 0.05.

The choice of α is crucial in the above example. The p-value = 0.02225 means that the result is statistically significant at the α = 0.05 level (we reject the hypothesis of equal population probabilities), but not at the stricter α = 0.01 level.

The Chi-square test uses the following assumptions and definitions:

- The data are counts of independent observations falling into k mutually exclusive categories (here k = 2: junior and senior sales).
- Under the null hypothesis H0 each category i has a specified probability p_i; with n observations the expected count is E_i = n * p_i. Equal probabilities give E = 62/2 = 31 per group in our example.
- The test statistic is X² = Σ (O_i - E_i)² / E_i, where O_i is the observed count in category i.
- The approximation is considered reliable only if every expected count is at least 5.

Some interesting properties:

- Under H0 the statistic approximately follows a chi-square distribution with k - 1 degrees of freedom.
- A chi-square distribution with df degrees of freedom arises as the sum of df squared independent standard normal variables; it has mean df and variance 2 df.

To understand the computation done by R and the reasoning behind the procedure, we will now do the test 'by hand', as was done before high computing power became generally available, when computing the p-value itself was infeasible.
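The by-hand steps can be sketched as follows: compute the statistic from the observed and expected counts, then compare it against tabulated critical values (3.841 and 6.635 are the standard chi-square table entries for df = 1 at α = 0.05 and α = 0.01):

```python
# Goodness-of-fit "by hand" for the observed counts [22, 40].
# Under H0 both groups are equally likely to make a sale, so each
# expected count is 62 / 2 = 31.
observed = [22, 40]
expected = sum(observed) / len(observed)                      # 31.0
stat = sum((o - expected) ** 2 / expected for o in observed)
print(stat)                                                   # 5.2258...

# Compare against chi-square critical values for df = 1 from a printed table.
print(stat > 3.841)   # True:  significant at alpha = 0.05
print(stat > 6.635)   # False: not significant at alpha = 0.01
```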

If we reject H0 in the above procedure, the actual p-value remains unknown. It can of course be calculated by hand, if only approximately, but doing so is very time-consuming. However, we already know that it is not greater than α, and with the help of the table we can narrow it down further: by looking up the next critical value for the given degrees of freedom, we can determine that in the example above the p-value is smaller than 0.05 but greater than 0.01.

To further motivate the discussion, here is a Python simulation that tosses a fair coin repeatedly and counts the cases where the test statistic comes out at least as large as the value given on the command line, along with some output from this program:
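The chisim.py source is not reproduced in this text; a minimal sketch of such a simulation, assuming the argument order sides, tosses, threshold, trials used in the run below, could look like this:

```python
import random
import sys

def chisq_stat(counts):
    """Pearson chi-square statistic for observed counts against
    equal expected counts in every category."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

def simulate(sides, tosses, threshold, trials, rng=random):
    """Toss a fair coin with `sides` sides `tosses` times per trial and
    return the fraction of trials whose statistic is >= threshold."""
    hits = 0
    for _ in range(trials):
        counts = [0] * sides
        for _ in range(tosses):
            counts[rng.randrange(sides)] += 1
        if chisq_stat(counts) >= threshold:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # Defaults mirror the example run: chisim.py 2 62 5.2258 10000
    args = sys.argv[1:] or ["2", "62", "5.2258", "10000"]
    sides, tosses = int(args[0]), int(args[1])
    threshold, trials = float(args[2]), int(args[3])
    print("check:", [22, 40], chisq_stat([22, 40]))
    print("chisq test stat >=", threshold, ":",
          simulate(sides, tosses, threshold, trials))
```

Note that the Monte Carlo estimate reflects the exact discrete distribution of the coin tosses, so it can differ slightly from the chi-square approximation used by chisq.test().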

xmdimrill:% chisim.py 2 62 5.2258 10000
check: [22, 40] 5.22580645161
[37, 25] 2.3226
[33, 29] 0.2581
[24, 38] 3.1613
[40, 22] 5.2258
[28, 34] 0.5806
[29, 33] 0.2581
[34, 28] 0.5806
[30, 32] 0.0645
[31, 31] 0.0000
[31, 31] 0.0000
[33, 29] 0.2581
[32, 30] 0.0645
[33, 29] 0.2581
[27, 35] 1.0323
[32, 30] 0.0645
[29, 33] 0.2581
[27, 35] 1.0323
[27, 35] 1.0323
[27, 35] 1.0323
observed avg: [31.045000000000002, 30.954999999999998]
chisq test stat >= 5.2258 : 0.0289

The simulation works with fair 'coins' of any number of sides; here it is the standard two-sided coin.

Another question that may arise in the context of the sales analysis is: How close was the result? Was it maybe a close shave for the seniors?

With just 3 more sales for the juniors the result would have been:

    c(jun+3,sen)
[1] 25 40

  chisq.test(c(jun+3,sen))

	Chi-squared test for given probabilities

data:  c(jun + 3, sen) 
X-squared = 3.4615, df = 1, p-value = 0.06281

This p-value is higher than 0.05, i.e. in this case we cannot reject the hypothesis that the population probabilities are equal, unless we assume the rather questionable significance level of α = 0.1.

Note that even 0.05 is not a very strict significance level - it corresponds to a 1-in-20 chance of a type I error.

In the example above, with 3 more sales for the seniors the result would be:

  c(jun,sen+3)
[1] 22 43

  chisq.test(c(jun,sen+3))

	Chi-squared test for given probabilities

data:  c(jun, sen + 3) 
X-squared = 6.7846, df = 1, p-value = 0.009195

Here, the p-value is below α = 0.01, and therefore the claim that the population probabilities are not equal would stand on much safer ground.
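As a cross-check, all three scenarios can be recomputed in one place using the df = 1 closed form (a sketch; since a chi-square variable with 1 df is a squared standard normal, the tail probability is erfc(sqrt(x/2)), which matches the X-squared and p-values reported by R above):

```python
from math import erfc, sqrt

def gof_df1(observed):
    """Return (statistic, p-value) of the equal-probability goodness-of-fit
    test for two counts; for df = 1, P(chi2 >= x) = erfc(sqrt(x / 2))."""
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, erfc(sqrt(stat / 2.0))

# Original data, juniors +3 sales, seniors +3 sales.
for counts in ([22, 40], [25, 40], [22, 43]):
    stat, p = gof_df1(counts)
    print(counts, round(stat, 4), round(p, 5))
```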