This was one of the hardest math topics for me to grasp, particularly since the ideas of hypothesis testing and the tools used to quantify it are jumbled up, producing a convoluted tangle. Here I attempt to untangle any confusion that learning this topic may create.
What is hypothesis testing? At its core, hypothesis testing is about evaluating whether a claim is false or not. I claim to be 170 cm tall, but when you use a metre ruler to measure me I come up short at 160 cm. Claim disproven! Of course, what makes hypothesis testing unique is that it uses probabilities to reach its conclusion.
Suppose a cola company claims that the mean volume of their drinks is 300 ml in each can. You smell horseshit and decide to test their claim. You buy 20 cans of cola, pour them out individually, and all of them measure only 290 ml. It would be fair to conclude that the company’s cheating you of your money, right?
- Suppose they were right. Then
- the probability of all 20 cans having less than 300 ml would be minuscule ($0.5^{20}$, which is approximately one in a million!).
- It’s too improbable!
- Hence, the company is lying!
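The tiny probability in the second bullet can be checked in a couple of lines. This is only a minimal sketch in Python, assuming that when the company is right each can independently has a 50% chance of falling below 300 ml (true for a distribution symmetric about its mean, such as the normal one). The variable names are made up for illustration.

```python
# Assumption: if the mean really is 300 ml and volumes are symmetric about it,
# each can independently falls below 300 ml with probability 0.5.
p_single_can_below = 0.5
n_cans = 20

# Probability that all 20 independent cans fall below 300 ml.
p_all_below = p_single_can_below ** n_cans

print(f"P(all {n_cans} cans below 300 ml) = {p_all_below:.3g}")
print(f"That is about 1 in {round(1 / p_all_below):,}")
```

The result is roughly one in a million, which is why the "all 20 cans came up short" story feels so damning.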
Take note of the argument above. This is the core of hypothesis testing. Step 1 assumes the null hypothesis (that is, the default claim) is true. Step 2 tests that claim against reality. Step 3 evaluates how compatible, or incompatible, the claim is with reality using probabilities (the p-value). Step 4 makes the final conviction on whether the default claim is false or not.
Instead of 20 cans, all of which measure only 290 ml, let’s put some numbers and distributions on it (for simplicity’s sake; there are many variants to this).
Let the volume of cola in a can, $X$, follow a normal distribution with standard deviation 3 ml. Suppose we find that the mean volume of cola in 20 cans is 299.5 ml. Not so easy to conclude now, huh? It’s simple, really. We follow the four steps of the argument, but this time use statistics to justify these steps.
Step 0: The set-up
Okay, I lied. We need to first set up the courtroom. We call their claim, that the mean volume, $\mu$, is 300 ml, the null hypothesis and denote it as $H_0$. (Technically we call $\mu$ the population mean, since it is the mean volume of ALL the cans of cola, but if this confuses you, disregard it until later.) The alternative claim is that they’re lying, in this case that the mean volume is less than 300 ml. We call this the alternative hypothesis and denote it as $H_1$. We state the hypotheses as follows:
$$H_0: \mu = 300 \qquad H_1: \mu < 300$$
Step 1: Suppose the null hypothesis is true
We test this claim using 20 cans of cola, where the volume of each can has standard deviation 3 ml. Assume they are right. That means we suppose the mean volume really is 300 ml. The mean volume of 20 cans, $\bar{X}$, using normal distribution concepts, will follow a normal distribution as follows:
$$\bar{X} \sim N\left(300, \frac{3^2}{20}\right)$$
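The parameters of this distribution, mean 300 and variance $3^2/20$, come from the usual rules for expectations and variances. A short derivation, assuming the 20 cans $X_1, \dots, X_{20}$ are independent and each follows $N(300, 3^2)$ under the null hypothesis (the independence assumption is mine; the post does not state it explicitly):

```latex
\begin{align*}
E(\bar{X}) &= \frac{1}{20}\sum_{i=1}^{20} E(X_i) = \frac{20 \times 300}{20} = 300, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{20^2}\sum_{i=1}^{20} \operatorname{Var}(X_i)
  = \frac{20 \times 3^2}{20^2} = \frac{3^2}{20} = 0.45 .
\end{align*}
```

So the sample mean has a standard deviation of $\sqrt{0.45} \approx 0.671$ ml, much tighter than the 3 ml of a single can.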
Step 2: Test the claim with reality
If $H_0$ is true, then in theory the sample mean will be about 300 ml with small variations. In reality, the sample mean is 299.5 ml. Is that too much of a deviation from the proposed mean? If it is too much, the probability of seeing such a deviation should be very low.
In other words, given that $\bar{X}$ follows such a distribution, what is the probability that the sample mean of the 20 cans of cola would have a value of 299.5 ml or less? Well, we find the probability simply by computing the value of $P(\bar{X} \le 299.5)$, which can be found either by using the G.C. or by using z-values and a z-table. You should get a probability, or p-value, of 0.228, correct to 3 significant figures.
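If you have neither a G.C. nor a z-table at hand, the same number can be reproduced with Python’s standard library. This is a sketch of the z-value route, using the figures from the cola example; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 300.0         # mean volume claimed by the company (ml)
sigma = 3.0          # known standard deviation of a single can (ml)
n = 20               # number of cans sampled
sample_mean = 299.5  # observed mean volume of the 20 cans (ml)

# Standard deviation of the sample mean (the "standard error").
standard_error = sigma / sqrt(n)

# Standardise: how many standard errors below the claimed mean are we?
z = (sample_mean - mu_0) / standard_error

# Lower-tail probability of the standard normal distribution at z.
p_value = NormalDist().cdf(z)

print(f"z = {z:.4f}")
print(f"p-value = {p_value:.3f}")
```

The z-value comes out at about -0.745, and the lower-tail probability agrees with the 0.228 quoted above.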
Step 3: Is it too improbable?
Improbable and probable are relative words. Some think that a 1/3 chance of winning a dice-based game is low, while a 1/3 chance of winning the lottery is monstrously large. In order to assess whether 0.228 is too improbable or not, we need to set a limit. What probability or less do we consider ‘too improbable’? This is called the level of significance, denoted by $\alpha$, and it is arbitrarily chosen, depending on the context. In short, for the simplest of questions, it’s given to you in the question (there will be variants asking you to find this value of $\alpha$).
Anyway, let’s set $\alpha = 0.25$ for this question. Any probability less than this is, well, too improbable, GIVEN that we assume $H_0$ to be true. What’s our p-value? It’s 0.228. Is it less than 0.25? YES. IT’S TOO IMPROBABLE!
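Because the level of significance is a choice, the verdict can flip depending on where the limit is set. The sketch below compares the same p-value of 0.228 against a few limits; the function name `decide` and the extra limits of 0.10, 0.05 and 0.01 are illustrative assumptions, not part of the original question.

```python
def decide(p_value: float, alpha: float) -> str:
    """Reject the null hypothesis only if the p-value is below the limit alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

p_value = 0.228  # p-value from the cola example

# 0.25 is the limit used in this example; the others are for comparison only.
for alpha in (0.25, 0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

Only the 0.25 limit leads to rejection here; with a stricter limit such as 0.05, the same data would not be enough to call the company out.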
Step 4: Final conviction
Under $H_0$, we assume that the mean volume of cola is 300 ml. The probability that the sample mean volume of 20 cans of cola is 299.5 ml or less is 0.228, which is too improbable (compared to $\alpha = 0.25$) to occur. Hence, it’s more plausible that the company is lying.
We therefore find sufficient evidence to reject $H_0$ and conclude that the actual mean volume of cola is less than 300 ml (that is, we accept $H_1$, but writing these two words alone won’t get you credit).
Step E: Error
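I lied to you again. What if the company isn’t lying, but we were lucky (or unlucky) enough to pick the 20 cans that gave us a sample mean of 299.5 ml? That 0.228 probability is low, but it is not zero: an honest company would still produce a sample like this about 23% of the time. By rejecting $H_0$ whenever the p-value falls below $\alpha$, we accept a risk of being wrong. The level of significance, $\alpha$, can therefore alternatively be defined as the probability of wrongly rejecting $H_0$ when it is actually true.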
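A simulation can make this risk concrete: if the company is honest and the true mean really is 300 ml, how often would our test still convict them? The sketch below assumes independent, normally distributed cans and uses the limit of 0.25 from this example; the function name, the number of trials and the random seed are arbitrary choices of mine.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def false_rejection_rate(alpha=0.25, mu_0=300.0, sigma=3.0, n=20,
                         trials=20_000, seed=0):
    """Estimate how often H0 is rejected even though it is true."""
    rng = random.Random(seed)
    # Distribution of the sample mean when the company's claim is true.
    sampling_dist = NormalDist(mu_0, sigma / sqrt(n))
    rejections = 0
    for _ in range(trials):
        # Buy n cans from an honest company (true mean is mu_0).
        cans = [rng.gauss(mu_0, sigma) for _ in range(n)]
        # Lower-tail p-value of the observed sample mean.
        p_value = sampling_dist.cdf(fmean(cans))
        if p_value < alpha:
            rejections += 1
    return rejections / trials

rate = false_rejection_rate()
print(f"Honest company wrongly convicted in {rate:.1%} of simulated samples")
```

The estimated rate should land close to the chosen limit, which is exactly the risk we sign up for when we pick a level of significance.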
That’s the core idea of hypothesis testing and a very simple example to get you started. What about finding the sample mean? What if you use a large sample that doesn’t follow a normal distribution? What if the population variance isn’t known? What if $H_1$ is not whether $\mu$ is less or more than some claimed value, but just not equal to it? What if any of the parameters are unknown?
Welcome to hypothesis testing. Make sure your understanding of distributions and means of random variables is top-notch before coming here, since they are the ABCs of hypothesis testing.