Small sample sizes mean results will fluctuate wildly
Consider an opaque gumball machine filled with thousands of gumballs, each either green or red. Green gumballs are good; red ones are bad.
Put in a nickel and out comes a green gumball. Based on this single green gumball on the first try, what can we reasonably infer about the proportion of green to red gumballs in the machine? If a second nickel goes in and the next gumball is also green, what can we infer now? Are all of them green? Most of them?
The only thing we know with certainty is that the first two gumballs out of the machine were green. Everything beyond that is inference, and the error in that inference shrinks as the sample grows: the smaller the sample, the more exposed our inference is to error, and vice versa. We do not know whether we happened to get the only two green gumballs in the machine, whether all but two of the gumballs are green, or something in between. With a sample of 5,000 gumballs, we could estimate the ratio with a fairly high degree of reliability.
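To make that concrete, here is a minimal simulation sketch. It assumes a hypothetical machine whose gumballs are 60% green (a number we could not actually know in the story) and draws repeated samples of different sizes, reporting how far the estimated green fraction swings:

```python
import random

# Hypothetical assumption: 60% of the gumballs are green. In reality this
# fraction is exactly what we are trying to infer from our samples.
TRUE_GREEN_FRACTION = 0.60
TRIALS = 1_000  # how many samples of each size to draw

def estimated_green_fraction(sample_size: int) -> float:
    """Draw `sample_size` gumballs and return the observed share of greens."""
    greens = sum(random.random() < TRUE_GREEN_FRACTION for _ in range(sample_size))
    return greens / sample_size

for n in (2, 20, 200, 5_000):
    estimates = [estimated_green_fraction(n) for _ in range(TRIALS)]
    print(f"n={n:>5}: estimates range from {min(estimates):.2f} to {max(estimates):.2f}")
```

Samples of 2 routinely land at 0.00 or 1.00, while samples of 5,000 stay clustered within a few percentage points of the true fraction.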
Two key observations should be understood from this:
- Large samples are more precise than small samples.
- Small samples yield extreme results more often than large samples do.[1]
A small sample should caution us against assuming too much.
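The second observation can also be checked directly. A short sketch, again assuming a hypothetical 60%-green machine, computes the probability that a sample comes out entirely one colour, the most extreme result possible:

```python
# Hypothetical assumption: 60% of the gumballs are green.
p_green = 0.60

for n in (2, 5, 20, 100):
    all_green = p_green ** n          # every draw comes up green
    all_red = (1 - p_green) ** n      # every draw comes up red
    print(f"n={n:>3}: P(sample is all one colour) = {all_green + all_red:.4%}")
```

A pair of draws comes up all one colour roughly half the time, while a sample of 100 essentially never does; an extreme result from a tiny sample tells us almost nothing about the machine.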
Kahneman goes on to note that our bias toward believing small numbers is part of a larger tendency:
> The strong bias toward believing that small samples closely resemble the population from which they are drawn is also part of a larger story: we are prone to exaggerate the consistency and coherence of what we see. The exaggerated faith of researchers in what can be learned from a few observations is closely related to the halo effect, the sense we often get that we know and understand a person about whom we actually know very little. System 1 runs ahead of the facts in constructing a rich image on the basis of scraps of evidence. A machine for jumping to conclusions will act as if it believed in the law of small numbers. More generally, it will produce a representation of reality that makes too much sense.[2]
1. Thinking, Fast and Slow – Kahneman (2013), ch. 10. Kahneman notes that these are actually the same statement, worded two different ways. ↩︎
2. Thinking, Fast and Slow – Kahneman (2013), ch. 10, § “A Bias of Confidence Over Doubt.” ↩︎