Regression toward the mean
In his lecture, Joseph Blitzstein talks about two basic statistical phenomena: regression toward the mean (RTTM) and survivor bias (or conditioning more broadly). RTTM is today’s topic. The following was written mainly by GPT-4.
Regression toward the mean is a statistical phenomenon where extreme data points are likely to be followed by less extreme ones when measured again. In simpler terms, after an extreme observation, the next observation is likely to be closer to the mean or average.
Why does it happen?
It’s primarily a matter of probability. Extreme values are, by definition, rare, and an extreme measurement usually reflects chance (noise) pushing in the same direction as the underlying quantity. That noise is unlikely to be as favorable the second time, so when you take another measurement it is simply more probable that the new value will be closer to the mean than the previous extreme value was.
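A minimal simulation sketch of this point (the true-score-plus-noise model, the specific numbers, and names like `true_score`, `test1`, `test2` are illustrative assumptions, not from the lecture): each measurement is a stable component plus independent noise, and conditioning on an extreme first measurement shows the second one falling back toward the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Each measurement = stable "true" component + fresh, independent noise.
true_score = rng.normal(loc=100, scale=10, size=n)
test1 = true_score + rng.normal(scale=10, size=n)
test2 = true_score + rng.normal(scale=10, size=n)

# Condition on an extreme first measurement (top 1% of test1).
extreme = test1 > np.quantile(test1, 0.99)

print(f"overall mean of test2:     {test2.mean():.1f}")
print(f"test1 mean, extreme group: {test1[extreme].mean():.1f}")
print(f"test2 mean, same group:    {test2[extreme].mean():.1f}")
# The extreme group's second measurement lands well below its first one,
# roughly halfway back toward the overall mean: the noise that helped
# inflate test1 is unlikely to be as favorable again on test2.
```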
Examples:
Sports Performance: Imagine a basketball player who has an outstanding game, scoring well above their average number of points. If they’ve played at such an exceptional level, it’s likely that in the next game they will score closer to their average (not necessarily because their skill has decreased, but simply due to the natural variability in performance).
Student Test Scores: If a student who typically scores around the average on exams gets an exceptionally high score on one test, they might score closer to their average on a subsequent test. Conversely, if they score exceptionally low on one test, they might score higher the next time.
Investment Returns: If a particular stock has an exceptionally good year with returns way above the market average, it might have more modest (or even below-average) returns the following year.
Height of Parents and their Children: This is a classic example by Sir Francis Galton. He found that very tall parents tend to have children who are shorter than them (but still above average), and very short parents tend to have children who are taller than them (but still below average). The children’s heights tend to “regress” toward the mean height; a standard formula making this precise is given just after this list.
Medical Treatments: If patients are selected for a clinical trial because they have exceptionally high blood pressure, some of them will probably show a reduction in blood pressure over time even without any treatment. This isn’t because of any therapeutic effect, but simply because their initial measurements were unusually high and subsequent measurements tend to be closer to the mean.
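As mentioned in the height example above, Galton’s observation has a standard mathematical form. This is the textbook bivariate normal result rather than anything quoted from the lecture: if parent height $X$ and child height $Y$ are bivariate normal with correlation $\rho$, then

$$
E[Y \mid X = x] = \mu_Y + \rho \,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X).
$$

Because $|\rho| < 1$, the predicted child’s standardized deviation from the mean is only a fraction $\rho$ of the parent’s standardized deviation, which is exactly the “pull” toward the mean that Galton observed.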
Misunderstandings:
This concept is often misunderstood. For instance, if a student performs poorly on a test and then improves on the next one, it might be tempting to attribute this improvement to a particular intervention (like tutoring). While the intervention might have had an effect, it’s also possible that some of the improvement was due to regression toward the mean.
In research and experimental designs, this phenomenon needs to be taken into account, especially when making causal inferences from observed changes.
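A quick sketch of why this matters for causal inference (again a toy model with made-up numbers, and the “tutoring” here is deliberately inert): select the weakest scorers on one exam, apply no treatment at all, and their next exam still looks better.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

ability = rng.normal(loc=70, scale=8, size=n)   # stable part of performance
exam1 = ability + rng.normal(scale=8, size=n)   # first exam, with noise
exam2 = ability + rng.normal(scale=8, size=n)   # second exam, still no intervention

# "Enroll" the bottom 10% of exam1 scorers in a tutoring program that does nothing.
tutored = exam1 < np.quantile(exam1, 0.10)

print(f"tutored group, exam 1: {exam1[tutored].mean():.1f}")
print(f"tutored group, exam 2: {exam2[tutored].mean():.1f}")
# The selected group improves by several points despite receiving no real help,
# which is why a randomized control group is needed to separate a genuine
# treatment effect from regression toward the mean.
```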
Toward the end of his talk, he mentions a quote by Daniel Kahneman that explains the concept of RTTM very clearly.
I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is more effective than punishment for promoting skill-learning. …. [A flight instructor objected:] “On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don’t tell us that reinforcement works and punishment does not, because the opposite is the case.” …
This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them.
Full details of Kahneman’s quote can be found in the following article