Simpson's Paradox

Seven examples of Simpson's paradox are presented. In most of them the pooled data is misleading because an important variable is ignored. When the data is conditioned on that additional variable (e.g. the race of the victim in Florida murder cases, or the age of the smoker or nonsmoker), the results are the opposite of what the pooled data shows. The models shown give a logistic regression summary of the effect of certain parameters on the response variable, which is always taken to be the "good" outcome: survival, not getting the death penalty, hits, or weight loss. Colors are consistent between the two plots; for example, third class on the Titanic is red in both the pooled and conditioned plots. The employment data is an interesting exception: there the pooled data is a more reliable indicator of economic health than the separated data.

Contributed by: Karl Heiner and Stan Wagon (April 2020)
SUNY at New Paltz and Macalester College
Open content licensed under CC BY-NC-SA


Details

Snapshot 1: If gender is ignored, Titanic survival rates are about equal for third-class passengers and crew members, with the third-class rate slightly higher. But when gender is taken into account, a male crew member had a greater chance of surviving than a male third-class passenger, and a female crew member had a far greater chance of surviving than a female third-class passenger.
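
The reversal in Snapshot 1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the adult counts are an assumption, taken from the widely circulated Titanic survival table, and may not match the exact figures shown in the Demonstration's output. The names `counts`, `pooled` and `conditioned` are hypothetical.

```python
from fractions import Fraction

# (survived, total) for adults only. ASSUMPTION: counts come from the widely
# circulated Titanic survival table, not from the Demonstration's own output.
counts = {
    "third class": {"male": (75, 462), "female": (76, 165)},
    "crew": {"male": (192, 862), "female": (20, 23)},
}


def pooled_rate(by_gender):
    """Survival rate for a group when gender is ignored."""
    survived = sum(s for s, _ in by_gender.values())
    total = sum(t for _, t in by_gender.values())
    return Fraction(survived, total)


# Pooled rates: gender ignored.
pooled = {group: pooled_rate(by_gender) for group, by_gender in counts.items()}

# Conditioned rates: one rate per (group, gender) cell.
conditioned = {
    group: {sex: Fraction(s, t) for sex, (s, t) in by_gender.items()}
    for group, by_gender in counts.items()
}

for group in counts:
    cells = ", ".join(
        f"{sex} {float(rate):.3f}" for sex, rate in conditioned[group].items()
    )
    print(f"{group}: pooled {float(pooled[group]):.3f}; {cells}")
```

With these counts the pooled rate is marginally higher for third class, while the crew rate is higher within each gender. The crew was overwhelmingly male, and men had the lower survival rate, which pulls the crew's pooled rate down.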

Snapshot 2: If age is ignored, it appears that smokers have a better chance of survival than nonsmokers. But within each of the three age groups, nonsmokers have the higher survival rate.

Snapshot 3: For the pooled data, the unemployment rate went down from March 1982 to January 2009. But at each education level, the rate rose. The reversal arises from a large shift in the proportion of people at each education level between the two dates. To evaluate the general behavior of the economy, it is appropriate to look at the pooled data.
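
The role of the shifting proportions in Snapshot 3 can be made explicit with a weighted average. The notation here is an assumption introduced for illustration and does not appear in the Demonstration: $w_k(t)$ is the share of the labor force at education level $k$ at time $t$, and $r_k(t)$ is the unemployment rate at that level.

```latex
% Pooled rate as a weighted average of the per-level rates
R(t) = \sum_k w_k(t)\, r_k(t), \qquad \sum_k w_k(t) = 1

% Change between two dates t_1 < t_2, split into a rate effect and a composition effect
R(t_2) - R(t_1)
  = \underbrace{\sum_k w_k(t_2)\,\bigl[r_k(t_2) - r_k(t_1)\bigr]}_{\text{rate effect}}
  + \underbrace{\sum_k \bigl[w_k(t_2) - w_k(t_1)\bigr]\, r_k(t_1)}_{\text{composition effect}}
```

If every per-level rate rose, the rate effect is positive. The composition effect is negative when the shares move toward levels with lower unemployment, and the pooled rate falls whenever that shift outweighs the rate effect.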

The exact data is shown in the output. Some details: for the Titanic, children are excluded from the third-class data. For Weight Loss, BMI is body mass index, and the fractions in the table are (number with successful weight loss) / (number of subjects). The smoking data is from a 20-year study of women in Whickham, England; the Alive or Dead categories refer to their state after 20 years. Note that the kidney stone data is shown only as it pertains to the Simpson's paradox discussion; there are excellent nonsurgical ways (ultrasound lithotripsy, laser lithotripsy) to treat kidney stones (see the original study [1]).

For more information on the datasets, see [1–6]. Item [7] discusses an interesting occurrence of Simpson's paradox in an accusation of sex bias in graduate admissions at the University of California at Berkeley. See [1] for the kidney stone example. The unemployment data example was provided by Jeffrey Witmer (Oberlin College) using data from the Bureau of Labor Statistics.

References

[1] C. R. Charig, D. R. Webb, S. R. Payne and J. E. A. Wickham, "Comparison of Treatment of Renal Calculi by Open Surgery, Percutaneous Nephrolithotomy, and Extracorporeal Shockwave Lithotripsy," British Medical Journal (Clinical Research Edition), 292(6524), 1986 pp. 879–882. doi:10.1136/bmj.292.6524.879.

[2] Select Statistical Services. "Hidden Data and Surviving a Sinking Ship: Simpson's Paradox." (Apr 23, 2020) www.select-statistics.co.uk/article/blog-post/hidden-data-and-surviving-a-sinking-ship-simpsons-paradox.

[3] M. L. Radelet and G. L. Pierce, "Choosing Those Who Will Die: Race and the Death Penalty in Florida," Florida Law Review, 43(1), 1991 pp. 1–34. www.ncjrs.gov/App/Publications/abstract.aspx?ID=134288.

[4] A. Smith, "At the Plate, a Statistical Puzzler: Understanding Simpson's Paradox," The State of the USA (blog), Aug 20, 2010. www.stateoftheusa.org/content/at-the-plate-a-statistical-puz.php.

[5] S. Skrivanek. "Simpson's Paradox (and How to Avoid Its Effects)." (Apr 23, 2020) www.moresteam.com/whitepapers/download/simpsons-paradox.pdf.

[6] D. R. Appleton, J. M. French and M. P. J. Vanderpump, "Ignoring a Covariate: An Example of Simpson's Paradox," The American Statistician, 50(4), 1996 pp. 340–341. doi:10.1080/00031305.1996.10473563.

[7] P. J. Bickel, E. A. Hammel and J. W. O'Connell, "Sex Bias in Graduate Admissions: Data from Berkeley," Science, New Series, 187(4175), 1975 pp. 398–404. science.sciencemag.org/content/187/4175/398.


