Simpson's Paradox

Five examples of Simpson's paradox are presented. In these examples the pooled data is misleading because an important variable is ignored. When the data is conditioned on that variable (e.g. the race of the victim in Florida murder cases), the direction of the effect is the opposite of what the pooled data shows. The models shown give a logistic regression summary of the effect of the predictors on the response variable, which is always taken to be the "good" outcome: survival, not getting the death penalty, hits, or weight loss. Colors are consistent between the two plots; for example, third class on the Titanic is red in both the pooled and conditioned plots.
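The reversal between pooled and conditioned data can be checked by hand. The sketch below is illustrative only: it uses the success counts commonly quoted from the kidney stone study [8] (open surgery versus percutaneous nephrolithotomy, split by stone size), which may not match the exact figures the Demonstration displays, and the names `counts`, `pooled`, `rate`, and `log_odds_ratio` are assumptions introduced here. The log odds ratio is included because, for a single binary predictor, it equals the coefficient a logistic regression would assign to that predictor.

```python
from math import log

# (successes, patients) by treatment and stone size.
# Assumption: these are the counts commonly quoted from study [8].
counts = {
    "open surgery": {"small": (81, 87), "large": (192, 263)},
    "percutaneous nephrolithotomy": {"small": (234, 270), "large": (55, 80)},
}


def pooled(treatment):
    """Add up successes and patients over both stone sizes."""
    cells = counts[treatment].values()
    return sum(s for s, _ in cells), sum(n for _, n in cells)


def rate(successes, total):
    """Fraction of patients treated successfully."""
    return successes / total


def log_odds_ratio(a, b):
    """Log odds ratio of success for group a versus group b.

    Each group is a (successes, total) pair. A positive value means
    group a has the higher odds of success.
    """
    (sa, na), (sb, nb) = a, b
    return log((sa / (na - sa)) / (sb / (nb - sb)))


surgery = counts["open surgery"]
pcnl = counts["percutaneous nephrolithotomy"]

# Conditioned on stone size: compare the treatments within each stratum.
for size in ("small", "large"):
    print(
        f"{size:>6}: surgery {rate(*surgery[size]):.3f}, "
        f"PCNL {rate(*pcnl[size]):.3f}, "
        f"log OR {log_odds_ratio(surgery[size], pcnl[size]):+.3f}"
    )

# Pooled: ignore stone size entirely.
pooled_surgery = pooled("open surgery")
pooled_pcnl = pooled("percutaneous nephrolithotomy")
print(
    f"pooled: surgery {rate(*pooled_surgery):.3f}, "
    f"PCNL {rate(*pooled_pcnl):.3f}, "
    f"log OR {log_odds_ratio(pooled_surgery, pooled_pcnl):+.3f}"
)
```

With these counts the log odds ratio is positive within each stone size and negative in the pooled comparison. The sign flips because most small-stone cases, which are the easier ones, received percutaneous nephrolithotomy, so its pooled success rate is inflated relative to open surgery.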




Snapshot 1: If gender is ignored, Titanic survival is about equal for third-class passengers and crew members. But when gender is taken into account, a male crew member had a greater chance of surviving than a male third-class passenger, and a female crew member had a far better chance of surviving than a female third-class passenger.
Snapshot 2: If age is ignored, it appears that smokers have a better chance of living than nonsmokers. But within each of the three age groups, nonsmokers do better.
The exact data is shown in the output. Some details: for the Titanic, children are excluded from the third-class data. For Weight Loss, BMI is body mass index, and each fraction in the table is the number of subjects with successful weight loss divided by the total number of subjects. The smoking data is from a study that followed women in Whickham, England, over 20 years; the Alive or Dead categories refer to their status at the end of those 20 years. Note that the kidney stone data is shown only as it pertains to the discussion of Simpson's paradox; there are excellent nonsurgical ways (ultrasound lithotripsy, laser lithotripsy) to treat kidney stones (see the original study [8]).
For more information on the datasets, see [1–8] and also [11] for the kidney stone example. Items [9] and [10] contain information about an interesting occurrence of Simpson's paradox in an accusation of sex bias in graduate admissions at the University of California at Berkeley.
[1] Select Statistical Services. "Hidden Data and Surviving a Sinking Ship: Simpson's Paradox." (Oct 22, 2015)
[2] D. Patterson. "Simpson's Paradox." (Oct 22, 2015)
[3] M. L. Radelet and G. L. Pierce, "Choosing Those Who Will Die: Race and the Death Penalty in Florida," Florida Law Review, 43(1), 1991 pp. 1–34.
[4] A. Smith, "At the Plate, a Statistical Puzzler: Understanding Simpson's Paradox," The State of the USA (blog), Aug 20, 2010.
[5] S. Skrivanek. "Simpson's Paradox (and How to Avoid Its Effects)." (Oct 22, 2015)
[6] M. Irwin. "Simpson's Paradox: What Can Happen if You Ignore an Important Variable." (Oct 22, 2015)
[7] D. R. Appleton, J. M. French, and M. P. J. Vanderpump, "Ignoring a Covariate: An Example of Simpson's Paradox," The American Statistician, 50(4), 1996 pp. 340–341. doi:10.1080/00031305.1996.10473563.
[8] C. R. Charig, D. R. Webb, S. R. Payne, and J. E. A. Wickham, "Comparison of Treatment of Renal Calculi by Open Surgery, Percutaneous Nephrolithotomy, and Extracorporeal Shockwave Lithotripsy," Br. Med. J. (Clin. Res. Ed.), 292, 1986 pp. 879–882.
[9] L. Lehe and V. Powell. "Simpson's Paradox." (Oct 22, 2015)
[10] P. J. Bickel, E. A. Hammel, and J. W. O'Connell, "Sex Bias in Graduate Admissions: Data from Berkeley," Science, New Series, 187(4175), 1975 pp. 398–404.