
Correlation and Regression Explorer

The scatter plot shown is based on an initial data configuration that was called Andrew's example in a lecture by J. W. Tukey that the author attended. Three types of regression lines are available: least squares (LS), least absolute deviation (L1), and the resistant line (RLINE). See the Details section for more information about the regression lines. In Andrew's example, LS, L1, and RLINE all produce very different fits.
As you drag, create, or delete points, the correlation and regression line are updated.
Try moving the points to create a set that looks similar to what might be obtained from a bivariate normal distribution with a given correlation. Then see if you can drag one point away from the others so it is an outlier without changing the resistant regression line very much. Notice, however, that the correlation does change. Now drag the point so the resistant regression line also changes.
Can you create data in which there is a strong association between x and y but the correlation is close to zero?

DETAILS

Snapshot 1: the points are similar to data generated from a bivariate normal distribution with correlation coefficient about 0.86
Snapshot 2: the data is the same as in snapshot 1 except a point has been added in the bottom-right corner; the LS regression line slope does not change much but the correlation does
Snapshot 3: same as in snapshot 2 but the point is much further to the right; both correlation and regression lines have completely changed
Snapshot 4: five points have been moved to roughly follow a parabolic curve; this illustrates that correlation does not measure nonlinear association
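The point of snapshot 4 can be checked numerically. The following is a minimal sketch, not the Demonstration's own code; the function name pearson_r and the five points on y = x^2 are illustrative assumptions rather than the data shown in the snapshot.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross products and squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical points lying exactly on y = x^2, symmetric about x = 0.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]
r_parabola = pearson_r(xs, ys)  # the positive and negative cross products cancel
```

Here y is completely determined by x, yet the cross products cancel and the correlation is zero, because the correlation only measures linear association.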
For LS regression, Mathematica's Fit function is used.
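For readers without Mathematica, the LS line for a single predictor has a simple closed form. The sketch below is written in Python and is not what Fit does internally; the function name least_squares_line and the sample data are assumptions made for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The LS line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
intercept, slope = least_squares_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Because the errors are squared, a single distant point can dominate the sums, which is why the LS line moves so readily when one point is dragged far from the rest.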
L1 regression minimizes the sum of absolute errors. This is computed using linear programming; see eqn. (3) in [2]. L1 regression is more robust than LS when moderate outliers are present, but it is still sensitive to extreme outliers.
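As a rough illustration of L1 regression, the sketch below does not use the linear programming formulation of [2]. Instead it relies on the fact that, for a straight-line fit, some line minimizing the sum of absolute errors passes through at least two data points, so it tries every such line. This brute-force search is only practical for small data sets; the function name l1_line and the sample points are assumptions.

```python
from itertools import combinations


def l1_line(points):
    """Return (intercept, slope) of a least absolute deviation line.

    Brute force: evaluate every line through two data points and keep
    the one with the smallest sum of absolute errors.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        cost = sum(abs(y - (intercept + slope * x)) for x, y in points)
        if best is None or cost < best[0]:
            best = (cost, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]


# Hypothetical data: four points on y = x plus one outlier at (4, 20).
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 20.0)]
a, b = l1_line(points)
```

In this example the fitted line stays on y = x and the outlier contributes only its own absolute error to the total.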
RLINE: the resistant regression line (see §5 of [3]) is based on medians.
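One common median-based construction is the three-group resistant line, sketched below under stated assumptions. It is a single-pass version that ignores ties in x and does not iterate on the residuals, so it should not be read as the exact procedure in [3] or as the Demonstration's implementation. The function name resistant_line, the group-size rule, and the sample points are illustrative.

```python
from statistics import median


def resistant_line(points):
    """Return (intercept, slope) of a one-pass three-group median line."""
    pts = sorted(points)
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three points")
    # Split into left, middle, and right thirds by x (assumed sizing rule:
    # the outer groups get the extra points when n = 3k + 2).
    k, rem = divmod(n, 3)
    outer = k + 1 if rem == 2 else k
    groups = [pts[:outer], pts[outer:n - outer], pts[n - outer:]]
    # Summarize each group by the medians of its x and y values.
    med = [(median([x for x, _ in g]), median([y for _, y in g])) for g in groups]
    (xl, yl), _, (xr, yr) = med
    if xr == xl:
        raise ValueError("outer groups have the same median x")
    slope = (yr - yl) / (xr - xl)
    # Intercept: average of the three group-level intercept estimates.
    intercept = sum(y - slope * x for x, y in med) / 3
    return intercept, slope


# Hypothetical data: y = 2x with the last point replaced by an outlier.
points = [(float(x), 2.0 * x) for x in range(1, 9)] + [(9.0, 100.0)]
intercept, slope = resistant_line(points)
```

Since only the group medians enter the calculation, moving the outlier at x = 9 even further up leaves the fitted line unchanged, which matches the behavior described for the resistant line when a single point is dragged away.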
[1] F. J. Anscombe, "Graphs in Statistical Analysis," The American Statistician, 27(2), 1973 pp. 17–21.
[2] S. C. Narula and J. F. Wellington, "The Minimum Sum of Absolute Errors Regression: A State of the Art Survey," International Statistical Review, 50(2), 1982 pp. 317–326.
[3] P. F. Velleman and D. C. Hoaglin, Applications, Basics and Computing of Exploratory Data Analysis, Boston: Duxbury Press, 1981.

PERMANENT CITATION

Contributed by: Ian McLeod (The University of Western Ontario)