
Influential Points in Regression

A random sample of size n is generated from a bivariate normal distribution with a fixed mean, unit variances, and correlation coefficient ρ. The sample correlation is shown, along with the Cook's distance of the locator point. Several methods of fitting the regression line are available.
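The sampling step can be sketched in a few lines of Python. This is an illustration only, not the Demonstration's Mathematica source: the function names are hypothetical, and the mean is assumed to be (0, 0) because the value is not stated here.

```python
import math
import random


def bivariate_normal_sample(n, rho, seed=0):
    """Draw n points (x, y) with unit variances and correlation rho.

    Assumption: the mean is taken to be (0, 0).
    """
    rng = random.Random(seed)  # a fixed seed reproduces the same sample
    scale = math.sqrt(1.0 - rho * rho)
    sample = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # y = rho*z1 + sqrt(1 - rho^2)*z2 has unit variance and
        # correlation rho with x = z1.
        sample.append((z1, rho * z1 + scale * z2))
    return sample


def sample_correlation(points):
    """Pearson sample correlation of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


data = bivariate_normal_sample(n=50, rho=0.7, seed=1)
r = sample_correlation(data)
```

Changing `seed` while holding `n` and `rho` fixed gives a feel for how much the sample correlation varies from one sample to the next.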
The LS (least-squares) method uses Mathematica's built-in function LinearModelFit, which is also used to determine the Cook's distances. See the Details section for more information about L1 (least absolute deviation) and RLINE (resistant line). Cook's distances indicate which points have a large influence on the fitted LS regression line. As a rough rule, points whose Cook's distance exceeds 4/(n − 2), where n is the sample size, may be influential. The recommended practice is to look at a plot of all the Cook's distances; two such plots are available. See Details for more information.
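For readers who want to see the calculation, here is a minimal Python sketch of Cook's distance for a simple linear regression. It is not the Demonstration's code, which relies on LinearModelFit. The function names are hypothetical, and the 4/(n − 2) cutoff is the rough rule from [2].

```python
def cooks_distances(points):
    """Cook's distance of each (x, y) point in a simple LS regression."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in points]
    p = 2  # number of fitted coefficients: intercept and slope
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    distances = []
    for x, e in zip(xs, residuals):
        h = 1.0 / n + (x - mx) ** 2 / sxx  # leverage of this point
        distances.append(e * e * h / (p * s2 * (1.0 - h) ** 2))
    return distances


def flag_influential(points):
    """Indices of points whose Cook's distance exceeds 4/(n - 2)."""
    cutoff = 4.0 / (len(points) - 2)
    return [i for i, d in enumerate(cooks_distances(points)) if d > cutoff]


# Five points close to a line plus one point far from it at x = 10.
example = [(0, 0.1), (1, 0.9), (2, 2.1), (3, 2.9), (4, 4.1), (10, 0.0)]
flagged = flag_influential(example)
```

In this example only the last point is flagged: it combines a high leverage (its x value is far from the others) with a sizeable residual.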
Use the zoom slider to zoom out so that the locator can be moved some distance away from the data, then explore its influence on the regression, the correlation, and its Cook's distance. The effects of sample size and correlation may also be explored. By varying the random seed, you can explore the stochastic variation for a fixed initial data configuration.
Contributed by: Ian McLeod (University of Western Ontario)

DETAILS

For the definition of Cook's distance, see [1]. For discussion of its use in detecting influential points in regression, see [2, 3].
Pages 67–68 of [2] suggest that observations with Cook's distances exceeding 4/(n − 2) may be influential, but that it is better to look at a plot of all the Cook's distances with a benchmark line at 4/(n − 2).
Page 70 of [3] suggests looking at the half-normal plot of the Cook's distances to see those that are relatively large compared with the rest.
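The coordinates of a half-normal plot can be computed as in the Python sketch below. The plotting positions (n + i)/(2n + 1) are one common convention and are an assumption here; the function name is hypothetical.

```python
from statistics import NormalDist


def half_normal_points(values):
    """Pair each sorted value with a half-normal quantile.

    Assumption: plotting positions (n + i) / (2n + 1), i = 1..n.
    """
    n = len(values)
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    points = []
    for i, value in enumerate(sorted(values), start=1):
        prob = (n + i) / (2 * n + 1)  # always between 0.5 and 1
        points.append((std_normal.inv_cdf(prob), value))
    return points


# Hypothetical Cook's distances; the last one stands apart from the rest.
coords = half_normal_points([0.3, 0.01, 0.05, 2.0])
```

Plotting the second coordinate against the first puts the largest distances at the upper right, where a value that is large relative to the rest stands out from the trend.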
L1 Regression: minimizes the sum of the absolute errors. This is computed using linear programming; see eqn. (3) in [4]. L1 regression is more robust than LS when moderate outliers are present, but it is still sensitive to extreme outliers.
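As an illustration of the L1 criterion, the Python sketch below finds an L1 line by brute force rather than by the linear programming formulation cited above. It relies on the fact that, for a straight-line fit, some line through two of the data points attains the minimum sum of absolute errors. The function name is hypothetical, and the search is only practical for small samples.

```python
from itertools import combinations


def l1_line(points):
    """Return (intercept, slope, loss) of a least-absolute-deviation line.

    Brute-force search over lines through pairs of data points;
    this is not the linear programming method of [4].
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(abs(y - (intercept + slope * x)) for x, y in points)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    loss, intercept, slope = best
    return intercept, slope, loss


# Four points on y = x and one outlier at (4, 20).
intercept, slope, loss = l1_line([(0, 0), (1, 1), (2, 2), (3, 3), (4, 20)])
```

Here the L1 line stays on y = x and absorbs the whole outlier as a single residual of 16, whereas an LS fit would tilt toward the outlier.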
RLINE: the resistant regression line, discussed in §5 of [5], is based on medians, so a few extreme points have little effect on it.
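The idea of a median-based line can be sketched in Python as follows. This is a simplified one-pass, three-group version written for illustration; the procedure in [5] may differ in details such as iterative refinement of the slope and the choice of intercept. The function name is hypothetical.

```python
from statistics import median


def resistant_line(points):
    """Return (intercept, slope) of a one-pass three-group median line.

    Assumption: a simplified variant, not necessarily the RLINE
    algorithm of [5].
    """
    if len(points) < 3:
        raise ValueError("need at least three points")
    ordered = sorted(points)  # sort by x (ties broken by y)
    n = len(ordered)
    k, remainder = divmod(n, 3)
    # Keep the outer groups the same size; any extra point goes to the
    # middle group (remainder 1) or one to each outer group (remainder 2).
    outer = k + (1 if remainder == 2 else 0)
    left, right = ordered[:outer], ordered[n - outer:]
    x_left = median([x for x, _ in left])
    y_left = median([y for _, y in left])
    x_right = median([x for x, _ in right])
    y_right = median([y for _, y in right])
    if x_right == x_left:
        raise ValueError("outer groups have the same median x")
    slope = (y_right - y_left) / (x_right - x_left)
    # Intercept: median of the residuals from a line through the origin.
    intercept = median([y - slope * x for x, y in points])
    return intercept, slope


# Eight points on y = x and one extreme point at (8, 100).
pts = [(i, i) for i in range(8)] + [(8, 100)]
a, b = resistant_line(pts)
```

Because only group medians enter the slope, the extreme point at x = 8 does not move the fitted line away from y = x.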
[1] Cook's distance, Wikipedia.
[2] S. J. Sheather, A Modern Approach to Regression with R, New York: Springer, 2009.
[3] J. J. Faraway, Linear Models with R, Boca Raton: Chapman & Hall/CRC, 2005.
[4] S. C. Narula and J. F. Wellington, "The Minimum Sum of Absolute Errors Regression: A State of the Art Survey," International Statistical Review, 50(2), 1982 pp. 317–326.
[5] P. F. Velleman and D. C. Hoaglin, Applications, Basics and Computing of Exploratory Data Analysis, Boston: Duxbury Press, 1981.
© 2014 Wolfram Demonstrations Project & Contributors