Hidden Correlation in Regression

This Demonstration simulates a linear regression in which the explanatory variables are independent random variables from a continuous uniform distribution and the error vector is generated from a multivariate normal distribution with mean vector 0 and a covariance matrix that induces correlation among the errors. The thumbnail shows the Poincaré plot (a lagged scatterplot) of the reordered residuals from the linear model fit, in which each residual is plotted against its lagged neighbor. The Kendall rank correlation and its two-sided p-value shown in the plot provide a diagnostic test for the presence of hidden correlation. This residual plot shows clearly that the errors violate the usual regression assumption of independence. The model misspecification is less obvious in the traditional residual dependency plot.
  • Contributed by: Ian McLeod and Yun Shi
  • Department of Statistical and Actuarial Sciences, Western University
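
The diagnostic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Demonstration's own code: the errors here follow an AR(1) process along the sorted predictor (a stand-in for the Demonstration's covariance matrix), there is a single predictor, the residuals are reordered by that predictor, and the p-value uses the usual normal approximation for Kendall's tau without ties. The function names and default values are hypothetical.

```python
import math
import random


def simulate(n=200, rho=0.9, beta0=0.0, beta1=0.0, seed=1):
    """Simulate y = beta0 + beta1*x + e, with x uniform on (0, 1).

    Assumption: the errors follow an AR(1) process along the sorted x
    values, chosen only to create hidden correlation for illustration.
    """
    rng = random.Random(seed)
    x = sorted(rng.uniform(0.0, 1.0) for _ in range(n))
    innov_sd = math.sqrt(1.0 - rho * rho)  # keeps Var(e_i) equal to 1
    e = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        e.append(rho * e[-1] + innov_sd * rng.gauss(0.0, 1.0))
    y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, e)]
    return x, y


def ols_residuals(x, y):
    """Residuals from a simple least-squares fit of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def kendall_tau(u, v):
    """Kendall tau-a and a two-sided p-value (normal approximation, no ties)."""
    m = len(u)
    s = 0
    for i in range(m - 1):
        for j in range(i + 1, m):
            prod = (u[i] - u[j]) * (v[i] - v[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    tau = s / (m * (m - 1) / 2)
    z = 3.0 * tau * math.sqrt(m * (m - 1)) / math.sqrt(2.0 * (2 * m + 5))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return tau, p


x, y = simulate()
res = ols_residuals(x, y)             # x is sorted, so residuals are ordered by x
pairs = list(zip(res[:-1], res[1:]))  # points of the Poincaré (lagged) plot
tau, p = kendall_tau(res[:-1], res[1:])
print(f"Kendall tau = {tau:.3f}, two-sided p-value = {p:.3g}")
```

Plotting `pairs` as a scatterplot gives the Poincaré plot; rerunning with `simulate(rho=0.0)` gives a baseline in which the errors really are independent.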

SNAPSHOTS

  • [Snapshots 1-6]

DETAILS

Snapshot 1: The parameter estimates, their standard errors, and p-values in the fitted regression are shown. Due to model misspecification, the standard errors are too small: the p-values falsely suggest that one coefficient is nonzero, while the estimate for the other, with a p-value of about 6%, is borderline.
Snapshot 2: The residual dependency plot is flat, suggesting model adequacy. Looking at this plot more carefully, we do see a nonrandom pattern, but it is less evident than in the Poincaré plot.
Snapshot 3: Comparison of the estimated and theoretical correlation functions. The parameter is estimated by nonlinear least squares.
Snapshots 4-6: In the next three snapshots, one setting is changed and the other settings remain the same. In this case the effect of model misspecification increases and is again detected better in the Poincaré plot than in the residual dependency plot. Both regression parameters are erroneously reported as very significant.
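
One way to see why the reported standard errors in these snapshots are unreliable is to compare the variance formula that ordinary least squares reports with the actual variance of the estimator under correlated errors. The notation below (design matrix $X$, response $y$, error covariance matrix $\Sigma$, estimated error variance $\hat{\sigma}^{2}$) is assumed here for illustration and is not taken from the Demonstration.

```latex
\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y, \qquad
\widehat{\operatorname{Var}}_{\mathrm{OLS}}(\hat{\beta}) = \hat{\sigma}^{2}\,(X^{\top}X)^{-1}, \qquad
\operatorname{Var}(\hat{\beta}) = (X^{\top}X)^{-1}X^{\top}\Sigma X\,(X^{\top}X)^{-1}.
```

The reported and actual variances coincide when $\Sigma = \sigma^{2}I$, that is, when the errors are uncorrelated with constant variance. With correlated errors they generally differ, and the formulas alone do not fix the direction of the discrepancy; in the snapshots the reported standard errors are too small.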
Residual dependency plots for checking regression fits are discussed in most regression textbooks; see, for example, [1, 2].
Lagged scatterplots are sometimes called Poincaré plots [3, 4].
See [5] for further discussion of hidden correlation in regression.
References
[1] W. S. Cleveland, Visualizing Data, Summit, NJ: Hobart Press, 1993.
[2] S. J. Sheather, A Modern Approach to Regression with R, New York: Springer, 2009.
[3] D. Kaplan and L. Glass, Understanding Nonlinear Dynamics, New York: Springer, 1995.
[4] Wikipedia. "Poincaré plot." (Mar 20, 2013). en.wikipedia.org/wiki/Poincare_plot.
[5] E. Mahdi, Diagnostic Checking, Time Series and Regression, Ph.D. thesis, Western University. http://ir.lib.uwo.ca/etd/244.