Spread-Location Regression Diagnostic Check

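The spread-location plot from a linear regression, shown on the left, is a plot of $z_i$ versus $\hat{y}_i$, where $e_i = y_i - \hat{y}_i$ is the residual, $z_i = |e_i|^{\lambda}$ (with $z_i = \log|e_i|$ when $\lambda = 0$) is the power transformation of the absolute residual, and $\hat{y}_i$ is the fitted value.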
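The quantities in the plot can be computed in a few lines. The following is a minimal Python sketch, assuming a simple linear regression with one predictor fitted by ordinary least squares and the usual power-transformation convention that $\lambda = 0$ means the log. The function names (`ols_fit`, `power_transform`, `spread_location`) are hypothetical; the Demonstration itself is written in the Wolfram Language.

```python
import math

def ols_fit(x, y):
    """Intercept and slope of y on x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def power_transform(residual, lam):
    """Power transformation of |residual|; the log is used when lam == 0.
    A residual of exactly zero would make the log undefined."""
    a = abs(residual)
    return math.log(a) if lam == 0 else a ** lam

def spread_location(x, y, lam=0.5):
    """Return (fitted values y_hat, transformed absolute residuals z) for an S-L plot."""
    a, b = ols_fit(x, y)
    fitted = [a + b * xi for xi in x]
    z = [power_transform(yi - fi, lam) for yi, fi in zip(y, fitted)]
    return fitted, z

# Usage: plot z against fitted (and add a smoother) to look for a trend in spread.
fitted, z = spread_location([0, 1, 2, 3], [1, 3, 2, 4], lam=1)
print(fitted, z)
```

With $\lambda = 1$ the sketch simply returns the absolute residuals; other values of $\lambda$ change only the vertical scale of the plot, not the ordering of the points.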

The red line is a nonparametric smoother used to enhance the visualization. The purpose of this plot is to check for possible model mis-specification caused by a monotonic change in variance with the level, as estimated by the fitted values. On the right, a box-whisker chart of the $z_i$ is shown. Adjusting the power transformation parameter $\lambda$ to make the distribution of the $z_i$ more symmetric improves the visualization of the spread-location relationship. Some popular software programs fix $\lambda = 1/2$ (the square root of the absolute residual), but this Demonstration illustrates that this might not always be the best choice. Vary the sample size $n$ and the random seed to see the impact of sample size and randomness. See Details for further discussion.
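The Demonstration chooses $\lambda$ interactively by watching the box-whisker chart. One simple numerical analogue, offered here as an assumption rather than as the method of [2], is to scan a grid of $\lambda$ values and keep the one whose transformed absolute residuals have sample skewness closest to zero. The grid and the helper names (`sample_skewness`, `transform`, `most_symmetric_lambda`) are hypothetical.

```python
import math
import random
import statistics

def sample_skewness(values):
    """Moment-based sample skewness: mean cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def transform(abs_residuals, lam):
    """Power transformation with the log as the lam == 0 case (zeros must be excluded)."""
    return [math.log(a) if lam == 0 else a ** lam for a in abs_residuals]

def most_symmetric_lambda(abs_residuals, grid):
    """Return the lam in grid whose transformed values have |skewness| closest to zero."""
    return min(grid, key=lambda lam: abs(sample_skewness(transform(abs_residuals, lam))))

# Usage: absolute "residuals" simulated from standard normal errors.
random.seed(1)
abs_res = [abs(random.gauss(0, 1)) for _ in range(5000)]
grid = [0, 0.25, 0.5, 2 / 3, 1]
for lam in grid:
    print(f"lambda = {lam:.3f}  skewness = {sample_skewness(transform(abs_res, lam)):+.3f}")
print("most symmetric:", most_symmetric_lambda(abs_res, grid))
```

For normal errors, $\lambda = 1$ leaves the absolute residuals right-skewed and $\lambda = 0$ makes them left-skewed, so the most symmetric choice lies in between. Moment-based skewness is sensitive to outliers, so with heavy-tailed errors a quartile-based measure, closer in spirit to reading a box-whisker chart, may be preferable.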

Contributed by: Ian McLeod (December 2013)
(Western University)
Open content licensed under CC BY-NC-SA


Snapshots


Details

The spread-location plot was suggested in [1], and the version used in this Demonstration was proposed in [2], which used Mathematica to derive the optimal symmetrizing transformation of the absolute residuals $|e_i|$ for a variety of error distributions.

In this Demonstration, the linear regression is fitted to data generated from a model in which the error term is t-distributed on four degrees of freedom, the predictor is uniformly distributed, and the model parameters are held fixed. The response is generated so that the linear regression model is mis-specified and a log transformation of the response variable is needed; the purpose of the spread-location plot is to detect this type of mis-specification. The loess smoother, shown in red, helps to show whether there is a relationship between the variance, as measured by $z_i$, and the location, as measured by $\hat{y}_i$.
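The setting described above can be imitated with a short simulation. This Python sketch assumes the hypothetical model $\log y = 1 + 0.3x + 0.2\,\varepsilon$ with $\varepsilon \sim t_4$ and $x \sim U(0, 5)$; the intercept, slope, error scale, predictor range, and seed are illustrative choices, not values taken from the Demonstration. It fits the mis-specified regression of $y$ on $x$, forms $z_i = |e_i|^{1/2}$, and, in place of loess, uses medians within equal-count bins of the fitted values as a crude smoother.

```python
import math
import random
import statistics

def t_variate(df, rng):
    """One Student t draw: standard normal over sqrt(chi-square / df)."""
    chi2 = rng.gammavariate(df / 2, 2)  # chi-square(df) is Gamma(shape=df/2, scale=2)
    return rng.gauss(0, 1) / math.sqrt(chi2 / df)

def simulate(n, seed, b0=1.0, b1=0.3, scale=0.2, lo=0.0, hi=5.0):
    """HYPOTHETICAL model: log y = b0 + b1*x + scale*t4, with x ~ Uniform(lo, hi)."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for _ in range(n)]
    y = [math.exp(b0 + b1 * xi + scale * t_variate(4, rng)) for xi in x]
    return x, y

def ols_fit(x, y):
    """Intercept and slope of y on x by ordinary least squares."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def binned_medians(fitted, z, bins):
    """Crude smoother standing in for loess: medians of (fitted, z) in equal-count bins."""
    pairs = sorted(zip(fitted, z))
    size = len(pairs) // bins
    result = []
    for k in range(bins):
        chunk = pairs[k * size:] if k == bins - 1 else pairs[k * size:(k + 1) * size]
        result.append((statistics.median([f for f, _ in chunk]),
                       statistics.median([v for _, v in chunk])))
    return result

# Fit the mis-specified model to the raw response and use lambda = 1/2 for z.
x, y = simulate(n=2000, seed=2013)
a, b = ols_fit(x, y)
fitted = [a + b * xi for xi in x]
z = [abs(yi - fi) ** 0.5 for yi, fi in zip(y, fitted)]
trend = binned_medians(fitted, z, bins=5)
for loc, spread in trend:
    print(f"fitted ~ {loc:7.2f}   median z ~ {spread:.3f}")
```

Because the error enters multiplicatively on the original scale, the spread of the residuals should grow with the fitted value, so the median of $z$ in the highest bin is expected to exceed that in the lowest bin. Medians are used rather than means because the exponential of a $t_4$ variable has very heavy tails.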

Snapshot 1: using a log transformation, $\lambda = 0$, improves the visualization in the plot of $z_i$ versus $\hat{y}_i$ for the data shown in the thumbnail; the box-whisker chart confirms that $z_i$ is more symmetrically distributed.

Snapshot 2: for the same data as in the thumbnail, a different value of $\lambda$ does not work as well.

Snapshots 3 and 4: a smaller sample size $n$ is used; the effect of the skewness of $z_i$ is less dramatic, and so is the improvement from using $\lambda = 0$.

References

[1] W. S. Cleveland, Visualizing Data, Summit, NJ: Hobart Press, 1993.

[2] A. I. McLeod, "Improved Spread-Location Visualization," Journal of Computational and Graphical Statistics, 8(1), 1999, pp. 135–141. doi:10.1080/10618600.1999.10474806.


