Choosing a Data Transformation with the Box-Whisker Plot

This Demonstration shows the effect of a power data transformation with parameter $\lambda$ on data from simulated samples of a selectable size drawn from normal, exponential, lognormal, inverse Gaussian, or Weibull distributions.
In practice, a suitable power transformation can be selected by examining its effect in a box-and-whisker plot. The simplest power transformation that makes the data approximately symmetric is selected. With actual data, $\lambda$ is often $-1$, $0$, $1/2$, or $1$, corresponding to the reciprocal, log, square root, or no transformation, respectively.
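The box-plot check described above can be sketched numerically. The following is a minimal illustration in Python rather than the Demonstration's own Mathematica code. It assumes the Box-Cox form of the power transformation and strictly positive data; the helper names `power_transform` and `five_number_summary` are hypothetical.

```python
import math
import random
import statistics

def power_transform(x, lam):
    """Box-Cox form of the power transformation (an assumption); needs x > 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

# Simulated lognormal sample; real data would be used in practice.
random.seed(1)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(200)]

for lam in (-1, 0, 0.5, 1):
    transformed = [power_transform(x, lam) for x in sample]
    lo, q1, med, q3, hi = five_number_summary(transformed)
    # In a symmetric box plot the median sits midway between the quartiles,
    # so the two half-box lengths should be roughly equal.
    print(f"lambda={lam:>4}: lower half-box={med - q1:.3f}, upper half-box={q3 - med:.3f}")
```

Comparing the two half-box lengths is only a numerical stand-in for looking at the plot; the whiskers and outlying points matter as well.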
Two skewness statistics, the usual Pearson skewness and the Bowley skewness, are displayed for comparison with the plot.
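As a rough illustration, the two statistics can be computed as follows. This sketch assumes that "the usual Pearson skewness" means the moment coefficient $m_3 / m_2^{3/2}$ and that the Bowley skewness is the quartile-based ratio $(Q_3 - 2Q_2 + Q_1)/(Q_3 - Q_1)$. The function names are hypothetical, and the quartile convention of Python's `statistics.quantiles` may differ from the one Mathematica uses.

```python
import statistics

def pearson_skewness(values):
    """Moment skewness m3 / m2**1.5, using central moments with divisor n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def bowley_skewness(values):
    """Quartile skewness (Q3 - 2*Q2 + Q1) / (Q3 - Q1); robust to outliers."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (q3 - 2 * q2 + q1) / (q3 - q1)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 10]
print(pearson_skewness(symmetric), bowley_skewness(symmetric))
print(pearson_skewness(right_skewed), bowley_skewness(right_skewed))
```

Both statistics are zero for perfectly symmetric data and positive for a long right tail; the Bowley version depends only on the quartiles, so a single extreme value affects it far less.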
Another method for choosing $\lambda$ treats it as a parameter and assumes that, for some value of $\lambda$, the transformed data are normally distributed. Under this assumption, the likelihood function $L(\lambda)$ can be obtained and maximized numerically to give the maximum likelihood estimate $\hat{\lambda}$. A range of plausible values for $\lambda$ is given by all $\lambda$ whose relative likelihood, $R(\lambda) = L(\lambda)/L(\hat{\lambda})$, exceeds a chosen cutoff.
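The likelihood approach can be sketched with a simple grid search. This is a minimal illustration, not the Demonstration's implementation: it assumes the Box-Cox form of the transformation, uses the standard profile log-likelihood $-\tfrac{n}{2}\log\hat{\sigma}^2(\lambda) + (\lambda - 1)\sum_i \log x_i$ (mean and variance maximized out), and takes a cutoff of 0.15 purely as an illustrative value. All function names are hypothetical.

```python
import math
import random

def power_transform(x, lam):
    """Box-Cox form of the power transformation (an assumption); needs x > 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def profile_loglik(sample, lam):
    """Profile log-likelihood of lambda under normality of the transformed data."""
    n = len(sample)
    y = [power_transform(x, lam) for x in sample]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # MLE of the variance (divisor n)
    jacobian = (lam - 1.0) * sum(math.log(x) for x in sample)
    return -0.5 * n * math.log(var) + jacobian

def relative_likelihood_range(sample, grid, cutoff):
    """Grid estimate of lambda-hat and the grid values with R(lambda) >= cutoff."""
    loglik = [profile_loglik(sample, lam) for lam in grid]
    best = max(loglik)
    lam_hat = grid[loglik.index(best)]
    plausible = [lam for lam, ll in zip(grid, loglik) if math.exp(ll - best) >= cutoff]
    # Reporting min and max treats the plausible set as an interval,
    # which presumes a unimodal relative likelihood.
    return lam_hat, min(plausible), max(plausible)

random.seed(1)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(200)]
grid = [k / 100 for k in range(-200, 201)]  # lambda from -2 to 2 in steps of 0.01
lam_hat, low, high = relative_likelihood_range(sample, grid, cutoff=0.15)
print(f"lambda-hat = {lam_hat:.2f}, plausible range = [{low:.2f}, {high:.2f}]")
```

A grid search keeps the sketch short and makes the plausible range easy to read off; a numerical optimizer would give a finer estimate of $\hat{\lambda}$.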
Try experimenting with different sample sizes and different distributions.
In actual applications, real data (not simulated data) would be used. Using a suitable power transformation often simplifies the statistical analysis.

DETAILS

Data transformations such as the square root and the logarithm are often used in statistics so that the data better satisfy the model assumptions. See [1] for examples, with actual data, of the use of box-and-whisker plots to choose a transformation.
Using Mathematica's built-in functions Manipulate and BoxWhisker with the family of power transformations provides a simple and effective method for choosing a suitable transformation with real data. For comparison and for pedagogical purposes, we have included skewness and maximum likelihood methods for choosing $\lambda$.
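[2] discusses the use of maximum likelihood estimation for $\lambda$ in the family of power transformations,
$$x^{(\lambda)} = \begin{cases} (x^{\lambda} - 1)/\lambda, & \lambda \neq 0, \\ \log x, & \lambda = 0. \end{cases}$$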
[3] discusses choosing a power transformation by minimizing absolute skewness. The robust skewness statistic computed using QuartileSkewness is sometimes called Bowley skewness.
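Choosing the power by minimizing absolute skewness can be sketched as a grid search. This is only an illustration in the spirit of that idea, not the specific procedure of [3]: it assumes the Box-Cox form of the transformation, uses the quartile (Bowley) skewness as the criterion, and all function names are hypothetical.

```python
import math
import statistics

def power_transform(x, lam):
    """Box-Cox form of the power transformation (an assumption); needs x > 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def bowley_skewness(values):
    """Quartile skewness (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (q3 - 2 * q2 + q1) / (q3 - q1)

def min_abs_skewness_lambda(sample, grid):
    """Grid value of lambda whose transformed sample has the smallest |skewness|."""
    return min(
        grid,
        key=lambda lam: abs(bowley_skewness([power_transform(x, lam) for x in sample])),
    )

# Toy data whose logarithms are exactly symmetric about zero.
sample = [math.exp(z) for z in (-2, -1, 0, 1, 2)]
grid = [-1, -0.5, 0, 0.5, 1]
print(min_abs_skewness_lambda(sample, grid))
```

Restricting the grid to a few simple powers matches the advice to prefer the simplest transformation that gives approximate symmetry.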
The use of the relative likelihood function for statistical inference is discussed in the books [4] and [5].
References:
[1] W. S. Cleveland, Visualizing Data, Summit, NJ: Hobart Press, 1993.
[2] G. E. P. Box and D. R. Cox, "An Analysis of Transformations," Journal of the Royal Statistical Society B, 26(2), 1964 pp. 211–252.
[3] D. V. Hinkley, "On Power Transformations to Symmetry," Biometrika, 62, 1975 pp. 101–111.
[4] A. Azzalini, Statistical Inference, Boca Raton, FL: Chapman & Hall/CRC, 1996.
[5] D. A. Sprott, Statistical Inference in Science, New York: Springer, 2000.
© 2014 Wolfram Demonstrations Project & Contributors