Exploring Multivariate Data
This Demonstration explores two five-dimensional datasets. Basic numerical summaries of the multivariate data are provided, but the emphasis is on interactive graphical exploration using three common displays. The "simulation" dataset has 1000 five-dimensional observations and is generated in real time using built-in Mathematica functions. Its five variables follow uniform(0, 10), triangular(0, 10), Poisson(3), standard normal, and beta(5, 3) distributions. The "pollen" dataset has 3848 six-dimensional observations. It is a famous dataset that was used in a statistical analysis competition at the 1986 Joint Meetings of the American Statistical Association. Its first five variables are "ridge", "nub", "crack", "weight", and "density"; the sixth variable is just an index number and is not used. A careful exploration of the pollen data reveals some surprising results.
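The simulation dataset and its basic univariate summaries can be sketched outside Mathematica. The following is a minimal Python sketch, not the Demonstration's own code. It assumes that triangular(0, 10) is symmetric with its mode at 5 and draws Poisson(3) values with Knuth's multiplication method, because Python's `random` module has no Poisson sampler. The names `draw_poisson`, `simulate`, and `summarize` are hypothetical helpers introduced only for illustration.

```python
import math
import random
import statistics

# Hypothetical helper: Python's random module has no Poisson sampler, so this
# uses Knuth's multiplication method, which is adequate for a small mean like 3.
def draw_poisson(lam, rng):
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

NAMES = ["uniform(0,10)", "triangular(0,10)", "Poisson(3)", "normal(0,1)", "beta(5,3)"]
THEORETICAL_MEANS = [5.0, 5.0, 3.0, 0.0, 5 / 8]  # beta(a, b) has mean a / (a + b)

def simulate(n=1000, seed=None):
    """Return n five-dimensional observations with independent coordinates."""
    rng = random.Random(seed)
    return [
        (
            rng.uniform(0, 10),
            rng.triangular(0, 10),  # mode defaults to the midpoint, 5 (assumed)
            draw_poisson(3, rng),
            rng.gauss(0, 1),
            rng.betavariate(5, 3),
        )
        for _ in range(n)
    ]

def summarize(rows):
    """Map each variable name to its (sample mean, sample standard deviation)."""
    columns = zip(*rows)
    return {name: (statistics.fmean(col), statistics.stdev(col))
            for name, col in zip(NAMES, columns)}

if __name__ == "__main__":
    data = simulate(1000, seed=1)
    for (name, (mean, sd)), mu in zip(summarize(data).items(), THEORETICAL_MEANS):
        print(f"{name:>17}: mean {mean:7.3f} (theory {mu:.3f}), sd {sd:.3f}")
```

With 1000 observations, each sample mean should land within a few standard errors (sd divided by √1000) of its theoretical value. Changing `seed` gives a fresh dataset, much as the Demonstration regenerates its data in real time.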
The viewpoint sliders adjust dynamically when you use the mouse to zoom or rotate the 3D display!
By construction, the five simulation variables are mutually independent, so their theoretical correlation matrix is the 5×5 identity matrix and their sample correlation matrix is approximately the identity. The coordinate axes are color coded (red, green, and blue), and one axis is drawn thicker than the others to help track the orientation of the 3D data cloud. For a thorough discussion of the pollen data, see the following reference.
R. A. Becker, L. Denby, R. McGill, and A. R. Wilks, "Datacryptanalysis: A Case Study," AT&T Bell Laboratories, Statistical Research Reports No. 36, 1986.
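Independence of the simulation variables implies a near-identity sample correlation matrix, and this can be checked numerically. The following is a minimal, self-contained Python sketch rather than the Demonstration's Mathematica implementation. It assumes the same five distributions (with the triangular mode at 5) and hand-codes the Pearson correlation. The names `draw_poisson`, `simulated_columns`, `pearson`, and `correlation_matrix` are hypothetical.

```python
import math
import random

def draw_poisson(lam, rng):
    # Knuth's multiplication method; adequate for a small mean such as 3.
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulated_columns(n=1000, seed=0):
    """Five independent columns following the simulation dataset's distributions."""
    rng = random.Random(seed)
    return [
        [rng.uniform(0, 10) for _ in range(n)],
        [rng.triangular(0, 10) for _ in range(n)],  # mode assumed at 5
        [draw_poisson(3, rng) for _ in range(n)],
        [rng.gauss(0, 1) for _ in range(n)],
        [rng.betavariate(5, 3) for _ in range(n)],
    ]

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Matrix of pairwise Pearson correlations between the given columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]

if __name__ == "__main__":
    corr = correlation_matrix(simulated_columns())
    for row in corr:
        print(" ".join(f"{r:6.3f}" for r in row))
    off_diag = max(abs(corr[i][j]) for i in range(5) for j in range(5) if i != j)
    print(f"largest off-diagonal |r|: {off_diag:.3f}")  # typically below 0.1 for n = 1000
```

The diagonal entries equal 1 up to rounding. For independent variables, each off-diagonal entry has a standard error of roughly 1/√n, which is about 0.03 for 1000 observations, so the matrix is close to, but not exactly, the identity.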