Loading Plot of a Principal Component Analysis (PCA)

Principal component analysis (PCA) is a statistical procedure that converts data with possibly correlated variables into a set of linearly uncorrelated variables, analogous to a principal-axis transformation in mechanics.

This Demonstration shows the loading plot in the space of the principal components (PCs) computed from a dataset of three rows. The rows are samples of three periodic functions: two are fixed and uncorrelated, and the third is controlled by the parameters phase, frequency, and amplitude. The data is shown at the top right and the correlation coefficients at the top left.
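As a rough illustration of how the three parameters affect the correlation coefficients, the following Python sketch samples a few sine waves over one full period and compares them. The choice of sine waves, the sampling grid, and the names `periodic` and `pearson` are assumptions made for this example; they are not taken from the Demonstration's source code.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def periodic(t, amplitude=1.0, frequency=1.0, phase=0.0):
    """Hypothetical stand-in for one data row: amplitude * sin(frequency * t + phase)."""
    return [amplitude * math.sin(frequency * x + phase) for x in t]

# Assumed sampling: 200 points covering exactly one period of the base frequency.
n = 200
t = [2 * math.pi * k / n for k in range(n)]

fixed_a = periodic(t)                        # sin(t)
fixed_b = periodic(t, phase=math.pi / 2)     # cos(t), uncorrelated with sin(t)

in_phase = periodic(t, amplitude=2.0)                  # same phase, larger amplitude
opposite = periodic(t, amplitude=2.0, phase=math.pi)   # shifted by half a period
doubled = periodic(t, frequency=2.0)                   # different frequency

r_fixed = pearson(fixed_a, fixed_b)      # close to 0
r_in_phase = pearson(fixed_a, in_phase)  # close to +1
r_opposite = pearson(fixed_a, opposite)  # close to -1
r_doubled = pearson(fixed_a, doubled)    # close to 0
```

Under these assumptions the amplitude has no effect on the correlation coefficient, because the coefficient is scale-invariant; only the phase and the frequency change it.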

Calculating the PCs first requires standardizing the data so that the correlation matrix can be obtained. The eigenvectors of the correlation matrix form a matrix that, when multiplied by the standardized data, gives the PC matrix, whose first two columns give the new coordinates in the PC space (PC1, PC2) [1].
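The steps above can be sketched in plain Python. This is a minimal illustration, not the Demonstration's implementation: the three sine rows are assumed example data, all function names are hypothetical, and the eigenvectors are found with a cyclic Jacobi rotation scheme chosen only because it needs no external library. The sketch stores variables and PCs as rows, so PC1 and PC2 are the first two rows of `scores` rather than the first two columns.

```python
import math
from statistics import fmean, pstdev

def standardize(series):
    """Shift a series to zero mean and scale it to unit (population) standard deviation."""
    mu, sigma = fmean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]

def matmul(x, y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def transpose(x):
    return [list(col) for col in zip(*x)]

def correlation_matrix(z):
    """Correlation matrix of standardized variables stored as the rows of z."""
    n = len(z[0])
    return [[c / n for c in row] for row in matmul(z, transpose(z))]

def symmetric_eigen(a, sweeps=50, tol=1e-14):
    """Eigenvalues (descending) and matching eigenvectors (as rows) of a
    symmetric matrix, computed with cyclic Jacobi rotations."""
    m = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(sweeps):
        # Stop once the off-diagonal part is negligible.
        if sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j) < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                rot = [[float(i == j) for j in range(m)] for i in range(m)]
                rot[p][p] = rot[q][q] = math.cos(theta)
                rot[p][q], rot[q][p] = math.sin(theta), -math.sin(theta)
                a = matmul(transpose(rot), matmul(a, rot))  # zeroes a[p][q]
                v = matmul(v, rot)                          # eigenvectors build up as columns
    pairs = sorted(zip((a[i][i] for i in range(m)), transpose(v)),
                   key=lambda pair: -pair[0])
    return [lam for lam, _ in pairs], [vec for _, vec in pairs]

# Assumed example data: three rows sampled over one full period.
n = 200
t = [2 * math.pi * k / n for k in range(n)]
data = [
    [math.sin(x) for x in t],
    [math.cos(x) for x in t],
    [2.0 * math.sin(x + 0.3) for x in t],
]

z = [standardize(row) for row in data]           # step 1: standardize each variable
corr = correlation_matrix(z)                     # step 2: correlation matrix
eigenvalues, eigenvectors = symmetric_eigen(corr)  # step 3: eigen-decomposition
scores = matmul(eigenvectors, z)                 # step 4: project onto the eigenvectors
pc1, pc2 = scores[0], scores[1]                  # coordinates in the (PC1, PC2) plane
```

The sign of each eigenvector is arbitrary, so a PC axis may come out mirrored relative to another implementation without changing the analysis.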

The percentage of variance explained by this model is calculated from the eigenvalues. In PCA, each eigenvalue tells you how much variance is explained by its associated eigenvector, so the largest eigenvalue marks the direction along which the data varies most. The contribution of a single PC to the variance is calculated by dividing its eigenvalue by the sum of all the eigenvalues.
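The arithmetic is short enough to show directly. In this Python sketch the eigenvalues are made-up numbers chosen for illustration, and `explained_variance_percent` is a hypothetical helper name.

```python
def explained_variance_percent(eigenvalues):
    """Percentage of the total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [100.0 * lam / total for lam in eigenvalues]

# Hypothetical eigenvalues of a 3x3 correlation matrix (they sum to 3,
# the number of standardized variables).
example_eigenvalues = [1.8, 0.9, 0.3]

shares = explained_variance_percent(example_eigenvalues)  # roughly 60%, 30%, 10%
two_pc_share = shares[0] + shares[1]                      # variance kept by a PC1-PC2 plot
```

With these numbers, a plot restricted to PC1 and PC2 would retain roughly 90% of the total variance.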

PCA is generally used on large datasets, where it is a powerful tool for identifying correlations among subsets of variables. Here, only three variables are used, to make this type of data representation easier to understand.

Contributed by: D. Meliga and S. Z. Lavagnino (May 2016)
With additional contributions by: A. Chiavassa and M. Aria
Open content licensed under CC BY-NC-SA


Details

In the loading plot, a high correlation between two variables leads to two vectors that are very close to each other, non-correlation leads to two vectors out of phase by 90°, and anticorrelation leads to two vectors out of phase by 180° [2].
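One way to read these angles off a loading plot numerically is to compute the angle between pairs of loading vectors. The Python sketch below uses made-up (PC1, PC2) loadings and a hypothetical helper `angle_between_deg`; none of the values come from the Demonstration. The cosine of the angle tracks the correlation closely only when the first two PCs account for most of the variance.

```python
import math

def angle_between_deg(u, v):
    """Angle in degrees between two 2D loading vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(*u) * math.hypot(*v)
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))

# Hypothetical loadings of four variables on (PC1, PC2).
var_a = (0.90, 0.30)
var_b = (0.85, 0.40)    # nearly parallel to var_a: strongly correlated
var_c = (-0.30, 0.90)   # perpendicular to var_a: uncorrelated
var_d = (-0.90, -0.30)  # opposite to var_a: anticorrelated

angle_ab = angle_between_deg(var_a, var_b)  # small angle
angle_ac = angle_between_deg(var_a, var_c)  # about 90 degrees
angle_ad = angle_between_deg(var_a, var_d)  # about 180 degrees
```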

Snapshot 1: a strong correlation between two of the variables and non-correlation in the remaining cases. Graphically, two vectors are very close, while the others are out of phase with each other by about 90°.

Snapshot 2: a strong negative correlation (anticorrelation) between two of the variables and non-correlation in the remaining cases. Graphically, two vectors are opposite, while the others are out of phase with each other by about 90°.

References

[1] S. J. Press, Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference, Mineola, NY: Dover Publications, 2005.

[2] M. Aria. "L'analisi in Componenti Principali." (May 25, 2016) www.federica.unina.it/economia/analisi-statistica-sociologica/analisi-componenti-principali.


