Exploratory Factor Analysis

This Demonstration implements exploratory factor analysis (EFA) using singular value decomposition for Procrustes least squares.

Contributed by: Stuart Nettleton (June 2011)
Open content licensed under CC BY-NC-SA


Details

In 2004 and 2008, Jan De Leeuw proposed a major paradigm shift in factor extraction methods ([1], [2]). Classical exploratory factor analysis (EFA) estimates factor loadings and unique variances indirectly by fitting the EFA correlation structure to the sample correlation matrix. This is often frustrated by matrix singularity. Furthermore, there are infinitely many solutions because of factor indeterminacy and rotational indeterminacy, which require standardized processing sequences (for repeatability) and supplementary rotations, respectively.

De Leeuw's new EFA treats the problem as a data matrix decomposition in which the fixed unknown parameters, regarded as realizations of random variables, are estimated simultaneously. It retains all the information in the data and is not subject to singularity. Nor is factor indeterminacy an issue, because the data matrix decomposition estimates all parameters simultaneously. Unkel and Trendafilov [6] show that rotational indeterminacy may be avoided through the use of triangular factors. Classical EFA most often uses a maximum likelihood goodness-of-fit criterion, which frustrates direct methods because it becomes unbounded. The new EFA uses an alternating least-squares (ALS) goodness-of-fit criterion, through the elegant solution of [4], which leads to results comparable to maximum likelihood.

Unkel and Trendafilov [7] show that the new EFA technique may use the nullspace associated with singular value decomposition to allow models with "horizontal data" having more variables than observations, such as global atmospheric or geophysical data and genome expression research. This overcomes classical EFA's restriction to "vertical data" with more observations than variables.

The new EFA is computationally feasible through the availability of fast singular value decomposition (although QR decomposition is also a useful method):

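If $X$ is an $n \times p$ matrix then $X = U \Sigma V^{T}$, where $U$ is the matrix of the eigenvectors of $X X^{T}$, $V$ is the matrix of the eigenvectors of $X^{T} X$, the diagonal entries of $\Sigma^{2}$ are the (nonzero) eigenvalues of both $X X^{T}$ and $X^{T} X$, and $U$ and $V$ are orthogonal such that $U^{T} U = I$ and $V^{T} V = I$.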
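The properties listed above can be checked numerically. The following is a minimal sketch in Python with NumPy, not the Demonstration's own code; the matrix `X` is random illustrative data and all variable names are assumptions made for this example.

```python
import numpy as np

# Illustrative "vertical" matrix: 8 observations, 5 variables (random data).
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))

# Thin SVD: U is 8x5, s holds the singular values, Vt is V transposed (5x5).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# X is recovered from its three factors.
X_rebuilt = U @ np.diag(s) @ Vt

# The eigenvalues of X^T X are the squared singular values.
eig_XtX = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]

# The nonzero eigenvalues of X X^T are the same values (the rest are zero).
eig_XXt = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1][: len(s)]

# U and V have orthonormal columns.
UtU = U.T @ U
VtV = Vt @ Vt.T
```

With the thin form used here, `U` has orthonormal columns rather than being square, which is the form that factor extraction on an n-by-p data matrix usually needs.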

Procrustes method: Given matrices $A$ and $B$, find the matrix $Q$ by least squares: $\min_{Q} \lVert A Q - B \rVert_{F}^{2}$, subject to $Q^{T} Q = I$.
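The orthogonal Procrustes problem has a closed-form solution through the singular value decomposition, following Schönemann [4]: take the SVD of the product of the transpose of A with B, then multiply the left singular vectors by the transposed right singular vectors. The sketch below assumes the problem is posed as finding an orthogonal Q that brings A Q as close as possible to B in the Frobenius norm. The function name `orthogonal_procrustes` and the test matrices are hypothetical choices for this illustration, not taken from the Demonstration.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Return the orthogonal Q that minimizes the Frobenius norm of A @ Q - B."""
    # SVD of the cross-product matrix A^T B.
    U, _, Vt = np.linalg.svd(A.T @ B)
    # The minimizer is the product of the left and right singular vectors.
    return U @ Vt

# Illustrative check: rotate a random A by a known orthogonal matrix,
# then recover that matrix from A and B alone.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
Q_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
B = A @ Q_true

Q = orthogonal_procrustes(A, B)
residual = np.linalg.norm(A @ Q - B)
```

Because B here is an exact rotation of A, the recovered `Q` matches `Q_true` and the residual is zero up to rounding; with noisy data the same call returns the least-squares best orthogonal fit.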

Three standard datasets, considered to be quite difficult for extracting factors, are provided in the Demonstration. These are based on Thurstone's 20-variable box problem, with 19, 20, and 26 variables ([5], [3]). Because factor fitting is iterative, processing may take from a few seconds to a few minutes, depending on computer speed. The fitting process also begins from a random start, so the processing time varies from run to run, and it may sometimes be worthwhile to restart the processing manually.
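To make the iterative fitting concrete, here is a minimal Python/NumPy sketch of one alternating least-squares scheme of the kind described in [2] and [6]. It is written under stated assumptions and is not the Demonstration's implementation. It assumes the standardized data matrix Z (n by p) is modeled as F L^T + U diag(psi), where the common-factor scores F (n by k) and unique-factor scores U (n by p) together have orthonormal columns, and that n is at least k + p (the "vertical data" case only). The function name `efa_als`, its parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def efa_als(Z, k, n_iter=200, seed=0):
    """ALS sketch: fit Z ~ F @ L.T + U * psi with [F U] column-orthonormal."""
    n, p = Z.shape
    rng = np.random.default_rng(seed)
    # Random start: an orthonormal n x (k + p) block of scores.
    FU, _ = np.linalg.qr(rng.standard_normal((n, k + p)))
    F, U = FU[:, :k], FU[:, k:]
    losses = []
    for _ in range(n_iter):
        # Step 1: with scores fixed, the least-squares loadings and
        # uniquenesses are L = Z^T F and psi = diag(U^T Z).
        L = Z.T @ F
        psi = np.einsum("ij,ij->j", U, Z)
        # Step 2: with L and psi fixed, updating the scores is an
        # orthonormal Procrustes problem, solved by one SVD.
        B = np.hstack([L, np.diag(psi)])          # p x (k + p)
        P, _, Qt = np.linalg.svd(Z @ B, full_matrices=False)
        FU = P @ Qt
        F, U = FU[:, :k], FU[:, k:]
        losses.append(np.linalg.norm(Z - F @ L.T - U * psi) ** 2)
    return F, U, L, psi, losses

# Synthetic two-factor data (not Thurstone's box data), column-standardized.
rng = np.random.default_rng(1)
n, p, k = 100, 6, 2
raw = rng.standard_normal((n, k)) @ rng.standard_normal((k, p))
raw += 0.5 * rng.standard_normal((n, p))
Z = raw - raw.mean(axis=0)
Z /= np.linalg.norm(Z, axis=0)

F, U, L, psi, losses = efa_als(Z, k)
```

Each step solves its own subproblem exactly, so the recorded loss cannot increase from one iteration to the next. Changing `seed` changes the random start, which is one way to mimic restarting the fit. The "horizontal data" case with more variables than observations needs the modifications discussed in [7] and is not covered by this sketch.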

References

[1] J. De Leeuw, "Least Squares Optimal Scaling of Partially Observed Systems," in K. van Montfort, J. Oud, and A. Satorra (eds.), Recent Developments on Structural Equation Models: Theory and Applications, Dordrecht, NL: Kluwer Academic Publishers, 2004 pp. 121–134.

[2] J. De Leeuw, "Factor Analysis as Matrix Decomposition," Preprint series: Department of Statistics, University of California, Los Angeles, 2008.

[3] H. F. Kaiser and P. Horst, "A Score Matrix for Thurstone's Box Problem," Multivariate Behavioral Research, January 1975 pp. 17–26.

[4] P. H. Schönemann, "A Generalized Solution of the Orthogonal Procrustes Problem," Psychometrika, 31(1), 1966 pp. 1–10.

[5] L. L. Thurstone, Multiple-Factor Analysis, Chicago: University of Chicago Press, 1947 pp. 140–146.

[6] S. Unkel and N. T. Trendafilov, "Simultaneous Parameter Estimation in Exploratory Factor Analysis: An Expository Review," Department of Mathematics and Statistics, The Open University, Milton Keynes, UK, August 17, 2010.

[7] S. Unkel and N. T. Trendafilov, "Zig-Zag Exploratory Factor Analysis with More Variables Than Observations," Computational Statistics, 2010, under consideration.


