Principal Component Analysis (PCA) is used in financial risk management to reduce the dimensionality of a multivariate problem, thus creating a simpler representation of the risk factors in the dataset. Only a few judiciously chosen hypothetical variables are needed to explain a large proportion of the variability in the data. These principal components are obtained through the singular value decomposition of the return series.
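To make the SVD route concrete, here is a minimal sketch (in Python with NumPy, on hypothetical data) of extracting principal components and their explained-variance shares from a return matrix; the function name and the toy dimensions are illustrative, not from the source:

```python
import numpy as np

def pca_via_svd(returns):
    """Principal components of a (T x N) return matrix via SVD.

    Returns the component loadings (columns) and the fraction of
    total variance explained by each component.
    """
    X = returns - returns.mean(axis=0)           # center each asset's returns
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)          # variance share per component
    return Vt.T, var_explained

# toy example: 500 days of returns on 4 hypothetical assets
rng = np.random.default_rng(0)
R = rng.standard_normal((500, 4))
loadings, var_exp = pca_via_svd(R)
```

A few components with large `var_exp` entries are the "judiciously chosen hypothetical variables" that carry most of the data's variability.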

Consider the case of a portfolio consisting of 10 assets, each yielding returns characterized by the standard normal distribution (with mean 0 and standard deviation 1) and correlated with one another via a correlation matrix, with the entry between asset i and asset j given by ρᵢⱼ. To study the effectiveness of PCA, a series of synthetic portfolio returns is generated, each incorporating an increasing number of principal components. As functions of the number k of principal components, both Value at Risk (VaR) and Expected Shortfall (ES) of the synthetic portfolios are relatively flat for k ≥ 3. Thus, only three principal components are needed to approximate these extreme statistics of the portfolio.
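The experiment just described can be sketched end to end as follows. The source does not specify the correlation entries ρᵢⱼ, so the snippet assumes a hypothetical constant pairwise correlation of 0.5; only the asset count (10) and the general procedure — simulate correlated standard-normal returns, take the SVD, rebuild synthetic returns from the first k components, and measure VaR and ES — come from the text:

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 10, 5000                               # 10 assets, T observation days

# Hypothetical correlation matrix: constant pairwise correlation of 0.5
# (the source does not give the entries rho_ij).
rho = 0.5
C = np.full((N, N), rho) + (1 - rho) * np.eye(N)
L = np.linalg.cholesky(C)
returns = rng.standard_normal((T, N)) @ L.T   # correlated N(0,1) returns

# PCA through the SVD of the centered return matrix
X = returns - returns.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

def var_es(pnl, cl=0.95):
    """Historical-simulation VaR and ES (losses reported as positive)."""
    losses = np.sort(-pnl)[::-1]              # largest loss first
    n_tail = int(np.ceil((1 - cl) * len(pnl)))
    return losses[n_tail - 1], losses[:n_tail].mean()

# Rebuild synthetic returns from the first k principal components
for k in (1, 2, 3, 5, 10):
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    pnl = X_k.mean(axis=1)                    # equal-weight portfolio return
    v, e = var_es(pnl)
    print(f"k={k:2d}  VaR95={v:.4f}  ES95={e:.4f}")
```

With a strong common factor such as this, the VaR and ES estimates stop changing materially after the first few components, mirroring the flattening described above.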

Suppose an investor concerned about possible losses in the value of a portfolio wants to know: out of the five worst losses during the next 100 days, what is the smallest? That smallest loss is the VaR at the 95% confidence level (𝒸ℓ) over a 100-day horizon; the average of these five worst losses is the ES at 95% 𝒸ℓ.
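This worst-five-of-100 reading translates directly into a few lines of code; the daily P&L series below is hypothetical:

```python
import numpy as np

# hypothetical daily P&L over a 100-day horizon
rng = np.random.default_rng(7)
pnl = rng.standard_normal(100)

losses = np.sort(-pnl)[::-1]   # losses as positive numbers, largest first
worst_five = losses[:5]        # 5 = (1 - 0.95) * 100 observations

var_95 = worst_five[-1]        # smallest of the five worst losses
es_95 = worst_five.mean()      # average of the five worst losses
```

Because ES averages the whole tail beyond VaR, it is always at least as large as the VaR at the same confidence level.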

As a reference, the "asymptotic" VaR and ES are shown as the horizontal dashed lines. These asymptotic statistics are based on simulating a large amount of data (many observation days) using the full set of all 10 principal components. The time evolution shows that the 95% 𝒸ℓ VaR and ES are more robust than their 99% 𝒸ℓ counterparts, hugging closer to their asymptotic values. As expected, the 99% 𝒸ℓ VaR and ES are less robust because of the greater data variability inherent in the more extreme tail of the distribution.

The precision of the portfolio VaR and ES is a function of sample size: the larger the number of data points in the return series, the smaller the dispersion in the statistics. This is evidenced by the smaller fluctuations in the PCA VaR and ES when a larger number of observation days is used.
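A quick Monte Carlo check of this sample-size effect, using hypothetical standard-normal P&L: the dispersion (standard deviation across repeated trials) of the 95% VaR estimate shrinks as the number of observation days grows.

```python
import numpy as np

def var_95(pnl):
    """Smallest of the worst 5% of losses (historical-simulation VaR)."""
    losses = np.sort(-pnl)[::-1]
    return losses[int(np.ceil(0.05 * len(pnl))) - 1]

rng = np.random.default_rng(1)
stds = {}
for T in (100, 10_000):                       # number of observation days
    estimates = [var_95(rng.standard_normal(T)) for _ in range(200)]
    stds[T] = float(np.std(estimates))
    print(f"T={T:6d}  mean VaR95={np.mean(estimates):.3f}  std={stds[T]:.3f}")
```

Both sample sizes center near the true 95% quantile of the standard normal (about 1.645), but the estimate from 100 days fluctuates far more from trial to trial than the one from 10,000 days.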