Monte Carlo Expectation-Maximization (EM) Algorithm

The Monte Carlo expectation-maximization (EM) algorithm is used to estimate the mean in a random sample of size from a left-censored standard normal distribution with censor point, , where is the censor rate and is the inverse cumulative distribution function of the standard normal distribution. The random sample consists of non-censored observations and censored observations. You can see these observations in the quantile plot along with the empirical censor rate, .
Each Monte Carlo iteration uses simulations of latent censored values in the right-truncated normal distribution on to provide an estimate of the mean. Monte Carlo iterations are done and these are shown as the blue points. The blue line segment shows the corresponding average for the blue points. The short red line segment on the right shows the deterministic EM estimate.
You can see that using the average improves the accuracy, since the blue line showing the average is closer to the red line than the individual points. Also, you see that as the number of iterations increases, the points lie closer, illustrating the well-known result that the iterations converge to the deterministic value. The gray line shows the true parameter value, , and this Demonstration shows that as the sample size increases, both the Monte Carlo and deterministic EM algorithm converge to the true value. See Details for further discussion.

SNAPSHOTS

  • [Snapshot]
  • [Snapshot]
  • [Snapshot]
  • [Snapshot]
  • [Snapshot]

DETAILS

Snapshot 1: This shows the quantile plot of the data used in the thumbnail. You see from the plot that and that there were five censored values, so the empirical censor rate was 20%.
Snapshot 2: The initial point is obtained as the mean of random variables generated from the right-truncated normal distribution with mean parameter , where is the sample mean of original data, including the censored values. Some crude data augmentation algorithms used in practice stop at this point. The plot shows this does not usually provide an accurate result.
Snapshot 3: Increasing the number of iterations from 0 to 5 greatly improves the accuracy. The average of these iterations is very close to the deterministic EM result.
Snapshot 4: Increasing both the number of iterations and the number of simulations shows convergence to the deterministic EM result.
Snapshot 5: Using only iterations and simulations but increasing the data sample size from to results in estimates that more accurately estimate the true value, . Also note the decrease in variability as the iterations increase.
The Monte Carlo EM algorithm for the censored normal distribution is discussed in [1, p. 533].
A new discovery illustrated in this Demonstration is that a more accurate estimate may be obtained by using the average of the Monte Carlo EM iterates, as shown by the fact that the horizontal blue line segment has nearly the same ordinate as the red one. This is what is usually done in Markov chain Monte Carlo (MCMC) applications [2], and the Monte Carlo EM algorithm may be viewed as a special case of MCMC [3].
References
[1] C. R. Robert and G. Casella, Monte Carlo Statistical Methods, New York: Springer, 2004.
[2] C. J. Geyer, "Introduction to Markov Chain Monte Carlo" in Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, eds.), CRC Press: Boca Raton, 2011.
[3] D. A. van Dyk and X.-L. Meng, "The Art of Data Augmentation," Journal of Computational and Graphical Statistics, 10(1), 2001 pp. 1–50. doi:10.1198/10618600152418584.
    • Share:

Embed Interactive Demonstration New!

Just copy and paste this snippet of JavaScript code into your website or blog to put the live Demonstration on your site. More details »

Files require Wolfram CDF Player or Mathematica.