# Maximum Likelihood Estimation of Ordinary and Finite Mixture Distributions

For given data, this Demonstration initially shows a histogram and a plot of the sorted data. We can then plot one estimated density, two estimated densities, or an estimated two-component mixture density; the mixture density is a weighted sum of two densities. The parameters of the densities are estimated by the maximum likelihood method. The estimated densities are plotted on top of the histogram, which gives an easy visual check of how well they fit the data. From the estimated densities we also calculate quantiles, and the second plot shows the so-called Q-Q (quantile-quantile) plot, another visual check of the quality of the fit; the root mean square error (RMSE) of the quantiles summarizes the fit numerically.

Contributed by: Heikki Ruskeepää and M. A. Ghorbani (March 2011)

Open content licensed under CC BY-NC-SA

## Snapshots

## Details

Snapshot 1: adjusting the number of bins

Snapshot 2: estimating a density

Snapshot 3: comparing two estimated densities

Snapshot 4: estimating a mixture density

The Demonstration can be used to estimate both ordinary distributions and two-component mixture distributions. (Mixture distributions are also called compound distributions.)

For the estimation of an ordinary distribution, a typical use of this Demonstration could proceed as follows:

1) Define the data as a list of values; give it the name "data".

2) Adjust the number of bins of the histogram (Snapshot 1).

3) Try several densities from "density 1". Once a promising candidate is found, keep it in "density 1" (Snapshot 2).

4) Choose other densities from "density 2", comparing them with "density 1" (Snapshot 3).

Through these pairwise visual comparisons, try to find the best density, that is, the density that best fits the data as summarized by the histogram and gives the best Q-Q plot.

The Q-Q plot shows points whose first coordinate is an element of the sorted data and whose second coordinate is the corresponding quantile calculated from the estimated density. The closer the points of the Q-Q plot lie to the diagonal line $y = x$, the better the estimated density. The RMSE shown is the root mean square error between the sorted data and the calculated quantiles; the closer the RMSE is to zero, the better the fit.
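As a rough illustration of the check described above, the Python sketch below pairs each sorted observation with a quantile of a fitted density and computes the RMSE. It is not the Demonstration's Mathematica code: the plotting positions $(i - 0.5)/n$ are an assumption (the text does not say which probabilities are used), the sample values are made up, and a normal density fitted by maximum likelihood stands in for whichever density is selected.

```python
import math
from statistics import NormalDist, fmean, pstdev

def qq_pairs(data, quantile):
    """Pair each sorted observation with the model quantile at the same rank.

    Assumption: the i-th smallest of n values is matched with the quantile at
    probability (i - 0.5) / n; other plotting positions are also in use.
    """
    xs = sorted(data)
    n = len(xs)
    return [(x, quantile((i - 0.5) / n)) for i, x in enumerate(xs, start=1)]

def qq_rmse(pairs):
    """Root mean square difference between sorted data and model quantiles."""
    return math.sqrt(sum((x - q) ** 2 for x, q in pairs) / len(pairs))

# Usage with made-up numbers: fit a normal density by maximum likelihood
# (sample mean and population standard deviation), then score the fit.
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6, 4.4]
fitted = NormalDist(fmean(sample), pstdev(sample))
pairs = qq_pairs(sample, fitted.inv_cdf)
print(f"RMSE = {qq_rmse(pairs):.3f}")
```

Plotting `pairs` as points against the line $y = x$ gives the Q-Q plot itself; the RMSE is just a one-number summary of how far those points stray from the line.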

Now consider the estimation of a two-component mixture distribution. In a typical use of the Demonstration, steps 1) and 2) are the same as above; then proceed as follows:

3) Check the "mixture density" checkbox. Choose two densities, say $f_1$ and $f_2$, from "density 1" and "density 2". The mixture is then of the form $p f_1 + (1 - p) f_2$, where $p$ is an unknown constant, $0 \le p \le 1$.

4) Investigate various pairs of densities to find the best mixture (Snapshot 4).
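The mixture form in step 3) is simple to evaluate directly. The sketch below, written in Python rather than the Demonstration's Mathematica, computes $p f_1(x) + (1 - p) f_2(x)$ and the matching mixture CDF for any two component functions; the two normal components in the usage lines are hypothetical stand-ins, not densities fitted to the wind-speed data.

```python
from statistics import NormalDist

def mixture_pdf(x, p, f1, f2):
    """Two-component mixture density p*f1(x) + (1 - p)*f2(x), with 0 <= p <= 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("the weight p must lie in [0, 1]")
    return p * f1(x) + (1.0 - p) * f2(x)

def mixture_cdf(x, p, F1, F2):
    """The mixture CDF is the same weighted sum of the component CDFs."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("the weight p must lie in [0, 1]")
    return p * F1(x) + (1.0 - p) * F2(x)

# Hypothetical components: two normal densities with different centers.
d1, d2 = NormalDist(0.0, 1.0), NormalDist(4.0, 1.5)
print(mixture_pdf(1.0, 0.3, d1.pdf, d2.pdf))
print(mixture_cdf(1.0, 0.3, d1.cdf, d2.cdf))
```

Because the weights sum to 1, the mixture integrates to 1 whenever both components do, so no extra normalization is needed.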

A mixture density may be particularly valuable for data whose histogram does not readily suggest a good ordinary density. In particular, a mixture density may be useful when the histogram appears bimodal (i.e., has two local maxima).
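Whether a given mixture is bimodal can be checked numerically by counting the local maxima of its density on a fine grid. In the Python sketch below (an illustration, not part of the Demonstration) the normal components are hypothetical; it relies on the known fact that an equal-weight mixture of two normal densities with a common standard deviation $\sigma$ is bimodal exactly when the means differ by more than $2\sigma$, which the two usage lines reflect.

```python
from statistics import NormalDist

def count_modes(density, lo, hi, steps=2000):
    """Count local maxima of `density` on an evenly spaced grid over [lo, hi].

    The test ys[k-1] < ys[k] >= ys[k+1] counts a two-point plateau at a peak once.
    """
    h = (hi - lo) / steps
    ys = [density(lo + k * h) for k in range(steps + 1)]
    return sum(1 for k in range(1, steps) if ys[k - 1] < ys[k] >= ys[k + 1])

def normal_mixture(p, a, b):
    """Density of the mixture p*a + (1 - p)*b of two NormalDist components."""
    return lambda x: p * a.pdf(x) + (1 - p) * b.pdf(x)

# Means 4 standard deviations apart: two modes.
print(count_modes(normal_mixture(0.5, NormalDist(0, 1), NormalDist(4, 1)), -5, 9))  # 2
# Means 1 standard deviation apart: a single mode.
print(count_modes(normal_mixture(0.5, NormalDist(0, 1), NormalDist(1, 1)), -5, 6))  # 1
```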

In the example shown in the snapshots, we consider the monthly maximum wind speed in Turku, Finland, from 1973 to 2009. For these data, the extreme value distribution gives a good fit (RMSE = 2.9, Snapshot 2). However, a mixture of the extreme value distribution and the inverse Gaussian distribution fits even better, with a smaller RMSE (Snapshot 4; note that to get this result, the initial value of the weight has to be adjusted).

For an example of bimodal data, replace "MaxWindSpeed" with "MaxTemperature" in our example. In this case, a mixture of the extreme value distribution and the Gumbel distribution gives a very good fit.

Some technical details follow. The parameters of the densities are estimated with the maximum likelihood method by using *Mathematica*'s built-in function NMaximize. Some densities have parameters that must be positive; these requirements are passed to the optimization as constraints.
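To make the likelihood and the positivity requirement concrete, here is a Python sketch of the log-likelihood for the extreme value distribution with location $\alpha$ and scale $\beta > 0$. It assumes the Gumbel-for-maxima density $f(x) = \frac{1}{\beta}\exp\left(-z - e^{-z}\right)$ with $z = (x - \alpha)/\beta$; that parameterization is our assumption, and the data are hypothetical. Instead of passing constraints to an optimizer as the Demonstration does with NMaximize, the sketch simply returns $-\infty$ for infeasible parameters, a substitute that a direct-search maximizer will never prefer.

```python
import math

def ev_log_likelihood(alpha, beta, data):
    """Log-likelihood of the extreme value (Gumbel, maximum) distribution.

    Density: f(x) = exp(-z - exp(-z)) / beta, with z = (x - alpha) / beta.
    The scale beta must be positive; infeasible values give -inf.
    Note: math.exp(-z) can overflow for points far below alpha relative to beta.
    """
    if beta <= 0:
        return -math.inf
    total = 0.0
    for x in data:
        z = (x - alpha) / beta
        total += -math.log(beta) - z - math.exp(-z)
    return total

# Hypothetical data; a larger log-likelihood means (alpha, beta) is better supported.
data = [12.0, 15.5, 13.2, 18.9, 14.1, 16.7, 13.8, 21.3]
print(ev_log_likelihood(14.0, 2.5, data))
print(ev_log_likelihood(14.0, -1.0, data))  # -inf: beta must be positive
```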

To help NMaximize find the global maximum, we start the maximization of the log-likelihood function from the parameter estimates given by the method of moments. Each of these estimates is shifted by -0.1 and +0.1 to obtain two starting values.
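For the extreme value distribution, the method-of-moments starting point has a closed form. Assuming the Gumbel-for-maxima parameterization, the mean is $\alpha + \gamma\beta$ and the variance is $\pi^2\beta^2/6$, where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant, so $\hat\beta = s\sqrt{6}/\pi$ and $\hat\alpha = \bar{x} - \gamma\hat\beta$. The Python sketch below computes these and shifts each by ±0.1 as described; the helper names and data are hypothetical, not taken from the Demonstration's code.

```python
import math
from statistics import fmean, pstdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def ev_moment_estimates(data):
    """Method-of-moments (alpha, beta) for the extreme value (Gumbel, maximum) law.

    Matches mean = alpha + gamma*beta and variance = pi^2 * beta^2 / 6,
    using the population (divide-by-n) standard deviation.
    """
    beta = pstdev(data) * math.sqrt(6.0) / math.pi
    alpha = fmean(data) - EULER_GAMMA * beta
    return alpha, beta

def starting_values(estimates, delta=0.1):
    """Two starting values per parameter: each estimate shifted by -delta and +delta.

    If a positive parameter is smaller than delta, estimate - delta is infeasible;
    a real implementation would need to handle that case.
    """
    return [(e - delta, e + delta) for e in estimates]

# Hypothetical data (not the Turku wind speeds).
data = [12.0, 15.5, 13.2, 18.9, 14.1, 16.7, 13.8, 21.3]
alpha0, beta0 = ev_moment_estimates(data)
print(starting_values((alpha0, beta0)))
```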

The weighting constant $p$ has default starting values, but the center of these starting values can be changed with a slider. Changing the starting value of $p$ may help if the global optimum is not found. For example, if the histogram suggests a bimodal mixture density but the default starting value of $p$ yields only a unimodal density, changing the value of $p$ may lead to a bimodal density.

Estimating the mixture density is a demanding task, and in some cases the optimization may not succeed (in this case, the Demonstration only shows the histogram and the sorted data). In such situations the problem should be investigated in more detail, outside of the Demonstration. We could, for example,

1) try other methods for NMaximize like differential evolution, simulated annealing, or random search (the default method is Nelder–Mead);

2) try various seed numbers for random number generation in NMaximize;

3) try to give better starting values for the parameters in NMaximize.
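Remedies 2) and 3) amount to restarting a local search from different points and keeping the best result. The Python sketch below illustrates that idea outside Mathematica with a compact Nelder-Mead minimizer applied to a negative log-likelihood. The minimizer is a simplified textbook variant written for this example, not NMaximize's implementation; the seeds only loosely mirror remedy 2), infeasible parameters return $+\infty$, and the normal model and data are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev

def nelder_mead(f, x0, step=0.5, max_iter=2000, tol=1e-12):
    """Minimize f from x0 with a simplified Nelder-Mead search
    (reflection, expansion, inside contraction, shrink)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):  # one extra vertex displaced along each coordinate axis
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])  # best first
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        # With +inf (infeasible) vertices this difference is inf or nan, never < tol.
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:  # shrink every vertex halfway toward the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    k = min(range(n + 1), key=lambda i: values[i])
    return simplex[k], values[k]

def multistart(neg_loglik, center, spread, seeds):
    """Run Nelder-Mead from a random start near `center` for each seed; keep the best."""
    best = None
    for seed in seeds:
        rng = random.Random(seed)
        x0 = [c + rng.uniform(-spread, spread) for c in center]
        x, val = nelder_mead(neg_loglik, x0)
        if best is None or val < best[1]:
            best = (x, val)
    return best

# Hypothetical data and model: normal distribution, sigma > 0 enforced by returning +inf.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6, 4.4]

def normal_neg_loglik(theta):
    mu, sigma = theta
    if sigma <= 0:
        return math.inf
    total = 0.0
    for x in data:
        z = (x - mu) / sigma
        total += math.log(sigma) + 0.5 * math.log(2 * math.pi) + 0.5 * z * z
    return total

(mu_hat, sigma_hat), nll = multistart(normal_neg_loglik, center=[5.0, 1.0], spread=0.1, seeds=[1, 2, 3])
print(mu_hat, sigma_hat)          # numerical maximum likelihood estimates
print(fmean(data), pstdev(data))  # closed-form MLE for comparison
```

The normal model is used only because its maximum likelihood estimates are known in closed form, which makes it easy to see whether the search succeeded; a mixture log-likelihood can be plugged in the same way.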

When adjusting the number of bins, note that too low a value gives a histogram that is too coarse to reveal the true distribution of the data. On the other hand, too high a value produces an overly detailed histogram in which the bar heights vary wildly. Try to choose a number of bins that gives a reasonably smooth histogram that is representative of the distribution of the data.
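The Demonstration leaves the number of bins to a slider. As one possible starting point before adjusting by eye, the Python sketch below uses Sturges' rule, $k = \lceil \log_2 n \rceil + 1$, a common rule of thumb that is our addition and not something the Demonstration applies, together with a plain equal-width bin count on hypothetical data.

```python
import math

def sturges_bins(n):
    """Sturges' rule of thumb: ceil(log2 n) + 1 bins for n observations."""
    return math.ceil(math.log2(n)) + 1

def histogram_counts(data, k):
    """Counts in k equal-width bins spanning [min, max]; the maximum goes in the last bin."""
    lo, hi = min(data), max(data)
    if hi == lo:  # all values equal: put everything in the first bin
        return [len(data)] + [0] * (k - 1)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        counts[min(int((x - lo) / width), k - 1)] += 1
    return counts

data = [12.0, 15.5, 13.2, 18.9, 14.1, 16.7, 13.8, 21.3]  # hypothetical values
k = sturges_bins(len(data))
print(k, histogram_counts(data, k))
```

Sturges' rule tends to give too few bins for large or skewed samples, so it is best treated as a first guess to refine visually.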

The densities in the dropdown menus are shown in three groups: densities defined on $(-\infty, \infty)$, densities defined on $(0, \infty)$, and the beta density, which is defined on $(0, 1)$.

The quantiles of estimated ordinary densities are calculated with *Mathematica*'s built-in function Quantile. A mixture density generally has no closed-form quantile, so we use *Mathematica*'s built-in function FindRoot to solve numerically the equation $p F_1(x) + (1 - p) F_2(x) = q$ that defines the $q$-quantile, where $F_1$ and $F_2$ are the cumulative distribution functions of the two components and $p$ is the weight.
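The quantile equation for a mixture can be solved with any one-dimensional root finder. The Python sketch below solves $p F_1(x) + (1 - p) F_2(x) = q$ by bisection, which we substitute for FindRoot because it only needs an interval on which the mixture CDF brackets $q$; the normal components are hypothetical.

```python
from statistics import NormalDist

def mixture_quantile(q, p, F1, F2, lo, hi, tol=1e-10, max_iter=200):
    """Solve p*F1(x) + (1 - p)*F2(x) = q for x by bisection on [lo, hi].

    Requires F(lo) <= q <= F(hi), where F is the (nondecreasing) mixture CDF.
    """
    def F(x):
        return p * F1(x) + (1.0 - p) * F2(x)

    if not F(lo) <= q <= F(hi):
        raise ValueError("[lo, hi] does not bracket the requested quantile")
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if F(mid) < q:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile is at or to the left of mid
    return 0.5 * (lo + hi)

# Hypothetical symmetric mixture: by symmetry its median is 0.
d1, d2 = NormalDist(-2.0, 1.0), NormalDist(2.0, 1.0)
print(mixture_quantile(0.5, 0.5, d1.cdf, d2.cdf, -20.0, 20.0))
print(mixture_quantile(0.9, 0.5, d1.cdf, d2.cdf, -20.0, 20.0))
```

Bisection is slower than Newton-type methods but cannot diverge once a valid bracket is given; the iteration cap guards against intervals that floating-point arithmetic cannot narrow below `tol`.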

For finite mixture distributions, see, for example, [1], [2], or [3]. See also Mixture density at Wikipedia.

[1] B. S. Everitt and D. J. Hand, *Finite Mixture Distributions,* London: Chapman and Hall, 1981.

[2] G. McLachlan and D. Peel, *Finite Mixture Models,* New York: Wiley, 2000.

[3] D. M. Titterington, A. F. M. Smith, and U. E. Makov, *Statistical Analysis of Finite Mixture Distributions,* Chichester: Wiley, 1985.
