# Maximum Entropy Probability Density Functions

The principle of maximum entropy can be used to find the probability distribution that satisfies a specified constraint while remaining maximally noncommittal about missing information. In this Demonstration the principle of maximum entropy is used to find the probability density function of discrete random variables defined on the interval $[0, 1]$, subject to user-specified constraints on the mean $\mu$ and variance $\sigma^2$. The resulting probability distribution is referred to as an $A_p$ distribution [1]. The mean of the $A_p$ distribution associated with a proposition is the probability of that proposition, and the variance of the distribution measures the confidence associated with that probability. When only the mean is specified, the entropy of the distribution is largest when the specified mean probability is $1/2$. When both mean and variance are specified, the entropy of the distribution decreases as the specified variance decreases.

Contributed by: Marshall Bradley (March 2011)

Open content licensed under CC BY-NC-SA

## Details

Probabilities are used to characterize the likelihood of events or propositions. In some circumstances, predictions of probability carry a high degree of confidence. For example, an individual can confidently predict that a fair coin will produce “heads” in one flip with probability $1/2$. By contrast, there is more uncertainty associated with a weather forecast that assigns a probability to rain tomorrow. E. T. Jaynes developed the concept of the $A_p$ distribution to deal with what he described as different states of external and internal knowledge. In the terminology of Jaynes, the probability of a proposition is found by computing the mean of its $A_p$ distribution, and the variance of the distribution is a measure of the confidence associated with the prediction of the mean. In situations where you have a high state of internal knowledge, as in the case of the coin, the variance of the distribution is small. In fact, for the coin, the variance of the distribution is 0.

The entropy is a measure of the amount of disorder in a probability density function. The principle of maximum entropy can be used to find $A_p$ distributions in circumstances where the only specified information is the mean of the distribution, or its mean and variance. The distributions in this Demonstration are evaluated at discrete points $x_i$ on the interval $[0, 1]$. If the probability density at the point $x_i$ is denoted by $p_i$, then the mean $\mu$, variance $\sigma^2$, and entropy $S$ of the distribution are respectively given by

$$\mu = \sum_i x_i\, p_i, \qquad \sigma^2 = \sum_i (x_i - \mu)^2\, p_i, \qquad S = -\sum_i p_i \ln p_i.$$
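As a small illustration of these three sums, the following Python sketch evaluates the mean, variance, and entropy of a discrete distribution. The function name `moments_and_entropy` and the five-point grid are assumptions made for this example; the grid actually used by the Demonstration is not stated here.

```python
import math

def moments_and_entropy(xs, ps):
    """Return (mean, variance, entropy) of probabilities ps at points xs."""
    mu = sum(x * p for x, p in zip(xs, ps))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
    # Terms with p = 0 contribute nothing (p * ln p -> 0 as p -> 0).
    s = -sum(p * math.log(p) for p in ps if p > 0.0)
    return mu, var, s

# Hypothetical grid of five equally spaced points on [0, 1].
xs = [i / 4 for i in range(5)]

# Uniform probabilities: the least informative case.
mu_u, var_u, s_u = moments_and_entropy(xs, [0.2] * 5)

# All probability at x = 1/2: the "fair coin" state of knowledge.
mu_c, var_c, s_c = moments_and_entropy(xs, [0.0, 0.0, 1.0, 0.0, 0.0])
```

Both distributions have mean $1/2$, but the uniform one has variance $1/8$ and entropy $\ln 5$, while the point mass has zero variance and zero entropy, which matches the coin example above.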

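If the mean of the distribution is specified, then the corresponding maximum entropy probability distribution can be found using the technique of Lagrange multipliers [2]. This requires finding the maximum of the quantity

$$Q = -\sum_i p_i \ln p_i \;-\; \lambda_0 \Big( \sum_i p_i - 1 \Big) \;-\; \lambda_1 \Big( \sum_i x_i\, p_i - \mu \Big),$$

where the unknowns are the probabilities $p_i$ and the Lagrange multipliers $\lambda_0$ and $\lambda_1$, which enforce normalization and the specified mean $\mu$. If the mean and the variance of the distribution are both specified, then it is necessary to find the maximum value of the quantity

$$Q = -\sum_i p_i \ln p_i \;-\; \lambda_0 \Big( \sum_i p_i - 1 \Big) \;-\; \lambda_1 \Big( \sum_i x_i\, p_i - \mu \Big) \;-\; \lambda_2 \Big( \sum_i (x_i - \mu)^2\, p_i - \sigma^2 \Big),$$

where $\lambda_2$ is an additional Lagrange multiplier that enforces the specified variance $\sigma^2$.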
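Setting the derivative of such a Lagrangian with respect to each probability to zero gives a solution of exponential form, with each probability proportional to `exp(a*x + b*x**2)`, where `a` and `b` are combinations of the multipliers (`b = 0` when only the mean is constrained). The sketch below uses that form and finds `a` and `b` numerically by nested bisection. This is one possible approach, not necessarily the method used in the Demonstration. The 21-point grid, the search bounds, and all function names are assumptions chosen for illustration.

```python
import math

def grid(n=21):
    """Equally spaced points on [0, 1] (an assumed grid)."""
    return [i / (n - 1) for i in range(n)]

def gibbs(xs, a, b=0.0):
    """Normalized probabilities proportional to exp(a*x + b*x**2)."""
    expo = [a * x + b * x * x for x in xs]
    m = max(expo)  # shift exponents so exp() cannot overflow
    w = [math.exp(e - m) for e in expo]
    z = sum(w)
    return [wi / z for wi in w]

def mean(xs, p):
    return sum(x * pi for x, pi in zip(xs, p))

def variance(xs, p):
    mu = mean(xs, p)
    return sum((x - mu) ** 2 * pi for x, pi in zip(xs, p))

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def bisect(f, lo, hi, iters=80):
    """Approximate root of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def maxent_mean(xs, mu, b=0.0, bound=4000.0):
    """Match the mean by tuning a (the mean increases with a for fixed b)."""
    a = bisect(lambda t: mean(xs, gibbs(xs, t, b)) - mu, -bound, bound)
    return gibbs(xs, a, b)

def maxent_mean_var(xs, mu, var, bound=1000.0):
    """Match mean and variance: with the mean held fixed, the variance
    increases with b, so an outer bisection on b is sufficient."""
    b = bisect(lambda t: variance(xs, maxent_mean(xs, mu, t)) - var,
               -bound, bound)
    return maxent_mean(xs, mu, b)

xs = grid(21)
p_mean = maxent_mean(xs, 0.3)            # mean constraint only
p_both = maxent_mean_var(xs, 0.3, 0.02)  # mean and variance constraints
print(entropy(p_mean), entropy(p_both))
```

With these assumed inputs the two-constraint distribution has lower entropy than the mean-only one, and calling `maxent_mean(xs, 0.5)` returns the uniform distribution, whose entropy $\ln 21$ is the largest possible on this grid. The sketch assumes the requested mean and variance are attainable on the grid; it does not check feasibility.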

## References

[1] E. T. Jaynes, *Probability Theory: The Logic of Science*, New York: Cambridge University Press, 2003.

[2] P. Gregory, *Bayesian Logical Data Analysis for the Physical Sciences*, Cambridge: Cambridge University Press, 2005.
