Maximum Entropy Probability Density Functions

The principle of maximum entropy can be used to find the probability distribution, subject to a specified constraint, that is maximally noncommittal regarding missing information about the distribution. In this Demonstration the principle of maximum entropy is used to find the probability density function of a discrete random variable defined on the interval $0 \le p \le 1$, subject to user-specified constraints on the mean $\mu$ and variance $\sigma^2$. The resulting probability distribution is referred to as an $A_p$ distribution [1]. The mean of the $A_p$ distribution associated with a proposition $A$ is the probability of that proposition, and the variance of the $A_p$ distribution is a measure of the amount of confidence associated with predicting the probability of the proposition. When only the mean is specified, the entropy $H$ of the $A_p$ distribution is maximal when the specified mean probability is $1/2$. When both mean and variance are specified, the entropy $H$ of the $A_p$ distribution decreases as the specified variance decreases.
Contributed by: Marshall Bradley (March 2011)
Open content licensed under CC BY-NC-SA
Details
Probabilities are used to characterize the likelihood of events or propositions. In some circumstances, predictions of probability carry a high degree of confidence. For example, an individual can confidently predict that a fair coin will produce "heads" in one flip with probability $1/2$. By way of contrast, there is more uncertainty associated with a weather prediction that assigns a probability to rain tomorrow. E. T. Jaynes developed the concept of the $A_p$ distribution to deal with what he described as different states of external and internal knowledge. In the terminology of Jaynes, the probability of the proposition $A$ is found by computing the mean of the $A_p$ distribution, and the variance of the $A_p$ distribution is a measure of the amount of confidence associated with the prediction of the mean. In situations where you have high states of internal knowledge, like the case of the coin, the variance of the $A_p$ distribution is small. In fact, for the case of the coin, the variance of the $A_p$ distribution is 0.
The entropy is a measure of the amount of disorder in a probability density function. The principle of maximum entropy can be used to find $A_p$ distributions in circumstances where the only specified information is the mean of the distribution, or the mean and variance of the distribution. The $A_p$ distributions in this Demonstration are evaluated at $n$ discrete points $p_i$ on the interval $0 \le p \le 1$, for $i = 1, 2, \ldots, n$. If the probability density at these points is denoted by $f_i$, then the mean $\mu$, variance $\sigma^2$, and entropy $H$ of the $A_p$ distribution are respectively given by

$\mu = \sum_{i=1}^{n} p_i f_i$,

$\sigma^2 = \sum_{i=1}^{n} (p_i - \mu)^2 f_i$,

$H = -\sum_{i=1}^{n} f_i \ln f_i$.
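To make these definitions concrete, here is a minimal Wolfram Language sketch; the grid of $n$ equally spaced points is an assumption for illustration, not the Demonstration's actual discretization. It also verifies the coin example from the paragraph above: a point mass at $p = 1/2$ has zero variance and zero entropy, while the uniform density attains the maximal entropy $\ln n$.

n = 9;
p = Range[n]/(n + 1);   (* assumed grid: 0.1, 0.2, ..., 0.9 *)
entropy[f_] := -Sum[If[f[[i]] > 0, f[[i]] Log[f[[i]]], 0], {i, Length[f]}];
stats[f_] := {p . f, (p - p . f)^2 . f, entropy[f]};   (* {mean, variance, entropy} *)

coin = UnitVector[n, 5];          (* all probability mass at p = 1/2 *)
uniform = ConstantArray[1/n, n];  (* maximally noncommittal density *)

stats[coin]     (* {1/2, 0, 0}: high internal knowledge, zero variance *)
stats[uniform]  (* {1/2, 1/15, Log[9]}: maximal entropy for mean 1/2 *)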
If the mean $\mu$ of the $A_p$ distribution is specified, then the corresponding maximum entropy probability distribution $f_i$ can be found using the technique of Lagrange multipliers [2]. This requires finding the maximum of the quantity

$H + \lambda_0\left(1 - \sum_{i=1}^{n} f_i\right) + \lambda_1\left(\mu - \sum_{i=1}^{n} p_i f_i\right)$,

where the unknowns are the probabilities $f_i$ and the Lagrange multipliers $\lambda_0$ and $\lambda_1$. If the mean $\mu$ and the variance $\sigma^2$ of the $A_p$ distribution are both specified, then it is necessary to find the maximum value of the quantity

$H + \lambda_0\left(1 - \sum_{i=1}^{n} f_i\right) + \lambda_1\left(\mu - \sum_{i=1}^{n} p_i f_i\right) + \lambda_2\left(\sigma^2 - \sum_{i=1}^{n} (p_i - \mu)^2 f_i\right)$,

where $\lambda_2$ is an additional Lagrange multiplier.
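As a hedged sketch of this optimization, the following Wolfram Language code imposes the constraints directly with NMaximize rather than solving the Lagrange conditions analytically; the grid, the specified mean of 0.3, and the specified variance of 0.01 are illustrative assumptions. For the mean-only case the stationarity conditions give $f_i \propto e^{-\lambda_1 p_i}$, so the numerical solution is exponential in $p_i$; adding the variance constraint makes it Gaussian-like in $p_i$.

n = 9;
p = N[Range[n]/(n + 1)];   (* assumed grid: 0.1, 0.2, ..., 0.9 *)
f = Array[x, n];           (* unknown probabilities f_i *)
H = -f . Log[f];           (* entropy objective *)

(* Mean specified: equivalent to the first Lagrangian above *)
meanOnly = NMaximize[
   {H, Total[f] == 1 && p . f == 0.3 && And @@ Thread[f >= 10^-8]}, f];

(* Mean and variance specified: equivalent to the second Lagrangian *)
meanAndVar = NMaximize[
   {H, Total[f] == 1 && p . f == 0.3 &&
     (p - 0.3)^2 . f == 0.01 && And @@ Thread[f >= 10^-8]}, f];

{First[meanOnly], First[meanAndVar]}  (* maximized entropies *)

Pinning the variance down to a value below the one attained in the mean-only solution forces the density to concentrate, so the second maximized entropy is smaller, consistent with the behavior described above.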
References
[1] E. T. Jaynes, Probability Theory: The Logic of Science, New York: Cambridge University Press, 2003.
[2] P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences, Cambridge: Cambridge University Press, 2005.