The Bayesian approach to statistics provides an intuitive way of making probabilistic statements about statistical parameters of interest. Consider a situation in which $n$ independent samples $x_1, \ldots, x_n$ of a random variable $X$ have been obtained. Based upon prior information, the random variable is known to be normally distributed with unknown mean $\mu$ and standard deviation $\sigma$. Additional initial information consists of a priori knowledge about the distribution of the unknown mean $\mu$ and the unknown standard deviation $\sigma$ of the random variable $X$. In the Bayesian approach, this a priori information, denoted $I$, is formally represented by the prior probability density functions $p(\mu \mid I)$ defined on the interval $[\mu_{\min}, \mu_{\max}]$ and $p(\sigma \mid I)$ defined on the interval $[\sigma_{\min}, \sigma_{\max}]$.
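Bayes's theorem can be used to compute the joint posterior probability distribution of $\mu$ and $\sigma$ given the data $D = \{x_1, \ldots, x_n\}$ and the initial information $I$ about $\mu$ and $\sigma$. Formally the result for the posterior distribution $p(\mu, \sigma \mid D, I)$ is
$$p(\mu, \sigma \mid D, I) = \frac{p(D \mid \mu, \sigma, I)\, p(\mu \mid I)\, p(\sigma \mid I)}{p(D \mid I)},$$
where $p(D \mid \mu, \sigma, I)$ is the likelihood of the data $D$, given specific values of the parameters $\mu$ and $\sigma$, and $p(D \mid I)$ is a normalizing constant. Since the data samples are independent and drawn from a normal distribution, the likelihood of the data can be written
$$p(D \mid \mu, \sigma, I) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right).$$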
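As a rough illustration of the likelihood above, the following sketch evaluates its logarithm directly from raw samples. Working with the logarithm is a choice made here to avoid numerical underflow for large $n$; the function name and the sample values are hypothetical and not part of the original Demonstration.

```python
import math

def log_likelihood(samples, mu, sigma):
    """Log of the normal likelihood p(D | mu, sigma, I) for independent samples."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = len(samples)
    sum_sq = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - sum_sq / (2.0 * sigma ** 2)

# Hypothetical data, chosen only for illustration.
samples = [4.8, 5.1, 5.3, 4.9, 5.4]
print(log_likelihood(samples, mu=5.0, sigma=0.3))
```

For a fixed $\sigma$, the value is largest when `mu` equals the sample mean, which anticipates the role the sample statistics play in the rest of the derivation.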
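The likelihood of the data can be expressed in terms of the data sample mean $\bar{x}$ and standard deviation $s$. The result is
$$p(D \mid \mu, \sigma, I) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{n\left[(\mu-\bar{x})^2 + s^2\right]}{2\sigma^2}\right),$$
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
and
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2.$$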
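The step from the sum over individual samples to the sample statistics rests on a single algebraic identity. The sketch below assumes the $1/n$ convention for the sample variance, $s^2 = \frac{1}{n}\sum_i (x_i-\bar{x})^2$, which is the convention under which $s = 0$ for a single observation:

```latex
\begin{align*}
\sum_{i=1}^{n}(x_i-\mu)^2
  &= \sum_{i=1}^{n}\bigl[(x_i-\bar{x}) + (\bar{x}-\mu)\bigr]^2 \\
  &= \sum_{i=1}^{n}(x_i-\bar{x})^2
     + 2(\bar{x}-\mu)\sum_{i=1}^{n}(x_i-\bar{x})
     + n(\bar{x}-\mu)^2 \\
  &= n s^2 + n(\mu-\bar{x})^2 .
\end{align*}
```

The cross term vanishes because $\sum_i (x_i-\bar{x}) = 0$ by the definition of $\bar{x}$.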
This last result demonstrates that knowledge about the data is captured by the sample statistics $\bar{x}$ and $s$ together with the sample size $n$.
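One way to see this concretely is to compare two different data sets that share the same $n$, $\bar{x}$ and $s$. In the sketch below, the data sets and function names are hypothetical, and $s$ uses the $1/n$ convention; both data sets should give the same likelihood for any choice of $\mu$ and $\sigma$.

```python
import math

def raw_log_likelihood(samples, mu, sigma):
    """Log-likelihood computed from the individual samples."""
    n = len(samples)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in samples) / (2.0 * sigma ** 2))

def sample_stats(samples):
    """Return (n, xbar, s), with s defined using the 1/n convention."""
    n = len(samples)
    xbar = sum(samples) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in samples) / n)
    return n, xbar, s

def stats_log_likelihood(n, xbar, s, mu, sigma):
    """Log-likelihood computed from the sample statistics alone."""
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - n * ((mu - xbar) ** 2 + s ** 2) / (2.0 * sigma ** 2))

# Two hypothetical data sets: data_b mirrors data_a about the common mean 1.0.
data_a = [0.0, 0.0, 3.0]
data_b = [2.0, 2.0, -1.0]

print(sample_stats(data_a), sample_stats(data_b))
print(raw_log_likelihood(data_a, 0.4, 1.7), raw_log_likelihood(data_b, 0.4, 1.7))
```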
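The posterior distribution $p(\mu, \sigma \mid D, I)$ can now be written
$$p(\mu, \sigma \mid D, I) = \frac{p(\mu \mid I)\, p(\sigma \mid I)}{p(D \mid I)}\, (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{n\left[(\mu-\bar{x})^2 + s^2\right]}{2\sigma^2}\right).$$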
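The posterior distribution of $\mu$ can be found by marginalizing across $\sigma$. Formally the result is
$$p(\mu \mid D, I) = \int_{\sigma_{\min}}^{\sigma_{\max}} p(\mu, \sigma \mid D, I)\, d\sigma.$$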
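The marginalization can be approximated numerically. The sketch below uses a simple composite trapezoid rule over $\sigma$ and leaves the priors as caller-supplied functions; the flat placeholder priors, the sample statistics and all function names are assumptions made for illustration only, and the result is left unnormalized.

```python
import math

def trapezoid(f, a, b, steps=2000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h

def joint_unnormalized(mu, sigma, n, xbar, s, prior_mu, prior_sigma):
    """Likelihood times priors, omitting the constant 1 / p(D | I)."""
    log_lik = (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
               - n * ((mu - xbar) ** 2 + s ** 2) / (2.0 * sigma ** 2))
    return math.exp(log_lik) * prior_mu(mu) * prior_sigma(sigma)

def marginal_mu(mu, n, xbar, s, prior_mu, prior_sigma, sigma_min, sigma_max):
    """Unnormalized p(mu | D, I): integrate the joint density over sigma."""
    return trapezoid(
        lambda sigma: joint_unnormalized(mu, sigma, n, xbar, s, prior_mu, prior_sigma),
        sigma_min, sigma_max)

def flat(value):
    """Placeholder prior (constant); not a recommendation."""
    return 1.0

# Hypothetical sample statistics and sigma range.
for mu in (4.0, 5.0, 6.0):
    print(mu, marginal_mu(mu, n=10, xbar=5.0, s=1.0,
                          prior_mu=flat, prior_sigma=flat,
                          sigma_min=0.2, sigma_max=5.0))
```

Because the likelihood depends on $\mu$ only through $(\mu-\bar{x})^2$, the marginal computed with a flat prior on $\mu$ is symmetric about $\bar{x}$.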
For a normal distribution, the mean $\mu$ and standard deviation $\sigma$ are respectively position and scale parameters. A normal distribution is centered on $\mu$ and has a width that is related to $\sigma$. In situations where there is significant prior uncertainty about $\mu$ and $\sigma$, it is appropriate to assume a uniform prior probability distribution for the mean $\mu$, with probability density function
$$p(\mu \mid I) = \frac{1}{\mu_{\max}-\mu_{\min}}, \quad \mu_{\min} \le \mu \le \mu_{\max}.$$
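Since the standard deviation $\sigma$ is a scale parameter and not a position parameter, it is appropriate to assume that the prior probability distribution for $\sigma$ is given by the Jeffreys prior
$$p(\sigma \mid I) = \frac{1}{\sigma \ln(\sigma_{\max}/\sigma_{\min})}, \quad \sigma_{\min} \le \sigma \le \sigma_{\max}.$$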
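The two priors can be sketched as plain functions. The example also illustrates why a density proportional to $1/\sigma$ suits a scale parameter: it assigns the same probability to every interval of the form $[a, 2a]$ inside the support, so no particular scale is favored. The function names and numerical bounds below are hypothetical.

```python
import math

def uniform_prior(mu, mu_min, mu_max):
    """Uniform density for a position parameter on [mu_min, mu_max]."""
    if mu_min <= mu <= mu_max:
        return 1.0 / (mu_max - mu_min)
    return 0.0

def jeffreys_prior(sigma, sigma_min, sigma_max):
    """Density proportional to 1/sigma on [sigma_min, sigma_max]."""
    if sigma_min <= sigma <= sigma_max:
        return 1.0 / (sigma * math.log(sigma_max / sigma_min))
    return 0.0

def jeffreys_mass(a, b, sigma_min, sigma_max):
    """Prior probability that sigma lies in [a, b], for [a, b] inside the support."""
    return math.log(b / a) / math.log(sigma_max / sigma_min)

# Hypothetical bounds, chosen only for illustration.
sigma_min, sigma_max = 0.01, 100.0
print(jeffreys_mass(0.1, 0.2, sigma_min, sigma_max))
print(jeffreys_mass(10.0, 20.0, sigma_min, sigma_max))
```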
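With these specific representations for $p(\mu \mid I)$ and $p(\sigma \mid I)$, the posterior distribution of $\mu$ can be written in terms of the initial and observed information:
$$p(\mu \mid D, I) \propto \int_{\sigma_{\min}}^{\sigma_{\max}} \sigma^{-(n+1)} \exp\!\left(-\frac{n\left[(\mu-\bar{x})^2 + s^2\right]}{2\sigma^2}\right) d\sigma, \quad \mu_{\min} \le \mu \le \mu_{\max},$$
where the constant of proportionality is fixed by requiring the distribution to integrate to 1 over $[\mu_{\min}, \mu_{\max}]$.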
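As a consistency check on this form, consider the limiting case $\sigma_{\min} \to 0$ and $\sigma_{\max} \to \infty$ with $n \ge 2$ and $s > 0$; these limits are an assumption made here for illustration and are not a setting discussed in the original text. Writing $A = \tfrac{n}{2}\left[(\mu-\bar{x})^2 + s^2\right]$ and substituting $t = A/\sigma^2$, the integral over $\sigma$ reduces to a gamma function:

```latex
\begin{align*}
\int_{0}^{\infty} \sigma^{-(n+1)} e^{-A/\sigma^{2}}\, d\sigma
  &= \frac{1}{2} A^{-n/2} \int_{0}^{\infty} t^{\,n/2-1} e^{-t}\, dt
   = \frac{1}{2}\,\Gamma\!\left(\frac{n}{2}\right) A^{-n/2}, \\
p(\mu \mid D, I)
  &\propto \left[(\mu-\bar{x})^{2} + s^{2}\right]^{-n/2}.
\end{align*}
```

In this limit, and for a wide range of $\mu$, the posterior has the form of a Student $t$ distribution with $n-1$ degrees of freedom in the variable $(\mu-\bar{x})\sqrt{n-1}/s$.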
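The posterior probability distribution of $\sigma$ is
$$p(\sigma \mid D, I) \propto \sigma^{-(n+1)} \exp\!\left(-\frac{n s^2}{2\sigma^2}\right) \int_{\mu_{\min}}^{\mu_{\max}} \exp\!\left(-\frac{n(\mu-\bar{x})^2}{2\sigma^2}\right) d\mu, \quad \sigma_{\min} \le \sigma \le \sigma_{\max}.$$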
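The integral over $\mu$ in the posterior for $\sigma$ is a truncated Gaussian integral, which can be written with the error function. The sketch below uses that closed form and then normalizes on a uniform grid with the trapezoid rule; the grid size, input values and function names are hypothetical choices made for illustration.

```python
import math

def sigma_posterior_unnormalized(sigma, n, xbar, s, mu_min, mu_max):
    """sigma^-(n+1) * exp(-n s^2 / (2 sigma^2)) times the integral over mu."""
    scale = math.sqrt(n) / (math.sqrt(2.0) * sigma)
    # Closed form of the integral of exp(-n (mu - xbar)^2 / (2 sigma^2)) over [mu_min, mu_max].
    mu_integral = sigma * math.sqrt(math.pi / (2.0 * n)) * (
        math.erf((mu_max - xbar) * scale) - math.erf((mu_min - xbar) * scale))
    return sigma ** (-(n + 1)) * math.exp(-n * s ** 2 / (2.0 * sigma ** 2)) * mu_integral

def sigma_posterior_on_grid(n, xbar, s, mu_min, mu_max, sigma_min, sigma_max, points=2001):
    """Return a sigma grid and the posterior density normalized by the trapezoid rule."""
    h = (sigma_max - sigma_min) / (points - 1)
    grid = [sigma_min + k * h for k in range(points)]
    dens = [sigma_posterior_unnormalized(sg, n, xbar, s, mu_min, mu_max) for sg in grid]
    area = h * (sum(dens) - 0.5 * (dens[0] + dens[-1]))
    return grid, [d / area for d in dens]

# Hypothetical initial and observed information.
grid, dens = sigma_posterior_on_grid(n=10, xbar=5.0, s=1.0,
                                     mu_min=0.0, mu_max=10.0,
                                     sigma_min=0.2, sigma_max=5.0)
mode = grid[dens.index(max(dens))]
print("posterior mode of sigma:", mode)
```

When the interval for $\mu$ is wide compared with $\sigma/\sqrt{n}$, the error-function factor is nearly constant and the density is approximately proportional to $\sigma^{-n}\exp(-n s^2/(2\sigma^2))$, whose maximum lies at $\sigma = s$.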
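In the special case of a single observation of the random variable $X$, that is, $n = 1$, and complete uncertainty in the mean $\mu$, that is, $\mu_{\min} \to -\infty$ and $\mu_{\max} \to \infty$, then $\bar{x} = x_1$, $s = 0$ and the posterior distribution of the mean $\mu$ is given by
$$p(\mu \mid D, I) = \frac{1}{\sqrt{2\pi}\,\ln(\sigma_{\max}/\sigma_{\min})} \int_{\sigma_{\min}}^{\sigma_{\max}} \sigma^{-2} \exp\!\left(-\frac{(\mu-x_1)^2}{2\sigma^2}\right) d\sigma,$$
and the posterior distribution of the standard deviation $\sigma$ is
$$p(\sigma \mid D, I) = \frac{1}{\sigma \ln(\sigma_{\max}/\sigma_{\min})}, \quad \sigma_{\min} \le \sigma \le \sigma_{\max}.$$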
This is identical to the prior distribution $p(\sigma \mid I)$. Thus in the special case $n = 1$ and complete uncertainty regarding $\mu$, Bayes's theorem says that a single observation tells us nothing about its own uncertainty.
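This special case can be checked numerically. For $n = 1$ and $s = 0$, the unnormalized posterior for $\sigma$ is $\sigma^{-2}$ times a truncated Gaussian integral over $\mu$; if the posterior equals the $1/\sigma$ prior, then $\sigma$ times this quantity should be the same for every $\sigma$. The sketch below uses a very wide interval as a stand-in for complete uncertainty in $\mu$ and, for contrast, a narrow one; the observation value, intervals and function name are hypothetical.

```python
import math

def sigma_weight_single_obs(sigma, x1, mu_min, mu_max):
    """Unnormalized p(sigma | D, I) for n = 1, s = 0 with the 1/sigma prior."""
    a = (mu_max - x1) / (math.sqrt(2.0) * sigma)
    b = (mu_min - x1) / (math.sqrt(2.0) * sigma)
    # Closed form of the integral of exp(-(mu - x1)^2 / (2 sigma^2)) over [mu_min, mu_max].
    mu_integral = sigma * math.sqrt(math.pi / 2.0) * (math.erf(a) - math.erf(b))
    return sigma ** -2 * mu_integral

x1 = 3.0  # hypothetical single observation
for sigma in (0.5, 1.0, 2.0, 4.0):
    wide = sigma * sigma_weight_single_obs(sigma, x1, -1e6, 1e6)
    narrow = sigma * sigma_weight_single_obs(sigma, x1, 2.0, 4.0)
    print(sigma, wide, narrow)
```

With the wide interval the product does not depend on $\sigma$, so the posterior has the same $1/\sigma$ shape as the prior. With the narrow interval it varies with $\sigma$, which suggests that the conclusion depends on the mean being effectively unconstrained.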
In this Demonstration, you can choose values of the initial data $\mu_{\min}$, $\mu_{\max}$, $\sigma_{\min}$, $\sigma_{\max}$ and the observed data $n$, $\bar{x}$, $s$, and investigate the effects of these choices on the posterior distributions $p(\mu \mid D, I)$ and $p(\sigma \mid D, I)$. The black dots on the horizontal axes in the lower two plots indicate the locations of the observed data $\bar{x}$ and $s$ with respect to their respective posterior distributions.
[1] E. T. Jaynes, Probability Theory: The Logic of Science, Cambridge: Cambridge University Press, 2003.
[2] H. Jeffreys, Scientific Inference, 3rd ed., Cambridge: Cambridge University Press, 1973.