Distribution of Records


Consider a sequence of independent, identically distributed, continuous data values $X_1$, $X_2$, …. A value $X_j$ is a record value (or a high-water mark) if it is the largest value among all the values that have been recorded up to time $j$. Let $R_n$ be the $n$th record value, $n = 1, 2, \ldots$; define $R_0 = X_1$, that is, the first data value is the zeroth record value (or the trivial record). This Demonstration shows the distribution of the record values $R_n$ for several distributions of the data.
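The definition above can be illustrated with a minimal Python sketch that extracts the record values from a finite sequence. The function name `record_values` and the seed are illustrative choices, not part of the Demonstration.

```python
import random

def record_values(xs):
    """Return [R_0, R_1, ...], the record values found in the sequence xs.

    R_0 is the first observation (the trivial record); every later record
    is an observation strictly larger than all values seen before it.
    """
    records = []
    for x in xs:
        if not records or x > records[-1]:
            records.append(x)
    return records

rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
print(record_values(data))
```

Because the data is continuous, ties occur with probability zero, so the strict comparison matches the definition.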


The blue curve is the probability density function of the data. The light red curves are the probability density functions of the record values, the dark red curve is the density of the record value currently chosen, and the green vertical line is the mean of this distribution. We can also study the conditional distribution of the next record given the current record.


Contributed by: Heikki Ruskeepää (March 2014)
Open content licensed under CC BY-NC-SA




Details

Snapshot 1: The data has a standard normal distribution. As is well known, a value of a random variable from this distribution lies, with high probability, within a few standard deviations of zero (see the blue curve). The dark red curve shows the density of the fifth record value (note that, in a sequence of 100 values, the mean number of records is approximately five) and thus where the fifth record lies with high probability. The mean of the fifth record is 2.72. The light red curves show that the variance of the record values has a decreasing pattern; the last of them shows where the 10th record lies with high probability. The means of the first through 10th record values are as follows: 0.90, 1.50, 1.97, 2.37, 2.72, 3.03, 3.32, 3.59, 3.85, and 4.09. Note that records occur seldom. For example, for the mean number of records to be 10, we need a sequence of 12,367 values!
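The two counts quoted in Snapshot 1 can be checked with a short sketch. It relies on a standard fact for independent, identically distributed continuous data: the $j$th value is a record with probability $1/j$, so the expected number of records among $n$ values (counting the first value) is the harmonic number $1 + 1/2 + \cdots + 1/n$. The function names are illustrative.

```python
def expected_records(n):
    """Expected number of records among n values: the n-th harmonic number."""
    return sum(1.0 / j for j in range(1, n + 1))

def values_needed(target):
    """Smallest n whose expected number of records is at least target."""
    total, n = 0.0, 0
    while total < target:
        n += 1
        total += 1.0 / n
    return n

print(round(expected_records(100), 3))  # 5.187
print(values_needed(10))                # 12367
```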

Snapshot 2: The data has an exponential distribution with mean 1. A value of a random variable from this distribution lies, with high probability, in a short interval to the right of zero (see the blue curve). The dark red curve again shows the density of the fifth record value and thus where the fifth record lies with high probability. The mean of the fifth record is 6. The light red curves show that the variance of the record values has an increasing pattern. In fact, the mean and the variance of the $n$th record are both $n + 1$ for the exponential distribution with mean 1.
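A small simulation can make the exponential case concrete. The sketch below assumes the memoryless property: for exponential data with mean 1, the gap between successive records is again exponential with mean 1, so the $n$th record can be sampled as a sum of $n + 1$ independent exponential steps. This is a shortcut for illustration, not necessarily how the Demonstration computes its curves. The function name, seed, and sample size are arbitrary.

```python
import random
import statistics

def exponential_record(n, rng):
    """Sample the n-th record of Exp(1) data as a sum of n + 1 Exp(1) steps."""
    return sum(rng.expovariate(1.0) for _ in range(n + 1))

rng = random.Random(1)  # arbitrary fixed seed
samples = [exponential_record(5, rng) for _ in range(50_000)]

# Both estimates should be close to 6 for the fifth record.
print(statistics.fmean(samples), statistics.variance(samples))
```

Sampling records directly from a long data sequence would be far slower, since records become rare very quickly.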

Snapshot 3: The data has a Weibull distribution. The blue curve shows where the data lies with high probability, and the dark red curve shows where the fifth record value lies with high probability. The variance of the record values remains almost constant.

Snapshot 4: The data has a standard uniform distribution. The fifth record value lies, with high probability, in a narrow interval just below 1. The mean of the fifth record is 0.984. The variance of the record values has a decreasing pattern. Note that the red curves have been cut at 3.6.
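The uniform case admits closed forms that reproduce the quoted mean. If $U$ is standard uniform, then $-\ln(1 - U)$ is exponential with mean 1, and this increasing map sends records to records. Writing $R_n$ for the $n$th record, $1 - R_n = e^{-G}$, where $G$ is a sum of $n + 1$ independent exponential variables with mean 1, so $E[(1 - R_n)^k] = (1 + k)^{-(n+1)}$. The sketch below evaluates the resulting mean and variance exactly. The function names are illustrative.

```python
from fractions import Fraction

def uniform_record_mean(n):
    """Mean of the n-th record of standard uniform data: 1 - 2^-(n+1)."""
    return 1 - Fraction(1, 2) ** (n + 1)

def uniform_record_variance(n):
    """Variance of the n-th record: 3^-(n+1) - 4^-(n+1)."""
    return Fraction(1, 3) ** (n + 1) - Fraction(1, 4) ** (n + 1)

print(float(uniform_record_mean(5)))  # 0.984375
print([float(uniform_record_variance(n)) for n in range(6)])
```

For $n = 0$ the formulas give mean 1/2 and variance 1/12, the values for a single uniform observation, and the variances shrink as $n$ grows.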

Snapshot 5: The data has a standard normal distribution. Suppose the current record value is 3.0. The figure shows the conditional density and mean of the next record. The conditional mean of the next record is 3.28.
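The conditional mean in Snapshot 5 can be reproduced with a short sketch. Given the current record $r$, the next record is distributed as a data value conditioned to exceed $r$; for the standard normal distribution its mean is $\varphi(r) / (1 - \Phi(r))$, where $\varphi$ and $\Phi$ are the standard normal density and distribution function. The function names are illustrative.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_sf(x):
    """Survival function 1 - Phi(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def next_record_mean(r):
    """Mean of the next record given that the current record equals r."""
    return normal_pdf(r) / normal_sf(r)

print(round(next_record_mean(3.0), 2))  # 3.28
```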

Let the probability density and cumulative distribution functions of the data be $f$ and $F$, respectively. The density function of the $n$th record value, $R_n$, is $f_{R_n}(r) = f(r)\,[-\ln(1 - F(r))]^n / n!$; see [1, p. 10]. Moments of record values for the Weibull, power function, Pareto, Gumbel, and normal distributions are derived in [1, Section 2.7] (the extreme value distribution considered in [1] is called the Gumbel distribution in Mathematica); we used Mathematica to derive the results for various distributions.
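As a rough numerical check of the record density, the sketch below evaluates $f(r)\,[-\ln(1 - F(r))]^n / n!$ for standard normal data and integrates $r$ times this density with the trapezoidal rule. The integration limits, step count, and function names are assumptions made for illustration. The Demonstration itself relies on Mathematica.

```python
import math

def normal_pdf(x):
    """Density f of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_sf(x):
    """Survival function 1 - F of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def record_density(n, r, pdf=normal_pdf, sf=normal_sf):
    """Density of the n-th record value: f(r) * (-ln(1 - F(r)))**n / n!."""
    s = sf(r)
    if s <= 0.0:
        return 0.0  # beyond floating-point range of the upper tail
    return pdf(r) * (-math.log(s)) ** n / math.factorial(n)

def record_mean(n, lo=-8.0, hi=12.0, steps=20_000):
    """Trapezoidal approximation of the mean of the n-th record."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        r = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * r * record_density(n, r)
    return total * h

# Compare with the means listed in Snapshot 1.
for n in range(1, 11):
    print(n, round(record_mean(n), 2))
```

Passing a different `pdf` and `sf` pair to `record_density` gives the record density for another data distribution; `record_mean` would then need matching integration limits.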

Suppose that the current record value is $r$. The conditional density of the next record given the current record is $f(s) / (1 - F(r))$, $s > r$; see [1, p. 11].
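The conditional distribution also gives a cheap way to simulate a chain of records without generating long data sequences. The sketch below assumes inverse-transform sampling: given the current record $r$, the next record is a data value conditioned to exceed $r$, so it can be drawn as $F^{-1}(F(r) + U\,(1 - F(r)))$ with $U$ standard uniform. The function `simulate_records` and its parameters are hypothetical names introduced here, and the example uses standard uniform data, for which both the distribution function and its inverse are the identity.

```python
import random

def simulate_records(n, cdf, quantile, rng):
    """Return [R_0, ..., R_n] by sampling each record given the previous one.

    cdf is the distribution function F of the data and quantile its inverse.
    """
    r = quantile(rng.random())  # R_0 is simply the first data value
    records = [r]
    for _ in range(n):
        p = cdf(r)
        # Draw a data value conditioned to exceed the current record r.
        r = quantile(p + rng.random() * (1.0 - p))
        records.append(r)
    return records

rng = random.Random(2)  # arbitrary fixed seed
runs = [simulate_records(5, lambda x: x, lambda p: p, rng) for _ in range(20_000)]
mean_r5 = sum(run[5] for run in runs) / len(runs)
print(mean_r5)  # should be near the 0.984 quoted for the standard uniform case
```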

In another Demonstration, Records in Sequences of Random Variables, we also consider record values. There, the results do not depend on the distribution of the data.

Reference

[1] B. C. Arnold, N. Balakrishnan, and H. N. Nagaraja, Records, New York: Wiley, 1998.


