Distribution of Discrete Records

Initializing live version
Download to Desktop

Requires a Wolfram Notebook System

Interact on desktop, mobile and cloud with the free Wolfram Player or other Wolfram Language products.

Consider a sequence of independent, identically distributed data values , , …. Assume that the data values are from among the non-negative integers 0, 1, 2, … with no upper bound. A value is a record value (or a high-water mark) if it is the largest value among all the values that have been recorded up to time . Let be the record value, ; define , that is, the first data value is the record value (or the trivial record). This Demonstration shows the distribution of the record values , , when the data has a Poisson, a negative binomial, or a geometric distribution (the special case of the negative binomial distribution with ).

[more]

The blue curve is the probability density function of the data (for a discrete distribution, the probability density function is often also called the probability mass function). The red curve is the density of the record value currently chosen. The green vertical line represents the approximate median of the distribution of the record value: the probability that the record value is at most this value is at least 0.5 (and the probability that the record value is at most this value minus one is less than 0.5).

[less]

Contributed by: Heikki Ruskeepää (April 2014)
Open content licensed under CC BY-NC-SA


Snapshots


Details

Snapshot 1: Here, has the Poisson distribution with the parameter (that is, the mean of is 2). The blue curve shows the probabilities of this distribution (up to the value 12 of the variable). The red curve shows the probabilities of the second record value. The second record is at most 5 with a probability of 0.597 and at least 6 with a probability of 0.403.

Example: Assume that the number of certain kinds of accidents in a given city in a day has the Poisson distribution with mean 2. The first few numbers of accidents in a year could be, say, 2, 0, 1, 2, 3, 1, 4, … The first number of accidents, 2, is the record. The first record is 3, the number of accidents in the fifth day. The second record is 4, the number of accidents in the sixth day. The red curve shows that the most probable value of the second record is 5.

Snapshot 2: Now consider the negative binomial distribution with the parameters and . This distribution is the same as the geometric distribution with the parameter . Recall that if has the geometric distribution with a parameter , then is the number of failures in a sequence of independent trials before the first success, each trial being a success with probability .

Example: Toss a dice until you get 6. Consider the number of tosses before you get 6; this number of failures has the geometric distribution with the parameter . The probabilities of this distribution are shown in blue. Play this game several times; at each experiment, write down the number of failures before you get 6. Consider then the records of the number of failures. For example, in the first few experiments you may have needed, say, 7, 4, 6, 9, 2, 5, 8, 3, 11, … failures. The first number of failures, 7, is the record. The first record is 9, the number of failures in the fourth experiment. The third record is 11, the number of failures in the ninth experiment. The red curve gives the probabilities of the first record. It has a median of approximately 9; so, the first record is at most 9 with a probability of 0.515 and at least 10 with a probability of 0.485.

Snapshot 3: Here, has the negative binomial distribution with the parameters and . Recall that if has the negative binomial distribution with the parameters and , then is the number of failures in a sequence of independent trials before the success, each trial being a success with probability .

Example: Toss again a dice until you get 6 for the second time. Consider the number of tosses before you get the second 6; this number of failures has the negative binomial distribution with parameters and . The probabilities of this distribution are shown in blue. Play this game several times; at each experiment, write down the number of failures before you get the second 6. Consider then the records of the number of failures. For example, in the first few experiments you may have needed, say, 11, 6, 9, 14, 12, 7, 13, 22, … failures. The first number of failures, 11, is the record. The first record is 14, the number of failures in the fourth experiment. The second record is 22, the number of failures in the eighth experiment. The red curve gives the probabilities of the first record. It has a median of approximately 16; so, the first record is at most 16 with a probability of 0.506 and at least 17 with a probability of 0.494.

Difficulty of Discrete Records

The theory of records for discrete random variables is much harder than the theory for continuous random variables. One exception is the geometric distribution, see [1, Section 2.9]. As is said in [1, pp. 38, 40], "for most integer valued random variables, the exact distribution of the record is, to say the least, cumbersome to describe" and "this explains the paucity of papers dealing with the exact distribution of discrete record values, or rather the fact that such papers inevitably focus on the geometric distribution."

This Demonstration shows the exact distribution of the record values for the Poisson and negative binomial distributions (for a range of numerical values of the parameters of these distributions). The geometric distribution is the special case of the negative binomial distribution. (Note, however, that the distribution called the geometric distribution in Mathematica is not exactly the same as the distribution considered in [1, Section 2.9]: in [1], the random variable takes on the values 1, 2, …, while in Mathematica the random variable takes on the values 0, 1, ….)

One important formula for records of discrete random variables is formula (2.8.10) in [1, p. 35]. Below, we first summarize this formula. However, we have to go somewhat further to actually get the probabilities of the record values.

An earlier Demonstration, Distribution of Records, considers the distribution of continuous records, and Records in Sequences of Random Variables shows results that do not depend on the distribution of the data.

A Basic Equality

Assume that the random variable describing the independent, identically distributed observations takes on non-negative integer values. To avoid terminating record sequences, assume that there is no upper bound on the value of the random variable (that is, the cumulative distribution function is smaller than 1 at each finite integer).

Define a Bernoulli variable as follows: if is a record value (that is, there exists an such that ), and if is not a record value. Define . Then, and the variables are independent [1, p. 35].

Recall that the record value is denoted by , ; . We can argue by contradiction that if and only if among the values 0, 1, …, there are at most record values ( being one of them). This means that the events and are equivalent, and so we have the basic equality

,

; (note a small correction here as compared to the formula (2.8.10) in [1]). For example, with and , the equality becomes , this holds true because both and . As another example, with and , the equality becomes , which also holds true, because both and .

In [1, p. 36], the basic equality is used to derive an asymptotic distribution for the record values. Here, we use the basic equality to derive the exact distribution of the record values.

Distribution of the Record Values

Let . Then, from the basic equality,

.

The probability generating function of is . The probability , , is the coefficient of of the generating function. Because , we finally have

,

, . We use this formula to calculate the probabilities of the record values. Note that if ; also, .

Reference

[1] B. C. Arnold, N. Balakrishnan, and H. N. Nagaraja, Records, New York: Wiley, 1998.



Feedback (field required)
Email (field required) Name
Occupation Organization
Note: Your message & contact information may be shared with the author of any specific Demonstration for which you give feedback.
Send