Let $X_1, \ldots, X_n$ be a random sample from a discrete distribution. Reorder the sample in increasing order and denote the resulting variables by $X_{(1)}, \ldots, X_{(n)}$. Thus, for example, $X_{(1)}$ is the smallest of the variables, $X_{(2)}$ the second smallest, and $X_{(n)}$ the largest. The variable $X_{(i)}$ is called the $i$th order statistic. This Demonstration shows the distributions of the order statistics (the red curves) when the sample is from a uniform, binomial, geometric, or Poisson distribution or when the sample is drawn without replacement from a finite set (the underlying distribution is shown in blue).
Snapshot 1: The data has the uniform distribution on the integers 1, 2, 3, 4, 5, and 6; see the blue curve. In general, a random variable with the uniform distribution takes on the values 1, …, $N$ with equal probabilities $1/N$. In a sample with replacement from the integers 1, …, $N$, each sampled value has this uniform distribution. For example, toss a die three times. The three red curves in the snapshot show the distributions of the smallest (dark red), middle (light red), and largest (light red) results. We see that with a probability of almost 0.9, the smallest result is 1, 2, or 3; the expectation is 2.04. The smallest result is 4 with a small probability of approximately 0.1 and 5 or 6 with still smaller probabilities. The middle result is, with a high probability, 2, 3, 4, or 5; the expectation is 3.5. The largest result is, with a high probability, 4, 5, or 6; the expectation is 4.96.

Snapshot 2: The data is obtained by sampling, without replacement, three elements from the integers 1, …, 6. In general, we have a set of integers 1, …, $N$ from which we draw, without replacement, a number $n$ of elements. From the snapshot, we see that the smallest number can be 1, 2, 3, or 4. With a high probability of 0.95, the smallest number is 1, 2, or 3; the expectation is 1.75. The middle number can be 2, 3, 4, or 5; the expectation is 3.5. The largest number can be 3, 4, 5, or 6; the expectation is 5.25.

Snapshot 3: The data has the binomial distribution with parameters 6 and 1/6 (the number of trials and the success probability). For example, toss a die six times and count the occurrences of the result 6. Repeat this experiment three times. We see that, with a high probability of over 0.95, the smallest number of 6s is 0 or 1; the expectation is 0.31. The middle number of 6s is, with a high probability, 0, 1, or 2; the expectation is 0.92. The largest number of 6s is, with a high probability, 1, 2, or 3; the expectation is 1.77.
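The expectations quoted in Snapshots 1 to 3 can be checked by brute-force enumeration, since each experiment has only finitely many outcomes. The following Python sketch is not part of the Demonstration; the helper name `order_stat_means` is hypothetical, and the binomial parameters (6 trials, success probability 1/6) are taken from the die example in the text.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def order_stat_means(outcomes):
    """Return the exact expectations of the order statistics.

    `outcomes` is an iterable of (sample, probability) pairs, where every
    sample is a tuple of the same length and the probabilities sum to 1.
    """
    means = None
    for sample, prob in outcomes:
        ordered = sorted(sample)
        if means is None:
            means = [Fraction(0)] * len(ordered)
        for i, value in enumerate(ordered):
            means[i] += prob * value
    return means


faces = range(1, 7)

# Snapshot 1: three tosses of a die (sampling with replacement).
with_replacement = [(s, Fraction(1, 6**3)) for s in product(faces, repeat=3)]

# Snapshot 2: three distinct integers drawn from 1..6 (without replacement).
# The drawing order is irrelevant once the sample is sorted.
without_replacement = [
    (s, Fraction(1, comb(6, 3))) for s in combinations(faces, 3)
]

# Snapshot 3: three independent binomial counts (assumed: 6 trials, p = 1/6).
p = Fraction(1, 6)
binom_pmf = {k: comb(6, k) * p**k * (1 - p) ** (6 - k) for k in range(7)}
binomial = [
    (s, binom_pmf[s[0]] * binom_pmf[s[1]] * binom_pmf[s[2]])
    for s in product(range(7), repeat=3)
]

for name, outcomes in [
    ("with replacement", with_replacement),
    ("without replacement", without_replacement),
    ("binomial", binomial),
]:
    print(name, [round(float(m), 2) for m in order_stat_means(outcomes)])
```

Exact fractions are used so that the comparison with the rounded values in the snapshots is not affected by floating-point error; for example, the smallest of three die tosses has expectation 49/24, which rounds to 2.04.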
Snapshot 4: The data has the geometric distribution with the parameter 1/6. For example, toss a die until you get 6 for the first time. Count the number of failures, that is, tosses that precede the first 6. Repeat this experiment three times. We see that, with a high probability of almost 0.9, the smallest number of failures is 0, 1, 2, or 3; the expectation is 1.37. The middle number of failures is, with a high probability, at most, say, 10; the expectation is 4.07. The largest number of failures is, with a high probability, at most, say, 20; the expectation is 9.56.

Snapshot 5: The data has the Poisson distribution with a mean of 2. For example, assume that the number of certain kinds of accidents in a given city in a day has this distribution. Consider the number of accidents in three days. With a high probability of over 0.95, the smallest number of accidents is 0, 1, or 2; the expectation is 0.89. The middle number of accidents is, with a high probability, 1, 2, or 3; the expectation is 1.90. The largest number of accidents is, with a high probability, 1, 2, 3, 4, or 5; the expectation is 3.21.

Let the cumulative distribution function of the data variable be $F(x)$. The cumulative distribution function of the order statistic $X_{(i)}$ is then [1, p. 12]

$$F_{(i)}(x) = \sum_{j=i}^{n} \binom{n}{j} F(x)^{j} \, [1 - F(x)]^{n-j},$$

and the probability density function (also called the probability mass function) is

$$f_{(i)}(x) = F_{(i)}(x) - F_{(i)}(x - 1).$$

The probabilities of the order statistics in a sample without replacement are given in [1, p. 54]. Note that in this case, the variables $X_1, \ldots, X_n$ are no longer independent (independence is assumed in all other cases).

The expectations of the order statistics are calculated in the traditional way in the cases where the domain of the distribution is finite. For the geometric and Poisson distributions, which have an infinite domain, we use the formula [1, p. 43]

$$E(X_{(i)}) = \sum_{x=0}^{\infty} [1 - F_{(i)}(x)],$$

where the sum is calculated approximately by replacing the infinite upper bound with 100 (this suffices for six correct decimals).

[1] B. C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics, Philadelphia: SIAM, 2008.
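The formulas cited from [1] can be sketched in Python as follows. This is an illustration rather than the Demonstration's own code: the cumulative distribution function of the $i$th order statistic is written as a binomial tail (at least $i$ of the $n$ observations are at most $x$), the probability mass function as a difference of consecutive values of that function, and the expectation as a tail sum truncated at 100. All function names are hypothetical. The geometric parameter 1/6 follows the die example, and the Poisson mean of 2 is an assumption inferred from the quoted expectations, which sum to 6.00 for three observations.

```python
from math import comb, exp, factorial


def order_stat_cdf(F, i, n):
    """P(X_(i) <= x) for n independent draws, given the parent CDF value F at x."""
    return sum(comb(n, j) * F**j * (1 - F) ** (n - j) for j in range(i, n + 1))


def order_stat_pmf(parent_cdf, x, i, n):
    """P(X_(i) = x) for an integer-valued parent distribution."""
    return order_stat_cdf(parent_cdf(x), i, n) - order_stat_cdf(parent_cdf(x - 1), i, n)


def order_stat_mean(parent_cdf, i, n, upper=100):
    """Approximate E[X_(i)] on 0, 1, 2, ... by a tail sum truncated at `upper`."""
    return sum(1 - order_stat_cdf(parent_cdf(x), i, n) for x in range(upper + 1))


def geometric_cdf(x, p=1 / 6):
    # Failures before the first success: P(X <= x) = 1 - (1 - p)^(x + 1).
    return 0.0 if x < 0 else 1 - (1 - p) ** (x + 1)


def poisson_cdf(x, mean=2.0):
    # Assumed mean of 2 (see the lead-in); an empty sum gives 0 for x < 0.
    return sum(exp(-mean) * mean**k / factorial(k) for k in range(x + 1))


for name, cdf in [("geometric", geometric_cdf), ("Poisson", poisson_cdf)]:
    means = [order_stat_mean(cdf, i, 3) for i in (1, 2, 3)]
    print(name, [round(m, 2) for m in means])
```

Under these assumptions, the rounded results agree with the expectations quoted for Snapshots 4 and 5. Passing the parent distribution as a plain function keeps the sketch independent of any statistics library; other discrete distributions on the nonnegative integers can be tried by supplying a different `parent_cdf`.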
