Estimating the Size of a Population
Suppose that in a marathon each runner has a running number on their back but we do not know $n$, the total number of runners. At some point along the course we observe $k$ random runners and write down their running numbers $x_i$, $i = 1, \ldots, k$. What would be our estimate of $n$? Let $m$ be the largest of the observed numbers $x_i$. It can be shown that a good estimate of $n$ is $\frac{k+1}{k} m - 1$. The Demonstration shows examples of this estimate when $n$ is 100. It also shows the frequency distribution of the percent estimation error when $n$ is a random integer between 100 and 1000.
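As a rough illustration of the estimate described above, here is a minimal Python sketch. It assumes the running numbers are 1, 2, …, n and that the sample is drawn without replacement; the function name `estimate_population` and the seed are hypothetical choices made for this example and are not taken from the Demonstration's own source.

```python
import random

def estimate_population(sample):
    """Estimate the population size as (k + 1) / k * m - 1,
    where k is the sample size and m is the largest observed number."""
    k = len(sample)
    if k == 0:
        raise ValueError("sample must contain at least one observation")
    m = max(sample)
    # Multiply before dividing to keep the floating-point result tidy.
    return m * (k + 1) / k - 1

# Illustrative run: n = 100 runners, k = 10 observed (seed chosen arbitrarily).
rng = random.Random(2012)
n = 100
sample = rng.sample(range(1, n + 1), 10)
print(sorted(sample), estimate_population(sample))
```

Only the largest observation and the sample size enter the estimate, so the other recorded numbers do not affect the result.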
Contributed by: Heikki Ruskeepää (June 2012)
Open content licensed under CC BY-NC-SA
Snapshot 1: The estimated size of the population can be considerably smaller than the true size if the largest number in the sample happens to be relatively small, as is the case here. Recall that the estimator is $\frac{k+1}{k} m - 1$, where $m$ is the largest number in a sample of size $k$. Thus, if $m$ is small, the estimate is small, too. The smallest value of $m$ is $k$ (this is the case when we happen to get the sample 1, 2, …, $k$); in this case the estimate of $n$ is $k$.
Snapshot 2: The estimated size of the population can also be somewhat larger than the true size if $m$ happens to be near the true size of the population. However, note that the estimate is always at most $\frac{k+1}{k} n - 1$. For example, if $n = 100$ and $k = 10$, the estimate is at most 109.
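The two extreme cases from Snapshots 1 and 2 can be checked with a short sketch. The helper names `estimate_from_max` and `estimate_bounds` are hypothetical and exist only for this illustration.

```python
def estimate_from_max(k, m):
    """Estimate (k + 1) / k * m - 1 for sample size k and sample maximum m."""
    return m * (k + 1) / k - 1

def estimate_bounds(n, k):
    """Smallest and largest possible estimates for a sample of size k from 1..n."""
    lowest = estimate_from_max(k, k)   # sample is 1, 2, ..., k, so m = k
    highest = estimate_from_max(k, n)  # sample contains the last runner, so m = n
    return lowest, highest

print(estimate_bounds(100, 10))  # (10.0, 109.0)
```

With n = 100 and k = 10, the estimate therefore always lies between 10 and 109.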
Snapshot 3: Here is the frequency distribution of the percent estimation error when $n$ is a random integer between 100 and 1000 and the sampling percentage is 1. The frequencies are calculated for the error intervals …, $(-1.5, -0.5)$, $(-0.5, 0.5)$, $(0.5, 1.5)$, $(1.5, 2.5)$, …; the frequencies come from simulations for each sampling percentage 1, 2, …, 100. Note the smooth behavior on the left side of the distribution, the nonsmooth behavior on the right side, the very uncommon occurrence of an estimation error greater than 50%, and the peak between 8.5% and 9.5%. In the vast majority of cases, the percent error is between -50% and 50%.
Snapshot 4: Here the sampling percentage is 2, and now the vast majority of percent estimation errors are between -30% and 30%, and the peak is between 4.5% and 5.5%.
Snapshot 5: Now the sampling percentage is 3. Most percent estimation errors are between -24% and 24% and the peak is between 2.5% and 3.5%.
Snapshot 6: When the sampling percentage is 14, the peak of the distribution is, for the first time, at the origin, that is, between -0.5% and 0.5%.
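A frequency distribution of the kind shown in Snapshots 3 to 6 could be approximated along the following lines. This is only a sketch under stated assumptions, not the Demonstration's actual code: the number of trials, the rule for turning a sampling percentage into a sample size (rounding, with a minimum of 1), the seed, and the function name `percent_error_histogram` are all choices made for this example.

```python
import math
import random
from collections import Counter

def percent_error_histogram(sampling_percent, trials=20000, seed=0):
    """Count percent estimation errors in unit-width bins.

    Bin b collects errors in [b - 0.5, b + 0.5) percent.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        n = rng.randint(100, 1000)                     # true population size
        k = max(1, round(n * sampling_percent / 100))  # assumed rounding rule
        m = max(rng.sample(range(1, n + 1), k))        # largest observed number
        estimate = m * (k + 1) / k - 1
        error = 100 * (estimate - n) / n               # percent estimation error
        counts[math.floor(error + 0.5)] += 1
    return counts

hist = percent_error_histogram(sampling_percent=2)
for b, count in hist.most_common(5):
    print(f"{b - 0.5:6.1f}% to {b + 0.5:5.1f}%: {count}")
```

Calling the function with different sampling percentages lets you compare how the spread of the errors narrows as a larger share of the population is sampled.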
The method can also be used to estimate, for example, the number of taxicabs or tanks. According to [1], it can be shown that the given estimator, $\frac{k+1}{k} m - 1$, is the uniformly minimum variance unbiased estimator of $n$. Unbiasedness means that the expectation of the estimator is $n$. The Demonstration is based on problem 12 in [2].
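Unbiasedness can be checked exactly for small cases. The sketch below assumes a sample of size k drawn without replacement from 1, 2, …, n, in which case the largest observation equals m with probability C(m-1, k-1) / C(n, k). The function name `expected_estimate` is hypothetical, and the check is an illustration rather than the proof given in [1].

```python
from fractions import Fraction
from math import comb

def expected_estimate(n, k):
    """Exact expectation of (k + 1) / k * m - 1 over all samples of size k from 1..n."""
    total = comb(n, k)  # number of equally likely samples
    expectation = Fraction(0)
    for m in range(k, n + 1):
        # Samples with maximum m: choose the other k - 1 numbers below m.
        prob = Fraction(comb(m - 1, k - 1), total)
        expectation += prob * (Fraction(m * (k + 1), k) - 1)
    return expectation

print(expected_estimate(100, 10))  # 100
```

Exact rational arithmetic is used so that the result equals n with no rounding error.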
[1] R. W. Johnson, "Estimating the Size of a Population," Teaching Statistics, 16(2), 1994, pp. 50–52.
[2] P. J. Nahin, Digital Dice: Computational Solutions to Practical Probability Problems, Princeton: Princeton University Press, 2008.