The coalescent is an algorithmic approach to simulating gene genealogies. It is also a way of looking at the history of genes in a population, which has given rise to considerable theoretical development in population genetics.
The basic idea is that instead of following the reproduction of genes forward in time, we trace the ancestry of a sample of genes backward in time. We assume a diploid Wright–Fisher population of $N$ individuals, and hence $2N$ gene copies. In each generation, two genes have a probability $1/(2N)$ of sharing a single common ancestor in the previous generation, and a probability $1 - 1/(2N)$ of having two distinct ancestors. The number of generations elapsed since their common ancestor is therefore geometrically distributed, which for large $N$ is well approximated by an exponential distribution with mean $2N$.
With a sample of size $n = 2$, this Demonstration yields this distribution.
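The two-gene case can be checked numerically. The following is a minimal sketch, not the Demonstration's own code: it assumes the discrete Wright–Fisher model described above, and the function names `pairwise_coalescence_time` and `mean_pairwise_time` are hypothetical, chosen only for illustration.

```python
import random

def pairwise_coalescence_time(N, rng):
    """Generations back until two gene copies share an ancestor.

    Assumes a diploid Wright-Fisher population of N individuals, so the
    per-generation coalescence probability is 1 / (2N).
    """
    p = 1.0 / (2 * N)
    t = 1
    # Step back one generation at a time until the two lineages coalesce.
    while rng.random() >= p:
        t += 1
    return t

def mean_pairwise_time(N, replicates, seed=0):
    """Average coalescence time over independent replicates."""
    rng = random.Random(seed)
    total = sum(pairwise_coalescence_time(N, rng) for _ in range(replicates))
    return total / replicates
```

Under these assumptions, `mean_pairwise_time(50, 20000)` should fall close to $2N = 100$ generations, with some sampling noise around that value.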
If we extend the sample to $n$ gene copies, any of the $\binom{n}{2} = n(n-1)/2$ pairs may coalesce, so the expected time in generations until two of these $n$ lineages coalesce is $4N/(n(n-1))$, where $2N$ is the number of gene copies in the population; this waiting time is again approximately exponentially distributed. At this time, the algorithm chooses two of the lineages at random and combines them into a single ancestral lineage, and the process repeats with one fewer lineage. Iterated across $n - 1$ epochs, this yields a bifurcating tree. The node times and the total depth of the tree are one realization consistent with evolution by genetic drift alone in a finite population.
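The iteration described above can be sketched in a few lines. This is an illustrative sketch under the stated assumptions (exponential waiting times with mean $4N/(k(k-1))$ while $k$ lineages remain), not the C implementation from the reference; the function name `simulate_coalescent` and the tuple layout of its result are hypothetical.

```python
import random

def simulate_coalescent(n, N, seed=None):
    """Simulate one genealogy for n sampled gene copies.

    Returns (merges, tmrca), where merges is a list of
    (time, child_a, child_b, parent) tuples, oldest event last, and
    tmrca is the total depth of the tree in generations.
    """
    rng = random.Random(seed)
    lineages = list(range(n))  # labels 0..n-1 are the sampled copies
    next_label = n             # labels n, n+1, ... are ancestral nodes
    t = 0.0
    merges = []
    while len(lineages) > 1:
        k = len(lineages)
        # Expected wait with k lineages: 2N divided by the k(k-1)/2 pairs.
        mean_wait = 4.0 * N / (k * (k - 1))
        t += rng.expovariate(1.0 / mean_wait)  # expovariate takes a rate
        # Pick two distinct lineages at random and merge them.
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        merges.append((t, a, b, next_label))
        next_label += 1
    return merges, t
```

Each call produces $n - 1$ merge events. Summing the expected waits gives an expected tree depth of $4N(1 - 1/n)$ generations, which a batch of replicates should approach on average.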
A fundamental review reference on the coalescent, including the algorithm used here and a C implementation, is:
R. R. Hudson, "Gene Genealogies and the Coalescent Process," Oxford Surveys in Evolutionary Biology, 1990 pp. 1–44.