Coalescent Gene Genealogies
The coalescent describes the genealogical relations of the lineages ancestral to a sample of genes. The sample size, $n$, may vary from 1 to 50 gene copies; each tree shows one possible genealogy among these copies, with nodes representing the common ancestors of the sampled genes. Branch lengths are drawn at random according to the probability of coalescence in a population of $2N$ gene copies.
The coalescent is an algorithmic approach to simulating gene genealogies. It is also a way of looking at the history of genes in a population, which has given rise to considerable theoretical development in population genetics.
The basic idea is that instead of following the reproduction of genes forward in time, we trace the ancestry of a sample of genes backward in time. We assume a diploid Wright–Fisher population model. In each generation, two genes have a probability $1/(2N)$ of sharing a single common ancestor in the previous generation and a probability $1 - 1/(2N)$ of having two distinct ancestors. The number of generations elapsed since their common ancestor is then approximately exponentially distributed with mean $2N$. With a sample of size $n = 2$, this Demonstration yields this distribution.
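The two-gene case can be checked numerically. The following is a minimal sketch, assuming a per-generation coalescence probability of $1/(2N)$; the function names `pairwise_coalescence_time` and `mean_pairwise_time` are illustrative and are not taken from the Demonstration or from Hudson's implementation.

```python
import random


def pairwise_coalescence_time(N, rng):
    """Count generations back until two gene copies share an ancestor.

    Assumes a diploid Wright-Fisher population of N individuals (2N gene
    copies), so the two lineages pick the same parent copy with
    probability 1/(2N) in each generation.
    """
    p = 1.0 / (2 * N)
    t = 1
    # Keep stepping back one generation until the lineages coalesce.
    while rng.random() >= p:
        t += 1
    return t


def mean_pairwise_time(N, replicates, seed=0):
    """Average the pairwise coalescence time over independent replicates."""
    rng = random.Random(seed)
    total = sum(pairwise_coalescence_time(N, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    N = 50
    print("simulated mean:", mean_pairwise_time(N, 20000))
    print("expected mean 2N:", 2 * N)
```

The generation-by-generation loop draws from a geometric distribution; for large $N$ this is closely approximated by the exponential distribution used in the continuous-time coalescent.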
If we extend the sample to $n$ gene copies, the expected time in generations until two of these coalesce is
$$E[T_n] = \frac{4N}{n(n-1)},$$
again distributed as a negative exponential. At this time, the algorithm chooses two of the lineages at random and combines them into a single ancestral lineage. Iterating this process across epochs, with one fewer lineage each time, yields a bifurcating tree. The times of the nodes and the total depth of the tree are one realization consistent with evolution by genetic drift alone in a finite population.
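The iteration described above can be sketched as follows. This is a minimal illustration, not the Demonstration's code or Hudson's C implementation: it assumes that while $k$ lineages remain, the waiting time to the next coalescence is exponential with mean $4N/(k(k-1))$ generations, and the function name `simulate_coalescent` and the node-tuple layout are hypothetical choices.

```python
import random


def simulate_coalescent(n, N, seed=None):
    """Simulate one coalescent genealogy for n sampled gene copies.

    Returns a list of (node, child_a, child_b, time) tuples, one per
    coalescence event, ordered from the most recent event to the root.
    Labels 0..n-1 are the sampled copies; ancestors get labels n, n+1, ...
    Times are in generations before the present.
    """
    rng = random.Random(seed)
    lineages = list(range(n))  # lineages still waiting to coalesce
    nodes = []
    t = 0.0
    next_label = n
    while len(lineages) > 1:
        k = len(lineages)
        # Assumed mean waiting time with k lineages: 4N / (k (k - 1)).
        mean_wait = 4.0 * N / (k * (k - 1))
        t += rng.expovariate(1.0 / mean_wait)
        # Pick two distinct lineages at random and merge them.
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        nodes.append((next_label, a, b, t))
        next_label += 1
    return nodes


if __name__ == "__main__":
    for node, a, b, t in simulate_coalescent(5, 100, seed=1):
        print(f"node {node}: joins {a} and {b} at {t:.1f} generations")
```

A sample of $n$ copies always produces $n - 1$ coalescence events, and the time of the last event is the depth of the tree. Under the assumed waiting times, the expected depth is $4N(1 - 1/n)$ generations, which can be checked by averaging over many replicates.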
A fundamental review reference on the coalescent, including the algorithm used here and a C implementation, is
R. R. Hudson, "Gene Genealogies and the Coalescent Process," Oxford Surveys in Evolutionary Biology, 7, 1990 pp. 1–44.