Infinite Monkey Theorem

This Demonstration illustrates the classical infinite monkey theorem as introduced by Emile Borel [1] and a modern version suggested by Gregory Chaitin in the context of his own work in algorithmic information theory [2], and the field of algorithmic probability as put forward by Ray Solomonoff [5] and Leonid Levin [7]. The algorithmic probability of a string is the probability that the string is produced as the output of a random computer program upon halting, running on a (prefix-free) universal Turing machine (here implemented with Mathematica's built-in TuringMachine function). A variation of the original infinite monkey theorem establishes that, given enough time, a hypothetical monkey typing at random will almost surely (with probability 1) produce in finite time (even if longer than the age of the universe) all of Shakespeare's plays (including Hamlet, of course) as a result of classical probability theory. The modern version, however, places the monkey on a digital computer with keystroke instructions typing computer programs at random (e.g., valid programs whose bits are the result of coin tossing). This is, of course, tricky, because this algorithmic probability measure is (upper) semi-uncomputable, which means one can only estimate lower bounds. Yet this Demonstration shows the power of algorithmic probability to explain emergence of structure, as the chances of producing a highly organized structure are exponentially larger than by pure classical chance with no computer in the middle, suggesting that nature may operate similarly based on rules that enable her to produce organization faster than with random chance [9].


  • [Snapshot]
  • [Snapshot]
  • [Snapshot]


Solomonoff and Levin established that nonrandom outputs (such as Shakespeare's plays) have greater chances to occur as the result of the execution of random computer programs running on a (prefix-free) general-purpose computer than when produced by picking one bit or letter at a time at random, as in Borel's infinite monkey theorem. This is established by the so-called algorithmic coding theorem, which intuitively states that low Kolmogorov complexity objects have short programs and short programs are therefore more likely to occur as the result of picking instructions at random than longer programs. In other words, the less random an object (and therefore more compact to be described or programmed), the higher the frequency of its occurrence as the result of random computer programs. This also means that, while for a monkey typewriter (a source of random letters) it may take more than the estimated age of the universe (4.32x10^17) and more than the rough estimated number of starts in the observable universe (7X10^24) to produce the sentence "to be or not to be", for a programmer monkey (a source of random computer programs) it would take it considerably less time, within the estimated age of the universe. The monkey is a metaphor for an abstract device that produces an endless random sequence of letters and symbols.
A "prefix-free" universal Turing machine or general-purpose computer is a computer that only takes as valid programs ones that are not the prefix of any other valid program. This technicality is key to be able to define a probability measure (more precisely a "semi-measure" because of the semi-computability of algorithmic probability).
This Demonstration illustrates how a short random program produces nonrandom outputs with much greater chances than by classical probability. Algorithmic probability cannot be computed, but it can be approximated. A lower bound using Shannon entropy indicates that the probability that the programmer monkey hits the target binary sequence cannot be shorter than the base-2 logarithm of the length of the targeted text and should be close to its algorithmic probability if the string is highly compressible (hence not Kolmogorov random). This Demonstration illustrates this difference between algorithmic probability and classical probability, or random programs versus random letters or digits.
[1] E. Borel, "Mécanique Statistique et Irréversibilité," Journal of Physics, 5(3), 1913 pp. 189–196.
[2] G. J. Chaitin, Algorithmic Information Theory, Cambridge: Cambridge University Press, 1987.
[3] A. N. Kolmogorov, "Three Approaches to the Quantitative Definition of Information," Problems of Information Transmission, 1, 1965 pp. 1–11.
[4] F. Soler-Toscano, H. Zenil, J.-P. Delahaye, N. Gauvrit, "Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines."
[5] R. J. Solomonoff, "A Formal Theory of Inductive Inference: Parts 1 and 2," Information and Control, 7(1–2), 1964 pp. 1–22, 224–254.
[6] A. K. Zvonkin and L. A. Levin, "The Complexity of Finite Objects and the Development of the Concepts of Information and Randomness by Means of the Theory of Algorithms," Russian Mathematical Surveys, 25(6), 1970 pp. 83–124.
[7] L. A. Levin, "Laws of Information Conservation (Non-Growth) and Aspects of the Foundation of Probability Theory," Problems Information Transmission, 10(3), 1974 pp. 206–210.
[8] R. J. Solomonoff, "Algorithmic Probability—Its Discovery—Its Properties and Application to Strong AI," in Randomness through Computation: Some Answers, More Questions (H. Zenil, ed.), Hackensack, NK: World Scientific, 2012.
[9] H. Zenil, "Turing Patterns with Turing Machines: Emergence and Low-Level Structure Formation," Natural Computing, 12(2), 2013 pp. 291–303.
    • Share:

Embed Interactive Demonstration New!

Just copy and paste this snippet of JavaScript code into your website or blog to put the live Demonstration on your site. More details »

Files require Wolfram CDF Player or Mathematica.