Prediction and Entropy of Languages


Shannon's information entropy [1] is defined by $H(X) = -\sum_{i} p(x_i) \log_2 p(x_i)$, where $p(x_i)$ is the probability of $x_i$, and the sum is over the elements of $X$. Shannon's entropy is a measure of uncertainty.
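As a minimal sketch of this definition, the following Python function estimates entropy from the empirical symbol frequencies of a string, using the relative frequency of each symbol as its probability and a base-2 logarithm so that the result is in bits per symbol. The function name `shannon_entropy` is an illustrative choice and is not taken from the Demonstration.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Entropy, in bits per symbol, of the empirical symbol distribution of text."""
    counts = Counter(text)          # occurrences of each distinct symbol
    total = sum(counts.values())    # total number of symbols
    # Relative frequency c / total stands in for the probability of each symbol.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # A string of one repeated symbol is fully predictable; a string that
    # uses several symbols equally often is the least predictable.
    for sample in ("aaaa", "abab", "abcd"):
        print(sample, shannon_entropy(sample))
```

A string made of a single repeated symbol carries no uncertainty, while a string that uses its symbols equally often reaches the maximum entropy for that alphabet size.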


One application is measuring the redundancy of a language from the frequencies of its letter n-grams (single letters, pairs of letters, triplets, etc.). This Demonstration shows the frequencies of n-grams calculated from the United Nations' Universal Declaration of Human Rights in 20 languages and illustrates the entropy rate calculated from these n-gram frequency distributions. The entropy of a language is an estimate of the probabilistic information content of each letter in that language, and so is also a measure of its predictability and redundancy.
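The n-gram calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Demonstration's own code: the text is lowercased and reduced to alphabetic characters (so n-grams may span word boundaries), and the per-letter entropy rate is approximated as the entropy of the n-gram distribution divided by n, which is one common estimator among several. All function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count overlapping letter n-grams; non-letters are dropped (an assumption)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter("".join(letters[i:i + n]) for i in range(len(letters) - n + 1))


def block_entropy(text, n):
    """Entropy, in bits, of the empirical n-gram distribution of text."""
    counts = ngram_counts(text, n)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # text has fewer than n letters
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_rate_estimate(text, n):
    """Per-letter estimate: n-gram block entropy divided by n (assumed estimator)."""
    return block_entropy(text, n) / n


if __name__ == "__main__":
    # Short sample: the first sentence of Article 1 of the Declaration (English).
    sample = "All human beings are born free and equal in dignity and rights."
    for n in (1, 2, 3):
        print(n, entropy_rate_estimate(sample, n))
```

On a sample this short, the estimates for larger n are biased downward because most n-grams occur only once; a full document such as the Declaration gives more stable frequencies, though the same bias reappears as n grows.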


Contributed by: Hector Zenil and Elena Villarreal (September 2012)
Open content licensed under CC BY-NC-SA




Details

Reference

[1] C. E. Shannon, "Prediction and Entropy of Printed English," Bell System Technical Journal, 30(1), 1951 pp. 50–64. www.ics.uci.edu/~fowlkes/class/cs177/shannon_51.pdf.


