Zipf's Law for Natural Languages

Zipf's law for natural languages states that the frequency of a word is inversely proportional to its rank in the frequency table. George Kingsley Zipf originally proposed the law for the English language in the first half of the twentieth century.

In symbols, $f(t) \propto \frac{1}{r(t)}$, where $f(t)$ is the number of occurrences of the term $t$, $r(t)$ is its rank in the frequency table, and $\propto$ denotes proportionality.
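To make the rank-frequency table concrete, here is a minimal Python sketch that builds it from a string. It is not the Demonstration's own code, which is written in the Wolfram Language and reads Wolfram ExampleData texts. The function name `rank_frequency` is hypothetical, and the tokenization (lowercase text, words as runs of letters and apostrophes) is a simplifying assumption. If the law held exactly, occurrences multiplied by rank would be the same constant for every term.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return a list of (rank, term, occurrences), most frequent term first.

    Assumed tokenization: lowercase the text and treat runs of letters
    and apostrophes as words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so tied terms get a stable rank.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, term, n) for rank, (term, n) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat and the bird"
    for rank, term, n in rank_frequency(sample):
        # Under Zipf's law, n * rank would stay roughly constant across terms.
        print(rank, term, n, n * rank)
```

Ties are broken alphabetically here. That choice is arbitrary, and other tie-breaking rules give different ranks to terms with equal counts.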

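Zipf's law is best checked by plotting occurrences against rank on a log-log plot. Taking logarithms gives $\log(\text{occurrences}) = C - \log(\text{rank})$ for some constant $C$. If the law holds, the points therefore lie close to a straight line with slope $-1$, and how closely the graph fits a linear model shows how good the approximation is.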
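One way to quantify how closely the log-log plot fits a linear model is an ordinary least-squares line through the points $(\log \text{rank}, \log \text{occurrences})$. The Python sketch below is an illustration under that assumption, not the method used by the Demonstration. The names `zipf_fit` and `max_rank` are hypothetical; `max_rank` mirrors the Demonstration's choice of maximum rank. The fitted slope does not depend on the base of the logarithm, and base 10 is used here.

```python
import math


def zipf_fit(occurrences, max_rank=None):
    """Fit log10(occurrences) = a + b * log10(rank) by ordinary least squares.

    occurrences: positive term counts sorted in descending order (index 0 is rank 1).
    max_rank: keep only the top max_rank terms (hypothetical parameter).
    Returns (a, b, r_squared); Zipf's law predicts a slope b close to -1.
    """
    counts = occurrences if max_rank is None else occurrences[:max_rank]
    if len(counts) < 2:
        raise ValueError("need at least two ranked terms to fit a line")
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # r_squared measures how closely the log-log points follow a straight line.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return a, b, r_squared


if __name__ == "__main__":
    # Counts that follow Zipf's law exactly: 60 / rank.
    a, b, r2 = zipf_fit([60, 30, 20, 15, 12])
    print(f"intercept={a:.3f} slope={b:.3f} R^2={r2:.3f}")
```

Limiting `max_rank` can matter for real texts. In the low-frequency tail, many terms share the same small count, so the points form horizontal steps that can pull a straight-line fit away from slope $-1$.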

Choose a text document from the Wolfram ExampleData texts and the maximum rank of the terms to be considered. Mouse over the dots and labels in the plot to see more details.

Contributed by: Giovanna Roda (March 2011)
Open content licensed under CC BY-NC-SA



Details

Reference: C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, Cambridge, MA: MIT Press, 1999.