Zipf's Law for Natural Languages

Zipf's law for natural languages states that the frequency of a word is inversely proportional to its rank in the frequency table. The law was proposed by George Kingsley Zipf in the first half of the twentieth century, originally for the English language.
In symbols, $f(t) \propto 1/r(t)$, where $f(t)$ is the number of occurrences of the term $t$, $r(t)$ is its rank in the frequency table, and $\propto$ denotes proportionality.
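To make $f(t)$ and $r(t)$ concrete, the sketch below builds a rank-frequency table from raw text. It is not the Demonstration's own Wolfram Language code; the tokenization rule (lowercase, runs of letters or apostrophes) and the function name `rank_frequency` are assumptions chosen for illustration.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, term, count) triples, most frequent term first.

    Assumed tokenization: lowercase the text and treat each run of
    letters or apostrophes as one term.
    """
    terms = re.findall(r"[a-z']+", text.lower())
    counts = Counter(terms)
    # most_common() sorts by descending count; ties keep first-seen order
    ranked = counts.most_common()
    return [(rank, term, count) for rank, (term, count) in enumerate(ranked, start=1)]


sample = "the cat and the dog and the bird"
for rank, term, count in rank_frequency(sample):
    # If f(t) is proportional to 1/r(t), rank * count stays roughly constant
    print(rank, term, count, rank * count)
```

On a toy sentence the product `rank * count` is noisy; the approximation only becomes meaningful on a reasonably long text.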
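Zipf's law is best checked by plotting occurrences against rank on a log-log plot. Taking logarithms gives $\log f(t) = C - \log r(t)$ for some constant $C$, so if the law holds, the points lie close to a straight line of slope $-1$; how closely the data fit a linear model shows how good the approximation is.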
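The straight-line check can be quantified with an ordinary least-squares fit in log-log space. In this sketch, `loglog_slope` and its `max_rank` argument are hypothetical names (the latter loosely mirrors the Demonstration's maximum-rank control), and the input counts are synthetic values generated from an exact $1/r$ law, not measurements from any text.

```python
import math


def loglog_slope(counts, max_rank=None):
    """Fit log(count) = a + b * log(rank) by least squares.

    `counts` must be positive and sorted in descending order, so that
    counts[0] has rank 1. `max_rank` optionally keeps only the top terms.
    Returns (slope b, R^2); Zipf's law predicts b close to -1.
    """
    if max_rank is not None:
        counts = counts[:max_rank]
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two ranks to fit a line")
    xs = [math.log(rank) for rank in range(1, n + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    # R^2 measures how closely the points follow a straight line
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else float("nan")
    return slope, r_squared


# Synthetic counts that follow an exact 1/r law (not real corpus data)
ideal = [1000 / r for r in range(1, 51)]
print(loglog_slope(ideal))
print(loglog_slope(ideal, max_rank=10))
```

For real texts the fitted slope typically deviates from $-1$ at the highest and lowest ranks, which is why restricting the maximum rank changes how good the fit looks.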
Choose a text document from the Wolfram ExampleData texts and the maximum rank of the terms to be considered. Mouse over the dots and labels in the plot to see more details.

DETAILS

Reference: C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, Cambridge, MA: MIT Press, 1999.