Zipf's Law for Natural Languages
Zipf's law for natural languages states that the frequency of a word is inversely proportional to its rank in the frequency table. The law was proposed in the first half of the twentieth century by George Kingsley Zipf for the English language.
In symbols, $f(t) \propto 1/r(t)$, where $f(t)$ is the number of occurrences of the term $t$, $r(t)$ is its rank in the frequency table, and $\propto$ denotes proportionality.
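To make the count and rank of each term concrete, here is a minimal Python sketch that builds a rank-frequency table from raw text. It assumes that a "term" is a lowercase run of letters and apostrophes. The tokenizer and the function names `rank_frequency` and `zipf_prediction` are hypothetical illustrations, not part of the Demonstration.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, term, count) tuples, most frequent term first (rank 1).

    Assumption: a term is a lowercase run of letters/apostrophes; real
    tokenizers handle punctuation, numbers, and case differently.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # most_common() sorts by count, descending; ties keep first-seen order.
    counts = Counter(words).most_common()
    return [(rank, term, n) for rank, (term, n) in enumerate(counts, start=1)]


def zipf_prediction(top_count, rank):
    """Count an ideal Zipf distribution predicts at a given rank.

    Inverse proportionality means count * rank is constant, so the
    rank-r count is the rank-1 count divided by r.
    """
    return top_count / rank


rows = rank_frequency("the cat and the dog and the bird")
for rank, term, n in rows:
    print(rank, term, n, zipf_prediction(rows[0][2], rank))
```

On a text this short the observed counts only loosely track the prediction; the law is a statement about large corpora.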
Zipf's law is most easily checked by plotting the number of occurrences against rank on a log-log plot. Taking logarithms turns the inverse proportionality into a linear relation with slope −1, so how closely the points follow a straight line shows how good the approximation is.
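The straight-line check can be made quantitative by fitting a least-squares line to log(count) versus log(rank): under an exact Zipf law the slope is −1, and the coefficient of determination R² measures how linear the plot is. The following is a minimal pure-Python sketch under that assumption. The function name `loglog_slope` and its optional `max_rank` cutoff (analogous to the maximum rank chosen in the Demonstration) are hypothetical.

```python
import math


def loglog_slope(counts, max_rank=None):
    """Least-squares fit of log(count) against log(rank).

    counts: occurrence counts sorted in decreasing order (rank 1 first).
    max_rank: hypothetical cutoff; only ranks 1..max_rank are fitted.
    Returns (slope, r_squared). An exact Zipf law gives slope -1.
    """
    if max_rank is not None:
        counts = counts[:max_rank]
    if len(counts) < 2:
        raise ValueError("need at least two ranks to fit a line")

    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Ordinary least squares: slope = cov(x, y) / var(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2: fraction of the variance in log-counts explained by the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, r_squared


# Synthetic, perfectly Zipfian counts (not real corpus data).
ideal = [1000 / r for r in range(1, 51)]
print(loglog_slope(ideal, max_rank=20))
```

Restricting the fit to the top ranks matters in practice because the rarest words (many ties at counts of 1 or 2) typically bend the tail of the plot away from the line.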
Choose a text document from the Wolfram ExampleData texts and the maximum rank of the terms to be considered. Hover over the dots and labels in the plot to see more details.