Random Character Sequences Do Not Follow Zipf's Law

This Demonstration shows that word frequencies [1] in random character sequences and real texts behave differently from the point of view of Zipf's law. (For random character sequences, a word means the smallest unit separated by blanks.) Data exhibiting Zipf-like behavior shows a roughly linear relationship between frequency and rank on a log-log plot.
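As a brief aside, the "roughly linear on a log-log plot" statement can be written out as follows. The symbols $f(r)$ (frequency of the word of rank $r$), $C$, and $\alpha$ are notation assumed here for illustration; they are not taken from the Demonstration itself.

```latex
% Zipf-like behavior: frequency decays as a power of rank
f(r) \approx C \, r^{-\alpha}, \qquad \alpha > 0
% Taking logarithms gives a straight line with slope -\alpha
\log f(r) \approx \log C - \alpha \log r
```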
We consider only one random sequence model: all characters, including the blank (space), are equally likely. This model is specified by a single parameter, the number of characters other than the space. A particular value of this parameter was used in [2] (as mentioned in [1]). In this Demonstration, you can select a value between 2 and 26.
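The model described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Demonstration's own code: the function names `random_text` and `rank_frequency`, the parameter name `n_letters`, the use of lowercase ASCII letters as the alphabet, and the fixed seed are all assumptions made for the example.

```python
import random
import string
from collections import Counter


def random_text(n_letters, length, seed=0):
    """Draw `length` characters uniformly from n_letters letters plus the space."""
    if not 2 <= n_letters <= 26:
        raise ValueError("n_letters must be between 2 and 26")
    rng = random.Random(seed)  # fixed seed (assumed) so runs are repeatable
    alphabet = string.ascii_lowercase[:n_letters] + " "
    return "".join(rng.choice(alphabet) for _ in range(length))


def rank_frequency(text):
    """Return (rank, frequency) pairs, most frequent word first.

    A word is a maximal run of non-blank characters, so runs of
    consecutive blanks do not produce empty words.
    """
    counts = Counter(text.split())
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


# Small demo: a two-letter alphabet and 10,000 random characters.
sample = random_text(n_letters=2, length=10_000)
for rank, freq in rank_frequency(sample)[:5]:
    print(rank, freq)
```

Plotting the logarithm of the frequency against the logarithm of the rank for these pairs, next to the same plot for a real text, is one way to reproduce the comparison the Demonstration makes.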





[1] R. Ferrer-i-Cancho and B. Elvevåg, "Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution," PLoS ONE.
[2] W. Li, "Random Texts Exhibit Zipf's-Law-Like Word Frequency Distribution," IEEE Transactions on Information Theory, 38(6), 1992 pp. 1842–1845.
[3] Gabriel Altmann, Comments from Quantitative Linguistics.

