Random Character Sequences Do Not Follow Zipf's Law


This Demonstration shows that word frequencies [1] in random character sequences and real texts behave differently from the point of view of Zipf's law. (For random character sequences, a word means the smallest unit separated by blanks.) Data exhibiting Zipf‐like behavior shows a roughly linear relationship between frequency and rank on a log‐log plot.
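The log-log criterion can be made concrete. If the frequency f of the word of rank r behaves like f(r) ∝ r^(-α), then log f = c − α log r, so the ranked frequencies fall on a straight line of slope −α on a log-log plot. The Python sketch below is only an illustration and is not the Demonstration's own code, which is written in the Wolfram Language. It counts word frequencies, ranks them, and fits that slope by ordinary least squares. The function names `rank_frequency` and `loglog_slope` are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return word frequencies sorted from most to least frequent (rank 1 first)."""
    counts = Counter(words)
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    For data obeying f(r) ~ r**(-alpha), the slope is about -alpha.
    Needs at least two distinct ranks, otherwise the fit is undefined.
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy "text" built so that the word of rank r occurs 60 // r times,
    # i.e. an exact Zipf distribution with alpha = 1.
    words = []
    for r in range(1, 7):
        words += [f"w{r}"] * (60 // r)
    freqs = rank_frequency(words)
    print(freqs)                          # [60, 30, 20, 15, 12, 10]
    print(round(loglog_slope(freqs), 3))  # close to -1
```

A slope near a constant value across all ranks is what "roughly linear" means here. Real texts and random texts can both yield a fitted slope, so the fit alone does not settle whether the whole curve has the same shape.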


We consider only one random sequence model, in which all characters, including the space, are equally likely. This model is specified by a single parameter: the number of characters other than the space. A specific value of this parameter was used in [2] (as mentioned in [1]). In this Demonstration, you can select any value between 2 and 26.
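The random sequence model can be sketched in a few lines. The Python code below is an illustrative stand-in for the Demonstration's Wolfram Language implementation, not a copy of it. It assumes the non-space characters are the first `num_letters` lowercase Latin letters, so each of the `num_letters + 1` symbols (letters plus the space) is drawn with probability 1/(num_letters + 1). Words are the maximal runs of non-space characters. The names `random_text` and `words_of` are hypothetical.

```python
import random
import string
from collections import Counter


def random_text(num_letters, length, seed=None):
    """Random character sequence in which the space and each of the first
    num_letters lowercase letters are equally likely.

    Each symbol has probability 1 / (num_letters + 1).
    """
    if not 2 <= num_letters <= 26:
        raise ValueError("num_letters must be between 2 and 26")
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase[:num_letters] + " "
    return "".join(rng.choice(alphabet) for _ in range(length))


def words_of(text):
    """Split text into words, the smallest units separated by blanks.

    str.split() with no argument discards the empty strings that
    consecutive spaces would otherwise produce.
    """
    return text.split()


if __name__ == "__main__":
    text = random_text(num_letters=2, length=10_000, seed=1)
    counts = Counter(words_of(text))
    # Most frequent words and their counts; feed the sorted counts into a
    # rank-frequency analysis to compare with a real text.
    for word, freq in counts.most_common(6):
        print(repr(word), freq)
```

In this model every word of a given length has the same expected frequency, so short words dominate the top ranks and many words of equal length share nearly equal counts.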


Contributed by: Osman Tuna Gökgöz (May 2010)
Suggested by: Ramon Ferrer i Cancho
Open content licensed under CC BY-NC-SA




Details

[1] R. Ferrer-i-Cancho and B. Elvevåg, "Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution," PLoS ONE.

[2] W. Li, "Random Texts Exhibit Zipf's-Law-Like Word Frequency Distribution," IEEE Transactions on Information Theory, 38(6), 1992 pp. 1842–1845.

[3] G. Altmann, Comments from Quantitative Linguistics.


