Time versus Size in Compressing Text


Compare different types of compression on different documents. Choose the original version, one-word replacement, or two-word replacement to see how running time trades off against storage size as the compression moves from simple to more complex. One-word replacement substitutes a code word for each frequently appearing word, while two-word replacement substitutes a code word for each frequently appearing two-word phrase.
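As a rough illustration of the two replacement modes, here is a minimal Python sketch. It is not the Demonstration's own code (which is written in the Wolfram Language). The function names, the default of three code words, and the use of Unicode private-use characters as code words are all assumptions made for this example.

```python
import re
import time
from collections import Counter

def top_phrases(text, n, k):
    """Return the k most frequent n-word phrases (n=1 gives single words)."""
    words = re.findall(r"[A-Za-z']+", text)
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return [g for g, _ in Counter(grams).most_common(k)]

def compress(text, n, k=3):
    """Replace the k most frequent n-word phrases with one-character codes."""
    codebook = {}
    for i, phrase in enumerate(top_phrases(text, n, k)):
        code = chr(0xE000 + i)  # assumption: these characters never occur in the text
        codebook[code] = phrase
        text = text.replace(phrase, code)
    return text, codebook

def decompress(text, codebook):
    """Undo the replacements in reverse order of how they were applied."""
    for code, phrase in reversed(list(codebook.items())):
        text = text.replace(code, phrase)
    return text

def measure(text, n):
    """Return (compressed size in characters, elapsed seconds) for n-word replacement."""
    start = time.perf_counter()
    packed, _ = compress(text, n)
    return len(packed), time.perf_counter() - start

sample = "the cat and the dog and the bird saw the cat and the dog " * 50
one_word, one_book = compress(sample, 1)
two_word, two_book = compress(sample, 2)
```

Calling `measure(sample, 1)` and `measure(sample, 2)` gives one size and time pair per mode, which is the kind of comparison the Demonstration displays. The sizes here ignore the space needed to store the codebook.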

Contributed by: Ka Wai Lee (July 2013)
(Mathematica Summer Camp 2013)
With additional contributions by: Richard Hennigan
Open content licensed under CC BY-NC-SA



Details

The algorithm searches the document for the words and phrases that are most efficient to compress, using a variation of Huffman coding. Instead of representing frequently appearing words with binary codes, it uses a list of predetermined code words made up of infrequently appearing symbols.
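The text does not say how "most efficient" is measured. One plausible reading, offered here only as an assumption, is to rank each candidate by the characters it saves after paying for its codebook entry. The names `net_savings` and `best_candidates` are hypothetical.

```python
def net_savings(text, phrase, code_len=1):
    """Characters saved by replacing phrase, minus the cost of one codebook entry."""
    count = text.count(phrase)          # non-overlapping occurrences
    saved = count * (len(phrase) - code_len)
    overhead = len(phrase) + code_len   # assumed cost of storing code -> phrase
    return saved - overhead

def best_candidates(text, candidates, k):
    """Return up to k candidates with the largest positive net savings."""
    ranked = sorted(candidates, key=lambda p: net_savings(text, p), reverse=True)
    return [p for p in ranked[:k] if net_savings(text, p) > 0]

text = "in the end the cat and the hat " * 10
chosen = best_candidates(text, ["cat", "the", "zebra", "in"], 4)
```

Under this scoring, a short word that appears very often can beat a long word that appears rarely, and a candidate that never appears is rejected because its codebook entry costs more than it saves.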

At times, part of a longer word is replaced because it contains a frequently appearing word (such as the "and" in "hand"). This counts as a bonus for the compression because it saves more space without requiring more code words.
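The substring bonus can be seen by comparing plain substring replacement with replacement restricted to whole words. This is a small sketch under the assumption that the code word is a single private-use character. The sample sentence is invented for the example.

```python
import re

CODE = "\ue000"  # assumption: a symbol that never occurs in the text
text = "hand in hand and band and sand"

# Plain substring replacement also catches "and" inside "hand", "band", "sand".
substring = text.replace("and", CODE)

# Whole-word replacement only touches the standalone word "and".
whole_word = re.sub(r"\band\b", CODE, text)

# Both variants decode with the same single codebook entry.
restored = substring.replace(CODE, "and")
```

Here `substring` is shorter than `whole_word`, yet both use one code word and both decode back to the original text with the same reverse replacement.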


