Time versus Size in Compressing Text

Requires a Wolfram Notebook System

Interact on desktop, mobile and cloud with the free Wolfram CDF Player or other Wolfram Language products.


Compare different types of compression on different documents. Choose the original version, one-word replacement, or two-word replacement to see the trade-off between running time and storage size for a simple and a more complex compression scheme. One-word replacement substitutes a code word for each frequently appearing word, while two-word replacement substitutes a code word for each frequently appearing two-word phrase.
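The two replacement modes can be sketched roughly as follows. This is a minimal Python illustration, not the Demonstration's Wolfram Language source. The helper names (`top_ngrams`, `replace_ngrams`, `measure`) and the sample sentence are hypothetical, and the sketch assumes the code symbols do not already occur in the text.

```python
import re
import time
from collections import Counter

def top_ngrams(text, n, k):
    """Return the k most frequent n-word phrases (n=1 gives single words)."""
    words = re.findall(r"[A-Za-z]+", text)
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return [g for g, _ in Counter(grams).most_common(k)]

def replace_ngrams(text, n, codes):
    """Replace the most frequent n-word phrases with the given code symbols.

    Assumption: no symbol in `codes` already occurs in `text`.
    Phrases are matched as literal single-space strings, so a phrase that
    spans punctuation or a line break is simply left unreplaced.
    """
    table = dict(zip(codes, top_ngrams(text, n, len(codes))))
    for code, phrase in table.items():
        text = text.replace(phrase, code)
    return text, table

def measure(text, n, codes):
    """Return (compressed length, elapsed seconds) for one replacement mode."""
    start = time.perf_counter()
    out, _ = replace_ngrams(text, n, codes)
    return len(out), time.perf_counter() - start

sample = "the cat sat on the mat and the cat ate the fish"
one_word, one_table = replace_ngrams(sample, 1, ["~"])
two_word, two_table = replace_ngrams(sample, 2, ["~"])
print(one_word)  # ~ cat sat on ~ mat and ~ cat ate ~ fish
print(two_word)  # ~ sat on the mat and ~ ate the fish
```

The lengths here count only the compressed text. A fair size comparison would also charge for storing the replacement table, and the timings returned by `measure` depend on the machine and the document.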

Contributed by: Ka Wai Lee (July 2013)
(Mathematica Summer Camp 2013)
With additional contributions by: Richard Hennigan
Open content licensed under CC BY-NC-SA


Details

The algorithm searches the document for the words and phrases whose replacement saves the most space, using a variation of Huffman coding. Instead of representing frequently appearing words with binary codes, it uses a predetermined list of code words built from infrequently appearing symbols.
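One way to read "most efficient" is to rank each word by the characters saved if every occurrence were replaced by a short code. The Python sketch below illustrates that reading only and is not the Demonstration's actual scoring rule. The function names and the one-character code length are assumptions. It also picks symbols that do not occur in the text at all, a stricter choice than "infrequently appearing", so that decoding stays unambiguous.

```python
import re
from collections import Counter

def pick_code_symbols(text, candidates, k):
    """Return up to k candidate symbols that never occur in the text."""
    counts = Counter(text)
    unused = [c for c in candidates if counts[c] == 0]
    return unused[:k]

def rank_by_savings(text, code_len=1):
    """Rank words by characters saved: occurrences * (word length - code length).

    Assumption: every code word has the same length, `code_len`.
    Ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter(re.findall(r"[A-Za-z]+", text))
    savings = {w: c * (len(w) - code_len) for w, c in counts.items()}
    return sorted(savings, key=lambda w: (-savings[w], w))

sample = "an elephant saw an ant and an elephant"
ranking = rank_by_savings(sample)
symbols = pick_code_symbols(sample, "~^|", 2)
print(ranking[:2])  # ['elephant', 'an']
print(symbols)      # ['~', '^']
```

In this sample, "an" is the most frequent word, but "elephant" ranks first because each replacement saves seven characters rather than one.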

Sometimes part of a longer word is replaced because it contains a frequently appearing word (such as the "and" in "hand"). This counts as a bonus in the compression, because it saves additional space without requiring more code words.
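The substring bonus can be seen by comparing plain substring replacement with whole-word replacement. This is a small Python sketch with a made-up sample sentence. The code symbol "~" is an assumption, and it must not occur in the original text for the round trip to work.

```python
import re

text = "hand and sand and land"

# Substring replacement: also rewrites the "and" inside hand, sand and land.
substring = text.replace("and", "~")

# Whole-word replacement: only standalone occurrences of "and" are replaced.
whole_word = re.sub(r"\band\b", "~", text)

print(substring)   # h~ ~ s~ ~ l~
print(whole_word)  # hand ~ sand ~ land
print(len(text), len(substring), len(whole_word))  # 22 12 18

# Decoding reverses the substitution, assuming "~" was not in the original.
restored = substring.replace("~", "and")
```

Both variants use a single code word, but the substring version shortens five occurrences instead of two.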


