Time versus Size in Compressing Text
Compare different types of compression on different documents. Choose the original version, one-word replacement, or two-word replacement to see the trade-off between running time and storage size for a simple and a more complex compression scheme. One-word replacement substitutes a code word for each frequently appearing word, while two-word replacement does the same for frequently appearing two-word phrases.
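As a rough illustration of how candidates for the two modes might be collected, the sketch below counts single words and adjacent word pairs. The function names top_words and top_phrases are hypothetical, and splitting on whitespace is an assumption; the original does not say how it tokenizes a document.

```python
from collections import Counter

def top_words(text, n=3):
    """Return the n most frequent whitespace-separated words (assumed tokenization)."""
    words = text.split()
    return [w for w, _ in Counter(words).most_common(n)]

def top_phrases(text, n=3):
    """Return the n most frequent two-word phrases, built from adjacent words."""
    words = text.split()
    pairs = [" ".join(p) for p in zip(words, words[1:])]
    return [p for p, _ in Counter(pairs).most_common(n)]

sample = "the cat and the dog and the bird"
print(top_words(sample, 2))    # most frequent single words
print(top_phrases(sample, 1))  # most frequent two-word phrase
```

Two-word replacement has more candidates to examine than one-word replacement, which is one plausible source of the extra running time.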
The algorithm searches the document for the words and phrases that are most efficient to compress, using a variation of Huffman coding. Instead of representing frequently appearing words with binary codes, it uses a predetermined list of code words made up of symbols that appear infrequently.
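The idea can be sketched as a simple substitution scheme. This is not the original implementation and not Huffman coding proper. It assumes that "most efficient" means the largest character saving, that each code word is a single symbol, and that the pool CODE_SYMBOLS (a hypothetical list) holds symbols rare enough to be unused in the document.

```python
from collections import Counter

# Hypothetical pool of predetermined code symbols; any symbol that already
# occurs in the document is skipped so that decompression stays unambiguous.
CODE_SYMBOLS = "~^`|@#$%"

def compress(text, max_codes=3):
    """Replace the words with the largest estimated saving by one-symbol codes."""
    free = [s for s in CODE_SYMBOLS if s not in text]
    counts = Counter(text.split())

    def saving(word):
        # Characters saved if every occurrence becomes a single symbol.
        return counts[word] * (len(word) - 1)

    ranked = sorted(counts, key=saving, reverse=True)
    table = {}
    for symbol, word in zip(free[:max_codes], ranked):
        if saving(word) <= 0:
            break
        text = text.replace(word, symbol)
        table[symbol] = word
    return text, table

def decompress(text, table):
    """Expand every code symbol back into the word it stands for."""
    for symbol, word in table.items():
        text = text.replace(symbol, word)
    return text

sample = "the cat and the dog and the bird and the hand"
packed, table = compress(sample, max_codes=2)
print(packed)                      # ~ cat ^ ~ dog ^ ~ bird ^ ~ h^
print(len(sample), len(packed))    # sizes before and after
```

A complete size comparison would also count the space needed to store the code table, which this sketch leaves out.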
At times, part of a longer word is replaced because it contains a frequently appearing word (such as the "and" in "hand"). This counts as a bonus, because it saves more space without requiring additional code words.
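The size of this bonus can be seen by comparing replacement restricted to whole words with replacement anywhere in the text. The two helper functions and the "~" code symbol below are hypothetical and serve only as an illustration.

```python
import re

def replace_whole_words(text, word, symbol):
    """Replace the word only where it stands alone (word boundaries on both sides)."""
    return re.sub(rf"\b{re.escape(word)}\b", symbol, text)

def replace_anywhere(text, word, symbol):
    """Replace every occurrence, including those inside longer words."""
    return text.replace(word, symbol)

sample = "hand and sand"
print(replace_whole_words(sample, "and", "~"))  # hand ~ sand
print(replace_anywhere(sample, "and", "~"))     # h~ ~ s~
```

Both results expand back to the same text when "~" is replaced by "and", so the shorter output costs nothing extra in code words.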