Huffman coding is a method of data compression that assigns shorter code words to characters that occur with higher probability and longer code words to characters that occur with lower probability. A Huffman code is an example of a prefix code: no character has a code word that is a prefix of another character's code word. In the "show steps" mode, this Demonstration illustrates the step-by-step procedure for finding the Huffman code for a set of characters with given probabilities. The "decode" mode gives the user an opportunity to decipher strings that have been encoded using the Huffman code.
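The prefix property can be checked mechanically. The following is a minimal Python sketch; the function name is_prefix_free and both code tables are hypothetical illustrations, not taken from the Demonstration.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different entry's code word."""
    words = list(codes.values())
    return not any(
        i != j and longer.startswith(shorter)
        for i, shorter in enumerate(words)
        for j, longer in enumerate(words)
    )


# Hypothetical code tables, chosen only to illustrate the property.
good = {"A": "0", "B": "10", "C": "11"}
bad = {"A": "0", "B": "01", "C": "11"}

print(is_prefix_free(good))  # True
print(is_prefix_free(bad))   # False: "0" is a prefix of "01"
```

Because no code word begins another, a reader of the bit string always knows where one code word ends and the next starts.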
To find the Huffman code for a given set of characters and probabilities, the characters are first sorted by increasing probability (weight). The character with the smallest probability is given a 0 and the character with the second smallest probability is given a 1. The two characters are then concatenated into a single string, and their probabilities are added to give its weight. This new string and its weight take the place of both characters and their weights, and the list is sorted again. At each later stage, the newly assigned 0 or 1 is prepended to the code string already assigned to every letter in the corresponding string. This procedure is iterated until only one string remains, with weight 1 (see the thumbnail and snapshots 1 and 2).
Four different codes are provided. All use the six characters A–F, but assign different probabilities to them. In the decode mode, nine fixed words are included to make it easy to compare the code words that the different codes give for the same word. The random word option gives a random string of length between 3 and 7, inclusive.
Use the popup menus to enter a "guess" for decoding a string.
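Decoding by hand follows a simple rule: read bits from left to right until they match a code word, write down that character, and start again. A minimal Python sketch of this rule follows; the function name decode, the code table, and the word FACE are hypothetical examples, not one of the Demonstration's four codes or nine fixed words.

```python
def decode(bits, codes):
    """Decode a bit string using a prefix code given as {character: code word}."""
    inverse = {word: ch for ch, word in codes.items()}
    decoded = []
    current = ""
    for bit in bits:
        current += bit
        # In a prefix code, the first match is the only possible match.
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(decoded)


# Hypothetical code table for the characters A-F.
table = {"A": "0100", "B": "0101", "C": "011", "D": "00", "E": "10", "F": "11"}

# "FACE" encodes as 11 + 0100 + 011 + 10.
print(decode("11010001110", table))  # FACE
```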