## Huffman Coding

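We'll use a greedy algorithm to implement Huffman coding: repeatedly merge the two symbols (or subtrees) with the smallest weights until a single tree remains. The input file describes an instance of the problem. It has the following format: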

[number_of_symbols]

[weight of symbol #1]

[weight of symbol #2]

...

For example, if the third line of the file is "6852892", the weight of the second symbol of the alphabet is 6852892.
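
To make the format concrete, here is a minimal parsing sketch that works on an in-memory string. The helper name `parse_weights` and the sample weights 10 and 7 are made up for illustration; only 6852892 comes from the example above. The sketch assumes blank lines in the file can be ignored.

```python
def parse_weights(text):
    """Parse the instance format: a symbol count, then one weight per line."""
    # Ignore blank lines (an assumption; the real file may not contain any).
    tokens = [line.strip() for line in text.splitlines() if line.strip()]
    n = int(tokens[0])                       # first entry: number of symbols
    weights = [int(t) for t in tokens[1:]]   # remaining entries: symbol weights
    if len(weights) != n:
        raise ValueError(f"expected {n} weights, found {len(weights)}")
    return weights


# Hypothetical three-symbol instance; the second symbol has weight 6852892.
sample = "3\n10\n6852892\n7\n"
print(parse_weights(sample))  # [10, 6852892, 7]
```

Checking the declared count against the number of weights catches truncated files early.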

What is the maximum length of a codeword in the resulting Huffman code? The code below also reports the minimum length.


In [6]:
from heapq import heapify, heappop, heappush

In [7]:
def huffman_coding(codebook):
    """Return the (min, max) codeword length of a Huffman code.

    `codebook` maps each symbol to [weight, 0, 0]. Each heap entry is
    [subtree weight, shallowest leaf depth, deepest leaf depth].
    """
    weight = [[w[0], w[1], w[2]] for w in codebook.values()]
    heapify(weight)
    while len(weight) > 1:
        # Greedy step: pop the two subtrees with the smallest weights.
        i, j = heappop(weight), heappop(weight)
        # Merging places both subtrees one level deeper under a new root.
        heappush(weight, [i[0] + j[0], 1 + min(i[1], j[1]), 1 + max(i[2], j[2])])
    return weight[0][1], weight[0][2]

In [8]:
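input_file = '/workspace/Algorithms/huffman.txt'
codebook = {}
with open(input_file) as f:
    lines = f.readlines()
    # Skip the first line (the symbol count); each remaining line holds one weight.
    for i, line in enumerate(lines[1:]):
        # Entry layout: [weight, min depth, max depth]; a single leaf has depth 0.
        codebook[i] = [int(line.split()[0]), 0, 0]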

In [9]:
huffman_coding(codebook)

(9, 19)
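
As a sanity check that can be verified by hand, the sketch below runs the same greedy merge on a small made-up instance and records the codeword length of every symbol, not just the extremes. The function name `codeword_lengths` and the weights are illustrative assumptions, not part of the original notebook. Tracking the leaves of each subtree explicitly is simple but slower than the depth-range trick above, so it is only meant for small inputs.

```python
from heapq import heapify, heappop, heappush


def codeword_lengths(weights):
    """Return the Huffman codeword length of each symbol, in input order."""
    # Heap entries: (subtree weight, unique tie-breaker, symbols in the subtree).
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapify(heap)
    lengths = [0] * len(weights)
    next_id = len(weights)  # keeps tie-breakers unique for merged subtrees
    while len(heap) > 1:
        w1, _, s1 = heappop(heap)
        w2, _, s2 = heappop(heap)
        # Every symbol in the two merged subtrees moves one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


weights = [3, 2, 6, 8, 2, 6]  # made-up example weights
lengths = codeword_lengths(weights)
print(lengths)                     # [3, 4, 2, 2, 4, 2]
print(min(lengths), max(lengths))  # 2 4
```

The lengths satisfy the Kraft equality, since 2^-3 + 2·2^-4 + 3·2^-2 = 1, as expected for a full binary code tree.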