### Huffman Coding
A Huffman code is a type of optimal prefix code that is used for compressing data. Huffman encoding and decoding are also lossless: compressing the data to make it smaller loses no information, so decoding returns exactly the original data.

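The Huffman algorithm works by giving each character a binary code whose length depends on that character's relative frequency: frequent characters get short codes and rare characters get long ones. The codes can have different lengths, but no code is a prefix of another (the code is *prefix-free*), so an encoded bit string can be decoded without separators. This is why the code can be visualized as a binary tree with each encoded character stored in a leaf: the path from the root to a leaf spells out that character's code.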
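To make the prefix-free property concrete, here is a minimal sketch of a check that no code in a table is a prefix of another. The function name `is_prefix_free` and the two sample tables are illustrative assumptions, not part of the implementation in this notebook.

```python
def is_prefix_free(codes):
    """Return True if no code in `codes` (char -> bit string) is a prefix of another."""
    words = sorted(codes.values())
    # After sorting, if some code is a prefix of another code, the code that
    # comes right after it in sorted order also starts with it, so checking
    # neighbouring pairs is enough.
    for shorter, longer in zip(words, words[1:]):
        if longer.startswith(shorter):
            return False
    return True


# Prefix-free: every character can sit in its own leaf of a binary tree
print(is_prefix_free({'a': '0', 'b': '10', 'c': '11'}))  # True
# Not prefix-free: after reading '1' a decoder cannot tell 'b' from the start of 'c'
print(is_prefix_free({'a': '0', 'b': '1', 'c': '10'}))   # False
```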

There are many ways to write pseudocode for this algorithm. At its core, it consists of building a Huffman tree, encoding the data, and, lastly, decoding the data.

Here is one version of pseudocode for this coding scheme:

 - Take a string and determine the relative frequencies of its characters.
 - Build a list of (character, frequency) tuples and sort it from lowest to highest frequency.
 - Build the Huffman tree by repeatedly merging the two lowest-frequency entries into a new node whose frequency is their sum, until a single node (the root) remains. The path from the root to each leaf assigns a binary code to each letter, with shorter codes for the more frequent letters. (This is the heart of the Huffman algorithm.)
 - Trim the Huffman tree (remove the frequencies from the previously built tree).
 - Encode the text into its compressed form.
 - Decode the text from its compressed form.

You will then need to create encoding, decoding, and sizing schemes.
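One simple sizing scheme is to count bits: the encoded length is the sum, over all characters, of (occurrences × code length), compared with 8 bits per character for single-byte (ASCII) text. The sketch below assumes a hand-written code table for `"abracadabra"` (one valid Huffman code for that string's frequencies); the helper name `encoded_size_bits` is hypothetical.

```python
from collections import Counter


def encoded_size_bits(text, codes):
    """Number of bits needed to encode `text` with the table `codes` (char -> bit string)."""
    freqs = Counter(text)
    # Each distinct character contributes (occurrences x code length) bits
    return sum(count * len(codes[ch]) for ch, count in freqs.items())


# Frequencies in "abracadabra": a=5, b=2, r=2, c=1, d=1.
# Assumed example table: shorter codes for more frequent letters.
codes = {'a': '0', 'b': '10', 'r': '110', 'c': '1110', 'd': '1111'}

text = "abracadabra"
original_bits = 8 * len(text)                     # 8 bits per ASCII character
compressed_bits = encoded_size_bits(text, codes)
print(original_bits, compressed_bits)             # 88 23
```

This counts only the payload; a real compressed file would also have to store the tree or code table so the data can be decoded.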

#### Analysis
I need to calculate the frequency of each character in the string, so I used `Counter`. Huffman coding then needs a list sorted by frequency, from which I repeatedly take the two items with the lowest frequencies. To encode the data I need a Huffman tree, so I created a node class (`HuffManTreeNode`) and a tree class (`HuffManTree`), and I made each unique character a leaf node.
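The counting and sorting steps can be checked on their own. This is a minimal sketch using the same `Counter`/`sorted` calls as the `HuffManTree` class; the variable names are illustrative. Since Python 3.7, `most_common()` lists equal counts in first-encountered order, and `sorted` is stable, so ties keep that order.

```python
from collections import Counter

sentence = "The bird is the word"

# Count each character, then sort the (character, count) pairs by count, lowest first
frequencies = sorted(Counter(sentence).most_common(), key=lambda x: x[1])

print(frequencies[:2])  # [('T', 1), ('b', 1)]: the first two entries to be merged
print(frequencies[-1])  # (' ', 4): the space is the most frequent character
```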

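1. Time complexity: counting the characters takes O(n) for an input of n characters. The `while` loop in `get_tree_root` runs k - 1 times for k distinct characters and re-sorts the list on every pass, so building the tree costs O(k² log k). Because k is bounded by the alphabet size, `huffman_encoding` is O(n) overall for a fixed alphabet. `huffman_decoding` follows one tree edge per bit, so its running time grows with the length of the encoded string.

2. Space complexity: the tree has 2k - 1 nodes and the code table has k entries, so both take O(k) space. The encoded and decoded strings grow with the input, so the total space is O(n) for a fixed alphabet.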
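The re-sorting in `get_tree_root` is what makes tree building O(k² log k). A common alternative, which is not what this notebook's code does, keeps the entries in a binary heap with Python's `heapq`, so each merge costs O(log k) and building the tree costs O(k log k). Below is a minimal sketch that returns only the code lengths. The function name `huffman_code_lengths` and the tuple representation of internal nodes are assumptions for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(text):
    """Return {character: code length} for a Huffman code of `text`, built with a heap."""
    freqs = Counter(text)
    if not freqs:
        return {}
    tiebreak = count()  # unique numbers so the heap never has to compare two nodes
    # Heap entries: (frequency, tiebreak, node); a leaf is a str, an internal node a (left, right) tuple
    heap = [(f, next(tiebreak), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)  # O(k)

    while len(heap) > 1:
        # Pop the two lowest-frequency subtrees (O(log k) each) and merge them
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    # A leaf's depth in the final tree is the length of its code
    lengths = {}
    stack = [(heap[0][2], 0)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, str):
            lengths[node] = max(depth, 1)  # a lone root leaf still needs one bit
        else:
            stack.append((node[0], depth + 1))
            stack.append((node[1], depth + 1))
    return lengths


sentence = "The bird is the word"
lengths = huffman_code_lengths(sentence)
print(sum(Counter(sentence)[ch] * n for ch, n in lengths.items()))  # 70
```

Different tie-breaking can produce different trees, but every Huffman tree for the same frequencies has the same total encoded length: 70 bits here, matching the length of the encoded string in the first run below.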

```python
import sys
from collections import Counter

class HuffManTreeNode(object):
    # An internal node of the Huffman tree. It holds no value of its own:
    # characters are stored only in the leaves, which are plain str objects.
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

    def set_left_child(self, left):
        self.left = left

    def get_left_child(self):
        return self.left

    def set_right_child(self, right):
        self.right = right

    def get_right_child(self):
        return self.right
```

```python
class HuffManTree(object):
    def __init__(self, data=None):
        # (character, frequency) pairs sorted from lowest to highest frequency
        self.frequencies = sorted(Counter(data).most_common(), key=lambda x: x[1])

    def get_tree_root(self):
        while len(self.frequencies) > 1:
            # Take the two entries with the lowest frequencies...
            (key1, freq1) = self.frequencies[0]
            (key2, freq2) = self.frequencies[1]
            self.frequencies = self.frequencies[2:]

            # ...and merge them under a new internal node. Keys are str for
            # leaves (characters) and HuffManTreeNode for merged subtrees.
            node = HuffManTreeNode(key1, key2)

            # Put the merged subtree back with the combined frequency and re-sort
            self.frequencies.append((node, freq1 + freq2))
            self.frequencies = sorted(self.frequencies, key=lambda x: x[1])

        # The last remaining entry is the root. If the input has only one
        # distinct character, the root is that character (a leaf).
        return self.frequencies[0][0]
```

```python
def huffman_encoding_recursion(node, code=''):
    """
    Build the code table by walking the tree recursively.
    Args:
      node(HuffManTreeNode or str): an internal node, or a leaf holding a character
      code(str): the path so far; a left edge appends '0', a right edge appends '1'

    Returns:
      encode_map(dict): the Huffman code table, mapping character -> bit string
    """
    if isinstance(node, str):
        # A leaf at the root (only one distinct character) would otherwise get
        # an empty code, so give it '0' to keep every character encodable.
        return {node: code or '0'}

    encode_map = dict()

    left_node = node.get_left_child()
    right_node = node.get_right_child()

    encode_map.update(huffman_encoding_recursion(left_node, code + '0'))
    encode_map.update(huffman_encoding_recursion(right_node, code + '1'))
    return encode_map
```
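The recursion above goes one level deeper per tree level, so a very deep tree could in principle hit Python's recursion limit. A minimal iterative sketch of the same walk with an explicit stack, assuming a simplified tree where leaves are `str` and internal nodes are `(left, right)` tuples instead of `HuffManTreeNode` objects; `build_code_table` is a hypothetical name.

```python
def build_code_table(root):
    """Map each leaf character to its bit string; left edge = '0', right edge = '1'."""
    if isinstance(root, str):
        return {root: '0'}  # single distinct character: give it a 1-bit code
    table = {}
    stack = [(root, '')]
    while stack:
        node, code = stack.pop()
        if isinstance(node, str):
            table[node] = code
        else:
            left, right = node
            stack.append((left, code + '0'))
            stack.append((right, code + '1'))
    return table


tree = (('a', 'b'), 'c')
print(build_code_table(tree) == {'a': '00', 'b': '01', 'c': '1'})  # True
```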
    

```python
def huffman_encoding(data):
    # Empty input: nothing to encode and no tree to build
    if not data:
        return '', None

    huffman_tree = HuffManTree(data)
    root_node = huffman_tree.get_tree_root()
    encode_map = huffman_encoding_recursion(root_node)

    # Replace each character with its code and concatenate the bits
    encoded_string = ''.join(encode_map[c] for c in data)

    return encoded_string, root_node
```

```python
def huffman_decoding(data, root):
    # Walk the tree from the root: '0' goes left, '1' goes right.
    # Reaching a leaf (a str) emits that character and restarts at the root.
    node = root
    decoded_data = ''

    for b in data:
        if b == '0' and isinstance(node, HuffManTreeNode):
            node = node.get_left_child()
        elif b == '1' and isinstance(node, HuffManTreeNode):
            node = node.get_right_child()

        if isinstance(node, str):
            decoded_data += node
            node = root

    return decoded_data
```
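Because the code is prefix-free, decoding does not strictly need the tree. A minimal alternative sketch inverts the code table and emits a character as soon as the accumulated bits match a code; the first match is always correct because no code is a prefix of another. `decode_with_table` and the sample table are hypothetical; the tree walk above does the same job with one step per bit.

```python
def decode_with_table(bits, codes):
    """Decode `bits` using a prefix-free table `codes` (char -> bit string)."""
    lookup = {code: ch for ch, code in codes.items()}  # invert: bit string -> char
    decoded = []
    current = ''
    for b in bits:
        current += b
        # No code is a prefix of another, so the first match is the right one
        if current in lookup:
            decoded.append(lookup[current])
            current = ''
    if current:
        raise ValueError("trailing bits do not form a complete code: " + current)
    return ''.join(decoded)


codes = {'a': '0', 'b': '10', 'r': '110', 'c': '1110', 'd': '1111'}
bits = ''.join(codes[ch] for ch in "abracadabra")
print(decode_with_table(bits, codes))  # abracadabra
```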

```python
if __name__ == "__main__":
    a_great_sentence = "The bird is the word"

    print("The size of the data is: {}\n".format(sys.getsizeof(a_great_sentence)))
    print("The content of the data is: {}\n".format(a_great_sentence))

    encoded_data, tree = huffman_encoding(a_great_sentence)

    print("The size of the encoded data is: {}\n".format(sys.getsizeof(int(encoded_data, base=2))))
    print("The content of the encoded data is: {}\n".format(encoded_data))

    decoded_data = huffman_decoding(encoded_data, tree)

    print("The size of the decoded data is: {}\n".format(sys.getsizeof(decoded_data)))
    print("The content of the decoded data is: {}\n".format(decoded_data))
```

The size of the data is: 69

The content of the data is: The bird is the word

The size of the encoded data is: 36

The content of the encoded data is: 0110111011111100111000001010110000100011010011110111111010101011001010

The size of the decoded data is: 69

The content of the decoded data is: The bird is the word
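`sys.getsizeof` reports the memory used by a Python object, including fixed per-object overhead (and the numbers are CPython-specific), so it is only a rough proxy for compressed size. The `int(encoded_data, base=2)` form also drops leading zeros, so it cannot be turned back into the bit string without knowing the original length. A minimal sketch of packing the bits into real bytes, assuming the bit count is stored alongside the bytes; `pack_bits` and `unpack_bits` are hypothetical helpers.

```python
def pack_bits(bits):
    """Pack a '0'/'1' string into bytes; return (packed bytes, number of bits)."""
    nbits = len(bits)
    padded = bits + '0' * (-nbits % 8)  # pad with zeros to a whole number of bytes
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return packed, nbits


def unpack_bits(packed, nbits):
    """Inverse of pack_bits: expand each byte to 8 bits and drop the padding."""
    bits = ''.join(format(byte, '08b') for byte in packed)
    return bits[:nbits]


# The 70-bit encoded string from the run above
encoded = "0110111011111100111000001010110000100011010011110111111010101011001010"
packed, nbits = pack_bits(encoded)
print(len(packed), nbits)  # 9 70
```

So the 20-character sentence (20 bytes as ASCII) packs into 9 bytes of payload, plus whatever it costs to store the tree or code table.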



```python
a_great_sentence = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

print("The size of the data is: {}\n".format(sys.getsizeof(a_great_sentence)))
print("The content of the data is: {}\n".format(a_great_sentence))

encoded_data, tree = huffman_encoding(a_great_sentence)

print("The size of the encoded data is: {}\n".format(sys.getsizeof(int(encoded_data, base=2))))
print("The content of the encoded data is: {}\n".format(encoded_data))

decoded_data = huffman_decoding(encoded_data, tree)

print("The size of the decoded data is: {}\n".format(sys.getsizeof(decoded_data)))
print("The content of the decoded data is: {}\n".format(decoded_data))
```

The size of the data is: 494

The content of the data is: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

The size of the encoded data is: 272

The content of the encoded data is: 0110110101000010011111011111000100001111000111101111101110110000001100001001101110000110101101001101111111101001101111101011010000101111001111101101010111110100111010011010011110100100001001111001011000101010110001110111100010011010011011111011100111111101110111011000110111100101111110010111100011101110101011111011100001100001001100010101101100011110100111101011101011010110011110101100001100101100101000010011111101111

```python
a_great_sentence = "><?skdownnf()_()_*&^^%%%$$$###@@@_)(*^%$$@!~><MNNB)7783292jdj "

print("The size of the data is: {}\n".format(sys.getsizeof(a_great_sentence)))
print("The content of the data is: {}\n".format(a_great_sentence))

encoded_data, tree = huffman_encoding(a_great_sentence)

print("The size of the encoded data is: {}\n".format(sys.getsizeof(int(encoded_data, base=2))))
print("The content of the encoded data is: {}\n".format(encoded_data))

decoded_data = huffman_decoding(encoded_data, tree)

print("The size of the decoded data is: {}\n".format(sys.getsizeof(decoded_data)))
print("The content of the decoded data is: {}\n".format(decoded_data))
```

The size of the data is: 111

The content of the data is: ><?skdownnf()_()_*&^^%%%$$$###@@@_)(*^%$$@!~><MNNB)7783292jdj 

The size of the encoded data is: 64

The content of the encoded data is: 0011101110101100101101101110011111011111100001000010000110001111110100000011111010000001000111001000010001010101010101111011101110001000100010011001100110000001001111110001000101011110111001101100111101000011101110110101100101001011011001001001110011110111111100101001111011010010101011111010100110

The size of the decoded data is: 111

The content of the decoded data is: ><?skdownnf()_()_*&^^%%%$$$###@@@_)(*^%$$@!~><MNNB)7783292jdj 



#### Resources
[Huffman Visualization](https://people.ok.ubc.ca/ylucet/DS/Huffman.html)