# Elements Of Programming Interviews
## Hash Tables
### Track 9: 13.1, 13.2, 13.3, 13.4, 13.5, 13.7, 13.8, 13.11

### 13.1 - Partitioning Into Anagrams
Anagrams are popular word play puzzles, where by rearranging letters of one set of words, you get another set of words. For example, "eleven plus two" is an anagram for "twelve plus one".

Write a program that takes as input a set of words and returns group of anagrams for those words.

For example, if the input is:

*"debitcard", "elvis", "silent", "badcredit", "lives", "freedom", "listen", "levis"*

Then there are three groups of anagrams:

1. "debitcard", "badcredit"
2. "elvis", "lives", "levis"
3. "silent", "listen"

In [1]:
def group_anagrams(words):
    pass

In [2]:
class HashTable(object):
    #simple ht class to solve anagram algorithm below
    def __init__(self, hash_function, size=256):
        self.hash_function = hash_function
        self.buckets = [list() for i in range(size)]
        self.size = size

    def __getitem__(self, key):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        if bucket:
            return bucket
        else:
            raise KeyError(key)

    def __setitem__(self, key, value):
        hash_value = self.hash_function(key) % self.size
        bucket = self.buckets[hash_value]
        i = 0
        found = False
        for stored_value in bucket:
            if stored_value == key:
                 found = True
                 break
            i += 1
        if not found:
            bucket.append(value)
            
    def get_buckets(self):
        ret = []
        for bucket in self.buckets:
            if len(bucket) >= 2:
                ret.append(bucket)
        return ret

In [3]:
def string_hash(word):
    val = 0
    for c in word:
        val += ord(c)
    return val

def group_anagrams(words):
    ht = HashTable(string_hash)
    for word in words:
        ht[word] = word
    return ht.get_buckets()

In [4]:
def group_anagrams_alt(words):
    h = {}
    for word in words:
        key = "".join(sorted(word))
        if key in h.keys():
            h[key].append(word)
        else:
            h[key] = [word]
    groups = []
    for key in h:
        if len(h[key]) >= 2:
            groups.append(h[key])
    return groups

In [5]:
arg = ["debitcard", "elvis", "listen", "badcredit", "lives", "silent"]
group_anagrams(arg), group_anagrams_alt(arg)

([['elvis', 'lives'], ['listen', 'silent'], ['debitcard', 'badcredit']],
 [['elvis', 'lives'], ['debitcard', 'badcredit'], ['listen', 'silent']])

### 13.2 - Test For Palindromic Permutations
Write a program to test whether the letters forming a string can be permuted to form a palindrome. For example, "edified" can be permuted to form "deified".

In [6]:
def test_for_palindromic_permutation(word):
    char_to_freq = {}
    for char in word:
        if char in char_to_freq.keys():
            char_to_freq[char] += 1
        else:
            char_to_freq[char] = 1
    odd_freq_count = 0
    for freq in char_to_freq.values():
        if freq % 2:
            odd_freq_count += 1
    return bool(odd_freq_count % 2)

In [7]:
test_for_palindromic_permutation("edified"), test_for_palindromic_permutation("banana")

(True, False)

### 13.3 - Is An Anonymous Letter Constructable
You are required to write a method which takes text for an anonymous letter and text for a magazine. Your method is to determine if it is possible to write the anonymous letter using the text from the magazine. The anonymous letter can be written from the magazine if for each character whether the number of times it appears in the anonymous letter is less than or equal to the number of times it appears in the magazine.

In [8]:
alphabet="abcdefghijklmnopqrstuvwxyz"
def construct_letter_from_text(letter, text):
    """
    Constructs letter only if text contains enough characters
    returns True if able to, False if not
    """
    letter_char_counts = {letter:0 for letter in alphabet }
    text_char_counts = {letter:0 for letter in alphabet}
    for c in letter:
        if c.isalpha():
            letter_char_counts[c.lower()] += 1
    for c in text:
        if c.isalpha():
            text_char_counts[c.lower()] += 1
    #now make sure that text has enough respective letters to construct
    #the anonymous letter
    for char, freq in text_char_counts.items():
        if freq < letter_char_counts[char]:
            return False
    return True

In [9]:
#this modified algorithm will terminate earlier
def construct_letter_from_text(letter, text):
    letter_char_freq = {}
    for c in letter:
        if c in letter_char_freq:
            letter_char_freq[c] += 1
        else:
            letter_char_freq[c] = 1
    for c in text:
        if not letter_char_freq:
            break
        if c in letter_char_freq:
            letter_char_freq[c] -= 1
            if letter_char_freq[c] == 0:
                del letter_char_freq[c]
    if not letter_char_freq:
        return True
    return False

In [10]:
(construct_letter_from_text("aa b c d e f g", "g f e d c b aa"),
 construct_letter_from_text("aa b c d e f g", "a b c d e f g"))

(True, False)

### 13.4 - Implement An ISBN Cache
Implement a cache for looking up prices of books identified by their ISBN. You should support lookup, insert, update, and remove methods. Use the Least Recenetly Used strategy for eviction policty.

In [11]:
import statistics
from datetime import datetime
import random
class ISBNCache():
    
    def __init__(self, max_size = 5):
        self.cache = {}
        self.max_size = max_size
    def lookup(self, ISBN):
        """returns the price"""
        if ISBN not in self.cache:
            raise KeyError(str(ISBN) + " not found in cache")
        #update the timestamp to indicate recently used
        self.cache[ISBN][1] = datetime.now()
        return self.cache[ISBN][0]
    
    def update(self, ISBN, price):
        if ISBN not in self.cache:
            raise KeyError(str(ISBN) + " not found in cache")
        #update the price
        self.cache[ISBN] = (price, datetime.now())
    
    def insert(self, ISBN, price):
        if len(self.cache) > 2 * self.max_size:
            timestamps = [bucket[1] for ISBN, bucket in self.cache.items()]
            median_time = statistics.median(timestamps)
            #deleting any ISBN that was last used before the median time
            ISBNs_to_delete = []
            for ISBN, bucket in self.cache.items():
                if bucket[1] < median_time:
                    ISBNs_to_delete.append(ISBN)
            for ISBN in ISBNs_to_delete:
                del self.cache[ISBN]
        self.cache[ISBN] = (price, datetime.now())
        
    def delete(self, ISBN):
        if ISBN not in self.cache:
            raise KeyError(str(ISBN) + " not found in cache")
        del self.cache[ISBN]
        
        

In [12]:
ic = ISBNCache()
for ISBN in range(409234051, 409234063):
    ic.insert(ISBN, float(random.randint(5,55)))

In [13]:
for ISBN, bucket in ic.cache.items():
    print(ISBN, bucket)

409234055 (40.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153163))
409234056 (54.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153094))
409234057 (39.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153103))
409234058 (9.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153112))
409234059 (21.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153120))
409234060 (43.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153128))
409234061 (15.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153136))


In [14]:
ic.update(409234057, 55)
for ISBN, bucket in ic.cache.items():
    print(ISBN, bucket)

409234055 (40.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153163))
409234056 (54.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153094))
409234057 (55, datetime.datetime(2016, 8, 17, 7, 50, 45, 358841))
409234058 (9.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153112))
409234059 (21.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153120))
409234060 (43.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153128))
409234061 (15.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153136))


In [15]:
ic.delete(409234057)
for ISBN, bucket in ic.cache.items():
    print(ISBN, bucket)

409234055 (40.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153163))
409234056 (54.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153094))
409234058 (9.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153112))
409234059 (21.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153120))
409234060 (43.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153128))
409234061 (15.0, datetime.datetime(2016, 8, 17, 7, 50, 45, 153136))


### 13.5 - Compute The LCA, Optimizing For Close Ancestors
Design an algorithms for computing the LCA of two nodes in a binary tree. The algorithms time complexity should depend only on the distance from the nodes to the LCA.

<img src='Images/traversals.jpg'>

In [17]:
#extremely inefficient but basic method of creating a BT w/ parent attr in nodes
class Node():
    def __init__(self, data, parent=None, left=None, right=None):
        self.data = data
        self.parent = parent
        self.left = left
        self.right = right
nodes = {ch : Node(ch) for ch in 'ABCDEFGHI'}
nodes['H'].parent = nodes['E'];nodes['I'].parent = nodes['E'] 
nodes['E'].left = nodes['H']; nodes['E'].right = nodes['I']
nodes['D'].parent = nodes['B']; nodes['E'].parent = nodes['B']
nodes['B'].left = nodes['D']; nodes['B'].right = nodes['E']
nodes['B'].parent = nodes['A']; nodes['C'].parent = nodes['A']
nodes['A'].left = nodes['B']; nodes['A'].right = nodes['C']
nodes['F'].parent = nodes['C']; nodes['G'].parent = nodes['C']
nodes['C'].left = nodes['F']; nodes['C'].right = nodes['G']
root = nodes['A']

In [19]:
def get_lca(node1, node2):
    visited_nodes = {}
    while node1 and node2:
        if node1:
            if node1 in visited_nodes:
                #LCA
                return node1
            else:
                visited_nodes[node1] = 1
            node1 = node1.parent
        if node2:
            if node2 in visited_nodes:
                return node2
            else:
                visited_nodes[node2] = 1
            node2 = node2.parent
    return None

In [26]:
(get_lca(nodes['H'], nodes['H']).data,#H
 get_lca(nodes['D'], nodes['G']).data,#A
 get_lca(nodes['D'], nodes['I']).data #B
 )

('H', 'A', 'B')

### 13.7 - Find The Nearest Repeated Entries In An Array
People do not like reading text in which a word is used multiple times in a short paragraph. You are to write a program which helps identify such a problem. Write a program which takes as input an array and finds the distance between a closest pairs of equal entires.

For example, if s = ["All", "work", "and", "no", "play", "makes", "for", "no", "work", "no", "fun", "and", "no", "results"] then the second and third appearences of "no" is the closest pair.

In [74]:
import math
def closest_repeated_entries(words):
    #key is a word, value is (last_seen index, closest_distance)
    visited_words = {}
    for index, word in enumerate(words):
        if word in visited_words:
            last_seen_ix, closest_distance = visited_words[word]
            #have only seen the word once so far
            if closest_distance is None:
                visited_words[word] = [index, (index - last_seen_ix)]
            else:
                if (index - last_seen_ix) < closest_distance:
                    visited_words[word] = [index, (index - last_seen_ix)]
                else:
                    #another occurence seen, but it is not the closest pair
                    #so just update the last_seen_index
                    visited_words[word][0] = index
        else:
            visited_words[word] = [index, None]
            
    return min(visited_words.items(), key=lambda t: t[1][1] if t[1][1] else math.inf)


In [84]:
#slightly refactored
def closest_repeated_entries(words):
    words_to_nearest_index = {}
    closest_word = [math.inf, None]
    for index, word in enumerate(words):
        if word in words_to_nearest_index:
            #decide if the curr word  ix - last_seen_ix is the new closest
            last_seen_ix = words_to_nearest_index[word]
            closest_word = min(closest_word, [(index - last_seen_ix), word])
        words_to_nearest_index[word] = index
    return closest_word[1]

In [86]:
(closest_repeated_entries(["All", "work", "and", "no", "play", "makes", "for",
                          "no", "work", "no", "fun", "and", "no", "results"]),
 closest_repeated_entries(["all", "work", "and"]))

('no', None)

### 13.8 - Find The Smallest Subarray Containing All Values
Write a program that takes an array of stringd and a set of strings, and return the indices of the starting and ending index of a shortest subarray of the given array that covers the set, i.e., constains all strings in the set.

go through the array of words, get indices of every occurence of each word in the set

In [148]:
import statistics
def smallest_subarray(words, keywords):
    distance = math.inf
    keywords_to_index = {}
    #will be in form of (begin_ix, end_ix)
    ret = (None, None)
    #coerce keywords to set to improve efficiency of 'in'
    for index, word in enumerate(words):
        if word in keywords:
            keywords_to_index[word] = index
            if len(keywords_to_index) == len(keywords):
                #need to calculate the distance between all keywords
                begin_ix = min(keywords_to_index.values())
                end_ix = max(keywords_to_index.values())
                #if the new distance is smallest, update ret
                if (end_ix - begin_ix) < distance:
                    ret = (begin_ix, end_ix)
    return ret
    

In [168]:
from collections import OrderedDict
def smallest_subarray_oc(words, keywords):
    distance = math.inf
    keyword_to_index = OrderedDict()
    #want keywords to be set, to improve efficiency of `in`
    if not isinstance(keywords, set):
        keywords = set(keywords)
    ret = (None, None)
    for index, word in enumerate(words):
        if word in keywords:
            keyword_to_index[word] = index
            if len(keyword_to_index) == len(keywords):
                start_ix = keyword_to_index.popitem(last=False)[1]
                end_ix = keyword_to_index[word]
                if (end_ix - start_ix) < distance:
                    ret = (start_ix, end_ix)
    return ret

In [152]:
(smallest_subarray(['a', 'z', 'b', 'z', 'z', 'c', 'z', 'z', 'z', 'c',  'z', 'a', 'b'], ['a', 'b', 'c']),
 smallest_subarray(['apple', 'banana', 'apple', 'apple', 'dog', 'cat', 'apple', 'dog',
                    'banana', 'apple', 'cat', 'dog'], ['banana', 'cat']))

((9, 12), (8, 10))

In [166]:
(smallest_subarray_oc(['a', 'z', 'b', 'z', 'z', 'c', 'z', 'z', 'z', 'c',  'z', 'a', 'b'], ['a', 'b', 'c']),
 smallest_subarray_oc(['apple', 'banana', 'apple', 'apple', 'dog', 'cat', 'apple', 'dog',
                    'banana', 'apple', 'cat', 'dog'], ['banana', 'cat']))

((9, 12), (8, 10))