
# HASH TABLES

Hash Tables/Mappings:
A mapping from distinct keys to values, with O(1) access on average.

ADT:

1. Get method

2. Set method
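Python's built-in `dict` is a hash table that implements this ADT, along with contains and remove. That makes it a handy reference for the behaviour the hand-written versions below should match. Here is a minimal sketch using `a[3] = 9` as the example key/value:

```python
# Python's built-in dict: a hash table implementing the mapping ADT
a = {}
a[3] = 9                    # set
a[3] = 10                   # set on an existing key overwrites its value
print(a[3])                 # get -> 10
print(3 in a, 4 in a)       # contains -> True False
print(list(a.keys()), list(a.values()))  # iterators -> [3] [10]
try:
    a[4]                    # get on a missing key raises KeyError
except KeyError as err:
    print("KeyError:", err)  # KeyError: 4
del a[3]                    # remove
print(len(a))               # 0
```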


How to implement it?
Naive implementation: keep every entry in one list and scan it on each operation (O(n)).



In [4]:
class Entry:
  def __init__(self,key,value):
    self.key = key
    self.value = value

  def __str__(self):
    return f"{self.key}:{self.value}"
e = Entry(2,8)
print(e)

2:8


In [11]:
# implement a typical dictionary but with naive list
"""
eg:
a = Mapping()
1. set : a[3] = 9
2. get: a[3] # prints 9
3. contains : 3 in a
4. iterators: a.keys(),a.values()
5. remove method

"""

class Mapping:
  def __init__(self):
    self._entries = []

  def __setitem__(self,k,v):
    # update the value if the key already exists, otherwise append a new entry
    for entry in self._entries:
      if entry.key == k:
        entry.value = v
        return
    self._entries.append(Entry(k,v))

  def __getitem__(self,k):
    # linear scan: O(n)
    for entry in self._entries:
      if entry.key == k:
        return entry.value
    raise KeyError(k)

  def keys(self):
    return (e.key for e in self._entries)

  def values(self):
    return (e.value for e in self._entries)

  def items(self):
    return ((e.key,e.value) for e in self._entries)

  def __contains__(self,k):
    for key in self.keys():
      if key == k:
        return True
    return False

  def remove(self,k):
    for ind,entry in enumerate(self._entries):
      if entry.key == k:
        del self._entries[ind]
        return          # found and removed: stop before reaching the KeyError
    raise KeyError(k)



# opportunities for refactoring:


In [12]:
m = Mapping()
m[1] = 10
m[2] = 20

In [13]:
1 in m

True

In [14]:
for k in m.keys():
  print(k)

1
2


TODO: 



HOW TO GET CONSTANT TIME OPERATIONS FOR get, set, remove.

`BIGIDEA`: 
WHAT IF, INSTEAD OF ONE BIG LIST, WE HAVE MANY SMALL SUBLISTS? Then we just need a quick way of knowing which short list to search or update.

`solution`

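1. Use `hash(key)` (e.g. `hash("john dick")`) to map each key to an integer; `hash(key) % number_of_sublists` then picks the sublist (bucket) to search or update.
2. Double the number of sublists whenever the number of elements exceeds it. Each doubling re-inserts every element, but doublings get rarer as the table grows, so set stays amortized O(1) and the sublists stay short on average.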
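A quick way to see why doubling gives amortized O(1) is to count how many re-insertions the doubling rule causes. `rehash_moves` is a hypothetical helper, not part of the notebook. It mirrors the rule "after an insert, double when elements > buckets" and assumes every insert is a new key.

```python
def rehash_moves(n_inserts, initial_buckets=2):
    """Count entries re-inserted by resizing while inserting n_inserts distinct keys,
    using the rule: after an insert, double when elements > buckets."""
    buckets, length, moves = initial_buckets, 0, 0
    for _ in range(n_inserts):
        length += 1                  # a new key was added
        if length > buckets:
            moves += length          # every stored entry is re-inserted once
            buckets *= 2
    return moves

for n in (10, 1_000, 1_000_000):
    moves = rehash_moves(n)
    print(f"{n:>9} inserts: {moves:>9} re-inserts, {moves / n:.2f} per insert")
```

The re-insert counts (3, 5, 9, 17, ...) roughly form a geometric series. Its sum is dominated by the last term, which is at most n. The total therefore stays below about 2n, so the extra cost per insert is a constant.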


In [19]:
class HashMap:
  def __init__(self,sz = 2):
    #num of sublists (buckets)
    self.sz = sz
    #total # elements
    self.length = 0
    self._buckets = [Mapping() for i in range(self.sz)]

  def __getitem__(self,k):
    bucket = self._bucket(k)
    return bucket[k]

  def __setitem__(self,k,v):
    bucket = self._bucket(k)
    if k not in bucket:
      self.length += 1
    bucket[k] = v

    # keep the load factor (elements per bucket) at most 1
    if self.length > self.sz:
      self._double()

  def _bucket(self,key):
    #return the corresponding mapping: hash the key, then wrap it into [0, sz)
    return self._buckets[hash(key)%self.sz]

  def _double(self):
    # rebuild with twice as many buckets and re-insert every entry,
    # because each key's bucket index hash(key) % sz changes with sz
    old_buckets = self._buckets
    self.__init__(self.sz*2)   # also resets self.length; re-inserting restores it
    for mapping in old_buckets:
      for key, value in mapping.items():
        self[key] = value
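The HashMap above covers get and set, but the goal also listed remove. The following is a compact, self-contained sketch of the same separate-chaining idea with removal. `ChainedHashMap` is a hypothetical name. It assumes plain Python lists of `[key, value]` pairs as buckets instead of the notebook's `Mapping`, and it uses `del m[k]` (`__delitem__`) where `Mapping` uses `remove`.

```python
class ChainedHashMap:
    """Separate chaining: each bucket is a plain list of [key, value] pairs."""

    def __init__(self, n_buckets=2):
        self._buckets = [[] for _ in range(n_buckets)]
        self._length = 0

    def _bucket(self, key):
        # hash(key) picks the bucket; % keeps the index in range
        return self._buckets[hash(key) % len(self._buckets)]

    def __len__(self):
        return self._length

    def __contains__(self, key):
        return any(k == key for k, _ in self._bucket(key))

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value          # existing key: overwrite in place
                return
        bucket.append([key, value])
        self._length += 1
        if self._length > len(self._buckets):
            self._resize(2 * len(self._buckets))

    def __delitem__(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._length -= 1
                return                   # stop after removing the single match
        raise KeyError(key)

    def _resize(self, n_buckets):
        # re-insert every pair, since bucket indices depend on len(self._buckets)
        old = self._buckets
        self._buckets = [[] for _ in range(n_buckets)]
        for bucket in old:
            for k, v in bucket:
                self._bucket(k).append([k, v])


m = ChainedHashMap()
for i in range(10):
    m[i] = i * i
del m[3]
print(len(m), 3 in m, m[9])   # 9 False 81
```

Remove only touches one short bucket, so it is O(1) on average, just like get and set.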

A bit about hashing

Properties

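- If h(x) == h(y), x might equal y but need not (that is a collision); if h(x) != h(y), then x != y. Equivalently, x == y implies h(x) == h(y).

- Always deterministic: the same key hashes to the same value every time it is hashed (Python salts `str` hashes per process, so values can differ between runs, but not within one).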
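The properties above can be checked directly in Python. This is a small sketch. The `hash(-1) == hash(-2)` collision is a CPython implementation detail, and `Point` is a hypothetical class showing how a user-defined key keeps "equal implies equal hash" by hashing the same fields its `__eq__` compares.

```python
# 1. Equal keys must hash equally: 1 == 1.0 == True, so their hashes agree.
assert 1 == 1.0 == True
print(hash(1), hash(1.0), hash(True))

# 2. Equal hashes do NOT imply equal keys. In CPython hash(-1) is -2,
#    so -1 and -2 collide even though they differ.
print(hash(-1), hash(-2), -1 == -2)   # CPython: -2 -2 False

# 3. Deterministic within one run (str hashes are salted per process).
assert hash("john dick") == hash("john dick")

# A user-defined key: __hash__ uses exactly the fields __eq__ compares.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))

print(hash(Point(1, 2)) == hash(Point(1, 2)))   # True
```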


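How to handle collisions:

1. Separate chaining: each slot keeps a list (classically a linked list) of all entries that hash to it. The HashMap above does this, with one Mapping per bucket.

2. Open addressing: on a hash collision, look for other slots in the same array to fill rather than chaining. The general probing scheme, where the choice of prob_func gives the variants (prob_func(x) = x is linear probing, prob_func(x) = x*x is quadratic probing):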

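```
def probe_slots(pres_loc, n, prob_func):
  # slots to try for a key whose home slot is pres_loc, in a table of n slots
  yield pres_loc % n
  for x in range(1, n):                  # x is incremented after every collision
    yield (prob_func(x) + pres_loc) % n
```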
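Problem: many probe functions cycle, revisiting the same slots before they have tried all n of them, so an insert can fail even while free slots remain. For example, with n = 8, quadratic probing (x*x % 8) only ever reaches offsets 0, 1 and 4.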
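To make the probing scheme above concrete, here is a minimal open-addressing sketch with linear probing (prob_func(x) = x). `LinearProbingMap` is a hypothetical name. It assumes a fixed table size and no deletion: deleting under open addressing needs "tombstone" markers so later lookups do not stop early, and that is left out here.

```python
class LinearProbingMap:
    """Open addressing: one flat list of slots; collisions probe the next slot."""

    _EMPTY = object()   # sentinel for a never-used slot

    def __init__(self, n_slots=8):
        self._keys = [self._EMPTY] * n_slots
        self._values = [None] * n_slots

    def _probe(self, key):
        # linear probing: try home, home+1, home+2, ... (mod n)
        n = len(self._keys)
        home = hash(key) % n
        for x in range(n):
            yield (home + x) % n

    def __setitem__(self, key, value):
        for i in self._probe(key):
            if self._keys[i] is self._EMPTY or self._keys[i] == key:
                self._keys[i], self._values[i] = key, value
                return
        raise RuntimeError("table full")   # a real table would resize before this

    def __getitem__(self, key):
        for i in self._probe(key):
            if self._keys[i] is self._EMPTY:
                break                      # reached an empty slot: key is absent
            if self._keys[i] == key:
                return self._values[i]
        raise KeyError(key)


lp = LinearProbingMap(8)
for k in (1, 9, 17):        # all three have home slot 1 when n = 8
    lp[k] = str(k)
print(lp._keys[1:4])        # [1, 9, 17]: 9 and 17 were pushed to slots 2 and 3
print(lp[17])               # 17

# The cycle problem: quadratic offsets x*x % 8 reach only 3 of the 8 slots.
print(sorted({(x * x) % 8 for x in range(8)}))   # [0, 1, 4]
```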



# TREES

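Tree: a root together with zero or more trees as its children. Below, a tree is written as a nested list: the first element is the root, and the remaining elements are its child trees.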

In [27]:
tree_st = ['a',['b',['b1']],['c',['c1']]]


def line_print_tree(tree):
  # works with any iterable: the first item is the node, the rest are its subtrees
  t = iter(tree)
  print(next(t))
  for child in t:
    line_print_tree(child)

line_print_tree(tree_st)


a
b
b1
c
c1


In [17]:
def print_tree(tree,level=0):
  # works with any iterable and shows the tree structure by indenting each level
  t = iter(tree)
  print(' '*level + next(t))   # print already adds the newline
  for child in t:
    print_tree(child,level= level+1)

print_tree(tree_st)

a
 b
  b1
 c
  c1



In [44]:
# Make a generic Tree ADT

class Tree:
  def __init__(self,L):
    # first item is this node's data; the remaining items are its subtrees
    iterator  = iter(L)
    self.data = next(iterator)
    self.children = [Tree(child) for child in iterator]

  def _listwithlevels(self, level, trees):
    # pre-order walk that collects one indented line per node
    trees.append("  " * level + str(self.data))
    for child in self.children:
      child._listwithlevels(level + 1, trees)

  def __str__(self):
    trees = []
    self._listwithlevels(0, trees)
    return "\n".join(trees)

  def __eq__(self,t1):
    if not isinstance(t1, Tree):
      return NotImplemented
    return self.data == t1.data and self.children == t1.children

  def __contains__(self,key):
    return self.data == key or any(key in c for c in self.children)

  def height(self):
    # a leaf has height 0
    if len(self.children) == 0:
      return 0
    return 1 + max(c.height() for c in self.children)






## Tests

In [51]:
t = Tree(tree_st)
print(str(t))

a
  b
    b1
  c
    c1


In [45]:
t1 = Tree(['a',['b',['b1']],['c',['c1']]])
t1 == t

True

In [46]:
t1 = Tree(['a',['b',['b1']]])
t1 == t

False

In [47]:
t1.height()

2

In [48]:
3 in [0,9,8]

False

In [49]:
'a' in t

True

In [50]:
'b' in t

True

In [None]:
#implement complete tree.

# BST


In [None]:
#algorithms

