<h1><center>cs1001.py , Tel Aviv University, Spring 2018</center></h1>
<img src="http://www.pngall.com/wp-content/uploads/2016/05/Python-Logo-PNG-Image-180x180.png" width=50/>

# Recitation 9

We discussed two data structures: Binary search trees and Hash tables. 
Then, we solved a challenging recursion exercise: N-queens.

#### Takeaways:
- When choosing a data structure (DS) for a specific application, consider the advantages and disadvantages of this DS and then evaluate whether it fits your application.
- Make sure you read <a href="https://github.com/taucsrec/recitations/blob/master/2018b/Michal/rec9/DataStructures_summary.pdf">the following summary</a> on the various data structures mentioned in class.
- We proved the correctness of inorder(). Proving correctness is an important skill to learn.
- Important properties of Binary search trees: 
    - Insert and find take $O(h)$ time where $h$ is the height of the tree.
    - When a tree containing $n$ nodes is balanced, $h = O(\log{n})$.
    - Many methods in this class are implemented using recursion.
- Hash tables can be useful for many algorithms, including memoization. 
- Make sure you understand the complexity analysis for hash tables (see the links below).
- Solve as many recursion questions as possible. It gets easier after about 100.
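
The remark above about memoization can be illustrated with a minimal sketch. Python's dict is a hash table, so looking up a previously computed result takes expected $O(1)$ time. The function name fib_memo and the choice of Fibonacci numbers are illustrative assumptions, not code from the recitation.

```python
def fib_memo(n, memo=None):
    """Return the n-th Fibonacci number, caching results in a dict."""
    if memo is None:
        memo = {}              # dict = hash table: expected O(1) lookup and insert
    if n in memo:              # already computed: reuse the stored value
        return memo[n]
    if n <= 1:
        result = n
    else:
        result = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    memo[n] = result           # store the result before returning it
    return result

print(fib_memo(50))            # 12586269025
```

Without the dict, the same recursion would recompute the same subproblems an exponential number of times.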

#### Code for printing several outputs in one cell (not part of the recitation):

In [26]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

##  Binary Search Trees

In [21]:
class Tree_node():
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.left = None
        self.right = None

    def __repr__(self):
        return "(" + str(self.key) + ":" + str(self.val) + ")"
    
    
    
class Binary_search_tree():

    def __init__(self):
        self.root = None


    def __repr__(self): #no need to understand the implementation of this one
        out = ""
        for row in printree(self.root): #need printree.py file
            out = out + row + "\n"
        return out


    def lookup(self, key):
        ''' return node with key, uses recursion '''

        def lookup_rec(node, key):
            if node is None:
                return None
            elif key == node.key:
                return node
            elif key < node.key:
                return lookup_rec(node.left, key)
            else:
                return lookup_rec(node.right, key)

        return lookup_rec(self.root, key)



    def insert(self, key, val):
        ''' insert node with key,val into tree, uses recursion '''

        def insert_rec(node, key, val):
            if key == node.key:
                node.val = val     # update the val for this key
            elif key < node.key:
                if node.left is None:
                    node.left = Tree_node(key, val)
                else:
                    insert_rec(node.left, key, val)
            else: #key > node.key:
                if node.right is None:
                    node.right = Tree_node(key, val)
                else:
                    insert_rec(node.right, key, val)
            return
        
        if self.root is None: #empty tree
            self.root = Tree_node(key, val)
        else:
            insert_rec(self.root, key, val)


    def minimum(self):
        ''' return node with minimal key '''
        if self.root is None:
            return None
        node = self.root
        while node.left is not None: #the leftmost node holds the minimal key
            node = node.left
        return node


    def depth(self):
        ''' return depth of tree, uses recursion'''
        def depth_rec(node):
            if node is None:
                return -1
            else:
                return 1 + max(depth_rec(node.left), depth_rec(node.right))

        return depth_rec(self.root)


    def size(self):
        ''' return number of nodes in tree, uses recursion '''
        def size_rec(node):
            if node is None:
                return 0
            else:
                return 1 + size_rec(node.left) + size_rec(node.right)

        return size_rec(self.root)
    
    def inorder(self):
        ''' print the keys of the tree in sorted (ascending) order, uses recursion '''
        def inorder_rec(node):
            if node is None:
                return
            inorder_rec(node.left)
            print(node.key)
            inorder_rec(node.right)
            
        inorder_rec(self.root)


In [22]:
t = Binary_search_tree()
t.insert(2,"hi")
t.insert(4,"tea")
t.insert(1,"mother")
t.insert(3,"CS")
t.insert(4,"recursion")

t.inorder()

1
2
3
4
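
Insert and find take $O(h)$ time, and $h = O(\log{n})$ only when the tree is balanced. The following minimal sketch shows how much the insertion order matters. It uses a standalone Node class and the helper functions bst_insert, bst_depth and balanced_order, all of which are illustrative assumptions and not part of the class above. The depth convention matches depth(): an empty tree has depth -1.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Insert key into the tree (iteratively) and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:            # key already present: nothing to do
            return root
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right

def bst_depth(node):
    """Depth of the tree rooted at node; an empty tree has depth -1."""
    if node is None:
        return -1
    return 1 + max(bst_depth(node.left), bst_depth(node.right))

def balanced_order(keys):
    """Reorder sorted keys so that medians are inserted first."""
    if not keys:
        return []
    mid = len(keys) // 2
    return [keys[mid]] + balanced_order(keys[:mid]) + balanced_order(keys[mid + 1:])

def depth_after_inserting(keys):
    root = None
    for k in keys:
        root = bst_insert(root, k)
    return bst_depth(root)

keys = list(range(127))
print(depth_after_inserting(keys))                  # 126: sorted input gives a chain
print(depth_after_inserting(balanced_order(keys)))  # 6: perfectly balanced tree
```

With the same 127 keys, a lookup may visit 127 nodes in the first tree but at most 7 in the second.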


#### Proof of the correctness of the inorder function (can also be found <a href="http://tau-cs1001-py.wdfiles.com/local--files/recitation-logs-2017b/m_09_BST_inorder_proof.pdf"> here</a>)

<img src="inorder_proof.PNG">
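
The proved property (an inorder traversal visits the keys in ascending order) can also be checked empirically. This is a minimal sketch under stated assumptions: the tree is stored as nested lists [key, left, right] rather than as Tree_node objects, and the hypothetical inorder_list returns the keys instead of printing them, so the result can be compared with sorted().

```python
import random

def insert(node, key):
    """Insert key into a BST stored as nested lists [key, left, right]."""
    if node is None:
        return [key, None, None]
    if key < node[0]:
        node[1] = insert(node[1], key)
    elif key > node[0]:
        node[2] = insert(node[2], key)
    return node                      # equal keys are ignored

def inorder_list(node):
    """Return the keys in inorder: left subtree, root, right subtree."""
    if node is None:
        return []
    return inorder_list(node[1]) + [node[0]] + inorder_list(node[2])

random.seed(0)
keys = random.sample(range(1000), 200)   # 200 distinct keys in random order
root = None
for k in keys:
    root = insert(root, k)

print(inorder_list(root) == sorted(keys))   # True
```

A passing check on random inputs is not a proof, but it is a quick way to catch a wrong traversal order.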

##  Hash

We wish to have a data structure that implements the operations: insert, search and delete in **expected** $O(1)$ time. 

Summarizing the insert and search complexity of the data structures that we have seen already:
<img src="tbl_ds.PNG">

Please read <a href="https://github.com/taucsrec/recitations/blob/master/2018b/Michal/rec9/DataStructures_summary.pdf"> the following summary</a> on the various data structures mentioned in class.

A detailed summary on the complexity of insert/search operations using <u>hash tables</u> can be found <a href="http://tau-cs1001-py.wdfiles.com/local--files/recitation-logs-2017a/hashtable_find_and_insert_complexity.pdf"> here </a>. Make sure you read it.

### Exercise: 
Given a string $st$ of length $n$ and a small integer $\ell$, write a function that checks whether there is a substring of $st$ of length $\ell$ that appears more than once.


Make sure you read the following <a href="http://tau-cs1001-py.wdfiles.com/local--files/recitation-logs-2016b/m_10_repeating_substring_additional_material.pdf">summary</a> that includes a detailed explanation on the experiments.

#### Naive solution

The complexity is $O(\ell(n-\ell)^2)$. 
There are $O((n-\ell)^2)$ iterations (make sure you understand why), and in each iteration we perform operations that take $O(\ell)$ time.

In [18]:
def repeat_naive(st, l): 
    for i in range(len(st)-l+1):
        for j in range(i+1,len(st)-l+1):
            if st[i:i+l]==st[j:j+l]:
                return True
    return False

repeat_naive("hello", 1)
repeat_naive("hello"*10, 45)
repeat_naive("hello"*10, 46)

False
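
The iteration count of the naive solution can be verified by counting the pairs $(i, j)$ that the double loop visits when no repeat is found. With $k = n-\ell+1$ substrings there are $k(k-1)/2$ pairs. The helper count_naive_comparisons is a hypothetical name used only for this sketch.

```python
def count_naive_comparisons(n, l):
    """Number of slice comparisons made by the double loop if no repeat is found."""
    count = 0
    for i in range(n - l + 1):
        for j in range(i + 1, n - l + 1):
            count += 1
    return count

n, l = 1000, 10
k = n - l + 1                               # number of substrings of length l
print(count_naive_comparisons(n, l))        # 490545
print(k * (k - 1) // 2)                     # 490545
```

Each of these comparisons costs $O(\ell)$ time, which gives the $O(\ell(n-\ell)^2)$ bound.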

A function that generates a random string of a given size

In [27]:
import random
def gen_str(size, alphabet = "abcdefghijklmnopqrstuvwxyz"):
    ''' Generate a random string of length size over alphabet '''
    s=""
    for i in range(size):
        s += random.choice(alphabet)
    return s
rndstr = gen_str(1000)
print(rndstr)
repeat_naive(rndstr, 3)
repeat_naive(rndstr, 10)



febmalrswezdzkivbtrefsnxhanprdrvzhllngitowjxtmpkjtfsyzrkidixllvphihywzysaujvroiqqujxcqfulmcdvbhuvlnnfywesalnusgpkvujzynkhjozncwceqgazsdvhgcerdhkwsnzssebevzskwnameuqfdzjgbrqvflmtvfdiwgdquwasnazdbuhzoevzqmmgeutxjyhagyzpphjusvkgkuwpxsulkvryuuppwpzpudipudezskgvbruunxslmjwxzomvcrmwwqakduxdghvnshzzewaomfvisuazqdkvqmdluxqebaecfwxnxqevmvvnpcvjyyjjylvskuumzppxxietiqhxzdqohqixveiznswqcwzykpbsaftopurfzhwrxfaxporchrboqclvcowsoulnhtolxeqzzpyvndfmlzwhcqdnlolnyoenbggxbculwbdhuyzlxkyjzyvygcgxtbxoqigmfyvtelrvvstzpzjdbmkfyqdhvctuftebczbdkqnellvupaquadlzsqxpfmbqloziuawsphvsneisbqqllbnpkasbcidmrhxixzfhwprfouhxrhewzdptfbhwdjjfdbviiktaltulkxnkpjuikgsdvwvrtclgpineslixdpxkuxoiyxpogbiihuhfyimpdggdpihunwvctacrykgpirrvwbvlvwoahdzyjrpnzlfkjgsnajateysrvijvmqzisbpgshlbzqfetsyklbjedttshbzwvqyywdgrbbtammvqulqdkqgbiukgwmsshvaezcmzfvxnxicqprasbgmkwgpkvuwgzwiijkyvyjgnybvwqtthajjawfarsaxcsfsefelkfggwnljkwinizqxmxdgaoxpjgnlfyxocjgtcnlgczdwagqiypmjsevptkttdzydqqtimpreybstdlxvnhiiazlyzmhusodxawsjfrgfxbidxtoqimuekbbdledfnesc

True

False

In [28]:
rndstr = gen_str(10000)
repeat_naive(rndstr, 3)
repeat_naive(rndstr, 10)


True

KeyboardInterrupt: 

The class Hashtable from the lectures

In [20]:
class Hashtable:
    def __init__(self, m, hash_func=hash):
        """ initial hash table, m empty entries """
        ##bogus initialization #1 ([]*m is just [], so the table has a single entry):
        #self.table = [[]*m]
        ##bogus initialization #2 (all m entries refer to the same list object):
        #empty=[]
        #self.table = [empty for i in range(m)]
        
        self.table = [ [] for i in range(m)]
        self.hash_mod = lambda x: hash_func(x) % m # using python hash function

    def __repr__(self):
        L = [self.table[i] for i in range(len(self.table))]
        return "".join([str(i) + " " + str(L[i]) + "\n" for i in range(len(self.table))])
    
    def find(self, item):
        """ returns True if item in hashtable, False otherwise  """
        i = self.hash_mod(item)
        return item in self.table[i]
        #if item in self.table[i]:
        #    return True
        #else:
        #    return False

    def insert(self, item):
        """ insert an item into table """
        i = self.hash_mod(item)
        if item not in self.table[i]:
            self.table[i].append(item)


#### Solution using the class Hashtable

The expected (average) complexity is: $O(\ell(n-\ell))$

Creating the table takes $O(n-\ell)$ time, and there are $O(n-\ell)$ iterations, each taking expected $O(\ell)$ time.



The worst case complexity is: $O(\ell(n-\ell)^2)$

Creating the table takes $O(n-\ell)$ time, and the time for executing the loop is
$\ell\cdot\sum_{i=0}^{n-\ell}{i}= O(\ell(n-\ell)^2)$ 





In [10]:
def repeat_hash1(st, l):
    m=len(st)-l+1
    htable = Hashtable(m)
    for i in range(len(st)-l+1):
        if not htable.find(st[i:i+l]):
            htable.insert(st[i:i+l])
        else:
            return True
    return False
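
The worst case arises when many substrings land in the same chain, so that each find scans a long list. The following minimal sketch forces this with a constant hash function. The names chain_lengths and bad_hash are illustrative assumptions. Unlike repeat_hash1, the sketch does not stop at the first repeat: it inserts every distinct substring so that the final chain lengths can be inspected.

```python
import random

def chain_lengths(st, l, hash_func=hash):
    """Insert all distinct length-l substrings into m chains; return the chain lengths."""
    m = len(st) - l + 1
    table = [[] for _ in range(m)]
    for i in range(m):
        sub = st[i:i + l]
        chain = table[hash_func(sub) % m]
        if sub not in chain:          # linear scan of the chain, as in find()
            chain.append(sub)
    return [len(chain) for chain in table]

def bad_hash(x):
    """A deliberately poor hash function: every item maps to bucket 0."""
    return 0

random.seed(1)
st = "".join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(300))
l = 10
good = chain_lengths(st, l)
bad = chain_lengths(st, l, bad_hash)
print("longest chain with hash():    ", max(good))
print("longest chain with bad_hash():", max(bad))
```

With bad_hash, the $i$-th insertion scans a chain of length about $i$, which is where the sum $\sum_{i=0}^{n-\ell}{i}$ in the worst-case analysis comes from.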

Which of Python's native data structures fits the solution?
<img src="tbl_container.PNG">

#### Solution using Python's set implementation

In [11]:
def repeat_hash2(st, l):
    htable = set() #Python sets use hash functions for fast lookup
    for i in range(len(st)-l+1):
        if st[i:i+l] not in htable:
            htable.add(st[i:i+l])
        else: return True
    return False

#### Competition between the 3 solutions

For a random string of length $n=1000$ and $\ell=10$, repeat_hash2 has the shortest running time and repeat_naive the longest.

When $n$ is doubled to 2000, the running time of repeat_naive increases by a factor of roughly 4 (quadratic growth in $n$), while the running times of repeat_hash1 and repeat_hash2 increase by a factor of roughly 2 (linear growth in $n$).

In [12]:
import time
str_len=1000
st=gen_str(str_len)
l=10
print("str_len=",str_len, "repeating substring len=",l)
for f in [repeat_naive,repeat_hash1,repeat_hash2]:
    t0=time.perf_counter() #time.clock() was removed in Python 3.8
    res=f(st, l)
    t1=time.perf_counter()
    print(f.__name__, t1-t0, "found?",res)

str_len= 1000 repeating substring len= 10
repeat_naive 0.2050581945000918 found? False
repeat_hash1 0.0019200138948373968 found? False
repeat_hash2 0.0009138224026621344 found? False


In [13]:
str_len=2000
st=gen_str(str_len)
l=10
print("str_len=",str_len, "repeating substring len=",l)
for f in [repeat_naive,repeat_hash1,repeat_hash2]:
    t0=time.perf_counter() #time.clock() was removed in Python 3.8
    res=f(st, l)
    t1=time.perf_counter()
    print(f.__name__, t1-t0, "found?",res)

str_len= 2000 repeating substring len= 10
repeat_naive 0.975955220728571 found? False
repeat_hash1 0.005003325681962156 found? False
repeat_hash2 0.001123823922409839 found? False


When $st$ is "a"$*1000$, all substrings of length 10 are identical, so every function finds a repeat almost immediately. repeat_hash1 is the slowest, since it first spends time creating an empty table of size 991.

In [14]:
str_len=1000
st="a"*str_len
l=10
print("str_len=",str_len, "repeating substring len=",l)
for f in [repeat_naive,repeat_hash1,repeat_hash2]:
    t0=time.perf_counter() #time.clock() was removed in Python 3.8
    res=f(st, l)
    t1=time.perf_counter()
    print(f.__name__, t1-t0, "found?",res)

str_len= 1000 repeating substring len= 10
repeat_naive 1.3815889460033759e-05 found? True
repeat_hash1 0.00028460732281843093 found? True
repeat_hash2 8.684273367975948e-06 found? True


### The effect of table size

Our solution, with control over the table size

In [15]:
def repeat_hash1_var_size(st, l, m=0):
    if m==0: #default hash table size is ~number of substrings to be inserted
        m=len(st)-l+1
    htable = Hashtable(m)
    for i in range(len(st)-l+1):
        if not htable.find(st[i:i+l]):
            htable.insert(st[i:i+l])
        else:
            return True
    return False

Comparing different table sizes

In [16]:
import time
str_len=1000
st=gen_str(str_len)
l=10
print("str_len=",str_len, "repeating substring len=",l)
for m in [1, 10, 100, 1000, 1500, 10000, 100000]:
    t0=time.perf_counter() #time.clock() was removed in Python 3.8
    res=repeat_hash1_var_size(st, l, m)
    t1=time.perf_counter()
    print(t1-t0, "found?",res, "table size=",m)

str_len= 1000 repeating substring len= 10
0.041150824117806906 found? False table size= 1
0.007006234913546905 found? False table size= 10
0.0034646303361398623 found? False table size= 100
0.003710553168474462 found? False table size= 1000
0.003364761192358401 found? False table size= 1500
0.005215695639918749 found? False table size= 10000
0.04382913297399682 found? False table size= 100000
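
One plausible reading of these timings: a very small table produces long chains, so every find is slow, while a very large table spends most of its time allocating empty lists. The following minimal sketch reports the load factor (items per bucket) and the longest chain for the same table sizes. The helper longest_chain is a hypothetical name, and gen_str is redefined here in compact form only to keep the block self-contained. Exact chain lengths vary between runs, since the string is random and Python randomizes string hashing per process.

```python
import random

def gen_str(size, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Generate a random string of length size over alphabet."""
    return "".join(random.choice(alphabet) for _ in range(size))

def longest_chain(items, m):
    """Length of the longest chain when items are hashed into m buckets."""
    counts = [0] * m
    for item in items:
        counts[hash(item) % m] += 1
    return max(counts)

st = gen_str(1000)
l = 10
subs = {st[i:i + l] for i in range(len(st) - l + 1)}   # distinct substrings
for m in [1, 10, 100, 1000, 1500, 10000, 100000]:
    print("table size=", m,
          "load factor=", len(subs) / m,
          "longest chain=", longest_chain(subs, m))
```

Choosing a table size close to the number of inserted items keeps the chains short without paying for a large empty table.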


## The N-Queens Problem

The presentation can be found <a href="http://tau-cs1001-py.wdfiles.com/local--files/recitation-logs-2017b/8queens.pdf">here</a>.

First try to understand the queens() and queens_rec() functions and then try to understand how the function legal() works.

Some intuition for queens_rec:

Assume we can find a solution for placing $k<N$ queens. How do we extend it to $k+1$ queens?
queens_rec returns the number of legal placements of $N$ queens that extend a given partial placement, in which $k$ queens are already placed in the leftmost $k$ columns and $N-k$ queens are left to place.
The recursive idea: legally place queen number $k+1$, then recursively solve the problem with one less queen to place.


Note that the number of recursive calls is bounded by $O(N!)$, a bound that grows faster than $O(2^N)$.


In [17]:
def queens(n,show=True):
    ''' how many ways to place n queens on an nXn board? '''
    partial = []    # list representing partial placement of queens
    return queens_rec(n, partial,show)

def queens_rec(n, partial,show):
    ''' Given a list representing a partial placement of queens,
        return the number of ways to legally extend it to n queens '''
    if len(partial)==n: #all n queens are placed legally
        if show:
            print(partial)
        return 1
    else:
        cnt=0
        for i in range(n):
            if legal(partial,i): #try to place a queen in row i of the next column
                cnt += queens_rec(n, partial+[i],show)
        return cnt

def legal(partial, i):
    ''' Can we place a queen in the next column in row i ? '''
    left = [j for j in partial if j==i] #any queens in the same row to the left?
    diag_up = [j for j in partial if j-partial.index(j) == i-len(partial)] #diagonal up-left
    diag_down = [j for j in partial if j+partial.index(j) == i+len(partial)] #diagonal down-left
    res = (left == diag_up == diag_down == [])
    # print ("partial=",partial,"can add queen at row", i , "?",res)
    return res
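
The diagonal tests in legal() compare row-minus-column and row-plus-column values. An equivalent way to state the same condition is that two queens share a diagonal exactly when their row distance equals their column distance. The following minimal sketch uses this formulation with enumerate. The names legal_enum and count_queens are illustrative assumptions, not the recitation's code. The counts for $n = 1, \ldots, 8$ are the well-known values.

```python
def legal_enum(partial, row):
    """Can a queen be placed in the next column at the given row?
    partial[c] is the row of the queen already placed in column c."""
    col = len(partial)                       # index of the next column
    for c, r in enumerate(partial):
        same_row = (r == row)
        same_diagonal = (abs(r - row) == col - c)
        if same_row or same_diagonal:
            return False
    return True

def count_queens(n, partial=()):
    """Number of legal placements of n queens that extend partial."""
    if len(partial) == n:                    # all n queens are placed legally
        return 1
    return sum(count_queens(n, partial + (row,))
               for row in range(n)
               if legal_enum(partial, row))

print([count_queens(n) for n in range(1, 9)])   # [1, 0, 0, 2, 10, 4, 40, 92]
```

Using enumerate avoids the call to partial.index(j), which scans the list again for every queen.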