# Searching and Sorting

## 1. Sequential
## 2. Binary Search
## 3. Hashing
## 4. Sorting

## 1. Sequential

In [48]:
arr = [10,7,2,8,1,5]
arr_s = [1,2,5,7,8,10]

In [1]:
#Unordered Sequential Search
def seq_search(arr,ele):
    
    pos = 0
    found = False
    
    while pos < len(arr) and not found:
        
        if arr[pos] == ele:
            found = True
            
        else:
            pos += 1
    
    return found

In [6]:
seq_search(arr,6)

False

In [23]:
#Ordered Sequential Search
def ord_seq_search(arr,ele):
    
    #Sort a copy of arr so that the early-stop check below is valid
    arr = sorted(arr)
    
    pos = 0
    found = False
    stopped = False
    
    while pos < len(arr) and not found and not stopped:
        
        if arr[pos] == ele:
            found = True

        else:
            #This check sits inside the else branch, so it is skipped
            #once ele has been found. If the current item is already
            #larger than ele, ele cannot appear later in a sorted list.
            if arr[pos] > ele:
                stopped = True
            pos += 1
    
    return found

In [24]:
ord_seq_search(arr,10)

True

## 2. Binary Search

In [75]:
#Iterative Method
def binary_search_iter(arr, ele):
    
    first = 0
    last = len(arr)-1

    while first <= last:
        
        mid = (first+last)//2
        
        if arr[mid] == ele:
            return True
            
        else:
            if ele < arr[mid]:
                last = mid - 1
                
            else:
                first = mid + 1
    return False

In [77]:
#Binary search requires a sorted list
binary_search_iter(sorted(arr),10)

True

In [93]:
#Recursive Method
arr = sorted(arr)
def binary_search_rec(arr,ele):
    scope = len(arr)//2
    
    #Base Case
    if len(arr) == 0:
        return False

    if arr[scope]==ele:
        return True
    
    else:
        
        if ele < arr[scope]:
            return binary_search_rec(arr[:scope],ele)

        else:
            return binary_search_rec(arr[scope+1:],ele)

    return False

In [94]:
binary_search_rec(arr, 10)

True
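
For comparison, Python's standard library `bisect` module performs the same halving search on a sorted list. The sketch below wraps `bisect.bisect_left` in a helper called `binary_search_bisect`; that name is hypothetical and not part of these notes, and the input is assumed to be sorted already.

```python
import bisect

def binary_search_bisect(sorted_arr, ele):
    # bisect_left returns the leftmost index at which ele could be
    # inserted while keeping sorted_arr in order
    idx = bisect.bisect_left(sorted_arr, ele)
    # ele is present only if that index is in range and holds ele
    return idx < len(sorted_arr) and sorted_arr[idx] == ele

print(binary_search_bisect([1, 2, 5, 7, 8, 10], 10))  # True
print(binary_search_bisect([1, 2, 5, 7, 8, 10], 6))   # False
```

Unlike the recursive version, this does not copy the list with slices on every step.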

## 3. Hashing

### Hash Function


#### Remainder Method
When presented with an item, the hash function takes the remainder of the item divided by the table size (n % x, where n is the item and x is the table size). The remainder is the item's slot number.

This method is prone to collisions (also known as clashes), where two items map to the same slot.

In [102]:
n=[54,26,93,17,77,31]
x = 11

hash_value = [val%x for val in n]
hash_value


[10, 4, 5, 6, 0, 9]

n = the items to store (6 of them here) <br/>
x = table size (11 here) <br/>
load factor = λ = number of items / table size

In the above example, λ = 6/11

Our new hash table now looks like

[77,None,None,None,26,93,17,None,None,31,54]
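
The table above can be reproduced by placing each item at index `item % table size`. This is a minimal sketch that assumes no two items collide (true for this example); `build_table` is a hypothetical helper name, not something defined elsewhere in these notes.

```python
def build_table(items, size):
    # Start with an empty table of `size` slots
    table = [None] * size
    for item in items:
        slot = item % size  # remainder method
        # This sketch only detects collisions, it does not resolve them
        if table[slot] is not None:
            raise ValueError(f"collision at slot {slot}")
        table[slot] = item
    return table

items = [54, 26, 93, 17, 77, 31]
table = build_table(items, 11)
print(table)
# [77, None, None, None, 26, 93, 17, None, None, 31, 54]
print(len(items) / len(table))  # load factor, 6/11
```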


#### Collision Resolution
When two items end up with the same hash value (e.g. 55%11 and 44%11 both return 0), we have a collision. One way to resolve it is to rehash.

##### Rehashing
Rehashing is the process of looking for another slot after a collision.

Linear Probing is a method of rehashing where the table is scanned one slot at a time, starting from the original hash value, and the first open slot is taken. In the previous example, 77 occupies slot 0, so if 44 is also inserted it will take slot 1. The problem with this is clustering: items that collide pile up in neighbouring slots.

Clustering is a problem because the larger a cluster grows, the more slots each insert and lookup has to probe, which reduces performance.

Quadratic Probing is a method of rehashing where the probe offsets are successive perfect squares, so the step between consecutive probes grows by odd numbers (1, 3, 5, 7, 9 and so on). For an original hash value h, the slots tried are h, h+1, h+4, h+9, h+16, ... (each taken modulo the table size).
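
The difference between the two probing schemes is easiest to see side by side. The sketch below is illustrative only: `insert_with_probing`, `linear` and `quadratic` are hypothetical names, and the table size of 11 is taken from the earlier example.

```python
def insert_with_probing(table, item, step):
    # step(i) gives the offset from the home slot on the i-th attempt
    size = len(table)
    home = item % size
    for i in range(size):
        slot = (home + step(i)) % size
        if table[slot] is None:
            table[slot] = item
            return slot
    # Quadratic probing can fail here even if a free slot exists
    raise RuntimeError("no free slot found")

def linear(i):
    return i        # offsets 0, 1, 2, 3, ...

def quadratic(i):
    return i * i    # offsets 0, 1, 4, 9, ...

table_lin = [None] * 11
table_quad = [None] * 11
for item in [77, 44, 55, 66]:   # all four hash to slot 0
    insert_with_probing(table_lin, item, linear)
    insert_with_probing(table_quad, item, quadratic)

print(table_lin)   # 77, 44, 55, 66 end up in slots 0, 1, 2, 3
print(table_quad)  # 77, 44, 55, 66 end up in slots 0, 1, 4, 9
```

With linear probing the four colliding items form one contiguous cluster, while quadratic probing spreads them out.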

##### Chaining
Chaining is another method of handling collisions. Each slot holds a collection (a chain) of items, which allows many items to exist at the same location in the hash table.
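
A minimal sketch of chaining, assuming each slot is a plain Python list and the remainder method is the hash function. The helper names `put_chained` and `contains_chained` are hypothetical.

```python
def put_chained(table, item):
    # Each slot holds a list (the "chain") of all items that hash there
    slot = item % len(table)
    if item not in table[slot]:
        table[slot].append(item)

def contains_chained(table, item):
    # Only the one chain for this hash value has to be searched
    return item in table[item % len(table)]

chained = [[] for _ in range(11)]
for item in [77, 44, 55, 26]:
    put_chained(chained, item)

print(chained[0])  # [77, 44, 55]
print(chained[4])  # [26]
print(contains_chained(chained, 44), contains_chained(chained, 33))  # True False
```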


#### Folding Method
Divide the item into equal-sized pieces (the last piece may not be equal in size). The pieces are then added together, and the sum is reduced with the remainder method to give the resulting hash value.

item = 436-555-4601

pieces of 2 digits = 43, 65, 55, 46, 01 (any piece size works)

sum = 210

slots = 11

hash_value = 210%11 = 1
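
The folding steps above can be sketched as a small function. `fold_hash` and its parameter names are hypothetical; the sketch assumes non-digit characters such as the dashes are simply ignored.

```python
def fold_hash(item, piece_size, slots):
    # Keep only the digits, so "436-555-4601" becomes "4365554601"
    digits = "".join(ch for ch in str(item) if ch.isdigit())
    # Cut into pieces of piece_size digits; the last piece may be shorter
    pieces = [int(digits[i:i + piece_size])
              for i in range(0, len(digits), piece_size)]
    # Add the pieces, then reduce with the remainder method
    return sum(pieces) % slots

print(fold_hash("436-555-4601", 2, 11))  # 1
```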


#### Mid Square Method
Square the item, take the middle digits of the result, and then take the remainder when dividing by the number of slots.

item = 44

squared = 1936

middle digits = 93

slots = 11

hash_value = 93%11 = 5
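
A sketch of the same calculation in code. `mid_square_hash` is a hypothetical name, and taking exactly two middle digits is an assumption made to match the example above.

```python
def mid_square_hash(item, slots):
    squared = str(item * item)
    # Take two digits around the centre of the square. For a square with
    # an odd number of digits, which two count as "middle" is a choice.
    mid = len(squared) // 2
    middle = int(squared[mid - 1:mid + 1])
    return middle % slots

print(mid_square_hash(44, 11))  # 5
```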

#### Non-Integer Elements
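ord(ch) gives the integer (ordinal) value of a single character. It does not accept a longer string, so a string is hashed by combining the ordinal values of its characters, for example by summing them.

e.g. ord('c') = 99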
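
One simple way to turn a whole string into a slot number is to sum the ordinal values of its characters and apply the remainder method. `string_hash` is a hypothetical helper, and summing is only one of several possible ways to combine the values.

```python
def string_hash(text, slots):
    # Sum the ordinal value of every character, then take the remainder
    return sum(ord(ch) for ch in text) % slots

print(ord('c'))                # 99
print(string_hash('cat', 11))  # 4
print(string_hash('act', 11))  # 4, anagrams collide with a plain sum
```

Weighting each character by its position is a common way to avoid the anagram collisions.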

### Implementing a Hash Table
In Python, the built-in dictionary (dict) does this for us. For learning purposes, we will implement our own hash table. This hash table will update the value of an existing key instead of storing multiple values for one key.

#### Map
HashTable(): Create a new, empty map. It returns an empty map collection

put(key,val): Add a new key-value pair to the map. If the key is already in the map, then replace the old value with the new value.

get(key): Given a key, return the value stored in the map, or None otherwise.

del: Delete the key-value pair from the map using a statement of the form del map[key]. (Not implemented in the class below.)

len(): Return the number of key-value pairs stored. (Not implemented in the class below.)
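
For reference, the built-in dict already supports every operation in this list. The short session below only demonstrates the interface; the keys and values are made up for illustration.

```python
m = {}               # like HashTable(): a new, empty map
m[54] = 'cat'        # put(key, val)
m[26] = 'dog'
m[54] = 'lion'       # same key: the old value is replaced
print(m.get(54))     # lion
print(m.get(99))     # None, the key is not present
del m[26]            # del map[key]
print(len(m))        # 1
```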


In [48]:
class HashTable(object):
    
    def __init__(self,size):
        self.size = size
        self.slots = [None]*self.size
        self.data = [None]*self.size
       
    def put(self,key,data):
        
        #self.size would work here too; it always equals len(self.slots)
        hashvalue = self.hashfunction(key,len(self.slots))
        
        if self.slots[hashvalue] is None:
            self.slots[hashvalue] = key
            self.data[hashvalue] = data
        
        #If the slot is already taken, either it holds the same key (update
        #the value) or we have a collision and must rehash to the next slot
        
        else:
            if self.slots[hashvalue] == key:
                self.data[hashvalue] = data
                
            else:
                nextslot = self.rehash(hashvalue,len(self.slots))
                
                #Keep probing until we hit an empty slot or the same key.
                #Note: this never terminates if the table is full and the key is absent
                while self.slots[nextslot] is not None and self.slots[nextslot] != key:
                    nextslot = self.rehash(nextslot,len(self.slots))
                if self.slots[nextslot] is None:
                    self.slots[nextslot] = key
                    self.data[nextslot] = data
                    
                else: 
                    self.data[nextslot] = data
    
    
    #The actual hash function    
    def hashfunction(self,key,size):

        return key%size
    
    def rehash(self,oldhash,size):
        return(oldhash+1)%size
    
    def get(self,key):
        startslot = self.hashfunction(key,len(self.slots))
        data = None
        stop = False
        found = False
        position = startslot
        
        while self.slots[position] is not None and not found and not stop:
            
            if self.slots[position] == key:
                found = True
                data = self.data[position]
                
            else:
                position = self.rehash(position, len(self.slots))
                
                if position == startslot: 
                    stop = True
                    
        return data
    
    '''
    Enables index-style assignment. Python calls __setitem__ internally for h[key] = value; a custom
    class does not get it for free, so we define it. We still need def put because __setitem__
    simply delegates to it.
    
    we could say 
    
    h = HashTable(5)
    h.put(5,'FIVE')
    
    or
    
    h[5] = 'five'
    
    '''
    
        
    def __setitem__(self,key,data):
        self.put(key,data)
    
    '''
    Helps with indexing and returning a value.
    
    h.get(2)
    
    vs
    
    h[2]

    '''

    def __getitem__(self,key):
        return self.get(key)

In [49]:
h = HashTable(5)

In [50]:
#Required without __setitem__
h.put(5,'FIVE')

In [51]:
#Required without __getitem__
h.get(5)

'FIVE'

In [52]:
#Possible with __setitem__
h[2] = 'two'

In [53]:
h[2]

'two'

In [54]:
h[2]="new number"

In [55]:
h.get(2)

'new number'

In [56]:
h.put(2,'two')

In [42]:
h.get(2)

'two'

In [43]:
h[2]

'two'

## 4. Sorting Algorithm
### a. Bubble Sort
### b. Selection Sort
### c. Insertion Sort
### d. Shell Sort
### e. Merge Sort
### f. QuickSort

### Bubble Sort Implementation
The largest value ends up at the end after the first pass, so you don't need to go all the way to the end on every pass; hence the range of the sort decreases by 1 each time. The n-1 passes make about n^2/2 comparisons in total, so it runs in O(n^2) time.

In [72]:
def bubble_sort(arr):
    
    #listed is another name for arr, so arr itself is sorted in place
    listed = arr
    #n counts down from len(arr)-1 to 1; each pass only compares items up to position n
    for n in range(len(arr)-1,0,-1):
        #print('Value of n: ',n)
        
        for k in range(n):
            #print('Value of k: ',k)
            
            #If the pair is out of order, swap them
            if listed[k] > listed[k+1]:
                listed[k],listed[k+1] = listed[k+1],listed[k]
                
    return listed

In [117]:
arr = [5,3,7,2,1,4,10,6]

In [88]:
bubble_sort(arr)

[1, 2, 3, 4, 5, 6, 7, 10]

In [89]:
arr

[1, 2, 3, 4, 5, 6, 7, 10]
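
A common refinement is to stop as soon as a full pass makes no swaps, since the list must then be sorted. The variant below is a sketch of that idea, not part of the original notes; `short_bubble_sort` is a hypothetical name, and it returns the number of passes so the early exit is visible.

```python
def short_bubble_sort(arr):
    # Sorts arr in place and returns the number of passes made
    passes = 0
    n = len(arr) - 1
    swapped = True
    while n > 0 and swapped:
        swapped = False
        passes += 1
        for k in range(n):
            if arr[k] > arr[k + 1]:
                arr[k], arr[k + 1] = arr[k + 1], arr[k]
                swapped = True
        n -= 1
    return passes

data = [1, 2, 3, 4, 5, 6, 7, 10]
print(short_bubble_sort(data))  # 1, already sorted so one pass is enough
```

On sorted input this makes a single pass, which is the O(n) best case; the worst case is still O(n^2).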

### Selection Sort
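Improves on the bubble sort by making only one swap per pass: each pass finds the largest remaining item and places it in its correct spot. After the second pass, the next largest is in place, and so on. This requires n-1 passes, but it still has an O(n^2) run time because the number of comparisons is unchanged.

The following selection sort works the other way around: it places the minimum first and works its way up.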

In [161]:
def selection_sort(arr):
    
    #listed is another name for arr, so the list is sorted in place
    listed = arr

    length = len(listed)
    
    for n in range(length):
        #Start by assuming the item at position n is the minimum
        spot = n
        
        #Scan everything after n, including the last item
        for k in range(n+1,length):
                
            if listed[k] < listed[spot]:
                spot = k
        
        #Swap the minimum into position n
        listed[n],listed[spot] = listed[spot],listed[n]
    
    return listed

In [162]:
arr = [5,3,7,2,1,4,10,6]
selection_sort(arr)

[1, 2, 3, 4, 5, 6, 7, 10]

### Insertion Sort
Maintain a sorted sublist that starts with just the item at position 0. Then make one pass for each item 1 through n-1 of the given list, checking the item against those already in the sorted sublist. Items greater than the new item stay to its right, and smaller items stay to its left.

#### Benefits
Simple <br/>
Efficient for small data sets, despite the O(n^2) worst case <br/>
Stable <br/>
Small memory space at O(1) <br/>
Online: can sort as it receives

In [331]:
#This creates a new list so the original is unaffected
def insertion_sort(arr):
    
    #An empty list is already sorted
    if not arr:
        return []
    
    listed = [arr[0]]
    
    for i in range(1,len(arr)):
        
        #Find the first item in the sorted list that is greater than arr[i]
        k = 0
        while k < len(listed) and listed[k] <= arr[i]:
            k += 1
        
        #Insert before it, or at the end if there is none
        listed.insert(k,arr[i])
    
    return listed
                    

In [332]:
arr = [5,3,7,2,1,4,10,6]
insertion_sort(arr)

[1, 2, 3, 4, 5, 6, 7, 10]

In [330]:
#This modifies the given list. This is the standard in-place version.
def insertion_sort2(arr):
    for i in range(1,len(arr)):
        
        currentvalue = arr[i]
        position = i
        
        #position > 0 stops the scan at the front of the list. The second test
        #shifts larger items one slot right until currentvalue's place is found
        while position > 0 and arr[position-1] > currentvalue:
            
            arr[position] = arr[position-1]
            position = position-1
            
        arr[position] = currentvalue
    
    return arr

In [329]:
arr1 = [5,3,7,2,1,4,10,6]
insertion_sort2(arr1)

[1, 2, 3, 4, 5, 6, 7, 10]
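
The "online" benefit listed earlier means the list can be kept sorted while values are still arriving. One way to illustrate this is with the standard library's `bisect.insort`, which inserts a value into an already sorted list; using it here is a substitution for illustration, not the method from these notes.

```python
import bisect

sorted_so_far = []
for value in [5, 3, 7, 2, 1, 4, 10, 6]:   # values arrive one at a time
    # insort finds the position by binary search and inserts there,
    # so sorted_so_far is sorted after every single step
    bisect.insort(sorted_so_far, value)

print(sorted_so_far)  # [1, 2, 3, 4, 5, 6, 7, 10]
```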

### Shell Sort
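Instead of breaking the list into sublists of contiguous items, the shell sort uses an increment "i" (the gap) to create a sublist by choosing all items that are "i" items apart. Each sublist is sorted with an insertion sort, and the gap is then halved until it reaches 1.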

In [348]:
def shell_sort(arr):
    sublistcount = len(arr)//2
    
    while sublistcount>0:
        print('Sublist: ',sublistcount)
        for start in range(sublistcount):
            gap_insertion_sort(arr,start,sublistcount)
            
        sublistcount = sublistcount//2
        
        
def gap_insertion_sort(arr,start,gap):
    for i in range(start+gap, len(arr),gap):
        currentvalue = arr[i] 
        position = i
        
        while position >= gap and arr[position-gap] > currentvalue:
            
            arr[position] = arr[position-gap]
            position = position - gap
            
        arr[position] = currentvalue

In [349]:
arr1 = [5,3,7,2,1,4,10,6]
shell_sort(arr1)
arr1

Sublist:  4
Sublist:  2
Sublist:  1


[1, 2, 3, 4, 5, 6, 7, 10]
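
The gap sublists can be inspected directly with extended slicing. This sketch only shows which items are grouped together for the first gap of the example list; it does not sort anything.

```python
arr1 = [5, 3, 7, 2, 1, 4, 10, 6]
gap = len(arr1) // 2  # 4, the first gap for a list of 8 items

# arr1[start::gap] picks every gap-th item beginning at index start
for start in range(gap):
    print(start, arr1[start::gap])
# 0 [5, 1]
# 1 [3, 4]
# 2 [7, 10]
# 3 [2, 6]
```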

### Merge Sort
A recursive algorithm that continually splits a list in half. If the list is empty or has one item, it is sorted by definition (the base case). If the list has more than one item, we split the list and recursively invoke a merge sort on both halves. The two sorted halves are then merged.

We need 3 while loops: the first repeatedly takes the smaller of the two front items while both halves still have items. Once one half runs out, only one of the remaining two loops runs, and it appends whatever is left in the other half.

In [386]:
def merge_sort(arr):
    
    if len(arr) > 1:
        
        mid = len(arr)//2
        lefthalf = arr[:mid]
        righthalf = arr[mid:]
        
        merge_sort(lefthalf)
        merge_sort(righthalf)
        
        i = 0 #Left half
        j = 0 #Right half
        k = 0 #Final array
        
        while i < len(lefthalf) and j < len(righthalf):
            
            if lefthalf[i] < righthalf[j]:
                arr[k] = lefthalf[i]
                i += 1
                print('first: ', arr[k], i)
                
            else:
                arr[k] = righthalf[j]
                j += 1
                print('second: ', arr[k], j) 
            k += 1
            
        while i < len(lefthalf):
            arr[k] = lefthalf[i]
            print('third: ', arr[k], i)
            i += 1
            k += 1
            
            
        while j<len(righthalf):
            arr[k] = righthalf[j]
            print('fourth: ', arr[k],j)
            j += 1
            k += 1
        
        print('Merging: ',arr)
            
    return arr

In [387]:
arr1 = [5,3,7,2,1,4,10,6]
merge_sort(arr1)

second:  3 1
third:  5 0
Merging:  [3, 5]
second:  2 1
third:  7 0
Merging:  [2, 7]
second:  2 1
first:  3 1
first:  5 2
fourth:  7 1
Merging:  [2, 3, 5, 7]
first:  1 1
fourth:  4 0
Merging:  [1, 4]
second:  6 1
third:  10 0
Merging:  [6, 10]
first:  1 1
first:  4 2
fourth:  6 0
fourth:  10 1
Merging:  [1, 4, 6, 10]
second:  1 1
first:  2 1
first:  3 2
second:  4 2
first:  5 3
second:  6 3
first:  7 4
fourth:  10 3
Merging:  [1, 2, 3, 4, 5, 6, 7, 10]


[1, 2, 3, 4, 5, 6, 7, 10]
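
The merge step can also be looked at on its own, without the recursion or the trace prints. The standalone `merge` helper below is a hypothetical sketch that returns a new list; it replaces the two clean-up loops with slices, since at most one of them has anything left to copy.

```python
def merge(left, right):
    merged = []
    i = j = 0
    # Both lists still have items: take the smaller front item
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; at most one of these slices is non-empty
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge([2, 3, 5, 7], [1, 4, 6, 10]))  # [1, 2, 3, 4, 5, 6, 7, 10]
```

Using `<=` rather than `<` takes equal items from the left half first, which keeps the sort stable.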

### Quick Sort
Uses the divide and conquer strategy like merge sort, but does not necessarily divide the list in half. Unbalanced splits can decrease performance, down to O(n^2) in the worst case.

The first step is to select a value called the pivot value (the implementation below uses the first item). The role of the pivot value is to assist with splitting the list. The position where the pivot value belongs in the final sorted list is called the split point.

In [391]:
def quick_sort(arr):
    
    quick_sort_help(arr,0,len(arr)-1)
    
    return arr

def quick_sort_help(arr,first,last):
    
    if first < last:
        
        splitpoint = partition(arr,first,last)
        
        quick_sort_help(arr, first, splitpoint-1)
        quick_sort_help(arr, splitpoint+1, last)

def partition(arr,first,last):
    
    pivotvalue = arr[first]
    
    leftmark = first+1
    rightmark = last
    
    done = False
    
    while not done:
        
        while leftmark <= rightmark and arr[leftmark] <= pivotvalue:
            
            leftmark += 1
            
        while arr[rightmark] >= pivotvalue and rightmark >= leftmark:
            rightmark -= 1
            
        if rightmark < leftmark:
            done = True
            
        else:
            arr[leftmark],arr[rightmark] = arr[rightmark],arr[leftmark]
            
    #Put the pivot into its final position, the split point
    arr[first],arr[rightmark] = arr[rightmark],arr[first]
    
    return rightmark

In [392]:
arr1 = [5,3,7,2,1,4,10,6]
quick_sort(arr1)

[1, 2, 3, 4, 5, 6, 7, 10]
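
To make the pivot value and split point concrete, here is a deliberately simple, non in-place sketch. It is not the partition scheme used above: it builds new lists with comprehensions, and `quick_sort_simple` is a hypothetical name.

```python
def quick_sort_simple(arr):
    # Lists of length 0 or 1 are already sorted (base case)
    if len(arr) <= 1:
        return arr[:]
    pivot = arr[0]
    smaller = [x for x in arr[1:] if x < pivot]
    larger = [x for x in arr[1:] if x >= pivot]
    # len(smaller) is the split point: the pivot's index in the result
    return quick_sort_simple(smaller) + [pivot] + quick_sort_simple(larger)

data = [5, 3, 7, 2, 1, 4, 10, 6]
split_point = sum(1 for x in data[1:] if x < data[0])
print(split_point)                            # 4
print(quick_sort_simple(data))                # [1, 2, 3, 4, 5, 6, 7, 10]
print(quick_sort_simple(data)[split_point])   # 5, the pivot sits at the split point
```

If the input is already sorted, `smaller` is always empty, which is the unbalanced case that leads to O(n^2) behaviour when the first item is the pivot.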