https://bradfieldcs.com/algos/searching/searching/

### The Sequential Search 

In [134]:
def sequential_search(input_list, item):
    for i in range(len(input_list)):
        if item == input_list[i]:
            return True
    return False

In [135]:
testlist = [1, 2, 32, 8, 17, 19, 42, 13, 0]

print(sequential_search(testlist, 3))  # => False
print(sequential_search(testlist, 13))  # => True

False
True


### Analysis of Sequential Search

In [136]:
def ordered_sequential_search(input_list, item):
    for i in range(len(input_list)):
        current_item = input_list[i]
        if item == current_item:
            return True
        elif item < current_item:
            # The list is sorted, so every later item is larger still: stop early.
            return False
    return False

In [137]:
testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]

print(ordered_sequential_search(testlist, 3))  # => False
print(ordered_sequential_search(testlist, 13))  # => True

False
True


### The Binary Search

#### The Binary Search (slice mode)

In [138]:
def binary_search(input_list, item):
    length = len(input_list)
    if length == 0:
        return False
    # No separate length == 1 case is needed; the general branch below handles it.
    else:
        index_to_look = length//2
        current_item = input_list[index_to_look]
        if current_item == item:
            return True
        elif current_item < item:
            return binary_search(input_list[index_to_look+1:], item)
        elif current_item > item:
            return binary_search(input_list[:index_to_look], item)

In [139]:
testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(binary_search(testlist, 3))  # => False
print(binary_search(testlist, 13))  # => True

False
True


Even though a binary search is generally better than a sequential search, it is important to note that for small values of n, the additional cost of sorting is probably not worth it. In fact, we should always consider whether it is cost effective to take on the extra work of sorting to gain searching benefits. If we can sort once and then search many times, the cost of the sort is not so significant. However, for large lists, sorting even once can be so expensive that simply performing a sequential search from the start may be the best choice.
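The "sort once, search many times" case can be sketched with the standard library's `bisect` module. This is only an illustration: the helper name `contains_sorted` and the query values are assumptions made for this example, not part of the original notebook.

```python
import bisect

def contains_sorted(sorted_list, item):
    """Binary search on an already sorted list (hypothetical helper)."""
    i = bisect.bisect_left(sorted_list, item)  # leftmost position where item could go
    return i < len(sorted_list) and sorted_list[i] == item

data = [1, 2, 32, 8, 17, 19, 42, 13, 0]
sorted_data = sorted(data)  # pay the sorting cost once

# ...then reuse the sorted list for as many lookups as needed.
queries = [3, 13, 42, 100]
results = [contains_sorted(sorted_data, q) for q in queries]
print(results)  # [False, True, True, False]
```

If only a handful of lookups are ever made, the call to `sorted` may cost more than it saves, which is the trade-off described above.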

#### The Binary Search (non-slice mode)

In [140]:
def binary_search_non_slice(input_list, item, start_index=0, end_index=None):
    if end_index is None:
        end_index = len(input_list)  # search the half-open range [start_index, end_index)
    length = end_index - start_index

    if length == 0:
        return False
    else:
        index_to_look = start_index + length // 2
        current_item = input_list[index_to_look]
        if current_item == item:
            return True
        elif current_item < item:
            return binary_search_non_slice(input_list, item, start_index=index_to_look+1, end_index=end_index)
        elif current_item > item:
            return binary_search_non_slice(input_list, item, start_index=start_index, end_index=index_to_look)

In [141]:
testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(binary_search_non_slice(testlist, 3))  # => False
print(binary_search_non_slice(testlist, 13))  # => True

False
True


### Time Test

In [142]:
testlist = [2*i for i in range(100_000_000)]

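Note: when the item is not in the list, the ordered sequential search is faster, because its second check (item < current_item) lets the loop stop as soon as it passes the position where the item would be. The difference is in performance only; both functions return the same result, as the timings below show.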

In [143]:
value = 51_000_001
%time print(sequential_search(testlist, value))
%time print(ordered_sequential_search(testlist, value))
%time print(binary_search(testlist, value))
%time print(binary_search_non_slice(testlist, value))

False
CPU times: user 3.88 s, sys: 0 ns, total: 3.88 s
Wall time: 3.87 s
False
CPU times: user 1.46 s, sys: 0 ns, total: 1.46 s
Wall time: 1.46 s
False
CPU times: user 821 ms, sys: 173 ms, total: 993 ms
Wall time: 987 ms
False
CPU times: user 47 µs, sys: 7 µs, total: 54 µs
Wall time: 56.5 µs


### Hashing

A collision occurs when two different items hash to the same slot.

#### Hash Functions

folding method 

mid-square method
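A rough sketch of both methods for integer keys follows. The function names, the two-digit piece size, and the sample keys are assumptions chosen for illustration; other variants of each method exist.

```python
def folding_hash(number, tablesize, piece_size=2):
    """Folding: cut the digits into equal-size pieces, add them, take the remainder."""
    digits = str(number)
    pieces = [int(digits[i:i + piece_size]) for i in range(0, len(digits), piece_size)]
    return sum(pieces) % tablesize

def mid_square_hash(number, tablesize, n_digits=2):
    """Mid-square: square the key, keep the middle digits, take the remainder."""
    squared = str(number * number)
    start = max(0, (len(squared) - n_digits) // 2)
    middle = int(squared[start:start + n_digits])
    return middle % tablesize

print(folding_hash(4365554601, 11))  # pieces 43, 65, 55, 46, 01 sum to 210 -> 1
print(mid_square_hash(44, 11))       # 44 * 44 = 1936, middle digits 93 -> 5
```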

In [151]:
def hash_(astring, tablesize):
    the_sum = sum(ord(char) for char in astring)
    return the_sum % tablesize    

In [153]:
hash_("12weere",50)

35
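One weakness of a plain character-sum hash is that anagrams always land in the same slot, since addition ignores order. The short check below repeats the `hash_` definition so it runs on its own; the strings and the table size of 11 are arbitrary choices for illustration.

```python
def hash_(astring, tablesize):
    the_sum = sum(ord(char) for char in astring)
    return the_sum % tablesize

# "cat" and "act" contain the same characters, so their sums are equal.
print(hash_("cat", 11), hash_("act", 11))  # 4 4
```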

You may be able to think of a number of additional ways to compute hash values for items in a collection. The important thing to remember is that the hash function has to be efficient so that it does not become the dominant part of the storage and search process. If the hash function is too complex, then it becomes more work to compute the slot name than it would be to simply do a basic sequential or binary search as described earlier. This would quickly defeat the purpose of hashing.

#### Collision Resolution

Collision resolution with linear probing

the tendency for clustering

Collision resolution using “plus 3”

rehashing

The “plus x” rehash can be defined as:

rehash(pos) = (pos + x) % sizeoftable

where x is the “skip”.

It is important to note that the size of the “skip” must be such that all the slots in the table will eventually be visited. Otherwise, part of the table will be unused. To ensure this, it is often suggested that the table size be a prime number. This is the reason we have been using 11 in our examples.
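The effect of the table size on slot coverage can be checked directly. The sketch below uses a hypothetical helper, `visited_slots`, that keeps applying the rehash until it returns to a slot it has already seen; the skip of 3 and the sizes 11 and 12 are example values.

```python
def visited_slots(start, skip, tablesize):
    """Slots reached by repeatedly applying (pos + skip) % tablesize."""
    seen = []
    pos = start
    while pos not in seen:
        seen.append(pos)
        pos = (pos + skip) % tablesize
    return seen

print(visited_slots(0, 3, 11))  # prime table size: all 11 slots are reached
print(visited_slots(0, 3, 12))  # 3 divides 12: only [0, 3, 6, 9] are reached
```

With a prime table size, any skip that is not a multiple of the table size shares no common factor with it, so the probe sequence cycles through every slot.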


A variation of the linear probing idea is called quadratic probing. Instead of using a constant “skip” value, we use a rehash function that increments the hash value by 1, 3, 5, 7, 9, and so on. This means that if the first hash value is h, the successive values are h+1, h+4, h+9, h+16, and so on. In other words, quadratic probing uses a skip consisting of successive perfect squares.
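As a small illustration, the probe sequence can be generated from the squares directly. The function name `quadratic_probe_sequence` and the table size of 11 are assumptions for this sketch; the modulo step wraps the sequence around the table.

```python
def quadratic_probe_sequence(h, tablesize, attempts):
    """Slots tried by quadratic probing: h, h+1, h+4, h+9, ... (mod tablesize)."""
    return [(h + i * i) % tablesize for i in range(attempts)]

offsets = [i * i for i in range(5)]                         # 0, 1, 4, 9, 16
increments = [b - a for a, b in zip(offsets, offsets[1:])]  # step between probes

print(increments)                          # [1, 3, 5, 7]
print(quadratic_probe_sequence(0, 11, 5))  # [0, 1, 4, 9, 5]
```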

Collision resolution with chaining

### Analysis of Hashing

The most important piece of information we need to analyze the use of a hash table is the load factor, λ. Conceptually, if λ is small, then there is a lower chance of collisions, meaning that items are more likely to be in the slots where they belong. If λ is large, meaning that the table is filling up, then there are more and more collisions. This means that collision resolution is more difficult, requiring more comparisons to find an empty slot. With chaining, more collisions mean more items on each chain.
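To make the load factor concrete, here is a minimal chaining table for integer keys, with λ computed as the number of stored items divided by the number of slots. The class `ChainedHashTable`, its method names, and the sample keys are all assumptions for this sketch rather than code from the original notebook.

```python
class ChainedHashTable:
    """Minimal hash set for integer keys using chaining (illustrative only)."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [[] for _ in range(size)]  # one chain (list) per slot
        self.count = 0

    def _hash(self, key):
        return key % self.size

    def put(self, key):
        chain = self.slots[self._hash(key)]
        if key not in chain:
            chain.append(key)
            self.count += 1

    def contains(self, key):
        # Only the chain in the key's slot has to be scanned.
        return key in self.slots[self._hash(key)]

    def load_factor(self):
        return self.count / self.size

table = ChainedHashTable(size=11)
for key in [54, 26, 93, 17, 77, 31, 44, 55, 20]:
    table.put(key)

print(table.load_factor())                    # 9 items in 11 slots
print(max(len(chain) for chain in table.slots))  # longest chain: 3 (77, 44, 55 in slot 0)
```

As more keys are added without growing the table, λ rises and the chains get longer, so each lookup scans more items.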

See the URL at the top of this notebook for a more precise analysis.