## Search Algorithms

Array vs Python List

### Array
- static size, cannot be changed during runtime
- all elements in the array must be of the same type
- generally the lower bound is 1, so the first element has index 1, e.g. arr[1] accesses the first element, unless otherwise stated.
- the upper bound used in the declaration of the array is inclusive
- DECLARE \<identifier\>:ARRAY[\<lower\>:\<upper\>] OF \<data type\>
- DECLARE a : ARRAY[1:30] OF INTEGER
- because they are fixed in size, they are typically faster and more memory-efficient than Python lists.

### Python List
- dynamic in size (can grow and shrink as needed)
- can contain elements of different types
- accessed using zero-based indexing
- DECLARE A : LIST // list is a resizeable array of items
- lists are more flexible but come with some overhead due to their dynamic nature.
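
A short Python sketch of the contrast above. Python has no built-in fixed-size array, so the standard-library `array` module is used here only as an assumed stand-in for the "same type" rule; unlike the pseudocode ARRAY, it can still be resized.

```python
from array import array

# A Python list can grow, shrink and mix element types.
items = [10, "ten", 10.0]
items.append(True)        # grows at runtime
items.pop(0)              # shrinks at runtime
first_item = items[0]     # zero-based indexing: index 0 is the first element

# array("i") only accepts integers, mirroring "all elements must be of the same type".
# Note: unlike the pseudocode ARRAY, it is not fixed in size.
numbers = array("i", [1, 2, 3])
try:
    numbers.append("four")   # wrong element type
    type_error_raised = False
except TypeError:
    type_error_raised = True
```

After running this, `items` holds three values of three different types, while the attempt to append a string to `numbers` is rejected.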


## 1) Linear Search

A **linear search**, also called a **serial** or **sequential** search, looks for an item by checking the elements of a given array one after another until the item is found or the end of the collection is reached. It does not require the data to be in any particular order.

Finding the position of a particular value involves looking at each value in turn, starting with the first, and comparing it with the search criterion.
- When the search criterion is met, you need to record or return its location in the collection.
- You must also be able to report the special case that the value has not been found. This case only becomes apparent when the search has reached the final data item without finding the required value.

### Example

In this example, you have the array `[10,14,19,26,27,31,33,35,42,44]` and you are looking for the value `33` in the array.
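
As a hedged illustration of this example, the sketch below walks through the array and counts the comparisons made. The helper name `trace_linear_search` is hypothetical and exists only for this trace.

```python
def trace_linear_search(values, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for index, value in enumerate(values):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons

data = [10, 14, 19, 26, 27, 31, 33, 35, 42, 44]
found = trace_linear_search(data, 33)    # (6, 7): index 6, after 7 comparisons
missing = trace_linear_search(data, 50)  # (-1, 10): every item was checked
```

The second call shows the special case: the search only knows the value is missing once all 10 items have been compared.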


Implement the linear search function below. It returns the index of the searched value in the array if it exists. In the case that the value is not in the array, the function returns `-1`.

In [None]:
## Code
# linear search to just return the index
def linearsearchfirst(A, t):
    for i in range(0, len(A)): # range does not include the stop index
        if A[i] == t:
            return i
    return -1


### Exercise 0

Write the pseudocode for linearsearchfirst below.

In [None]:
# return the list of indices
def linearsearch(A, t):
    ret_list = []
    for i in range(0, len(A)): # range does not include the stop index
        if A[i] == t:
            ret_list.append(i)
    if ret_list:
        return ret_list
    else:
        return None


Read the pseudocode below and describe what it does.

In [None]:
// a one-dimensional array is declared as follows:
// DECLARE <identifier>:ARRAY[<lower>:<upper>] OF <data type>
FUNCTION LINEARSEARCH(A: ARRAY[0:N-1] OF INTEGER, t: INTEGER) RETURNS LIST

    DECLARE RET_LIST AS LIST// List is a dynamic storage ADT

    FOR i = 0 TO N-1
        IF A[i] == t THEN
            APPEND(RET_LIST, i) //  APPEND will add an item to the collection
        ENDIF
    NEXT i

    IF RET_LIST IS NOT EMPTY THEN
        RETURN RET_LIST
    ELSE
        RETURN NONE
    ENDIF

ENDFUNCTION


### Exercise 1
Code the above algorithm as a Python function, `linearsearch1`.

In [None]:
## Code
def linearsearch1(A, t):
    pass  # your code here


linearsearch1([1,2,6,21,5,23,5,7,2,2,1,5], 5)

### Exercise 2
Generate a list of 50 random numbers between 1 and 10.

Write a Python function, linear_search2 to perform a linear search on a list for all occurrences of a particular number by returning

a) their indices in the list

b) the list itself with the found items converted to a string with an * prefixed.   
Example [2, '*1', 12, '*1'] should be output when searching for the number 1

In [None]:
## Code


### Exercise 3
- Write a function to find the floor and ceiling of a target, where

    - floor is defined as the largest number in the array smaller than target and
    - ceiling is the smallest number in the array larger than target.

- Example: in the array [4,1,5,2,3,10], the floor and ceiling of 3 are 2 and 4 respectively. A None value is used if the floor or ceiling is not found.

- Write the test cases to test the function


In [None]:
## Code
# target must be inside the array
def fcsearch(A: list, t: int) -> tuple:
    target_found = False
    floor, ceiling = None, None
    # first pass: check that the target is present
    for i in range(len(A)):
        if A[i] == t:
            target_found = True
    if target_found:
        for i in A:
            # floor: largest value smaller than t
            if i < t and (floor is None or i > floor):
                floor = i
            # ceiling: smallest value larger than t
            if i > t and (ceiling is None or i < ceiling):
                ceiling = i
        return (floor, ceiling)
    else:
        return (None, None)

In [None]:
def find_floor_ceiling(arr, target):

    floor = None
    ceiling = None

    for num in arr:
        if num < target:
            if floor is None or num > floor:
                floor = num
        elif num > target:
            if ceiling is None or num < ceiling:
                ceiling = num

    return floor , ceiling

find_floor_ceiling((4,1,5,2,3,10),3)

(2, 4)
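
The exercise also asks for test cases. The sketch below is one possible set, with the function restated compactly so the block runs on its own; the choice of cases is an assumption, not a prescribed list. Note that `find_floor_ceiling` does not require the target to be present in the array.

```python
def find_floor_ceiling(arr, target):
    # Same logic as the cell above, repeated so this block is self-contained.
    floor = ceiling = None
    for num in arr:
        if num < target and (floor is None or num > floor):
            floor = num
        elif num > target and (ceiling is None or num < ceiling):
            ceiling = num
    return floor, ceiling

# (array, target, expected result) triples
test_cases = [
    ([4, 1, 5, 2, 3, 10], 3, (2, 4)),        # normal: floor and ceiling both exist
    ([4, 1, 5, 2, 3, 10], 1, (None, 2)),     # boundary: target is the minimum
    ([4, 1, 5, 2, 3, 10], 10, (5, None)),    # boundary: target is the maximum
    ([4, 1, 5, 2, 3, 10], 7, (5, 10)),       # target not in the array
    ([], 3, (None, None)),                    # empty array
    ([3, 3, 3], 3, (None, None)),             # only duplicates of the target
]
results = [find_floor_ceiling(arr, t) == expected for arr, t, expected in test_cases]
```

Every entry of `results` should be `True` if the function behaves as specified.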

In [None]:
def xxx(A, t):
    f, c = float('-inf'), float('inf')
    flag = False
    for i in A:
        if i == t:
            flag = True
        if f<i<t:
            f = i
        if t<i<c:
            c = i
    if flag:
        return [f,c]

print(xxx([4,1,5,2,3,10], 4))


[3, 5]


In [None]:
fcsearch([4,1,5,2,3,10], 3)

### Exercise 4

The text file `students.csv` contains scores for quizzes that the students have taken. Each line of the file stores the results of a single student and has the following format:

>```python
><student name>,<Quiz 1>,<Quiz 2>,<Quiz 3>,<Quiz 4>
>```

Here, \<student name\> is a string of letters, and each \<Quiz i\> corresponds to an integer.

The maximum values for the Quizzes are as follows.
- Quiz 1 and 2: 50
- Quiz 3 and 4: 100

The overall score for each student may be calculated by using the following weights:
- Quiz 1: 10%
- Quiz 2: 15%
- Quiz 3: 25%
- Quiz 4: 50%   

 Round up the overall score to the nearest integer value
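
One way to read the weighting is sketched below. It assumes that each quiz mark is first scaled by its maximum before the weight is applied, and that "round up" means taking the ceiling; both readings are assumptions, and the function name `overall_score` is hypothetical.

```python
def overall_score(q1, q2, q3, q4):
    """Weighted overall score out of 100, rounded up (assumed reading of the task)."""
    # Work in hundredths of a mark to avoid floating-point error:
    #   q1/50 * 10  -> 20*q1 hundredths     q2/50 * 15  -> 30*q2 hundredths
    #   q3/100 * 25 -> 25*q3 hundredths     q4/100 * 50 -> 50*q4 hundredths
    hundredths = 20 * q1 + 30 * q2 + 25 * q3 + 50 * q4
    return -(-hundredths // 100)   # ceiling division on integers

full_marks = overall_score(50, 50, 100, 100)   # 100
sample = overall_score(40, 45, 80, 90)         # 8 + 13.5 + 20 + 45 = 86.5, rounded up to 87
```

Using integer arithmetic keeps the rounding exact, which matters when a score lands just above a whole number.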


Use OOP to model the student data and operations, and write a script that will perform the following.

1. Repeatedly ask the user if they wish to perform a search. If they reply no, the script terminates, else the script continues.

2.  Asks the user to input one of the following search categories:
	- Overall score
	- Quiz 1
	- Quiz 2
	- Quiz 3
	- Quiz 4

3. Asks the user to input a search value (i.e., the score that they wish to search for).

4. Prints ALL the names of students that satisfy the search criteria.

5. The script should then continue to the main loop in 1.



In [None]:
## Code your Student class here

In [None]:
## Rest of your code

____
## 2) Binary Search

**Array MUST be sorted in ascending or descending order**
<ol>
<li> When the array is empty, return NOT_FOUND
<li>Find the position of the middle element.
<li>If it meets the search criteria, return the middle element's position.
<li>If our target is smaller than the middle element, repeat search on the left of the list.
Else, repeat search on the right of the list.
</ol>

<div style="background-color:rgba(0, 0, 0, 0.0470588); padding:10px 0;font-family:monospace;">
    <font color = "blue">FUNCTION</font> bin_search(A: <font color = "blue">ARRAY[0: N-1]</font>, target: <font color = "blue">INTEGER</font>) <font color = "blue">RETURNS BOOLEAN</font><br>
    &nbsp;&nbsp;&nbsp;&nbsp;LB &larr; 0<br>
    &nbsp;&nbsp;&nbsp;&nbsp;UB &larr; LEN(A)-1 <font color = "green">//LEN function gives the size of the array</font><br>
    &nbsp;&nbsp;&nbsp;&nbsp;<font color = "blue">WHILE</font> LB <= UB:<br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;MID &larr; (LB+UB) <font color = "blue">DIV</font> 2 <br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color = "blue">IF</font> A[MID] = target <font color = "blue">THEN</font><br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color = "blue">RETURN</font> TRUE<br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color = "blue">ELSE IF</font> target < A[MID] <font color = "blue">THEN</font><br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;UB &larr; MID-1<br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color = "blue">ELSE</font> <font color = "green">//target > A[MID]</font><br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LB &larr; MID+1<br>
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color = "blue">ENDIF</font><br>
    &nbsp;&nbsp;&nbsp;&nbsp;<font color = "blue">ENDWHILE</font><br>
    &nbsp;&nbsp;&nbsp;&nbsp;<font color = "blue">RETURN</font> FALSE<br>
<font color = "blue">ENDFUNCTION</font>
</div>

In [1]:
## Naive iterative Binary Search Code
## Assume that the array is sorted in ascending order
## Return True if the target is found, else False: O(log n)
def bin_search(a,target):
    lb = 0
    ub = len(a) - 1
    while lb<=ub:
        mid = (lb+ub)//2
        if a[mid] == target:
            return True
        elif a[mid] > target:
            ub = mid - 1
        else:
            lb = mid  + 1

    return False

bin_search([1,2,3,4,5,6,10,12,15],3)

True

In [2]:
A = [1,2,3,4, 6, 8, 10, 12, 13]
bin_search(A, 3)

True

### Exercise 5
Complete the code below for the iterative binary search for the index of the first occurrence of the target. If the target is not found, return -1.

In [None]:
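## Find the first occurrence of the target
def bin_search2(a,target):
    lb = 0
    ub = len(a) - 1
    first = -1  # index of the leftmost match found so far
    while lb<=ub:
        mid = (lb+ub)//2
        if a[mid] == target:
            first = mid   # record the match, then keep searching to the left
            ub = mid - 1
        elif a[mid] > target:
            ub = mid - 1
        else:
            lb = mid  + 1

    return first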

bin_search2([1,2,3,4,5,6,10,12,15],3)

2

In [3]:
## Test Driver
## bin_search is the iterative version, so it takes only the array and the target
A = [2,4,6,8,10]
bin_search(A, 6)

True

## Write the pseudocode and flowchart for the recursive Binary Search

Pseudocode:



Draw your flowchart on paper

In [None]:
# Code the recursive version of the Binary Search


### Exercise 6
- What is the limitation of the recursive version?
- Design the test cases to verify the correctness of your code

***Recursive binary search is limited by the depth (number) of recursive calls, since each function call uses space on the call stack. The size of the call stack therefore limits the depth of the recursive search.***
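
To make the depth limit concrete, here is a minimal sketch of a recursive binary search that also reports how deep the recursion went. The name `bin_search_rec` and the extra `depth` parameter are assumptions added for illustration. Because the range is halved on every call, the depth grows only logarithmically with the array size, so the limit is rarely reached in practice even though every call still uses stack space.

```python
import sys

def bin_search_rec(a, target, lb, ub, depth=1):
    """Recursive binary search on an ascending list; returns (found, depth reached)."""
    if lb > ub:                      # empty range: target is not present
        return False, depth
    mid = (lb + ub) // 2
    if a[mid] == target:
        return True, depth
    if target < a[mid]:
        return bin_search_rec(a, target, lb, mid - 1, depth + 1)
    return bin_search_rec(a, target, mid + 1, ub, depth + 1)

data = list(range(0, 2_000_000, 2))   # 1,000,000 sorted even numbers
found, depth = bin_search_rec(data, 1, 0, len(data) - 1)   # 1 is odd, so it is absent
limit = sys.getrecursionlimit()       # the interpreter's current recursion limit
```

Comparing `depth` with `limit` shows how much headroom remains for a million-element array.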

In [None]:
### Test Cases Here
bin_search([2,4,6,8,10], 6, 0, 4) #valid
bin_search([2,4,6,8,10], 5, 0, 4) #valid
bin_search([2], 6, 0, 0)#boundary
bin_search([4,2,5,7,4], 6, 0, 0)#invalid when array is not in order

____
## 3) Big O notation

A problem can be solved in different ways, with different algorithms. Clearly, we want to use time and memory efficiently. A way of comparing the efficiency of algorithms has been devised using order of growth as a function of the size of the input.
- **Big O** notation is used to **classify algorithms** according to how their running time (or space requirements) grows as the input size grows.
- The letter *O* is used because the growth rate of a function is also referred to as 'order of the function'.
- The worst-case scenario is used when calculating the order of growth for very large data sets.

Consider the linear search algorithm covered at the beginning.
- The worst-case scenario is that the item searched for is the last item in the list, or is not in the list at all.
- The longer the list, the more comparisons (`if A[i] == target`) have to be made. If the list is twice as long, twice as many comparisons have to be made.
- Generally, we can say the order of growth is linear. We write this as *O($n$)*, where $n$ is the size of the data set.
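
The doubling claim can be checked with a small counting sketch. The helper `count_comparisons` is hypothetical and simply counts the `==` tests a linear search performs.

```python
def count_comparisons(values, target):
    """Count how many `==` comparisons a linear search makes."""
    comparisons = 0
    for value in values:
        comparisons += 1
        if value == target:
            break
    return comparisons

# Worst case: the target is the last item.
small = count_comparisons(list(range(1000)), 999)    # 1000 comparisons
large = count_comparisons(list(range(2000)), 1999)   # 2000 comparisons
```

Doubling the list length from 1000 to 2000 doubles the number of comparisons, which is what linear growth means.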

Consider the binary search for the worst case scenario where the target is present in the first position.

The basic operation for this algorithm is the comparison `IF A[MID] = TARGET`

| n     | Number of Comparisons | List, target                          |
| :---: |         :---:         | :---                                  |
| 1     |           1           |                                       |
| 4     |           2           | [2,4,6,8], 2                          |
| 8     |           3           | [2,4,6,8,10,12,14,16], 2              |
| 11    |           3           | [2,4,6,8,10,12,14,16,18,20,22], 2     |

With each iteration, this algorithm halves the number of values left to search. This repeated halving produces a growth curve that rises most steeply at the beginning and slowly flattens out as the size of the data set increases. This type of algorithm is described as O($\log_2 n$).
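
The comparison counts in the table can be reproduced with the sketch below, which instruments an iterative binary search. The helper name `count_bin_comparisons` is an assumption; the last printed column is $\lfloor \log_2 n \rfloor + 1$, the largest number of comparisons a binary search on $n$ items can need.

```python
import math

def count_bin_comparisons(a, target):
    """Count the `a[mid] == target` comparisons made by an iterative binary search."""
    lb, ub, comparisons = 0, len(a) - 1, 0
    while lb <= ub:
        mid = (lb + ub) // 2
        comparisons += 1
        if a[mid] == target:
            break
        elif a[mid] > target:
            ub = mid - 1
        else:
            lb = mid + 1
    return comparisons

for n in (1, 4, 8, 11, 1024):
    evens = [2 * k for k in range(1, n + 1)]   # [2, 4, ..., 2n]
    print(n, count_bin_comparisons(evens, 2), math.floor(math.log2(n)) + 1)
```

Going from 11 items to 1024 items raises the count from 3 to only 10 comparisons.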

Two further very important principles in working with the Big-O notation:
- Constant factors don't matter. In other words, if we have $O\left(df\left(n\right)\right)$ for some constant $d>0$, then the order of growth is $O\left(f\left(n\right)\right)$, i.e. we ignore multiplicative constants.
- Low-order terms don't matter. For example, if we have $O\left(n^3+n\right)$, then the order of growth is $O\left(n^3\right)$. In particular, we can ignore additive constants as well.
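
A quick numerical check of both principles, as a sketch with hypothetical helper names: a constant factor does not change how the running time scales when the input doubles, and the low-order term $n$ becomes negligible next to $n^3$.

```python
def doubling_factor(f, n):
    """How much f grows when the input size doubles from n to 2n."""
    return f(2 * n) / f(n)

# Constant factors: n and 5n both double when n doubles.
plain = doubling_factor(lambda n: n, 1000)        # 2.0
scaled = doubling_factor(lambda n: 5 * n, 1000)   # 2.0

def ratio(n):
    """How close n^3 + n is to n^3 alone."""
    return (n**3 + n) / n**3

# Low-order terms: the ratio approaches 1 as n grows.
ratios = [ratio(n) for n in (10, 100, 1000)]
```

The values in `ratios` shrink towards 1, so for large $n$ the extra $+n$ makes no practical difference.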

The table below shows a summary of standard algorithms and their order of growth (time complexity)

| Order of growth   | Example              | Explanation                          |
| :---              |         :---         | :---                                 |
|O(1) | FUNCTION GetFirstItem(List: ARRAY)<br>RETURN List[1] | The complexity of the algorithm does not change regardless of data set size |
|O($n$)             | Linear search | Linear Growth |
|O($\log n$) | Binary search | The total time taken increases as the data set size increases, but each comparison halves the data set. So the time taken increases by ever smaller amounts: doubling the data set adds only one more comparison. |
|O($n^2$) | Bubble sort <br> Insertion sort | Polynomial growth <br> Common with algorithms that involve nested iterations over the data set |
|O($n^3$) | | Polynomial growth <br> Deeper nested iterations will result in O($n^3$), O($n^4$), $\ldots$ |
|O($2^n$) | Recursive calculation of Fibonacci numbers | Exponential growth |

![Comparing Complexity](https://www.happycoders.eu/wp-content/uploads/2020/05/big-o-notation-comparing-complexity-classes-v2.png)

#### Another perspective

The following table lists the common running times for algorithms and their names.

<center>

| Big-Oh | Name |
|-|-|
| $O\left(1\right)$ | constant |
| $O\left(\log n \right)$ | logarithmic |
| $O\left(n\right)$ | linear |
| $O\left(n\log n\right)$ | log-linear |
| $O\left(n^2\right)$ | quadratic |
| $O\left(n^k\right)$ | polynomial, $k\in \mathbb{Z^+}, k\geq 2$ |
| $O\left(k^n\right)$ | exponential, $k\in \mathbb{Z^+}, k\geq 2$ |

</center>

The entries in the table above are arranged in order of increasing running time (i.e. decreasing <b>efficiency</b>): the lower an entry is in the table, the slower the algorithm.

For most cases, the following holds for algorithms:
- $O(1)$ - algorithm doesn't depend on input size
- $O(\log n)$ - problem gets reduced in half each time through the process
- $O(n)$ - simple iterative or recursive programs
- $O(n^k)$ - nested loops or recursive calls
- $O(k^n)$ - multiple recursive calls at each level
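
The code shapes behind these rules of thumb can be sketched as step-counting functions. All function names here are hypothetical; each returns the number of basic steps performed for an input of size `n`.

```python
def constant(n):
    return 1                      # O(1): one step whatever n is

def halving(n):
    steps = 0
    while n > 0:                  # O(log n): n is halved on every pass
        n //= 2
        steps += 1
    return steps

def single_loop(n):
    return sum(1 for _ in range(n))                      # O(n): one pass over the input

def nested_loops(n):
    return sum(1 for _ in range(n) for _ in range(n))    # O(n^2): a loop inside a loop

def branching(n):
    # O(2^n): two recursive calls at each level
    return 1 if n == 0 else branching(n - 1) + branching(n - 1)
```

For `n = 16` the first four functions take 1, 5, 16 and 256 steps respectively, while `branching(10)` already takes 1024.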

## Example

Determine the orders of growth of the algorithms with the following running time:
- $n^2+2n+2$,
- $n^2+10000n+3^{10000}$,
- $\log(n)+n+4$,
- $0.0001n\log(n)+300n$,
- $2n^{30}+3^n$.

#YOUR_ANSWER_HERE
- $n^2+2n+2$: $O(n^2)$
- $n^2+10000n+3^{10000}$: $O(n^2)$
- $\log(n)+n+4$: $O(n)$
- $0.0001n\log(n)+300n$: $O(n \log n)$
- $2n^{30}+3^n$: $O(3^n)$

### Exercise

Determine the orders of growth of the search and sort algorithms from the previous chapter. Assume that the input is an array of size $n$.
- linear search $O(n)$
- binary search $O(\log n)$