# Algorithms 2: Binary Search

## The Binary Search Algorithm

**Binary search** is a divide-and-conquer search algorithm. Given a list of
values *sorted* in ascending order (from smallest to biggest), it quickly
finds the index of a given target value $x$ in the list, or returns -1 if $x$ is
not in the list.

As we will see, binary search is *much* faster than linear search, especially
when searching large lists.


## A Binary Search Algorithm

Here's a binary search algorithm. Importantly, it is assumed that the list is
already in sorted order from smallest to biggest:

1. Check if the middle element of the list is equal to $x$. If so, we're done.
2. If $x$ is less than the middle element, then $x$ must be in the left half of
   the list. So we apply binary search to the left half.
3. If $x$ is greater than the middle element, then $x$ must be in the right half
   of the list. So we apply binary search to the right half.

Each step of the algorithm reduces the size of the list to be searched by half,
which quickly narrows down the search to a small number of elements. If the
part of the list still being searched becomes empty, then $x$ is not in the
list.
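
The three steps above translate almost directly into a recursive function. The
following is a minimal sketch rather than a definitive implementation: the name
`binary_search_rec` and its `lo` and `hi` parameters (the first and last index
of the part of the list still being searched) are illustrative assumptions. It
also adds one check that the steps leave implicit, namely that an empty
sub-list means $x$ is absent.

```python
def binary_search_rec(x, lst, lo=0, hi=None):
    """Recursive sketch: returns an index i such that lst[i] == x, or -1.
    lst must be in sorted order, from smallest to biggest.
    """
    if hi is None:             # first call: search the whole list
        hi = len(lst) - 1
    if lo > hi:                # empty sub-list, so x is not in lst
        return -1
    mid = (lo + hi) // 2       # index of the middle element
    if lst[mid] == x:          # step 1: found it
        return mid
    elif x < lst[mid]:         # step 2: search the left half
        return binary_search_rec(x, lst, lo, mid - 1)
    else:                      # step 3: search the right half
        return binary_search_rec(x, lst, mid + 1, hi)

print(binary_search_rec(9, [0, 2, 3, 4, 8, 9, 10]))  # prints 5
print(binary_search_rec(5, [0, 2, 3, 4, 8, 9, 10]))  # prints -1
```

Passing indices instead of slicing the list avoids copying half the list at
every step.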

## Example Trace

For example, suppose we want to find 5 in the list `[0, 2, 3, 4, 8, 9, 10]`. We
will use variables `lo` and `hi` to keep track of the current sub-list that is
being searched. Initially, we search the entire list:

```
looking for x = 5

[0, 2, 3, 4, 8, 9, 10]
 ^                  ^
 |                  |
lo                  hi
```

The middle element is 4. Since 5 is greater than 4, if 5 is in the list it must
be in the right half. So we adjust `lo`:

```
looking for x = 5

[0, 2, 3, 4, 8, 9, 10]
             ^     ^
             |     |
            lo    hi
```

The middle element of the current sub-list is 9. Since 5 is less than 9, if 5 is
in the list it must be in the left half. So we adjust `hi`:

```
looking for x = 5

[0, 2, 3, 4, 8, 9, 10]
             ^
             |
           lo, hi
```

Both `lo` and `hi` are pointing to the same element, and that element is not 5.
So we're done: 5 is not in the list.
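
To check a trace like this one by machine, the search loop can record `lo`,
`hi`, and the middle index at every step. This is a sketch under the assumption
that the middle index is computed as `(lo + hi) // 2`; the helper name
`binary_search_trace` is made up for this illustration.

```python
def binary_search_trace(x, lst):
    """Returns the list of (lo, hi, mid) triples visited while searching
    the sorted list lst for x.
    """
    steps = []
    lo, hi = 0, len(lst) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        steps.append((lo, hi, mid))
        if lst[mid] == x:
            break              # x found at index mid
        elif x < lst[mid]:
            hi = mid - 1       # continue in the left half
        else:
            lo = mid + 1       # continue in the right half
    return steps

for lo, hi, mid in binary_search_trace(5, [0, 2, 3, 4, 8, 9, 10]):
    print(f'lo={lo} hi={hi} mid={mid}')
# prints:
# lo=0 hi=6 mid=3
# lo=4 hi=6 mid=5
# lo=4 hi=4 mid=4
```

The three printed lines correspond to the three pictures above: the middle
elements examined are 4, 9, and finally 8.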


## Binary Search Implementation

Here is an implementation of binary search:

```python
def binary_search(x, lst):
    """Returns an index i such that lst[i] == x.
    If x is not in lst, returns -1.
    lst must be in sorted order, from smallest to biggest.
    """
    lo = 0
    hi = len(lst) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # // is integer division
        if lst[mid] == x:
            return mid        # x found at location mid
        elif x < lst[mid]:
            hi = mid - 1
        else:                 # lst[mid] < x
            lo = mid + 1
    return -1                 # x not in lst

def binary_search_test():
    """Test that binary search works correctly.
    assert expr does nothing if expr evaluates to True. If expr evaluates to
    False, then the program immediately crashes with an error message.
    """
    assert binary_search(5, []             ) == -1
    assert binary_search(1, [3]            ) == -1
    assert binary_search(3, [3]            ) == 0
    assert binary_search(5, [3]            ) == -1
    assert binary_search(0, [1, 3]         ) == -1
    assert binary_search(1, [1, 3]         ) == 0
    assert binary_search(2, [1, 3]         ) == -1
    assert binary_search(3, [1, 3]         ) == 1
    assert binary_search(5, [5, 5, 5]      ) in [0, 1, 2]
    assert binary_search(5, [1, 2, 5, 6, 7]) ==  2  # test lists must be sorted
    assert binary_search(7, [1, 2, 5, 6, 7]) ==  4
    print('all binary_search tests passed!')

binary_search_test()
```

```
all binary_search tests passed!
```


To help ensure our implementation is correct, we also wrote the test function
`binary_search_test`. If any of the test cases fail, then `assert` raises an
`AssertionError`, which crashes the program and reports the line number of the
failed test.
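
The behavior of `assert` can be seen in isolation with a small sketch. The
helper name `check` and the message text are assumptions made for this
illustration. Catching the `AssertionError` lets the example keep running
instead of crashing.

```python
def check(expr_value):
    """Returns 'passed' if the assert succeeds, otherwise a description
    of the AssertionError that assert raised.
    """
    try:
        # The text after the comma becomes the error message.
        assert expr_value, 'expression was False'
        return 'passed'
    except AssertionError as err:
        return f'AssertionError: {err}'

print(check(1 + 1 == 2))  # prints: passed
print(check(1 + 1 == 3))  # prints: AssertionError: expression was False
```

Note that Python skips `assert` statements entirely when it is run with the
`-O` option, so they are meant for testing rather than for checking user input.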

## Comparing Linear Search and Binary Search

Binary search is typically *much* faster than linear search, especially when
searching long lists. This graph shows the result of one comparison:

![running time line graph for linear and binary search](searchRealTimeGraph_small.png)

This shows that a typical while-loop linear search is slower than both the
built-in `index` method (which is itself a linear search) and binary search.

The linear search graph is not perfectly smooth since the computer running the
experiment occasionally switches to do other things, as decided upon by the
operating system or the user (e.g. the user might be checking their email while
the experiment runs).

The bottom green line for binary search doesn't appear to go up at all. But it
does: the scale of the graph is such that the increase is not visible. Binary
search is so much faster than linear search that they are hard to compare on the
same graph.
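
The code that produced the graph is not shown in these notes. As a rough
sketch of how such a comparison might be timed, the example below uses the
standard `timeit` module. The list size, the number of repetitions, and the
choice of a target that is not in the list (a worst case for linear search)
are all assumptions made for illustration.

```python
import timeit

def linear_search(x, lst):
    """While-loop linear search: returns the index of x in lst, or -1."""
    i = 0
    while i < len(lst):
        if lst[i] == x:
            return i
        i += 1
    return -1

def binary_search(x, lst):
    """Returns an index i such that lst[i] == x, or -1. lst must be sorted."""
    lo, hi = 0, len(lst) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if lst[mid] == x:
            return mid
        elif x < lst[mid]:
            hi = mid - 1
        else:
            lo = mid + 1
    return -1

n = 100_000
lst = list(range(n))   # already sorted: 0, 1, ..., n - 1
target = -1            # not in lst, so linear search checks every item

# Total time for 10 calls of each search.
t_linear = timeit.timeit(lambda: linear_search(target, lst), number=10)
t_binary = timeit.timeit(lambda: binary_search(target, lst), number=10)
print(f'linear search: {t_linear:.6f} s')
print(f'binary search: {t_binary:.6f} s')
```

The exact times depend on the computer and on what else it is doing, which is
why repeated runs give slightly different numbers.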

> **Warning!** This graph is misleading because it does not take into account
the fact that binary search only works on lists that are in sorted
order. If you must also sort the list, then linear search can be faster! We will
see in the section on sorting that the best sorting algorithms are usually
*slower* than linear search.

## Performance of Linear Search and Binary Search

We've seen that, in the worst case, linear search does $n$ comparisons to search
a list of $n$ items. To search a list of $n$ (sorted) items, binary search does
about $\log_2 n$ comparisons in the worst case, which is far fewer.

This table gives you an idea of how much faster binary search is than linear
search:

| # items searched <br> $n$ | Linear search comparisons <br> $n$ | Binary search comparisons <br> $\log_2 n$ |
|-----------|-------------------------|---------------------------|
|      10   |                      10 |                3.32       |
|     100   |                     100 |                6.64       |
|    1,000  |                   1,000 |                9.97       |
|   10,000  |                  10,000 |               13.29       |
|  100,000  |                 100,000 |               16.61       |
| 1,000,000 |               1,000,000 |               19.93       |

The last row shows that, in the worst case, linear search does a million
comparisons to search a list of size 1 million, while binary search does about
20 comparisons. That is about 50,000 times fewer comparisons!
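
The binary search column can be checked by counting loop iterations directly.
The sketch below assumes a target bigger than every item in the list, which
always sends the search into the larger right half; the function name
`count_binary_steps` is made up for this illustration. Since only indices
matter for the count, no actual list is needed.

```python
import math

def count_binary_steps(n):
    """Number of loop iterations binary search makes on a sorted list of
    n items when the target is bigger than every item (a worst case).
    """
    lo, hi = 0, n - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        lo = mid + 1           # target is always to the right of mid
    return steps

for n in [10, 100, 1_000, 10_000, 100_000, 1_000_000]:
    print(n, count_binary_steps(n), round(math.log2(n), 2))
```

Each count is $\log_2 n$ rounded down, plus 1 (for example, 4 steps for
$n = 10$ and 20 steps for $n = 1{,}000{,}000$), so the table's $\log_2 n$ is a
close estimate of the number of steps rather than an exact count.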

**Again a warning!** Binary search requires that the list it is searching be in
sorted order, and so the table assumes that the list for binary search is
already sorted. If you must also sort the data, then linear search can be
faster! We will see in the section on sorting that the best general-purpose
sorting algorithms are *slower* than linear search.

## Questions

1. [Explain like I'm five](https://www.reddit.com/r/explainlikeimfive/): what is
   binary search and how does it work?
2. In `binary_search_test`, what is the purpose of the `assert` statements? How
   do they work?
3. In the graph for linear search, why is the linear search line not perfectly
   straight?
4. Suppose you're searching for `x` using `binary_search` in a sorted list that
   contains 2 copies of `x`. Which `x` is found: the one with the smallest
   index, or the one with the biggest index?
5. In the worst case, how many times will an $n$-element list be cut in half by
   binary search?
6. Twice in the notes the reader is warned that the results for binary search
   can be misleading. Why are they misleading?
7. Binary search is called "binary" because it splits the list into two parts at
   each step. Write an algorithm for
   "[ternary](https://www.merriam-webster.com/dictionary/ternary) search".