# Lecture 26 Notes

## The Binary Search Algorithm

**Binary search** is a divide-and-conquer search algorithm. Given a list of
values that is in ascending sorted order, it quickly finds the index of a target
value $x$ in the list, or returns -1 if $x$ is not in the list. It is much
faster than linear search, especially for large lists.

It works like this:
- Check if the middle-most element of the list is equal to $x$. If so, we're
  done.
- If $x$ is less than the middle element, then $x$ must be in the left half of
  the list. So we apply binary search to the left half.
- If $x$ is greater than the middle element, then $x$ must be in the right half
  of the list. So we apply binary search to the right half.
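
The three steps above can be sketched directly as a recursive function. This is only an illustration: the name `contains_sorted` is made up for this sketch, it returns `True`/`False` rather than an index, and it copies sub-lists with slicing, so it shows the idea rather than an efficient implementation. It also adds one check the steps leave implicit: an empty list cannot contain $x$.

```python
def contains_sorted(x, lst):
    """Return True if x is in lst, which must be in ascending sorted order."""
    if not lst:                  # empty list: nothing left to search
        return False
    mid = len(lst) // 2          # index of the (upper) middle element
    if lst[mid] == x:
        return True
    elif x < lst[mid]:
        return contains_sorted(x, lst[:mid])       # search the left half
    else:
        return contains_sorted(x, lst[mid + 1:])   # search the right half

print(contains_sorted(8, [0, 2, 3, 4, 8, 9, 10]))   # prints True
print(contains_sorted(5, [0, 2, 3, 4, 8, 9, 10]))   # prints False
```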

For example, suppose the list is `[0, 2, 3, 4, 8, 9, 10]` and you want to find
the number 5. We will use variables `lo` and `hi` to keep track of the current
sub-list that is being searched. Initially, we search the entire list:

```
looking for x = 5

[0, 2, 3, 4, 8, 9, 10]
 ^                  ^
 |                  |
lo                  hi
```

The middle element is 4. Since 5 is greater than 4, if 5 is in the list it must
be in the right half. So we adjust `lo` to point just past the middle element:

```
looking for x = 5

[0, 2, 3, 4, 8, 9, 10]
             ^     ^
             |     |
            lo    hi
```

The middle element of the current sub-list is 9. Since 5 is less than 9, if 5 is
in the list it must be in the left half. So we adjust `hi`:

```
looking for x = 5

[0, 2, 3, 4, 8, 9, 10]
             ^
             |     
            lo, hi
```

Both `lo` and `hi` are pointing to the same element, and that element is not 5.
So we're done: 5 is not in the list.

Here is an implementation:

```python
def binary_search(x, lst):
    """Returns an index i such that lst[i] == x.
    If x is not in lst, returns -1.
    lst must be in sorted order, from smallest to biggest.
    """
    lo = 0
    hi = len(lst) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # // is integer division
        if lst[mid] == x:
            return mid    # x found at location mid
        elif x < lst[mid]:
            hi = mid - 1
        else: # lst[mid] < x
            lo = mid + 1
    return -1             # x not in lst
```
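
To connect this loop to the diagrams in the worked example, here is a sketch of a traced variant. The name `binary_search_trace` is hypothetical; it runs the same loop but records the `(lo, hi, mid)` values at each step instead of returning an index.

```python
def binary_search_trace(x, lst):
    """Return the (lo, hi, mid) triples visited while searching lst for x.
    lst must be in sorted order, from smallest to biggest.
    """
    steps = []
    lo = 0
    hi = len(lst) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        steps.append((lo, hi, mid))
        if lst[mid] == x:
            break             # x found at location mid
        elif x < lst[mid]:
            hi = mid - 1
        else:                 # lst[mid] < x
            lo = mid + 1
    return steps

for lo, hi, mid in binary_search_trace(5, [0, 2, 3, 4, 8, 9, 10]):
    print(f"lo={lo} hi={hi} mid={mid}")
# prints:
# lo=0 hi=6 mid=3
# lo=4 hi=6 mid=5
# lo=4 hi=4 mid=4
```

In the last step `lo` and `hi` both point at 8. Since 5 is less than 8, `hi` becomes 3, which is less than `lo`, so the loop ends and the search reports that 5 is not in the list.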

Binary search is notoriously tricky to implement, and so we also provide a
testing function to help catch any bugs:

```python
def binary_search_test():
    """Test that binary search works correctly.
    assert expr does nothing if expr evaluates to True. If expr evaluates to
    False, it raises an AssertionError, which stops the program with an error
    message. Every test list is sorted, as binary_search requires.
    """
    assert binary_search(5, []             ) == -1
    assert binary_search(5, [3]            ) == -1
    assert binary_search(5, [1, 3]         ) == -1
    assert binary_search(5, [5]            ) ==  0
    assert binary_search(5, [5, 7]         ) ==  0
    assert binary_search(5, [2, 5]         ) ==  1
    assert binary_search(5, [5, 5, 5]      ) in [0, 1, 2]
    assert binary_search(5, [1, 2, 5, 6, 7]) ==  2
    print('all binary_search tests passed!')

binary_search_test()
```

Running the test prints:

```
all binary_search tests passed!
```
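
Hand-picked cases like these can be complemented by random testing. The sketch below is one possible approach, not part of the original test function: `random_search_test` (a hypothetical name) takes the search function as a parameter, so `binary_search` can be passed in. To keep the block runnable on its own, it is exercised here on `bisect_search`, a stand-in built on the standard library's `bisect` module.

```python
import bisect
import random

def random_search_test(search, trials=1000):
    """Check search(x, lst) on many small random sorted lists."""
    for _ in range(trials):
        size = random.randint(0, 10)
        lst = sorted(random.randint(0, 20) for _ in range(size))
        x = random.randint(0, 20)
        i = search(x, lst)
        if x in lst:
            # Any index holding x is acceptable (duplicates are allowed).
            assert 0 <= i < len(lst) and lst[i] == x, (x, lst, i)
        else:
            assert i == -1, (x, lst, i)

def bisect_search(x, lst):
    """Stand-in search: index of x in sorted lst, or -1 if x is absent."""
    i = bisect.bisect_left(lst, x)
    return i if i < len(lst) and lst[i] == x else -1

random_search_test(bisect_search)
print('random tests passed!')
```

Calling `random_search_test(binary_search)` would run the same checks against the implementation from these notes.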


## Comparing Linear Search and Binary Search

Binary search is typically much faster than linear search, especially when
searching long lists. Here are the results of an experiment comparing their
real-time performance:

![sorting time line graph for linear and binary search](searchRealTimeGraph_small.png)

This shows that linear search is slower than both the built-in `index` method
and binary search. The linear search graph is not perfectly smooth since the
computer running the experiment is doing other things at the same time, and so
occasionally slows down or speeds up.
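
An experiment of this kind could be set up roughly as follows. This is a sketch under assumptions, not the code that produced the graph: the list sizes, the number of repeats, and the helper name `time_searches` are all made up for illustration, and the timings will differ from machine to machine.

```python
import random
import timeit

def linear_search(x, lst):
    """Return the first index i with lst[i] == x, or -1 if x is absent."""
    for i, value in enumerate(lst):
        if value == x:
            return i
    return -1

def binary_search(x, lst):
    """Return an index i with lst[i] == x, or -1. lst must be sorted."""
    lo, hi = 0, len(lst) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if lst[mid] == x:
            return mid
        elif x < lst[mid]:
            hi = mid - 1
        else:
            lo = mid + 1
    return -1

def time_searches(n, repeats=100):
    """Return (linear_seconds, binary_seconds) for `repeats` random searches."""
    lst = list(range(n))                      # already sorted
    targets = [random.randrange(n) for _ in range(repeats)]
    linear = timeit.timeit(lambda: [linear_search(t, lst) for t in targets], number=1)
    binary = timeit.timeit(lambda: [binary_search(t, lst) for t in targets], number=1)
    return linear, binary

for n in [1_000, 10_000, 100_000]:
    linear, binary = time_searches(n)
    print(f"n = {n:>7}: linear {linear:.4f} s, binary {binary:.4f} s")
```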

## Estimated Performance of Linear Search and Binary Search

We can estimate the performance of linear search and binary search by counting
the number of **comparisons** they do. The number of comparisons turns out to
be proportional to the running time of algorithms like linear search and binary
search.

Suppose you run linear search on a list of 100 items. It might:

- **Do 1 comparison**. This happens when the first element is the target.
- **Do 100 comparisons**. This happens when the target is the last element, or
  not in the list.
- **Do somewhere between 1 and 100 comparisons**, depending on where the target
  is.

If we assume that the target is equally likely to be at any position, then, on
average, we would expect to do about 50 comparisons. In general, if a list has
$n$ items, then linear search does about $\frac{n}{2}$ comparisons on average.
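
The average can be checked by counting. In this sketch, `linear_search_comparisons` is a hypothetical helper that counts how many elements linear search compares with the target; averaging over every possible position in a 100-item list gives a value close to $\frac{n}{2}$.

```python
def linear_search_comparisons(x, lst):
    """Return how many elements linear search compares with x."""
    count = 0
    for value in lst:
        count += 1
        if value == x:
            break
    return count

n = 100
lst = list(range(n))
total = sum(linear_search_comparisons(x, lst) for x in lst)
print(total / n)                              # prints 50.5
print(linear_search_comparisons(-1, lst))     # target absent: prints 100
```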

In contrast, binary search cuts the list in half at each step. So if there are
initially 100 items (in ascending sorted order), then the size of the sub-list
being searched decreases like this: 100, 50, 25, 12, 6, 3, 1. That is 7 steps,
while $\log_2 100 \approx 6.64$. In general, if a list has $n$ items, then
binary search checks at most $\lfloor \log_2 n \rfloor + 1$ middle elements,
which is about $\log_2 n$ comparisons.
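
The halving can also be counted directly. In this sketch, `binary_search_steps` is a hypothetical variant of binary search that returns the number of loop iterations instead of an index. Searching for a value larger than everything in the list always moves into the larger (right) half, so it gives the worst case.

```python
import math

def binary_search_steps(x, lst):
    """Return the number of loop iterations binary search uses for x."""
    steps = 0
    lo, hi = 0, len(lst) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if lst[mid] == x:
            break
        elif x < lst[mid]:
            hi = mid - 1
        else:
            lo = mid + 1
    return steps

for n in [100, 1000, 1_000_000]:
    lst = list(range(n))
    worst = binary_search_steps(n, lst)       # n is bigger than every item
    print(n, worst, math.floor(math.log2(n)) + 1)
# prints:
# 100 7 7
# 1000 10 10
# 1000000 20 20
```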

Since the number of comparisons is proportional to the running time, we can say
that linear search runs in time proportional to $n$, and binary search runs in
time proportional to $\log_2 n$.

To get an idea of how much better binary search is, look at this table:

|   **n**  | **log2 n** |
|----------|------------|
|    16    |      4     |
|    32    |      5     |
|    64    |      6     |
|   128    |      7     |
| 1048576  |     20     |

1048576 is $2^{20}$, which is just over a million. If you have a sorted list of
a million values, then, in the worst case, linear search would do 1000000
comparisons, but binary search would only do about $\log_2 1000000 \approx 20$
comparisons.
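
The values in the table can be reproduced with the standard `math.log2` function. This is a small sketch for checking the arithmetic.

```python
import math

for n in [16, 32, 64, 128, 1_048_576]:
    print(f"{n:>9}  {math.log2(n):.0f}")

# A million is slightly less than 2**20, so its log is slightly under 20.
print(round(math.log2(1_000_000), 2))   # prints 19.93
```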