# Chapter 10 - Some Simple Algorithms and Data Structures

Though we expend a fair number of pages in this book talking about efficiency,
the goal is not to make you expert in designing efficient programs. There are
many long books (and even some good long books) devoted exclusively to that
topic. In Chapter 9, we introduced some of the basic concepts underlying complexity analysis. In this chapter we use those concepts to look at the complexity of
a few classic algorithms. The goal of this chapter is to help you develop some
general intuitions about how to approach questions of efficiency. By the time you
get through this chapter you should understand why some programs complete in
the blink of an eye, why some need to run overnight, and why some wouldn’t
complete in your lifetime.

The first algorithms we looked at in this book were based on brute-force exhaustive enumeration. We argued that modern computers are so fast that it is often the case that employing clever algorithms is a waste of time. Writing code
that is simple and obviously correct is often the right way to go.

We then looked at some problems (e.g., finding an approximation to the
roots of a polynomial) where the search space was too large to make brute force
practical. This led us to consider more efficient algorithms such as bisection
search and Newton-Raphson. The major point was that the key to efficiency is a
good algorithm, not clever coding tricks.

In the sciences (physical, life, and social), programmers often start by quickly
coding up a simple algorithm to test the plausibility of a hypothesis about a data
set, and then run it on a small amount of data. If this yields encouraging results,
the hard work of producing an implementation that can be run (perhaps over
and over again) on large data sets begins. Such implementations need to be based
on efficient algorithms.

Efficient algorithms are hard to invent. Successful professional computer scientists might invent one algorithm during their whole career, if they are lucky.
Most of us never invent a novel algorithm. What we do instead is learn to reduce
the most complex aspects of the problems we are faced with to previously solved
problems.

More specifically, we

* Develop an understanding of the inherent complexity of the problem,
* Think about how to break that problem up into subproblems, and
* Relate those subproblems to other problems for which efficient algorithms already exist.

This chapter contains a few examples intended to give you some intuition
about algorithm design. Many other algorithms appear elsewhere in the book.
Keep in mind that the most efficient algorithm is not always the algorithm of
choice. A program that does everything in the most efficient possible way is often
needlessly difficult to understand. It is often a good strategy to start by solving
the problem at hand in the most straightforward manner possible, instrument it
to find any computational bottlenecks, and then look for ways to improve the
computational complexity of those parts of the program contributing to the bottlenecks.
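
One way to instrument a program is to time a suspect function on inputs of increasing size. The following is a minimal sketch using time.perf_counter from the standard library. The helper elapsed and the deliberately quadratic has_duplicates are hypothetical names introduced only for this illustration.

```python
import time

def elapsed(f, *args):
    """Call f(*args) and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    f(*args)
    return time.perf_counter() - start

def has_duplicates(L):
    """Returns True if some value occurs twice in L. Compares every pair."""
    for i in range(len(L)):
        for j in range(i + 1, len(L)):
            if L[i] == L[j]:
                return True
    return False

# Time the worst case (no duplicates) on inputs that double in size.
for n in (250, 500, 1000):
    print(n, elapsed(has_duplicates, list(range(n))))
```

If the time roughly quadruples each time n doubles, the function is probably quadratic in n. Timings vary between machines and runs, so no particular numbers are shown here.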

## 10.1 Search Algorithms

A search algorithm is a method for finding an item or group of items with specific properties within a collection of items. We refer to the collection of items as a
search space. The search space might be something concrete, such as a set of electronic medical records, or something abstract, such as the set of all integers. A
large number of problems that occur in practice can be formulated as search
problems.

Many of the algorithms presented earlier in this book can be viewed as
search algorithms. In Chapter 3, we formulated finding an approximation to the
roots of a polynomial as a search problem, and looked at three algorithms
(exhaustive enumeration, bisection search, and Newton-Raphson) for searching
the space of possible answers.

In this section, we will examine two algorithms for searching a list. Each
meets the specification

In [1]:
def search(L,e):
    """Assumes L is a list. Returns True if e is in L and False otherwise."""

The astute reader might wonder if this is not semantically equivalent to the
Python expression e in L. The answer is yes, it is. And if one is unconcerned
about the efficiency of discovering whether e is in L, one should simply write that
expression.

### 10.1.1 Linear Search and Using Indirection to Access Elements

Python uses the following algorithm to determine if an element is in a list: 

In [4]:
def search(L,e):
    """Assumes L is a list. Returns True if e is in L and False otherwise."""
    for i in range(len(L)):
        if L[i] == e:
            return True
    return False

In [5]:
L = [1,2,5,'a','k']
e = 'a'
search(L,e)

True

If the element e is not in the list the algorithm will perform O(len(L)) tests,
i.e., the complexity is at best linear in the length of L. Why “at best” linear? It will
be linear only if each operation inside the loop can be done in constant time.
That raises the question of whether Python retrieves the ith element of a list in
constant time. Since our model of computation assumes that fetching the contents of an address is a constant-time operation, the question becomes whether
we can compute the address of the ith element of a list in constant time.

Let’s start by considering the simple case where each element of the list is an
integer. This implies that each element of the list is the same size, e.g., four units
of memory (four eight-bit bytes). Assuming that the elements of the list are
stored contiguously, the address in memory of the ith element of the list is simply
start + 4 * i, where start is the address of the start of the list. Therefore we can assume that Python could compute the address of the ith element of a list of integers
in constant time.

Of course, we know that Python lists can contain objects of types other than
int, and that the same list can contain objects of many different types and sizes.
You might think that this would present a problem, but it does not.

In Python, a list is represented as a length (the number of objects in the list)
and a sequence of fixed-size pointers to objects. Figure 10.1 illustrates the use
of these pointers. The shaded region represents a list containing four elements.
The leftmost shaded box contains a pointer to an integer indicating the length of
the list. Each of the other shaded boxes contains a pointer to an object in the list.

![](implement_list.jpg)

If the length field occupies four units of memory, and each pointer (address)
occupies four units of memory, the address of the ith element of the list is stored
at the address start + 4 + 4 * i. Again, this address can be found in constant time,
and then the value stored at that address can be used to access the ith element.
This access too is a constant-time operation.

This example illustrates one of the most important implementation techniques used in computing: indirection. Generally speaking, indirection involves
accessing something by first accessing something else that contains a reference to
the thing initially sought. This is what happens each time we use a variable to refer to the object to which that variable is bound. When we use a variable to access
a list and then a reference stored in that list to access another object, we are going
through two levels of indirection.
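
The layout described above can be imitated with a toy model of memory. This is only a sketch under simplifying assumptions, not a description of how any Python implementation actually stores lists: the name memory, the chosen addresses, and the helper get_item are all hypothetical, and every cell is one unit wide, so the address arithmetic becomes start + 1 + i instead of start + 4 + 4 * i.

```python
# A toy model of memory: indices of this Python list play the role of addresses.
# Assumption for illustration: every cell is one unit wide, so the length field
# and each pointer occupy one cell rather than the four units used in the text.
memory = [None] * 16

# Three objects of different types stored at scattered addresses.
memory[13] = 42
memory[10] = 'a'
memory[12] = 3.14

# The list itself: one length cell followed by three fixed-size pointers.
start = 2
memory[start] = 3        # length of the list
memory[start + 1] = 13   # pointer to element 0
memory[start + 2] = 10   # pointer to element 1
memory[start + 3] = 12   # pointer to element 2

def get_item(start, i):
    """Return element i of the toy list whose representation begins at start."""
    pointer = memory[start + 1 + i]  # first fetch: the address stored in the list
    return memory[pointer]           # second fetch: the object itself

print([get_item(start, i) for i in range(memory[start])])
```

Each call to get_item performs a fixed amount of arithmetic and two fetches, regardless of how long the list is or what types its elements have.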

### 10.1.2 Binary Search and Exploiting Assumptions

Getting back to the problem of implementing search(L, e), is O(len(L)) the best
we can do? Yes, if we know nothing about the relationship of the values of the elements in the list and the order in which they are stored. In the worst case, we
have to look at each element in L to determine whether L contains e.

But suppose we know something about the order in which elements are
stored, e.g., suppose we know that we have a list of integers stored in ascending
order. We could change the implementation so that the search stops when it reaches a number larger than the number for which it is searching.

In [6]:
def search(L,e):
    """Assumes L is a list, the elements of which are in ascending order.
       Returns True if e is in L and False otherwise."""
    for i in range(len(L)):
        if L[i] == e:
            return True
        if L[i]>e:
            return False
    return False

This would improve the average running time. However, it would not change
the worst-case complexity of the algorithm, since in the worst case each element
of L is examined.

We can, however, get a considerable improvement in the worst-case complexity by using an algorithm, binary search, that is similar to the bisection
search algorithm used in Chapter 3 to find an approximation to the square root
of a floating point number. There we relied upon the fact that there is an intrinsic
total ordering on floating point numbers. Here we rely on the assumption that
the list is ordered.

The idea is simple:

1. Pick an index, i, that divides the list L roughly in half.
2. Ask if L[i] == e.
3. If not, ask whether L[i] is larger or smaller than e.
4. Depending upon the answer, search either the left or right half of L for e.

Given the structure of this algorithm, it is not surprising that the most
straightforward implementation of binary search uses recursion, as shown in the following function:

In [9]:
def search(L,e):
    """Assumes L is a list, the elements of which are in ascending order.
       Returns True if e is in L and False otherwise."""
    
    def bSearch(L,e,low,high):
        #Decrements high-low
        if high == low:
            return L[low] == e #since len(L)=1
        mid = (low+high)//2 
        if L[mid] == e:
            return True
        elif L[mid]>e:
            if low == mid: #nothing left to search 
                return False
            else:
                return bSearch(L,e,low,mid-1)
        else:
            return bSearch(L,e,mid+1,high)
        
    if len(L) == 0:
        return False
    else:
        #low=0 and high=len(L)-1
        return bSearch(L,e,0,len(L)-1)

Functions such as search are often called wrapper functions. The function
provides a nice interface for client code, but is essentially a pass-through that
does no serious computation. Instead, it calls the helper function bSearch with
appropriate arguments. This raises the question of why not eliminate search and
have clients call bSearch directly? The reason is that the parameters low and high
have nothing to do with the abstraction of searching a list for an element. They
are implementation details that should be hidden from those writing programs
that call search.
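
The same interface can also be built on the bisect module from Python's standard library, which provides binary search over sorted lists. The sketch below is an alternative to the recursive helper, not the implementation this chapter analyzes, and the name search_sorted is a hypothetical one chosen for illustration.

```python
from bisect import bisect_left

def search_sorted(L, e):
    """Assumes L is a list, the elements of which are in ascending order.
       Returns True if e is in L and False otherwise."""
    i = bisect_left(L, e)  # leftmost index at which e could be inserted
    return i < len(L) and L[i] == e

print(search_sorted([1, 3, 5, 7], 5), search_sorted([1, 3, 5, 7], 4))
```

As with search, callers never see the index bookkeeping.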

Let us now analyze the complexity of bSearch. We showed in the last section
that list access takes constant time. Therefore, we can see that excluding the recursive call, each instance of bSearch is O(1). Therefore, the complexity of bSearch
depends only upon the number of recursive calls.

If this were a book about algorithms, we would now dive into a careful analysis using something called a recurrence relation. But since it isn’t, we will take a much less formal approach that starts with the question “How do we know that
the program terminates?” Recall that in Chapter 3 we asked the same question
about a while loop. We answered the question by providing a decrementing function for the loop. We do the same thing here. In this context, the decrementing
function has the properties:
* It maps the values to which the formal parameters are bound to a nonnegative
integer.
* When its value is 0, the recursion terminates.
* For each recursive call, the value of the decrementing function is less than the
value of the decrementing function on entry to the instance of the function
making the call.

The decrementing function for bSearch is high - low. The if statement in
search ensures that the value of this decrementing function is at least 0 the first
time bSearch is called (decrementing function property 1).

When bSearch is entered, if high - low is exactly 0, the function makes no recursive call and simply returns the value L[low] == e (satisfying decrementing
function property 2).

The function bSearch contains two recursive calls. One call uses arguments
that cover all the elements to the left of mid, and the other call uses arguments
that cover all the elements to the right of mid. In either case, the value of high - low
is cut in half (satisfying decrementing function property 3).

We now understand why the recursion terminates. The next question is how
many times can the value of high - low be cut in half before high - low == 0? Recall
that log_y(x) is the number of times that y has to be multiplied by itself to reach x.
Conversely, if x is divided by y a total of log_y(x) times, the result is 1. This implies that
high - low can be cut in half using integer division at most floor(log_2(high - low)) + 1 times before it reaches 0.

Finally, we can answer the question, what is the algorithmic complexity of
binary search? Since when search calls bSearch the value of high - low is equal to
len(L) - 1, the complexity of search is O(log(len(L))).
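
The logarithmic bound can be checked empirically. The sketch below repeats the logic of bSearch with a call counter added; search_counting and b_search are hypothetical names for this instrumented copy. It searches for values that are absent from the list, so the recursion always runs until the range is exhausted.

```python
import math

def search_counting(L, e):
    """Binary search on an ascending list L.
       Returns (found, number of calls made to the recursive helper)."""
    calls = 0

    def b_search(low, high):
        nonlocal calls
        calls += 1
        if high == low:
            return L[low] == e
        mid = (low + high)//2
        if L[mid] == e:
            return True
        elif L[mid] > e:
            if low == mid:  # nothing left to search
                return False
            return b_search(low, mid - 1)
        else:
            return b_search(mid + 1, high)

    if len(L) == 0:
        return False, 0
    found = b_search(0, len(L) - 1)
    return found, calls

for n in (10, 1000, 1000000):
    _, calls_low = search_counting(list(range(n)), -1)  # smaller than every element
    _, calls_high = search_counting(list(range(n)), n)  # larger than every element
    print(n, calls_low, calls_high, math.floor(math.log2(n)) + 1)
```

In each row the two call counts should not exceed the last column, floor(log_2(n)) + 1, even though n grows by a factor of 100,000.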

## 10.2 Sorting Algorithms

We have just seen that if we happen to know that a list is sorted, we can exploit
that information to greatly reduce the time needed to search a list. Does this
mean that when asked to search a list one should first sort it and then perform
the search?

Let O(sortComplexity(L)) be the complexity of sorting a list. Since we know
that we can always search a list in O(len(L)) time, the question of whether we
should first sort and then search boils down to the question, is sortComplexity(L) + log(len(L)) less than len(L)? The answer, sadly, is no. One cannot sort a list
without looking at each element in the list at least once, so it is not possible to
sort a list in sub-linear time.

Fortunately, sorting can be done rather efficiently. For example, the standard
implementation of sorting in most Python implementations runs in roughly
O(n * log(n)) time, where n is the length of the list. In practice, you will rarely need
to implement your own sort function. In most cases, the right thing to do is to
use either Python’s built-in sort method (L.sort() sorts the list L) or its built-in
function sorted (sorted(L) returns a list with the same elements as L, but does not
mutate L). We present sorting algorithms here primarily to provide some practice
in thinking about algorithm design and complexity analysis.
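
The difference between the two built-ins is easy to see in a short example:

```python
L = [3, 1, 2]

new_list = sorted(L)   # builds and returns a new sorted list
print(new_list, L)     # L is unchanged at this point

result = L.sort()      # sorts L in place
print(result, L)       # the method itself returns None
```

Because L.sort() returns None, writing L = L.sort() is a common mistake that discards the list.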

We begin with a simple but inefficient algorithm, selection sort. Selection
sort, Figure 10.4, works by maintaining the loop invariant that, given a partitioning of the list into a prefix (L[0:i]) and a suffix (L[i:len(L)]), the prefix is sorted and no element in the prefix is larger than the smallest element in the suffix.

We use induction to reason about loop invariants.
* Base case: At the start of the first iteration, the prefix is empty, i.e., the suffix is
the entire list. Therefore, the invariant is (trivially) true.
* Induction step: At each step of the algorithm, we move one element from the
suffix to the prefix. We do this by appending a minimum element of the suffix
to the end of the prefix. Because the invariant held before we moved the element, we know that after we append the element the prefix is still sorted. We
also know that since we removed the smallest element in the suffix, no element
in the prefix is larger than the smallest element in the suffix.
* Termination: When the loop is exited, the prefix includes the entire list, and
the suffix is empty. Therefore, the entire list is now sorted in ascending order.

In [15]:
def selSort(L):
    """Assumes that L is a list of elements that can be compared using <. Sorts L in ascending order."""
    suffixStart = 0
    while suffixStart != len(L):
        #look at each element in suffix
        for i in range(suffixStart, len(L)):
            if L[i] < L[suffixStart]:
                #swap position of elements
                L[suffixStart], L[i] = L[i], L[suffixStart]
        suffixStart += 1

In [18]:
L = [1,78,3,6,4,8,34,972,10]
selSort(L)
print('L =', L)

L = [1, 3, 4, 6, 8, 10, 34, 78, 972]
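
The loop invariant can also be checked mechanically. The sketch below is a variant of selSort, under the hypothetical name sel_sort_checked, that asserts the invariant at the start of every iteration of the outer loop. The checks use the built-in sorted, max, and min purely for verification, and they make the function much slower than the original.

```python
def sel_sort_checked(L):
    """Sorts L in ascending order, asserting the loop invariant on each iteration."""
    suffix_start = 0
    while suffix_start != len(L):
        prefix, suffix = L[:suffix_start], L[suffix_start:]
        # Invariant, part 1: the prefix is sorted.
        assert prefix == sorted(prefix)
        # Invariant, part 2: nothing in the prefix exceeds the smallest suffix element.
        assert not prefix or not suffix or max(prefix) <= min(suffix)
        for i in range(suffix_start, len(L)):
            if L[i] < L[suffix_start]:
                L[suffix_start], L[i] = L[i], L[suffix_start]
        suffix_start += 1

data = [1, 78, 3, 6, 4, 8, 34, 972, 10]
sel_sort_checked(data)
print(data)
```

If either assertion ever failed, the induction argument above would have a hole in it.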


It’s hard to imagine a simpler or more obviously correct sorting algorithm.
Unfortunately, it is rather inefficient. The complexity of the inner loop is
O(len(L)). The complexity of the outer loop is also O(len(L)). So, the complexity
of the entire function is O(len(L)^2). I.e., it is quadratic in the length of L.
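
Counting comparisons makes the quadratic growth visible. In this sketch, sel_sort_count is a hypothetical copy of selSort that returns the number of times the inner loop's test is executed. For a list of length n that number is n + (n - 1) + ... + 1 = n(n + 1)/2, whatever the initial order of the elements.

```python
def sel_sort_count(L):
    """Sorts L in ascending order. Returns the number of comparisons made."""
    comparisons = 0
    suffix_start = 0
    while suffix_start != len(L):
        for i in range(suffix_start, len(L)):
            comparisons += 1
            if L[i] < L[suffix_start]:
                L[suffix_start], L[i] = L[i], L[suffix_start]
        suffix_start += 1
    return comparisons

for n in (10, 100, 1000):
    print(n, sel_sort_count(list(range(n, 0, -1))), n*(n + 1)//2)
```

Multiplying n by 10 multiplies the count by roughly 100.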

### 10.2.1 Merge Sort

Fortunately, we can do a lot better than quadratic time using a divide-and-conquer algorithm. The basic idea is to combine solutions of simpler instances of
the original problem. In general, a divide-and-conquer algorithm is characterized by:

* A threshold input size, below which the problem is not subdivided,
* The size and number of sub-instances into which an instance is split, and
* The algorithm used to combine sub-solutions.

The threshold is sometimes called the recursive base. For the second item it is usual to
consider the ratio of initial problem size to the sub-instance size. In most of the
examples we’ve seen so far, the ratio was 2.

Merge sort is a prototypical divide-and-conquer algorithm. It was invented
in 1945 by John von Neumann, and is still widely used. Like many divide-and-conquer algorithms it is most easily described recursively:

1. If the list is of length 0 or 1, it is already sorted.
2. If the list has more than one element, split the list into two lists, and use
merge sort to sort each of them.
3. Merge the results.

The key observation made by von Neumann is that two sorted lists can be efficiently merged into a single sorted list. The idea is to look at the first element of
each list, and move the smaller of the two to the end of the result list. When one
of the lists is empty, all that remains is to copy the remaining items from the other list. Consider, for example, merging the two lists [1,5,12,18,19,20] and
[2,3,4,17]:

![](merge_sort.jpg)

What is the complexity of the merge process? It involves two constant-time
operations, comparing the values of elements and copying elements from one list
to another. The number of comparisons is O(len(L)), where L is the longer of the
two lists. The number of copy operations is O(len(L1) + len(L2)), because each
element gets copied exactly once. (The time to copy an element will depend on
the size of the element. However, this does not affect the order of the growth of
sort as a function of the number of elements in the list.) Therefore, merging two
sorted lists is linear in the length of the lists.
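
To see the linear behavior concretely, the sketch below merges the two example lists while counting comparisons and copies. The function merge_counting is a hypothetical variant written for this illustration, and it compares elements with < directly.

```python
def merge_counting(left, right):
    """Assumes left and right are sorted lists.
       Returns (merged list, number of comparisons, number of copies)."""
    result = []
    comparisons = 0
    i, j = 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of these slices is non-empty; copy whatever remains.
    result.extend(left[i:])
    result.extend(right[j:])
    return result, comparisons, len(result)

merged, comparisons, copies = merge_counting([1, 5, 12, 18, 19, 20], [2, 3, 4, 17])
print(merged)
print(comparisons, copies)
```

For these inputs the merge makes seven comparisons and ten copies. Once 17 has been moved the second list is exhausted, so 18, 19, and 20 are copied without any further comparison.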

In [22]:
def merge(left,right,compare):
    """Assumes left and right are sorted lists and compare is a function that defines an ordering on the elements.
       Returns a new sorted (by compare) list containing the same elements as (left+right) would contain."""
    result = []
    i,j = 0,0
    while i<len(left) and j<len(right):
        if compare(left[i],right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while (i<len(left)):
        result.append(left[i])
        i += 1
    while (j<len(right)):
        result.append(right[j])
        j += 1
    return result

def mergeSort(L,compare = lambda x,y: x<y):
    """Assumes L is a list, compare defines an ordering on elements of L. 
       Returns a new sorted list with the same elements as L"""
    if len(L)<2:
        return L[:]
    else:
        middle = len(L)//2
        left = mergeSort(L[:middle], compare)
        right = mergeSort(L[middle:], compare)
        return merge(left,right,compare)

In [23]:
L = [2,1,4,5,3]
print(mergeSort(L), mergeSort(L, lambda x, y: x > y))

[1, 2, 3, 4, 5] [5, 4, 3, 2, 1]


Let’s analyze the complexity of mergeSort. We already know that the time
complexity of merge is O(len(L)). *__At each level of recursion the total number of
elements to be merged is len(L). Therefore, the time complexity of mergeSort is
O(len(L)) multiplied by the number of levels of recursion. Since mergeSort divides
the list in half each time, we know that the number of levels of recursion is
O(log(len(L))). Therefore, the time complexity of mergeSort is O(n$*$log(n)), where n
is len(L).__*
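
The number of levels of recursion can be computed without sorting anything. The sketch below follows only the list lengths, splitting them as mergeSort does; the function name levels is hypothetical.

```python
def levels(n):
    """Depth of recursion below the initial call when mergeSort is applied
       to a list of length n (n//2 elements go left, the rest go right)."""
    if n < 2:
        return 0
    middle = n//2
    return 1 + max(levels(middle), levels(n - middle))

for n in (8, 1000, 10000):
    # n * levels(n) estimates the total number of elements merged.
    print(n, levels(n), n*levels(n))
```

The depth grows by one each time n doubles, which is the logarithmic factor in the analysis above.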

This is a lot better than selection sort’s O(len(L)^2). For example, if L has
10,000 elements, len(L)^2 is 100 million but len(L) * log_2(len(L)) is about 130,000.

This improvement in time complexity comes with a price. Selection sort is an
example of an in-place sorting algorithm. Because it works by swapping the place
of elements within the list, it uses only a constant amount of extra storage (one
element in our implementation). *__In contrast, the merge sort algorithm involves
making copies of the list. This means that its space complexity is O(len(L)). This
can be an issue for large lists.__*

### 10.2.2 Exploiting Functions as Parameters

Suppose we want to sort a list of names written as firstName lastName, e.g., the
list ['Chris Terman', 'Tom Brady', 'Eric Grimson', 'Gisele Bundchen']. The following code defines two ordering functions, and then uses these to sort a list in two different ways. Each function uses the split method of type str.

In [28]:
def lastNameFirstName(name1,name2):
    #split 'firstName lastName' at the space between the two names
    arg1 = name1.split(' ')
    arg2 = name2.split(' ')
    if arg1[1] != arg2[1]:
        return arg1[1]<arg2[1]
    else:
        #last name the same, sort by first name
        return arg1[0]<arg2[0]
    
def firstNameLastName(name1,name2):
    arg1 = name1.split(' ')
    arg2 = name2.split(' ')
    if arg1[0] != arg2[0]:
        return arg1[0]<arg2[0]
    else:
        #first name the same, sort by last name
        return arg1[1]<arg2[1]

In [29]:
L = ['Tom Brady', 'Eric Grimson', 'Gisele Bundchen']
newL = mergeSort(L,lastNameFirstName)
print('Sorted by last name = ',newL)

newL = mergeSort(L,firstNameLastName)
print('Sorted by first name = ',newL)

Sorted by last name =  ['Tom Brady', 'Gisele Bundchen', 'Eric Grimson']
Sorted by first name =  ['Eric Grimson', 'Gisele Bundchen', 'Tom Brady']
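
Python's built-in sorted also accepts a function as a parameter, though of a different shape. Its key argument maps each element to a value, and the elements are ordered by those values. The sketch below expresses the same two orderings that way; last_then_first and first_then_last are hypothetical helper names, and the code assumes each name consists of exactly two words separated by one space.

```python
def last_then_first(name):
    """Key for ordering by last name, breaking ties by first name."""
    first, last = name.split(' ')
    return (last, first)

def first_then_last(name):
    """Key for ordering by first name, breaking ties by last name."""
    first, last = name.split(' ')
    return (first, last)

names = ['Tom Brady', 'Eric Grimson', 'Gisele Bundchen']
by_last = sorted(names, key=last_then_first)
by_first = sorted(names, key=first_then_last)
print(by_last)
print(by_first)
```

Tuples are compared element by element, so the second component is consulted only when the first components are equal.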

### 10.2.3 Sorting in Python

The sorting algorithm used in most Python implementations is called timsort.
The key idea is to take advantage of the fact that in a lot of data sets the data is
already partially sorted. Timsort’s worst-case performance is the same as merge
sort’s, but on average it performs considerably better.
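
The benefit of existing order can be observed by counting comparisons. The sketch below wraps each value in a hypothetical class, Counted, whose < operator increments a counter, and then calls the built-in sorted on an already ordered list and on a shuffled copy. It assumes an implementation, such as CPython, whose sort compares elements using only <.

```python
import random

class Counted:
    """Wraps a value and counts every < comparison made between instances."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

def comparisons_used(values):
    """Returns the number of < comparisons sorted() makes on values."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons

n = 1000
in_order = list(range(n))
shuffled = in_order[:]
random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable

print(comparisons_used(in_order), comparisons_used(shuffled))
```

The first count should be far smaller than the second. Exact numbers depend on the Python implementation and version, so none are shown here.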

## 10.3 Hash Tables