# 10  SOME SIMPLE ALGORITHMS AND DATA STRUCTURES

The goal of this chapter is to help you develop some general intuitions about

* <b>how to approach questions of efficiency</b>.

The major point is that the key to efficiency is

* a <b>good algorithm</b>,

* not <b>clever coding tricks</b>.

Rather than inventing new algorithms from scratch, we learn to <b>reduce</b> the most complex aspects of the problems we face <b>to previously solved problems</b>.

More specifically, we:

* Develop an <b>understanding of the inherent complexity</b> of the problem with which we are faced,

* Think about how to <b>break</b> that problem up <b>into subproblems</b>,

* Relate those subproblems to other problems for which <b>efficient algorithms already exist</b>.

**Keep in mind:**

* The **most efficient** algorithm is **not always** the algorithm of **choice.**

* A program that does everything in the most efficient possible way is often

   * <b>needlessly difficult to understand</b>.

It is often **a good strategy** to:
   
* **start** by solving the problem at hand in the most **straightforward manner possible**

* instrument it to **find** any computational **bottlenecks**

* look for ways to <b>improve</b> the computational complexity of those parts of the program contributing to the bottlenecks.
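The "instrument it" step can be as simple as timing each stage of a program. The following is a minimal sketch, not a full profiler: `timed`, `build`, and `count_hits` are hypothetical names introduced here for illustration, and the sizes are arbitrary.

```python
import time

def timed(func, *args):
    """Hypothetical helper: run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

def build(n):
    """Stage 1: build a list of n integers."""
    return list(range(n))

def count_hits(L, queries):
    """Stage 2: straightforward version, one linear scan of L per query."""
    return sum(1 for q in queries if q in L)

L, build_time = timed(build, 5000)
hits, search_time = timed(count_hits, L, range(0, 10000, 5))
print(hits)
print(f"build: {build_time:.4f}s  search: {search_time:.4f}s")
```

Comparing the two timings suggests which stage is worth improving first. The exact numbers depend on the machine, so none are shown here.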
 


## 10.1 Search Algorithms

A search algorithm is a method for finding an item or group of items with specific properties within a collection of items. 

We refer to the collection of items as a <b>search space</b>. 

The search space might be something concrete, such as a set of electronic medical records, or something abstract, such as the set of all integers.

In this section, we will examine two algorithms for searching a list. 

Each meets **the specification:**

```python
def search(L, e):
    """Assumes L is a list.
       Returns True if e is in L and False otherwise"""
```

The astute reader might wonder if this is not semantically equivalent to the Python expression
```python
e in L
``` 
The answer is yes, it is. 

If one is unconcerned about the efficiency of discovering whether **e is in L**, one should simply write that expression.

### 10.1.1 Linear Search and Using <font color='blue'>Indirection</font> to Access Elements

Python uses an algorithm equivalent to the following to determine whether an element is in a list:
```python
def search(L, e):
    for i in range(len(L)): # O(len(L))
        if L[i] == e:
            return True
    return False
```

If the element **e** is not in the list, the algorithm performs <b>O(len(L))</b> tests:

* the complexity is <b>at best linear</b> in the length of L.

**Why “at best” linear?**

It will be linear **only if** each operation **inside the loop** can be done in **constant time**.

Let’s start by considering the simple case:

*  each element of the list is an **integer**

In this case the **address** in memory of the ith element of the list is simply

* $start + 4i$

where **start** is the address of the start of the list, assuming that each integer occupies 4 bytes.

Therefore we can assume that Python could compute the address of the ith element of a list of integers in constant time.

Python lists, however, can hold objects of many types and sizes, so the elements themselves cannot simply be laid out one after another. Instead, a **list** is represented as a **length** (the number of objects in the list) and a sequence of **fixed-size pointers** to **objects**.

The Figure illustrates the use of these pointers.

* The shaded region represents a list containing four elements.

* <font color="red">The leftmost shaded box</font> contains a pointer to an integer indicating the length of the list.

* <font color="blue"><b>Each of the other shaded boxes</b></font> contains a pointer to an **object** in the list.

![10.1.1img](./img/ds/10.1.1.PNG)

**IF** 

* the length field is four units of memory

* each pointer (address) occupies four units of memory

then the address of the ith element of the list is stored at the address

$start + 4 + 4i$

This address can be found in constant time, and the value stored there can then be used to access the ith element. This access is also a constant-time operation.
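The address computation can be imitated with a toy model of memory. This is only an illustrative sketch: the dictionary `memory`, the object addresses, and the helper `element` are invented for the example, and the length is stored directly in the first field for simplicity.

```python
# Toy model: `memory` maps an address to the value stored there.
# All addresses below are made up for illustration.
memory = {}
start = 100                                   # address where the list begins
objects = {500: 'a', 620: 'b', 748: 'c', 900: 'd'}   # object address -> object

memory[start] = len(objects)                  # length field: 4 units at `start`
for i, obj_address in enumerate(sorted(objects)):
    memory[start + 4 + 4 * i] = obj_address   # i-th fixed-size pointer slot

def element(i):
    """Return the i-th element: one address computation, then two lookups."""
    pointer_slot = start + 4 + 4 * i          # constant-time arithmetic
    obj_address = memory[pointer_slot]        # read the pointer stored in the slot
    return objects[obj_address]               # follow the pointer to the object

print([element(i) for i in range(memory[start])])   # ['a', 'b', 'c', 'd']
```

The objects sit at unrelated addresses and could have any size, yet the cost of reaching the ith one does not depend on i.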

This example illustrates one of the most important implementation techniques  used in computing: 

* <b>indirection</b>
  
   * Generally speaking, indirection involves accessing something by 
   
     **first accessing something else** that contains <b>a reference</b> to the thing initially sought.
     
This is what happens each time we use a variable to refer to the object to which that variable is bound. 

When we use a variable to **access a list** and then a reference stored in that list to **access another object**, we are going through two levels of indirection.
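A short sketch makes the two levels of indirection visible in Python itself. The names `inner` and `L` are arbitrary.

```python
inner = [1, 2, 3]
L = [inner, inner]      # both slots hold a reference to the same object

print(L[0] is L[1])     # True: one object, reached through two references
inner.append(4)         # mutate the object through the variable `inner`
print(L[1])             # [1, 2, 3, 4]: the change is visible through L as well
```

Because the list stores references rather than copies, a change made through one reference is seen through every other reference to the same object.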


In [1]:
%%file ./code/ds/LinearSearch.c

/* Search an array for the given key using Linear Search (LinearSearch.c) */
#include <stdio.h>
#include <stdlib.h> 

int linearSearch(const int a[], int size, int key);

// Search the array for the given key
// If found, return array index [0, size-1]; otherwise, return size
int linearSearch(const int a[], int size, int key) {
   for (int i = 0; i < size; ++i) {
      if (a[i] == key) return i;
   }
   return size;
}
 

int main() {
   const int SIZE = 8;
   int a1[8] = {8, 4, 5, 3, 2, 9, 4, 1};
 
   int keys[3]={8,4,99};
   for(int i=0; i<3; i++) 
       printf("%d's index is: %d \n",keys[i],linearSearch(a1,SIZE, keys[i]));
   
   return 0; 
 }


Writing ./code/ds/LinearSearch.c


In [17]:
!gcc -o ./code/LinearSearch.exe ./code/ds/LinearSearch.c

In [18]:
!.\code\LinearSearch.exe 

8's index is: 0 
4's index is: 1 
99's index is: 8 


### 10.1.2 Binary Search and Exploiting Assumptions

Getting back to the problem of implementing **search(L, e)**: is **O(len(L))** the best we can do?

Yes, if we know <b>nothing about the relationship between the values</b> of the elements in the list and the order in which they are stored.

But suppose we **know** something about the **order** in which elements are stored, for example, that we have a list of integers stored in <b>ascending order</b>.

We could change the implementation so that the search stops when it reaches a number larger than the number for which it is searching:

```python
def search(L, e):
    """Assumes L is a list, the elements of which are in
       ascending order.
       Returns True if e is in L and False otherwise"""
    for i in range(len(L)):
        if L[i] == e:
            return True
        if L[i] > e:  # ascending order
            return False
    return False
```

This would improve the average running time. 

However, it would **not change** the **worst-case complexity** of the algorithm:

  * in the worst case each element of **L** is examined.
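To see both effects, the search can be instrumented to count how many elements it examines. This is a sketch for illustration: `search_sorted` is a hypothetical variant that returns the count alongside the answer.

```python
def search_sorted(L, e):
    """Early-stopping linear search on an ascending list.
       Returns (found, number of elements examined)."""
    examined = 0
    for x in L:
        examined += 1
        if x == e:
            return True, examined
        if x > e:               # ascending order: e cannot appear later
            return False, examined
    return False, examined

L = list(range(0, 1000, 2))     # 500 even numbers in ascending order
print(search_sorted(L, 7))      # (False, 5): stops as soon as it sees 8
print(search_sorted(L, 2001))   # (False, 500): every element is examined
```

A small missing value is rejected quickly, but a value larger than every element still forces a full pass over the list.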

### <font color="blue">Binary search</font>

We can get a considerable **improvement** in the **worst-case complexity** by using an algorithm called <b>binary search</b>.

Here we rely on the **assumption** that the list is **ordered**.

The idea is simple:

* Pick an index, i, that divides the list L <b>roughly in half</b>.

* Ask if L[i] == e.

* If not, ask whether **L[i]** is larger or smaller than **e**.

* Depending upon the answer, search either <b>the left or right half</b> of **L** for **e**.

Example:
```c
int a1[10] = {1, 4, 5, 8, 12, 19, 24, 31, 43, 55}; // sorted
/* search */ 
int keys[4]={8,12,24,21};
```
![](./img/bSearch.jpg)


In [4]:
def search(L, e):
    """Assumes L is a list, the elements of which are in
          ascending order.
       Returns True if e is in L and False otherwise"""
    
    def bSearch(L, e, low, high):
        #Decrements high - low
        if high == low:
            return L[low] == e
        mid = (low + high)//2  # index that splits the range roughly in half
        if L[mid] == e:
            return True
        elif L[mid] > e:
            if low == mid: #nothing left to search
                return False
            else:
                return bSearch(L, e, low, mid - 1)# left
        else:
            return bSearch(L, e, mid + 1, high)  # right
        
    if len(L) == 0:
        return False
    else:
        return bSearch(L, e, 0, len(L) - 1)

The specification says that the implementation may assume that

* **L** is <b>sorted in ascending order</b>
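Python's standard `bisect` module relies on the same assumption. As an alternative sketch (not the implementation above), the specification can be met with `bisect_left`. The name `search_bisect` is introduced here for illustration.

```python
from bisect import bisect_left

def search_bisect(L, e):
    """Assumes L is a list sorted in ascending order.
       Returns True if e is in L and False otherwise."""
    i = bisect_left(L, e)           # leftmost index at which e could be inserted
    return i < len(L) and L[i] == e

sorted_list = [1, 4, 5, 8, 12, 19, 24, 31, 43, 55]
print(search_bisect(sorted_list, 24))   # True
print(search_bisect(sorted_list, 21))   # False
```

If the caller passes a list that is not sorted, neither this function nor `bSearch` promises a correct answer: the assumption is part of the contract, and checking it would itself take linear time.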

### The complexity of **bSearch**

The complexity of **bSearch** depends only upon 

* <b>the number of <font color='blue'>recursive</font> calls</b>.

The question is

* **how many times** can the value of **high - low** be cut in half before **high - low == 0**?

$$2^{?} = high - low$$

then:

$$? = \log_2(high - low)$$

so <b>high - low</b> can be cut in half about $\log_2(high - low)$ times before it reaches 0.

The complexity of search is <b>O(log(len(L)))</b>.
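The logarithmic bound can be checked empirically by counting how many elements are examined. The sketch below uses an iterative variant, `probes`, introduced only for this measurement. It compares the count with $\lfloor \log_2 n \rfloor + 1$, the usual worst-case bound for a list of length n.

```python
import math

def probes(L, e):
    """Iterative binary search on an ascending list.
       Returns (found, number of elements examined)."""
    low, high = 0, len(L) - 1
    examined = 0
    while low <= high:
        mid = (low + high) // 2
        examined += 1
        if L[mid] == e:
            return True, examined
        if L[mid] > e:
            high = mid - 1      # continue in the left half
        else:
            low = mid + 1       # continue in the right half
    return False, examined

for n in (10, 1000, 1000000):
    L = list(range(n))
    _, examined = probes(L, n)  # n is larger than every element of L
    print(n, examined, math.floor(math.log2(n)) + 1)
```

Each printed line shows the list length, the number of elements examined for a missing value, and the bound. Multiplying the length by 1000 adds only about ten more probes.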

In [5]:
L1 = [1, 4, 5, 8, 12, 19, 24, 31, 43, 55]
print(search(L1, 8))
print(search(L1, 12))
print(search(L1, 24))
# 
print(search(L1, 21))

True
True
True
False


### Binary Search in C

In [10]:
%%file ./code/ds/bSearch.h

/* Search an array for a key using Binary Search */

#ifndef BSEARCH_H
#define BSEARCH_H


int bSearch(const int a[], int size, int key);

#endif

Overwriting ./code/ds/bSearch.h


In [11]:
%%file ./code/ds/bSearch.c

/* Search an array for a key using Binary Search (BinarySearch.c) */

#include "bSearch.h"
#include <stdio.h>
 
int binarySearch(const int a[], int iLeft, int iRight, int key);
void print(const int a[], int iLeft, int iRight);
  
// Search the array for the given key
// If found, return array index; otherwise, return -1
int bSearch(const int a[], int size, int key) {
   // Call recursive helper function
   return binarySearch(a, 0, size-1, key);
}
 
// Recursive helper function for bSearch
int binarySearch(const int a[], int iLeft, int iRight, int key) {
  
   // For tracing the algorithm
   print(a, iLeft, iRight);
 
   // Test for empty list
   if (iLeft > iRight) return -1;
 
   // Compare with middle element
   int mid = (iRight + iLeft) / 2;  // truncate
   if (key == a[mid]) {
      return mid;
   } else if (key < a[mid]) {
      // Recursively search the lower half and return its result
      return binarySearch(a, iLeft, mid - 1, key);
   } else {
      // Recursively search the upper half and return its result
      return binarySearch(a, mid + 1, iRight, key);
   }
}

// Print the contents of the given array from iLeft to iRight (inclusive)
void print(const int a[], int iLeft, int iRight) {
   printf("{");
   for (int i = iLeft; i <= iRight; ++i) {
      printf("%d",a[i]);
      if (i < iRight) printf(",");
   }
   printf("} \n");
}

Writing ./code/ds/bSearch.c


In [7]:
%%file ./code/ds/DemobSearch.c

#include <stdio.h>
#include <stdlib.h> 
#include "bSearch.h" 

int main() {
   const int SIZE = 10;
   int a1[10] = {1, 4, 5, 8, 12, 19, 24, 31, 43, 55}; // sorted
 
   int keys[4]={8,12,24,21};
   for(int i=0; i<4; i++) 
       printf("%d's index is: %d \n",keys[i],bSearch(a1,  SIZE, keys[i]));
   
   return 0; 
}

Writing ./code/ds/DemobSearch.c


In [14]:
!gcc -c -o ./code/obj/bSearch.o ./code/ds/bSearch.c -I./code/ds
!gcc -c -o ./code/obj/DemobSearch.o ./code/ds/DemobSearch.c  -I./code/ds
!gcc -o ./code/DemobSearch.exe ./code/obj/DemobSearch.o ./code/obj/bSearch.o

In [16]:
!.\code\DemobSearch.exe

{1,4,5,8,12,19,24,31,43,55} 
{1,4,5,8} 
{5,8} 
{8} 
8's index is: 3 
{1,4,5,8,12,19,24,31,43,55} 
12's index is: 4 
{1,4,5,8,12,19,24,31,43,55} 
{19,24,31,43,55} 
{19,24} 
{24} 
24's index is: 6 
{1,4,5,8,12,19,24,31,43,55} 
{19,24,31,43,55} 
{19,24} 
{24} 
{} 
21's index is: -1 


## Further Reading

* Yan Weimin, Li Dongmei, Wu Weimin. Data Structures (C Language Edition), 2nd ed. Posts & Telecom Press, February 2015


* Mark Allen Weiss. Data Structures and Algorithm Analysis in C


* GNU C Library: Searching and Sorting : http://www.gnu.org/software/libc/manual/html_node/Searching-and-Sorting.html

