# Analysis of Algorithms
___

## Kinds of Analysis
- **Worst-case (usually)**  
  $T(n)$ = *maximum* time over all inputs of size $n$
- **Average-case (sometimes)**  
  $T(n)$ = *expected* time over all inputs of size $n$  
  (Requires an assumption about the statistical distribution of inputs, e.g. uniformly random inputs)
- **Best-case (not helpful)**  
  (Minimum time over inputs of size $n$; a slow algorithm can still be fast on some specific inputs)

## Asymptotic Analysis
- Ignore machine-dependent constants
- Look at **growth** of $T(n)$ as $n \to\infty$

## Asymptotic Notation
### Idea:
- Drop low-order terms
- Ignore leading constants

### Most common notations:
### $\mathcal{O}$-notation (upper bound)
$f(n)=\mathcal{O}(g(n))\;\text{means}$  
$\text{there are constants $c \gt 0$ and $n_0\gt 0$ such that}\;0 \le f(n)\le cg(n)\;\text{for all}\;n \ge n_0$
![O-notation (upper bound)](img/notation-omicron.png)  
### $\Omega$-notation (lower bound)
$f(n)=\Omega(g(n))\;\text{means}$  
$\text{there are constants $c \gt 0$ and $n_0\gt 0$ such that}\;0 \le cg(n)\le f(n)\;\text{for all}\;n \ge n_0$
![Omega-notation (lower bound)](img/notation-omega.png)
### $\Theta$-notation (tight bound)
$f(n)=\Theta(g(n))\;\text{means}$  
$\text{there are constants $c_1 \gt 0$, $c_2 \gt 0$ and $n_0\gt 0$ such that}\;0 \le c_1g(n)\le f(n)\le c_2g(n)\;\text{for all}\;n \ge n_0$  
  
**Note:** $\Theta(g(n))=\mathcal{O}(g(n))\cap\Omega(g(n))$
![Theta-notation (tight bound)](img/notation-theta.png)
### What's the difference between $\mathcal{O}(g(n))$ and $\Theta(g(n))$?
$\Theta(g(n))$ is a **tight bound**, while $\mathcal{O}(g(n))$ is only an **upper bound**.  
In practice (e.g. in industry), $\mathcal{O}(g(n))$ is often used informally where $\Theta(g(n))$ is actually meant.

### Example
$2n^2+10n+5=\Theta(n^2)$ - the tight bound  
$2n^2+10n+5=\mathcal{O}(n^3)$ - a valid upper bound, but not a tight one
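To see where the constants in the definitions come from, here is one possible (not unique) choice of $c_1$, $c_2$ and $n_0$ for this example. For $n \ge 1$ we have $10n \le 10n^2$ and $5 \le 5n^2$, so

$$
2n^2 \;\le\; 2n^2+10n+5 \;\le\; 2n^2+10n^2+5n^2 \;=\; 17n^2 \qquad \text{for all } n \ge 1
$$

and $c_1=2$, $c_2=17$, $n_0=1$ satisfy the $\Theta$ definition. The same $c=17$, $n_0=1$ also give $\mathcal{O}(n^3)$, since $n^2 \le n^3$ for $n \ge 1$. It is not $\Theta(n^3)$, though: $c_1n^3 \le 2n^2+10n+5 \le 17n^2$ would require $n \le 17/c_1$, which fails for large $n$ whatever $c_1>0$ is.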

## Runtime Comparison
![Runtime comparison of common functions](img/notation-functions.png)

## Which logarithmic base should I use?
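It does not matter asymptotically; base $2$ is assumed by convention ($\lg n = \log_2 n$), because changing the base only multiplies by a constant:  
$\log_{b}n=\frac{\log_{2}n}{\log_{2}b}=c\cdot \log_{2}n, \quad \text{where } c=\frac{1}{\log_{2}b}$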

## Simple cases

### $\mathcal{O}(n)$: linear scan
```
def indexof(nums, target):
    for i in range(len(nums)):
        if nums[i] == target:
            return i
    return -1
```
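Counting comparisons in the linear scan makes the kinds of analysis from the top of this notebook concrete. `indexof_counting` is a hypothetical instrumented copy of `indexof`; the average case below assumes the target is present and equally likely to be at any position.

```python
def indexof_counting(nums, target):
    # Hypothetical copy of indexof that also returns the number of comparisons made.
    comparisons = 0
    for i in range(len(nums)):
        comparisons += 1
        if nums[i] == target:
            return i, comparisons
    return -1, comparisons

nums = list(range(100))
print(indexof_counting(nums, 0))    # best case, target first: (0, 1)
print(indexof_counting(nums, 99))   # target last: (99, 100)
print(indexof_counting(nums, -5))   # worst case, target absent: (-1, 100)

# Average case over all present targets, each equally likely: (n + 1) / 2 comparisons
avg_comparisons = sum(indexof_counting(nums, t)[1] for t in nums) / len(nums)
print(avg_comparisons)              # 50.5
```

Worst and average case are both linear in $n$; only the best case is constant, which is why it says little about the algorithm.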

### $\mathcal{O}(n^b)$: nested loops ($b$ - number of nested loops)
```
def contains_duplicates(nums):
    for i in range(len(nums)):
        for j in range(i+1, len(nums)):
            if nums[i] == nums[j]:
                return True
    return False
```
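Even though the inner loop starts at `i+1`, the loops are still quadratic. In the worst case (no duplicates) the inner loop runs $n-1, n-2, \dots, 0$ times, for $\frac{n(n-1)}{2}=\Theta(n^2)$ comparisons in total. The hypothetical helper below mirrors the loop structure of `contains_duplicates` and counts those comparisons:

```python
def count_pair_comparisons(n):
    # Hypothetical helper: same loops as contains_duplicates on n distinct items,
    # counting how many times nums[i] == nums[j] would be evaluated.
    comparisons = 0
    for i in range(n):
        for _ in range(i + 1, n):
            comparisons += 1
    return comparisons

for n in (10, 100, 1000):
    print(n, count_pair_comparisons(n), n * (n - 1) // 2)  # the two counts agree
```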
### $\mathcal{O}(\lg n)$: divide the problem in half (binary search on a sorted list)
```
def binary_search(nums, target):
    lo, hi = 0, len(nums)-1
    while lo <= hi:
        mid = lo + (hi - lo)//2
        if target < nums[mid]:
            hi = mid - 1
        elif target > nums[mid]:
            lo = mid + 1
        else:
            return mid
    return -1
```
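Each iteration of `binary_search` discards at least half of the remaining range, so for $n \ge 1$ the loop runs at most $\lfloor \lg n \rfloor + 1$ times (which equals `n.bit_length()` in Python). `binary_search_steps` is a hypothetical instrumented copy that reports the iteration count:

```python
def binary_search_steps(nums, target):
    # Hypothetical copy of binary_search that returns (index, loop iterations).
    # nums must be sorted in ascending order.
    lo, hi = 0, len(nums) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = lo + (hi - lo) // 2
        if target < nums[mid]:
            hi = mid - 1
        elif target > nums[mid]:
            lo = mid + 1
        else:
            return mid, steps
    return -1, steps

for n in (10, 1_000, 1_000_000):
    nums = list(range(n))                    # sorted input
    _, steps = binary_search_steps(nums, n)  # n exceeds every element: unsuccessful search
    print(n, steps, n.bit_length())          # steps never exceeds floor(lg n) + 1
```

Going from a thousand to a million elements only roughly doubles the number of iterations.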

## Case Study: Fibonacci Numbers
### Recurrence definition:  
$F_n=\begin{cases}
    0, & \text{if $n=1$}.\\
    1, & \text{if $n=2$}.\\
    F_{n-2} + F_{n-1}, & \text{otherwise}.
  \end{cases}$
### Time Complexity (how fast it runs)
$T(n)=T(n-1)+T(n-2)+\mathcal{O}(1)$  

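To simplify, use $T(n-2)\leq T(n-1)$ and ignore the constant term (it does not change the exponential growth):  
$T(n)\leq 2T(n-1)$  
$T(n)\leq 2(2T(n-2))=4T(n-2)$  
$T(n)\leq 2(2(2T(n-3)))=8T(n-3)$  
$T(n)\leq \dots \leq 2^{n-2}\,T(2)$, where $T(2)$ is a constant  
  
Runtime complexity is $\mathcal{O}(2^n)$ (**very bad**)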
  
More precisely, solving the recurrence exactly gives:  
$T(n)=\Theta(\phi^n)$, where $\phi=\frac{1+\sqrt{5}}{2}=1.6180339887498948482...$ is the golden ratio (**still very bad**)
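The number of calls made by the naive recursion can be counted directly. `count_calls` is a hypothetical helper wrapping a copy of the recursive definition (with $F_1=0$, $F_2=1$ as above). The ratio of consecutive call counts approaches $\phi$, matching the $\phi^n$ growth rate stated above.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

def count_calls(n):
    # Hypothetical helper: how many calls the naive recursive Fibonacci makes for input n.
    calls = 0

    def fib(k):
        nonlocal calls
        calls += 1
        if k <= 2:
            return k - 1  # F(1) = 0, F(2) = 1
        return fib(k - 1) + fib(k - 2)

    fib(n)
    return calls

counts = {n: count_calls(n) for n in range(20, 27)}
for n in range(20, 26):
    print(n, counts[n], counts[n + 1] / counts[n])  # ratio approaches PHI
```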
### Space Complexity (how much memory it requires)
Let's analyze the call stack:
```
F(n)
↳F(n-1)
  ↳F(n-2)
    ↳F(n-3)
      ↳...
        ↳F(1)
↳F(n-2)
  ↳F(n-3)
    ↳...
      ↳F(1)
```
The recursion never goes deeper than $n$ calls, and each active call uses a constant amount of memory, so the space complexity is $\mathcal{O}(n)$.
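To check the depth claim, the hypothetical `max_recursion_depth` helper runs a copy of the naive recursion and records the deepest stack level reached (counting the top-level call as depth 1). For $n \ge 2$ it reports $n-1$, since the deepest chain is $F(n)\to F(n-1)\to\dots\to F(2)$, even though the total number of calls is exponential.

```python
def max_recursion_depth(n):
    # Hypothetical helper: deepest call-stack level reached by the naive recursion.
    deepest = 0

    def fib(k, depth):
        nonlocal deepest
        deepest = max(deepest, depth)
        if k <= 2:
            return k - 1  # F(1) = 0, F(2) = 1
        return fib(k - 1, depth + 1) + fib(k - 2, depth + 1)

    fib(n, 1)
    return deepest

for n in (5, 10, 20):
    print(n, max_recursion_depth(n))  # grows linearly with n
```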

## Relative speed of $\mathcal{O}(2^n)$, $\mathcal{O}(n)$ and $\mathcal{O}(\lg n)$

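```
import time

def measure_call(target, name):
    """Call target() once and print its wall-clock running time in milliseconds."""
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    target()
    elapsed = time.perf_counter() - start
    print('%s: %g ms' % (name, elapsed * 1000))
```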

```
def fibonacci_exp(n):
    # Direct transcription of the recurrence: exponential time, O(n) stack space
    if n == 1:
        return 0
    if n == 2:
        return 1
    return fibonacci_exp(n - 1) + fibonacci_exp(n - 2)
```

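```
def fibonacci_linear(n):
    # Keep only the last two values: O(n) additions, O(1) extra space
    if n == 1:
        return 0
    if n == 2:
        return 1
    f1, f2 = 0, 1             # F(1), F(2)
    for _ in range(2, n):
        f1, f2 = f2, f1 + f2  # slide the window forward by one
    return f2
```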

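```
def fibonacci_logarithmic(n):
    # O(lg n) arithmetic steps using the T_pq transformation (SICP, Exercise 1.19):
    # T_pq maps (a, b) -> (b*q + a*q + a*p, b*p + a*q), and applying it twice
    # equals T_p'q' with p' = p^2 + q^2, q' = q^2 + 2pq.
    if n == 1:
        return 0
    if n == 2:
        return 1

    # current plays the role of b, nxt of a; start from (F(1), F(2)) = (0, 1)
    current, nxt, p, q = 0, 1, 0, 1
    count = n - 1  # number of single Fibonacci steps (T_01) still to apply

    while count > 0:
        if count % 2 == 0:
            # Square the transformation and halve the remaining count
            p, q = p * p + q * q, q * q + 2 * p * q
            count //= 2  # integer division keeps count an int
        else:
            # Apply the current transformation once
            current, nxt = p * current + q * nxt, q * current + (p + q) * nxt
            count -= 1

    return current
```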

```
# With F(1) = 0, all three should print 34
print('fibonacci_exp(10) =', fibonacci_exp(10))
print('fibonacci_linear(10) =', fibonacci_linear(10))
print('fibonacci_logarithmic(10) =', fibonacci_logarithmic(10))
```

```
def run_trial(n):
    # The exponential version is skipped for n > 40: it would effectively never finish
    if n > 40:
        print('exponential: ∞')
    else:
        measure_call(lambda: fibonacci_exp(n), 'exponential')
    measure_call(lambda: fibonacci_linear(n), 'linear')
    measure_call(lambda: fibonacci_logarithmic(n), 'logarithmic')
```

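```
run_trial(10)
```

```
run_trial(20)
```

```
run_trial(30)
```

```
# The exponential version makes over a hundred million calls here, so expect a long wait
run_trial(40)
```

```
run_trial(100000)
```

```
# F(1000000) has hundreds of thousands of bits, so even the linear version can take a while
run_trial(1000000)
```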

## Amortized Time
From Stack Overflow:
> If you do an operation say a million times, you don't really care about the worst-case or the best-case of that operation - what you care about is how much time is taken in total when you repeat the operation a million times.
>
> So it doesn't matter if the operation is very slow once in a while, as long as "once in a while" is rare enough for the slowness to be diluted away. Essentially amortised time means "average time taken per operation, if you do many operations". Amortised time doesn't have to be constant; you can have linear and logarithmic amortised time or whatever else.
> 
> Let's take mats' example of a dynamic array, to which you repeatedly add new items. Normally adding an item takes constant time (that is, O(1)). But each time the array is full, you allocate twice as much space, copy your data into the new region, and free the old space. Assuming allocates and frees run in constant time, this enlargement process takes O(n) time where n is the current size of the array.
> 
> So each time you enlarge, you take about twice as much time as the last enlarge. But you've also waited twice as long before doing it! The cost of each enlargement can thus be "spread out" among the insertions. This means that in the long term, the total time taken for adding m items to the array is O(m), and so the amortised time (i.e. time per insertion) is O(1).
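A minimal cost-model sketch of the doubling strategy described in the quote, assuming (as the quote does) that allocation and freeing are constant-time, plus two assumptions of our own: the array starts with capacity 1, and copying one element costs 1. `simulate_appends` is a hypothetical helper that only counts operations; it does not store any data.

```python
def simulate_appends(m):
    # Hypothetical cost model of a doubling dynamic array (no real storage).
    # Assumptions: starting capacity 1; copying one element costs 1; one write per append.
    capacity, size, copies = 1, 0, 0
    for _ in range(m):
        if size == capacity:
            # Array is full: allocate twice the space and copy every existing element
            copies += size
            capacity *= 2
        size += 1  # the append itself
    return copies, copies + m  # (element copies, total cost including writes)

for m in (10, 1_000, 1_000_000):
    copies, total = simulate_appends(m)
    print(f"m={m}: copies={copies}, total cost={total}, cost per append={total / m:.3f}")
```

The copies form the series $1+2+4+\dots+2^k$ with $2^k < m$, which sums to less than $2m$, so the total cost stays below $3m$ and the amortized cost per append is $\mathcal{O}(1)$.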