### Dynamic array resizing

- So far, we've covered static arrays, where we preallocate a fixed amount of memory to create the array of fixed length

- Depending on the language, of course, arrays are often not restricted to fixed lengths
    - `list` in Python, for example, allows for indefinite `append`s

- Under the hood, however, the array is simply resized on the fly, which gives us a **dynamic array**
    - We store a pointer to an array with some initial capacity, say 2
    - When we try to push while that array is at capacity, a new array is created with double the capacity
    - Copy the elements of the old array into the new array
    - Update the pointer to point to the new array
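
To see this resizing from the outside, here is a small sketch that watches `sys.getsizeof` while appending to a Python `list`. It assumes CPython, where `getsizeof` reflects the allocated capacity of the list; the helper name `allocation_jumps` is made up for illustration. Note that CPython over-allocates by a factor smaller than 2, so the jumps will not land on powers of 2, but the idea (grow in steps, not on every append) is the same.

```python
import sys


def allocation_jumps(n_appends):
    """Return the list lengths at which the allocated size changed."""
    lst = []
    last = sys.getsizeof(lst)
    jumps = []
    for i in range(n_appends):
        lst.append(i)
        now = sys.getsizeof(lst)
        if now != last:
            # The underlying array was reallocated on this append
            jumps.append(len(lst))
            last = now
    return jumps


# Far fewer reallocations than appends (exact values depend on the interpreter)
print(allocation_jumps(64))
```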

- Dynamic array methods
    - `get(i)`: O(1) 
        - return arr[i]
    - `set(i, x)`: O(1) 
        - arr[i] = x
    - `pushback(x)`: O(N) if at capacity, else O(1)
        - if size == capacity: allocate new array with capacity * 2 --> copy old array into new --> update pointer to new array --> update capacity to capacity * 2
        - write x into position i = size
        - update size = size + 1
    - `remove(i)`: O(N)
        - For each j from position i to size-2 --> array[j] = array[j+1] (shift everything after position i one slot to the left)
        - update size = size - 1
    - `size()`: O(1)
        - return size
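
The methods above can be sketched as a small class. This is only an illustration under some assumptions: a Python list of fixed length stands in for the static array, the starting capacity is 2, and the underscore-prefixed helper names (`_resize`, `_check`) are my own.

```python
class DynamicArray:
    """Illustrative dynamic array backed by a fixed-length Python list."""

    def __init__(self, capacity=2):
        self._capacity = capacity        # slots allocated
        self._size = 0                   # slots in use
        self._arr = [None] * capacity    # stand-in for a static array

    def size(self):
        return self._size

    def get(self, i):
        self._check(i)
        return self._arr[i]

    def set(self, i, x):
        self._check(i)
        self._arr[i] = x

    def pushback(self, x):
        # O(N) only when full: allocate double, copy, swap the pointer
        if self._size == self._capacity:
            self._resize(2 * self._capacity)
        self._arr[self._size] = x
        self._size += 1

    def remove(self, i):
        self._check(i)
        # Shift everything after position i one slot to the left
        for j in range(i, self._size - 1):
            self._arr[j] = self._arr[j + 1]
        self._size -= 1
        self._arr[self._size] = None     # clear the now-unused slot

    def _resize(self, new_capacity):
        new_arr = [None] * new_capacity
        for j in range(self._size):
            new_arr[j] = self._arr[j]
        self._arr = new_arr
        self._capacity = new_capacity

    def _check(self, i):
        if not 0 <= i < self._size:
            raise IndexError("index out of range")


a = DynamicArray()
for v in range(5):
    a.pushback(v)                        # capacity grows 2 -> 4 -> 8
a.remove(1)
print([a.get(i) for i in range(a.size())])  # [0, 2, 3, 4]
```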

- Note that with this way of growing the array, at worst about half of the allocated space is unused (this happens right after a doubling)

### Amortized analysis

- The main reason for discussing dynamic arrays is the strange time complexity of the `pushback` function. Notice that the cost of this function is not the same on every call!
    - It is correct to say that it is $O(N)$, because you have to expand the array in the worst case
    - But also notice that you don't incur $O(N)$ each time! In fact, you only incur $O(N)$ when the array size hits the specific points $2^1, 2^2, 2^3, \ldots$

- Therefore, it is **typically** misleading to say that pushback is $O(N)$: the linear time behaviour only happens at intervals, and those intervals keep getting further apart!
    - In such cases, we look at the cost averaged over a whole sequence of operations, rather than the worst case of a single operation
    - This kind of complexity analysis is known as **amortized analysis**

- Formally, we define `amortized cost` to be $$\frac{\text{Cost(n operations)}}{n}$$

#### Amortized Cost: Aggregate Method

- Let's consider the `PushBack` method in the dynamic array
    - Let's suppose we make $n$ calls to `PushBack`
    - Let's suppose the $i$-th insertion costs $c_i$. Every insertion costs 1, plus an extra $i-1$ for copying the existing elements when the array is full, which happens when $i-1$ is a power of 2 (the capacity starts at 2, so the first copy happens at $i-1 = 2$):
    $$c_i = 1 + \begin{cases} i-1 & \text{if } i-1 \text{ is a power of 2} \\ 0 & \text{otherwise} \end{cases}$$
    
- Recall that the amortized cost is $\frac{\text{Cost(n operations)}}{n}$. So in this case:

$$\begin{aligned}
    \frac{\text{Cost(n operations)}}{n} &= \frac{\sum_{i=1}^{n} c_i}{n} \\
    &= \frac{n + \sum_{j=1}^{\left \lfloor \log_2(n-1) \right \rfloor} 2^j}{n} \\
    &\le \frac{n + 2(n-1)}{n} & \text{because } \sum_{j=1}^{k} 2^j = 2^{k+1} - 2 \text{ and } 2^k \le n-1 \\
    &= \frac{O(n)}{n} \\
    &= O(1)
\end{aligned}$$

- What is the meaning of $\left \lfloor \log_2(n-1) \right \rfloor$? This just counts the number of times $i-1$ hits a power of 2 (i.e. the number of resizes) when we have $n$ requests!
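
As a sanity check on the aggregate argument, here is a minimal sketch that adds up $c_i$ directly. It assumes the same cost model as above (cost 1 per insertion, plus $i-1$ for the copy when $i-1$ is a power of 2, starting from capacity 2); the function names are made up for illustration.

```python
def push_cost(i):
    """Cost c_i of the i-th push (1-indexed) under the model above."""
    m = i - 1
    # Capacity starts at 2, so copies happen at m = 2, 4, 8, ...
    is_power_of_two = m >= 2 and (m & (m - 1)) == 0
    return 1 + (m if is_power_of_two else 0)


def amortized_cost(n):
    """Total cost of n pushes divided by n."""
    return sum(push_cost(i) for i in range(1, n + 1)) / n


for n in (10, 100, 1000, 10000):
    # The average stays below 3 no matter how large n gets
    print(n, amortized_cost(n))
```

For $n = 10$ the copies cost $2 + 4 + 8 = 14$, so the total is $24$ and the amortized cost is $2.4$.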

#### Amortized Cost: Banker's method

- I don't really like this method, because it's not clear what we should use for the "extra" charge. In any case, the intuition is exactly the same as the aggregate method

- Every time you push an element into the array, you are charged 3 tokens: 1 pays for the insertion itself, 1 is placed on the new element, and 1 is placed on an older element that has no token yet
    - in this case, you place it on the element capacity/2 positions before the new one

- When a doubling takes the capacity from $n \to 2n$, there is room for an additional $n$ elements in the new array
    - Each of these $n$ elements will provide 2 tokens, one on itself, and one on an element before it
    - So by the time you need to double the array again, every one of the $2n$ elements has a token, so every element's move is already paid for

- Finally, the amortized cost we need to pay is just the number of tokens we are charged per push (i.e. the number of operations we need to budget for to ensure the dynamic resizing works)
    - This is simply 3: 1 for the insertion plus the 2 tokens saved for later!
    - In other words, `PushBack` can be done in $O(3) = O(1)$ amortized time, or constant time
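
One way to convince yourself that a charge of 3 is enough is to simulate the "bank". The sketch below is only an illustration: it pools all tokens into a single balance instead of tracking which element holds which token, it assumes a starting capacity of 2, and `min_bank_balance` is a made-up name.

```python
def min_bank_balance(n_pushes, charge=3, capacity=2):
    """Lowest token balance seen over n_pushes pushes (n_pushes >= 1)."""
    size = 0
    bank = 0
    balances = []
    for _ in range(n_pushes):
        bank += charge            # tokens paid in for this push
        if size == capacity:
            bank -= size          # one token per element copied
            capacity *= 2
        bank -= 1                 # one token for the insertion itself
        size += 1
        balances.append(bank)
    return min(balances)


# With 3 tokens per push the bank never goes into debt
print(min_bank_balance(10_000, charge=3) >= 0)   # True
# With only 2 tokens per push it eventually does
print(min_bank_balance(10_000, charge=2) < 0)    # True
```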

#### Amortized Cost: Physicist's Method

- Again, I don't see how this method aids understanding vs the aggregate method, so just stick to that one. Will just explain for completeness

- Imagine a potential function $\phi$ that maps each state $h_i$ of the data structure to a number

- Let the amortized cost of the $i$-th operation be the actual cost $c_i$ plus the change in potential between states $h_{i-1}$ and $h_i$, that is 
$$a_i = c_i + \phi(h_i) - \phi(h_{i-1})$$

- Note that summing amortized costs across $n$ operations gives us a telescoping sum:
$$\begin{aligned}
\sum_{i=1}^{n} a_i &= c_1 + \phi(h_1) - \phi(h_0) \\
&\quad + c_2 + \phi(h_2) - \phi(h_1) \\
&\quad + \dots \\
&\quad + c_n + \phi(h_n) - \phi(h_{n-1}) \\
&= \sum_{i=1}^{n} c_i + \phi(h_n) - \phi(h_0)
\end{aligned}$$

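- So long as we define $\phi$ such that $\phi(h_n) \geq \phi(h_0)$ for every $n$, the total amortized cost is an upper bound on the total actual cost, i.e. $\sum c_i \leq \sum a_i$
    - Individual differences $\phi(h_i) - \phi(h_{i-1})$ can be negative: a resize is exactly where the potential drops to pay for the copy

- And if $\phi$ is chosen so that every $a_i$ is $O(1)$, then $n$ operations will be $O(n)$ in total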
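
The notes above do not pick a concrete $\phi$. A common textbook choice for a doubling array is $\phi = 2 \cdot \text{size} - \text{capacity}$, so the sketch below uses that as an assumption and checks that every push then has amortized cost exactly 3. The starting capacity of 2 and the function name are also just for illustration.

```python
def amortized_costs(n_pushes, capacity=2):
    """Amortized cost a_i of each push, using phi = 2 * size - capacity."""
    size = 0
    phi_prev = 2 * size - capacity    # assumed potential function
    costs = []
    for _ in range(n_pushes):
        actual = 1                    # the insertion itself
        if size == capacity:
            actual += size            # copy every existing element
            capacity *= 2
        size += 1
        phi = 2 * size - capacity
        costs.append(actual + phi - phi_prev)
        phi_prev = phi
    return costs


# Every push, resizing or not, has the same amortized cost
print(set(amortized_costs(1000)))     # {3}
```

On a resize the actual cost is large, but the potential drops by almost the same amount, which is why the two cancel out to a constant.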

### PushBack analysis, why must we double array size?

- In all the methods we discussed, we assumed that the capacity of the array doubles whenever we hit capacity. 

- Why? Strictly speaking, can't we just add some length $k$ to the existing array instead of doubling?

- Let's see what happens if we do that:
    - Let's suppose the capacity starts at 10 and we expand by $k=10$ each time the array is full, i.e. each time the size hits a multiple of 10
    - Let $c_i$ be the cost of the $i$-th operation
    - $$c_i = 1 + \begin{cases} i-1 & \text{if } i-1 \text{ is a positive multiple of 10} \\ 0 & \text{otherwise} \end{cases}$$

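$$\begin{aligned}
    \frac{\sum_{i=1}^{n} c_i}{n} &= \frac{n + \sum_{j=1}^{\left \lfloor (n-1)/10 \right \rfloor}10j}{n} \\
    &= \frac{n + 10 \sum_{j=1}^{\left \lfloor (n-1)/10 \right \rfloor} j}{n} \\
    &= \frac{n + O(n^2)}{n} & \text{because } \sum_{j=1}^{m} j = \frac{m(m+1)}{2} \\
    &= O(n)
\end{aligned}$$

- So growing by a constant amount makes each push $O(n)$ amortized, whereas doubling keeps it $O(1)$ amortized. That is why we double!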
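
To make the difference concrete, here is a rough simulation comparing the two growth strategies under the same cost model (1 per insertion, plus one per element copied on a resize). The function name and the `grow` callback are made up for illustration; the starting capacities (10 for additive growth, 2 for doubling) follow the examples in these notes.

```python
def average_push_cost(n, capacity, grow):
    """Average cost per push over n pushes; grow maps old capacity to new."""
    size = 0
    total = 0
    for _ in range(n):
        if size == capacity:
            total += size             # copy every existing element
            capacity = grow(capacity)
        total += 1                    # the insertion itself
        size += 1
    return total / n


for n in (100, 1000, 10000):
    additive = average_push_cost(n, 10, lambda c: c + 10)
    doubling = average_push_cost(n, 2, lambda c: 2 * c)
    # additive keeps growing with n, doubling stays below 3
    print(n, additive, doubling)
```

For $n = 100$ this gives $5.5$ for additive growth and $2.26$ for doubling; at $n = 1000$ the additive average is already $50.5$, i.e. it grows linearly with $n$.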

