---
title: "LC121 and LC53 - Kadane's Algorithm"
author: "Vahram Poghosyan"
date: "2022-01-23"
categories: ["Leetcode", "Algorithms", "Dynamic Programming"]
image: "leetcode.png"
format:
  html:
    code-fold: true
jupyter: python3
include-after-body:
  text: |
    <script type="application/javascript" src="../../javascript/light-dark.js"></script>
---

# Problem Statement

We are given an array of `prices` where `prices[i]` is the price of a given stock on the `i`-th day.

We want to maximize our profit by choosing a single day to buy one stock and choosing a different day in the future to sell that stock. Return the maximum profit you can achieve from this transaction. If you cannot achieve any profit, return `0`.

**Example 1**

```
Input: prices = [7,1,5,3,6,4]
Output: 5
```

**Explanation** 

Buy on day 2 (price = 1) and sell on day 5 (price = 6), profit = 6-1 = 5.
Note that buying on day 2 and selling on day 1 is not allowed because you must buy before you sell.

**Example 2**

```
Input: prices = [7,6,4,3,1]
Output: 0
```

**Explanation** 

In this case, no transactions are done and the max profit = 0.

## Brute Force Solution

Consider every viable pair of days. If we buy on day `i` (counting days from 1), there are $n-i$ later days on which we can sell. Summing over all $n$ choices of buy day gives $\sum_{i=1}^{n}(n-i) = \frac{n(n-1)}{2}$ pairs, which is quadratic in $n$, so the time complexity is $O(n^2)$.

We can also think of this as choosing a subset of $2$ days without regard to order: each unordered pair corresponds to exactly one valid transaction (buy on the earlier day, sell on the later one). The number of candidates is therefore ${n \choose 2} = \frac{n(n-1)}{2}$, which is, of course, $O(n^2)$.
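The counting argument above can be sketched directly in code. This is a minimal illustration, not an optimized solution; the function name `max_profit_brute_force` is an assumption made for this example.

```python
from itertools import combinations


def max_profit_brute_force(prices):
    """Check every (buy, sell) pair with the buy day before the sell day: O(n^2)."""
    best = 0  # making no transaction at all yields a profit of 0
    # combinations() yields index pairs (buy, sell) with buy < sell,
    # i.e. exactly the n-choose-2 pairs counted above.
    for buy, sell in combinations(range(len(prices)), 2):
        best = max(best, prices[sell] - prices[buy])
    return best


print(max_profit_brute_force([7, 1, 5, 3, 6, 4]))  # 5
print(max_profit_brute_force([7, 6, 4, 3, 1]))     # 0
```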

## Non-Brute Solution

We can solve this problem in a single pass, achieving $O(n)$ complexity, with a DP algorithm similar to [Kadane's algorithm](https://en.wikipedia.org/wiki/Maximum_subarray_problem#Kadane's_algorithm), which solves the [Maximum Subarray](https://en.wikipedia.org/wiki/Maximum_subarray_problem) problem. Some of the similarity is due to the fact that both problems are concerned with some score over a *contiguous* stretch of an array. Whereas Kadane's algorithm looks for the contiguous subarray with the maximum sum, this algorithm looks for the maximum profit when the elements of the array are read as stock prices. As with Kadane's algorithm, we can prove its correctness using *loop invariants*. We will give the solution and prove its correctness. As for building intuition for why this solution works, we will focus on the [Maximum Subarray problem](https://en.wikipedia.org/wiki/Maximum_subarray_problem), which is a more general application of Kadane's algorithm.
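One way to make the similarity concrete, offered here as an illustrative aside rather than as the solution developed in the next section: the profit from buying on one day and selling on a later day is the sum of the daily price changes in between, so the best profit is the best sum over a contiguous run of daily changes (or `0` if every run loses money). The function name `max_profit_via_differences` and the choice to allow an empty run are assumptions made for this sketch.

```python
def max_profit_via_differences(prices):
    """Treat the stock problem as a maximum-subarray problem on daily changes."""
    # diffs[k] is the price change from day k to day k+1.
    diffs = [later - earlier for earlier, later in zip(prices, prices[1:])]

    best_ending_here = 0  # best sum of a (possibly empty) run ending at the current change
    best = 0              # best sum seen so far; 0 corresponds to making no transaction
    for change in diffs:
        # Either extend the current run, or drop it and start over with nothing.
        best_ending_here = max(best_ending_here + change, 0)
        best = max(best, best_ending_here)
    return best


print(max_profit_via_differences([7, 1, 5, 3, 6, 4]))  # 5
print(max_profit_via_differences([7, 6, 4, 3, 1]))     # 0
```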

### Code

```python
#| code-fold: false
def maxProfit(prices):
    min_price = float("inf") # +infinity
    max_profit = 0 

    for i in range(len(prices)):
        if prices[i] < min_price:
            min_price = prices[i]
        elif prices[i] - min_price > max_profit:
            max_profit = prices[i] - min_price

    return max_profit

prices = [7,1,5,3,6,4]
print(maxProfit(prices))
```

```
5
```


## Proof of Correctness

It's easy to see that, at the end of each iteration `i`, the variable `min_price` holds the lowest price of the stock up to, and including, index `i`. A statement like this, which holds at the end of every iteration, is called a loop invariant. Showing that something is a loop invariant is a lot like proving the *inductive step* in a proof by induction (for example, when proving the correctness of recursive algorithms).

It can also be shown that a second loop invariant holds: at the end of each iteration `i`, `max_profit` holds the maximum profit up to, and including, index `i`.

### Proof of Invariance

The loop does one of two things exclusively: either it updates `min_price` or it doesn't. 

Suppose it *doesn't* update `min_price` (**case 1**). By the first loop invariant, `min_price` holds the lowest price up to, and including, index `i`. The difference `prices[i] - min_price` is therefore the best profit achievable by selling on day `i`, and `max_profit` is updated *only* if this difference is greater than the `max_profit` at the end of the previous iteration (iteration `i-1`). This guarantees that `max_profit` holds the maximum profit up to, and including, index `i`.

In the other case (**case 2**), the loop *does* update `min_price` at iteration `i`. This means `prices[i]` is lower than every earlier price, so selling on day `i` cannot yield a profit, and leaving `max_profit` untouched is correct: it still holds the maximum profit up to, and including, index `i`. The loop therefore enters the subsequent iteration `i+1` with both invariants intact. If `prices[i+1]` is, again, less than `min_price`, the same reasoning applies, and the loop goes on updating `min_price` until the dip ends (unless the price keeps dipping until the very end, in which case the proof is complete). At the first iteration in which the price no longer falls below `min_price`, we are back in the familiar **case 1**. Hence, the statement about `max_profit` is a loop invariant.

Therefore, once the loop is finished, `max_profit` will hold the maximum profit up to, and including, the last index `n-1`. In other words, it will hold the solution to the problem.
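The two invariants can also be checked mechanically. The sketch below repeats the loop from the solution and, at the end of every iteration, compares both variables against values recomputed from scratch. The function name `max_profit_checked` is an assumption made for this illustration, and the quadratic re-check is there only for verification.

```python
def max_profit_checked(prices):
    """Same loop as maxProfit, with both invariants asserted after each iteration."""
    min_price = float("inf")
    max_profit = 0

    for i in range(len(prices)):
        if prices[i] < min_price:
            min_price = prices[i]
        elif prices[i] - min_price > max_profit:
            max_profit = prices[i] - min_price

        seen = prices[: i + 1]
        # Invariant 1: min_price is the lowest price among days 0..i.
        assert min_price == min(seen)
        # Invariant 2: max_profit is the best buy-then-sell profit within days 0..i
        # (or 0 when no profitable pair exists).
        best_pair = max(
            (seen[s] - seen[b] for b in range(len(seen)) for s in range(b + 1, len(seen))),
            default=0,
        )
        assert max_profit == max(best_pair, 0)

    return max_profit


print(max_profit_checked([7, 1, 5, 3, 6, 4]))  # 5
```

If either invariant failed at some iteration, the corresponding `assert` would raise an `AssertionError` at that index.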

## Kadane's Algorithm - Maximum Subarray

At the heart of Kadane's algorithm is an optimal substructure which lends itself to optimization using the principles of dynamic programming.

Any subarray ends (or begins, but let's go with ends for now) at some index. This means we can have the notion of the solution up to index `i`. Let's call that `global_max[i]`. This is, effectively, the final solution to the problem if `nums` consisted only of its elements up to, and including, index `i`.

A relationship we may initially notice when looking at the problem is that:

```
local_max[i] = max(local_max[i-1] + nums[i], nums[i])
```

Here `local_max[i]` is the maximum sum over all subarrays ending at index `i` but, crucially, *not* the solution of the problem up to index `i`. It's easy to get lost in this problem by conflating these two quantities, so we should be mindful of this mistake.

Feel free to tinker with the problem to notice this relationship. In my experience, it helps to illustrate the arrays.

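The final answer is then the maximum over all the local maxima. This is true because every subarray ends at some index, so the best subarray overall is the best among those ending at one of the indices.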
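To see the relationship on a concrete array, the sketch below tabulates `local_max[i]` for every index and then takes the maximum of the table. The function name `local_maxima` and the sample array are assumptions chosen for illustration.

```python
def local_maxima(nums):
    """Return a list whose i-th entry is the best sum of a subarray ending at index i."""
    local_max = []
    for i, num in enumerate(nums):
        if i == 0:
            local_max.append(num)  # only one subarray ends at index 0
        else:
            # Either extend the best subarray ending at i-1, or start afresh at i.
            local_max.append(max(local_max[i - 1] + num, num))
    return local_max


nums = [-2, 1, -3, 4, -1, 2, 1, -5, 4]  # illustrative input
table = local_maxima(nums)
print(table)       # [-2, 1, -2, 4, 3, 5, 6, 1, 5]
print(max(table))  # 6
```

Note that the last entry of the table (`5`) is not the answer; the answer (`6`) is the largest entry, which here sits at index `6`.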

Thinking this way opens the door to dynamic programming. The idea is to cut down on the number of subarrays considered in the brute force approach by finding a simple, cheap rule that yields the solution to the current sub-problem from the stored solution of a previously solved sub-problem (whatever the relationship between these sub-problems may be). In the case of the Maximum Subarray problem, the sub-problems are the local maxima discussed above: once we know `local_max[i-1]`, the relationship above gives us `local_max[i]` in constant time.
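As a rough sketch of the "retrieve a previously solved sub-problem" idea, the recurrence can be written top-down with memoization. The function name `max_subarray_top_down` is hypothetical, `functools.lru_cache` stands in for a hand-written memo table, and the sketch assumes `nums` is non-empty.

```python
from functools import lru_cache


def max_subarray_top_down(nums):
    """Top-down version of the recurrence; assumes nums is non-empty."""

    @lru_cache(maxsize=None)
    def local_max(i):
        # Best sum of a subarray ending exactly at index i.
        if i == 0:
            return nums[0]
        return max(local_max(i - 1) + nums[i], nums[i])

    # Querying indices in increasing order means local_max(i - 1) is always
    # already cached, so each call does constant work and the recursion stays shallow.
    return max(local_max(i) for i in range(len(nums)))


print(max_subarray_top_down([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
```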

If we work through an example, the act of solving it will give us insight into the implementation of the single-pass, iterative algorithm (the recursive solution is already given away by the optimal substructure uncovered above; to make it dynamic, we can implement memoization on top). Let's take the first step, then generalize:


In the beginning, there's just `nums[0]`. By virtue of being the only subarray that ends at index `0`, it is its own `local_max[0]`. In situations like this we initialize `local_max` to $-\infty$ so that the loop doesn't need to handle special cases, like singleton arrays.

```python
local_max = float('-inf')

for i, num in enumerate(nums):
    if num > local_max:
        local_max = num
```

Since `local_max` is initialized to $-\infty$ (let's call this `local_max[-1]` to stay consistent with the array notation of the optimal substructure), in the first iteration `num > local_max` is equivalent to:

```
nums[0] > local_max[-1] + nums[0]
```

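This holds for any `nums[0]`, since adding a finite number to $-\infty$ leaves it at $-\infty$. So the condition we actually want, which also matches the recurrence at every later index, is: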

```python
local_max = float('-inf')

for i, num in enumerate(nums):
    if num > local_max + num:
        local_max = num  # start a new subarray at index i
    else:
        local_max = local_max + num  # extend the subarray ending at i-1
```

As it is, once the loop finishes, `local_max` will contain the local maximum at the last element of the `nums` array. We need to figure out what to do with the other loop invariant, `global_max`, which is supposed to be the solution to the problem restricted to indices up to `i`. Well, `global_max` is clearly just:

```
global_max[i] = max(global_max[i-1], local_max[i])
```

We need to implement this relationship just as we implemented the previous one. Let's take the first step. In the beginning there's just `nums[0]`. Clearly `global_max[0]` equals `nums[0]`, as that's what `local_max[0]` is, and `global_max[-1]` (as per the earlier abuse of notation) is $-\infty$ by choice.

```python
local_max = float('-inf')
global_max = float('-inf')

for i, num in enumerate(nums):
    if num > local_max + num:
        local_max = num
    else:
        local_max = local_max + num
    if global_max < local_max:
        global_max = local_max
```

On line 9 we can reference `local_max` and rely on it already holding the value promised by its invariant, `local_max[i]`, because it was updated earlier in the same iteration.

Notice that `global_max` has two meanings in the implementation, as does `local_max`. It is used as the previous iterate `global_max[i-1]` (in the Boolean comparison) as well as the current (or next, depending on frame of reference) iterate `global_max[i]` (in the assignment operation).
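Wrapped in a function, the loop can be tried on a few inputs. The sketch below is a compact restatement under assumptions: the name `max_subarray` and the sample inputs are illustrative, and `max()` replaces the explicit `if`/`else` branches without changing what is computed.

```python
def max_subarray(nums):
    """Single-pass maximum subarray sum, following the two recurrences above."""
    local_max = float("-inf")   # plays the role of local_max[-1]
    global_max = float("-inf")  # plays the role of global_max[-1]

    for num in nums:
        # local_max[i] = max(local_max[i-1] + nums[i], nums[i])
        local_max = max(local_max + num, num)
        # global_max[i] = max(global_max[i-1], local_max[i])
        global_max = max(global_max, local_max)

    return global_max


print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
print(max_subarray([-3, -1, -2]))                     # -1
```

With these initial values an empty `nums` returns `float("-inf")`; a caller that needs a different convention for empty input would have to handle that case separately.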

        