
# Automatic Differentiation

Many technologies require differentiation of functions. There are several ways to approach this, including manual, numerical, and symbolic differentiation. Automatic differentiation, or autodiff, is a popular method because it provides results as accurate as manual differentiation without the expression swell of symbolic differentiation.

For this study, we will experiment with a single simple example to get the gist of the concept.

# Resources

- **BYU Flow Lab** - https://www.youtube.com/watch?v=B-1hsAfdGHw *(This notebook is primarily based on this source)*
- Wikipedia - https://en.wikipedia.org/wiki/Automatic_differentiation
- Gradient of Gradient - https://colab.research.google.com/drive/1VpeE6UvEPRz9HmsHh1KS0XxXjYu533EC#scrollTo=hb--WbxnFvec
- Playlist - https://www.youtube.com/watch?v=NL8feCR2Qts&list=PLRDl2inPrWQVu2OvnTvtkRpJ-wz-URMJx&index=26

In [18]:
import numpy as np
import tensorflow as tf
from math import sin, cos

# Dual Numbers

Building off the idea of complex numbers, dual numbers carry two components: a primal (equivalent to the real part of a complex number) and a dual (similar to the imaginary component). The dual is a scalar, b, multiplied by ε, where ε^2 = 0 but ε itself is not 0.

- Complex Number ≡ a ± bi
- Dual Number ≡ a ± bε
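
As a quick illustration of the rule ε^2 = 0, the product of two dual numbers is (a + bε)(c + dε) = ac + (ad + bc)ε, because the bdε^2 term vanishes. Below is a minimal sketch, assuming each dual number is stored as a (primal, dual) tuple. The helper `dual_mul` is a hypothetical name used only here and is not part of the class defined later in this notebook.

```python
def dual_mul(p, q):
    """Multiply two dual numbers given as (primal, dual) tuples (hypothetical helper)."""
    a, b = p
    c, d = q
    # (a + b*eps) * (c + d*eps) = a*c + (a*d + b*c)*eps + b*d*eps^2, and eps^2 = 0
    return (a * c, a * d + b * c)

eps = (0.0, 1.0)                           # the number 0 + 1*eps
print(dual_mul(eps, eps))                  # → (0.0, 0.0)
print(dual_mul((3.0, 1.0), (3.0, 1.0)))    # → (9.0, 6.0)
```

The second result already hints at what follows: squaring 3 + ε gives 9 in the primal part and 6, the derivative of x^2 at x = 3, in the dual part.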

## Taylor Series

If we let b = 1 and pass our dual number as the input to a function, such that

f(x) → f(x + ε)

and then write the Taylor series expansion of f about x, we have

f(x + ε) = f(x) + f'(x)ε + f''(x)*(ε^2)/2 + ...

But we know that ε^2 = 0, and all of the succeeding terms contain a factor of ε^2, so they vanish. What remains is

f(x + ε) = f(x) + f'(x)ε

so the primal part of the result is f(x) and the dual part is f'(x).
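
To see the truncated series at work, the sketch below evaluates f(x) = x^3 at x + ε using dual arithmetic. The ε coefficient of the result should equal f'(x) = 3x^2. The class `Dual` and its fields `primal` and `tangent` are assumptions made for this illustration and are separate from the `dual` class defined later.

```python
class Dual:
    """Hypothetical minimal dual number: primal + tangent * eps."""
    def __init__(self, primal, tangent=0.0):
        self.primal = primal
        self.tangent = tangent

    def __mul__(self, other):
        # eps^2 = 0 removes the tangent * tangent term
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

def cube(x):
    return x * x * x  # f(x) = x^3

out = cube(Dual(2.0, 1.0))        # evaluate at x + eps with x = 2, b = 1
print(out.primal, out.tangent)    # → 8.0 12.0, i.e. f(2) and f'(2) = 3 * 2^2
```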

## Chain Rule (Nested Functions)

In the case of nested functions, we can take the following approach. Let our inner function be y, and the outer function be f, such that

- y(x) → y(x + ε)
- f(x) → f(y(x + ε))

If we write a Taylor Series approximation of y, we have

y(x + ε) = y(x) + (dy/dx)ε 

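This is itself a dual number, with primal y(x) and dual coefficient b = dy/dx. Applying the same expansion to f, f becomes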

f(y(x + ε)) = f(y(x)) +  (df/dy)*bε

Substituting b = dy/dx, we have

f(y(x + ε)) = f(y(x)) +  (df/dy)(dy/dx)ε

Where (df/dy)(dy/dx) is the Chain Rule!
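
The same bookkeeping can be sketched numerically. Here we assume, purely for illustration, y(x) = x^2 and f(y) = sin(y). The function names `inner` and `outer` are hypothetical. Each one returns its value together with the dual part it passes along.

```python
import math

def inner(x, dx):
    """y(x) = x^2, returning (y, dy) with dy = (dy/dx) * dx."""
    return x * x, 2.0 * x * dx

def outer(y, dy):
    """f(y) = sin(y), returning (f, df) with df = (df/dy) * dy."""
    return math.sin(y), math.cos(y) * dy

x = 1.5
y, dy = inner(x, 1.0)    # seed the dual part with b = 1, i.e. x + eps
f, df = outer(y, dy)     # the dual part dy/dx is carried into f

by_hand = math.cos(x * x) * 2.0 * x   # (df/dy)(dy/dx) written out directly
print(df, by_hand)                    # the two values should agree
```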

# Forward Mode Automatic Differentiation

Let v_i be a function composed of all functions which came before it (are nested within it), such that

v_i = v_i(v_1, v_2, ... , v_(i-2), v_(i-1))

Applying the chain rule through every intermediate function between v_j and v_i, we have

dv_i/dv_j = Σ(k=j → i-1)(∂v_i/∂v_k)(dv_k/dv_j)

### Example

Let v1, v2, v3, v4 be a chain of intermediate variables, where

- v1 = x1
- v4 = f(x)

If we differentiate each part, we have

- v̇1 = 1
- v̇2 = (∂v2/∂v1)*v̇1
- v̇3 = (∂v3/∂v1)*v̇1 + (∂v3/∂v2)*v̇2
- v̇4 = (∂v4/∂v1)*v̇1 + (∂v4/∂v2)*v̇2 + (∂v4/∂v3)*v̇3

We can see that this structure lends itself to recursion.
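
As a concrete trace, assume for illustration that f(x) = x * sin(x^2), split into v1 = x, v2 = v1^2, v3 = sin(v2), v4 = v3 * v1. This particular function is our own choice and does not come from the source video. The sketch below computes each primal value and its tangent in order, then compares the final tangent against the closed-form derivative.

```python
import math

x = 2.0

# primal trace
v1 = x
v2 = v1 * v1
v3 = math.sin(v2)
v4 = v3 * v1                      # f(x) = x * sin(x^2)

# tangent trace, each line using only tangents computed before it
dv1 = 1.0
dv2 = (2.0 * v1) * dv1            # (dv2/dv1) * dv1
dv3 = math.cos(v2) * dv2          # (dv3/dv2) * dv2
dv4 = v3 * dv1 + v1 * dv3         # (dv4/dv1) * dv1 + (dv4/dv3) * dv3

# closed form: d/dx [x sin(x^2)] = sin(x^2) + 2 x^2 cos(x^2)
expected = math.sin(x * x) + 2.0 * x * x * math.cos(x * x)
print(dv4, expected)              # the two values should agree
```

Here v4 does not depend on v2 directly, so the (∂v4/∂v2) term is zero and is left out.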

# Binary Tree

To capture the function, we are going to overload the default math operators of Python.

When Python sees "a + b", for example, it calls our overloaded operator, which builds a binary tree with (a + b) at the root and (a) and (b) as the leaves.

Though this can be expanded to a and/or b being functions, we will limit this in order to simplify the study.
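
A minimal sketch of that idea, using a hypothetical `Node` class that only records structure and computes nothing:

```python
class Node:
    """Hypothetical expression-tree node, used only for this illustration."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

    def __add__(self, other):
        # Python calls this for `a + b`; we record the operation as a new root node
        return Node('+', left=self, right=other)

a = Node('a')
b = Node('b')
root = a + b
print(root.label, root.left.label, root.right.label)   # → + a b
```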

## Double Linked List

We will use a doubly linked structure, in which each node holds a left link and a right link, to capture the two sides of the tree. Each node also carries variables to track bookkeeping details, methods to overload the default math operations, and a partial derivative method.

The partial derivative method here is **extremely** limited. This is because even seemingly simple rules can split into many complicated cases. (More on this at the bottom of the notebook).

In [42]:
class dual:
  def __init__(self, value, op=None, left=None, right=None, name=None):
    self.value = value
    self.op = op
    self.left = left
    self.right = right
    self.name = name

  def __mul__(self, b):
    return dual(self.value * b.value, op='*', left=self, right=b)

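  def __rmul__(self, b):
    # Python only reaches this when the left operand is not a dual (e.g. 3 * x),
    # so wrap a plain number as an unnamed constant leaf
    if not isinstance(b, dual): b = dual(b)
    return dual(b.value * self.value, op='*', left=b, right=self)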

  def __add__(self, b):
    return dual(self.value + b.value, op='+', left=self, right=b)

  def sin(self):
    return dual(sin(self.value), op='sin', left=self)

  def partial(self, name):
    # termination condition: a leaf is either the wrt variable or a constant
    if self.left is None and self.right is None:
      if self.name == name: return 1
      return 0

    # normal operation
    op = self.op
    if op == '*':
      # Below are partial derivative conditions
      # (limited: assumes both operands are distinct leaf variables)

      # left is wrt variable
      if self.left.name == name:
        return self.right.value

      # right is wrt variable
      if self.right.name == name:
        return self.left.value

      # neither operand is the wrt variable
      return 0

    if op == '+': return self.left.partial(name) + self.right.partial(name)
    if op == 'sin':
      # limited: assumes the argument of sin is a leaf variable
      if self.left.name == name: return cos(self.left.value)
      return 0


# Simple Autodiff

Let's run through a real example!

In [43]:
x1 = dual(2.0, name="x1")
x2 = dual(5.0, name="x2")

fx = x1*x2 + x2.sin()
fx

<__main__.dual at 0x7fa499ab6790>

In [38]:
fx.value

9.041075725336862

Now, by traversing the binary tree, we can determine the partial derivative with respect to each variable.

In [39]:
fx.partial("x1")

5.0

In [40]:
fx.partial('x2')

2.283662185463226

We can now verify these by hand.
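
For f(x1, x2) = x1*x2 + sin(x2), the hand-derived partials are ∂f/∂x1 = x2 and ∂f/∂x2 = x1 + cos(x2). The sketch below evaluates them at the same point and cross-checks them against central finite differences. The names `f_plain`, `p1`, `p2` and the step size `h` are our own choices for this check.

```python
import math

def f_plain(p1, p2):
    """The example function written with ordinary floats."""
    return p1 * p2 + math.sin(p2)

p1, p2 = 2.0, 5.0
h = 1e-6   # finite-difference step (an assumed, reasonable default)

# partials derived by hand
df_dp1 = p2
df_dp2 = p1 + math.cos(p2)

# central differences as an independent numerical check
fd_dp1 = (f_plain(p1 + h, p2) - f_plain(p1 - h, p2)) / (2 * h)
fd_dp2 = (f_plain(p1, p2 + h) - f_plain(p1, p2 - h)) / (2 * h)

print(df_dp1, fd_dp1)
print(df_dp2, fd_dp2)
```

Both pairs should agree to several decimal places, and the hand-derived values should match the outputs of `fx.partial` above.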

## Tracking expression variables

At this point we would need to add tracking functionality to hold the variables used in the expression. This way we can determine the span of the column space. But our dual number class is becoming unwieldy. For our implementation here, tracking would involve creating an overseeing list.

To keep things simple, we'll just conclude here.
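
For reference, one possible way to gather the variables without an overseeing list is to walk the finished tree and collect the names of its leaves. This is only a sketch under the assumption that every node exposes `name`, `left`, and `right`. The function `collect_names` and the stand-in helpers `leaf` and `op_node` are hypothetical.

```python
from types import SimpleNamespace

def collect_names(node, found=None):
    """Return the names of all named leaves under `node`, in first-seen order."""
    if found is None:
        found = []
    if node is None:
        return found
    if node.left is None and node.right is None:
        # a leaf: record it once if it is a named variable
        if node.name is not None and node.name not in found:
            found.append(node.name)
        return found
    collect_names(node.left, found)
    collect_names(node.right, found)
    return found

# stand-in nodes with the same attributes as the notebook's nodes
def leaf(name):
    return SimpleNamespace(name=name, left=None, right=None)

def op_node(left, right=None):
    return SimpleNamespace(name=None, left=left, right=right)

n1, n2 = leaf("x1"), leaf("x2")
tree = op_node(op_node(n1, n2), op_node(n2))   # same shape as x1*x2 + sin(x2)
print(collect_names(tree))                      # → ['x1', 'x2']
```

Applied to the notebook's `fx`, something like `[fx.partial(n) for n in collect_names(fx)]` would then give one partial derivative per discovered variable.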

# Defining Partial Derivatives

As stated earlier, defining partial derivatives can be deceptively complicated. Below is some of the code I started writing for finding the derivative of (a*b) with respect to some variable x_i.

For this to work in every situation, I needed to account for every combination of a or b being a function, a or b being a variable, a or b eventually containing the variable we were looking for, etc.

Perhaps there is a simpler way to handle this, but it clearly requires significant effort, which is beyond the scope of this study. Here, we just want to gain some insight into the purpose of Automatic Differentiation and its scope, and to have some appreciation for the utility it provides for free.

In [None]:
'''
# FUNCTIONS
# both children are functions
if self.left.name == None and self.right.name == None:
  return self.left.partial(name) * self.right.partial(name)

# left is a function and right is a variable
if self.left.name == None:
  result = self.left.partial(name)
  b = self.right

# right is a function and left is a variable
if self.right.name == None:
  result = self.right.partial(name)
  b = self.left

# variable was found in function, multiply as planned
# else check b is wrt variable
# if so, return function value, else return zero
if type(result) != dual: return result * b.value
else:
  if b.name == name: return result.value
  else: return 0

# VARIABLES (no functions)
# both are wrt variable (squared)
if self.left.name == name and self.right.name == name:
  self.value = 2*self.left.value
  self.name = name
  return self

# both are not wrt variable
# return dual (self) as a flag, variable not found
if self.left.name != name and self.right.name != name:
  self.value = self.left.value * self.right.value
  return self
'''
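
One possibly simpler route, offered as a sketch and not as the method from the source video, is to let every node apply only its own local rule and recurse into its children. The product rule then covers the variable and function cases uniformly, so the case analysis above is not needed. The class `Expr` below is hypothetical. It mirrors the fields of `dual` but is kept separate so the two do not interfere.

```python
import math

class Expr:
    """Hypothetical node type with the same fields as `dual`."""
    def __init__(self, value, op=None, left=None, right=None, name=None):
        self.value, self.op = value, op
        self.left, self.right, self.name = left, right, name

    def __mul__(self, b):
        return Expr(self.value * b.value, '*', self, b)

    def __add__(self, b):
        return Expr(self.value + b.value, '+', self, b)

    def sin(self):
        return Expr(math.sin(self.value), 'sin', self)

    def partial(self, name):
        if self.op is None:                   # leaf: variable or constant
            return 1.0 if self.name == name else 0.0
        dl = self.left.partial(name)
        if self.op == 'sin':                  # chain rule
            return math.cos(self.left.value) * dl
        dr = self.right.partial(name)
        if self.op == '+':                    # sum rule
            return dl + dr
        if self.op == '*':                    # product rule
            return dl * self.right.value + self.left.value * dr
        raise ValueError(f"unsupported op: {self.op}")

a = Expr(2.0, name="a")
b = Expr(5.0, name="b")
g = (a * b) * a.sin() + b * b                 # g = a*b*sin(a) + b^2
print(g.partial("a"), g.partial("b"))
```

By hand, ∂g/∂a = b*sin(a) + a*b*cos(a) and ∂g/∂b = a*sin(a) + 2b, which is what the two printed values should match.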