# Creating our own Auto Differentiation (AutoGrad) framework

In this practical exercise we will build our own, very simple, Auto-differentiation (or AutoGrad) framework.

## Coding the framework

### Step 1: Define a class for our variables

The key idea is that we will define our own class of `Variable` which is basically the same as a scalar (a number). So our class is created by passing it a `value`, and it stores this value internally.

But apart from being a placeholder for a number, we also want to keep track of the operation that created every `Variable`.

For example, if a variable $c$ is the result of the addition of two variables $a$ and $b$: $c = a + b$, then we would say that $a$ and $b$ are "parent" variables of $c$, and $c$ is their "child". The way $c$ was created was by adding these two parent variables together. A variable created directly (defined by the user, not resulting from any operation on existing variables) would have no parents.

So apart from the value of the `Variable`, we also have to keep track of its parents and of how each of them contributes to its value. This contribution is described by the local derivative associated with each parent, which tells us how a change in the value of the parent translates into a change in the value of the child.
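
As a concrete illustration of these local derivatives, consider addition and multiplication, with $a$, $b$ and $c$ as in the example above. These are standard calculus results, written here as a small worked example:

$$c = a + b \;\Rightarrow\; \frac{\partial c}{\partial a} = 1, \quad \frac{\partial c}{\partial b} = 1$$

$$c = a \cdot b \;\Rightarrow\; \frac{\partial c}{\partial a} = b, \quad \frac{\partial c}{\partial b} = a$$

For addition, a small change in either parent changes the child by the same amount. For multiplication, a small change in one parent is scaled by the value of the other parent.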

This is important for implementing the backward pass. During the backward pass, each parent defines a "route" through which the gradients arriving at our variable have to flow.

So we will define a list called `gradRoutes` that contains the parent variables and their corresponding local derivatives. A `Variable` created directly (not resulting from any operation on existing variables) will have an empty `gradRoutes`.

Finally, we want each `Variable` to keep track of the gradient of the quantity we are interested in (usually the loss) with respect to the `Variable` itself. We will create a placeholder for that as well, called `grad`. As seen in theory, this placeholder will accumulate the gradients that are backpropagated from the children of this variable when we implement the backpropagation algorithm, so we initialise it to zero.

In [70]:
import numpy as np

In [71]:
class Variable: #Simple variable. These are the leafs of our tree, they can request to have a gradient calculated, or not
    def __init__(self, value):
        self.value = value
        self.gradRoutes = [] # A variable by default has no grad Routes (was not created by anything, just defined)
        self.grad = 0.0

    def __str__(self):
        return 'Value: {self.value}'.format(self=self)

Apart from the `__init__()` function, which stores the value passed to our class and initialises the `gradRoutes` and `grad` member variables, we have also overloaded the function that Python uses to convert an object into a string representation: `__str__()`. This will allow us to print our class.

We cannot do much yet with this class, apart from storing values into our variables and printing them out. Let's try this out.

In [72]:
a = Variable(4.3)
b = Variable(5.2)
print(a)
print(b)

Value: 4.3
Value: 5.2


### Step 2: Define operations over our variables

The next step would be to implement operations on our variables. Let's first define the operations for addition and multiplication.

These will be functions that take two `Variable`s as input and produce a new (child) `Variable` with a value equal to the sum or the product of the two inputs. Apart from the forward pass, though, we should keep track of how this new `Variable` was created: the two parent `Variable`s and their corresponding local derivatives. In our code we put each parent together with its local derivative in a tuple and append it to the list of `gradRoutes`.

In [73]:
def vAdd(A: Variable, B: Variable): # Addition
    if not isinstance(A,Variable):
      A = Variable(A)
    if not isinstance(B,Variable):
      B = Variable(B)
    result = Variable(A.value + B.value) # Create a new Variable to store the result, and pass it the value = a + b

    #keep track of the parent variables, and of the local derivative associated with each one
    result.gradRoutes.append((A, 1)) # dresult / dA = 1
    result.gradRoutes.append((B, 1)) # dresult / dB = 1

    return result

def vMul(A: Variable, B: Variable): # Multiplication
    if not isinstance(A,Variable):
      A = Variable(A)
    if not isinstance(B,Variable):
      B = Variable(B)
    result = Variable(A.value * B.value) # Create a new Variable to store the result, and pass it the value = a * b

    #keep track of the parent variables, and of the local derivative associated with each one
    result.gradRoutes.append((A, B.value)) # dresult / dA = B
    result.gradRoutes.append((B, A.value)) # dresult / dB = A

    return result

So, for example, to calculate $d = (a + b) * c$ we first need to calculate $(a + b)$ and then multiply the result by $c$.

In [74]:
a = Variable(2) # a = 2
b = Variable(3) # b = 3
c = Variable(4) # c = 4

#d = (a + b) * c = 20
d = vMul(vAdd(a, b), c)

print(d)

Value: 20


### Step 3: Implement the backpropagation function

The final step is to implement the backpropagation function. This starts with a child `Variable` and backpropagates gradients through the routes recursively. It uses the two rules that we saw in theory:

- Accumulate the incoming gradients that arrive at a `Variable`. Each incoming gradient describes a different way in which the `Variable` affects the quantity of interest, so their sum is the final gradient for the `Variable`.
- Multiply every incoming gradient by the local derivative of each parent `Variable` (this is the application of the chain rule), and continue the backpropagation through the corresponding route (one for each parent `Variable`).
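
Written as a formula, the two rules can be combined as follows. The notation is our own, introduced for illustration: $L$ is the quantity of interest, $p$ is a variable, and the sum runs over every route from a child $v$ back to $p$.

$$\frac{\partial L}{\partial p} = \sum_{v \,\in\, \text{children}(p)} \frac{\partial L}{\partial v}\,\frac{\partial v}{\partial p}$$

Each product is the chain rule applied along one route (incoming gradient times local derivative), and the sum is the accumulation of the contributions from all routes.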

We update the `Variable` class accordingly. We also update the `__str__()` function to include gradient information.

In [75]:
class Variable: #Simple variable. These are the leafs of our tree, they can request to have a gradient calculated, or not
    def __init__(self, value):
        self.value = value
        self.gradRoutes = [] # A variable by default has no grad Routes (was not created by anything, just defined)
        self.grad = 0.0

    def backProp(self, route_val = 1.0):
        # Add together the incoming gradients from the different routes that lead to a node - this will be the final gradient for the node
        self.grad += route_val

        # For every parent variable and corresponding local derivative value that we have in the gradRoutes, continue with the gradient calculation
        for variable, local_derivative_value in self.gradRoutes:
            # Multiply the incoming gradient with the local derivatives corresponding the parent variable, and continue the backpropagation
            variable.backProp(local_derivative_value * route_val)

    def __str__(self):
        return 'Value: {self.value}, Gradient: {self.grad}'.format(self=self)

<font color=blue>**Question:** Why did we set the default value of route_val equal to 1.0?</font>

---

*Answer*
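Backpropagation starts at the child node, and the first gradient it needs is the derivative of that node with respect to itself, which is 1. Setting the default value of `route_val` to 1.0 therefore seeds the backward pass correctly when `backProp()` is called without arguments.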
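
The effect of the seed can be seen in a minimal sketch. It re-declares a stripped-down `Variable` and `vMul` so that it runs on its own, and the seed of 2.0 is a made-up value used only to show that every gradient scales with it.

```python
class Variable:
    """Minimal stand-in for the notebook's Variable (value, routes, gradient)."""
    def __init__(self, value):
        self.value = value
        self.gradRoutes = []  # list of (parent, local derivative) pairs
        self.grad = 0.0

    def backProp(self, route_val=1.0):
        self.grad += route_val
        for parent, local_derivative in self.gradRoutes:
            parent.backProp(local_derivative * route_val)


def vMul(A, B):
    result = Variable(A.value * B.value)
    result.gradRoutes.append((A, B.value))  # d(result)/dA = B
    result.gradRoutes.append((B, A.value))  # d(result)/dB = A
    return result


# Default seed: d(res)/d(res) = 1
a, b = Variable(2.0), Variable(3.0)
res = vMul(a, b)
res.backProp()
grads_seed_1 = (a.grad, b.grad)

# Hypothetical seed of 2.0: all gradients are scaled by the same factor
a, b = Variable(2.0), Variable(3.0)
res = vMul(a, b)
res.backProp(2.0)
grads_seed_2 = (a.grad, b.grad)

print(grads_seed_1, grads_seed_2)
```

With the default seed the gradients are the true derivatives of `res`. Any other seed simply multiplies all of them by that value.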


---


This should be all. If we want to calculate the derivative of the result with respect to any of the variables that participated in the calculation, we just need to call `backProp` on the result and then read the derivatives out.

In [76]:
a = Variable(2)           # a = 2
b = Variable(3)           # b = 3
c = Variable(4)           # c = 4
res = vMul(vAdd(a, b), c) # res = (a + b) * c = 20

print("Result =", res.value)

# Call backprop on the result
res.backProp()

# Now all variables should contain in their "grad" the derivative d(res) / d(variable)
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)
print("The derivative of the result with respect to c is:", c.grad)

Result = 20
The derivative of the result with respect to a is: 4.0
The derivative of the result with respect to b is: 4.0
The derivative of the result with respect to c is: 5.0


In the following example, variable $a$ affects the result through two different routes:

In [77]:
a = Variable(4)  # a = 4
b = Variable(3)  # b = 3
c = vAdd(a, b)   # c = 4 + 3
res = vMul(a, c) # res = a * c = 28

print("Result =", res.value)

# Call backprop on the result
res.backProp()

# Now all variables should contain in their "grad" the derivative d(res) / d(variable)
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)
# Also for intermediate results
print("The derivative of the result with respect to c is:", c.grad)

Result = 28
The derivative of the result with respect to a is: 11.0
The derivative of the result with respect to b is: 4.0
The derivative of the result with respect to c is: 4.0
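
A quick check by hand (a worked derivation added for illustration) confirms these numbers. With $c = a + b$ and $res = a \cdot c$:

$$res = a\,(a + b) = a^2 + a b, \qquad \frac{\partial\, res}{\partial a} = 2a + b = 2 \cdot 4 + 3 = 11, \qquad \frac{\partial\, res}{\partial b} = a = 4$$

The value 11 is the sum over the two routes: the direct route through the multiplication contributes $c = 7$, and the route through $c$ contributes $a \cdot 1 = 4$.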


<font color=blue>Use this setup to calculate the derivative of $c$ with respect to $a$ and $b$.</font>

In [78]:
a.grad = b.grad = c.grad = 0
c.backProp()
print("The derivative of c with respect to a is:", a.grad)
print("The derivative of c with respect to b is:", b.grad)



The derivative of c with respect to a is: 1.0
The derivative of c with respect to b is: 1.0


## Final touches

If you understood how this works up to here, then you should be already good to go. But since we want to use our auto grad to do some practical work, we will continue working on it a bit, to make it a bit more usable and complete it with more operations. Many of the subsequent steps are quite "engineering" in nature.

### Improving usability: overloading operators

Of course this is still highly incomplete, very inefficient and not very usable. Let's first address a usability issue. Instead of having to call a different function for each operation, like `res = vMul(a, c)`, we would like to write the operation directly, like `res = a * c`. To achieve this, we should overload [Python's special functions for operator overloading](https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types).

Here's how to do this for the addition and multiplication.

In [79]:
class Variable: #Simple variable. These are the leafs of our tree, they can request to have a gradient calculated, or not
    def __init__(self, value):
        self.value = value
        self.gradRoutes = [] # A variable by default has no grad Routes (was not created by anything, just defined)
        self.grad = 0.0

    def backProp(self, route_val = 1.0):
        # Add together the incoming gradients from the different routes that lead to a node - this will be the final gradient for the node
        self.grad += route_val

        # For every parent variable and corresponding local derivative value that we have in the gradRoutes, continue with the gradient calculation
        for variable, local_derivative_value in self.gradRoutes:
            # Multiply the incoming gradient with the local derivatives corresponding the parent variable, and continue the backpropagation
            variable.backProp(local_derivative_value * route_val)

    def __add__(self, b):
        return vAdd(self, b)

    def __mul__(self, b):
        return vMul(self, b)

    def __str__(self):
        return 'Value: {self.value}, Gradient: {self.grad}'.format(self=self)

In [80]:
a = Variable(4)  # a = 4
b = Variable(3)  # b = 3
c = a + b        # c = 4 + 3
res = a * c      # res = a * c = 28

print("Result =", res.value)

# Call backprop on the result
res.backProp()

# Now all variables should contain in their "grad" the derivative d(res) / d(variable)
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)
# Also for intermediate results
print("The derivative of the result with respect to c is:", c.grad)

Result = 28
The derivative of the result with respect to a is: 11.0
The derivative of the result with respect to b is: 4.0
The derivative of the result with respect to c is: 4.0


### Zeroing gradients

A last thing to note is that once we call `backProp`, our gradients are calculated and our variables are now "dirty" in the sense that if we call backprop again, the new result will be added to the previous one:

In [81]:
a = Variable(4)   # a = 4
b = Variable(3)   # b = 3
res = (a + b) * a # res = (a + b) * a = 28

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)

# Call backprop on the result once more
print("Second time")
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)

The derivative of the result with respect to a is: 11.0
The derivative of the result with respect to b is: 4.0
Second time
The derivative of the result with respect to a is: 22.0
The derivative of the result with respect to b is: 8.0


This will actually turn out to be quite useful, e.g. when we want to accumulate weight gradients over different samples in our learning loops (see next week's notebook), but we need a way to control it.

To avoid this, we should reset the gradients to zero before we call `backProp` again. We can do it one by one for every variable, but we will also implement a function that does this recursively, starting from the child node on which we called `backProp` and going all the way back to the parents.

In [82]:
class Variable: #Simple variable. These are the leafs of our tree, they can request to have a gradient calculated, or not
    def __init__(self, value):
        self.value = value
        self.gradRoutes = [] # A variable by default has no grad Routes (was not created by anything, just defined)
        self.grad = 0.0

    def backProp(self, route_val = 1.0):
        # Add together the incoming gradients from the different routes that lead to a node - this will be the final gradient for the node
        self.grad += route_val

        # For every parent variable and corresponding local derivative value that we have in the gradRoutes, continue with the gradient calculation
        for variable, local_derivative_value in self.gradRoutes:
            # Multiply the incoming gradient with the local derivatives corresponding the parent variable, and continue the backpropagation
            variable.backProp(local_derivative_value * route_val)

    def zeroGrad(self):
        self.grad = 0.0

    def zeroGradsRecursively(self):
        self.zeroGrad()
        for variable, _ in self.gradRoutes:
            variable.zeroGradsRecursively()

    def __add__(self, b):
        return vAdd(self, b)

    def __mul__(self, b):
        return vMul(self, b)

    def __str__(self):
        return 'Value: {self.value}, Gradient: {self.grad}'.format(self=self)

In [83]:
a = Variable(4)   # a = 4
b = Variable(3)   # b = 3
res = (a + b) * a # res = (a + b) * a = 28

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)

# Zero gradients
res.zeroGradsRecursively()

# Call backprop on the result once more
print("Second time")
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)

The derivative of the result with respect to a is: 11.0
The derivative of the result with respect to b is: 4.0
Second time
The derivative of the result with respect to a is: 11.0
The derivative of the result with respect to b is: 4.0
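
The accumulation behaviour can also be put to use on purpose. The following is a minimal sketch, assuming a stripped-down `Variable` that only supports `*`; the weight `w` and the sample values are made up for illustration. A new graph is built per sample, but they all share the same leaf `w`, so its gradient adds up across samples until we reset it.

```python
class Variable:
    """Minimal stand-in for the notebook's Variable, supporting only `*`."""
    def __init__(self, value):
        self.value = value
        self.gradRoutes = []  # list of (parent, local derivative) pairs
        self.grad = 0.0

    def backProp(self, route_val=1.0):
        self.grad += route_val
        for parent, local_derivative in self.gradRoutes:
            parent.backProp(local_derivative * route_val)

    def zeroGrad(self):
        self.grad = 0.0

    def __mul__(self, other):
        result = Variable(self.value * other.value)
        result.gradRoutes.append((self, other.value))
        result.gradRoutes.append((other, self.value))
        return result


w = Variable(0.5)            # hypothetical weight shared by all samples
samples = [1.0, 2.0, 3.0]    # made-up inputs, for illustration only

for x in samples:
    out = w * Variable(x)    # a new graph per sample, same leaf w
    out.backProp()           # d(out)/dw = x is added to w.grad

accumulated = w.grad         # sum of the per-sample gradients
print(accumulated)

w.zeroGrad()                 # reset before the next round of accumulation
print(w.grad)
```

Here only the leaf `w` needs resetting, because the per-sample graphs are thrown away after each iteration.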


## More Improvements

There are a number of ways we can improve our simple framework. The most important is probably being able to work with vectors and matrices. We will not implement this ourselves, though; next week we will see a framework that does this. For the time being, we will focus on other, simpler improvements.

---

    
<font color=blue>We usually do not require gradients for all our variables. If we could indicate which variables require gradients, then we could keep track of the routes that lead to these variables only and drop all the rest. This would be a huge improvement in resources and speed (number of calculations). Add a boolean parameter in the Variable class initialization called `requiresGrad`, and use it to add this functionality.</font>

In [84]:
class Variable: #Simple variable. These are the leafs of our tree, they can request to have a gradient calculated, or not
    def __init__(self, value,requiresGrad = True):
        self.value = value
        self.gradRoutes = [] # A variable by default has no grad Routes (was not created by anything, just defined)
        self.grad = 0.0
        self.requiresGrad = requiresGrad

    def backProp(self, route_val = 1.0):
      if self.requiresGrad:
        # Add together the incoming gradients from the different routes that lead to a node - this will be the final gradient for the node
        self.grad += route_val

        # For every parent variable and corresponding local derivative value that we have in the gradRoutes, continue with the gradient calculation
        for variable, local_derivative_value in self.gradRoutes:
          # Multiply the incoming gradient with the local derivatives corresponding the parent variable, and continue the backpropagation
          variable.backProp(local_derivative_value * route_val)

    def zeroGrad(self):
        self.grad = 0.0

    def zeroGradsRecursively(self):
        self.zeroGrad()
        for variable, _ in self.gradRoutes:
            variable.zeroGradsRecursively()

    def __add__(self, b):
        return vAdd(self, b)

    def __mul__(self, b):
        return vMul(self, b)

    def __str__(self):
        return 'Value: {self.value}, Gradient: {self.grad}'.format(self=self)

In [85]:
a = Variable(4,True)   # a = 4
b = Variable(3,False)   # b = 3
res = (a + b) * a # res = (a + b) * a = 28

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)


The derivative of the result with respect to a is: 11.0
The derivative of the result with respect to b is: 0.0
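
The solution above skips the gradient update during `backProp`, but still stores a route to every parent. A possible variation, sketched below under our own assumptions, drops the unneeded routes already during the forward pass. The helper `make_result` is hypothetical and not part of the notebook's code: it keeps only the routes to parents that require gradients, and marks the result as requiring a gradient only if at least one parent does.

```python
class Variable:
    def __init__(self, value, requiresGrad=True):
        self.value = value
        self.gradRoutes = []  # list of (parent, local derivative) pairs
        self.grad = 0.0
        self.requiresGrad = requiresGrad

    def backProp(self, route_val=1.0):
        if not self.requiresGrad:
            return
        self.grad += route_val
        for parent, local_derivative in self.gradRoutes:
            parent.backProp(local_derivative * route_val)


def make_result(value, routes):
    """Hypothetical helper: keep only routes to parents that need gradients."""
    kept = [(p, d) for p, d in routes if p.requiresGrad]
    result = Variable(value, requiresGrad=bool(kept))
    result.gradRoutes = kept
    return result


def vAdd(A, B):
    return make_result(A.value + B.value, [(A, 1.0), (B, 1.0)])


def vMul(A, B):
    return make_result(A.value * B.value, [(A, B.value), (B, A.value)])


a = Variable(4.0, requiresGrad=True)
b = Variable(3.0, requiresGrad=False)
c = vAdd(a, b)      # only the route to a is stored
res = vMul(c, a)
res.backProp()

print(len(c.gradRoutes))          # number of routes kept for c
print(a.grad, b.grad)
print(vMul(b, b).requiresGrad)    # no parent needs a gradient
```

The gradients are the same as before, but a result whose parents all have `requiresGrad=False` stores no routes at all, so the backward pass never descends into that part of the graph.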


---
    
<font color=blue>We obviously need to implement more functions - implement the following functions:
- Subtraction
- Raising to a power
- Division
- Unary negation
- The (natural) exponential function exp(x)
- ... any other function you might want</font>
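
The local derivatives needed for the operations listed above are standard calculus results. They are collected here for reference, with $r$ denoting the result and $a$, $b$ the operands (our own notation):

$$\begin{aligned}
r = a - b &: \quad \frac{\partial r}{\partial a} = 1, \quad \frac{\partial r}{\partial b} = -1 \\
r = a^b &: \quad \frac{\partial r}{\partial a} = b\,a^{b-1}, \quad \frac{\partial r}{\partial b} = a^b \ln a \quad (a > 0) \\
r = a / b &: \quad \frac{\partial r}{\partial a} = \frac{1}{b}, \quad \frac{\partial r}{\partial b} = -\frac{a}{b^2} \\
r = -a &: \quad \frac{\partial r}{\partial a} = -1 \\
r = e^a &: \quad \frac{\partial r}{\partial a} = e^a
\end{aligned}$$

Note that the derivative of $a^b$ with respect to the exponent involves $\ln a$, which is only defined for a positive base.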

In [86]:
import math

In [87]:
def vSub(A: Variable, B: Variable): # Subtraction
    if not isinstance(A,Variable):
      A = Variable(A)
    if not isinstance(B,Variable):
      B = Variable(B)
    result = Variable(A.value - B.value) # Create a new Variable to store the result, and pass it the value = a - b

    #keep track of the parent variables, and of the local derivative associated with each one
    result.gradRoutes.append((A, 1)) # dresult / dA = 1
    result.gradRoutes.append((B, -1)) # dresult / dB = -1

    return result

def vRaise(A: Variable, B: Variable): # Raise
    if not isinstance(A,Variable):
      A = Variable(A)
    if not isinstance(B,Variable):
      B = Variable(B)
    result = Variable(A.value ** B.value) # Create a new Variable to store the result, and pass it the value = a ** b

    #keep track of the parent variables, and of the local derivative associated with each one
    result.gradRoutes.append((A, B.value*(A.value**(B.value - 1)))) # dresult / dA = B*A**(B-1)
    result.gradRoutes.append((B, math.log(A.value)*(A.value ** B.value))) # dresult / dB = ln(A)*A**B (only defined for A > 0)

    return result

def vDiv(A: Variable, B: Variable): # Division
    if not isinstance(A,Variable):
      A = Variable(A)
    if not isinstance(B,Variable):
      B = Variable(B)

    result = Variable(A.value / B.value) # Create a new Variable to store the result, and pass it the value = a / b

    #keep track of the parent variables, and of the local derivative associated with each one
    result.gradRoutes.append((A, 1/B.value)) # dresult / dA = 1/B
    result.gradRoutes.append((B, -A.value/(B.value ** 2))) # dresult / dB = -A/(B**2)

    return result

def vUnary(A: Variable): # Unary negation
    if not isinstance(A,Variable):
      A = Variable(A)

    result = Variable(-A.value) # Create a new Variable to store the result, and pass it the value = -a

    #keep track of the parent variables, and of the local derivative associated with each one
    result.gradRoutes.append((A, -1)) # dresult / dA = -1

    return result

def vExp(A: Variable): # Natural exponent
    if not isinstance(A,Variable):
      A = Variable(A)

    result = Variable(math.e**A.value) # Create a new Variable to store the result, and pass it the value = e **a

    #keep track of the parent variables, and of the local derivative associated with each one
    result.gradRoutes.append((A, math.e**A.value)) # dresult / dA = e**A

    return result


In [88]:
class Variable: #Simple variable. These are the leafs of our tree, they can request to have a gradient calculated, or not
    def __init__(self, value,requiresGrad = True):
        self.value = value
        self.gradRoutes = [] # A variable by default has no grad Routes (was not created by anything, just defined)
        self.grad = 0.0
        self.requiresGrad = requiresGrad

    def backProp(self, route_val = 1.0):
      if self.requiresGrad:
        # Add together the incoming gradients from the different routes that lead to a node - this will be the final gradient for the node
        self.grad += route_val

        # For every parent variable and corresponding local derivative value that we have in the gradRoutes, continue with the gradient calculation
        for variable, local_derivative_value in self.gradRoutes:
          # Multiply the incoming gradient with the local derivatives corresponding the parent variable, and continue the backpropagation
          variable.backProp(local_derivative_value * route_val)

    def zeroGrad(self):
        self.grad = 0.0

    def zeroGradsRecursively(self):
        self.zeroGrad()
        for variable, _ in self.gradRoutes:
            variable.zeroGradsRecursively()

    def __add__(self, b):
        return vAdd(self, b)

    def __radd__(self,b):
        return vAdd(self,b)

    def __mul__(self, b):
        return vMul(self, b)

    def __rmul__(self,b):
        return vMul(self,b)

    def __sub__(self,b):
        return vSub(self,b)

    def __rsub__(self,b):
        return vSub(b,self)

    def __truediv__(self,b):
        return vDiv(self,b)

    def __rtruediv__(self,b):
        return vDiv(b,self)

    def __pow__(self,b):
        return vRaise(self,b)

    def __rpow__(self,b):
        return vRaise(b,self)

    def __neg__(self):
        return vUnary(self)

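    def __exp__(self):
        # Note: Python has no built-in `__exp__` hook; this is an ordinary method that must be called explicitly
        return vExp(self)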

    def __str__(self):
        return 'Value: {self.value}, Gradient: {self.grad}'.format(self=self)

In [89]:
a = Variable(4)   # a = 4
b = Variable(3)   # b = 3
res = (a ** b)

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)

The derivative of the result with respect to a is: 48.0
The derivative of the result with respect to b is: 88.722839111673


In [90]:
a = Variable(4)   # a = 4
b = Variable(3)   # b = 3
res = (a / b)

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)

The derivative of the result with respect to a is: 0.3333333333333333
The derivative of the result with respect to b is: -0.4444444444444444


In [91]:
a = Variable(2)   # a = 2
b = Variable(3)   # b = 3
res = a.__exp__() * 3

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)

The derivative of the result with respect to a is: 22.16716829679195
The derivative of the result with respect to b is: 0.0


---

    
<font color=blue>Our operations currently accept only instances of our Variable class as inputs. So, if you wanted to calculate `a = b * 2` where `b` is an instance of our variable class and `2` is just a numerical constant you would get an error as our framework does not know how to multiply a `Variable` with a number. You should instead write `a = b * Variable(2)` to achieve this.</font>

<font color=blue>Improve further the usability of our framework by allowing our functions to mix numbers and Variables in the same operation. To do this, you should check the type of each operand, and if it is not a `Variable`, convert it to a `Variable` before you continue.</font>
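
Converting the operands is only half of the story: for an expression like `2 * b`, Python first tries `int.__mul__`, which does not know our class, and then falls back to the reflected method `b.__rmul__(2)`. The sketch below illustrates this dispatch with a hypothetical toy class called `Number`, used here instead of the notebook's `Variable` to keep the example short.

```python
class Number:
    """Hypothetical toy class (not the notebook's Variable) that wraps a float."""
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        # Handles `Number * something`
        other = other if isinstance(other, Number) else Number(other)
        return Number(self.value * other.value)

    def __rmul__(self, other):
        # Handles `something * Number` when `something` cannot multiply
        # with a Number itself (e.g. a plain int or float)
        return Number(other).__mul__(self)

    def __sub__(self, other):
        other = other if isinstance(other, Number) else Number(other)
        return Number(self.value - other.value)

    def __rsub__(self, other):
        # Order matters for subtraction: this computes `other - self`
        return Number(other).__sub__(self)


b = Number(3.0)
left = (b * 2).value     # calls b.__mul__(2)
right = (2 * b).value    # int cannot handle Number, so b.__rmul__(2) is called
diff = (10 - b).value    # calls b.__rsub__(10)
print(left, right, diff)
```

For commutative operations the reflected method can reuse the normal one, but for subtraction, division and powers the operands must be swapped back into their original order.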

In [92]:
# The code for this is included above

In [93]:
a = Variable(4)   # a = 4
res = (a ** 2) * 2 * a # res = 2 * a**3

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)


The derivative of the result with respect to a is: 96.0


In [94]:
b = Variable(3)   # b = 3
res = 2 ** (3 * b)

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to b is:", b.grad)

The derivative of the result with respect to b is: 1064.674069340076


In [95]:
a = Variable(4)   # a = 4
res = (16 / a).__exp__()

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)

The derivative of the result with respect to a is: -54.59815003314423


---


When we create a new variable through an operation, we create gradRoutes to all the parent variables and calculate the value of the local derivative. The sole purpose of calculating this value is to use it as a multiplier during backProp.

During the backprop operation, every time our Variable receives a gradient it multiplies it with the local derivative value of each gradRoute (calculated during the forward pass) and passes the result back to the corresponding parent Variable.

It seems a bit strange that it is our Variable that has to keep track of these local derivative values and do this multiplication. In addition, this is a multiplication only because we deal with scalars. If we were implementing tensor (vector, matrix or higher-order) operations, we would have to substitute this scalar multiplication with a tensor operation.
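
To make this concrete, here is a standard result stated with our own symbols. If $C = A B$ is a matrix product and $G = \partial L / \partial C$ is the gradient arriving at $C$, then the gradients passed back to the parents are matrix products with transposes rather than scalar multiplications:

$$\frac{\partial L}{\partial A} = G\,B^{\top}, \qquad \frac{\partial L}{\partial B} = A^{\top} G$$

In the scalar case both expressions reduce to the familiar $G \cdot b$ and $G \cdot a$.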

It is better if our Variable does not have to bother with any of this. What exactly we should do when we push the gradient back depends on the operation, so it would be better if each operation gave us the recipe for what to do during backprop. This would mean that every time we perform an operation, the operation defines and hands us a small function, which is what we call during backProp. This way, each operation is responsible for how backprop is implemented through it, and our Variable just needs to call this small function.

Keeping track of this function instead of the local derivative value will allow us to easily extend this framework to tensor operations. In addition, keeping a note of the function instead of the value, allows us to abstract away stuff. This basically means that we can build the computation graph first, with placeholder variables independently of specific input values, and then reuse it for different inputs. This is how many deep learning frameworks work.

<font color=blue>How to implement this? Every time we do an operation and create a new gradRoute, instead of calculating and storing a value for the local derivative, define a small function that calculates this local derivative value, takes the gradient that is being backpropagated and multiplies (or otherwise, depending on the operation) the two things.</font>

In [96]:
def vAdd(A: Variable, B: Variable):
    if not isinstance(A, Variable):
        A = Variable(A)
    if not isinstance(B, Variable):
        B = Variable(B)

    def add_backprop(route_val):
        A.backProp(route_val)
        B.backProp(route_val)

    result = Variable(A.value + B.value)
    result.gradRoutes.append(([A,B],add_backprop))
    return result


def vMul(A: Variable, B: Variable):
    if not isinstance(A, Variable):
        A = Variable(A)
    if not isinstance(B, Variable):
        B = Variable(B)

    def mul_backprop(route_val):
        A.backProp(route_val * B.value)
        B.backProp(route_val * A.value)

    result = Variable(A.value * B.value)
    result.gradRoutes.append(([A,B],mul_backprop))
    return result

def vSub(A: Variable, B: Variable):
    if not isinstance(A, Variable):
        A = Variable(A)
    if not isinstance(B, Variable):
        B = Variable(B)

    def sub_backprop(route_val):
        A.backProp(route_val)
        B.backProp(-route_val)

    result = Variable(A.value - B.value)
    result.gradRoutes.append(([A,B],sub_backprop))
    return result

def vDiv(A: Variable, B: Variable):
    if not isinstance(A, Variable):
        A = Variable(A)
    if not isinstance(B, Variable):
        B = Variable(B)

    def div_backprop(route_val):
        A.backProp(route_val*(1/B.value))
        B.backProp(route_val*(-A.value/(B.value ** 2)))

    result = Variable(A.value / B.value)
    result.gradRoutes.append(([A,B],div_backprop))
    return result

def vRaise(A: Variable, B: Variable):
    if not isinstance(A, Variable):
        A = Variable(A)
    if not isinstance(B, Variable):
        B = Variable(B)

    def raise_backprop(route_val):
        A.backProp(route_val*(B.value*(A.value**(B.value-1))))
        B.backProp(route_val*(math.log(A.value)*(A.value ** B.value)))

    result = Variable(A.value ** B.value)
    result.gradRoutes.append(([A,B],raise_backprop))
    return result

def vUnary(A: Variable):
    if not isinstance(A, Variable):
        A = Variable(A)

    def unary_backprop(route_val):
        A.backProp(-route_val)

    result = Variable(-A.value)
    result.gradRoutes.append(([A],unary_backprop))
    return result

def vExp(A: Variable):
    if not isinstance(A, Variable):
        A = Variable(A)

    def exp_backprop(route_val):
        A.backProp(route_val*math.e**A.value)

    result = Variable(math.e**(A.value))
    result.gradRoutes.append(([A],exp_backprop))
    return result

In [97]:
class Variable:
    def __init__(self, value, requiresGrad=True):
        self.value = value
        self.gradRoutes = []
        self.grad = 0.0
        self.requiresGrad = requiresGrad

    def backProp(self, route_val=1.0):
        if self.requiresGrad:
            self.grad += route_val
            for variables,backprop_function in self.gradRoutes:
                backprop_function(route_val)

    def zeroGrad(self):
        self.grad = 0.0

    def zeroGradsRecursively(self):
        self.zeroGrad()
        for variables,backprop_function in self.gradRoutes:
          for variable in variables:
            variable.zeroGradsRecursively()

    def __add__(self, b):
        return vAdd(self, b)

    def __radd__(self, b):
        return vAdd(self, b)

    def __mul__(self, b):
        return vMul(self, b)

    def __rmul__(self, b):
        return vMul(self, b)

    def __sub__(self,b):
        return vSub(self,b)

    def __rsub__(self,b):
        return vSub(b,self)

    def __truediv__(self,b):
        return vDiv(self,b)

    def __rtruediv__(self,b):
        return vDiv(b,self)

    def __pow__(self,b):
        return vRaise(self,b)

    def __rpow__(self,b):
        return vRaise(b,self)

    def __neg__(self):
        return vUnary(self)

    def __exp__(self):
        return vExp(self)

    def __str__(self):
        return 'Value: {self.value}, Gradient: {self.grad}'.format(self=self)

In [98]:
a = Variable(4)   # a = 4
res = (3 + a) * a # res = 28

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)

# Zero gradients

res.zeroGradsRecursively()
# Call backprop on the result once more
print("Second time")
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)

The derivative of the result with respect to a is: 11.0
Second time
The derivative of the result with respect to a is: 11.0


In [99]:
a = Variable(4)   # a = 4
res = (16 / a).__exp__()

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)

The derivative of the result with respect to a is: -54.59815003314423


In [100]:
a = Variable(4)   # a = 4
b = Variable(3)   # b = 3
res = (a / b) * 2

# Call backprop on the result
res.backProp()
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)

The derivative of the result with respect to a is: 0.6666666666666666
The derivative of the result with respect to b is: -0.8888888888888888


In [101]:
a = Variable(5)  # a = 5
b = Variable(3)  # b = 3
c = a - b        # c = 5 - 3
res = a * c      # res = a * c = 10

print("Result =", res.value)

# Call backprop on the result
res.backProp()

# Now all variables should contain in their "grad" the derivative d(res) / d(variable)
print("The derivative of the result with respect to a is:", a.grad)
print("The derivative of the result with respect to b is:", b.grad)

Result = 10
The derivative of the result with respect to a is: 7.0
The derivative of the result with respect to b is: -5.0
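
One consequence of storing functions instead of values is that the local derivatives are evaluated when `backProp` is called, not when the graph is built. The sketch below re-declares a minimal closure-based `Variable` and `vMul` so that it runs on its own, and changes an input after the forward pass to make the effect visible.

```python
class Variable:
    def __init__(self, value):
        self.value = value
        self.gradRoutes = []  # list of (parents, backprop function) pairs
        self.grad = 0.0

    def backProp(self, route_val=1.0):
        self.grad += route_val
        for _parents, backprop_function in self.gradRoutes:
            backprop_function(route_val)


def vMul(A, B):
    def mul_backprop(route_val):
        # A.value and B.value are looked up when this function is called
        A.backProp(route_val * B.value)
        B.backProp(route_val * A.value)

    result = Variable(A.value * B.value)
    result.gradRoutes.append(([A, B], mul_backprop))
    return result


a, b = Variable(2.0), Variable(3.0)
res = vMul(a, b)     # forward pass with b = 3

b.value = 10.0       # change an input after the graph was built
res.backProp()

print(res.value)     # still the value from the original forward pass
print(a.grad)        # computed with the current b.value
```

The gradient follows the new input, while `res.value` does not. Reusing a graph for different inputs would therefore also require recomputing the forward values, which this simple implementation does not do.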


---
<font color=blue>Write some code to manually check that our gradient calculation is correct, using the central difference approximation:</font>

<font color=blue>$$f'(x) \approx \frac {f(x+\epsilon) - f(x-\epsilon)}{2 \epsilon}$$ where $\epsilon$ is a very small number.</font>

<font color=blue>Then use it to calculate the derivative of the function $f(x) = 21 * x^3$ at $x=3.2$.</font>

In [102]:
# Your Code Here
eps = 0.001
x = Variable(3.2)
f = 21 * (x ** 3)
f1 = (21*(x+eps)**3 - 21*(x-eps)**3) / (2*eps)
f.backProp()
print("The derivative of f with respect to x is:", x.grad)
print(f1.value)

The derivative of f with respect to x is: 645.1200000000001
645.1200210000252
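
The same check can be written as a small reusable helper that works on plain Python floats. This is a minimal sketch: the names `numerical_derivative`, `f` and `f_prime`, as well as the tolerance, are our own choices for illustration.

```python
import math


def numerical_derivative(f, x, eps=1e-3):
    """Central difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def f(x):
    return 21 * x ** 3


def f_prime(x):
    # Analytical derivative of 21 * x**3, used as the reference value
    return 63 * x ** 2


x0 = 3.2
approx = numerical_derivative(f, x0)
exact = f_prime(x0)

# Compare with a relative tolerance instead of expecting exact equality
matches = math.isclose(approx, exact, rel_tol=1e-6)
print(approx, exact, matches)
```

The two values cannot agree exactly, because the central difference has an error that shrinks with $\epsilon^2$. For this cubic the error is $21\,\epsilon^2$, which is about $2.1 \cdot 10^{-5}$ for $\epsilon = 0.001$ and matches the small difference seen in the output above.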
