# The chain rule

### Introduction

So far we have seen that the derivative of a function is the instantaneous rate of change of that function.  In other words, it tells us how a function's output changes as we change one of its variables.  In this lesson, we will learn about the chain rule, which allows us to see how a function's output changes as we change a variable that the function does not directly depend on.  The chain rule may seem complicated, but it is just a matter of following a prescribed procedure.  Learning the chain rule will allow us to take the derivative of the more complicated functions that we will encounter in machine learning.

### The chain rule

Ok, now let's talk about the chain rule.  Imagine that we would like to take the derivative of the following function:

$$f(x) = (3 + x^2 + 2x )^2 $$ 

Doing something like that can be pretty tricky right off the bat.  Lucky for us, we can use the chain rule.  The chain rule is essentially a trick that can be applied when our functions get complicated.  The first step is using function composition to break our function down. Ok, let's do it.

$$g(x) = (3 + x^2 + 2x)$$
$$f(g(x)) = g(x)^2$$
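
The decomposition above can be sketched in Python to check that the two pieces really do rebuild the original function. The names `outer` and `f` are chosen here for illustration only: `outer` stands for the squaring step, and `f` is the original function written in one piece.

```python
def g(x):
    """Inner function: g(x) = 3 + x^2 + 2x."""
    return 3 + x**2 + 2*x

def outer(u):
    """Outer function: squares whatever value it receives."""
    return u**2

def f(x):
    """The original function, written in one piece."""
    return (3 + x**2 + 2*x)**2

# Feeding g(x) into the outer function gives the same result as f(x).
for x in [-2, 0, 1, 3]:
    print(x, outer(g(x)), f(x))
```

For each value of `x`, the last two printed columns should match, which is all that "breaking the function down" means here.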

So now note that our original function $f(x)$ can be rewritten as $f(g(x))$.  So our new question is to find the derivative of that latter function with respect to $x$, which we will write as $f'(g(x))$.  Sounds impossible, you say?  Enter the chain rule.

> **The chain rule** allows us to answer just this question.  Remember, taking a derivative means changing a variable $x$ a little, and seeing the change in the output.  The chain rule allows us to solve the problem of seeing the change in output when our function does not **directly** depend on that changing variable, but depends on **a function** that depends on that variable.

So here $f(g(x))$ does not directly depend on $x$.  Instead it depends on the function $g(x)$, which depends on $x$.  Ok, enough talk, let's see the rule.

 $$f'(g(x)) = \frac{\Delta f}{\Delta g}f(g(x))*\frac{\Delta g}{\Delta x}g(x)$$ 
 
Yes, it's a mouthful, but it's not so bad in practice.
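
One way to build intuition for the rule is to compute the $\Delta$ ratios numerically. The sketch below is only a rough check, assuming a small nudge `dx = 1e-6` at the point `x = 1.0` (both picked arbitrarily), and using the hypothetical name `outer` for the squaring step.

```python
def g(x):
    """Inner function: g(x) = 3 + x^2 + 2x."""
    return 3 + x**2 + 2*x

def outer(u):
    """Outer function: squares its input."""
    return u**2

x = 1.0      # point at which we estimate the rates (arbitrary choice)
dx = 1e-6    # small nudge to x (arbitrary choice)

dg = g(x + dx) - g(x)                  # resulting change in g
df = outer(g(x + dx)) - outer(g(x))    # resulting change in the output

rate_f_wrt_g = df / dg   # how the output changes per unit change in g
rate_g_wrt_x = dg / dx   # how g changes per unit change in x
rate_f_wrt_x = df / dx   # how the output changes per unit change in x

# The product of the two intermediate rates matches the overall rate.
print(rate_f_wrt_g * rate_g_wrt_x, rate_f_wrt_x)
```

The two printed numbers agree up to floating-point rounding because the $\Delta g$ terms cancel, which is the idea the chain rule formalizes.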

### Applying the chain rule

#### Taking each derivative

Let's apply our chain rule step by step to take the derivative $f'(x)$, where:

$$g(x) = (3 + x^2 + 2x)$$

$$f(g(x)) = g(x)^2$$

Remember our chain rule is: $f'(g(x)) = \frac{\Delta f}{\Delta g}f(g(x))*\frac{\Delta g}{\Delta x}g(x)$

* First we take the derivative $\frac{\Delta g}{\Delta x}g(x) = g'(x) = 2x + 2$.


* Then, we take the derivative $\frac{\Delta f}{\Delta g}f(g(x))$ where $f(g(x)) = (g(x))^2 $.

This is how we evaluate that second derivative: $\frac{\Delta f}{\Delta g}f(g(x)) = 2 * (g(x))^1 =  2 * g(x)$.

> The reason is that taking that second derivative, $\frac{\Delta f}{\Delta g}f(g(x))$, means asking how the output of $f(g(x))$ changes as we nudge **the function** $g(x)$.  This means that we can just treat the entire function $g(x)$ as a variable, and use our power rule.
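
Written out with a stand-in variable, that step might look like the following. The symbol $u$ is introduced here purely for illustration and is not part of the lesson's own notation.

$$u = g(x) = 3 + x^2 + 2x$$

$$f = u^2 \quad\Rightarrow\quad \frac{\Delta f}{\Delta u} = 2 * u^1 = 2 * g(x)$$

Once $g(x)$ is replaced by the single symbol $u$, the power rule applies exactly as it would to $x^2$.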

#### Plugging our derivatives back into our formula

Ok, now we have solved our two derivatives.  Let's plug these derivatives back into our chain rule, $f'(g(x)) = \frac{\Delta f}{\Delta g}f(g(x))*\frac{\Delta g}{\Delta x}g(x)$.

* Doing so, we have $f'(g(x)) = 2*g(x)*(2x + 2)$.

* And because we are told $g(x) = (3 + x^2 + 2x)$, substituting this in for $g(x)$ we have $f'(g(x)) = 2*g(x)*(2x + 2) = 2*(3 + x^2 + 2x)*(2x + 2)$.

Leaving our equation there is fine.  We've done enough math for one lesson.  Hopefully, you see how using the chain rule allows us to break a complicated function up into two simpler ones, and simply apply the rule to calculate the derivative.
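
As a sanity check on the result, we can compare it with the derivative obtained another way. Expanding $(3 + x^2 + 2x)^2$ gives $x^4 + 4x^3 + 10x^2 + 12x + 9$, whose derivative by the power rule is $4x^3 + 12x^2 + 20x + 12$. The sketch below, with function names made up for this example, confirms that both expressions agree at a handful of integer points.

```python
def chain_rule_derivative(x):
    """Derivative from the chain rule: 2 * g(x) * g'(x)."""
    g_of_x = 3 + x**2 + 2*x   # inner function g(x)
    g_prime = 2*x + 2         # g'(x), by the power rule
    return 2 * g_of_x * g_prime

def expanded_derivative(x):
    """Derivative of the expanded polynomial x^4 + 4x^3 + 10x^2 + 12x + 9."""
    return 4*x**3 + 12*x**2 + 20*x + 12

# Both routes to the derivative give the same value at each test point.
for x in range(-3, 4):
    assert chain_rule_derivative(x) == expanded_derivative(x)
    print(x, chain_rule_derivative(x))
```

Agreement at a few points is not a proof, but it is a quick way to catch an algebra slip.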

### Summary

In this lesson, we saw how the chain rule allows us to break a complicated function up into two, and simply apply the rule of $f'(g(x)) = \frac{\Delta f}{\Delta g}f(g(x))*\frac{\Delta g}{\Delta x}g(x)$.  In machine learning sections to come, we will make use of the chain rule in taking derivatives.