
In [None]:
!git clone https://github.com/vshlemon/colabs.git

# *The spelled out intro to neural networks and backpropagation: building micrograd*

- [YouTube video](https://www.youtube.com/watch?v=VMj-3S1tku0&t=7386s)
  - [Jupyter notebooks from video](https://github.com/karpathy/nn-zero-to-hero/tree/master/lectures/micrograd)
  - [Exercises for after video](https://www.youtube.com/redirect?event=video_description&redir_token=QUFFLUhqbW51Y3ZCcWdoamQ3S00wY1pHLWNVTWRtSmRWQXxBQ3Jtc0tsQ3BlSlZIY1ptdWRuWEFJaDc0Wnk5YlI3ekhPRTR6OUpIdmxOLVREaDFucUtyUHdrU2w3UHFCdTNkT0pQTkV6RWNVcU1KZnNOVzBmUnFER3Y4SElzX0tYb1lXRl80aXlaN3N1SmFITERDRjNoS1hjYw&q=https%3A%2F%2Fcolab.research.google.com%2Fdrive%2F1FPTx1RXtBfc4MaTkf7viZZD4U2F9gtKN%3Fusp%3Dsharing&v=VMj-3S1tku0)
- [Maths solver](https://www.google.com/search?q=step+by+step+math+solver)
  - try [Symbolab](https://www.symbolab.com) or [Wolfram Alpha](https://www.wolframalpha.com/input?i=derivative) if the above doesn't work

In [None]:
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# 1. Understanding derivatives of functions

Derivatives tell us the rate of change of a function at a particular point in its domain (the set of input values it accepts). They form the basis of backpropagation, the technique by which neural networks learn to reduce their error and make more accurate predictions/generations.
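For reference, the standard textbook definition of the derivative of a single-variable function is the limit of the rise-over-run ratio as the step $h$ shrinks towards zero:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$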

In [None]:
def f(x):
  return 3*x**2 - 4*x + 5

In [None]:
f(3.0)
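As a quick hand check of the cell above, substituting $x = 3$ into $f$ gives:

$$f(3) = 3(3)^2 - 4(3) + 5 = 27 - 12 + 5 = 20$$

so the cell should return `20.0`.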

The function $f$ above operates on a single variable, so we can plot how it behaves over a subset of input values within its domain:

In [None]:
xs = np.arange(-5, 5, 0.25) # from -5 up to (but not including) 5, in steps of 0.25
ys = f(xs) # computing the function on every input value
plt.plot(xs, ys)
plt.xlabel('input values')
plt.ylabel('function response/output')
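Because $f$ is a simple polynomial, we can also differentiate it by hand with the power rule, $\frac{d}{dx}x^n = n x^{n-1}$, which gives $f'(x) = 6x - 4$. The sketch below codes this up as `df`, a hypothetical helper name not used elsewhere in the notebook, so it can serve as a ground truth when checking numerical estimates. The derivative is zero at $x = 2/3$, which is where the plotted parabola bottoms out.

```python
def f(x):
  return 3*x**2 - 4*x + 5

# Hypothetical helper: the exact derivative of f, worked out with the power rule
# d/dx (3x^2) = 6x,  d/dx (-4x) = -4,  d/dx (5) = 0
def df(x):
  return 6*x - 4

# Solving 6x - 4 = 0 gives the location of the parabola's minimum
x_min = 4 / 6

print(df(3.0))   # → 14.0  (positive: f is increasing at x = 3)
print(df(-2.0))  # → -16.0 (negative: f is decreasing at x = -2)
print(x_min, f(x_min))
```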

**Calculating derivatives numerically**

We can calculate the derivative, i.e. the rate of change, by estimating the slope of the function at a particular point.

We can get this estimate by adding a minuscule amount to the input value and evaluating the function at this slightly shifted location. Subtracting the function's value at the original location from its value at the shifted location gives the change in the function's output; dividing that change by the minuscule amount we shifted the input by (normalising) then gives the rate of change:

$$\frac{f(x+h) - f(x)}{h}$$

Here, $h$ is the amount we add to $x$, the location at which we want to evaluate the derivative. It must be minuscule so that we get a good local approximation of the rate of change around $x$. Subtracting $f(x)$ from $f(x+h)$ gives only the change in the function's value (along the y-axis); dividing by $h$ turns this into a rate of change, i.e. the slope (the y-axis change over the x-axis change $h$).
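To see why $h$ needs to be small, we can expand the ratio by hand for our particular $f(x) = 3x^2 - 4x + 5$:

$$\frac{f(x+h) - f(x)}{h} = \frac{3(x+h)^2 - 4(x+h) + 5 - (3x^2 - 4x + 5)}{h} = \frac{6xh + 3h^2 - 4h}{h} = 6x - 4 + 3h$$

The estimate equals the exact derivative $6x - 4$ plus an error term $3h$, so it approaches the true slope as $h \to 0$. For example, at $x = 3$ with $h = 0.1$ the estimate is $14.3$, against an exact slope of $14$.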

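We can write a function for this below. *Note that as $h$ gets very small, the estimate becomes unreliable in languages like Python, which store numbers as 64-bit floating-point values with roughly 16 significant digits: $f(x+h)$ and $f(x)$ become nearly equal, and subtracting them cancels out most of their significant digits (round-off error). In practice this is handled by not making $h$ too small, by using more accurate formulas such as the central difference, or by using higher-precision arithmetic.*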

In [None]:
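def change_in_function(f, x, h):
  # How much the output of f changes when its input moves from x to x + h
  return f(x + h) - f(x)

def numerical_derivative(f, x, h):
  # Forward-difference estimate of the slope: change in output / change in input
  return change_in_function(f, x, h) / h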
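As noted above, shrinking $h$ too far runs into floating-point precision limits. The sketch below illustrates this for $f$ at $x = 3$, where the exact slope is $14$. It compares the forward difference used in this notebook with a central difference, $\frac{f(x+h) - f(x-h)}{2h}$, a common alternative that is not part of the original notebook; the names `forward_diff` and `central_diff` are illustrative. For large $h$ the forward difference is dominated by its $3h$ error term, while for very small $h$ round-off error takes over and the estimate gets worse again.

```python
def f(x):
  return 3*x**2 - 4*x + 5

def forward_diff(f, x, h):
  # Forward difference: the estimate used in this notebook
  return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
  # Central difference: samples f on both sides of x
  return (f(x + h) - f(x - h)) / (2 * h)

x = 3.0
exact = 6*x - 4  # exact slope from f'(x) = 6x - 4, i.e. 14.0

# Large h: the forward difference's 3h truncation error dominates.
# Tiny h: subtracting nearly equal floats loses digits (round-off error).
# For a quadratic the central difference has no truncation error, so only round-off remains.
for h in [1e-1, 1e-4, 1e-8, 1e-12, 1e-15]:
  fwd_err = abs(forward_diff(f, x, h) - exact)
  ctr_err = abs(central_diff(f, x, h) - exact)
  print(f"h={h:.0e}  forward error={fwd_err:.2e}  central error={ctr_err:.2e}")
```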

In [None]:
x = 2/3 # 0.66666666
h = 0.000001
print(f"""
  The function changes by {change_in_function(f, x, h)}, over a distance of {h}
  giving a derivative/slope of: {numerical_derivative(f, x, h)}
""")

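So at $x = 2/3$ (about $0.6667$) the estimated rate of change is tiny and positive, because $f(x + h)$ is very slightly greater than $f(x)$. In fact the exact derivative, $6x - 4$, is $0$ here: this is the minimum of the parabola, where the graph bottoms out. The small positive value is an artefact of the forward difference, since for this quadratic $f(x+h) - f(x) = (6x - 4)h + 3h^2$, so dividing by $h$ leaves an error of $3h$.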
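The exact slope of $f$ at $x = 2/3$ is $0$, so we can check that the forward-difference estimate there comes out at roughly $3h$ for each step size, shrinking towards $0$ as $h$ shrinks. A minimal sketch, redefining `f` and `numerical_derivative` (in compact form) so it runs on its own:

```python
def f(x):
  return 3*x**2 - 4*x + 5

def numerical_derivative(f, x, h):
  # Forward-difference estimate of the slope of f at x
  return (f(x + h) - f(x)) / h

x = 2/3  # location of the minimum, where the exact slope 6x - 4 is 0
for h in [1e-2, 1e-3, 1e-4, 1e-6]:
  estimate = numerical_derivative(f, x, h)
  # For this quadratic the forward difference is 6x - 4 + 3h, i.e. about 3h here
  print(f"h={h:g}  estimate={estimate:.3e}  3h={3*h:.3e}")
```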

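We can visualise the derivative at a chosen point by plotting the function together with the line through that point whose slope is our estimate, plus the change in $x$ and $y$ used to compute it: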

In [None]:
def plot_with_tangent_and_change_lines(
    f,
    xs,
    ys,
    h,
    x_value_to_calculate_derivative_at
):
  plt.plot(xs, ys)
  plt.xlabel('input value')
  plt.ylabel('function response')

  # Plotting tangent line
  # y = m*(x - x1) + y1
  def tangent_line(xrange, x1, y1):
    '''
    Produces the y-values of the tangent line:

    Derivative:
      It first estimates the derivative (slope) of f at
      the point x1.

    Xrange offsets:
      It then computes the offset of every x-value in the
      range we are plotting from x1. For example, if x1 is 5
      and we plot from x1-1 to x1+1, the range goes from 4 to 6
      and we subtract 5 from every element, giving something
      like [-1, ..., 0, ..., 1] (the number of elements is set
      by the 3rd argument to np.linspace).

      Multiplying the slope by these offsets gives a straight
      line with the right slope, but "hanging in mid-air"
      (not yet at the right height on the y-axis). Finally,
      adding y1, the function output at x1, pegs that line to
      the right place, so it touches the curve at the tangent
      point.

      You can also think of it as a rearrangement of the
      slope formula:
        (y - y1)
        -------- = m
        (x - x1)
      where m is the slope. Solving for y and evaluating it over
      a range of x-values, whilst holding everything else fixed,
      stretches the slope line across that range.
    '''
    return (
        numerical_derivative(f, x1, h)
        * (xrange - x1)
        + y1
    )

  xvda = x_value_to_calculate_derivative_at
  # Define x data range for tangent line
  tangent_xrange = np.linspace(
      xvda - (1+h),
      xvda + (1+h),
      10
  )
  plt.plot(
      tangent_xrange, # x-axis of tangent line
      tangent_line(
          tangent_xrange,
          xvda,
          f(xvda)
      ), # y-axis of tangent line
      'C1--',
      linewidth=2,
      label='tangent line'
  )

  # Plotting the change in x and y for the value to compute derivative of
  # Change in x (may be invisible when h is minuscule)
  plt.plot(
      [
          xvda,
          xvda + h
      ],
      [f(xvda), f(xvda)],
      'C8--',
      linewidth=2,
      label='change in x line'
  )

  # Change in y (may be invisible when h is minuscule)
  plt.arrow(
      x=xvda,
      y=f(xvda),
      dx=0, # dx
      dy=(f(xvda+h)-f(xvda)), # dy
      color='C4',
      linestyle='-',
      linewidth=2,
      head_width=(0.1+h/2),
      label='change in y line'
  )

  plt.legend()
  plt.show()

In [None]:
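h = 1 # a large step so the change-in-x/y lines are visible; with h this large the 'tangent' is really a secant line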
xs = np.arange(-5, 5, 0.25)
ys = f(xs)

x_value_to_calculate_derivative_at = -2

plot_with_tangent_and_change_lines(
    f, xs, ys, h, x_value_to_calculate_derivative_at
)
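With `h = 1` the plotted dashed line passes through $(-2, f(-2))$ and $(-1, f(-1))$, so it is a secant rather than a true tangent: its slope is $f(-1) - f(-2) = 12 - 25 = -13$, whereas the exact derivative of $f$, $6x - 4$, gives $-16$ at $x = -2$. The sketch below compares the two lines using the same point-slope form $y = m(x - x_1) + y_1$ as `tangent_line`; the helper `line_through` is a hypothetical name introduced for illustration.

```python
def f(x):
  return 3*x**2 - 4*x + 5

def numerical_derivative(f, x, h):
  return (f(x + h) - f(x)) / h

def line_through(x, x1, y1, m):
  # Hypothetical helper: point-slope form of a straight line, y = m*(x - x1) + y1
  return m * (x - x1) + y1

x1 = -2
y1 = f(x1)                                    # 25
secant_slope = numerical_derivative(f, x1, 1) # slope estimated with h = 1
exact_slope = 6*x1 - 4                        # slope from f'(x) = 6x - 4

print(secant_slope, exact_slope)              # → -13.0 -16
# The secant also passes through (x1 + 1, f(x1 + 1)); the true tangent does not
print(line_through(x1 + 1, x1, y1, secant_slope), f(x1 + 1))  # → 12.0 12
print(line_through(x1 + 1, x1, y1, exact_slope))              # → 9
```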