<a href="https://colab.research.google.com/github/taskswithcode/probability_for_ml_notebooks/blob/main/MLKnobs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**How do Neural Networks learn? *(with just arithmetic - No Calculus)*.**

#### 1. Let's start with a function with three **knobs (variables)** that control the output value.

#### Our objective is to find out how to **tweak** these knobs so that the function output increases *(or decreases, if that is what we prefer)*.

In [None]:
a = 2
b = -3
c = 10
value = a * b + c
print(f"Value of the function before changing any variable: {value}")

Value of the function before changing any variable: 4


##### We are first going to change these knobs **individually** by adding a fixed value **h** and see what happens to the output

In [None]:
h = 0.0001  # the fixed small value we are going to add to each variable in turn

###### Let's add **h** to **variable a** first and see how the function responds

In [None]:
d1 = a*b + c  # output before the tweak
print(a, a+h)
a = a + h  # tweaking a
d2 = a*b + c  # output after the tweak
print(d1, d2)

2 2.0001
4 3.999699999999999


*When we **changed variable a** by adding a tiny amount **h**, the function output went **down** a little from 4 to 3.99969. So **increasing variable a decreases** output*

In [None]:
grada = (d2-d1)/h
grada

-3.000000000010772

*This value captures the **rate of change** of function output when we increased the value of **variable a**. We will use it below*

###### Let's add **h** to **variable b** and see how the function responds

In [None]:
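d1 = a*b + c  # output before the tweak (a is still 2.0001 from the previous cell)
print(b, b+h)
b = b + h  # tweaking b
d2 = a*b + c  # output after the tweak
print(d1, d2)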

-3 -2.9999
3.999699999999999 3.99990001


*When we **changed variable b** by adding a tiny amount **h**, the function output went **up** a little. So **increasing variable b increases** output*

In [None]:
gradb = (d2-d1)/h
gradb

2.0001000000124947

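*This value captures the **rate of change** of function output when we increased the value of **variable b**. It comes out as about 2.0001 rather than exactly 2 because **a** still holds its tweaked value of 2.0001 from the previous step. We will use it below*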

###### Let's add **h** to **variable c** and see how the function responds

In [None]:
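d1 = a*b + c  # output before the tweak (a and b still carry their earlier tweaks)
print(c, c+h)
c = c + h  # tweaking c
d2 = a*b + c  # output after the tweak
print(d1, d2)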

10 10.0001
3.99990001 4.00000001


*When we **changed variable c** by adding a tiny amount **h**, the function output went **up** by the same amount. So **increasing variable c increases** output*

In [None]:
gradc = (d2-d1)/h
gradc

0.9999999999976694

*This value captures the **rate of change** of function output when we increased the value of **variable c**. We will use it below*

##### **Our findings so far:** When we **increased two knobs (b and c)** by a fixed value **h** the output went **up**, whereas when we **increased knob a** the output went **down**.

How can we know ***which direction to turn the knobs*** so the function output ***increases*** (or decreases, if that's what we prefer)?

The rate of change of the function with respect to each variable, that we calculated above ***(grada,gradb,gradc)***  is useful for this.

The ***rate of change of a function*** with respect to a variable is often called ***gradient or slope of the function*** with respect to that variable.
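
The three measurements above can be collected into one helper. The sketch below is not part of the original cells: the names `f` and `numerical_gradient` are illustrative assumptions, and every rate of change is measured from the same starting point (a=2, b=-3, c=10), so no earlier tweak carries over into a later measurement.

```python
def f(a, b, c):
    """The three-knob function used throughout this notebook."""
    return a * b + c

def numerical_gradient(func, knobs, h=0.0001):
    """Estimate the rate of change of func with respect to each knob.

    Each knob is nudged by h on its own while the others stay fixed.
    (Hypothetical helper, written only for this illustration.)
    """
    base = func(*knobs)  # output before any nudge
    grads = []
    for i in range(len(knobs)):
        nudged = list(knobs)
        nudged[i] += h  # nudge only knob i
        grads.append((func(*nudged) - base) / h)
    return grads

grad_a, grad_b, grad_c = numerical_gradient(f, [2, -3, 10])
print(grad_a, grad_b, grad_c)  # roughly -3, 2 and 1
```

Measured this way, the rate of change for **b** lands near 2 instead of 2.0001, because **a** is still exactly 2 when **b** is nudged.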


##### We are now going to **change** these knobs individually by adding a **small fraction of the gradients** computed above

In [None]:
lr = 0.01  # the fraction of each gradient we are going to take (often called the learning rate)

###### Let's add a fraction of the gradient **grada** to **variable a** and see how the function responds

In [None]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(grada) # we calculated this earlier
print(f"Original value of a: {a};  Value of a after adding a fraction of the gradient to it: {a+lr*grada}")
a = a + lr*grada #We add a small fraction of the gradient to "a" (this lowers "a", since grada is negative)
d2 = a*b + c
print(d1,d2)

-3.000000000010772
Original value of a: 2;  Value of a after adding a fraction of the gradient to it: 1.9699999999998923
4 4.090000000000323


###### The output goes **up**, from 4 to 4.09, when we change **knob a** by a small fraction of the gradient **grada**. Note that the output went **down** earlier when we **increased a** by the fixed value h. It goes up now because **grada is negative**, so adding a fraction of it to **a** **effectively decreases the value of a**.
Note the value of **a** before and after adding the *fraction of the gradient*: it goes down from 2 to 1.9699999999998923. Essentially, we are **turning down knob a when adding the gradient** to it.

###### Let's add a fraction of the gradient **gradb** to **variable b** and see how the function responds

In [None]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(gradb) # we calculated this earlier
print(f"Original value of b: {b};  Value of b after adding a fraction of the gradient to it: {b+lr*gradb}")
b = b + lr*gradb
d2 = a*b + c
print(d1,d2)

2.0001000000124947
Original value of b: -3;  Value of b after adding a fraction of the gradient to it: -2.979998999999875
4 4.04000200000025


###### *The output goes **up** when we change **knob b** by a small fraction of the gradient **gradb**.* **Adding a small fraction** of the gradient **turns up** knob b

###### Let's add a fraction of the gradient **gradc** to **variable c** and see how the function responds

In [None]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(gradc) # we calculated this earlier
print(f"Original value of c: {c};  Value of c after adding a fraction of the gradient to it: {c+lr*gradc}")
c = c + lr*gradc
d2 = a*b + c
print(d1,d2)

0.9999999999976694
Original value of c: 10;  Value of c after adding a fraction of the gradient to it: 10.009999999999977
4 4.009999999999977


###### *The output goes **up** when we change **knob c** by a small fraction of the gradient **gradc**.* **Adding a small fraction** of the gradient **turns up** knob c

##### **Our finding from this**: The function output **increased** when we **individually added** to each variable a **fraction of the gradient** with respect to that variable. The sign of the gradient took care of which way to turn each knob (knob a was turned down, knobs b and c were turned up), and its size determined by how much, in order to increase the function output

##### We are now going to **subtract** a small fraction of the gradients computed above from each of these knobs individually

###### Let's subtract a fraction of the gradient **grada** from **variable a** and see how the function responds

In [None]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(grada) # we calculated this earlier
print(f"Original value of a: {a};  Value of a after subtracting a fraction of the gradient from it: {a-lr*grada}")
a = a - lr*grada #We subtract a small fraction of the gradient from "a" (this raises "a", since grada is negative)
d2 = a*b + c
print(d1,d2)

-3.000000000010772
Original value of a: 2;  Value of a after subtracting a fraction of the gradient from it: 2.0300000000001077
4 3.909999999999677


###### *The output goes **down** when we **subtract** a small fraction of the gradient **grada** from **knob a**. Because grada is negative, this subtraction actually turns knob a up, from 2 to 2.03.*

###### Let's subtract a fraction of the gradient **gradb** from **variable b** and see how the function responds

In [None]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(gradb) # we calculated this earlier
print(f"Original value of b: {b};  Value of b after subtracting a fraction of the gradient from it: {b-lr*gradb}")
b = b - lr*gradb
d2 = a*b + c
print(d1,d2)

2.0001000000124947
Original value of b: -3;  Value of b after subtracting a fraction of the gradient from it: -3.020001000000125
4 3.95999799999975


###### *The output goes **down** when we **subtract** a small fraction of the gradient **gradb** from **knob b**, which turns knob b down.*

###### Let's subtract a fraction of the gradient **gradc** from **variable c** and see how the function responds

In [None]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(gradc) # we calculated this earlier
print(f"Original value of c: {c};  Value of c after subtracting a fraction of the gradient from it: {c-lr*gradc}")
c = c - lr*gradc
d2 = a*b + c
print(d1,d2)

0.9999999999976694
Original value of c: 10;  Value of c after subtracting a fraction of the gradient from it: 9.990000000000023
4 3.9900000000000233


###### *The output goes **down** when we **subtract** a small fraction of the gradient **gradc** from **knob c**, which turns knob c down.*

##### **Our finding from this:** The function output **decreased** when we **individually subtracted** from each variable a **fraction of the gradient** with respect to that variable



#### In summary, if we want to know which direction to tweak all the variables (knobs) of a function to **increase** its output, we just need to **add to each variable** the **gradient** (or a fraction of it) of the function **with respect to that variable**.

#### Conversely, to **decrease** the output of a function, we just need to **subtract** from each variable the **gradient** (or a fraction of it) of the function **with respect to that variable.**
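
The cells above tweaked one knob at a time. As a minimal sketch of the summary rule, the snippet below applies it to all three knobs in a single step. It assumes the rounded gradients -3, 2 and 1 in place of the measured `grada`, `gradb` and `gradc`, and the names `f`, `stepped_up` and `stepped_down` are illustrative only.

```python
def f(a, b, c):
    """The three-knob function used throughout this notebook."""
    return a * b + c

knobs = [2.0, -3.0, 10.0]   # starting values of a, b, c
grads = [-3.0, 2.0, 1.0]    # assumption: rounded versions of grada, gradb, gradc
lr = 0.01                   # fraction of each gradient to apply

# Add a fraction of each gradient to its own knob to push the output up
stepped_up = [k + lr * g for k, g in zip(knobs, grads)]
# Subtract a fraction of each gradient from its own knob to push the output down
stepped_down = [k - lr * g for k, g in zip(knobs, grads)]

print(f(*knobs), f(*stepped_up), f(*stepped_down))
```

The middle number should come out above the starting output of 4 and the last number below it.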

#### This fact is very useful when we train a neural network, which is a function with many knobs (variables) to tweak. During training, the output of a neural network is passed as input to another function, often called the loss function, which measures how far off the output of the neural network is from the true (ground truth) value (the second input to the loss function).

#### So during training, the neural network and the loss function together form one big function: the final output comes from the loss function, which in turn takes the neural network's output as its input. The loss function is discarded after training, and we use only the neural network function.
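
The composition described here can be sketched with the notebook's own three-knob function standing in for the neural network. Everything else is an assumption made for illustration: the squared-difference loss, the target value of 0, and the names `network`, `loss` and `training_objective`.

```python
def network(a, b, c):
    """Stand-in for a neural network: the three-knob function from this notebook."""
    return a * b + c

def loss(prediction, target):
    """Assumed loss: squared difference, 0 when the prediction matches the target."""
    return (prediction - target) ** 2

def training_objective(a, b, c, target):
    """During training, the knobs feed the network and the network feeds the loss."""
    return loss(network(a, b, c), target)

target = 0.0  # hypothetical ground-truth value
print(network(2, -3, 10))                     # 4
print(training_objective(2, -3, 10, target))  # 16.0
```

After training, only `network` would be kept; `loss` and `training_objective` exist solely to tell us how to tweak the knobs.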

#### The lower the output of the loss function, the better. So we tweak the variables of the neural network so that the loss function output is lowered. That is, we subtract the gradient (or a fraction of it) of the loss function (with respect to each variable) from each variable.

#### In practice, the gradients are computed much more efficiently than by nudging each variable one at a time as we did above, using an algorithm called **backpropagation**. Backpropagation arrives at the same rates of change we measured in this notebook (without the small approximation error of our nudging method), only far more efficiently.
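
To give a flavour of the idea, here is a hand-written sketch for our three-knob function. It is not a real backpropagation library: it simply breaks the function into a multiplication step and an addition step, then walks those steps in reverse, multiplying the known rate of change of each step. The intermediate name `p` and the `grad_*` names are illustrative assumptions.

```python
a, b, c = 2.0, -3.0, 10.0

# Forward pass: compute the output in small steps and keep the intermediate result
p = a * b          # multiplication step
value = p + c      # addition step

# Backward pass: walk the steps in reverse, multiplying local rates of change
grad_value = 1.0            # the output changes one-for-one with itself
grad_p = grad_value * 1.0   # an addition passes the rate through unchanged
grad_c = grad_value * 1.0
grad_a = grad_p * b         # for p = a * b, nudging a changes p by b per unit
grad_b = grad_p * a         # and nudging b changes p by a per unit

print(grad_a, grad_b, grad_c)  # -3.0 2.0 1.0
```

These match the values we measured by nudging (about -3, 2 and 1), but all three come out of a single backward walk instead of one extra function evaluation per knob.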

#### The actual variable tweaking, that is, the subtraction step that we did above, is done by an optimization algorithm like **gradient descent**. The optimization algorithm determines how much to subtract at each step of training.
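
A minimal sketch of such a loop is shown below, under stated assumptions: the three-knob function stands in for the neural network, the loss is a squared difference from a hypothetical target of 0, the gradients are measured by nudging (as in this notebook) rather than by backpropagation, and the step count of 20 is arbitrary. All function names are illustrative.

```python
def network(a, b, c):
    """Stand-in for a neural network: the three-knob function from this notebook."""
    return a * b + c

def loss(knobs, target):
    """Assumed loss: squared difference between the network output and the target."""
    return (network(*knobs) - target) ** 2

def numerical_gradients(knobs, target, h=0.0001):
    """Rate of change of the loss with respect to each knob, measured by nudging."""
    base = loss(knobs, target)
    grads = []
    for i in range(len(knobs)):
        nudged = list(knobs)
        nudged[i] += h  # nudge only knob i
        grads.append((loss(nudged, target) - base) / h)
    return grads

knobs = [2.0, -3.0, 10.0]  # starting values of a, b, c
target = 0.0               # hypothetical ground-truth value
lr = 0.01                  # fraction of each gradient to subtract per step

history = [loss(knobs, target)]
for step in range(20):
    grads = numerical_gradients(knobs, target)
    # Gradient descent step: subtract a fraction of each gradient from its knob
    knobs = [k - lr * g for k, g in zip(knobs, grads)]
    history.append(loss(knobs, target))

print(history[0], history[-1])
```

The first printed number is the starting loss of 16.0; the second should be far smaller, since each step turns every knob in the direction that lowers the loss.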


This notebook is based on and inspired by [Andrej Karpathy's video](https://youtu.be/VMj-3S1tku0)