<a href="https://colab.research.google.com/github/taskswithcode/probability_for_ml_notebooks/blob/main/MLKnobs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**How do Neural Networks Learn? (with just arithmetic - no Calculus)**

#### 1. A function with three **knobs (variables)** that control the output value

In [171]:
a = 2
b = -3
c = 10
value = a * b + c
value

4

##### We are first going to change these knobs individually by a fixed value **h**

In [None]:
h = .0001 #constant value used to change each variable separately below

###### Lets change **variable a** first and see how the function responds

In [172]:
d1 = a*b+c
print(a,a+h)
a = a + h
d2 = a*b + c
print(d1,d2)

2 2.1
4 3.6999999999999993


*When we **changed variable a** by adding a tiny amount **h**, the function output went **down** a little. So **increasing variable a decreases** the output*

In [173]:
GradA = (d2-d1)/h
GradA

-3.000000000000007

*This value captures the **rate of change** of function output when we changed the value of **variable a**. We will use it below*

###### Lets change **variable b** and see how the function responds

In [174]:
d1 = a*b+c
print(b,b+h)
b = b + h
d2 = a*b + c
print(d1,d2)

-3 -2.9
3.6999999999999993 3.91


*When we **changed variable b** by adding a tiny amount **h**, the function output went **up** a little. So **increasing variable b increases** the output*

In [175]:
GradB = (d2-d1)/h
GradB

2.1000000000000085

*This value captures the **rate of change** of function output when we changed the value of **variable b**. We will use it below*

###### Lets change **variable c** and see how the function responds

In [176]:
d1 = a*b+c
print(c,c+h)
c = c + h
d2 = a*b + c
print(d1,d2)

10 10.1
3.91 4.01


*When we **changed variable c** by adding a tiny amount **h**, the function output went **up** by same amount. So **increasing variable c increases** the output*

In [177]:
GradC = (d2-d1)/h
GradC

0.9999999999999964

*This value captures the **rate of change** of function output when we changed the value of **variable c**. We will use it below*

##### **Finding so far:** When we **increased two knobs (b and c)** the output went **up**, whereas when we **increased  knob a** the output went **down**.

How can we know ***which direction to turn the knobs*** so the function output ***increases*** (or decreases) ?

The rate of change of the function we calculated above ***(grada,gradb,gradc)*** with respect to each variable is useful for this. It tells which direction to tweak a knob.

The ***rate of change of a function*** with respect to a variable is often called ***gradient or slope of the function*** with respect to that variable.


##### We are now going to **change** these knobs individually **adding** by a small fraction of the gradients computed above

In [178]:
lr = .01 #this is the fraction of the gradients we are going to take

###### Lets add a fraction of the gradient **GradA** to  **variable a**  and see how the function responds

In [179]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(GradA) # we calculated this earlier
print(f"Original value of a: {a};  Value of a after adding a fraction of the gradient to it: {a+lr*grada}")
a = a + lr*GradA #We increase "a" by a small fraction of the gradient
d2 = a*b + c
print(d1,d2)

-3.000000000000007
Original value of a: 2;  Value of a after adding a fraction of the gradient to it: 1.9699999999998923
4 4.09


###### The output goes **up** when we changed **knob a** by adding a small fraction of the gradient **GradA**. Note, the output when **down** before when we **changed a** by adding a small amount h. The reason it goes up now when we change it by a small fraction of the gradient is, when we are adding a fraction of the gradient to **a**, we are **effectively decreasing the value of a**. This is because the gradient is **negative**
See the value of **a** before and after adding the *fraction of the gradient* - it goes down from 2 to 1.9699999999998923

###### Lets add a fraction of the gradient **GradB** to  **variable b**  and see how the function responds

In [165]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(GradB) # we calculated this earlier
print(f"Original value of b: {b};  Value of a after adding a fraction of the gradient to it: {b+lr*gradb}")
b = b + lr*GradB
d2 = a*b + c
print(d1,d2)

2.1999999999999975
Original value of b: -3;  Value of a after adding a fraction of the gradient to it: -2.979998999999875
4 4.044


###### *The output goes **up** when we change **knob b** by adding a small fraction of the gradient **GradB**.*

###### Lets add a fraction of the gradient **GradC** to  **variable c**  and see how the function responds

In [166]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(GradC) # we calculated this earlier
print(f"Original value of c: {c};  Value of a after adding a fraction of the gradient to it: {c+lr*gradc}")
c = c + lr*GradC
d2 = a*b + c
print(d1,d2)

0.9999999999999964
Original value of c: 10;  Value of a after adding a fraction of the gradient to it: 10.009999999999977
4 4.01


###### *The output goes **up** when we change **knob c** by adding a small fraction of the gradient **GradC**.*

##### **Finding so far:** The function output **increased** when we **individually changed** each variable by adding a **fraction of the gradient** with respect to that variable

##### We are now going to **change** these knobs individually by **subtracting** a small fraction of the gradients computed above

###### Lets subtract a fraction of the gradient **GradA** from  **variable a**  and see how the function responds

In [167]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(GradA) # we calculated this earlier
print(f"Original value of a: {a};  Value of a after adding a fraction of the gradient to it: {a-lr*grada}")
a = a - lr*GradA #We increase "a" by a small fraction of the gradient
d2 = a*b + c
print(d1,d2)

-2.9000000000000004
Original value of a: 2;  Value of a after adding a fraction of the gradient to it: 2.0300000000001077
4 3.9130000000000003


###### *The output goes **down** when we **change knob a** by subtracting a small fraction of the gradient **GradA**.*

###### Lets subtract a fraction of the gradient **GradB** from  **variable b**  and see how the function responds

In [168]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(GradB) # we calculated this earlier
print(f"Original value of b: {b};  Value of a after adding a fraction of the gradient to it: {b-lr*gradb}")
b = b - lr*GradB
d2 = a*b + c
print(d1,d2)

2.1999999999999975
Original value of b: -3;  Value of a after adding a fraction of the gradient to it: -3.020001000000125
4 3.9560000000000004


###### *The output goes **down** when we **change knob b** by subtracting a small fraction of the gradient **GradB**.*

###### Lets subtract a fraction of the gradient **GradC** to  **variable c**  and see how the function responds

In [None]:
a = 2
b = -3
c = 10
lr = .01
d1 = a*b + c
print(GradC) # we calculated this earlier
print(f"Original value of c: {c};  Value of a after adding a fraction of the gradient to it: {c-lr*gradc}")
c = c - lr*GradC
d2 = a*b + c
print(d1,d2)

0.9999999999976694
Original value of c: 10;  Value of a after adding a fraction of the gradient to it: 9.990000000000023
4 3.9900000000000233


###### *The output goes **down** when we **change knob c** by subtracting a small fraction of the gradient **GradC**.*

##### **Finding so far:** The function output **decreased** when we **individually changed** each variable by subtracting a **fraction of the gradient** with respect to that variable




---



#### In Summary if we want to know which direction to tweak all the variables (knobs) of a function to **increase** the output of a function, we just need to **add to each variable**, **the gradient** (or a fraction of it) of the function **with respect to that variable**.

#### Conversely, to **decrease** the output of a function, we just need to **subtract** from  each variable, the **gradient** (or a fraction of it) of the function **with respect to that variable.**

#### The gradient of a variable has two pieces of information to increase (or decrease) a function's output. It contains the direction to tweak a variable, and how much to tweak a variable.

#### This fact is very useful when we train a neural network, which is essentially a function with many knobs (variables). During training, the output of a neural network is passed as input to another function, often called the loss function, which compares how far off the output of the neural network is from the true (ground truth) value.

#### The lower the output of the loss function, the better. So we want to tweak the variables of the neural network so that the loss function output is lowered. We achieve this by subtracting the gradient (a fraction of it) of the loss function (with respect to each variable) from each variable. Once trained, the loss function is discared - only the neural network is used.

#### The gradient for each variable is found by an efficient algorithm called **backpropagation**, which essentially does what we did in this notebook

#### The update of each variable with the gradient is done by an optimization algorithm called **gradient descent**.




---

**Additional notes.** The above argument applies no matter how many knobs a function has, or how complex the function is. For instance, in the example below, the output function depends both directly on some knobs but also another function which in turn depends on few other knobs that are not directly present in the output function. The same observations as above applies.

In [180]:
h = .1
a = 1
b = -1
c = -2
d = 3
e = -2
f = d*e*c
d1 = a*b*f # d is only used indirectly through f in the output function
print(d,d+h)
d = d + h
f = d*e*c
d2 = a*b*f
print(d1,d2)

3 3.1
-12 -12.4


In [181]:
GradD = (d2-d1)/h
GradD

-4.0000000000000036

In [182]:
h = .1
a = 1
b = -1
c = -2
d = 3
e = -2
lr = .01
f = d*e*c
d1 = a*b*f
print(d,d + lr*GradD)
d = d + lr*GradD
f = d*e*c
d2 = a*b*f
print(d1,d2)

3 2.96
-12 -11.84


In [183]:
h = .1
a = 1
b = -1
c = -2
d = 3
e = -2
lr = .01
f = d*e*c
d1 = a*b*f
print(d,d - lr*GradD)
d = d - lr*GradD
f = d*e*c
d2 = a*b*f
print(d1,d2)

3 3.04
-12 -12.16
