This simple predictor introduces concepts like weights, bias, hidden nodes, activation function, learning rate, loss and gradient descent.

In [None]:
simple_dataset = [1,2,3]
initial_weight = 2
initial_bias = 0

def activation_function(h):
    return h

for x in simple_dataset:

  original_output = x + 1

  print("Original Output : ", original_output, end=", ")

  #  First compute the hidden node.
  h = initial_weight*x + initial_bias

  # Pass the output of the hidden node to the activation function
  a = activation_function(h)

  # The output of the activation function is our predicted output.
  predicted_output = a

  # Find the loss
  loss = abs(original_output - predicted_output)

  # Take repeated steps on this single data point until its prediction loss is small enough.
  w = initial_weight
  b = initial_bias

  while(loss > 0.1):

      dw = 0.1
      db = 0.1

      if (original_output - predicted_output > 0 ):
        w = w + dw
        b = b + db
      else:
        w = w - dw
        b = b - db

      h = w*x + b
      a = activation_function(h)
      predicted_output = a

      loss = abs(original_output - predicted_output)

  print("Predicted Output : ", predicted_output)

Original Output :  2, Predicted Output :  2
Original Output :  3, Predicted Output :  3.0999999999999996
Original Output :  4, Predicted Output :  3.9999999999999982


The predictions are close enough. But what happens if we try to push the loss below 0.01 instead of 0.1? Let us see.

In [None]:
simple_dataset = [1,2,3]
initial_weight = 2
initial_bias = 0

def activation_function(h):
    return h

for x in simple_dataset:

  original_output = x + 1

  print("Original Output : ", original_output, end=", ")

  #  First compute the hidden node.
  h = initial_weight*x + initial_bias

  # Pass the output of the hidden node to the activation function
  a = activation_function(h)

  # The output of the activation function is our predicted output.
  predicted_output = a

  # Find the loss
  loss = abs(original_output - predicted_output)

  # Take repeated steps on this single data point until its prediction loss is small enough.
  w = initial_weight
  b = initial_bias

  while(loss > 0.01):

      dw = 0.1
      db = 0.1

      if (original_output - predicted_output > 0 ):
        w = w + dw
        b = b + db
      else:
        w = w - dw
        b = b - db

      h = w*x + b
      a = activation_function(h)
      predicted_output = a

      loss = abs(original_output - predicted_output)

  print("Predicted Output : ", predicted_output)

Original Output :  2, Predicted Output :  2
Original Output :  3, Predicted Output :  3.0099999999999993
Original Output :  4, Predicted Output :  3.9999999999999982


It is not working. We have not introduced a learning rate, so the weight and the bias each change by 0.1 at every iteration. For x = 2 that moves the prediction by 0.3 per step, so it keeps jumping over the target (between roughly 3.1 and 2.8), the loss always stays greater than 0.01, and the loop never ends.
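The jumping over the minimum described above can be made visible with a small sketch. It reuses the sign-based update from the cell above for the single input x = 2, but adds a cap on the number of steps (`max_steps`, a hypothetical parameter that is not in the original cell) so that the loop ends, and it records every prediction along the way.

```python
def fit_single_point(x, target, step, tolerance, max_steps=20):
    """Sign-based update from the cell above, capped at max_steps iterations."""
    w, b = 2.0, 0.0                      # same initial weight and bias as above
    history = []
    for _ in range(max_steps):
        predicted = w * x + b            # identity activation
        history.append(round(predicted, 2))
        if abs(target - predicted) <= tolerance:
            break                        # close enough, stop early
        # Move both parameters by a fixed step in the direction of the error.
        direction = 1 if target - predicted > 0 else -1
        w += direction * step
        b += direction * step
    return history

history = fit_single_point(x=2, target=3, step=0.1, tolerance=0.01)
print(history)
```

If the sketch behaves like the cell above, the recorded predictions fall from 4.0 to about 3.1 and then alternate between about 2.8 and 3.1 until the cap is reached, never landing within 0.01 of the target 3.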

To solve this, let us change the weights and biases by 0.01 instead of 0.1 at each iteration.

In [None]:
simple_dataset = [1,2,3]
initial_weight = 2
initial_bias = 0

def activation_function(h):
    return h

for x in simple_dataset:

  original_output = x + 1

  print("Original Output : ", original_output, end=", ")

  #  First compute the hidden node.
  h = initial_weight*x + initial_bias

  # Pass the output of the hidden node to the activation function
  a = activation_function(h)

  # The output of the activation function is our predicted output.
  predicted_output = a

  # Find the loss
  loss = abs(original_output - predicted_output)

  # Take repeated steps on this single data point until its prediction loss is small enough.
  w = initial_weight
  b = initial_bias

  while(loss > 0.01):

      dw = 0.01
      db = 0.01

      if (original_output - predicted_output > 0 ):
        w = w + dw
        b = b + db
      else:
        w = w - dw
        b = b - db

      h = w*x + b
      a = activation_function(h)
      predicted_output = a

      loss = abs(original_output - predicted_output)

  print("Predicted Output : ", predicted_output)

Original Output :  2, Predicted Output :  2
Original Output :  3, Predicted Output :  3.0099999999999993
Original Output :  4, Predicted Output :  3.9999999999999982


We can see that it is working now. Instead of hard-coding 0.01, we can introduce a variable called the learning rate, which scales the size of each step.

In [None]:
simple_dataset = [1,2,3]
initial_weight = 2
initial_bias = 0

def activation_function(h):
    return h

learning_rate = 0.1

for x in simple_dataset:

  original_output = x + 1

  print("Original Output : ", original_output, end=", ")

  #  First compute the hidden node.
  h = initial_weight*x + initial_bias

  # Pass the output of the hidden node to the activation function
  a = activation_function(h)

  # The output of the activation function is our predicted output.
  predicted_output = a

  # Find the loss
  loss = abs(original_output - predicted_output)

  # Take repeated steps on this single data point until its prediction loss is small enough.
  w = initial_weight
  b = initial_bias

  while(loss > 0.01):

      dw = 0.1
      db = 0.1

      if (original_output - predicted_output > 0 ):
        w = w + learning_rate*dw
        b = b + learning_rate*db
      else:
        w = w - learning_rate*dw
        b = b - learning_rate*db

      h = w*x + b
      a = activation_function(h)
      predicted_output = a

      loss = abs(original_output - predicted_output)

  print("Predicted Output : ", predicted_output)

Original Output :  2, Predicted Output :  2
Original Output :  3, Predicted Output :  3.0099999999999993
Original Output :  4, Predicted Output :  3.999999999999998


As we can see, it is working.
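The cells above pick the direction of each update from the sign of the error and always move by the same amount. Gradient descent, mentioned in the introduction, differs in that each step is proportional to the gradient of the loss. The following is a minimal sketch for a single data point, assuming a squared-error loss (the cells above use the absolute error) and a hypothetical helper named `gradient_step`. It is an illustration under those assumptions, not the method used in the cells above.

```python
def gradient_step(w, b, x, target, learning_rate):
    """One gradient-descent step on the squared error (target - prediction)**2."""
    predicted = w * x + b          # identity activation, as in the cells above
    error = target - predicted
    dw = -2 * error * x            # derivative of the loss with respect to w
    db = -2 * error                # derivative of the loss with respect to b
    return w - learning_rate * dw, b - learning_rate * db

x, target = 3, 4                   # one input from the dataset and its x + 1 target
w, b = 2.0, 0.0                    # same starting values as above
learning_rate = 0.01               # assumed value; larger rates can overshoot

steps = 0
while abs(target - (w * x + b)) > 0.01 and steps < 1000:
    w, b = gradient_step(w, b, x, target, learning_rate)
    steps += 1

print("steps:", steps, "prediction:", w * x + b)
```

Because the gradient shrinks as the prediction approaches the target, the steps get smaller on their own, so with this learning rate the prediction should settle toward the target rather than bounce around it.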

But in the above examples, we are not actually implementing the concept of epochs correctly. So far we take a single input from the dataset, start from the initial weight and bias, and adjust them until the prediction for that one input is good enough. Then we take the next input and start again from the same fresh initial weight and bias. The loss is calculated for each input separately, and nothing learned from one input carries over to the next.

In one epoch, we should instead take the initial weight and bias only once, calculate the loss for every input in the dataset, take the average loss, and update the weight and bias based on that average. The next epoch then starts with the updated weight and bias.

Let us do it like that.

In [30]:
import statistics

simple_dataset = [1,2,3,4,5]
initial_weight = 2
initial_bias = 0

def activation_function(h):
    return h

learning_rate = 0.1

# Initialize the weight and bias once; they are carried over from epoch to epoch.
w = initial_weight
b = initial_bias
dw = 0.1
db = 0.1
losses = []  # created once, so the averages printed below cover every epoch so far

for epoch in range(200):
  print("Epoch : ", epoch, ", ", end="")
  for x in simple_dataset:

    original_output = x + 1

    # print("Original Output : ", original_output, end=", ")

    #  First compute the hidden node.
    h = w*x + b

    # Pass the output of the hidden node to the activation function
    a = activation_function(h)

    # The output of the activation function is our predicted output.
    predicted_output = a

    # Find the loss
    loss = (original_output - predicted_output)

    losses.append(loss)

  #   print("Predicted Output: ", predicted_output)
  # print("\n")

  if (statistics.mean(losses) > 0 ):
    w = w + learning_rate*dw
    b = b + learning_rate*db
  else:
    w = w - learning_rate*dw
    b = b - learning_rate*db

  print("avg loss: ", statistics.mean(losses))

# Let us predict now

x = 6
#  First compute the hidden node.
h = w*x + b

# Pass the output of the hidden node to the activation function
a = activation_function(h)

# The output of the activation function is our predicted output.
predicted_output = a
print(predicted_output)


Epoch :  0 , avg loss:  -2
Epoch :  1 , avg loss:  -1.98
Epoch :  2 , avg loss:  -1.96
Epoch :  3 , avg loss:  -1.94
Epoch :  4 , avg loss:  -1.9200000000000002
Epoch :  5 , avg loss:  -1.9000000000000001
Epoch :  6 , avg loss:  -1.8800000000000001
Epoch :  7 , avg loss:  -1.8599999999999999
Epoch :  8 , avg loss:  -1.8399999999999999
Epoch :  9 , avg loss:  -1.8199999999999998
Epoch :  10 , avg loss:  -1.7999999999999998
Epoch :  11 , avg loss:  -1.7799999999999998
Epoch :  12 , avg loss:  -1.7599999999999998
Epoch :  13 , avg loss:  -1.7399999999999998
Epoch :  14 , avg loss:  -1.7199999999999998
Epoch :  15 , avg loss:  -1.6999999999999997
Epoch :  16 , avg loss:  -1.6799999999999997
Epoch :  17 , avg loss:  -1.6599999999999997
Epoch :  18 , avg loss:  -1.6399999999999997
Epoch :  19 , avg loss:  -1.6199999999999997
Epoch :  20 , avg loss:  -1.5999999999999996
Epoch :  21 , avg loss:  -1.5799999999999996
Epoch :  22 , avg loss:  -1.5599999999999996
Epoch :  23 , avg loss:  -1.539999
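Note that the averages printed above are running averages: `losses` is created once before the epoch loop, so each mean covers every loss recorded since epoch 0. The sketch below, which assumes the same dataset, starting values, and update rule as the cell above, prints the running average next to the average of the current epoch only for the first three epochs. The names `all_losses`, `epoch_losses`, `running`, and `per_epoch` are introduced here for illustration.

```python
import statistics

simple_dataset = [1, 2, 3, 4, 5]
w, b = 2, 0
learning_rate, dw, db = 0.1, 0.1, 0.1

all_losses = []                        # never reset, as in the cell above
running, per_epoch = [], []
for epoch in range(3):
    epoch_losses = []                  # reset at the start of every epoch
    for x in simple_dataset:
        loss = (x + 1) - (w * x + b)   # signed error, as in the cell above
        epoch_losses.append(loss)
        all_losses.append(loss)

    running.append(round(statistics.mean(all_losses), 2))
    per_epoch.append(round(statistics.mean(epoch_losses), 2))
    print("Epoch :", epoch, ", running avg:", running[-1],
          ", this epoch only:", per_epoch[-1])

    # Same sign-based update as above; both averages are negative in these epochs.
    if per_epoch[-1] > 0:
        w, b = w + learning_rate * dw, b + learning_rate * db
    else:
        w, b = w - learning_rate * dw, b - learning_rate * db
```

The running column should reproduce the first three values printed above, while the per-epoch column falls twice as fast because it is not diluted by earlier epochs.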

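Let us introduce early stopping now. The list of losses is also reset at the start of each epoch, so the average covers only the current epoch.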

In [64]:
import statistics

simple_dataset = [1,2,3,4,5]
initial_weight = 0.1
initial_bias = 0.1

def activation_function(h):
    return h

learning_rate = 0.1

# Initialize the weight and bias once; they are carried over from epoch to epoch.
w = initial_weight
b = initial_bias
dw = 0.1
db = 0.1


for epoch in range(200):
  losses = []
  print("Epoch : ", epoch, ", ", end="")
  for x in simple_dataset:

    original_output = x + 1

    #  First compute the hidden node.
    h = w*x + b

    # Pass the output of the hidden node to the activation function
    a = activation_function(h)

    # The output of the activation function is our predicted output.
    predicted_output = a

    # Find the loss
    loss = (original_output - predicted_output)

    losses.append(loss)

  # Early stopping: stop once the average signed error is close to zero.
  if abs(statistics.mean(losses)) < 0.02:
    print("avg loss: ", statistics.mean(losses))
    break

  if (statistics.mean(losses) > 0 ):
    w = w + learning_rate*dw
    b = b + learning_rate*db
  else:
    w = w - learning_rate*dw
    b = b - learning_rate*db

  print("avg loss: ", statistics.mean(losses))
  # print("w, b : ", w,b)

# Let us predict now

x = 600
#  First compute the hidden node.
h = w*x + b

# Pass the output of the hidden node to the activation function
a = activation_function(h)

# The output of the activation function is our predicted output.
predicted_output = a
print(predicted_output)

Epoch :  0 , avg loss:  3.6
Epoch :  1 , avg loss:  3.56
Epoch :  2 , avg loss:  3.52
Epoch :  3 , avg loss:  3.48
Epoch :  4 , avg loss:  3.44
Epoch :  5 , avg loss:  3.4
Epoch :  6 , avg loss:  3.36
Epoch :  7 , avg loss:  3.32
Epoch :  8 , avg loss:  3.28
Epoch :  9 , avg loss:  3.2399999999999998
Epoch :  10 , avg loss:  3.1999999999999993
Epoch :  11 , avg loss:  3.1599999999999997
Epoch :  12 , avg loss:  3.1199999999999997
Epoch :  13 , avg loss:  3.0799999999999996
Epoch :  14 , avg loss:  3.039999999999999
Epoch :  15 , avg loss:  2.9999999999999996
Epoch :  16 , avg loss:  2.9599999999999995
Epoch :  17 , avg loss:  2.9199999999999995
Epoch :  18 , avg loss:  2.8799999999999994
Epoch :  19 , avg loss:  2.8399999999999994
Epoch :  20 , avg loss:  2.7999999999999994
Epoch :  21 , avg loss:  2.7599999999999993
Epoch :  22 , avg loss:  2.7199999999999993
Epoch :  23 , avg loss:  2.6799999999999993
Epoch :  24 , avg loss:  2.6399999999999992
Epoch :  25 , avg loss:  2.599999999999
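One way to phrase the stopping rule is as a symmetric tolerance on the average signed error, so that training ends as soon as the average is close to zero from either side. The sketch below wraps the loop from the cell above in a hypothetical function `train`; the parameter names `tolerance` and `max_epochs` are assumptions made for this illustration.

```python
import statistics

def train(dataset, w, b, learning_rate=0.1, dw=0.1, db=0.1,
          tolerance=0.02, max_epochs=200):
    """Sign-based training loop that stops once the mean signed error is near zero."""
    for epoch in range(max_epochs):
        # Signed error (target - prediction) for every input, with target = x + 1.
        mean_error = statistics.mean((x + 1) - (w * x + b) for x in dataset)

        # Early stopping: the average error is within the tolerance of zero.
        if abs(mean_error) < tolerance:
            return w, b, epoch

        # Otherwise move both parameters by a fixed step in the direction of the error.
        direction = 1 if mean_error > 0 else -1
        w += direction * learning_rate * dw
        b += direction * learning_rate * db
    return w, b, None  # never got within the tolerance

w, b, stopped_at = train([1, 2, 3, 4, 5], w=0.1, b=0.1)
print("stopped at epoch:", stopped_at, "w:", round(w, 2), "b:", round(b, 2))
```

A mean signed error near zero does not by itself guarantee a good fit, since positive and negative errors can cancel each other. Averaging the absolute value of each error would be a stricter check.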