# Loss or Cost Function

Link to the YouTube tutorial video: https://www.youtube.com/watch?v=E1yyaLRUnLo&list=PLeo1K3hjS3uu7CxAacxVndI4bE_o3BDtO&index=11


1) In general, in machine learning, using squared error tends to help gradient descent converge more smoothly, because the squared error is differentiable everywhere and its gradient shrinks as the error approaches zero.  <br />
2) Loss is heavily used during neural network training.  <br />
3) Feeding the features of one sample from the training set through the neural network to get a prediction is called a forward pass.   <br />
4) The error of an individual sample (the difference between the prediction and the ground truth) is called a loss; the loss aggregated over all samples (sum or mean) is called the cost function.  <br />
5) Going through (forward pass) all the samples in the training set once is called 1 epoch.  <br />
6) In practice, logistic regression uses binary cross entropy (log loss) as the cost function.   <br />
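
Notes 3) to 5) can be sketched in a few lines. This is only a minimal illustration, not the tutorial's code: the function `forward`, the arrays `X_train` and `y_train`, and the parameters `w` and `b` are all hypothetical, absolute error is assumed as the per-sample loss, and the mean is assumed as the aggregation.

```python
import numpy as np

def forward(x, w, b):
    """Hypothetical forward pass: a single neuron with a sigmoid activation."""
    return 1 / (1 + np.exp(-(np.dot(w, x) + b)))

# Illustrative training set: 3 samples with 2 features each
X_train = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0]])
y_train = np.array([1, 0, 1])

# Arbitrary starting parameters (no learning happens in this sketch)
w = np.array([0.1, -0.2])
b = 0.0

# One epoch = one pass over every sample in the training set
losses = []
for x, y in zip(X_train, y_train):
    y_hat = forward(x, w, b)       # forward pass for one sample
    losses.append(abs(y - y_hat))  # loss: error of this single sample

cost = np.mean(losses)             # cost: aggregate of the per-sample losses
print('Per-sample losses:', losses)
print('Cost after 1 epoch:', cost)
```

A real training loop would also update `w` and `b` after computing the cost; that step is left out here on purpose.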


**Important illustrations:**  <br />
<img src="hidden\photo1.png" alt="This image is a representation of the simple neural network" style="width: 400px;"/>  <br />  <br />
<img src="hidden\photo2.png" alt="This image is a representation of the simple neural network" style="width: 400px;"/>  <br />  <br />
<img src="hidden\photo3.png" alt="This image is a representation of the simple neural network" style="width: 400px;"/>  <br />

In [55]:
import numpy as np

# Create a numpy array (matrix) called Y_predicted, which consists of predicted values
Y_predicted = np.array([1 ,1 ,0 ,0 ,1])
print('The numpy array (matrix) called Y_predicted:\n', Y_predicted)

# Create a numpy array (matrix) called Y_true, which consists of ground truth values
Y_true = np.array([0.30, 0.7, 1, 0, 0.5])
print('\nThe numpy array (matrix) called Y_true:\n', Y_true)

The numpy array (matrix) called Y_predicted:
 [1 1 0 0 1]

The numpy array (matrix) called Y_true:
 [0.3 0.7 1.  0.  0.5]


# Cost function

## Mean Absolute Error (MAE)

### Calculate mean absolute error using self-defined function

In [56]:
'''
1) zip() allows for parallel iteration across multiple iterables. 
It takes in any number of iterables and returns an iterator that aggregates elements based on the iterables passed.
'''

# Self-defined mean absolute error (MAE)
def mae(y_true, y_predicted):
    total_error = 0
    # On each iteration, yt gets one element of y_true and yp gets the matching element of y_predicted
    for yt, yp in zip(y_true, y_predicted):
        total_error += abs(yt - yp)  # Add the absolute difference of this pair to total_error
    print('Total error: ', total_error)
    MAE = total_error / len(y_true)  # len(y_true) is the number of samples
    print('MAE: ', MAE)
    return MAE  # Exit the function and hand the MAE value back to the caller

mae_results = mae(Y_true, Y_predicted)

print('\nResults of MAE: ', mae_results)

Total error:  2.5
MAE:  0.5

Results of MAE:  0.5


### Calculate mean absolute error using functions from numpy module

In [57]:
# Calculate the absolute difference between each element in Y_predicted and Y_true, using abs() from numpy module
print('The absolute difference between each element in Y_predicted and Y_true:\n', np.abs(Y_predicted - Y_true))

# Calculate the mean absolute error for the absolute difference, using mean() from numpy module
print('\nThe mean absolute error: ', np.mean(np.abs(Y_predicted - Y_true)))

The absolute difference between each element in Y_predicted and Y_true:
 [0.7 0.3 1.  0.  0.5]

The mean absolute error:  0.5
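
Note 1) at the top refers to squared error, which the cells above do not compute. For comparison with MAE, here is a minimal sketch of mean squared error (MSE) on the same two arrays. The function name `mse` is an assumption and does not come from the tutorial.

```python
import numpy as np

Y_predicted = np.array([1, 1, 0, 0, 1])
Y_true = np.array([0.30, 0.7, 1, 0, 0.5])

def mse(y_true, y_predicted):
    """Mean squared error: the average of the squared differences."""
    return np.mean((y_true - y_predicted) ** 2)

print('MSE: ', mse(Y_true, Y_predicted))
```

The squared differences are 0.49, 0.09, 1, 0 and 0.25, so the result is about 0.366. Compared with MAE, the largest error (1) contributes a bigger share of the total.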


## Binary cross entropy

Since log(0) is undefined (it tends to negative infinity), the binary cross entropy formula breaks down when a predicted value is exactly 0 or 1. To avoid this, we replace every predicted value of 0 with a value slightly above 0 and every predicted value of 1 with a value slightly below 1.
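
For reference, the binary cross entropy formula as it is implemented in the numpy code further down can be written as follows. The notation is an assumption: $y_i$ stands for the ground truth of sample $i$, $\hat{y}_i$ for its predicted value, and $n$ for the number of samples.

$$
\text{BCE} = -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \Big]
$$

The term $\log(\hat{y}_i)$ is infinite when $\hat{y}_i = 0$, and $\log(1 - \hat{y}_i)$ is infinite when $\hat{y}_i = 1$, which is why both ends need to be handled.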

In [58]:
print('log(Y_predicted)=\n', np.log(Y_predicted))

log(Y_predicted)=
 [  0.   0. -inf -inf   0.]




Process the predicted values (only the values 0 and 1 are changed) so that log(0) never occurs in the binary cross entropy formula.

In [59]:
epsilon = 1e-15

# Replace the predicted values which are 0 with a value close to 0, using list comprehension
Y_predicted_new = [max(i, epsilon) for i in Y_predicted]

# Replace the predicted values which are 1 with a value close to 1, using list comprehension
Y_predicted_new = [min(i, 1-epsilon) for i in Y_predicted_new]

print('Processed predicted values: ', Y_predicted_new)


Processed predicted values:  [0.999999999999999, 0.999999999999999, 1e-15, 1e-15, 0.999999999999999]
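
The two list comprehensions above can also be expressed with `np.clip`, which limits every element of an array to a given interval. This is a hedged alternative sketch, not the tutorial's approach; the name `Y_predicted_clipped` is an assumption.

```python
import numpy as np

epsilon = 1e-15
Y_predicted = np.array([1, 1, 0, 0, 1])

# Cast to float first, then limit every element to the interval [epsilon, 1 - epsilon]
Y_predicted_clipped = np.clip(Y_predicted.astype(float), epsilon, 1 - epsilon)
print('Clipped predicted values: ', Y_predicted_clipped)
```

The result is already a numpy array, so no conversion from a Python list is needed afterwards.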


In [60]:
# Convert Y_predicted_new from a Python list (the result of the list comprehensions above) into a numpy array
Y_predicted_new = np.array(Y_predicted_new)

# Calculate the log of each predicted value
print('log(Y_predicted_new)=\n', np.log(Y_predicted_new))

# Calculate the binary cross entropy:
print('\nBinary cross entropy (log loss)= ', -np.mean(Y_true*np.log(Y_predicted_new)+(1-Y_true)*np.log(1-Y_predicted_new)))

log(Y_predicted_new)=
 [-9.99200722e-16 -9.99200722e-16 -3.45387764e+01 -3.45387764e+01
 -9.99200722e-16]

Binary cross entropy (log loss)=  17.2696280766844


### Wrap the numpy-based binary cross entropy calculation in a self-defined function

In [61]:
def log_loss(y_true, y_predicted):
    epsilon = 1e-15
    # Keep every predicted value inside [epsilon, 1 - epsilon] so that log(0) never occurs
    y_predicted_new = [max(i, epsilon) for i in y_predicted]
    y_predicted_new = [min(i, 1-epsilon) for i in y_predicted_new]
    y_predicted_new = np.array(y_predicted_new)
    return -np.mean(y_true*np.log(y_predicted_new)+(1-y_true)*np.log(1-y_predicted_new))

print('Binary cross entropy (log loss) = ', log_loss(Y_true, Y_predicted))

Binary cross entropy (log loss) =  17.2696280766844
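
As a possible variation, the same calculation can be written without list comprehensions by using `np.clip`. The sketch below is an assumption-laden alternative, not the tutorial's code: the name `log_loss_vectorized` and the `labels` example are hypothetical. It also shows how strongly log loss penalizes a confident wrong prediction.

```python
import numpy as np

def log_loss_vectorized(y_true, y_predicted, epsilon=1e-15):
    """Binary cross entropy with the clipping done by np.clip (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    # Limit predictions to [epsilon, 1 - epsilon] so that log(0) never occurs
    y_clipped = np.clip(np.asarray(y_predicted, dtype=float), epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_clipped) + (1 - y_true) * np.log(1 - y_clipped))

# Same arrays as in the cells above
Y_predicted = np.array([1, 1, 0, 0, 1])
Y_true = np.array([0.30, 0.7, 1, 0, 0.5])
print('Binary cross entropy (log loss) = ', log_loss_vectorized(Y_true, Y_predicted))

# Illustrative labels: compare a confident right prediction with a confident wrong one
labels = np.array([1, 0])
loss_right = log_loss_vectorized(labels, [0.9, 0.1])
loss_wrong = log_loss_vectorized(labels, [0.1, 0.9])
print('Confident and right:', loss_right)
print('Confident and wrong:', loss_wrong)
```

On the tutorial's arrays this should agree with the value printed above (about 17.27), and the confident wrong prediction should produce a much larger loss than the confident right one.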
