### Calculating Network Error with Loss

As of chapter two, our model is still random. We need a way to calculate how wrong the network's current predictions are, so that we can begin adjusting the weights and biases to decrease that error over time.

To quantify how wrong our model is, we use a ***loss function***.

***loss function***

> Also referred to as the cost function, it quantifies how wrong the model is. Ideally, we want this loss to be 0.

Note that applying `argmax` to the output gives the index of the largest value in the softmax output. That index is the class the model is most confident in.
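A quick illustration, using the same example output vector that appears later in this section:

```python
import numpy as np

# One sample's softmax output: confidences for classes 0, 1 and 2
softmax_output = np.array([0.7, 0.1, 0.2])

# argmax returns the index of the largest value, not the value itself
predicted_class = np.argmax(softmax_output)
print(predicted_class)                   # 0
print(softmax_output[predicted_class])   # 0.7
```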


*Categorical Cross Entropy Loss*

> Used to compare a 'ground-truth' probability distribution ($ y $) with a predicted distribution ($ \hat{y} $, the predictions).

> One of the most commonly used loss functions with a softmax activation on the output layer.

$ L = - \frac{1}{N} \sum_{i=1}^{N}\sum_{j=1}^{M} y_{i,j} \log(p_{i,j}) $,

where:

* $ N $ is the number of samples
* $ M $ is the number of classes
* $ i $ indexes the samples
* $ j $ indexes the classes
* $ y_{i,j} $ is the true label; each row $ y_i $ is a one-hot encoded vector of size $ M $
* $ p_{i,j} $ is the predicted probability; each row $ p_i $ is a probability distribution over the $ M $ classes

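For a single sample $ i $, the loss is denoted as:

$ L_i = - \sum_{j} y_{i,j} \log(p_{i,j}) $

where:

* $ L_i $ denotes the sample loss value
* $ i $ is the i-th sample in the set
* $ j $ is the label/output index
* $ y $ denotes the target values
* $ p $ denotes the predicted values

Because $ y_i $ is one-hot, $ y_{i,j} = 0 $ for every class except the true one, where it is 1. Every other term in the sum vanishes, which simplifies the sample loss to:

$ L_i = - \log(p_{i,k}) $, where $ k $ is the index of the 'true' class

In other words, we compare the output probability distribution (the prediction) with the one-hot probability distribution (the ground truth), and only the predicted probability of the true class contributes to the loss.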
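To see that the two forms agree, here is a small NumPy sketch of the batch equation next to the simplified form, using the same three-sample batch that appears later in this section. The variable names `p`, `y` and `k` are our own and simply mirror the symbols in the equations.

```python
import numpy as np

# Predicted distributions p (N = 3 samples, M = 3 classes)
p = np.array([[0.7, 0.1, 0.2],
              [0.1, 0.5, 0.4],
              [0.02, 0.9, 0.08]])

# One-hot ground truth y: the true classes are 0, 1 and 1
y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0]])

# Per-sample loss L_i = -sum_j y_ij * log(p_ij)
per_sample_full = -np.sum(y * np.log(p), axis=1)

# Simplified form L_i = -log(p_ik), where k is the true class index
k = np.argmax(y, axis=1)
per_sample_simple = -np.log(p[np.arange(len(p)), k])

# Batch loss L = (1/N) * sum_i L_i
batch_loss = np.mean(per_sample_full)

print(per_sample_full)
print(per_sample_simple)
print(batch_loss)
```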



```python
import math

# an example output from the output layer of the neural network
softmax_output = [0.7, 0.1, 0.2]

# ground truth, as a one-hot vector
target_output = [1, 0, 0]

loss = -(target_output[0] * math.log(softmax_output[0]) +
         target_output[1] * math.log(softmax_output[1]) +
         target_output[2] * math.log(softmax_output[2]))

print(loss)
```

```
0.35667494393873245
```


Since the target is one-hot, only the term for the true class (index 0) is nonzero, so this simplifies to:

```python
loss = -(math.log(softmax_output[0]))
print(loss)
```

```
0.35667494393873245
```


Remember:

> As the confidence in the correct class rises toward 1, the loss falls toward 0.
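A short sketch of that relationship: compute $ -\log $ of a few correct-class confidences and watch the loss shrink as the confidence grows (the confidences are arbitrary example values).

```python
import math

# Correct-class confidences, increasing toward 1
confidences = [0.1, 0.3, 0.5, 0.7, 0.9, 0.99]

# Sample loss for each confidence: L = -log(confidence)
losses = [-math.log(c) for c in confidences]

for c, loss in zip(confidences, losses):
    print(f"confidence {c:.2f} -> loss {loss:.4f}")
```

At a confidence of exactly 1, the loss would be $ -\log(1) = 0 $.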

But what is this 'log' that we are using?

> ***logarithm***, the function that determines the power to which a given number (the base) must be raised to produce a certain value

If $ a^{x} = b $, then to find $ x $ we use the logarithm: $ \log_{a} b = x $
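For example, with base $ a = 2 $ and $ b = 8 $ (arbitrary example values), Python's `math.log` accepts an optional base as its second argument:

```python
import math

a, b = 2, 8

# math.log(b, a) computes the logarithm of b in base a
x = math.log(b, a)
print(x)

# Raising the base to x recovers b (up to floating-point rounding)
print(a ** x)
```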

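> ***natural logarithm***, often just called the log, is the logarithm with base $ e $ (Euler's number, approximately 2.71828), so that:

$ e^{x} = b \iff \log_{e} b = \ln b = x $

Both `math.log` (with no base argument) and `np.log` compute the natural logarithm.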

```python
import numpy as np

b = 5.2
x = np.log(b)
print(x)
print('this value is x, so that when e is exponentiated with this value, result is b.')
```

```
1.6486586255873816
this value is x, so that when e is exponentiated with this value, result is b.
```


```python
print(np.exp(x))
print('this value is b!')
```

```
5.2
this value is b!
```


Let's consider a neural network that classifies inputs into 3 classes and processes them in batches of 3 samples, so that the output layer yields:

```python
softmax_outputs = np.array([[0.7, 0.1, 0.2], [0.1, 0.5, 0.4], [0.02, 0.9, 0.08]])
print(softmax_outputs, 'shape: ', softmax_outputs.shape)
```

```
[[0.7  0.1  0.2 ]
 [0.1  0.5  0.4 ]
 [0.02 0.9  0.08]] shape:  (3, 3)
```


Let's consider our target values (the true class labels, given as class indices rather than one-hot vectors), where:

* dog is class 0 (index 0)
* cat is class 1 (index 1)
* human is class 2 (index 2)

```python
class_targets = [0, 1, 1]
```

`class_targets` holds the true label of each sample in the batch, so:

> For the first image, the target value is 0, which denotes a dog.

> For the second image, the target value is 1, which denotes a cat.

> For the third image, the target value is 1, which denotes a cat.

If the third value were 2 instead, the third image would be labeled as a human.

Note:

* Each input sample has its own target value; there is no single target shared by the whole batch.

With the collection of softmax outputs and their intended targets, we can use these indices to retrieve the correct-class confidences from the softmax distributions.

```python
softmax_outputs = np.array([[0.7, 0.1, 0.2], [0.1, 0.5, 0.4], [0.02, 0.9, 0.08]])
class_targets = [0, 1, 1]

# walk through the targets and the softmax rows in parallel
for target_index, distributions in zip(class_targets, softmax_outputs):
    print(distributions[target_index])
```

```
0.7
0.5
0.9
```


The code above iterates over the class targets and the rows of the softmax array in parallel (via `zip`) and, for each sample, outputs the confidence at the target index:

* index 0 for the first sample,
* index 1 for the second sample,
* index 1 for the third sample.

NumPy lets us do this without an explicit loop:

```python
softmax_outputs = np.array([[0.7, 0.1, 0.2], [0.1, 0.5, 0.4], [0.02, 0.9, 0.08]])
class_targets = [0, 1, 1]

print(softmax_outputs[[0, 1, 2], class_targets])
```

```
[0.7 0.5 0.9]
```


Or, alternatively, without hard-coding the row indices:

```python
print(softmax_outputs[range(len(softmax_outputs)), class_targets])
```

```
[0.7 0.5 0.9]
```


The code above prints the specific elements of the softmax output that correspond to the target classes:

> `range(len(softmax_outputs))` produces the integers from 0 to (number of samples - 1), one per row

> in the first index position, these integers select the rows, one per input sample

> in the second index position, `class_targets` selects the column (the target class) for each of those rows; NumPy pairs the two index sequences element by element
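A sketch that makes the element-by-element pairing visible. `np.arange` is used here as an array-based equivalent of `range`, and the slice comparison is our own illustration of what happens when the first index is not an index sequence:

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
class_targets = [0, 1, 1]

# Row and column index sequences are paired element by element:
# (0, 0), (1, 1), (2, 1)
rows = np.arange(len(softmax_outputs))
paired = softmax_outputs[rows, class_targets]
print(paired)

# By contrast, a slice in the first position keeps every row and
# selects columns 0, 1, 1 from each, giving a 3x3 array instead
block = softmax_outputs[:, class_targets]
print(block.shape)
```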

Let's now apply the negative log to these confidences:

```python
print(-np.log(softmax_outputs[range(len(softmax_outputs)), class_targets]))
print('This result is the loss for each of the samples in the batch')
```

```
[0.35667494 0.69314718 0.10536052]
This result is the loss for each of the samples in the batch
```


Notice that the third sample has the lowest loss, about 0.1. Lower loss is better, and this matches its probability distribution: its correct-class confidence of 0.9 is the highest of the three.

Finally, let's compute the arithmetic mean of these sample losses to get the average loss for the batch, which gives us an idea of how the model is doing during training:

$ L = \frac{1}{N} \sum_{i=1}^{N} L_i $, which in Python is `sum(iterable) / len(iterable)`

```python
neg_log = -(np.log(softmax_outputs[range(len(softmax_outputs)), class_targets]))
print('this is the list of losses for each sample in the batch: ', neg_log)

average_loss = sum(neg_log) / len(neg_log)
print('this is the average loss of the batch: ', average_loss)
```

```
this is the list of losses for each sample in the batch:  [0.35667494 0.69314718 0.10536052]
this is the average loss of the batch:  0.38506088005216804
```


Sometimes we also have to check whether the targets are one-hot encoded.

> If the targets have only one dimension, like a list or the array in the code above, the targets are ***sparse***:

* each entry is simply the number (index) of the correct class

> If the targets have two dimensions (one row per sample, one column per class), they are one-hot encoded.

For one-hot encoded targets, we multiply the confidences by the targets element-wise. This zeroes out every value except the one at the correct label. We then sum along the row axis (`axis=1`), which leaves just the correct-class confidence for each sample, and calculate the loss from that.
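A minimal sketch of the masking step on the example batch, printing the intermediate product so the zeroed-out entries are visible:

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
one_hot_targets = np.array([[1, 0, 0],
                            [0, 1, 0],
                            [0, 1, 0]])

# Element-wise product: only the correct-class confidence survives in each row
masked = softmax_outputs * one_hot_targets
print(masked)

# Summing along axis 1 collapses each row to that single confidence
correct_confidences = np.sum(masked, axis=1)
print(correct_confidences)
```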

To handle both cases, we add a check on the number of dimensions of the targets, move the log calculation outside the new `if` statement, and implement the one-hot case following the first equation:

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])

class_targets = np.array([[1, 0, 0],
                          [0, 1, 0],
                          [0, 1, 0]])

# len(shape) is the number of dimensions: 2 here, so these targets are one-hot
print(class_targets.shape, len(class_targets.shape))

# probabilities for the target values:
# - sparse labels (1D): index the correct-class confidence directly
# - one-hot labels (2D): mask the confidences with the targets and sum each row
if len(class_targets.shape) == 1:
    correct_confidences = softmax_outputs[range(len(softmax_outputs)), class_targets]
elif len(class_targets.shape) == 2:
    correct_confidences = np.sum(softmax_outputs * class_targets, axis=1)

print('these are the confidence scores', correct_confidences)

# losses
neg_log = -np.log(correct_confidences)

average_loss = np.mean(neg_log)
print(average_loss)
```

```
(3, 3) 2
these are the confidence scores [0.7 0.5 0.9]
0.38506088005216804
```


Good! However, there is one more problem we have to solve.

The softmax output consists of numbers in the range from 0 to 1. It's possible that the model puts full confidence on one label, making all of the remaining confidences zero (in floating point, very small values can round to exactly 0). It's also possible that the model assigns full confidence to a class that wasn't the target.

If we then try to calculate the loss for a confidence of 0:

```python
import numpy as np
-np.log(0)
```

```
inf
```

NumPy also emits a divide-by-zero `RuntimeWarning` for this call.

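$ \log(0) $ is undefined: if $ y = \log(x) $, then $ e^{y} = x $, but $ e $ raised to any real power is always a positive number, so there is no $ y $ for which $ e^{y} = 0 $. As $ x $ approaches 0 from above, $ \log(x) $ goes to $ -\infty $, which is why $ -\log(0) $ evaluates to `inf` in NumPy.

To avoid this, we clip the predicted values before taking the log:

```python
y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
```

The lower bound of 1e-7 keeps a confidence from being exactly 0, so a sample loss is at most $ -\log(10^{-7}) \approx 16.12 $ instead of infinite. The upper bound of 1 - 1e-7 means that even a perfectly confident, correct prediction gets a very small positive loss instead of exactly 0. Unlike simply adding 1e-7 to every confidence (which could push a confidence of 1 above 1 and make the loss negative), clipping never produces a value above 1, so the loss never becomes negative. Clipping both sides by the same amount avoids biasing the overall loss in either direction.

`np.clip` works on whole arrays, so we can apply it to the predictions directly and save the result as a separate array:

> every value in the clipped array (`y_pred_clipped`) lies between 1e-7 and 1 - 1e-7 instead of between 0 and 1; the original predictions array (`y_pred`) is left unchanged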
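A small sketch of clipping in action on two hypothetical, maximally confident predictions: one correct and one confidently wrong. Neither loss is infinite, and the correct one is tiny but positive.

```python
import numpy as np

# Hypothetical extreme predictions: full confidence on one class, zeros elsewhere
y_pred = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
# Sample 0 is predicted correctly; sample 1 is confidently wrong
y_true = np.array([0, 0])

# np.clip returns a new array; y_pred itself is not modified
y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
correct_confidences = y_pred_clipped[range(len(y_pred)), y_true]

losses = -np.log(correct_confidences)
print(losses)
```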

### Categorical Cross Entropy Loss Class

No matter which loss function we use, the overall loss is calculated the same way: as the arithmetic mean of all of the sample losses.

Let's create a common class that calculates this mean from the sample losses returned by a specific loss function:

```python
import numpy as np

# common loss class
class Loss:
    # Calculates the data loss, given model output and ground truth values.
    # Subclasses implement forward(), which returns the per-sample losses.
    def calculate(self, output, y):
        sample_losses = self.forward(output, y)
        data_loss = np.mean(sample_losses)

        return data_loss
```

Now let's implement the categorical cross-entropy loss from above on top of this class.

The class below inherits from `Loss`, performs all of the error calculations we derived throughout this chapter in its `forward` method, and can be used as an object:

```python
# Cross-entropy loss
class Loss_CategoricalCrossEntropy(Loss):

    # forward pass: returns the loss of each sample
    def forward(self, y_pred, y_true):
        # number of samples in the batch
        samples = len(y_pred)

        # clip data to prevent log(0); clip both sides so the mean isn't dragged toward any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # probabilities for target values:
        # sparse labels (1D) are indexed directly, one-hot labels (2D) are masked and summed
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[range(samples), y_true]
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)

        # losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods

loss_function = Loss_CategoricalCrossEntropy()
loss = loss_function.calculate(softmax_outputs, class_targets)
print(loss)
```

```
0.38506088005216804
```
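As a quick check, here is a sketch that repeats the two classes above in condensed form so it runs on its own. The same loss object should give identical results for sparse and one-hot targets, and clipping should keep a confidently wrong prediction at a large but finite loss. Building the one-hot targets with `np.eye(3)[...]` is our own choice for illustration.

```python
import numpy as np

class Loss:
    # mean of the per-sample losses returned by forward()
    def calculate(self, output, y):
        return np.mean(self.forward(output, y))

class Loss_CategoricalCrossEntropy(Loss):
    def forward(self, y_pred, y_true):
        samples = len(y_pred)
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        if len(y_true.shape) == 1:      # sparse labels
            correct_confidences = y_pred_clipped[range(samples), y_true]
        elif len(y_true.shape) == 2:    # one-hot labels
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)
        return -np.log(correct_confidences)

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
sparse_targets = np.array([0, 1, 1])
# each row of the 3x3 identity matrix is a one-hot vector
one_hot_targets = np.eye(3)[sparse_targets]

loss_function = Loss_CategoricalCrossEntropy()
loss_sparse = loss_function.calculate(softmax_outputs, sparse_targets)
loss_one_hot = loss_function.calculate(softmax_outputs, one_hot_targets)
print(loss_sparse, loss_one_hot)

# a confidently wrong prediction: all confidence on class 1, true class is 0
confident_but_wrong = np.array([[0.0, 1.0, 0.0]])
worst_loss = loss_function.calculate(confident_but_wrong, np.array([0]))
print(worst_loss)
```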


### Combining All Code Up To This Point

```python
import numpy as np
import nnfs
from nnfs.datasets import spiral_data

# dataset initialization: nnfs.init() sets a random seed and default data type for reproducible results
nnfs.init()

# creation of the Dense Layer class
class Dense_Layer:
    # weights and biases initialization
    def __init__(self, num_features, num_neurons):
        self.weights = 0.01 * np.random.randn(num_features, num_neurons)
        self.biases = np.zeros((1, num_neurons))

    # perform the dot/matrix product between the samples and the weights, then add the biases
    def forward(self, samples):
        self.outputs = np.dot(samples, self.weights) + self.biases

# creation of the ReLU activation function
class ReLU:
    # rectified linear unit activation function
    def forward(self, inputs):
        self.outputs = np.maximum(0, inputs)

# creation of the softmax activation function
class SoftMax:
    '''compute the probability distributions of the output layer:
            exponentiate the values of the output layer, then
            normalize the exponentiated values for each sample
    '''
    def forward(self, inputs):
        # subtract the largest value in each row before exponentiating so the
        # exponentials can't overflow; the normalized result is unchanged
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        self.outputs = exp_values / np.sum(exp_values, axis=1, keepdims=True)

# creation of the common Loss class
class Loss:
    '''calculates the data loss, given model output and ground truth values'''
    def calculate(self, output, y):
        # calculate sample losses
        sample_losses = self.forward(output, y)

        # calculate mean loss
        data_loss = np.mean(sample_losses)

        return data_loss

# the specific loss we are using: Categorical Cross-Entropy Loss
class CategoricalCrossEntropy(Loss):
    # calculate the cross-entropy loss of the predicted values against the true values
    def forward(self, y_pred, y_true):
        # number of samples in the batch
        samples = len(y_pred)

        # clip data to prevent log(0); clip both sides so the mean isn't dragged toward any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # sparse labels (1D) vs. one-hot labels (2D)
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[range(samples), y_true]
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)

        neg_log_likelihoods = -np.log(correct_confidences)
        return neg_log_likelihoods

# getting the dataset
X, y = spiral_data(samples=100, classes=3)
print('this is first 5 samples in the dataset: \n', X[:5], 'shape: ', X.shape)
print('this is the first 5 values in the y: \n', y[:5], 'shape: ', y.shape)

# create layer 1, the ReLU activation, layer 2, the softmax activation, and the loss
dense_layer_1 = Dense_Layer(2, 3)
reLU_activation = ReLU()
dense_layer_2 = Dense_Layer(3, 3)
softmax = SoftMax()
loss_function = CategoricalCrossEntropy()

dense_layer_1.forward(X)
print('output values of the first layer: \n', dense_layer_1.outputs[:5], 'shape: ', dense_layer_1.outputs.shape)

reLU_activation.forward(dense_layer_1.outputs)
print('output values after sent to the reLU activation function: \n', reLU_activation.outputs[:5])

dense_layer_2.forward(reLU_activation.outputs)
print('output values of the second layer: \n', dense_layer_2.outputs[:5])

softmax.forward(dense_layer_2.outputs)
print('output values after sent to the softmax activation function: \n', softmax.outputs[:5])

loss = loss_function.calculate(softmax.outputs, y)
print('loss', loss)
```

```
this is first 5 samples in the dataset: 
 [[0.         0.        ]
 [0.00299556 0.00964661]
 [0.01288097 0.01556285]
 [0.02997479 0.0044481 ]
 [0.03931246 0.00932828]] shape:  (300, 2)
this is the first 5 values in the y: 
 [0 0 0 0 0] shape:  (300,)
output values of the first layer: 
 [[ 0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [-1.0475188e-04  1.1395361e-04 -4.7983500e-05]
 [-2.7414842e-04  3.1729150e-04 -8.6921798e-05]
 [-4.2188365e-04  5.2666257e-04 -5.5912682e-05]
 [-5.7707680e-04  7.1401405e-04 -8.9430439e-05]] shape:  (300, 3)
output values after sent to the reLU activation function: 
 [[0.         0.         0.        ]
 [0.         0.00011395 0.        ]
 [0.         0.00031729 0.        ]
 [0.         0.00052666 0.        ]
 [0.         0.00071401 0.        ]]
output values of the second layer: 
 [[ 0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [-1.8183968e-07 -1.5235776e-07  1.2281279e-06]
 [-5.0631292e-07 -4.2422371e-07  3.4195891e-06]
 [-8.4041352e-07 -7.0415609e-07  
```

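As before, the softmax outputs are all close to 0.33 because the weights are random and tiny, so the model effectively spreads its confidence evenly across the three classes. Its average loss is correspondingly poor for this data, since we have not yet trained the model to correct its errors.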
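As a reference point (our own illustration, not output from the network above), a model that outputs exactly 1/3 for every class has a sample loss of $ -\log(1/3) = \ln 3 \approx 1.0986 $ no matter what the targets are, so its average loss is also $ \ln 3 $. The targets below are arbitrary example values.

```python
import numpy as np

num_classes = 3

# a completely uninformed model assigns 1/3 confidence to every class
uniform_outputs = np.full((5, num_classes), 1 / num_classes)
targets = np.array([0, 1, 2, 0, 1])

losses = -np.log(uniform_outputs[range(len(uniform_outputs)), targets])
print(np.mean(losses))
print(np.log(num_classes))
```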

### Accuracy Calculation

***accuracy*** describes how often the class with the largest confidence is the correct class, expressed as a fraction of the samples.

We will take the `argmax` of the softmax outputs and compare these predicted class indices to the targets.

Remember that `argmax` does not return the maximum value itself:

> it returns the index of where the largest confidence is in each sample

```python
import numpy as np

# probabilities of 3 samples
softmax_outputs = np.array([[0.7, 0.2, 0.1],
                            [0.5, 0.1, 0.4],
                            [0.02, 0.9, 0.08]])

# target (ground truth) labels for the 3 samples
class_targets = np.array([0, 1, 1])

# take the argmax along the second axis (axis=1), i.e. across the classes of each sample
predictions = np.argmax(softmax_outputs, axis=1)

print('the predictions: ', predictions)
print('shape of target values, predictions: ', class_targets.shape, predictions.shape)

# if the targets are one-hot encoded, convert them to sparse values with np.argmax()
if len(class_targets.shape) == 2:
    class_targets = np.argmax(class_targets, axis=1)

# True evaluates to 1 and False to 0, so the mean is the fraction of correct predictions
accuracy = np.mean(predictions == class_targets)

print('acc: ', accuracy)
```

```
the predictions:  [0 0 1]
shape of target values, predictions:  (3,) (3,)
acc:  0.6666666666666666
```
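The same accuracy should come out if the targets arrive one-hot encoded instead. This sketch feeds the one-hot version of `[0, 1, 1]` through the conversion branch:

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.2, 0.1],
                            [0.5, 0.1, 0.4],
                            [0.02, 0.9, 0.08]])
one_hot_targets = np.array([[1, 0, 0],
                            [0, 1, 0],
                            [0, 1, 0]])

predictions = np.argmax(softmax_outputs, axis=1)

# 2D targets are one-hot: argmax along axis 1 recovers the class indices
class_targets = one_hot_targets
if len(class_targets.shape) == 2:
    class_targets = np.argmax(class_targets, axis=1)

accuracy = np.mean(predictions == class_targets)
print(class_targets, accuracy)
```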


Let's add this code to the end of our neural network above:

```python
# calculate accuracy from the output of the softmax activation and the targets:
# take the argmax along axis 1 (across the classes of each sample)
predictions = np.argmax(softmax.outputs, axis=1)
if len(y.shape) == 2:
    y = np.argmax(y, axis=1)
accuracy = np.mean(predictions == y)

print('acc: ', accuracy)
```

```
acc:  0.34
```
