# Loss Functions
Given an input vector, a loss function measures how badly a particular model performs at predicting a desired output quantity (regression) or at correctly labeling the input vector (classification).


Farhad Kamangar  Sept. 2018

## Mean Squared Error (MSE)

Mean Squared Error is the most commonly used regression loss function. It is defined as the mean of the squared differences between the desired and the predicted output(s).

$$\large MSE=\frac{1}{N}\sum_{i=1}^{N} {(t_i-y_i)}^2$$

where $N$ is the total number of samples, $t_i$ is the desired output for sample $i$, and $y_i$ is the actual output for sample $i$.
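
As a quick illustration of the formula above, the following sketch computes the MSE with NumPy. The function name `mean_squared_error` and the sample values are made up for this example and are not part of the original notes.

```python
import numpy as np

def mean_squared_error(t, y):
    """Mean of the squared differences between desired (t) and actual (y) outputs."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((t - y) ** 2)

# Hypothetical desired and actual outputs for N = 4 samples
t = np.array([3.0, -0.5, 2.0, 7.0])
y = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(t, y)
print("MSE:", mse)  # squared errors are 0.25, 0.25, 0 and 1, so the mean is 0.375
```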


## Mean Absolute Error (MAE)

Mean Absolute Error is another common loss function for regression models. It is defined as the mean of the absolute differences between the desired and the predicted output(s).

$$\large MAE=\frac{1}{N}\sum_{i=1}^{N} {|t_i-y_i|}$$

where $N$ is the total number of samples, $t_i$ is the desired output for sample $i$, and $y_i$ is the actual output for sample $i$.
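
The sketch below evaluates the MAE formula with NumPy and, for comparison, the squared-error mean on the same data. The function name `mean_absolute_error` and the values (three exact predictions and one that is off by 10) are assumptions chosen only to show how a single large error affects each measure.

```python
import numpy as np

def mean_absolute_error(t, y):
    """Mean of the absolute differences between desired (t) and actual (y) outputs."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean(np.abs(t - y))

# Hypothetical data: the last prediction misses its target by 10
t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 3.0, 14.0])

mae = mean_absolute_error(t, y)
mse_same_data = np.mean((t - y) ** 2)  # squared-error mean, for comparison only
print("MAE:", mae)            # 10 / 4 = 2.5
print("MSE:", mse_same_data)  # 100 / 4 = 25.0
```

Because the error enters linearly rather than quadratically, the single large miss raises the MAE far less than it raises the squared-error mean.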



## Hinge (Multiclass Support Vector Machine) Loss

The hinge loss function is used for classification and it is based on the concept of maximum-margin. The hinge loss is formulated as:

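$$\large L=\frac{1}{N}\sum_{k=1}^{N} L_k$$

$$\large L_k=\sum_{j \neq i} \max(0,y_j-y_i+\Delta)$$

where $N$ is the number of samples, $L_k$ is the loss for sample $k$, $y_j$ is the actual output (score) for class $j$, $i$ is the index of the true class of that sample, and $\Delta$ is a constant margin.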


## Numerical Example

In [1]:
import numpy as np
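def SVM_loss(input_vector, w, true_class_index, delta=1):
    """
    Calculate the multiclass hinge (SVM) loss for a single sample.
    Farhad Kamangar Sept. 2018
    """
    # Class scores: one output per row of the weight matrix
    y = np.dot(w, input_vector)
    print("Actual output: ", y)
    # Margin of every class relative to the score of the true class
    margins = np.maximum(0, y - y[true_class_index] + delta)
    # The true class itself does not contribute to the loss
    margins[true_class_index] = 0
    loss_i = np.sum(margins)
    return loss_i

x = np.transpose(np.array([[1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
true_class_index = [0, 2, 1]
w = np.array([[2, 4, 7], [1, 5, 6], [6, 2, 5]])
index = 1
# x[index] is row `index` of the transposed array, here [1, 0, 1]
loss = SVM_loss(x[index], w, true_class_index[index], delta=1)
print("Loss:\n ", loss)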


Actual output:  [  9.   7.  11.]
Loss:
  0.0
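
The example above evaluates the loss for one sample only. To obtain the overall loss $L$, the per-sample losses are averaged. The following vectorized sketch does this for the same `x`, `w`, and `true_class_index` values, treating each row of `x` as one sample, as the indexing `x[index]` above does. The function name `mean_hinge_loss` is an assumption made for this illustration.

```python
import numpy as np

def mean_hinge_loss(samples, w, true_classes, delta=1.0):
    """Average multiclass hinge loss over all samples (one sample per row)."""
    scores = samples @ w.T                          # shape (N, number of classes)
    rows = np.arange(samples.shape[0])
    correct = scores[rows, true_classes][:, None]   # true-class score of each sample
    margins = np.maximum(0.0, scores - correct + delta)
    margins[rows, true_classes] = 0.0               # the true class does not contribute
    per_sample = margins.sum(axis=1)
    return per_sample.mean(), per_sample

# Same values as in the example above
x = np.transpose(np.array([[1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
true_class_index = np.array([0, 2, 1])
w = np.array([[2, 4, 7], [1, 5, 6], [6, 2, 5]])

loss, per_sample = mean_hinge_loss(x, w, true_class_index)
print("Per-sample losses:", per_sample)  # 4, 0 and 8
print("Average loss:", loss)             # (4 + 0 + 8) / 3 = 4.0
```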


## Cross Entropy Loss

The cross entropy loss uses a softmax function to calculate the loss. 

### Softmax Function

The softmax function gives a probabilistic interpretation to the output values and it is formulated as:

$$\large S(y_i)=\frac{e^{y_i}}{\sum_{j}e^{y_j}}$$

This function interprets the outputs as unnormalized log probabilities of each class. Notice that the denominator of the above equation normalizes the probabilities so the total sums to 1. 

In other words, the softmax function takes a vector of floating point numbers, exponentiates each number, and divides by the sum of the exponentials, so that every resulting value lies between zero and one and the total adds up to 1.
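
A minimal NumPy sketch of the softmax function is shown below. Subtracting the maximum before exponentiating is an addition that does not appear in the formula above; it leaves the result unchanged because the common factor cancels in the ratio, and it is commonly used to avoid overflow for large outputs. The score values are illustrative.

```python
import numpy as np

def softmax(y):
    """Map a vector of raw outputs to values in (0, 1) that sum to 1."""
    y = np.asarray(y, dtype=float)
    shifted = y - np.max(y)      # assumption: shift for numerical stability
    exp_y = np.exp(shifted)
    return exp_y / np.sum(exp_y)

scores = np.array([9.0, 7.0, 11.0])   # illustrative raw outputs
probs = softmax(scores)
print("Probabilities:", probs)
print("Sum:", probs.sum())
```

The largest output receives the largest probability, and adding the same constant to every output does not change the result.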

Using the softmax function, the cross entropy loss can be calculated as:

$$\large L_i=-\log\left(\frac{e^{y_i}}{\sum_{j}e^{y_j}}\right)$$


The above equation is really a simplified version of a discrete cross entropy between two distributions.

Let's imagine that we have a true discrete distribution $p$ and an estimated discrete distribution $q$. The cross entropy between these two distributions is defined as:

$$\large H(p,q)= - \sum_{x}p(x)\log(q(x))$$

Notice that in a multi-class classification problem the true probability distribution has all zeros except for the correct class, $i$, which has the value of 1:

$$\large p=[0, 0, \ldots, 1, \ldots, 0]$$

If the above discrete distribution $p$ is substituted into the general cross entropy equation, every term with $p(x)=0$ vanishes, which leaves the simplified cross entropy loss:

$$\large L_i=-\log\left(\frac{e^{y_i}}{\sum_{j}e^{y_j}}\right)$$

where $i$ is the index of the correct class.
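
The substitution described above can be checked numerically. The sketch below computes the general cross entropy $H(p,q)$ with a one-hot $p$ and compares it with the simplified form. The helper names `softmax` and `cross_entropy` and the score values are assumptions made for this illustration.

```python
import numpy as np

def softmax(y):
    """Softmax of a 1-D vector (shifted by its maximum for numerical stability)."""
    exp_y = np.exp(y - np.max(y))
    return exp_y / np.sum(exp_y)

def cross_entropy(p, q):
    """General discrete cross entropy H(p, q) = -sum_x p(x) log(q(x))."""
    mask = p > 0                 # terms with p(x) = 0 contribute nothing
    return -np.sum(p[mask] * np.log(q[mask]))

scores = np.array([9.0, 7.0, 11.0])   # illustrative raw outputs y
true_class = 2                         # index i of the correct class

q = softmax(scores)                    # estimated distribution
p = np.zeros_like(q)
p[true_class] = 1.0                    # one-hot true distribution

general = cross_entropy(p, q)
simplified = -np.log(q[true_class])
print("General H(p, q):", general)
print("Simplified L_i: ", simplified)
```

Both values agree, since only the term for the correct class survives in the sum.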

Notice that to calculate the overall loss we still need to average the loss over all the samples.

$$\large L=\frac {1}{Q}\sum_{k=1}^{Q}{L_k}$$

where $Q$ is the number of samples and $L_k$ is the cross entropy loss for sample $k$.
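
As a rough sketch of this averaging step, the code below computes the cross entropy loss for each of $Q$ samples and then takes the mean. The function name `mean_cross_entropy`, the score matrix, and the class labels are hypothetical. The log-softmax is computed directly from shifted scores, which is one common way to keep the calculation numerically stable.

```python
import numpy as np

def mean_cross_entropy(scores, true_classes):
    """Average cross entropy loss over Q samples (one row of scores per sample)."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max(axis=1, keepdims=True)
    # log of the softmax: y_i - log(sum_j exp(y_j)), evaluated on shifted scores
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    rows = np.arange(scores.shape[0])
    per_sample = -log_probs[rows, true_classes]   # L_k for each sample k
    return per_sample.mean(), per_sample

# Hypothetical outputs for Q = 3 samples and 3 classes
scores = np.array([[6.0, 6.0, 8.0],
                   [9.0, 7.0, 11.0],
                   [2.0, 1.0, 6.0]])
true_classes = np.array([0, 2, 1])

loss, per_sample = mean_cross_entropy(scores, true_classes)
print("Per-sample losses:", per_sample)
print("Average loss:", loss)
```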

**Note:** There is no "softmax loss". The correct terminology is "cross-entropy loss". The "cross entropy loss" uses the "softmax" function to calculate the loss.  