In [2]:
## better understanding of argmax and softmax, common components in neural networks
## tutorial url:
## https://machinelearningmastery.com/softmax-activation-function-with-python/

#### Explanation

Softmax is a mathematical function that converts a vector of numbers into a vector of probabilities, where each probability is proportional to the exponential of the corresponding value, so larger values receive larger probabilities.

The softmax function is most commonly used in applied machine learning as an activation function in a neural network model. Specifically, the network is configured to output N values, one for each class in the classification task, and the softmax function normalizes these outputs, converting them from weighted sum values into probabilities that sum to one. Each value in the softmax output is interpreted as the probability of membership in the corresponding class.

For a binary classification problem, a Binomial probability distribution is used. This is achieved using a network with a single node in the output layer that predicts the probability of an example belonging to class 1.

For a multi-class classification problem, a Multinomial probability distribution is used. This is achieved using a network with one node for each class in the output layer, where the predicted probabilities sum to one.
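The binary and multi-class cases are closely related. As a minimal sketch, assuming a single raw output score `z` (a made-up value for illustration), the sigmoid of `z` gives the same class-1 probability as a softmax over the two scores `[z, 0]`. The helper names `sigmoid` and `two_class_softmax` are hypothetical and not taken from the tutorial.

```python
from math import exp

def sigmoid(z):
    # logistic function: squashes a raw score into the range (0, 1)
    return 1.0 / (1.0 + exp(-z))

def two_class_softmax(z):
    # softmax over the scores [z, 0]; returns (P(class 1), P(class 0))
    total = exp(z) + exp(0.0)
    return exp(z) / total, exp(0.0) / total

z = 2.0  # illustrative raw score, not taken from the tutorial
p_sigmoid = sigmoid(z)
p_class1, p_class0 = two_class_softmax(z)
# the first two printed values agree up to floating-point rounding
print(p_sigmoid, p_class1, p_class0)
```

This is why a single sigmoid output node is enough for two classes: the second probability is always one minus the first.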

A neural network model requires an activation function in the output layer of the model to make the prediction.

Activation functions include:
<br>
Linear - inappropriate for either the binomial or multinomial case.
<br>
Sigmoid - aka logistic function; good for binomial, not for multi-class.
<br>
Argmax - returns the index in the list that contains the largest value.
<br>
Softmax - can be thought of as a probabilistic or “softer” version of the argmax function. It is computed by taking the exponential of each value in the list and dividing it by the sum of the exponentials of all values.
<br>
Softmax - "Any time we wish to represent a probability distribution over a discrete variable with n possible values, we may use the softmax function. This can be seen as a generalization of the sigmoid function which was used to represent a probability distribution over a binary variable."
<br>
Softmax equation - for each value $v_i$ in the list, $p_i = \exp(v_i) / \sum_{j} \exp(v_j)$, where the sum runs over every value $v_j$ in the list.
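The softmax equation can be written out directly in a few lines of plain Python. This is a minimal sketch that uses only `math.exp`. The function name `softmax_from_scratch` is an illustrative choice, not a library function.

```python
from math import exp

def softmax_from_scratch(values):
    # exponentiate every value, then divide each by the shared total
    exps = [exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax_from_scratch([10, 30, 20])
print(probabilities)
# the probabilities sum to one, up to floating-point rounding
print(sum(probabilities))
```

The largest input (30) receives almost all of the probability mass, which is the sense in which softmax behaves like a "soft" argmax.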

In [21]:
from numpy import argmax
from math import exp
from scipy.special import softmax

## Argmax

In [22]:
# define data
data = [10, 30, 20]
# calculate the argmax of the list
result = argmax(data)
print(result)

1
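For comparison, argmax can be sketched without NumPy. The helper name `argmax_from_scratch` is hypothetical. It relies on the built-in `max`, which returns the first maximal item when several values tie.

```python
def argmax_from_scratch(values):
    # index of the largest value; on ties, max() keeps the first one it sees
    return max(range(len(values)), key=lambda i: values[i])

print(argmax_from_scratch([10, 30, 20]))  # 1
print(argmax_from_scratch([10, 30, 30]))  # 1 (first of the tied maxima)
```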


## Softmax

In [23]:
# calculate the shared denominator once
total = exp(10) + exp(30) + exp(20)
# calculate each probability
p1 = exp(10) / total
p2 = exp(30) / total
p3 = exp(20) / total
# report probabilities
print(p1, p2, p3)
# report sum of probabilities
print(p1 + p2 + p3)

2.061060046209062e-09 0.9999546000703311 4.539786860886666e-05
1.0
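The direct calculation works here because the values are small. With Python floats, `math.exp` raises `OverflowError` once its argument exceeds roughly 709. A common workaround (not part of the tutorial) is to subtract the largest value before exponentiating, which leaves the probabilities unchanged because the shift cancels in the ratio. A minimal sketch, with the hypothetical name `stable_softmax`:

```python
from math import exp

def stable_softmax(values):
    # shifting by the maximum makes the largest term exp(0) = 1,
    # so nothing overflows; the shift cancels in the ratio
    shift = max(values)
    exps = [exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# matches the direct calculation up to floating-point rounding
print(stable_softmax([10, 30, 20]))
# exp(3000) alone would overflow, but the shifted version runs
print(stable_softmax([1000, 3000, 2000]))
```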


In [24]:
# define data
data = [10, 30, 20]
# calculate softmax
result = softmax(data)
# report the probabilities
print(result)
# report the sum of the probabilities
print(sum(result))

[2.06106005e-09 9.99954600e-01 4.53978686e-05]
0.9999999999999987
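The sum printed above is 0.9999999999999987 rather than exactly 1.0 because of floating-point rounding. One way to check such a result is to compare against 1.0 with a tolerance instead of `==`. In this sketch the probabilities are recomputed with `math.exp` so the block runs on its own. The variable names and the tolerance are illustrative choices.

```python
from math import exp, isclose

data = [10, 30, 20]
exps = [exp(v) for v in data]
total = sum(exps)
probabilities = [e / total for e in exps]

prob_sum = sum(probabilities)
# an exact comparison with 1.0 can fail after rounding; a tolerance is safer
print(isclose(prob_sum, 1.0, rel_tol=1e-9))  # True
```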
