# Analysis of the loss function with bootstrapping
https://arxiv.org/pdf/1412.6596

This notebook examines the bootstrapping loss from the paper linked above for multi-class classification with noisy labels. The paper modifies the cross-entropy loss in two ways:

a) Soft version:

$$
L_{soft}(\mathbf{q}, \mathbf{t}) = - \sum_{k=1}^{L} (\beta t_{k} + (1 - \beta) q_{k}) \log(q_{k}),
$$
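where $\mathbf{q}$ is the vector of predicted class probabilities for a single image, $\mathbf{t}$ is the one-hot ground-truth label (possibly noisy), and $L$ is the number of classes. The parameter $\beta$ is chosen between $0$ and $1$; with $\beta = 1$ the loss reduces to the ordinary cross entropy.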
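As a quick sanity check of the formula, the sketch below evaluates $L_{soft}$ directly on a probability vector and compares it with the decomposition $\beta \cdot \text{cross entropy} + (1 - \beta) \cdot \text{entropy of } \mathbf{q}$. The values of `q`, `t`, and `beta` are made up for illustration and do not come from the paper.

```python
import math


def soft_loss(q, t, beta):
    """Soft bootstrapping loss, evaluated term by term from the formula."""
    return -sum(
        (beta * t_k + (1.0 - beta) * q_k) * math.log(q_k)
        for q_k, t_k in zip(q, t)
    )


# Illustrative values only (assumed, not taken from the paper).
q = [0.7, 0.2, 0.1]      # predicted class probabilities
t = [1.0, 0.0, 0.0]      # one-hot label for class 0
beta = 0.8

cross_entropy = -sum(t_k * math.log(q_k) for q_k, t_k in zip(q, t))
entropy = -sum(q_k * math.log(q_k) for q_k in q)

# Both lines print the same number: the loss is linear in the mixed target.
print(soft_loss(q, t, beta))
print(beta * cross_entropy + (1.0 - beta) * entropy)
```

This decomposition is what lets the loss be computed as a weighted sum of a standard cross-entropy call and an entropy term.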

b) Hard version:

$$
L_{hard}(\mathbf{q}, \mathbf{t}) = - \sum_{k=1}^{L} (\beta t_{k} + (1 - \beta) z_{k}) \log(q_{k}),
$$
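where $\mathbf{z}$ is the one-hot encoding of the argmax of $\mathbf{q}$ (the same form as $\mathbf{t}$): $z_{k} = 1$ if $k = \arg\max_{i} q_{i}$ and $z_{k} = 0$ otherwise.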
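To make the role of $\mathbf{z}$ concrete, here is a minimal sketch that evaluates $L_{hard}$ from the formula for one prediction and two different labels. The numbers are illustrative assumptions. When the label agrees with the predicted class, the loss is just $-\log(q_{k})$ for that class, whatever the value of $\beta$; when they disagree, part of the weight moves to the model's own prediction.

```python
import math


def hard_loss(q, t, beta):
    """Hard bootstrapping loss, evaluated term by term from the formula."""
    k_max = max(range(len(q)), key=lambda k: q[k])           # argmax of q
    z = [1.0 if k == k_max else 0.0 for k in range(len(q))]  # one-hot z
    return -sum(
        (beta * t_k + (1.0 - beta) * z_k) * math.log(q_k)
        for q_k, t_k, z_k in zip(q, t, z)
    )


# Illustrative values only (assumed, not taken from the paper).
q = [0.2, 0.5, 0.3]  # the model predicts class 1

agree = hard_loss(q, [0.0, 1.0, 0.0], beta=0.8)     # label matches prediction
disagree = hard_loss(q, [1.0, 0.0, 0.0], beta=0.8)  # label says class 0

print(agree, -math.log(0.5))
print(disagree, 0.8 * -math.log(0.2) + 0.2 * -math.log(0.5))
```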

In [1]:
import torch
from torch.nn import functional as F

print(torch.__version__)

L = 3

# Ground-truth class indices:
t_1 = torch.tensor([0])
t_2 = torch.tensor([1])
# Raw scores, treated as logits (softmax is applied inside the loss functions):
logit_q_1a = torch.tensor([[0.9, 0.05, 0.05], ])
logit_q_1b = torch.tensor([[0.2, 0.5, 0.3], ])
logit_q_2a = torch.tensor([[0.33, 0.33, 0.33], ])
logit_q_2b = torch.tensor([[0.15, 0.7, 0.15], ])

1.7.0


#### Soft version

Let's first compute the cross-entropy term: $-\sum_{k} t_{k} \log(q_{k})$. Note that `F.cross_entropy` takes logits and applies log-softmax internally.

In [2]:
F.cross_entropy(logit_q_1a, t_1), F.cross_entropy(logit_q_1b, t_1), F.cross_entropy(logit_q_2a, t_2), F.cross_entropy(logit_q_2b, t_2)

(tensor(0.6178), tensor(1.2398), tensor(1.0986), tensor(0.7673))

Now let's compute the second term (soft bootstrapping), which is the entropy of the predicted distribution: $-\sum_{k} q_{k} \log(q_{k})$

In [3]:
def soft_boostrapping(logit_q):
    # Per-sample entropy of softmax(logit_q); returns a tensor of shape (batch,)
    return - torch.sum(F.softmax(logit_q, dim=1) * F.log_softmax(logit_q, dim=1), dim=1)

In [4]:
soft_boostrapping(logit_q_1a), soft_boostrapping(logit_q_1b), soft_boostrapping(logit_q_2a), soft_boostrapping(logit_q_2b)

(tensor([1.0095]), tensor([1.0906]), tensor([1.0986]), tensor([1.0619]))

In [5]:
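def soft_bootstrapping_loss(logit_q, t, beta):
    # F.cross_entropy averages over the batch while the second term is per-sample,
    # so the two terms line up only for a batch of size 1, as in this notebook.
    return F.cross_entropy(logit_q, t) * beta + (1.0 - beta) * soft_boostrapping(logit_q)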

In [6]:
soft_bootstrapping_loss(logit_q_1a, t_1, beta=0.95), soft_bootstrapping_loss(logit_q_1b, t_1, beta=0.95)

(tensor([0.6374]), tensor([1.2324]))

In [7]:
soft_bootstrapping_loss(logit_q_2a, t_2, beta=0.95), soft_bootstrapping_loss(logit_q_2b, t_2, beta=0.95)

(tensor([1.0986]), tensor([0.7820]))
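The function above mixes a batch-averaged cross entropy with a per-sample entropy, which is fine for the single-row inputs used here. For larger batches, one possible variant keeps both terms per-sample by using `reduction="none"`. The sketch below is an assumption about how this could be written, not the paper's reference implementation; the name `soft_bootstrapping_loss_per_sample` is hypothetical.

```python
import torch
from torch.nn import functional as F


def soft_bootstrapping_loss_per_sample(logits, targets, beta):
    """Hypothetical batch-safe variant: returns one loss value per sample."""
    log_q = F.log_softmax(logits, dim=1)
    # Per-sample cross entropy with the (possibly noisy) labels.
    ce = F.nll_loss(log_q, targets, reduction="none")
    # Per-sample entropy of the predicted distribution.
    entropy = -(log_q.exp() * log_q).sum(dim=1)
    return beta * ce + (1.0 - beta) * entropy


# The four rows used in this notebook, stacked into one batch.
logits = torch.tensor([
    [0.9, 0.05, 0.05],
    [0.2, 0.5, 0.3],
    [0.33, 0.33, 0.33],
    [0.15, 0.7, 0.15],
])
targets = torch.tensor([0, 0, 1, 1])

losses = soft_bootstrapping_loss_per_sample(logits, targets, beta=0.95)
print(losses)         # per-sample values, comparable to the cells above
print(losses.mean())  # scalar to backpropagate during training
```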

#### Hard version

Let's first compute the cross-entropy term again: $-\sum_{k} t_{k} \log(q_{k})$

In [8]:
F.cross_entropy(logit_q_1a, t_1), F.cross_entropy(logit_q_1b, t_1), F.cross_entropy(logit_q_2a, t_2), F.cross_entropy(logit_q_2b, t_2)

(tensor(0.6178), tensor(1.2398), tensor(1.0986), tensor(0.7673))

Now let's compute the second term (hard bootstrapping), which is the negative log-probability of the predicted class: $-\sum_{k} z_{k} \log(q_{k})$

In [9]:
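def hard_boostrapping(logit_q):
    # Softmax is monotonic, so the argmax of the logits is the predicted class z.
    z = logit_q.argmax(dim=1, keepdim=True)
    # Pick -log(q_z) for each sample; returns a tensor of shape (batch,)
    return - F.log_softmax(logit_q, dim=1).gather(1, z).view(-1)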

In [10]:
hard_boostrapping(logit_q_1a), hard_boostrapping(logit_q_1b), hard_boostrapping(logit_q_2a), hard_boostrapping(logit_q_2b)

(tensor([0.6178]), tensor([0.9398]), tensor([1.0986]), tensor([0.7673]))

In [11]:
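def hard_bootstrapping_loss(logit_q, t, beta):
    # F.cross_entropy averages over the batch while the second term is per-sample,
    # so the two terms line up only for a batch of size 1, as in this notebook.
    return F.cross_entropy(logit_q, t) * beta + (1.0 - beta) * hard_boostrapping(logit_q)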

In [12]:
hard_bootstrapping_loss(logit_q_1a, t_1, beta=0.8), hard_bootstrapping_loss(logit_q_1b, t_1, beta=0.8)

(tensor([0.6178]), tensor([1.1798]))

In [13]:
hard_bootstrapping_loss(logit_q_2a, t_2, beta=0.8), hard_bootstrapping_loss(logit_q_2b, t_2, beta=0.8)

(tensor([1.0986]), tensor([0.7673]))
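In the results above, the hard loss equals the plain cross entropy whenever the predicted class matches the label (samples 1a, 2a, 2b) and is pulled toward the model's own prediction otherwise (sample 1b). A batch-safe sketch that reproduces these four values in one call could look as follows; the function name is hypothetical and the code is an illustration, not the paper's implementation.

```python
import torch
from torch.nn import functional as F


def hard_bootstrapping_loss_per_sample(logits, targets, beta):
    """Hypothetical batch-safe variant: returns one loss value per sample."""
    log_q = F.log_softmax(logits, dim=1)
    # Per-sample cross entropy with the (possibly noisy) labels.
    ce = F.nll_loss(log_q, targets, reduction="none")
    # Predicted classes; argmax returns indices, so no gradient flows through z.
    z = log_q.argmax(dim=1)
    # Per-sample cross entropy with the model's own hard predictions.
    hard = F.nll_loss(log_q, z, reduction="none")
    return beta * ce + (1.0 - beta) * hard


# The four rows used in this notebook, stacked into one batch.
logits = torch.tensor([
    [0.9, 0.05, 0.05],
    [0.2, 0.5, 0.3],
    [0.33, 0.33, 0.33],
    [0.15, 0.7, 0.15],
])
targets = torch.tensor([0, 0, 1, 1])

losses = hard_bootstrapping_loss_per_sample(logits, targets, beta=0.8)
print(losses)
```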

In [14]:
y_pred = torch.rand(4, 10, requires_grad=True)

In [15]:
z = F.softmax(y_pred, dim=1).argmax(dim=1)
z

tensor([3, 2, 5, 2])

In [17]:
_, z2 = F.softmax(y_pred, dim=1).max(dim=1)
z2

tensor([3, 2, 5, 2])
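The last cells show that `argmax(dim=1)` and the index returned by `max(dim=1)` give the same predicted classes. Because softmax preserves the ordering of its inputs, the softmax call can also be skipped when only the class index is needed. The short check below illustrates this on a fixed tensor chosen for this example.

```python
import torch
from torch.nn import functional as F

# Fixed, tie-free logits chosen for illustration.
logits = torch.tensor([
    [0.1, 2.0, -1.0],
    [3.0, 0.5, 0.2],
])

z_from_probs = F.softmax(logits, dim=1).argmax(dim=1)
_, z_from_max = F.softmax(logits, dim=1).max(dim=1)
z_from_logits = logits.argmax(dim=1)

# All three give the same class indices.
print(z_from_probs, z_from_max, z_from_logits)
```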