
Loss function #21

Closed
yrcrcy opened this issue Oct 18, 2020 · 7 comments

Comments

@yrcrcy

yrcrcy commented Oct 18, 2020

Thank you for your contribution. I would like to know what |Dl| and |Du| represent in your cross-entropy loss formula and your semi-supervised loss formula. Thank you for your answer.

@SuzannaLin
Contributor

It's the mean of all the losses, i.e. the sum divided by the number of elements. You will see reduction = 'mean' in losses.py.
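
As a quick illustration (my own toy example, not the repo's code), reduction='mean' just divides the summed squared error by the total number of tensor elements:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration: [batch, classes, H, W]
pred = torch.rand(4, 21, 32, 32)
target = torch.rand(4, 21, 32, 32)

mean_loss = F.mse_loss(pred, target, reduction='mean')
manual = F.mse_loss(pred, target, reduction='sum') / pred.numel()
assert torch.allclose(mean_loss, manual)  # mean == sum / number of elements
```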

@yrcrcy
Author

yrcrcy commented Oct 18, 2020 via email

@SuzannaLin
Contributor

I am very confused about how the loss functions are calculated: per pixel, per image, or over all images at once?

The general formula for MSE can also be found here:
https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss
but I am not sure whether x and y refer to a single pixel's probability distribution, an entire image, or even the whole dataset.
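
For reference, that page defines the per-element loss as l_n = (x_n - y_n)^2, which with reduction='mean' gives (N being the total number of elements):

$$\ell(x, y) = \frac{1}{N} \sum_{n=1}^{N} (x_n - y_n)^2$$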

In the paper, I think it's the number of outputs generated for all the input images (= number of images × number of aux. decoders). (Correct me if I am wrong, @yassouali.)
In this part of model.py you can see that the unsupervised loss is first computed by looping over all the outputs per input image (for u in outputs_ul) and is then divided by the total number of output images; see the sketch after the screenshot below.
[screenshot: unsupervised loss computation in model.py]
My answer might be incomplete. I hope others can shed more light on the inner workings of loss functions.
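
A minimal sketch of how I read that snippet (the helper name, the use of MSE, and the shapes are my own stand-ins, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

def average_unsup_loss(outputs_ul, targets):
    # Sum the unsupervised loss over every aux-decoder output...
    total = sum(F.mse_loss(torch.softmax(u, dim=1), targets, reduction='mean')
                for u in outputs_ul)
    # ...then divide by the number of outputs (one per aux decoder).
    return total / len(outputs_ul)

# One prediction per aux decoder, each of shape [batch, classes, H, W]
outputs_ul = [torch.randn(2, 21, 16, 16) for _ in range(3)]
# Stand-in for the main decoder's (detached) probability map
targets = torch.softmax(torch.randn(2, 21, 16, 16), dim=1).detach()
print(average_unsup_loss(outputs_ul, targets))
```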

@yassouali
Owner

yassouali commented Oct 19, 2020

Hi @yrcrcy
Thank you for your interest, and thanks to @SuzannaLin for answering the questions.

N is the number of pixels in the image (N = H × W). When computing the loss, we average over both the pixels of a given image and the images in the batch (so a mean over dim 0 for the batch size, and a mean over dims 2 and 3 for the height and width).

You could also average only over the batch (batch-mean), or even sum over both the batch and the pixels, but in that case the LR needs to be reduced to avoid diverging during training. You can see the averaging as a way of having a stable loss.
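
A toy illustration of these options, with a random tensor standing in for a per-element loss map (my example, not from the repo):

```python
import torch

loss_map = torch.rand(8, 21, 64, 64)  # per-element losses: [B, C, H, W]

full_mean  = loss_map.mean()                     # mean over batch AND pixels
batch_mean = loss_map.sum(dim=(1, 2, 3)).mean()  # sum per image, mean over batch
full_sum   = loss_map.sum()                      # sum over everything

# The sum is numel() times larger than the full mean, so the gradients scale
# accordingly -- hence the smaller LR needed when training with sums.
assert torch.allclose(full_sum, full_mean * loss_map.numel())
```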

@yrcrcy
Author

yrcrcy commented Oct 20, 2020 via email

@SuzannaLin
Contributor

Can you confirm whether I am correct in saying that the input of the MSE definition in losses.py is:
inputs: a Tensor of shape [batch_size, Num_classes, H, W]?
And that the mean is over the batch size?

[screenshot: MSE loss definition in losses.py]

And in model.py you account for the different outputs of the aux. decoders by taking their average?

@yassouali
Owner

Yes, @yrcrcy, that is correct.

@SuzannaLin For the MSE loss, the mean is over all elements of the tensor, so a mean over both the batch and the pixels (as detailed below for the usage of reduction = 'mean').

[screenshot: reduction argument documentation from torch.nn.MSELoss]
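
A small check of that point (my own example, not from the repo): with reduction='mean', nn.MSELoss divides by every element of the [batch_size, num_classes, H, W] tensor, not just by the batch size:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss(reduction='mean')
x = torch.rand(4, 21, 32, 32)  # [batch_size, num_classes, H, W]
y = torch.rand(4, 21, 32, 32)

loss = criterion(x, y)
all_elements = ((x - y) ** 2).sum() / x.numel()   # divide by 4*21*32*32
batch_only   = ((x - y) ** 2).sum() / x.size(0)   # what a batch-only mean would give
assert torch.allclose(loss, all_elements)
print(loss.item(), batch_only.item())  # batch_only is far larger
```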
