
Loss function #21

Closed
yrcrcy opened this issue Oct 18, 2020 · 7 comments

Comments

@yrcrcy

yrcrcy commented Oct 18, 2020

Thank you for your contribution. I would like to know what |Dl| and |Du| represent in your cross-entropy loss formula and your semi-supervised loss formula. Thank you for your answer.

@SuzannaLin
Contributor

It's the mean of all the losses, i.e. the sum divided by the number of elements. You will see reduction = 'mean' in losses.py.
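
As a quick illustration (my own toy example, not the repo's code), reduction='mean' just divides the summed squared error by the total number of tensor elements:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration: [batch, classes, H, W]
pred = torch.rand(4, 21, 32, 32)
target = torch.rand(4, 21, 32, 32)

mean_loss = F.mse_loss(pred, target, reduction='mean')
manual = F.mse_loss(pred, target, reduction='sum') / pred.numel()
assert torch.allclose(mean_loss, manual)  # mean == sum / number of elements
```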

@yrcrcy
Author

yrcrcy commented Oct 18, 2020 via email

@SuzannaLin
Contributor

I am very confused about how the loss functions are calculated: per pixel, per image, or over all images at once?

The general formula for MSE can also be found here:
https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss
but I am not sure whether x and y refer to a single pixel's probability distribution, an entire image, or even the whole dataset.
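
For reference, that page defines the per-element loss as l_n = (x_n - y_n)^2, which with reduction='mean' gives (N being the total number of elements):

$$\ell(x, y) = \frac{1}{N} \sum_{n=1}^{N} (x_n - y_n)^2$$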

In the paper, I think it's the number of outputs generated for all the input images (= number of images × number of aux. decoders). (Correct me if I am wrong, @yassouali.)
In this part of model.py you can see that the unsupervised loss is first computed by looping over all the outputs per input image (for u in outputs_ul) and is then divided by the total number of output images; see the sketch after the screenshot below.
[screenshot: unsupervised loss computation in model.py]
My answer might be incomplete. I hope others can shed more light on the inner workings of loss functions.
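
A minimal sketch of how I read that snippet (the helper name, the use of MSE, and the shapes are my own stand-ins, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

def average_unsup_loss(outputs_ul, targets):
    # Sum the unsupervised loss over every aux-decoder output...
    total = sum(F.mse_loss(torch.softmax(u, dim=1), targets, reduction='mean')
                for u in outputs_ul)
    # ...then divide by the number of outputs (one per aux decoder).
    return total / len(outputs_ul)

# One prediction per aux decoder, each of shape [batch, classes, H, W]
outputs_ul = [torch.randn(2, 21, 16, 16) for _ in range(3)]
# Stand-in for the main decoder's (detached) probability map
targets = torch.softmax(torch.randn(2, 21, 16, 16), dim=1).detach()
print(average_unsup_loss(outputs_ul, targets))
```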

@yassouali
Owner

yassouali commented Oct 19, 2020

Hi @yrcrcy
Thank you for your interest, and thanks to @SuzannaLin for answering the questions.

N is the number of pixels in the image (N = H × W). When computing the loss, we average over both the pixels of a given image and the images in the batch (so a mean over dim 0 for the batch size, and a mean over dims 2 and 3 for the height and width).

You could also average only over the batch (batch-mean), or even sum over both the batch and the pixels, but in that case the LR needs to be reduced to avoid diverging during training. You can see the averaging as a way of having a stable loss.
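
A toy illustration of these options, with a random tensor standing in for a per-element loss map (my example, not from the repo):

```python
import torch

loss_map = torch.rand(8, 21, 64, 64)  # per-element losses: [B, C, H, W]

full_mean  = loss_map.mean()                     # mean over batch AND pixels
batch_mean = loss_map.sum(dim=(1, 2, 3)).mean()  # sum per image, mean over batch
full_sum   = loss_map.sum()                      # sum over everything

# The sum is numel() times larger than the full mean, so the gradients scale
# accordingly -- hence the smaller LR needed when training with sums.
assert torch.allclose(full_sum, full_mean * loss_map.numel())
```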

@yrcrcy
Author

yrcrcy commented Oct 20, 2020 via email

@SuzannaLin
Contributor

Can you confirm whether I am correct in saying that the input of the MSE definition in losses.py is:
inputs: a Tensor of shape [batch_size, Num_classes, H, W]?
And that the mean is over the batch size?

[screenshot: MSE loss definition in losses.py]

And in model.py you account for the different outputs of the aux. decoders by taking their average?

@yassouali
Owner

Yes, @yrcrcy, that is correct.

@SuzannaLin For the MSE loss, the mean is over all elements of the tensor, so a mean over both the batch and the pixels (as detailed below for the usage of reduction = 'mean').

[screenshot: reduction argument documentation from torch.nn.MSELoss]
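
A small check of that point (my own example, not from the repo): with reduction='mean', nn.MSELoss divides by every element of the [batch_size, num_classes, H, W] tensor, not just by the batch size:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss(reduction='mean')
x = torch.rand(4, 21, 32, 32)  # [batch_size, num_classes, H, W]
y = torch.rand(4, 21, 32, 32)

loss = criterion(x, y)
all_elements = ((x - y) ** 2).sum() / x.numel()   # divide by 4*21*32*32
batch_only   = ((x - y) ** 2).sum() / x.size(0)   # what a batch-only mean would give
assert torch.allclose(loss, all_elements)
print(loss.item(), batch_only.item())  # batch_only is far larger
```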
