
Issue with the dimension of the last output of the Discriminator structure in the CycleGAN notebook #72

Closed
hpkhanh1610 opened this issue Nov 25, 2018 · 3 comments

Comments


hpkhanh1610 commented Nov 25, 2018

Hi,

In the CycleGAN solution notebook's Discriminator architecture, the architecture image shows the dimension of the last output (the logit) as 1x1x1. But given the following code from the Discriminator:

        self.conv5 = conv(conv_dim*8, 1, 4, stride=1, batch_norm=False)

and the fact that the output from self.conv4 is 8x8x512, the output after self.conv5 should have shape 7x7x1. How can it be 1x1x1, as defined in the architecture image and in the video lecture (where you said it should output a single value)?
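For reference, here is a minimal PyTorch sketch of that shape arithmetic (the padding=1 is my assumption about the notebook's conv helper):

    import torch
    import torch.nn as nn

    # Assumed stand-in for the notebook's conv helper: kernel_size=4, stride=1, padding=1.
    conv5 = nn.Conv2d(in_channels=512, out_channels=1, kernel_size=4, stride=1, padding=1)

    x = torch.randn(1, 512, 8, 8)  # assumed output of self.conv4: 512 channels, 8x8 spatial
    out = conv5(x)
    # spatial size = floor((8 + 2*1 - 4) / 1) + 1 = 7
    print(out.shape)  # torch.Size([1, 1, 7, 7]) -- i.e. 7x7x1, not 1x1x1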


cezannec (Contributor) commented May 3, 2019

Ah, you're right! Thank you for this feedback. To be more specific, we are looking at one value (the mean of those output values) and using that single value to calculate the real and fake losses later on.
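A hedged sketch of what that could look like, assuming a least-squares GAN loss (the notebook's actual loss helpers may differ):

    import torch

    def real_mse_loss(D_out):
        # target is 1 for every patch score; the mean collapses the 7x7 map to one value
        return torch.mean((D_out - 1) ** 2)

    def fake_mse_loss(D_out):
        # target is 0 for every patch score
        return torch.mean(D_out ** 2)

    # D_out = D_X(images)          # shape (batch, 1, 7, 7)
    # d_real_loss = real_mse_loss(D_out)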

@cezannec cezannec closed this as completed May 3, 2019
@christian-steinmeyer

Is there a reason for these two steps? Why not go for the single step of creating the final layer with a kernel size of 8 and padding 0, so that it actually outputs just one value?
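For illustration, a sketch of that single-step alternative (assuming the 8x8x512 input from conv4):

    import torch
    import torch.nn as nn

    # A kernel covering the full 8x8 feature map collapses it to a single score.
    conv5 = nn.Conv2d(in_channels=512, out_channels=1, kernel_size=8, stride=1, padding=0)
    x = torch.randn(1, 512, 8, 8)
    print(conv5(x).shape)  # torch.Size([1, 1, 1, 1])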


JonTong commented Jan 6, 2021

@christian-steinmeyer In one particular solution by Ram K on the Udacity Q&A platform, I managed to locate an answer to why this is done.

He links to this [explanation](junyanz/pytorch-CycleGAN-and-pix2pix#39). My understanding is that this lets the discriminator more easily identify which specific patch in the image (i.e. which of the 7x7 cells) looks fake or real, and that judgment can then be traced back through the network via the receptive field of the CNN. To calculate the overall discriminator loss, this 7x7 output is then averaged: each cell is an indication of whether a given patch is "real" or not, so the average should indicate whether the whole image is "real" (or not).
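A small sketch of the receptive-field arithmetic behind that patch interpretation (assuming the notebook's discriminator stacks five 4x4 convolutions with strides 2, 2, 2, 2, 1):

    def receptive_field(layers):
        # Walk backwards from one output cell: rf_in = (rf_out - 1) * stride + kernel
        rf = 1
        for kernel, stride in reversed(layers):
            rf = (rf - 1) * stride + kernel
        return rf

    layers = [(4, 2), (4, 2), (4, 2), (4, 2), (4, 1)]  # conv1..conv5 (assumed)
    print(receptive_field(layers))  # 94 -> each of the 7x7 scores covers roughly a 94x94 input patch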
