Description
The preprocessing step in the CycleGAN project is supposed to resize input images to 128x128. However, when a single value (rather than a tuple) is passed to `transforms.Resize()`, only the smaller side of the input image is resized to 128; the larger side is scaled to preserve the original aspect ratio. See the docs.
```python
# resize and normalize the images
transform = transforms.Compose([transforms.Resize(image_size),  # resize to 128x128
                                transforms.ToTensor()])
```
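To illustrate the documented behavior, here is a minimal stdlib-only sketch of the size rule `Resize` applies (the function below mimics the rule, it is not torchvision's actual code):

```python
def resize_output_size(h, w, size):
    """Sketch of torchvision.transforms.Resize's size rule (not the real code).

    An int scales the *smaller* edge to `size` and keeps the aspect ratio;
    a (h, w) tuple forces both edges to the requested values.
    """
    if isinstance(size, tuple):
        return size
    if h <= w:
        return (size, round(w * size / h))
    return (round(h * size / w), size)

# A 256x512 image resized with the int form stays non-square:
print(resize_output_size(256, 512, 128))          # (128, 256)
# Passing a tuple forces 128x128:
print(resize_output_size(256, 512, (128, 128)))   # (128, 128)
```

The fix is correspondingly small: pass `transforms.Resize((image_size, image_size))` instead of the bare int.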
The issue has probably gone unnoticed as long as the input images are square.
The output of the last discriminator layer should be 1x1x1, right? In the code, a kernel of size 4 is applied to an 8x8 feature map, which yields a 7x7 output, not a single number. Was that intended?
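The spatial sizes can be checked with the standard convolution arithmetic. A small sketch (the 7x7 figure assumes the project's `conv` helper uses padding=1, which is an assumption here):

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Standard convolution output-size formula: floor((n + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

# kernel 4, stride 1 on an 8x8 map: 7x7 with padding=1, 5x5 with padding=0
print(conv_out(8, 4, stride=1, padding=1))  # 7
print(conv_out(8, 4, stride=1, padding=0))  # 5
# A true 1x1 output from an 8x8 map would need e.g. kernel 8, stride 1, padding 0
print(conv_out(8, 8))                       # 1
```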
```python
class Discriminator(nn.Module):

    def __init__(self, conv_dim=64):
        super(Discriminator, self).__init__()
        # Define all convolutional layers
        # Should accept an RGB image as input and output a single value
        # Convolutional layers, increasing in depth
        ...
        # Classification layer
        self.conv5 = conv(conv_dim*8, 1, 4, stride=1, batch_norm=False)
```
The original code seems to work only because of the `torch.mean()` call, which reduces the multidimensional output to a single value, but I doubt that this is correct.
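Note that averaging before or after the squared error are not the same objective. Taking the MSE over the whole patch map (as a PatchGAN-style loss would) equals the MSE of the averaged output plus the variance of the patch responses, so the two only coincide when every patch agrees. A stdlib-only sketch with a hypothetical 2x2 patch map:

```python
# Per-patch loss vs loss of the averaged output:
# mean((d - 1)^2) == (mean(d) - 1)^2 + var(d)
patch = [0.2, 0.9, 0.4, 0.8]   # hypothetical 2x2 discriminator output map
m = sum(patch) / len(patch)

per_patch = sum((d - 1) ** 2 for d in patch) / len(patch)  # patch-wise MSE
of_mean = (m - 1) ** 2                                     # MSE of the mean
var = sum((d - m) ** 2 for d in patch) / len(patch)        # patch variance

print(per_patch, of_mean, var)
assert abs(per_patch - (of_mean + var)) < 1e-12
```

So whether the existing behavior is "correct" depends on whether a patch-level or a single scalar discriminator output was intended.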
It is not quite clear why a sigmoid can't be used in the output layer of the discriminator along with the MSE loss. A sigmoid would squash the discriminator outputs into the (0, 1) range, so the squared distance between a ground-truth label (1 or 0) and the output would be more meaningful.
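One common argument against sigmoid + MSE (the LSGAN formulation deliberately drops the sigmoid) is gradient saturation: the chain rule multiplies the error by sigmoid'(x), which vanishes for confident outputs, while the raw least-squares gradient stays proportional to the error. A stdlib-only sketch of the two gradients:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def grad_mse_sigmoid(x, label):
    # d/dx (sigmoid(x) - label)^2 = 2*(s - label) * s*(1 - s)
    s = sigmoid(x)
    return 2 * (s - label) * s * (1 - s)

def grad_mse_raw(x, label):
    # d/dx (x - label)^2, as in LSGAN (no sigmoid on the output)
    return 2 * (x - label)

# x = 6.0 is a confidently *wrong* output when the label is 0:
print(grad_mse_sigmoid(6.0, 0.0))  # tiny gradient, learning stalls
print(grad_mse_raw(6.0, 0.0))      # large gradient, error still drives updates
```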
What is the reason for using one optimizer for both generators, while optimizing the discriminators separately?
The training loop: as far as I understand the code, what is called an "epoch" here is in fact a step that processes one batch of data, not a pass over the whole training set.
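The distinction can be sketched without any framework code: an epoch should be one full pass over the training set, with the batch loop nested inside it (dummy data, no torch):

```python
# Sketch: an epoch is a full pass over the data; one optimization step per batch.
dataset = list(range(10))   # 10 dummy samples
batch_size = 4

def batches(data, size):
    # Yield consecutive batches (the last one may be smaller)
    for i in range(0, len(data), size):
        yield data[i:i + size]

total_steps = 0
for epoch in range(3):                       # 3 real epochs
    for batch in batches(dataset, batch_size):
        total_steps += 1                     # one training step per batch

print(total_steps)  # 9 = 3 epochs x 3 batches, not 3 "epochs" of one batch each
```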