
Conversation

@danielkurniadi
Contributor

Dear Udacity team,

Description

I noticed that the gram matrix implementation in Style_Transfer_Solution.ipynb doesn't account for input tensors that are feature maps with batch size b > 1.

While the notebook exercise doesn't use multiple content/style images at the same time, I would suggest that the gram matrix implementation handle the case b > 1. Hard-coding b = 1 inside the gram matrix implementation confused me as a reader: a general gram matrix implementation appears to discard batch-size information, which it should not.

Suggestion

Hence I edited gram_matrix() so that its output tensor is computed along these lines:

import torch

def get_gram_matrix(img):
    """
    Compute the gram matrix by flattening the 4D feature map to 2D
    and taking the dot product with its transpose.
    img: tensor of shape (batch, channel/depth, height, width)
    """
    b, d, h, w = img.size()
    # Flatten using the actual batch size instead of assuming b == 1
    img = img.view(b * d, h * w)
    gram = torch.mm(img, img.t())
    return gram

Given a batch of b style-image feature maps as a 4D tensor, get_gram_matrix(imgs) stacks the flattened feature maps of every image in the batch into a single 2D tensor of shape (b*d, h*w) and computes one gram matrix over those stacked rows, so the output is a 2D tensor of shape (b*d, b*d).
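To make the shapes concrete, here is a quick sketch using NumPy as a stand-in for torch.mm (the array names and the per-image einsum variant are my own illustration, not part of the notebook):

```python
import numpy as np

# Illustrative shape check for the flattened gram matrix
b, d, h, w = 2, 3, 4, 5
feats = np.random.rand(b, d, h, w)

# Flatten batch and channel dimensions into rows, as get_gram_matrix does
flat = feats.reshape(b * d, h * w)
gram = flat @ flat.T  # one gram matrix over all stacked rows
assert gram.shape == (b * d, b * d)

# Alternative: keep each image's gram matrix separate (one d x d matrix per image)
flat3d = feats.reshape(b, d, h * w)
per_image = np.einsum('bdx,bex->bde', flat3d, flat3d)
assert per_image.shape == (b, d, d)
```

Note the design trade-off: the flattened version mixes rows from different images into one matrix, while the per-image variant (torch.bmm in PyTorch) yields an independent gram matrix per image.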

Thank you,
iqDF.

@abhiojha8 abhiojha8 self-assigned this Mar 30, 2021
@abhiojha8 abhiojha8 added the enhancement New feature or request label Mar 30, 2021
@abhiojha8 abhiojha8 merged commit e8286df into udacity:master Mar 30, 2021
mxagar pushed a commit to mxagar/deep-learning-v2-pytorch that referenced this pull request Jul 1, 2022
fix: Correct Gram Matrix function for Batch Size > 1
