
The vanilla CNN downsampling architecture cannot recover the spatial information of an image #4

Open
zhiqwang opened this issue Apr 15, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@zhiqwang
Owner

zhiqwang commented Apr 15, 2019

The convolutional part of the architecture acts as an encoder: it captures the image's contextual information. The architecture should therefore also include a decoder part (a deconvolution layer or an RNN layer) to recover the image's spatial information.
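To make the encoder/decoder point concrete, here is a minimal 1-D sketch in plain Python, assuming a stride-2 convolution stands in for the encoder's downsampling and a stride-2 transposed convolution stands in for the deconvolution decoder. The helper names `conv1d` and `conv_transpose1d` are hypothetical and are not part of this repository; the RNN-decoder alternative is not shown.

```python
# Hypothetical 1-D sketch: a stride-2 convolution halves the spatial length
# (encoder), and a stride-2 transposed convolution ("deconvolution") maps it
# back to the original length (decoder).

def conv1d(x, kernel, stride):
    """Valid 1-D convolution (cross-correlation, as in CNN libraries)."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return [
        sum(x[i * stride + j] * kernel[j] for j in range(k))
        for i in range(out_len)
    ]


def conv_transpose1d(y, kernel, stride):
    """Transposed 1-D convolution: each input value scatters a scaled copy
    of the kernel into the output, with copies spaced `stride` apart."""
    k = len(kernel)
    out_len = (len(y) - 1) * stride + k
    out = [0.0] * out_len
    for i, v in enumerate(y):
        for j in range(k):
            out[i * stride + j] += v * kernel[j]
    return out


x = [float(i) for i in range(16)]                            # 16 "pixels"
encoded = conv1d(x, [0.5, 0.5], stride=2)                    # averages pairs
decoded = conv_transpose1d(encoded, [1.0, 1.0], stride=2)    # upsamples
print(len(x), len(encoded), len(decoded))  # 16 8 16
```

The fixed kernels here are only illustrative; in a real network both kernels would be learned. The decoder restores the spatial resolution (the length), not necessarily the exact original values.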

@zhiqwang
Owner Author

The CNN architectures currently implemented here fall into two categories according to how much they pool the image's width: densenet121 compresses the image's width to 1/8, and densenet_cifar compresses it to 1/4. Because each backbone uses a single fixed width-downsampling factor, the current network architecture cannot handle text whose characters have different widths.
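A rough way to see the problem is to compute how many feature columns survive after width downsampling. The sketch below is a simplification: it assumes the output width is just the input width floor-divided by the factor (real layers with padding may round differently), and the names `WIDTH_DOWNSAMPLE`, `feature_width`, and `columns_per_char` are hypothetical, not part of the code base.

```python
# Width-downsampling factors as described in this issue.
WIDTH_DOWNSAMPLE = {
    "densenet121": 8,     # width compressed to 1/8
    "densenet_cifar": 4,  # width compressed to 1/4
}


def feature_width(image_width, arch):
    """Approximate number of feature columns after the CNN, assuming the
    width is simply floor-divided by the downsampling factor."""
    return image_width // WIDTH_DOWNSAMPLE[arch]


def columns_per_char(char_width_px, arch):
    """Feature columns spanned by one character of the given pixel width."""
    return char_width_px / WIDTH_DOWNSAMPLE[arch]


for arch in WIDTH_DOWNSAMPLE:
    # A 100-px-wide text line and a narrow 6-px character.
    print(arch, feature_width(100, arch), columns_per_char(6, arch))
```

With a fixed factor of 8, a 6-px character covers less than one feature column while a 32-px character covers four, so narrow and wide glyphs are represented very unevenly.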

zhiqwang added a commit that referenced this issue Apr 24, 2019
zhiqwang added the enhancement (New feature or request) label May 22, 2019
zhiqwang added this to To do in image captioning Jun 11, 2019
zhiqwang moved this from To do to In progress in image captioning Jun 11, 2019
zhiqwang moved this from In progress to Done in image captioning Jun 21, 2019
zhiqwang moved this from Done to In progress in image captioning Jun 21, 2019
zhiqwang moved this from In progress to To do in image captioning Jun 21, 2019