[Image Captioning] out of memory immediately after training starts? #35
Comments
I remember that the required GPU memory for What's your Python and PyTorch version? I guess you are using Python 2.7. Am I right? There are two options for solving this problem.
|
This might be related to #26. This is a known issue that will be resolved in the next release. Until then, as a workaround, change L56 to
images = Variable(images, volatile=True)
and L66 to
features = encoder(images)
features = Variable(features.data) |
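For readers on newer PyTorch releases: `volatile=True` was removed in PyTorch 0.4, and `torch.no_grad()` is the usual replacement. The following is a minimal sketch of the same idea as the workaround above, not the tutorial's actual code. The small `cnn` and `decoder` modules and the tensor shapes are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the pretrained CNN encoder (assumption, not the tutorial's model).
cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
cnn.eval()

# Hypothetical stand-in for the trainable decoder.
decoder = nn.Linear(8, 5)

images = torch.randn(4, 3, 32, 32)

# Inside no_grad no autograd graph is recorded, so the CNN's intermediate
# activations are not kept around for a backward pass.
with torch.no_grad():
    features = cnn(images)

# The features are plain tensors with no history; only the decoder is trained.
loss = decoder(features).pow(2).mean()
loss.backward()
```

After `backward()`, the decoder has gradients while the CNN parameters have none, which is where the memory saving comes from.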
@yunjey I am on Python 2.7 and PyTorch 0.12. I will try your changes. |
@jtoy I recommend that you install PyTorch from source. This will give you the latest version of PyTorch. |
I tried PyTorch built from source on Python 2.7 and PyTorch for Python 3.5; both died with the same issue. |
@karandwivedi42 your changes work! @yunjey will the code need to be updated? Building from source doesn't seem to fix the issue. I can do more testing if needed. |
@jtoy Ok. Thanks. |
@karandwivedi42 That does not work. images = Variable(images, volatile=True) The code above makes |
@yunjey You are right. I don't know how important it is though because this linear layer is followed by another linear layer in the decoder with no non-linearity in between. |
So what is the right code to use? I was able to train a model with @karandwivedi42's change, and training completed in 155 minutes. Does that time seem right? I trained the original Show and Tell model and I remember it taking at least a day. |
Put the fc and bn as a separate module between the encoder and decoder so that they can be a part of gradient computation. Does it make sense?
|
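The suggestion to move the fc and bn layers into their own module can be sketched as follows. This is an illustration under stated assumptions, not the fix that was later committed: `FeatureBridge`, `frozen_cnn`, the linear `decoder` stand-in, and all sizes are hypothetical, and the sketch uses `torch.no_grad()` from newer PyTorch releases rather than `volatile=True`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureBridge(nn.Module):
    """Hypothetical module holding the fc and bn layers outside the frozen CNN."""

    def __init__(self, in_features, embed_size):
        super().__init__()
        self.fc = nn.Linear(in_features, embed_size)
        self.bn = nn.BatchNorm1d(embed_size)

    def forward(self, x):
        return self.bn(self.fc(x))


torch.manual_seed(0)

# Stand-in for the frozen, pretrained CNN (assumption).
frozen_cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
frozen_cnn.eval()

bridge = FeatureBridge(in_features=8, embed_size=16)
decoder = nn.Linear(16, 5)  # stand-in for the LSTM decoder (assumption)

images = torch.randn(4, 3, 32, 32)
targets = torch.randint(0, 5, (4,))

# Only the CNN forward pass runs without a graph.
with torch.no_grad():
    raw_features = frozen_cnn(images)

# The bridge and decoder run with autograd enabled, so fc and bn receive gradients.
loss = F.cross_entropy(decoder(bridge(raw_features)), targets)
loss.backward()
```

With this split, the memory-heavy CNN activations are still discarded, while the fc and bn parameters remain trainable alongside the decoder.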
@karandwivedi42 I don't fully understand. I'm just starting to play with PyTorch. Is there any way to see it as a diff? |
@jtoy This fork is a very hacky way to do exactly what the original code does. @yunjey Can you please check this one? (Thanks for the amazing tutorials btw :) ) |
@jtoy @karandwivedi42 I will fix the code by this weekend. |
I have a model training on it now. I'll also test out your version of the code.
|
@jtoy @karandwivedi42 I modified the code. Try it. Thanks :) |
What size cards are these networks tested and trained on? I just tried running "09 - Image Captioning" and I immediately get errors. I am testing this on a Titan X with 12 GB of memory: