
Inefficient usage of resources #12

Closed
zeryx opened this issue Mar 7, 2018 · 5 comments

zeryx (Contributor) commented Mar 7, 2018

I was exploring this project for a general-purpose forecasting model I was working on, and I realized that with multiple layers (say 5), this model is actually slower in both the forward and backward passes than a standard PyTorch GRU module with 5 layers.

https://gist.github.com/zeryx/c43fc53b4d3f71c4942dff44912aa3cb

From my understanding of the paper, the DRNN module should be at least as fast as the equivalent GRU module, if not dramatically faster, in both forward- and backward-pass compute time.
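
For reference, here is a minimal timing sketch along the lines of the linked gist (illustrative only; the DRNN constructor arguments are an assumption, so that line is left commented out):

import time

import torch
import torch.nn as nn

def time_forward_backward(model, x, n_iters=20):
    # average seconds per forward + backward pass
    start = time.time()
    for _ in range(n_iters):
        model.zero_grad()
        out = model(x)
        if isinstance(out, tuple):  # nn.GRU returns (output, hidden)
            out = out[0]
        out.sum().backward()
    return (time.time() - start) / n_iters

x = torch.randn(100, 8, 32)  # (seq_len, batch, input_size)
gru = nn.GRU(input_size=32, hidden_size=32, num_layers=5)
print("GRU s/iter:", time_forward_backward(gru, x))
# drnn = DRNN(32, 32, n_layers=5, cell_type='GRU')  # assumed signature
# print("DRNN s/iter:", time_forward_backward(drnn, x))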

kashif (Contributor) commented Mar 7, 2018

Thanks @zeryx, I'll also investigate...

blythed (Contributor) commented Mar 8, 2018

Might have something to do with iterating over the layers in a native Python loop.
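
To illustrate the point (a sketch, not this repo's code): on the GPU, a multi-layer nn.GRU can dispatch to a single fused cuDNN call, whereas chaining per-layer modules in a Python loop launches separate kernels per layer and loses that fusion.

import torch
import torch.nn as nn

# one fused 5-layer GRU vs. five 1-layer GRUs chained in a Python loop
fused = nn.GRU(input_size=32, hidden_size=32, num_layers=5)
layers = nn.ModuleList([nn.GRU(32, 32, num_layers=1) for _ in range(5)])

def looped_forward(x):
    # Python-level loop over layers, similar in spirit to how DRNN
    # iterates over its dilated layers
    for layer in layers:
        x, _ = layer(x)
    return x

x = torch.randn(100, 8, 32)
y_fused, _ = fused(x)
y_looped = looped_forward(x)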

zeryx (Contributor, Author) commented Mar 8, 2018

I did a little digging as well; it looks like the hidden memory tensor is never actually updated. I made a local change that forced the DRNN layer to make use of the allocated hidden memory tensor. However, because the memory tensors are kept in a native Python list (each one has a different dimensionality, so you can't stack them conventionally), training requires retain_graph=True to be passed to backward().
I can put my work in a PR but I have a feeling that a full rewrite might be required.
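
For context, a minimal sketch of the retain_graph symptom (illustrative, not the repo's code): if the hidden state is carried across training iterations without being detached, the second backward() tries to walk the first iteration's already-freed graph.

import torch
import torch.nn as nn

cell = nn.GRU(input_size=4, hidden_size=4)
h = torch.zeros(1, 1, 4)  # (num_layers, batch, hidden_size)

for step in range(2):
    x = torch.randn(3, 1, 4)
    out, h = cell(x, h)  # h keeps a reference to the previous step's graph
    try:
        out.sum().backward()  # step 1 raises unless retain_graph=True is used
    except RuntimeError as err:
        print("step", step, "backward failed:", err)
    # the usual fix is to cut the graph between iterations:
    # h = h.detach()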

kashif (Contributor) commented Mar 8, 2018

Ah cool @zeryx, please push your stuff, then we can try to figure it out and fix it!

zeryx (Contributor, Author) commented Mar 9, 2018

That PR would expect the hidden tensor to be:

# one hidden state per layer; layer i covers 2 ** i dilated sub-sequences,
# so each layer's hidden tensor has a different first dimension
drnn_h = [Variable(torch.zeros(2 ** i, 1, self.hidden_width)).cuda().float()
          for i in range(self.depth)]

I haven't been able to figure out a way to convert that list into a single torch.Variable, as each layer's hidden tensor has a different (growing) first dimension.
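
For what it's worth, a small sketch of the problem (the torch.cat workaround is my own speculation, nothing more): torch.stack requires equal shapes, but a flat layout can be emulated by concatenating along the first dimension and keeping per-layer offsets.

import torch

hidden_width, depth = 16, 3
# per-layer hidden states with first dimensions 1, 2, 4, ...
drnn_h = [torch.zeros(2 ** i, 1, hidden_width) for i in range(depth)]

# torch.stack(drnn_h)  # RuntimeError: tensors must all be the same size

flat = torch.cat(drnn_h, dim=0)  # shape (2 ** depth - 1, 1, hidden_width)
offsets = [2 ** i - 1 for i in range(depth + 1)]  # layer i is flat[offsets[i]:offsets[i + 1]]
layer_1 = flat[offsets[1]:offsets[2]]  # view of layer 1's hidden state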

kashif closed this as completed Aug 6, 2018