Skip to content
This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

RNN gradients are wrong #518

Closed
eaplatanios opened this issue Sep 29, 2019 · 6 comments
Closed

RNN gradients are wrong #518

eaplatanios opened this issue Sep 29, 2019 · 6 comments

Comments

@eaplatanios
Copy link
Contributor

This line should be 饾泚cell += new饾泚cell instead. There is also an issue with the initial state initialization to a zeros tensor with batch size 1 that does not broadcast. I don't have time to open a PR right now because I'm traveling but will try to open one once I get a chance.

@rxwei
Copy link
Contributor

rxwei commented Sep 29, 2019

Nice catch!

@Shashi456
Copy link
Contributor

Should I take care of this @eaplatanios

@eaplatanios
Copy link
Contributor Author

@Shashi456 thanks! Feel free to fix the += bug. The initial state issue is a less trivial API design question that I haven't thought about much yet. There are a couple of possible solutions that I used in swift-rl, so I'll try to look into it sometime soon.

@sgugger
Copy link
Contributor

sgugger commented Dec 6, 2019

I think this has been solved by the two PRs merged. Feel free to reopen if there is still a problem.

@sgugger sgugger closed this as completed Dec 6, 2019
@Shashi456
Copy link
Contributor

@sgugger the tests against these changes were remaining so I thought we'd keep this PR open. #555, #554 track the tests. But since we have separate issues for them maybe it's okay.

@sgugger
Copy link
Contributor

sgugger commented Dec 6, 2019

Like you said there are already two issues tracking this. I wanted to do a bit of clean-up, but if anyone feels strongly about this issue remaining open, they can click the button :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants