Many NaN errors with the new Tensor-based version #63

Open
ASDen opened this issue Feb 13, 2016 · 2 comments

@ASDen
Contributor

ASDen commented Feb 13, 2016

For many models (especially deep ones with many parameters, e.g. bidi2), I keep getting the following error:

clstm.cc:664: void ocropus::GenericNPLSTM<F, G, H>::backward() [with int F = 1; int G = 2; int H = 2]: Assertion `!anynan(out)' failed.

whereas the old (Mat-based) version works just fine.

@ASDen
Contributor Author

ASDen commented Feb 19, 2016

Can you please confirm the problem? Or is it just me misusing CLSTM?

@MichalBusta

Hi, I believe it's just a wrong assert. The assert comes after the input assignment, so the ".d" derivative parts are still uninitialized. (With larger networks, you just increase the probability that a random NaN is present.)

The proper fix could be:

// Full check: values and derivatives (valid only once both are initialized).
bool anynan(Batch &a) {
  if (anynan(a.v())) return true;
  if (anynan(a.d())) return true;  // this is the check that fails
  return false;
}

// Values-only check, safe before the derivatives have been written.
bool anynan_v(Batch &a) {
  if (anynan(a.v())) return true;
  return false;
}

Then replace anynan with anynan_v during the forward step, as sketched below.
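A minimal sketch of what the adjusted call sites might look like, assuming the forward and backward passes assert on whole batches as in the error above; the exact locations and signatures in clstm.cc's GenericNPLSTM may differ:

// Sketch only: `out`, anynan(), and anynan_v() as defined above are assumed
// to exist in the surrounding network class.
void forward() {
  // ... compute out.v() from the inputs ...
  assert(!anynan_v(out));  // values only; out.d() is not initialized yet
}

void backward() {
  // ... compute derivatives into out.d() ...
  assert(!anynan(out));    // now both v and d are meaningful
}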

ASDen added a commit to ASDen/clstm that referenced this issue Feb 24, 2016
This PR solves the NaN issue reported here: tmbdev#63. The problem is that, for newly allocated batches inside a Sequence, memory had to be reset to zeros; otherwise NaNs may flow in.
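A minimal sketch of the idea behind that fix, using a toy Batch built on Eigen tensors; the member names (v, d, resize) are assumptions for illustration, not the exact clstm API:

#include <unsupported/Eigen/CXX11/Tensor>

// Toy stand-in for a batch holding values (v) and derivatives (d).
struct Batch {
  Eigen::Tensor<float, 2> v;  // values
  Eigen::Tensor<float, 2> d;  // derivatives

  void resize(int rows, int cols) {
    v.resize(rows, cols);
    d.resize(rows, cols);
    // Freshly allocated memory is uninitialized, so stray NaNs can leak
    // into later forward/backward computations unless it is cleared here.
    v.setZero();
    d.setZero();
  }
};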