
Use tf.softmax_cross_entropy_with_logits to calculate loss #181

Merged 2 commits into tensorflow:master on Jun 23, 2016

Conversation

@elezar commented Jun 6, 2016

This PR closes #166

When the cross-entropy loss is implemented manually, the training loss becomes NaN after a number of iterations (usually around 90 epochs for the cluttered MNIST example). Switching to the built-in softmax_cross_entropy_with_logits allows the network to train stably.

@denny1108 could you also just confirm that this solves the problem?
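
For reference, here is a minimal sketch of the two formulations (written against the TF1-style API; variable names and shapes are illustrative, not taken from this PR's diff):

```python
import tensorflow as tf

# Illustrative placeholders; shapes assume 10-class one-hot labels.
logits = tf.placeholder(tf.float32, [None, 10])  # raw network outputs
y_true = tf.placeholder(tf.float32, [None, 10])  # one-hot labels

# Manual cross entropy (the pattern this PR replaces): tf.log(y_pred)
# yields -inf when a predicted probability underflows to exactly 0,
# and 0 * -inf is NaN, so the loss can blow up mid-training.
y_pred = tf.nn.softmax(logits)
manual_loss = -tf.reduce_mean(
    tf.reduce_sum(y_true * tf.log(y_pred), axis=1))

# Built-in fused op: computes the softmax and the cross entropy
# together, operating on the logits directly in a numerically
# stable way, so the loss stays finite.
stable_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=logits))
```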

@denny1108 commented Jun 6, 2016

@elezar Thanks for your contribution. I ran the cluttered-MNIST example again, and within 500 epochs I did not see the 'nan' loss. I suspect the previous problem was caused by the negative log likelihood in the cross entropy: in some cases, when the probability of the true category is close to 0, the numerical loss becomes 'nan'. That could be triggered by labeling errors or by model divergence. I guess the built-in loss op has some scheme to avoid this?

Anyway, I think the problem has been solved for this example.
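
To make the suspected failure mode concrete, here is a small NumPy sketch (the values are illustrative, not taken from the example) of how 0 * log(0) produces NaN, and why computing on the logits with log-sum-exp, as the fused op does, stays finite:

```python
import numpy as np

# Hypothetical logits where one class dominates so strongly that the
# other class's softmax probability underflows to exactly 0 in float32.
logits = np.array([200.0, 0.0], dtype=np.float32)
probs = np.exp(logits - logits.max()).astype(np.float32)
probs /= probs.sum()                       # -> [1.0, 0.0]

# Manual cross entropy with a one-hot label on the dominant class:
# the non-target term is 0 * log(0) = 0 * -inf = NaN, which then
# poisons the whole loss.
y = np.array([1.0, 0.0], dtype=np.float32)
manual = -(y * np.log(probs)).sum()        # nan

# Working on logits via log-sum-exp stays finite:
# -log p_target = logsumexp(logits) - logits[target]
m = logits.max()
stable = (m + np.log(np.exp(logits - m).sum())) - logits[0]
print(manual, stable)                      # nan 0.0
```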

@elezar (Author) commented Jun 14, 2016

@martinwicke could you have a look again?

(@denny1108 it would be great if you could also just confirm that things still look good from your side)

@martinwicke (Member) commented
Looks good, thanks!

@martinwicke merged commit d816971 into tensorflow:master on Jun 23, 2016
@elezar deleted the bugfix/nan_loss branch on Jun 23, 2016 07:15
Successfully merging this pull request may close these issues.

Problem with spatial transform network: got 'nan' loss in the mnist example