This repository was archived by the owner on Apr 10, 2024. It is now read-only.

Conversation

ludwigschubert (Contributor):

Adds a ReLU gradient override, but does not yet make it the default for render_vis.
Also adds a test for gradient_override.

@ludwigschubert ludwigschubert requested a review from colah May 4, 2018 00:59

coveralls commented May 4, 2018

Pull Request Test Coverage Report for Build 102

  • 29 of 29 (100.0%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.9%) to 61.951%

Files with Coverage Reduction | New Missed Lines | %
lucid/misc/gl/meshutil.py     | 1                | 79.38%

Totals:
  • Change from base Build 96: +0.9%
  • Covered Lines: 889
  • Relevant Lines: 1435

💛 - Coveralls

colah (Contributor) left a comment:

This is fantastic! I'm very excited to have a general solution to this, and the documentation is lovely.

I've left a few small comments. The main one is noting where Python 2.7 tests are failing due to a stray unicode character, but I've also nitpicked your comments a bit. :P Otherwise, good to merge!


When visualizing models we often[0] have to optimize through ReLu activation
functions. Where accessing pre-relu tensors is too hard, we use these
overrides to allow gradient to flow back through the ReLu—even if it didn't

colah (Contributor):

I think the Python 2.7 tests are failing due to a unicode character in this line (the dash?).

ludwigschubert (Contributor, Author):

That and my favorite ellipsis character…
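
For context, a minimal reproduction of the failure mode under discussion (illustrative only, not taken from the PR): Python 2.7 assumes ASCII source encoding unless the file declares otherwise, so a docstring containing a dash or ellipsis character raises a SyntaxError at import time, while Python 3 defaults to UTF-8 and is unaffected.

```python
# -*- coding: utf-8 -*-
# Without the PEP 263 coding declaration above, Python 2.7 refuses to import
# a source file containing non-ASCII bytes, failing with
# "SyntaxError: Non-ASCII character ...". Python 3 defaults to UTF-8, so only
# the 2.7 CI jobs break. The other fix is keeping docstrings ASCII-only
# ("--" instead of a dash character, "..." instead of the ellipsis).

"""Example docstring with the offending ellipsis character: …"""
```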


"""Redirected ReLu Gradient Overrides

When visualizing models we often[0] have to optimize through ReLu activation

colah (Contributor):

Being very nitpicky, I'd probably structure this a bit differently:

"When we visualize ReLU networks, the initial random input we give the model may not cause the neuron we're visualizing to fire at all. For a ReLU neuron, this means that no gradient flow backwards and the visualization never takes off. One solution would be to find the pre-ReLU tensor, but that can be tedious."

"These functions provide a more convenient solution: temporarily override the gradient of ReLUs to allow gradient to flow back through the ReLU -- even if it didn't activate and had a derivative of zero -- allowing the visualization process to get started."

ludwigschubert (Contributor, Author):

ACK
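
To make the suggested wording concrete, here is a simplified sketch of the mechanism in TF1-style code. It is not the PR's actual implementation (which, per the docstring, also covers ReLU6 and further details), the gradient name is invented for the example, and it assumes a minimizing optimizer, i.e. the update to the input moves along the negative gradient.

```python
import tensorflow as tf  # graph-mode TF 1.x, as lucid used at the time

@tf.RegisterGradient("SketchRedirectedRelu")  # hypothetical name
def _sketch_redirected_relu_grad(op, grad):
  """Simplified redirected ReLU gradient (illustration, not the PR's code).

  Where the ReLU was active (x >= 0), return the ordinary gradient.
  Where it was inactive (x < 0), the true gradient is zero; instead, pass
  through only the components that would push x upward, toward the active
  regime.
  """
  x = op.inputs[0]
  relu_grad = tf.where(x < 0., tf.zeros_like(grad), grad)  # the normal ReLU gradient
  # With a minimizing optimizer, grad > 0 at x < 0 would drive x further
  # negative, away from firing, so exactly those components are dropped.
  pushing_lower = tf.logical_and(x < 0., grad > 0.)
  redirected_grad = tf.where(pushing_lower, tf.zeros_like(grad), grad)
  # Only intervene where the plain gradient would be zero; elsewhere keep it.
  return tf.where(x < 0., redirected_grad, relu_grad)

# Hypothetical usage: build the forward pass under an override map so every
# "Relu" op created in this scope picks up the redirected gradient.
graph = tf.get_default_graph()
with graph.gradient_override_map({"Relu": "SketchRedirectedRelu"}):
  x_in = tf.placeholder(tf.float32, [None, 4])
  y = tf.nn.relu(x_in)
grads = tf.gradients(tf.reduce_sum(y), [x_in])
```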

ReLus block the flow of the gradient during backpropagation when their input is
negative. ReLu6s also do so when the input is larger than 6. These overrides
change this behavior to allow gradient pushing the input into a desired regime
between these points.

colah (Contributor):

Clarify that this only happens when the gradient would be zero if we didn't intervene?

ludwigschubert (Contributor, Author):

ACK
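
Following the same pattern, a hypothetical sketch of the ReLU6 case mentioned in the docstring: the plain ReLU6 gradient is zero both below 0 and above 6, and the override only acts in those two regions, dropping whichever components would push the input further outside [0, 6] (again assuming a minimizing optimizer; the name is illustrative, not the PR's).

```python
import tensorflow as tf

@tf.RegisterGradient("SketchRedirectedRelu6")  # hypothetical name
def _sketch_redirected_relu6_grad(op, grad):
  """ReLU6 variant of the sketch above (illustration, not the PR's code)."""
  x = op.inputs[0]
  blocked = tf.logical_or(x < 0., x > 6.)  # where ReLU6's true gradient is zero
  relu6_grad = tf.where(blocked, tf.zeros_like(grad), grad)
  # Below 0, grad > 0 would push x lower; above 6, grad < 0 would push x
  # higher (minimizing optimizer). Drop those components, keep the rest.
  pushing_away = tf.logical_or(tf.logical_and(x < 0., grad > 0.),
                               tf.logical_and(x > 6., grad < 0.))
  redirected_grad = tf.where(pushing_away, tf.zeros_like(grad), grad)
  # As above: only intervene in the regions where the true gradient is zero.
  return tf.where(blocked, redirected_grad, relu6_grad)

# Would be applied with graph.gradient_override_map({"Relu6": "SketchRedirectedRelu6"}).
```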

@ludwigschubert ludwigschubert merged commit b495785 into master May 4, 2018
@ludwigschubert ludwigschubert deleted the feature/redirected-relu-gradients branch May 4, 2018 18:25