Add redirected Relu grad and test #56
Conversation
Pull Request Test Coverage Report for Build 102
💛 - Coveralls
This is fantastic! I'm very excited to have a general solution to this, and the documentation is lovely.
I've left a few small comments. The main one is noting where Python 2.7 tests are failing due to a stray unicode character, but I've also nitpicked your comments a bit. :P Otherwise, good to merge!
lucid/misc/redirected_relu_grad.py
When visualizing models we often[0] have to optimize through ReLu activation
functions. Where accessing pre-relu tensors is too hard, we use these
overrides to allow gradient to flow back through the ReLu—even if it didn't
I think the Python 2.7 tests are failing due to a unicode character in this line (the dash?).
That and my favorite ellipsis character…
lucid/misc/redirected_relu_grad.py
"""Redirected ReLu Gradient Overrides | ||
|
||
When visualizing models we often[0] have to optimize through ReLu activation |
Being very nitpicky, I'd probably structure this a bit differently:
"When we visualize ReLU networks, the initial random input we give the model may not cause the neuron we're visualizing to fire at all. For a ReLU neuron, this means that no gradient flow backwards and the visualization never takes off. One solution would be to find the pre-ReLU tensor, but that can be tedious."
"These functions provide a more convenient solution: temporarily override the gradient of ReLUs to allow gradient to flow back through the ReLU -- even if it didn't activate and had a derivative of zero -- allowing the visualization process to get started."
ACK
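For readers following along, here is a minimal sketch of what a redirected ReLU gradient override along these lines could look like. This is an illustrative simplification rather than the code in this PR: the function name, the whole-tensor magnitude check, and the sign convention (the training op is assumed to descend the incoming gradient, as when minimizing the negative of a visualization objective) are all assumptions.

```python
import tensorflow as tf

def redirected_relu_grad_sketch(op, grad):
    """Illustrative redirected gradient for a "Relu" op (TensorFlow 1.x)."""
    x = op.inputs[0]
    # Ordinary ReLU gradient: blocked wherever the input was negative.
    relu_grad = tf.where(x < 0., tf.zeros_like(grad), grad)
    # Redirected gradient: let gradient through inactive units, but drop the
    # part that would push an already-negative input even lower (under the
    # assumed convention that the optimizer descends this gradient).
    pushing_lower = tf.logical_and(x < 0., grad > 0.)
    redirected_grad = tf.where(pushing_lower, tf.zeros_like(grad), grad)
    # Only intervene when the true gradient is blocked everywhere; otherwise
    # behave exactly like a normal ReLU.
    relu_grad_mag = tf.reduce_sum(tf.abs(relu_grad))
    return tf.cond(relu_grad_mag > 0.,
                   lambda: relu_grad,
                   lambda: redirected_grad)
```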
ReLus block the flow of the gradient during backpropagation when their input is
negative. ReLu6s also do so when the input is larger than 6. These overrides
change this behavior to allow gradient pushing the input into a desired regime
between these points.
Clarify that this only happens when the gradient would be zero if we didn't intervene?
ACK
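As a usage note (again a sketch, with hypothetical names rather than this PR's API): in TensorFlow 1.x such an override is typically registered with tf.RegisterGradient and activated through a graph's gradient_override_map, so the forward pass is untouched and only the backward pass changes.

```python
import tensorflow as tf

# Register the sketch above under an illustrative name.
tf.RegisterGradient("RedirectedReluSketch")(redirected_relu_grad_sketch)

graph = tf.Graph()
with graph.as_default():
    # Ops created inside this context get the alternative gradient attached;
    # the forward computation of tf.nn.relu is unchanged.
    with graph.gradient_override_map({"Relu": "RedirectedReluSketch"}):
        x = tf.placeholder(tf.float32, [None, 8])
        y = tf.nn.relu(x)
    # Gradients taken later still use the override, because it is recorded
    # on the op at construction time.
    dy_dx = tf.gradients(tf.reduce_sum(y), x)[0]
```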
Adds relu gradient override, but does not yet make it the default for render_vis. Also adds test to gradient_override.
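The test itself isn't shown in this conversation, but a hypothetical version of such a test might check that, with the override in place, gradient reaches an input that a plain ReLU would block entirely. Names and values here are illustrative and build on the sketches above, not on the PR's actual test.

```python
import numpy as np
import tensorflow as tf

def test_redirected_relu_grad_sketch():
    graph = tf.Graph()
    with graph.as_default():
        with graph.gradient_override_map({"Relu": "RedirectedReluSketch"}):
            x = tf.constant([[-1.0, -2.0]])  # ReLU is inactive everywhere
            y = tf.nn.relu(x)
        # Descending this loss would push y (and hence x) upward, so the
        # redirected gradient should be allowed through.
        loss = -tf.reduce_sum(y)
        grad = tf.gradients(loss, x)[0]
        with tf.Session(graph=graph) as sess:
            grad_value = sess.run(grad)
    # A plain ReLU gradient would be all zeros here; the override's should not be.
    assert np.any(grad_value != 0.0)
```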