Research: Feature Visualization Objectives #116
Please be respectful of the fact that this is unpublished research and that people involved in this are putting themselves in an unusually vulnerable position. Please treat it as you would unpublished work described in a seminar or by a colleague.
Feature visualization studies neural network behavior by optimizing an input to trigger a particular behavior.
For example, to visualize a neuron, we create an input that strongly causes the neuron to fire. We can also visualize a combination of neurons by maximizing how strongly their sum fires. These visualizations have a nice geometric interpretation: we are visualizing a direction in the vector space of activations, where each neuron is a basis vector.
We normally do this by maximizing activation along that direction, that is, maximizing the dot product of our activation vector with the desired direction vector. However...
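A toy NumPy sketch of this geometric view. It uses a hypothetical one-layer "network" (`softplus(W @ x)`, chosen so gradients never vanish) instead of a real model. A neuron objective is the dot product with a one-hot basis vector, a combination of neurons is the dot product with a sum of basis vectors, and visualization is gradient ascent on the input. The names `activations`, `direction_objective`, `visualize`, the unit-ball constraint and the step sizes are all illustrative assumptions, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
N_NEURONS, N_INPUT = 5, 8
# Hypothetical toy layer standing in for a real network: softplus(W @ x).
W = rng.normal(size=(N_NEURONS, N_INPUT))


def activations(x):
    """Activation vector of the toy layer (softplus keeps gradients nonzero)."""
    return np.logaddexp(0.0, W @ x)


def direction_objective(x, direction):
    """Dot product of the activation vector with a direction in activation space."""
    return float(activations(x) @ direction)


def visualize(direction, steps=200, lr=0.1):
    """Projected gradient ascent on the input, constrained to the unit ball."""
    x = 0.01 * rng.normal(size=N_INPUT)
    for _ in range(steps):
        sig = 1.0 / (1.0 + np.exp(-(W @ x)))   # derivative of softplus
        grad = W.T @ (direction * sig)          # gradient of direction . softplus(W x)
        x = x + lr * grad
        x = x / max(1.0, np.linalg.norm(x))     # project back onto the unit ball
    return x


# A single neuron is a one-hot basis vector; a combination of neurons is a sum of them.
neuron_3 = np.eye(N_NEURONS)[3]
neurons_1_and_3 = np.eye(N_NEURONS)[1] + np.eye(N_NEURONS)[3]

x_star = visualize(neuron_3)
print(direction_objective(np.zeros(N_INPUT), neuron_3))  # softplus(0) = log 2
print(direction_objective(x_star, neuron_3))
```

Without some constraint on the input (here the unit ball; in practice a parameterization plus regularizers), the dot-product objective can usually be increased without bound, so the constraint is part of the design rather than a detail.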
Maximizing dot product may not be the right objective
There are a number of reasons why just maximizing a direction in this way may not actually be the thing we want, at least in some cases:
(An additional reason we might want to do something different: even when normal feature visualization works perfectly, it doesn't differentiate between things that strongly help activate the direction and things that only slightly do.)
Alternate Visualization Objectives
There are many other visualization objectives we could try. (Note, there might not be a single correct one -- they may all show us different things.)
Are we sure there's a problem?
The main things pointing towards there being an issue are:
These could be explained in different ways, but generally suggest we should think hard both about the directions we're visualizing and the objectives we're using to visualize them.
(A final, more fundamental problem could be that directions aren't the right thing to try to understand at all. None of these observations really point to that at this point.)
Dot x Cosine Similarity
See, for example, this notebook on caricatures.
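A NumPy sketch of one plausible form of this objective, assuming it multiplies the dot product by the cosine similarity raised to a power `cossim_pow`. The exact formula in the caricature notebook may differ; the function name, the power parameter and the clipping of negative cosines to zero are our assumptions. The example shows why it helps: two activation vectors with the same dot product score very differently once we reward pointing along the direction rather than just having a large projection onto it.

```python
import numpy as np


def dot_x_cossim(acts, direction, cossim_pow=1.0, eps=1e-8):
    """Dot product scaled by cosine similarity to a power (assumed form, hypothetical name)."""
    dot = float(acts @ direction)
    cos = dot / (np.linalg.norm(acts) * np.linalg.norm(direction) + eps)
    # Assumption: clip negative cosines to 0 so fractional powers stay real.
    return dot * max(cos, 0.0) ** cossim_pow


d = np.array([1.0, 0.0, 0.0])
aligned = np.array([2.0, 0.0, 0.0])    # points along d
off_axis = np.array([2.0, 5.0, 5.0])   # same dot product with d, but mostly points elsewhere

for name, a in [("aligned", aligned), ("off_axis", off_axis)]:
    # Plain dot product is 2.0 for both; the cosine-weighted objective separates them.
    print(name, float(a @ d), dot_x_cossim(a, d))
```

Setting `cossim_pow=0` recovers the plain dot-product objective, so the power acts as a knob between "activate the direction strongly" and "activate only the direction".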
Penalizing activations at previous layer
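```python
import lucid.modelzoo.vision_models as models
from lucid.optvis import objectives, param, render

model = models.InceptionV1()  # assumption: the model isn't named here; mixed4a/mixed4d are InceptionV1 layers
model.load_graphdef()

obj = objectives.neuron("mixed4d", 504)
obj += -1e-4 * objectives.L1("mixed4a")  # penalize earlier layer
param_f = lambda: param.image(160)
_ = render.render_vis(model, obj, param_f)
```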
Comment from Yasaman Bahri (@yasamanb): maybe the reason we see poly-semantic neurons is that the task isn't hard enough to get neurons in later layers to learn the "right" abstractions. In early layers, when you're closer to the data, perhaps it is easier. (comment paraphrase by Chris, may not be a super accurate interpretation of Yasaman's remark.)
Hey there, we are looking at these objectives from a new perspective, tying them to uncertainty estimation within a deep neural network. If an activation vector is far from all previously seen activation vectors, then it's an outlier. If an activation vector is equally similar to the centroids of two classes, then it's a point close to the boundary between those classes. Early results show that this method differs from the prediction probability at the output of a softmax and works better for some of the deeper/more complex networks I have experimented with.
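A minimal NumPy sketch of the two signals described in the comment above, assuming Euclidean nearest-neighbour distance for "far from all seen activation vectors" and cosine similarity to class centroids for "equally similar to two classes". The commenter's actual distance measures and method aren't given here; `outlier_score`, `boundary_margin` and the 2-D toy clusters are hypothetical.

```python
import numpy as np


def outlier_score(act, seen_acts):
    """Distance from act to its nearest previously seen activation vector."""
    return float(np.min(np.linalg.norm(seen_acts - act, axis=1)))


def boundary_margin(act, centroid_a, centroid_b):
    """|cos(act, A) - cos(act, B)|; near 0 means act is equally similar to both classes."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return abs(cos(act, centroid_a) - cos(act, centroid_b))


rng = np.random.default_rng(0)
# Toy "seen" activations for two classes (2-D only so the geometry is easy to picture).
class_a = rng.normal(loc=[3.0, 0.0], scale=0.3, size=(50, 2))
class_b = rng.normal(loc=[0.0, 3.0], scale=0.3, size=(50, 2))
seen = np.vstack([class_a, class_b])
ca, cb = class_a.mean(axis=0), class_b.mean(axis=0)

typical = np.array([3.0, 0.1])      # looks like class A
far_away = np.array([-10.0, -10.0]) # unlike anything seen
between = np.array([1.5, 1.5])      # halfway between the classes

print(outlier_score(typical, seen), outlier_score(far_away, seen))
print(boundary_margin(typical, ca, cb), boundary_margin(between, ca, cb))
```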
This sounds super neat! What's the status on this project (considering it's been over half a year since this issue was last discussed)? If it's still in the works, is the remaining work mainly about exploring different visualization objectives, or something else?
@colah Maybe I'm oversimplifying, but at each level of abstraction, i.e. each layer further from the input, we would want a more generalized representation of the data: a many-to-one correspondence between input configurations and abstract neuron activations. If we're classifying objects, we're actually stipulating this intentionally.
The real question is how "quickly" learning algorithms can separate classes. There's an obvious linear algebra angle here, which will almost certainly relate to the rank and condition numbers of successive weight matrices (because there are biases too, I guess these would be affine transformations rather than "linear").
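A NumPy sketch of the linear-algebra quantities mentioned above, for a made-up two-layer stack (the sizes and the low-rank construction are illustrative assumptions, and nonlinearities are ignored). It reports the rank and condition number of each weight matrix, then shows the many-to-one behaviour directly: two inputs that differ by a null-space direction of a rank-deficient layer map to the same affine output `W2 @ x + b2`. The bias shifts every output equally, so it doesn't affect rank or conditioning.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical layer stack: 32 -> 64 (generically full rank), then 64 -> 16 (rank 4 by construction).
W1 = rng.normal(size=(64, 32))
W2 = rng.normal(size=(16, 4)) @ rng.normal(size=(4, 64))

for name, W in [("W1", W1), ("W2", W2)]:
    # Condition number = largest / smallest singular value; huge or inf for rank-deficient W.
    print(name, W.shape, np.linalg.matrix_rank(W), np.linalg.cond(W))

# Many-to-one: inputs differing by a null-space direction of W2 give identical
# affine outputs W2 @ x + b2, whatever the bias b2 is.
b2 = rng.normal(size=16)
_, _, vt = np.linalg.svd(W2)
null_dir = vt[-1]  # right-singular vector whose singular value is (numerically) zero
x = rng.normal(size=64)
same = np.allclose(W2 @ x + b2, W2 @ (x + 10.0 * null_dir) + b2)
print(same)
```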