[Experiment] Squeeze and Excitation #683

Open
sethtroisi opened this Issue Feb 4, 2019 · 10 comments

sethtroisi commented Feb 4, 2019

Trying out Squeeze and Excitation.

Looks really good.

Code: #673 (Brian, it would be nice if you could take a look)

Inspirations:

Results

Trained six networks: two each of baseline, Squeeze-and-Excitation (SE), and SE + bias.


[screenshot from 2019-02-03 16-05-54]

[screenshot from 2019-03-04 16-55-34]


The TensorFlow code was very slow at inference. It was mentioned that avg_pool is slower than reduce_mean; I investigated, but the slowdown appeared and disappeared, and it's unclear why.
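
For reference, here's a minimal sketch of the SE block (TF 1.x, using tf.reduce_mean for the squeeze step; the layer names and reduction ratio are illustrative, not the exact #673 code):

```python
import tensorflow as tf

def squeeze_excitation(x, channels, ratio=2):
    """x: NHWC tensor with `channels` channels; returns x gated per channel."""
    # Squeeze: global average pool over the spatial dimensions.
    pooled = tf.reduce_mean(x, axis=[1, 2])  # [N, C]
    # Excite: bottleneck FC -> ReLU -> FC -> sigmoid gates in [0, 1].
    hidden = tf.layers.dense(pooled, channels // ratio, activation=tf.nn.relu)
    gates = tf.layers.dense(hidden, channels, activation=tf.sigmoid)
    # Scale: broadcast the per-channel gates back over H and W.
    return x * tf.reshape(gates, [-1, 1, 1, channels])
```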

l1t1 commented Feb 4, 2019

The repository for the paper: https://github.com/hujie-frank/SENet

sethtroisi commented Feb 4, 2019

Some links I used to profile performance:

add_run_metadata from:
https://www.tensorflow.org/guide/graph_viz

chrome tracing from:
https://towardsdatascience.com/howto-profile-tensorflow-1a49fb18073d

chrome trace of SE:
[image: Chrome trace of SE]

timeline_01.txt (was .json, renamed for git)
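
For anyone reproducing this, a minimal sketch of the tracing setup from those links (the tiny graph here is a stand-in for the real model):

```python
import tensorflow as tf
from tensorflow.python.client import timeline

x = tf.random_normal([8, 19, 19, 256])
y = tf.reduce_mean(x, axis=[1, 2])  # stand-in for the real model

# Collect per-op timing via run metadata during a session step.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
with tf.Session() as sess:
    sess.run(y, options=run_options, run_metadata=run_metadata)

# Convert the step stats to Chrome's trace-event format and open the
# resulting file in chrome://tracing.
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline_01.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())
```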

Chicoryn commented Feb 5, 2019

@sethtroisi From a quick survey of the paper, the Squeeze-Excite (SE) approach looks very similar to what @lightvector has been doing with global properties. The main difference seems to be that SE only considers average pooling (though they suggest other aggregations), while the latter suggests that max pooling might also be useful.

See https://github.com/lightvector/GoNN#update-oct-2018 for further reading on his research into the topic. He also discusses a bunch of other topics you might find inspiring for similar enhancements.
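
A rough sketch of what combining the two aggregations could look like (hypothetical; not from either implementation):

```python
import tensorflow as tf

def pooled_features(x):
    """x: NHWC tensor -> [N, 2*C]: per-channel mean and max, concatenated."""
    avg = tf.reduce_mean(x, axis=[1, 2])  # SE's average-pool squeeze
    mx = tf.reduce_max(x, axis=[1, 2])    # the max-pool signal from GoNN
    return tf.concat([avg, mx], axis=1)   # feed this to the excite layers
```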

amj commented Feb 11, 2019

What do you want to do with this issue, now that we're doing it? :)

sethtroisi changed the title from "[Experiment Squeeze and Excitation" to "[Experiment] Squeeze and Excitation" on Feb 11, 2019

sethtroisi commented Feb 11, 2019

I'm planning to include details from v17 in this issue, then we'll close it out.

sethtroisi commented Mar 15, 2019

Cross-eval is showing v17 as much stronger, which I'm going to 80% attribute to this change!!!

[image: cross-eval results]

l1t1 commented Mar 15, 2019

great

lightvector commented Mar 15, 2019

What's the computational cost (if any) of SE versus non-SE, holding number of blocks constant?

sethtroisi commented Mar 15, 2019

+2% on TPU for training, +1% for inference.

On my personal machine I had to pin some operations to the GPU, or it was 2x slower.
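
Roughly like this (the exact op that needed pinning isn't shown here; tf.reduce_mean and the shapes are placeholders):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 19, 19, 256])  # NHWC board features
with tf.device('/gpu:0'):
    # Without the explicit device, the op could be placed on the CPU,
    # and the resulting round trip roughly doubled inference time.
    pooled = tf.reduce_mean(x, axis=[1, 2])
```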

l1t1 commented Mar 15, 2019

They posted another paper; I don't know whether it's related to the game of Go:
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks

Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Andrea Vedaldi
https://arxiv.org/abs/1810.12348
