Add Shake-Shake model #153

rshin · 2017-07-13T23:09:08Z

The model hasn't yet been tested empirically, but it's intended to match "Shake-Shake-Batch" of https://arxiv.org/pdf/1705.07485.pdf on CIFAR-10.

Some potential differences:

- tensor2tensor puts smaller weight decay on larger variables.
- Input augmentation details likely differ slightly from fb.resnet.torch.
- fb.resnet.torch probably doesn't use gradient clipping.
- The input and output stages differ. Shake-Shake uses a 3x3 conv with 32 channels followed by batch norm on the input side, and 8x8 average pooling followed by an FC layer on the output side.

lukaszkaiser

Looks good, thanks!

lukaszkaiser · 2017-07-14T01:26:24Z

Thanks for the code; we can improve on it from here.

rshin added 4 commits July 13, 2017 14:09

Add shake-shake for CIFAR-10

f44f51f

Fix bugs, add more explanation

9f02a51

Reword comments

8287094

Add shakeshake_type hparam: batch, image, equal

2adf3ae
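For readers unfamiliar with the three `shakeshake_type` values, here is a minimal sketch of how the forward mixing coefficient might be sampled in each case. It assumes that `batch` draws one coefficient shared by the whole batch, `image` draws one per example, and `equal` fixes the coefficient at 0.5; that reading follows the paper's naming rather than this PR's code. The helpers `sample_alphas` and `mix_branches` are hypothetical names, not tensor2tensor functions, and the sketch uses plain Python lists instead of tensors. The paper also draws a separate coefficient for the backward pass, which is not shown here.

```python
import random


def sample_alphas(batch_size, shakeshake_type, rng=random):
    """Return one forward mixing coefficient per example (hypothetical helper)."""
    if shakeshake_type == "batch":
        # Assumption: a single random coefficient shared by every example.
        alpha = rng.random()
        return [alpha] * batch_size
    if shakeshake_type == "image":
        # Assumption: an independent random coefficient for each example.
        return [rng.random() for _ in range(batch_size)]
    if shakeshake_type == "equal":
        # Assumption: both branches are always weighted equally.
        return [0.5] * batch_size
    raise ValueError("unknown shakeshake_type: %r" % (shakeshake_type,))


def mix_branches(branch1, branch2, alphas):
    """Combine two residual branches as alpha * b1 + (1 - alpha) * b2.

    branch1 and branch2 are lists of per-example feature lists.
    """
    return [
        [a * x + (1.0 - a) * y for x, y in zip(f1, f2)]
        for a, f1, f2 in zip(alphas, branch1, branch2)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    b1 = [[1.0, 2.0], [3.0, 4.0]]
    b2 = [[0.0, 0.0], [1.0, 1.0]]
    for kind in ("batch", "image", "equal"):
        alphas = sample_alphas(len(b1), kind, rng)
        print(kind, alphas, mix_branches(b1, b2, alphas))
```

With `equal`, the mix reduces to a plain average of the two branches, which gives a convenient baseline when comparing against the randomized variants.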

lukaszkaiser approved these changes Jul 14, 2017

lukaszkaiser merged commit 4617c01 into tensorflow:master Jul 14, 2017
