This repository was archived by the owner on Jul 7, 2023. It is now read-only.

@rshin rshin commented Jul 13, 2017

The model has not been tested empirically yet, but it is intended to match the "Shake-Shake-Batch" configuration from https://arxiv.org/pdf/1705.07485.pdf on CIFAR-10.

Some potential differences:

  • tensor2tensor puts smaller weight decay on larger variables
  • Input augmentation details likely differ slightly from fb.resnet.torch
  • fb.resnet.torch probably doesn't use gradient clipping
  • The input and output stages differ. Shake-Shake uses a 3x3 conv with 32 channels followed by batch norm on the input side, and 8x8 average pooling followed by a fully connected layer on the output side.
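
For reference, here is a minimal sketch of what "Shake-Shake-Batch" means as I read the paper: the two residual branches are mixed with a random coefficient on the forward pass, an independent random coefficient on the backward pass, and a fixed 0.5 at test time, with one coefficient shared by the whole mini-batch. This is plain Python on nested lists and is not the code in this PR; the function names `shake_shake_batch` and `shake_shake_batch_grads` are hypothetical, and the uniform [0, 1) sampling is an assumption based on the paper.

```python
import random


def shake_shake_batch(branch1, branch2, training, rng=random):
    """Mix two branch outputs with one coefficient for the whole batch.

    branch1, branch2: lists of per-image feature lists with equal shapes.
    Returns (mixed, alpha). Hypothetical helper, not tensor2tensor code.
    """
    # Batch level: a single alpha is drawn and reused for every image.
    # At test time the branches are simply averaged.
    alpha = rng.random() if training else 0.5
    mixed = [
        [alpha * a + (1.0 - alpha) * b for a, b in zip(img1, img2)]
        for img1, img2 in zip(branch1, branch2)
    ]
    return mixed, alpha


def shake_shake_batch_grads(grad_out, training, rng=random):
    """Split the incoming gradient between the two branches.

    The second "Shake" draws a fresh beta, independent of the forward
    alpha, so the backward weights differ from the forward weights.
    Returns (grad_branch1, grad_branch2, beta).
    """
    beta = rng.random() if training else 0.5
    grad1 = [[beta * g for g in img] for img in grad_out]
    grad2 = [[(1.0 - beta) * g for g in img] for img in grad_out]
    return grad1, grad2, beta
```

In a real framework the backward rule is usually wired in through a custom gradient or a stop-gradient trick rather than a separate function; the split above only illustrates which coefficient applies where.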

@lukaszkaiser lukaszkaiser left a comment

Looks good, thanks!

@lukaszkaiser lukaszkaiser merged commit 4617c01 into tensorflow:master Jul 14, 2017
@lukaszkaiser

Thanks for the code; we can improve on it from here.
