I was looking at the bytenet code and noticed that the implementation differs from the "Neural Machine Translation in Linear Time" paper.
From my understanding of the paper, each atrous (dilated) conv layer is wrapped in its own ResnetBlock, e.g.:
single repeat = [Resnet(conv layer, dilated=1), Resnet(conv layer, dilated=2), Resnet(conv layer, dilated=4)]
while in the current implementation it looks like a single ResnetBlock is composed of several dilated conv layers, e.g.:
single repeat = Resnet([(conv layer, dilated=1), (conv layer, dilated=4), (conv layer, dilated=8)])
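To make sure I'm describing the difference correctly, here is a minimal PyTorch-style sketch of the two arrangements as I understand them. The class names, channel sizes, kernel sizes, and normalization choice are my own placeholders for illustration, not taken from this repo or from the paper:

```python
import torch
import torch.nn as nn

class ResidualDilatedConv(nn.Module):
    """Paper reading: one dilated conv wrapped in its own residual block."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.norm = nn.BatchNorm1d(channels)

    def forward(self, x):
        # skip connection around a single dilated conv
        return x + torch.relu(self.norm(self.conv(x)))

class ResidualStack(nn.Module):
    """Implementation reading: several dilated convs inside one block,
    with a single skip connection around the whole stack."""
    def __init__(self, channels, dilations):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=d, dilation=d)
            for d in dilations
        ])

    def forward(self, x):
        out = x
        for conv in self.convs:
            out = torch.relu(conv(out))
        # one skip connection spanning all the dilated convs
        return x + out

# Paper: single repeat = [Resnet(dilated=1), Resnet(dilated=2), Resnet(dilated=4)]
paper_repeat = nn.Sequential(
    ResidualDilatedConv(64, dilation=1),
    ResidualDilatedConv(64, dilation=2),
    ResidualDilatedConv(64, dilation=4),
)

# Implementation (as I read it): single repeat = Resnet([dilated=1, 4, 8])
impl_repeat = ResidualStack(64, dilations=[1, 4, 8])

x = torch.randn(2, 64, 100)  # (batch, channels, time)
print(paper_repeat(x).shape, impl_repeat(x).shape)  # both preserve the shape
```

The main behavioral difference I'm asking about is the number of skip connections per repeat: one around every dilated conv (paper) versus one around the whole stack (current code).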
Is this difference from the paper intentional?