Add new set network layer type and example model #133
Merged
Added the following layers (and tests):

* `global_pool_1d`: Not much to this, just a simple max/mean reduce across all vectors in a sequence. Useful for getting a single vector representation of a sequence/set. This can also be thought of as a kind of global attention.
* `linear_set_layer`: A simple 1D convolution across all elements in a sequence (similar layers are already used in the repo), but it also includes the ability to parameterise the transformation with a (learned) sequence "context" vector. Currently this is done by simply concatenating the inputs with the context.
* `ravanbakhsh_set_layer`: Layer type used in https://arxiv.org/abs/1611.04500.

These layers are permutation invariant and hence suitable for applications where the inputs are given as sets, but they may be useful for sequence tasks as well. A rough sketch of what each layer computes is given below.
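For concreteness, here is a rough sketch of the three layers (illustrative only, not the actual t2t implementations; function names, signatures, and the masking-free handling are my own simplifications, written in TF 1.x style with `x` of shape `[batch, length, depth]`):

```python
import tensorflow as tf


def global_pool_1d_sketch(x, pooling="max"):
  """Reduce over the length axis to get one vector per example."""
  if pooling == "max":
    return tf.reduce_max(x, axis=1)   # [batch, depth]
  return tf.reduce_mean(x, axis=1)    # [batch, depth]


def linear_set_layer_sketch(x, output_depth, context=None):
  """Apply the same linear map to every element (a kernel-size-1 conv),
  optionally conditioning on a per-sequence context vector by concatenation."""
  if context is not None:
    # Broadcast the [batch, depth] context across the length axis and
    # concatenate it onto every element.
    context = tf.tile(tf.expand_dims(context, 1), [1, tf.shape(x)[1], 1])
    x = tf.concat([x, context], axis=-1)
  return tf.layers.dense(x, output_depth, activation=tf.nn.relu)


def ravanbakhsh_set_layer_sketch(x, output_depth):
  """Permutation-equivariant layer in the spirit of
  https://arxiv.org/abs/1611.04500: subtract the set-wide max-pool from
  every element before applying a shared linear map."""
  pooled = tf.expand_dims(tf.reduce_max(x, axis=1), 1)  # [batch, 1, depth]
  return tf.layers.dense(x - pooled, output_depth, activation=tf.nn.relu)
```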
Also added an example model, `transformer_alt`, which replaces the self-attention layers in `transformer` with two different kinds of modules composed of the layers described above. I have no idea how well it performs (although similar previous architectures I tested seemed to do only slightly worse than the full transformer). A hypothetical sketch of how one such module could be composed is shown below.

N.B.: I have not extensively tested these layers in t2t, so it's entirely possible that they're not perfectly functional (particularly w.r.t. masking).
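As a purely hypothetical illustration (I have not checked this against the actual `transformer_alt` code), an attention-replacement module could be composed from the sketch functions above roughly like this:

```python
# Hypothetical composition of the set layers as a self-attention substitute
# (illustrative only; not the actual transformer_alt code).
def set_module_sketch(x, hidden_depth, output_depth):
  """Summarise the sequence with a global pool, then let every element
  see that summary through a context-conditioned linear set layer."""
  context = global_pool_1d_sketch(x, pooling="max")              # [batch, depth]
  h = linear_set_layer_sketch(x, hidden_depth, context=context)  # [batch, length, hidden_depth]
  return linear_set_layer_sketch(h, output_depth)                # [batch, length, output_depth]
```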
Also changed two lines in the tests to stop NumPy warnings.