This is an experiment in implementing the transformer architecture in TensorFlow 2. The goal is to create reproducible components that can then be reused in other projects, specifically to explore applications in time series, audio processing and image segmentation.
Much of the code is adapted from Google's TensorFlow tutorial on the transformer model for language understanding: https://www.tensorflow.org/tutorials/text/transformer#scaled_dot_product_attention
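The core building block referenced by that tutorial link is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. Below is a minimal TensorFlow 2 sketch of that component, written along the lines of the tutorial rather than copied from this project. The function name, the tensor shapes in the usage lines, and the mask convention (a float tensor where 1 marks a position to hide) are assumptions made for illustration.

```python
import tensorflow as tf


def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch of softmax(Q K^T / sqrt(d_k)) V.

    q:    (..., seq_len_q, depth)
    k:    (..., seq_len_k, depth)
    v:    (..., seq_len_k, depth_v)
    mask: optional float32 tensor broadcastable to (..., seq_len_q, seq_len_k);
          assumed convention: 1.0 = masked out, 0.0 = visible.
    Returns (output, attention_weights).
    """
    # Raw attention scores: (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(d_k) so the softmax does not saturate for large depths
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)

    # Push masked positions towards -inf so they get ~0 weight after softmax
    if mask is not None:
        scaled_logits += mask * -1e9

    # Normalise over the key axis so each query's weights sum to 1
    attention_weights = tf.nn.softmax(scaled_logits, axis=-1)

    # Weighted sum of values: (..., seq_len_q, depth_v)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights


# Illustrative usage: batch of 2, 4 queries, 6 keys/values
q = tf.random.normal((2, 4, 8))
k = tf.random.normal((2, 6, 8))
v = tf.random.normal((2, 6, 16))
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (2, 4, 16)
print(w.shape)    # (2, 4, 6)
```

Keeping the function free of trainable weights makes it easy to reuse unchanged inside a multi-head attention layer, whether the sequences are tokens, time steps, audio frames or flattened image patches.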