Transformer from "Attention is all you need" for language translation

Project description

My own implementation of the Transformer proposed in the paper Attention Is All You Need. I applied the Transformer to translate sentences from Portuguese to English.

Implementation

Jupyter Notebook

Requirements

A full list of the requirements is given here. The Python and deep learning library versions are:

Python 3.5.5
TensorFlow 1.15.0

Some comments on the importance of the Transformer

Challenges of the LSTM/RNN encoder-decoder model with attention

The LSTM/RNN encoder-decoder architecture with attention has the following challenges:

The sequential nature of RNNs/LSTMs results in lack of parallelization.
Increased computational complexity of computing a separate context vector for every step of decoder.
The difficulty of learning long-range dependencies in the network. This does not refer to the dependency between the encoder and decoder sequences, since this has been handled by the attention mechanism in encoder-decoder sequences. It is related with the fact that the attention model ignores the attention information inside the source sentence and the target sentence.

Transformer

Transformer is a novel network architecture that aims to solve sequence-to-sequence tasks, while handling long-range dependencies with ease. It is an architecture for transforming one sequence into another one with the help of two parts (Encoder and Decoder), but it differs from the previously described/existing sequence-to-sequence models because it is based solely on attention mechanisms without any use of LSTMs or RNNs.

The novel idea of the Transformer is to extend the attention mechanism to the processing of input and output sentences themselves as well. In addition, instead of going from left to right using RNNs and feeding the encoder one word at a time, the Transformer allows the encoder and decoder to see the entire input sequence all at once, directly modeling these dependencies using self-attention.

Quoting from the paper Attention Is All You Need, "The Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution". Here, “transduction” means the conversion of input sequences into output sequences.

Advantages of the Transformer

The Transformer has the following advantages over the sequence to sequence models with attention:

Parallelization: The transformer allows the encoder and decoder to see the entire input sequence all at once, directly modeling these dependencies using self-attention.
Long-range dependency: Self-attention allows to handle dependencies between input or output tokens themselves.

Disadvantages of the Transformer

If the input does have a temporal/spatial relationship, like text, some positional encoding must be added or the model will effectively see a bag of words.

Name		Name	Last commit message	Last commit date
Latest commit History 73 Commits
images		images
README.md		README.md
Transformer_language_translation.ipynb		Transformer_language_translation.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

images

images

README.md

README.md

Transformer_language_translation.ipynb

Transformer_language_translation.ipynb

requirements.txt

requirements.txt

Repository files navigation

Transformer from "Attention is all you need" for language translation

Project description

Implementation

Requirements

Some comments on the importance of the Transformer

Challenges of the LSTM/RNN encoder-decoder model with attention

Transformer

Advantages of the Transformer

Disadvantages of the Transformer

References

About

Releases

Packages

Languages

vgkortsas/Transformer

Folders and files

Latest commit

History

Repository files navigation

Transformer from "Attention is all you need" for language translation

Project description

Implementation

Requirements

Some comments on the importance of the Transformer

Challenges of the LSTM/RNN encoder-decoder model with attention

Transformer

Advantages of the Transformer

Disadvantages of the Transformer

References

About

Resources

Stars

Watchers

Forks

Languages