# Facebook FAIR’s transformer model

The pretrained model used in this notebook was developed by Facebook AI Research (FAIR) for its submission to WMT19, the 2019 Conference on Machine Translation. FAIR competed in four translation directions with this model (English-German and English-Russian, each in both directions) and ranked first in all of them. The model is very large (the ensemble download is about 12 GB), and we loaded it through PyTorch Hub.

# Environment setup

In [None]:
import torch
import nltk

In [None]:
# !pip install hydra-core omegaconf
# !pip install fastBPE regex requests sacremoses subword_nmt

In [None]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

# Load the model

As mentioned above, the model is very large (we use an ensemble of models), so loading it takes a long time. First, we list the models that pytorch/fairseq provides:

In [None]:
torch.hub.list('pytorch/fairseq')

Using cache found in /root/.cache/torch/hub/pytorch_fairseq_master


['bart.base',
 'bart.large',
 'bart.large.cnn',
 'bart.large.mnli',
 'bart.large.xsum',
 'bpe',
 'camembert',
 'camembert-base',
 'camembert-base-ccnet',
 'camembert-base-ccnet-4gb',
 'camembert-base-oscar-4gb',
 'camembert-base-wikipedia-4gb',
 'camembert-large',
 'camembert.v0',
 'conv.stories',
 'conv.stories.pretrained',
 'conv.wmt14.en-de',
 'conv.wmt14.en-fr',
 'conv.wmt17.en-de',
 'data.stories',
 'dynamicconv.glu.wmt14.en-fr',
 'dynamicconv.glu.wmt16.en-de',
 'dynamicconv.glu.wmt17.en-de',
 'dynamicconv.glu.wmt17.zh-en',
 'dynamicconv.no_glu.iwslt14.de-en',
 'dynamicconv.no_glu.wmt16.en-de',
 'gottbert-base',
 'lightconv.glu.wmt14.en-fr',
 'lightconv.glu.wmt16.en-de',
 'lightconv.glu.wmt17.en-de',
 'lightconv.glu.wmt17.zh-en',
 'lightconv.no_glu.iwslt14.de-en',
 'lightconv.no_glu.wmt16.en-de',
 'roberta.base',
 'roberta.large',
 'roberta.large.mnli',
 'roberta.large.wsc',
 'tokenizer',
 'transformer.wmt14.en-fr',
 'transformer.wmt16.en-de',
 'transformer.wmt18.en-de',
 'transfo

## Loading the actual ensemble of models

We used the following parameters:

*   model name - transformer.wmt19.ru-en (the transformer model for the WMT'19 Russian-English task)
*   checkpoint_file - the four checkpoints of the ensemble, separated by colons. We also tried a single model, and the result was slightly worse
*   tokenizer - 'moses'. We also tried the NLTK tokenizer but kept 'moses': the NLTK tokenizer needs an extra postprocessing step and still gave a lower BLEU score
*   bpe - 'fastbpe'. According to the official fairseq GitHub page, this is the only option for the Russian-English transformer model
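
The colon-separated `checkpoint_file` value can also be built programmatically, which makes it easier to try ensembles of different sizes. This is a minimal sketch: `num_models` and `checkpoint_names` are names introduced here for illustration, and it assumes that fairseq splits the string on the platform path separator (`:` on Linux and macOS).

```python
import os

# Number of ensemble members; 4 matches the checkpoints loaded in this notebook.
num_models = 4
checkpoint_names = [f'model{i}.pt' for i in range(1, num_models + 1)]

# Assumption: fairseq splits this string on os.pathsep (':' on Linux/macOS).
checkpoint_file = os.pathsep.join(checkpoint_names)
print(checkpoint_file)
```

The resulting string can be passed as the `checkpoint_file` argument of `torch.hub.load`.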



In [None]:
ru2en = torch.hub.load(
    'pytorch/fairseq', 
    # 'transformer.wmt19.ru-en.single_model',
    'transformer.wmt19.ru-en', 
    checkpoint_file='model1.pt:model2.pt:model3.pt:model4.pt', 
    tokenizer='moses', 
    bpe='fastbpe'
)
ru2en.eval()

Using cache found in /root/.cache/torch/hub/pytorch_fairseq_master
100%|██████████| 12161762203/12161762203 [31:18<00:00, 6474057.85B/s]


GeneratorHubInterface(
  (models): ModuleList(
    (0): TransformerModel(
      (encoder): TransformerEncoder(
        (dropout_module): FairseqDropout()
        (embed_tokens): Embedding(31232, 1024, padding_idx=1)
        (embed_positions): SinusoidalPositionalEmbedding()
        (layers): ModuleList(
          (0): TransformerEncoderLayer(
            (self_attn): MultiheadAttention(
              (dropout_module): FairseqDropout()
              (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (dropout_module): FairseqDropout()
            (activation_dropout_module): FairseqDropout()
            (fc1): Linear(in_feat

## Moving the model to the GPU

Translating on the CPU would be too slow, so we move the model to the GPU.

In [None]:
ru2en.cuda()


# Prediction

In [None]:
# Just for checking purposes
ru2en.translate('Всем привет!')

'Hello everyone!'

In [None]:
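# Load the evaluation set (one Russian sentence per line)
file_path = 'eval-ru-100.txt'
out_file_path = 'answer.txt'

# Read as UTF-8 explicitly and strip only the trailing newline of each line
with open(file_path, 'r', encoding='utf-8') as fd:
    ru_lines = [line.rstrip('\n') for line in fd]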

In [None]:
# Examples from evaluation set
print(ru_lines[:5])

['26. Вопрос о лесах необходимо вывести на более высокий уровень в рамках целей устойчивого развития, в том числе посредством включения в такие цели убедительных и четких целевых и рабочих показателей по лесам.', 'В рамках экологической экспертизы определены пять вариантов строительства и эксплуатации замещающей электростанции, которая восстановит мощность энергораспределительной сети Управления по состоянию до стихийного бедствия.', 'В ходе рассмотрения данного пункта повестки дня Рабочая группа будет кратко проинформирована Секретариатом о работе УНП ООН по содействию ратификации и осуществлению Протокола об огнестрельном оружии в рамках Глобальной программы по огнестрельному оружию.', 'В последние месяцы сирийское правительство позволило террористам использовать территорию своей страны в качестве базы, действуя с которой они устанавливают взрывные устройства на обочинах дорог, наносят ракетные удары по Израилю и обстреливают подразделения Армии обороны Израиля, дислоцированные на те

In [None]:
# Translate the whole evaluation set in a single call
en_lines = ru2en.translate(ru_lines)
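
For larger inputs it can help to translate in fixed-size chunks, for example to report progress or to bound the work done per call. The sketch below is not part of the original notebook: `translate_in_batches` and `fake_translate` are hypothetical helpers, and the only assumption about the model is that `ru2en.translate` accepts a list of strings and returns a list of the same length, as in the cell above. Whether chunking is actually needed depends on GPU memory and on how fairseq batches internally.

```python
from typing import Callable, List


def translate_in_batches(translate: Callable[[List[str]], List[str]],
                         lines: List[str],
                         batch_size: int = 16) -> List[str]:
    """Translate `lines` chunk by chunk, preserving the input order."""
    translated: List[str] = []
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        translated.extend(translate(batch))
    return translated


# Stand-in translator so the sketch runs without the model;
# in the notebook one would pass ru2en.translate instead.
def fake_translate(batch: List[str]) -> List[str]:
    return [sentence.upper() for sentence in batch]


demo = translate_in_batches(fake_translate, ['a', 'b', 'c', 'd', 'e'], batch_size=2)
print(demo)
```

With the real model, the call would look like `en_lines = translate_in_batches(ru2en.translate, ru_lines)`.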

In [None]:
# Examples of the predicted translations
print(en_lines[:5])

['26. Forests need to be taken to a higher level within the framework of the sustainable development goals, including through the inclusion of strong and clear targets and performance targets on forests.', "The environmental impact assessment identified five options for the construction and operation of a replacement power plant that would restore the Authority's power distribution network to pre-disaster capacity.", 'During its consideration of this agenda item, the Working Group will be briefed by the Secretariat on the work of UNODC in promoting the ratification and implementation of the Firearms Protocol through the Global Firearms Programme.', 'In recent months, the Syrian government has allowed terrorists to use its territory as a base from which to plant roadside bombs, launch rocket attacks on Israel and fire on Israel Defense Forces units stationed inside the country.', 'According to the author, the Immigration Service admits that these events did take place, even though the S

In [None]:
# Save the predicted translations to answer.txt, one translation per line
with open(out_file_path, 'w', encoding='utf-8') as fd:
    fd.write('\n'.join(en_lines))
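
Because the answer file is line-aligned with the source file, a translation that contains an embedded newline would shift every following line. The following sketch shows one way to check the alignment after writing. `count_written_lines` is a hypothetical helper, and the toy data and temporary file stand in for `ru_lines`, `en_lines` and `answer.txt`.

```python
import os
import tempfile


def count_written_lines(path):
    """Count the lines of a file written by joining lines with a newline."""
    with open(path, 'r', encoding='utf-8') as fd:
        return len(fd.read().split('\n'))


# Toy stand-ins for ru_lines and en_lines.
source = ['первая строка', 'вторая строка']
translations = ['first line', 'second line']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'answer.txt')
    with open(path, 'w', encoding='utf-8') as fd:
        fd.write('\n'.join(translations))
    # True when every source line has exactly one output line.
    aligned = count_written_lines(path) == len(source)

print(aligned)
```

Note that this check assumes a non-empty list of translations: an empty file still counts as one (empty) line.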

# Results


We tried two setups:

*   Single transformer model - 49.66 BLEU
*   Ensemble of 4 transformer models - 51.00 BLEU
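
The notebook does not show how the BLEU scores were computed. For a rough local check against reference translations, corpus-level BLEU can be sketched in plain Python as below. `simple_corpus_bleu` is a hypothetical helper that assumes one reference per sentence and whitespace tokenization, so its numbers will generally differ from those of an official scorer that applies its own tokenization.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_corpus_bleu(references, hypotheses, max_n=4):
    """Corpus BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tok, hyp_tok = ref.split(), hyp.split()
        ref_len += len(ref_tok)
        hyp_len += len(hyp_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tok, n)
            ref_counts = ngram_counts(ref_tok, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Toy example; with real data one would pass the reference lines and en_lines.
score = simple_corpus_bleu(['the cat sat on the mat'], ['the cat sat on a mat'])
print(round(score, 2))
```

To score the predictions from this notebook, the references would be read from a line-aligned English file (not included here) and passed together with `en_lines`.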

