Description
I would like to train an en->fr translation model with the Transformer on a single GPU, using a 32k wordpiece vocabulary (via SentencePiece), as in GNMT. I have my own dataset of English sentences and the corresponding French sentences.
What's the right way to do this with T2T? Is there a way to pass the paths of the train/dev/test source and target files on the command line, or must I register my own Problem by adding a class to the data_generators folder?
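
In case it helps, here is a minimal sketch of what I imagine such a registered Problem would look like, assuming the `text_problems.Text2TextProblem` API; the class name and file paths are placeholders for my own data:

```python
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry


@registry.register_problem
class TranslateEnfrCustom(text_problems.Text2TextProblem):
  """En->Fr translation from my own parallel corpus."""

  @property
  def approx_vocab_size(self):
    return 2**15  # ~32k subword vocabulary, as in GNMT

  @property
  def is_generate_per_split(self):
    # False: yield one stream of samples and let T2T carve out dev data.
    return False

  def generate_samples(self, data_dir, tmp_dir, dataset_split):
    # "train.en" / "train.fr" are placeholder names for my parallel files.
    with open("train.en") as f_en, open("train.fr") as f_fr:
      for en, fr in zip(f_en, f_fr):
        yield {"inputs": en.strip(), "targets": fr.strip()}
```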
Will the 32k vocab be generated automatically from the source and target corpora I supply?
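
If registering a Problem is indeed the way to go, would an invocation along these lines be correct? This is my guess at the flags, assuming a recent T2T where both `t2t-datagen` and `t2t-trainer` take `--problem`; `$USR_DIR` is the directory containing the module that defines the class above.

```shell
t2t-datagen \
  --t2t_usr_dir=$USR_DIR \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=translate_enfr_custom

t2t-trainer \
  --t2t_usr_dir=$USR_DIR \
  --data_dir=$DATA_DIR \
  --problem=translate_enfr_custom \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --output_dir=$TRAIN_DIR
```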