Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets. It has binaries to train the models and to download and prepare the data for you. T2T is modular and extensible and can be used in notebooks for prototyping your own models or running existing ones on your data. It is actively used and maintained by researchers and engineers within the Google Brain team and was used to develop state-of-the-art models for translation (see Attention Is All You Need), summarization, image generation and other tasks. You can read more about T2T in the Google Research Blog post introducing it.
We're eager to collaborate with you on extending T2T, so please feel free to open an issue on GitHub or send along a pull request to add your dataset or model. See our contribution doc for details and our open issues. You can chat with us and other users on Gitter, and you can join our Google Group to keep up with T2T announcements.
This iPython notebook explains T2T and runs in your browser using a free VM from Google, no installation needed.
Alternatively, here is a one-command version that installs T2T, downloads data, trains an English-German translation model, and evaluates it:
    pip install tensor2tensor && t2t-trainer \
      --generate_data \
      --data_dir=~/t2t_data \
      --problems=translate_ende_wmt32k \
      --model=transformer \
      --hparams_set=transformer_base_single_gpu \
      --output_dir=~/t2t_train/base
You can decode from the model interactively:
    t2t-decoder \
      --data_dir=~/t2t_data \
      --problems=translate_ende_wmt32k \
      --model=transformer \
      --hparams_set=transformer_base_single_gpu \
      --output_dir=~/t2t_train/base \
      --decode_interactive
- Suggested Models
- T2T Overview
- Adding your own components
- Adding a dataset
Here's a walkthrough training a good English-to-German translation model using the Transformer model from Attention Is All You Need on WMT data.
    pip install tensor2tensor

    # See what problems, models, and hyperparameter sets are available.
    # You can easily swap between them (and add new ones).
    t2t-trainer --registry_help

    PROBLEM=translate_ende_wmt32k
    MODEL=transformer
    HPARAMS=transformer_base_single_gpu

    DATA_DIR=$HOME/t2t_data
    TMP_DIR=/tmp/t2t_datagen
    TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

    mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

    # Generate data
    t2t-datagen \
      --data_dir=$DATA_DIR \
      --tmp_dir=$TMP_DIR \
      --problem=$PROBLEM

    # Train
    # * If you run out of memory, add --hparams='batch_size=1024'.
    t2t-trainer \
      --data_dir=$DATA_DIR \
      --problems=$PROBLEM \
      --model=$MODEL \
      --hparams_set=$HPARAMS \
      --output_dir=$TRAIN_DIR

    # Decode
    DECODE_FILE=$DATA_DIR/decode_this.txt
    echo "Hello world" >> $DECODE_FILE
    echo "Goodbye world" >> $DECODE_FILE

    BEAM_SIZE=4
    ALPHA=0.6

    t2t-decoder \
      --data_dir=$DATA_DIR \
      --problems=$PROBLEM \
      --model=$MODEL \
      --hparams_set=$HPARAMS \
      --output_dir=$TRAIN_DIR \
      --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
      --decode_from_file=$DECODE_FILE

    cat $DECODE_FILE.$MODEL.$HPARAMS.beam$BEAM_SIZE.alpha$ALPHA.decodes
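The walkthrough derives several directory and file names from the problem, model, and hyperparameter set. The Python sketch below reproduces that naming so you can check where the training output and the decoded text should end up. The helper `walkthrough_paths` is hypothetical and is not part of T2T; it only mirrors the shell variables used in the walkthrough.

```python
import posixpath


def walkthrough_paths(problem, model, hparams_set, home, beam_size=4, alpha=0.6):
    """Rebuild the paths the walkthrough's shell variables expand to.

    Hypothetical helper for illustration only; it does not call T2T.
    """
    data_dir = posixpath.join(home, "t2t_data")
    # Mirrors TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS
    train_dir = posixpath.join(home, "t2t_train", problem, f"{model}-{hparams_set}")
    decode_file = posixpath.join(data_dir, "decode_this.txt")
    # Mirrors $DECODE_FILE.$MODEL.$HPARAMS.beam$BEAM_SIZE.alpha$ALPHA.decodes
    decodes_file = (
        f"{decode_file}.{model}.{hparams_set}.beam{beam_size}.alpha{alpha}.decodes"
    )
    return {
        "data_dir": data_dir,
        "train_dir": train_dir,
        "decode_file": decode_file,
        "decodes_file": decodes_file,
    }


paths = walkthrough_paths(
    problem="translate_ende_wmt32k",
    model="transformer",
    hparams_set="transformer_base_single_gpu",
    home="/home/user",  # placeholder home directory
)
for name, value in paths.items():
    print(f"{name}: {value}")
```

Changing any of the three names changes the training directory, so runs with different settings do not overwrite each other.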
Here are some combinations of models, hparams and problems that we found work well, so we suggest using them if you're interested in those problems.
For translation, especially English-German and English-French, we suggest using
the Transformer model in base or big configurations, i.e.
--hparams_set=transformer_base. When trained on 8 GPUs for 300K steps,
this should reach a BLEU score of about 28.
For summarization, we suggest using the Transformer model in prepend mode, i.e.
For image classification, we suggest using ResNet or Xception, i.e.
    # Assumes tensorflow or tensorflow-gpu installed
    pip install tensor2tensor

    # Installs with tensorflow-gpu requirement
    pip install tensor2tensor[tensorflow_gpu]

    # Installs with tensorflow (cpu) requirement
    pip install tensor2tensor[tensorflow]
    # Data generator
    t2t-datagen

    # Trainer
    t2t-trainer --registry_help
python -c "from tensor2tensor.models.transformer import Transformer"
- Many state-of-the-art and baseline models are built in, and new models can be added easily (open an issue or pull request!).
- Many datasets across modalities - text, audio, image - available for generation and use, and new ones can be added easily (open an issue or pull request for public datasets!).
- Models can be used with any dataset and input mode (or even multiple); all
modality-specific processing (e.g. embedding lookups for text tokens) is done
with Modality objects, which are specified per-feature in the dataset/task specification.
- Support for multi-GPU machines and synchronous (1 master, many workers) and asynchronous (independent workers synchronizing through a parameter server) distributed training.
- Easily swap amongst datasets and models by command-line flag with the data
t2t-datagen and the training script
Datasets are all standardized on
TFRecord files with
protocol buffers. All datasets are registered and generated with the
and many common sequence datasets are already available for generation and use.
Problems and Modalities
Problems define training-time hyperparameters for the dataset and task,
mainly by setting input and output modalities (e.g. symbol, image, audio,
label) and vocabularies, if applicable. All problems are defined either in
or are registered with
t2t-datagen to see
the list of all available problems).
Modalities, defined in
abstract away the input and output data types so that models may deal with
T2TModels define the core tensor-to-tensor transformation, independent of
input/output modality or task. Models take dense tensors in and produce dense
tensors that may then be transformed in a final step by a modality depending
on the task (e.g. fed through a final linear transform to produce logits for a
softmax over classes). All models are imported in the
T2TModel - defined in
and are registered with
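To make the split between the model body and the final modality step concrete, here is a minimal pure-Python sketch. It is not T2T code: `toy_model_body`, `linear`, and `softmax` are hypothetical stand-ins, and the weights are made-up numbers. The body maps a dense vector to a dense vector, and a separate final step applies a linear transform and a softmax to get class probabilities.

```python
import math


def toy_model_body(vector):
    """Stand-in for a model body: dense vector in, dense vector out."""
    return [max(0.0, 2.0 * x) for x in vector]  # scale, then ReLU


def linear(vector, weights, bias):
    """Compute W x + b, with weights given as one row per output class."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = [1.0, -1.0]
hidden = toy_model_body(features)

# Final, task-specific step: project to three classes and normalize.
logits = linear(
    hidden,
    weights=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # made-up weights
    bias=[0.0, 0.0, 0.0],
)
probs = softmax(logits)
print(hidden, logits, probs)
```

Because the body never sees class counts or vocabularies, the same body could be paired with a different final step for a different task.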
Hyperparameter sets are defined and registered in code with
and are encoded in
HParams are available to both the problem specification and the
model. A basic set of hyperparameters is defined in
and hyperparameter set functions can compose other hyperparameter set functions.
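The idea of one hyperparameter set function building on another can be sketched with plain Python functions and dictionaries. This is an illustration under assumptions, not T2T's implementation: the function names and all values below are hypothetical, and registration is left out.

```python
def toy_base():
    """Hypothetical base hyperparameter set (values are made up)."""
    return {"batch_size": 4096, "hidden_size": 512, "num_hidden_layers": 6}


def toy_big():
    """Start from toy_base and widen the model."""
    hparams = toy_base()
    hparams["hidden_size"] = 1024
    return hparams


def toy_big_single_gpu():
    """Start from toy_big and shrink the batch for a single device."""
    hparams = toy_big()
    hparams["batch_size"] = 2048
    return hparams


print(toy_big_single_gpu())
```

Each function returns a fresh dictionary, so a derived set never mutates the set it builds on.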
The trainer binary is the main entrypoint for training, evaluation, and
inference. Users can easily switch between problems, models, and hyperparameter
sets by using the
--hparams_set flags. Specific
hyperparameters can be overridden with the
related flags control local and distributed training/evaluation
(distributed training documentation).
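Override strings such as `batch_size=1024` or `beam_size=4,alpha=0.6` are comma-separated `name=value` pairs. The sketch below shows one way such a string could be parsed and applied on top of a default set. It is a hypothetical illustration, not the parser T2T uses; `parse_overrides` and `apply_overrides` are invented names, and rejecting unknown names is a design choice made here.

```python
def _coerce(raw):
    """Turn a string into an int or float when possible, else keep it."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def parse_overrides(spec):
    """Parse 'name=value,name=value' into a dict."""
    overrides = {}
    for item in filter(None, (part.strip() for part in spec.split(","))):
        name, sep, raw = item.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected name=value, got {item!r}")
        overrides[name.strip()] = _coerce(raw.strip())
    return overrides


def apply_overrides(defaults, spec):
    """Return a copy of defaults with the overrides from spec applied."""
    overrides = parse_overrides(spec)
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        # Failing early catches typos in hyperparameter names.
        raise KeyError(f"unknown hyperparameters: {unknown}")
    return {**defaults, **overrides}


defaults = {"batch_size": 4096, "hidden_size": 512}  # made-up defaults
print(apply_overrides(defaults, "batch_size=1024"))
print(parse_overrides("beam_size=4,alpha=0.6"))
```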
Adding your own components
T2T's components are registered using a central registration mechanism that
enables easily adding new ones and easily swapping amongst them by command-line
flag. You can add your own components without editing the T2T codebase by
--t2t_usr_dir flag in
You can do so for models, hyperparameter sets, modalities, and problems. Please do submit a pull request if your component might be useful to others.
for an example user directory.
Adding a dataset
Also see the data generators README.
Note: This is not an official Google product.