## Gradient Checkpointing

A simple example that fine-tunes a model on a sentiment analysis task with gradient checkpointing enabled.

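Gradient checkpointing lets you train with less VRAM: only a subset of activations is kept during the forward pass, and the rest are recomputed during the backward pass. The cost is extra computation, so training takes longer.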
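To make the trade-off concrete, here is a minimal pure-Python sketch of the idea on a toy chain of scalar `tanh` layers. It is not how `t2t` or any framework implements checkpointing; the layer functions, the `segment` parameter, and the way stored values are counted are all assumptions made for illustration. Frameworks provide the real mechanism (for example `torch.utils.checkpoint` in PyTorch).

```python
import math

def layer_forward(w, x):
    """One toy layer: y = tanh(w * x)."""
    return math.tanh(w * x)

def layer_backward(w, x, grad_out):
    """Chain rule through one layer; requires the layer's input x."""
    y = math.tanh(w * x)
    return grad_out * w * (1.0 - y * y)

def grad_full_storage(weights, x):
    """Baseline: keep every layer input from the forward pass."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer_forward(w, x)
    grad = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad = layer_backward(w, x_in, grad)
    return grad, len(inputs)

def grad_checkpointed(weights, x, segment=4):
    """Keep only the input of each segment; recompute the rest on demand."""
    checkpoints = []
    for start in range(0, len(weights), segment):
        checkpoints.append(x)
        for w in weights[start:start + segment]:
            x = layer_forward(w, x)

    grad = 1.0
    peak_stored = 0
    for seg_idx in reversed(range(len(checkpoints))):
        seg_weights = weights[seg_idx * segment:(seg_idx + 1) * segment]
        # Second forward pass over this segment only (the extra compute).
        x_in = checkpoints[seg_idx]
        inputs = []
        for w in seg_weights:
            inputs.append(x_in)
            x_in = layer_forward(w, x_in)
        peak_stored = max(peak_stored, len(checkpoints) + len(inputs))
        for w, x_saved in zip(reversed(seg_weights), reversed(inputs)):
            grad = layer_backward(w, x_saved, grad)
    return grad, peak_stored

weights = [0.5 + 0.1 * i for i in range(16)]
g_full, stored_full = grad_full_storage(weights, 0.3)
g_ckpt, stored_ckpt = grad_checkpointed(weights, 0.3, segment=4)
print(math.isclose(g_full, g_ckpt))  # True: same gradient either way
print(stored_full, stored_ckpt)      # 16 8
```

Both functions return the same gradient, but the checkpointed version holds at most 8 values at a time instead of 16, at the price of running each segment's forward pass a second time.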

In [1]:
import t2t

In [2]:
trainer_arguments = t2t.TrainerArguments(
    # model
    model_name_or_path="t5-base",
    cache_dir="../cache",
    # data inputs
    train_file="../sample_data/trainlines.json",
    max_source_length=128,
    max_target_length=8,
    # training outputs
    output_dir="/tmp/saved_model",
    overwrite_output_dir=True,
    # training settings
    num_train_epochs=1,
    per_device_train_batch_size=64,
    learning_rate=1e-5,
    gradient_checkpointing=True,
    prefix="predict sentiment: ",
    # validation settings
)
trainer = t2t.Trainer(arguments=trainer_arguments)

Using custom data configuration default-e0ec4565d3962e1d
Reusing dataset json (../cache/json/default-e0ec4565d3962e1d/0.0.0/d75ead8d5cfcbe67495df0f89bd262f0023257fbbbd94a730313295f3d756d50)


In [3]:
trainer.model_summary()

Summary
- model name: t5-base
- model params:
  - train: 222.9 M
  - total: 222.9 M
  - vocab: 32100
- prompt tuning only: False


### Train Model

* With checkpointing: 8.4 GB VRAM, 54s runtime
* Without checkpointing: 16.1 GB VRAM, 42s runtime
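As a quick check of what these figures mean in relative terms, the following sketch computes the memory saving and the slowdown from the four numbers listed above. The variable names are illustrative only.

```python
# Figures reported above for this run.
vram_ckpt_gb, time_ckpt_s = 8.4, 54   # with checkpointing
vram_full_gb, time_full_s = 16.1, 42  # without checkpointing

memory_saving = 1 - vram_ckpt_gb / vram_full_gb
slowdown = time_ckpt_s / time_full_s - 1

print(f"VRAM saved: {memory_saving:.1%}")  # VRAM saved: 47.8%
print(f"Extra runtime: {slowdown:.1%}")    # Extra runtime: 28.6%
```

In this run, checkpointing roughly halves VRAM use in exchange for a bit under 30% more training time.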

In [4]:
trainer.train(valid=False)

Running tokenizer on train dataset:   0%|          | 0/8 [00:00<?, ?ba/s]

***** Running training *****
  Num examples = 8000
  Num Epochs = 1
  Instantaneous batch size per device = 64
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 1
  Total optimization steps = 125



Saving model checkpoint to /tmp/saved_model/checkpoint-125
Configuration saved in /tmp/saved_model/checkpoint-125/config.json
Model weights saved in /tmp/saved_model/checkpoint-125/pytorch_model.bin
tokenizer config file saved in /tmp/saved_model/checkpoint-125/tokenizer_config.json
Special tokens file saved in /tmp/saved_model/checkpoint-125/special_tokens_map.json
Copy vocab file to /tmp/saved_model/checkpoint-125/spiece.model


Training completed. Do not forget to share your model on huggingface.co/models =)


Saving model checkpoint to /tmp/saved_model
Configuration saved in /tmp/saved_model/config.json
Model weights saved in /tmp/saved_model/pytorch_model.bin
tokenizer config file saved in /tmp/saved_model/tokenizer_config.json
Special tokens file saved in /tmp/saved_model/special_tokens_map.json
Copy vocab file to /tmp/saved_model/spiece.model


***** train metrics *****
  epoch                    =        1.0
  total_flos               =  1151995GF
  train_loss               =     0.9041
  train_runtime            = 0:00:53.66
  train_samples            =       8000
  train_samples_per_second =     149.06
  train_steps_per_second   =      2.329
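The logged step count and throughput figures can be cross-checked with simple arithmetic. This sketch assumes a single device and no gradient accumulation (as the log reports), and it uses the runtime as displayed, so the results match the log only approximately.

```python
import math

num_examples = 8000
batch_size = 64      # per-device batch size, one device assumed
num_epochs = 1
runtime_s = 53.66    # train_runtime as displayed (0:00:53.66)

# One optimization step per batch, since accumulation steps = 1.
steps = math.ceil(num_examples / batch_size) * num_epochs
steps_per_second = steps / runtime_s
samples_per_second = num_examples / runtime_s

print(steps)                         # 125
print(round(steps_per_second, 3))    # about 2.329, as logged
print(round(samples_per_second, 2))  # about 149.09; the log shows 149.06
```

The small gap in samples per second is presumably due to the log computing throughput from the unrounded runtime.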


### Test Model

In [5]:
input_text = "predict sentiment: This is the worst movie I have ever seen!"
trainer.generate_single(input_text, max_length=8)



'negative'

In [6]:
input_text = "predict sentiment: This is the best movie I have ever seen!"
trainer.generate_single(input_text, max_length=8)

'positive'
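
Both test inputs repeat the `prefix` value passed to `TrainerArguments`, so a small helper can keep inference inputs consistent with training. The function below is a hypothetical convenience, not part of `t2t`; it assumes `generate_single` expects the prefix to already be present in the text, as the cells above suggest.

```python
PREFIX = "predict sentiment: "  # same value as `prefix` in TrainerArguments

def with_prefix(text, prefix=PREFIX):
    """Prepend the task prefix unless the text already starts with it."""
    return text if text.startswith(prefix) else prefix + text

print(with_prefix("This is the best movie I have ever seen!"))
# predict sentiment: This is the best movie I have ever seen!
```

It could then be used as `trainer.generate_single(with_prefix(review), max_length=8)`, where `review` is any raw input string.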