![Thinkube AI Lab](../icons/tk_full_logo.svg)

# Training Transformers on GPU 🤗

This notebook covers efficient transformer training on GPU:
- HuggingFace Transformers library
- Load pretrained models
- GPU memory optimization
- Mixed precision training
- Gradient accumulation

## Introduction

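Training transformers requires careful GPU memory management:

- **Large Models**: Pretrained checkpoints range from hundreds of megabytes to many gigabytes
- **Memory Hungry**: Self-attention cost grows as O(n²) in the sequence length n
- **Optimization**: Mixed precision and gradient checkpointing reduce memory use
- **Tools**: The HuggingFace Trainer handles most of the training loop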
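
To make the memory claims above concrete, here is a rough back-of-the-envelope sketch. The helper names are hypothetical, the parameter count of about 110 million is the commonly quoted figure for BERT-base, and the estimate covers only the weights and the attention score matrix of a single layer (gradients, optimizer state, and other activations are ignored).

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory needed for the weights alone, in GiB (FP32 uses 4 bytes per value)."""
    return num_params * bytes_per_param / 1024**3


def attention_scores_mib(batch: int, heads: int, seq_len: int,
                         bytes_per_value: int = 4) -> float:
    """Memory for one layer's attention score matrix, in MiB.

    The matrix has shape (batch, heads, seq_len, seq_len), hence the O(n^2) growth.
    """
    return batch * heads * seq_len * seq_len * bytes_per_value / 1024**2


bert_base_params = 110_000_000  # approximate, for illustration only
print(f"FP32 weights: {weight_memory_gib(bert_base_params):.2f} GiB")
print(f"FP16 weights: {weight_memory_gib(bert_base_params, 2):.2f} GiB")

# Doubling the sequence length quadruples the attention score memory.
for seq_len in (128, 256, 512):
    mib = attention_scores_mib(batch=8, heads=12, seq_len=seq_len)
    print(f"seq_len={seq_len}: {mib:.0f} MiB per layer")
```

During training the real footprint is several times larger than the weight figure, because gradients and optimizer state are stored per parameter as well.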

## Load Pretrained Model

In [None]:
# Load pretrained transformer
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# TODO: Choose model (BERT, RoBERTa, DistilBERT)
# TODO: Load tokenizer
# TODO: Load model for classification
# TODO: Move model to GPU
# TODO: Display model size and parameters
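
One possible way to fill in the TODOs above is sketched below. It assumes the `distilbert-base-uncased` checkpoint and a two-label task; both are illustrative choices, not requirements of the notebook. The helper names `describe_size` and `load_classifier` are hypothetical.

```python
def describe_size(num_params: int, bytes_per_param: int = 4) -> str:
    """Format a parameter count and the matching weight size (FP32 by default)."""
    millions = num_params / 1e6
    mib = num_params * bytes_per_param / 1024**2
    return f"{millions:.1f}M parameters, ~{mib:.0f} MiB of weights"


def load_classifier(model_name: str = "distilbert-base-uncased", num_labels: int = 2):
    """Load a tokenizer and classification model, then move the model to the GPU."""
    # Imported inside the function so describe_size() works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels
    )
    model.to(device)

    num_params = sum(p.numel() for p in model.parameters())
    print(f"{model_name} on {device}: {describe_size(num_params)}")
    return tokenizer, model, device
```

Calling `tokenizer, model, device = load_classifier()` downloads the checkpoint on first use, so it needs network access or a populated local cache.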

## Prepare Dataset

Tokenize and format data:

In [None]:
# Prepare dataset for training
from datasets import load_dataset

# TODO: Load dataset (e.g., IMDB, SST-2)
# TODO: Tokenize texts
# TODO: Set format for PyTorch
# TODO: Create train/val splits
# TODO: Display sample
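
The dataset steps could look roughly like the sketch below. It assumes the IMDB dataset from the Hub (columns `text` and `label`), a maximum length of 256 tokens, and a 90/10 train/validation split; all three are illustrative assumptions, and `prepare_imdb` is a hypothetical helper name.

```python
def prepare_imdb(tokenizer, max_length: int = 256, seed: int = 42):
    """Tokenize IMDB and return (train, validation, test) datasets as PyTorch tensors."""
    # Imported inside the function so the definition itself has no heavy dependencies.
    from datasets import load_dataset

    def tokenize(batch):
        # Pad and truncate every review to the same length so batches stack cleanly.
        return tokenizer(
            batch["text"],
            truncation=True,
            padding="max_length",
            max_length=max_length,
        )

    def encode(dataset):
        dataset = dataset.map(tokenize, batched=True)
        dataset = dataset.rename_column("label", "labels")
        return dataset.with_format(
            "torch", columns=["input_ids", "attention_mask", "labels"]
        )

    # Carve a validation set out of the official training split.
    split = load_dataset("imdb", split="train").train_test_split(
        test_size=0.1, seed=seed
    )
    train_ds = encode(split["train"])
    val_ds = encode(split["test"])
    test_ds = encode(load_dataset("imdb", split="test"))

    sample = train_ds[0]
    print("label:", sample["labels"], "first tokens:", sample["input_ids"][:10])
    return train_ds, val_ds, test_ds
```

Padding to a fixed `max_length` keeps the example simple; dynamic padding with a data collator usually wastes less memory on short reviews.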

## Training Arguments

Configure training with TrainingArguments:

In [None]:
# Setup training configuration
from transformers import TrainingArguments

# TODO: Define output directory
# TODO: Set training hyperparameters
#       - learning_rate
#       - num_train_epochs
#       - per_device_train_batch_size
# TODO: Enable FP16 mixed precision
# TODO: Set gradient accumulation steps
# TODO: Configure logging and evaluation
# TODO: Display configuration
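
A configuration along the following lines would cover the TODOs above. Every value is a hypothetical starting point rather than a recommendation, and `build_training_arguments` is an illustrative helper. The evaluation schedule is left out on purpose because, as far as I know, the name of that argument differs between `transformers` releases.

```python
# Hypothetical values for illustration; tune them for your model and GPU.
training_config = {
    "output_dir": "./results",
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 16,
    "gradient_accumulation_steps": 4,
    "fp16": True,  # FP16 mixed precision; this setting expects a CUDA GPU
    "logging_steps": 50,
    "save_strategy": "epoch",
}


def build_training_arguments(config: dict):
    """Turn the plain dict into a TrainingArguments object."""
    # Imported inside the function so the config can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(**config)


# Display the configuration and the batch size each optimizer step effectively sees.
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
for key, value in training_config.items():
    print(f"{key:32s} {value}")
print(f"{'effective batch size (per GPU)':32s} {effective_batch_size}")
```

Keeping the settings in a dict makes them easy to log to an experiment tracker before `build_training_arguments(training_config)` is called.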

## Mixed Precision Training

FP16 mixed precision runs most operations in 16-bit floats, which speeds up training and reduces memory use on supported GPUs:

In [None]:
# Enable mixed precision

# TODO: Explain FP16 vs FP32
# TODO: Show memory savings
# TODO: Enable in TrainingArguments (fp16=True)
# TODO: Monitor GPU memory before/after
# TODO: Compare training speed
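
The FP16 versus FP32 trade-off can be illustrated without a GPU. The sketch below uses the standard library's `struct` module, whose `e` format is IEEE 754 half precision; `to_fp16` is a hypothetical helper. It shows the halved storage, the reduced precision, and the underflow of small values that loss scaling is meant to work around.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Storage: FP16 needs half the bytes of FP32.
print("bytes per value:", struct.calcsize("<e"), "vs", struct.calcsize("<f"))

# Precision: only about three decimal digits survive.
print("0.1 in FP16:", to_fp16(0.1))  # 0.0999755859375

# Range: a tiny gradient underflows to zero...
tiny_gradient = 1e-8
print("tiny gradient in FP16:", to_fp16(tiny_gradient))  # 0.0

# ...unless it is scaled up first, which is the idea behind loss scaling.
scale = 1024.0
print("scaled gradient in FP16:", to_fp16(tiny_gradient * scale))

# Values above the FP16 maximum (65504) cannot be represented at all.
try:
    struct.pack("<e", 70000.0)
except OverflowError as err:
    print("overflow:", err)
```

In practice, setting `fp16=True` in `TrainingArguments` enables mixed precision, so none of this has to be done by hand.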

## Gradient Accumulation

Simulate a larger batch size by accumulating gradients over several small batches before each optimizer step:

In [None]:
# Configure gradient accumulation

# TODO: Explain gradient accumulation concept
# TODO: Set gradient_accumulation_steps
# TODO: Calculate effective batch size
# TODO: Show memory usage comparison
# TODO: Display configuration
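
The concept can be checked on a toy problem. The sketch below uses a one-parameter linear model and mean squared error in plain Python (no GPU or framework) and shows that averaging gradients over equally sized micro-batches reproduces the full-batch gradient. The data and the helper `mse_grad` are made up for illustration.

```python
import math


def mse_grad(w: float, xs: list, ys: list) -> float:
    """Gradient of mean((w * x - y)^2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2 * x + 1 for x in xs]
w = 0.5

# Reference: one large batch of 8 examples.
full_grad = mse_grad(w, xs, ys)

# Accumulation: 4 micro-batches of 2 examples, each scaled by 1 / accumulation_steps.
micro_batch_size = 2
accumulation_steps = len(xs) // micro_batch_size
accumulated_grad = 0.0
for step in range(accumulation_steps):
    chunk = slice(step * micro_batch_size, (step + 1) * micro_batch_size)
    accumulated_grad += mse_grad(w, xs[chunk], ys[chunk]) / accumulation_steps

effective_batch_size = micro_batch_size * accumulation_steps
print("full-batch gradient: ", full_grad)
print("accumulated gradient:", accumulated_grad)
print("effective batch size:", effective_batch_size)
print("match:", math.isclose(full_grad, accumulated_grad))
```

Only one micro-batch of activations is held in memory at a time, which is where the memory saving comes from. With multiple GPUs the effective batch size is further multiplied by the number of devices.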

## Train with Trainer

Use HuggingFace Trainer:

In [None]:
# Create and run Trainer
from transformers import Trainer

# TODO: Create Trainer instance
#       - model
#       - training_args
#       - train_dataset
#       - eval_dataset
# TODO: Start training with trainer.train()
# TODO: Monitor training progress
# TODO: Display training results
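
A minimal sketch of the Trainer setup follows. It assumes the model, training arguments, and datasets were created in earlier cells; `build_trainer` and `train_and_report` are hypothetical helper names.

```python
def build_trainer(model, training_args, train_dataset, eval_dataset,
                  compute_metrics=None):
    """Wire the model, arguments, and datasets into a Trainer."""
    # Imported inside the function so the helpers can be defined without transformers.
    from transformers import Trainer

    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
    )


def train_and_report(trainer):
    """Run training and print the summary that trainer.train() returns."""
    result = trainer.train()
    print(f"optimizer steps: {result.global_step}")
    print(f"average training loss: {result.training_loss:.4f}")
    return result
```

While training runs, the Trainer logs the loss every `logging_steps` steps, which is usually enough to spot divergence early.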

## Evaluation

Evaluate fine-tuned model:

In [None]:
# Evaluate model

# TODO: Run trainer.evaluate()
# TODO: Calculate metrics (accuracy, F1)
# TODO: Generate predictions on test set
# TODO: Display confusion matrix
# TODO: Show sample predictions
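
The metrics named above can be computed from raw logits as in the sketch below. It is written in plain Python for a binary task, with made-up logits and labels standing in for the output of `trainer.predict()`. The function names are hypothetical.

```python
def confusion_matrix(y_true: list, y_pred: list, num_labels: int = 2) -> list:
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * num_labels for _ in range(num_labels)]
    for true, pred in zip(y_true, y_pred):
        matrix[true][pred] += 1
    return matrix


def accuracy_and_f1(y_true: list, y_pred: list, positive: int = 1) -> dict:
    """Accuracy plus F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "f1": f1}


# Made-up logits for four examples; the predicted class is the argmax of each row.
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
labels = [0, 1, 1, 1]
predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]

metrics = accuracy_and_f1(labels, predictions)
matrix = confusion_matrix(labels, predictions)
print("predictions:", predictions)
print("metrics:", metrics)
print("confusion matrix:", matrix)
```

The same logic can be adapted into a `compute_metrics` function for the Trainer, so that `trainer.evaluate()` reports accuracy and F1 alongside the loss.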

## Save and Upload Model

Save fine-tuned weights:

In [None]:
# Save model

# TODO: Save model with trainer.save_model()
# TODO: Save tokenizer
# TODO: Test loading saved model
# TODO: Optionally push to HuggingFace Hub
# TODO: Display save location
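
Saving and reloading might look like the sketch below. The directory name `./fine-tuned-model` and the helper `save_and_reload` are illustrative assumptions.

```python
def save_and_reload(trainer, tokenizer, save_dir: str = "./fine-tuned-model"):
    """Save the fine-tuned model and tokenizer, then load them back as a sanity check."""
    # Imported inside the function so the definition itself has no heavy dependencies.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    trainer.save_model(save_dir)         # weights and model config
    tokenizer.save_pretrained(save_dir)  # vocabulary and tokenizer config

    # Reloading from disk confirms the directory is self-contained.
    reloaded_model = AutoModelForSequenceClassification.from_pretrained(save_dir)
    reloaded_tokenizer = AutoTokenizer.from_pretrained(save_dir)
    print(f"Model and tokenizer saved to {save_dir}")
    return reloaded_model, reloaded_tokenizer
```

Uploading is optional: `trainer.push_to_hub()` publishes the model to the HuggingFace Hub, but it requires an authenticated account, so it is left out of the sketch.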

## Memory Optimization Techniques

Further techniques for fitting larger models into limited GPU memory:

In [None]:
# Memory optimization strategies

# TODO: Gradient checkpointing
# TODO: Smaller batch size with gradient accumulation
# TODO: Model parallelism for very large models
# TODO: CPU offloading with accelerate
# TODO: Compare memory usage
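
Two of the items above can be sketched briefly: turning on gradient checkpointing and measuring peak GPU memory so that configurations can be compared. Both helper names are hypothetical, and `step_fn` stands for any zero-argument callable that runs one training step. Model parallelism and CPU offloading are not covered here.

```python
def enable_gradient_checkpointing(model):
    """Trade compute for memory: activations are recomputed during the backward pass."""
    model.gradient_checkpointing_enable()
    return model


def peak_gpu_memory_mib(step_fn):
    """Run step_fn once and return the peak GPU memory it allocated, in MiB.

    Returns None when no CUDA device is available.
    """
    # Imported inside the function so the helpers can be defined without torch.
    import torch

    if not torch.cuda.is_available():
        return None
    torch.cuda.reset_peak_memory_stats()
    step_fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024**2
```

Measuring the same training step with and without checkpointing, or with different batch sizes, gives a like-for-like comparison of peak memory.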

## Inference

Use fine-tuned model:

In [None]:
# Run inference

# TODO: Load fine-tuned model
# TODO: Create inference pipeline
# TODO: Test on new examples
# TODO: Display predictions with confidence
# TODO: Measure inference speed
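
An inference cell could be sketched as follows. It assumes the fine-tuned model was saved to a local directory named `./fine-tuned-model` (an illustrative path). `load_pipeline` and `timed_predict` are hypothetical helpers.

```python
import time


def load_pipeline(model_dir: str = "./fine-tuned-model"):
    """Build a text-classification pipeline from a saved model directory."""
    # Imported inside the function so timed_predict() works without these packages.
    import torch
    from transformers import pipeline

    device = 0 if torch.cuda.is_available() else -1  # -1 selects the CPU
    return pipeline("text-classification", model=model_dir, device=device)


def timed_predict(classifier, texts: list):
    """Classify texts, print each label with its confidence, and return texts per second."""
    start = time.perf_counter()
    predictions = classifier(texts)
    elapsed = time.perf_counter() - start

    for text, pred in zip(texts, predictions):
        print(f"{pred['label']} ({pred['score']:.3f}): {text}")

    throughput = len(texts) / elapsed if elapsed > 0 else float("inf")
    print(f"{throughput:.1f} texts/second")
    return predictions, throughput
```

A typical call would be `timed_predict(load_pipeline(), ["A wonderful film.", "Two hours I will never get back."])`. The first call includes warm-up cost, so timing a second call gives a fairer speed estimate.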

## Best Practices

- ✅ Use mixed precision (FP16) whenever the GPU supports it
- ✅ Enable gradient checkpointing for large models
- ✅ Use gradient accumulation when the target batch size does not fit in GPU memory
- ✅ Monitor GPU memory usage
- ✅ Start with smaller models (DistilBERT) then scale
- ✅ Use HuggingFace Trainer for convenience
- ✅ Save checkpoints regularly
- ✅ Track experiments with MLflow/Langfuse

## Next Steps

Continue with:
- **05-mlops-integration.ipynb** - Full MLOps workflow
- **fine-tuning/** - Advanced fine-tuning with QLoRA and Unsloth