![Thinkube AI Lab](../icons/tk_full_logo.svg)

# GPU Basics 🎮

Verify and understand your GPU environment:
- Check GPU availability
- CUDA basics
- PyTorch GPU operations
- Memory management
- Multi-GPU detection

## Why GPUs?

GPUs accelerate deep learning by:

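- **Parallel Processing**: Thousands of cores versus a CPU's few dozen
- **Matrix Operations**: Hardware optimized for tensor math
- **Memory Bandwidth**: Much higher memory bandwidth than typical CPU RAM, so data reaches the cores faster
- **Speed**: Often 10-100x faster training, depending on the model, batch size, and hardware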

Thinkube provides GPU nodes for ML workloads!

## Check GPU with nvidia-smi

In [None]:
# Run nvidia-smi to see GPU info
!nvidia-smi
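
The same information can be read from Python when you want to act on it rather than just look at it. The sketch below assumes `nvidia-smi` is on the `PATH` of a GPU node; `query_gpus` is a hypothetical helper name, and it returns an empty list when the tool is missing or fails.

```python
import shutil
import subprocess


def query_gpus():
    """Return one dict per GPU reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on this node
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.total,memory.used",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # tool present but the driver did not respond
    gpus = []
    for line in result.stdout.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot shift fields.
        fields = [field.strip() for field in line.rsplit(",", 2)]
        if len(fields) == 3:
            name, total, used = fields
            gpus.append({"name": name, "memory_total": total, "memory_used": used})
    return gpus


gpus = query_gpus()
if gpus:
    for index, gpu in enumerate(gpus):
        print(f"GPU {index}: {gpu['name']} ({gpu['memory_used']} of {gpu['memory_total']} used)")
else:
    print("nvidia-smi found no GPUs on this node")
```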

## PyTorch CUDA Detection

In [None]:
# Check CUDA availability in PyTorch
import torch

# TODO: Check if CUDA is available
# TODO: Get CUDA version
# TODO: Get number of GPUs
# TODO: Get GPU names
# TODO: Display GPU properties (compute capability, memory)
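
One possible way to fill in the TODOs above, as a minimal sketch. `describe_cuda` is a hypothetical helper name; it only uses `torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.device_count`, and `torch.cuda.get_device_properties`, and it degrades to an empty device list when PyTorch or a GPU is missing.

```python
try:
    import torch
except ImportError:  # PyTorch missing from this kernel
    torch = None


def describe_cuda():
    """Collect basic CUDA facts in a dict that is safe to build on any machine."""
    if torch is None:
        return {"torch_installed": False, "cuda_available": False, "devices": []}
    info = {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,  # None on CPU-only builds
        "devices": [],
    }
    if info["cuda_available"]:
        for index in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(index)
            info["devices"].append(
                {
                    "index": index,
                    "name": props.name,
                    "compute_capability": f"{props.major}.{props.minor}",
                    "total_memory_gib": round(props.total_memory / 1024**3, 2),
                }
            )
    return info


info = describe_cuda()
print(f"CUDA available: {info['cuda_available']}")
for device in info["devices"]:
    print(device)
```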

## Tensor Operations on GPU

In [None]:
# Basic GPU tensor operations
import torch

# TODO: Create tensor on CPU
# TODO: Move tensor to GPU with .cuda() or .to('cuda')
# TODO: Perform operations on GPU
# TODO: Time CPU vs GPU operations
# TODO: Move result back to CPU
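
A sketch of the round trip described in the TODOs: create a tensor on the CPU, move it to the GPU, compute there, and copy the result back. `gram_sum` is a hypothetical name, and the code falls back to the CPU when no CUDA device is visible, so the timing step is left to the benchmark section.

```python
try:
    import torch
except ImportError:  # PyTorch missing from this kernel
    torch = None


def gram_sum(n=512):
    """Compute sum(x @ x.T) on the best available device; return (value, device type)."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x_cpu = torch.randn(n, n)  # tensors are created on the CPU by default
    x_dev = x_cpu.to(device)  # same effect as x_cpu.cuda() when device is CUDA
    total_dev = (x_dev @ x_dev.T).sum()  # runs on `device`; the result stays there
    total_cpu = total_dev.cpu()  # copy the scalar result back to host memory
    return total_cpu.item(), device.type


if torch is not None:
    value, device_type = gram_sum()
    print(f"sum(x @ x.T) = {value:.2f}, computed on {device_type}")
else:
    print("PyTorch is not installed")
```

Using `.to(device)` instead of `.cuda()` keeps the same cell usable on CPU-only kernels.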

## GPU Memory Monitoring

In [None]:
# Monitor GPU memory usage

# TODO: Get allocated memory
# TODO: Get reserved memory
# TODO: Get max memory allocated
# TODO: Clear cache with torch.cuda.empty_cache()
# TODO: Display memory stats
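
A possible implementation of the monitoring steps, assuming device 0 is the GPU of interest. `memory_report` is a hypothetical helper; the underlying calls are `torch.cuda.memory_allocated`, `torch.cuda.memory_reserved`, `torch.cuda.max_memory_allocated`, and `torch.cuda.empty_cache`.

```python
try:
    import torch
except ImportError:  # PyTorch missing from this kernel
    torch = None


def memory_report(device=0):
    """Return allocator statistics in MiB, or {} when no CUDA device is visible."""
    if torch is None or not torch.cuda.is_available():
        return {}
    mib = 1024**2
    return {
        "allocated_mib": torch.cuda.memory_allocated(device) / mib,  # held by live tensors
        "reserved_mib": torch.cuda.memory_reserved(device) / mib,  # held by the caching allocator
        "max_allocated_mib": torch.cuda.max_memory_allocated(device) / mib,  # peak so far
    }


before = memory_report()
if before:
    x = torch.empty(16, 1024, 1024, device="cuda:0")  # 16 Mi float32 values = 64 MiB
    during = memory_report()
    del x  # drop the last reference to the tensor
    torch.cuda.empty_cache()  # hand cached blocks back to the driver
    after = memory_report()
    for label, stats in (("before", before), ("during", during), ("after", after)):
        print(label, {key: round(val, 1) for key, val in stats.items()})
else:
    print("No CUDA device visible; nothing to report")
```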

## Performance Comparison: CPU vs GPU

In [None]:
# Benchmark matrix multiplication
import time

# TODO: Create large matrices
# TODO: Time matrix multiplication on CPU
# TODO: Time matrix multiplication on GPU
# TODO: Calculate speedup
# TODO: Display comparison chart
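
A minimal benchmark sketch for the steps above. CUDA kernels launch asynchronously, so the timing is only meaningful if `torch.cuda.synchronize` is called before the clock is read. `time_matmul`, the matrix size, and the repeat count are illustrative choices, and the results are printed rather than charted.

```python
import time

try:
    import torch
except ImportError:  # PyTorch missing from this kernel
    torch = None


def time_matmul(device, n=1024, repeats=3):
    """Average seconds per n x n matrix multiplication on `device`."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    a @ b  # warm-up: the first CUDA call pays one-time initialization costs
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / repeats


timings = {}
if torch is not None:
    timings["cpu"] = time_matmul(torch.device("cpu"))
    if torch.cuda.is_available():
        timings["cuda"] = time_matmul(torch.device("cuda"))

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms per matmul")
if "cuda" in timings:
    print(f"speedup: {timings['cpu'] / timings['cuda']:.1f}x")
```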

## Multi-GPU Detection

In [None]:
# Check for multiple GPUs

# TODO: Get device count
# TODO: List all GPU devices
# TODO: Get properties for each GPU
# TODO: Test operations on different GPUs
# TODO: Display GPU topology
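
One way to cover the first four TODOs, sketched under the assumption that every visible GPU should accept a small allocation. `probe_gpus` is a hypothetical helper that runs a tiny computation on each device and records whether the result is correct.

```python
try:
    import torch
except ImportError:  # PyTorch missing from this kernel
    torch = None


def probe_gpus():
    """Run a tiny computation on every visible GPU; return (index, name, ok) tuples."""
    results = []
    if torch is None or not torch.cuda.is_available():
        return results
    for index in range(torch.cuda.device_count()):
        device = torch.device(f"cuda:{index}")
        x = torch.ones(1000, device=device)  # allocate directly on this GPU
        ok = x.sum().item() == 1000.0  # .item() copies the scalar back to the host
        results.append((index, torch.cuda.get_device_name(index), ok))
    return results


probes = probe_gpus()
print(f"Found {len(probes)} GPU(s)")
for index, name, ok in probes:
    print(f"cuda:{index} {name}: {'ok' if ok else 'FAILED'}")
```

For the interconnect topology, `nvidia-smi topo -m` in a shell cell prints the link matrix between GPUs.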

## Memory Management Best Practices

In [None]:
# Efficient memory usage

# TODO: Demonstrate memory leak (forget to free)
# TODO: Show proper cleanup with del and empty_cache()
# TODO: Use torch.no_grad() for inference
# TODO: Demonstrate gradient accumulation
# TODO: Show memory profiling
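
A small sketch of two of the practices listed above: running inference under `torch.no_grad()` and releasing memory with `del` plus `torch.cuda.empty_cache()`. The model and sizes are placeholders, and `inference_demo` is a hypothetical name.

```python
try:
    import torch
except ImportError:  # PyTorch missing from this kernel
    torch = None


def inference_demo():
    """Compare a tracked and an untracked forward pass, then clean up."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(256, 10).to(device)
    batch = torch.randn(32, 256, device=device)

    tracked = model(batch)  # builds an autograd graph that keeps activations alive
    with torch.no_grad():
        untracked = model(batch)  # no graph, so less memory per forward pass
    flags = (tracked.requires_grad, untracked.requires_grad)

    # Drop references first; empty_cache() can only release blocks nothing points to.
    del tracked, untracked, batch, model
    if device.type == "cuda":
        torch.cuda.empty_cache()
    return flags


if torch is not None:
    print("requires_grad (tracked, no_grad):", inference_demo())
```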

## Common Issues and Solutions

### Out of Memory (OOM)
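- Reduce batch size
- Use gradient accumulation to keep the effective batch size
- Enable mixed precision (for example with `torch.autocast`)
- Delete unused tensors and call `torch.cuda.empty_cache()` between runs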
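
Gradient accumulation, mentioned in the list above, can be sketched as follows. The model, data, and hyperparameters are placeholders chosen for illustration; the point is that the loss is scaled by `accum_steps` and the optimizer only steps once every `accum_steps` micro-batches.

```python
try:
    import torch
except ImportError:  # PyTorch missing from this kernel
    torch = None


def train_with_accumulation(accum_steps=4, micro_batch=8, steps=8):
    """Train on random data and return how many optimizer steps were taken."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(16, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    optimizer_steps = 0
    optimizer.zero_grad()
    for step in range(steps):
        x = torch.randn(micro_batch, 16, device=device)
        y = torch.randn(micro_batch, 1, device=device)
        # Scale so the summed gradients match one batch of micro_batch * accum_steps.
        loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
        loss.backward()  # gradients add up across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            optimizer_steps += 1
    return optimizer_steps


if torch is not None:
    print("optimizer steps:", train_with_accumulation())
```

Only one micro-batch of activations is resident at a time, which is what lowers peak memory.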

### Slow Training
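- Check data loading (use `DataLoader` with `num_workers` greater than 0)
- Pin host memory (`pin_memory=True`) for faster CPU-to-GPU transfers
- Prefer vectorized tensor operations over Python loops
- Profile your code to find the actual bottleneck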
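
The data-loading advice above might look like this in practice. `make_loader` and `count_batches` are hypothetical helpers and the dataset is random; the demo call uses `num_workers=0` so it runs in a single process, and the default of 2 is only an example value to tune for a real dataset.

```python
try:
    import torch
    from torch.utils.data import DataLoader, TensorDataset
except ImportError:  # PyTorch missing from this kernel
    torch = None


def make_loader(dataset, num_workers=2, batch_size=32):
    """Build a DataLoader with parallel workers and pinned host memory."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # worker processes prepare batches in parallel
        pin_memory=torch.cuda.is_available(),  # page-locked memory speeds up GPU copies
    )


def count_batches(num_workers=0):
    """Iterate once over a small random dataset, moving each batch to the device."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    dataset = TensorDataset(torch.randn(128, 16), torch.randint(0, 2, (128,)))
    batches = 0
    for features, labels in make_loader(dataset, num_workers=num_workers):
        # non_blocking=True lets the copy overlap with compute when memory is pinned.
        features = features.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        batches += 1
    return batches


if torch is not None:
    print("batches per epoch:", count_batches())
```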

### CUDA Errors
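- Check that the model and all input tensors are on the same device
- Verify that the PyTorch build's CUDA version is compatible with the installed driver
- Restart the kernel if the CUDA context is left in a bad state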
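
A device mismatch is the most common of these, and it can be checked before the forward pass. The sketch below uses a hypothetical helper, `devices_in_use`, that collects the devices of a module's parameters and of any tensors passed to it; more than one entry in the result means something still needs a `.to(device)`.

```python
try:
    import torch
except ImportError:  # PyTorch missing from this kernel
    torch = None


def devices_in_use(module, *tensors):
    """Return the set of devices holding the module's parameters and the given tensors."""
    devices = {param.device for param in module.parameters()}
    devices.update(tensor.device for tensor in tensors)
    return devices


fixed = set()
if torch is not None:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(8, 2).to(device)
    batch = torch.randn(4, 8)  # created on the CPU

    found = devices_in_use(model, batch)
    if len(found) > 1:
        print("device mismatch:", sorted(str(d) for d in found))
        batch = batch.to(device)  # move the input next to the model

    fixed = devices_in_use(model, batch)
    print("devices after fix:", sorted(str(d) for d in fixed))
    output = model(batch)  # safe now: everything lives on one device
```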

## Next Steps

Continue with:
- **02-pytorch-training.ipynb** - Train models on GPU
- **03-distributed-training.ipynb** - Multi-GPU training
- **04-transformers-training.ipynb** - Train transformer models