<a href="https://colab.research.google.com/github/yoenoo/FedGPT/blob/main/pytorch_practice_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro
PyTorch Practice Notebook - ML Interview Preparation

35 Implementation Questions with Learning Objectives

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split

print(f"PyTorch version: {torch.__version__}")

# SECTION 1: TENSOR FUNDAMENTALS (Questions 1-8)

In [None]:

# Question 1: Basic Tensor Operations
def tensor_basics():
    """
    Create a 3x4 tensor with random values, then:
    1. Add 5 to all elements
    2. Multiply by 2
    3. Take the square root
    4. Calculate the mean across dimension 1

    Learning: Tensor creation, broadcasting, reduction operations
    """
    # TODO: Implement this function
    pass
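One possible sketch of Question 1 follows. The function name `tensor_basics_sketch` is hypothetical. It uses `torch.rand` (uniform on [0, 1)) rather than `torch.randn` so that no negative value ever reaches the square root.

```python
import torch

def tensor_basics_sketch(seed: int = 0) -> torch.Tensor:
    """Hypothetical solution sketch for Question 1."""
    torch.manual_seed(seed)
    x = torch.rand(3, 4)   # uniform on [0, 1), so nothing negative reaches sqrt
    x = x + 5              # the scalar broadcasts to every element
    x = x * 2              # element-wise scaling
    x = torch.sqrt(x)      # element-wise square root
    return x.mean(dim=1)   # reduce across columns: shape (3,)

row_means = tensor_basics_sketch()
print(row_means.shape)     # torch.Size([3])
```

With `torch.randn`, an element below -5 would still be negative after the addition, and `sqrt` would return NaN for it.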


In [None]:
# Question 2: Tensor Indexing and Slicing
def tensor_indexing():
    """
    Given a 5x5 tensor:
    1. Extract the diagonal elements
    2. Set the last row to zeros
    3. Extract every other column
    4. Use advanced indexing to select specific elements

    Learning: PyTorch indexing, slicing, masking
    """
    x = torch.randn(5, 5)
    # TODO: Implement the operations above
    pass
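A sketch of Question 2 follows. A deterministic `torch.arange` grid stands in for `torch.randn(5, 5)` so the selected values are easy to check. Note which operations return views and which return copies: here that distinction decides whether zeroing a row silently changes an earlier result.

```python
import torch

x = torch.arange(25, dtype=torch.float32).reshape(5, 5)  # stand-in for torch.randn(5, 5)

# 1. Diagonal. torch.diagonal returns a view, so clone it before mutating x.
diag = torch.diagonal(x).clone()

# 2. Zero the last row in place.
x[-1] = 0

# 3. Every other column (basic slicing also returns a view of x).
every_other_col = x[:, ::2]

# 4. Advanced indexing with index tensors copies: picks x[0, 4], x[1, 2], x[4, 0].
picked = x[torch.tensor([0, 1, 4]), torch.tensor([4, 2, 0])]

# Boolean masking also copies and returns a 1-D tensor.
large = x[x > 15]
print(diag, picked, large)
```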

In [None]:
# Question 3: Gradient Computation
def gradient_computation():
    """
    Create tensors x and y, compute z = x^2 + y^3, and find gradients.
    Include a case that requires retain_graph=True (e.g. calling backward() twice on the same graph).

    Learning: Autograd, computational graphs, gradient retention
    """
    # TODO: Implement gradient computation
    pass
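A minimal sketch of Question 3 follows, using scalar leaf tensors x = 2 and y = 3 (values chosen for illustration). The analytic gradients are dz/dx = 2x and dz/dy = 3y². The second `backward()` call only works because the first one kept the graph's saved tensors with `retain_graph=True`.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x ** 2 + y ** 3               # dz/dx = 2x = 4, dz/dy = 3y^2 = 27

z.backward(retain_graph=True)     # keep saved tensors so the graph can be reused
print(x.grad, y.grad)             # tensor(4.) tensor(27.)

# A second backward through the retained graph accumulates into .grad.
z.backward()
print(x.grad, y.grad)             # tensor(8.) tensor(54.)
```

Dropping `retain_graph=True` from the first call makes the second call raise a `RuntimeError`, because the saved tensors have already been freed.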

In [None]:
# Question 4: In-place vs Out-of-place Operations
def inplace_operations():
    """
    Demonstrate the difference between in-place and out-of-place operations.
    Show what happens to gradients with in-place operations.

    Learning: Memory efficiency, gradient computation issues with in-place ops
    """
    # TODO: Show examples of both types and their effects on gradients
    pass
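A sketch of Question 4 follows. It relies on `exp()` saving its output for the backward pass: editing that output in place trips autograd's version-counter check, while the out-of-place equivalent works.

```python
import torch

# Out-of-place ops allocate a new tensor; in-place ops (trailing underscore) overwrite.
a = torch.ones(3)
b = a.add(1)      # b is new; a is still all ones here
a.mul_(3)         # a itself now holds 3s; b is unaffected
print(a, b)       # tensor([3., 3., 3.]) tensor([2., 2., 2.])

# exp() saves its output for backward, so editing that output in place breaks autograd.
x = torch.ones(3, requires_grad=True)
y = x.exp()
y.add_(1)         # bumps y's version counter
try:
    y.sum().backward()
    backward_failed = False
except RuntimeError as err:
    backward_failed = True
    print("RuntimeError:", str(err)[:60])

# The out-of-place equivalent leaves the saved tensor intact.
y_ok = x.exp() + 1
y_ok.sum().backward()
print(x.grad)     # e^1 for every element
```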


In [None]:
# Question 5: Broadcasting Rules
def broadcasting_examples():
    """
    Create examples that demonstrate PyTorch broadcasting rules:
    1. (3,1) + (1,4) = (3,4)
    2. (2,3,1) * (1,1,4) = (2,3,4)
    3. Show a case that would fail broadcasting

    Learning: Broadcasting mechanics, common pitfalls
    """
    # TODO: Implement broadcasting examples
    pass
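The broadcasting rule behind the three requested cases can be sketched as follows. Shapes are aligned from the trailing dimension, and each pair of sizes must be equal or one of them must be 1.

```python
import torch

print((torch.ones(3, 1) + torch.ones(1, 4)).shape)        # torch.Size([3, 4])
print((torch.ones(2, 3, 1) * torch.ones(1, 1, 4)).shape)  # torch.Size([2, 3, 4])

try:
    torch.ones(2, 3) + torch.ones(2, 4)   # trailing dims 3 vs 4: neither is 1
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True

# torch.broadcast_shapes checks compatibility without allocating tensors.
print(torch.broadcast_shapes((5, 1, 4), (3, 1)))          # torch.Size([5, 3, 4])
```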

In [None]:
# Question 6: Tensor Memory Layout
def memory_layout():
    """
    Create tensors with different memory layouts (contiguous vs non-contiguous).
    Use .view() vs .reshape() appropriately.

    Learning: Memory efficiency, when to use contiguous()
    """
    # TODO: Demonstrate memory layout concepts
    pass
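A sketch of Question 6 follows, using a transpose as the non-contiguous case. `.view()` never copies, so it fails when the strides cannot express the new shape. `.reshape()` returns a view when it can and falls back to a copy when it cannot.

```python
import torch

x = torch.arange(6).reshape(2, 3)      # contiguous, row-major storage
t = x.t()                              # transpose: same storage, swapped strides
print(x.is_contiguous(), t.is_contiguous())   # True False
print(x.stride(), t.stride())                 # (3, 1) (1, 3)

try:
    t.view(6)                          # strides of t are incompatible with a flat view
    view_failed = False
except RuntimeError:
    view_failed = True

flat_copy = t.reshape(6)               # reshape copies when a view is impossible
flat_view = t.contiguous().view(6)     # explicit copy to contiguous memory, then a view
print(flat_copy)                       # tensor([0, 3, 1, 4, 2, 5])

# When the layout allows it, reshape returns a view that shares storage with x.
same_storage = x.reshape(3, 2).data_ptr() == x.data_ptr()
```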

In [None]:
# Question 7: Device Management
def device_operations():
    """
    Write code that works on both CPU and GPU.
    Move tensors between devices and handle device compatibility.

    Learning: CUDA operations, device-agnostic code
    """
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # TODO: Implement device management examples
    pass
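A minimal device-agnostic pattern for Question 7 follows. The tiny `nn.Linear(4, 2)` model is a placeholder. The same code runs unchanged on CPU or GPU because every tensor and parameter is placed on `device`.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)          # moves parameters; returns the module
x = torch.randn(8, 4, device=device)        # allocate directly on the target device
out = model(x)                              # inputs and weights must share a device

result = out.detach().cpu().numpy()         # NumPy only accepts CPU tensors
print(out.device, result.shape)
```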

In [None]:
# Question 8: Custom Tensor Operations (INTENTIONAL BUG)
def custom_tensor_ops():
    """
    BUG ALERT: This function has an intentional error in tensor dimension handling.
    Find and fix the bug!
    """
    def matrix_multiply_bug(a, b):
        # This has a bug - find it!
        result = torch.zeros(a.shape[0], b.shape[0])  # BUG HERE
        for i in range(a.shape[0]):
            for j in range(b.shape[1]):
                result[i, j] = torch.sum(a[i, :] * b[:, j])
        return result

    # TODO: Find the bug and fix it, then test your fix
    pass

In [None]:
## Other practice questions

"""
Create a 2D tensor of shape (3, 4) with random values. Print its shape and dtype.

Perform element-wise addition, multiplication on tensors of shape (2, 3).

Perform matrix multiplication between (2, 3) and (3, 4) tensors.

Reshape a tensor from (4, 3) → (2, 6) → back. Add assertion.

Convert NumPy array to tensor and back.

Normalize a tensor to zero mean and unit variance, guarding against division by zero (e.g. a constant tensor).

Create a one-hot encoded tensor from indices [0, 2, 1] for 3 classes.

Stack and concatenate 3 tensors of shape (2, 2).

Use broadcasting to subtract row-wise means from a 2D tensor.

Implement a custom function to clip tensor values between 0 and 1.
"""

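Several of the practice questions above (normalization, one-hot encoding, row-wise mean subtraction, clipping) can be sketched in a few lines. The `eps` value and the helper name `clip01` are illustrative choices.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 4.0, 4.0]])   # second row is constant: its std is 0

# Zero mean / unit variance per row; eps keeps the constant row from dividing by zero.
eps = 1e-8
normed = (x - x.mean(dim=1, keepdim=True)) / (x.std(dim=1, keepdim=True) + eps)

# One-hot encoding of class indices [0, 2, 1] with 3 classes.
one_hot = F.one_hot(torch.tensor([0, 2, 1]), num_classes=3)

# Row-wise centering: (2, 3) minus (2, 1) broadcasts across each row.
centered = x - x.mean(dim=1, keepdim=True)

def clip01(t):
    """Custom clip to [0, 1]; equivalent to torch.clamp(t, 0, 1)."""
    return torch.minimum(torch.maximum(t, torch.zeros_like(t)), torch.ones_like(t))

print(clip01(torch.tensor([-0.5, 0.3, 1.7])))
```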
# SECTION 2: NEURAL NETWORK FUNDAMENTALS (Questions 9-16)

In [None]:
# Question 9: Linear Layer Implementation
class LinearLayer(nn.Module):
    """
    Implement a linear layer from scratch using nn.Parameter.
    Include proper initialization and forward pass.

    Learning: nn.Module structure, Parameter vs regular tensors
    """
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        # TODO: Initialize weights and bias as Parameters
        pass

    def forward(self, x):
        # TODO: Implement forward pass
        pass
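One way to fill in Question 9 is sketched below. The class name `SketchLinear` is hypothetical. The uniform bound 1/sqrt(fan_in) matches what `nn.Linear`'s default initialization works out to, so the layer behaves like the built-in one.

```python
import math
import torch
import torch.nn as nn

class SketchLinear(nn.Module):
    """Hypothetical from-scratch linear layer computing y = x @ W^T + b."""
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        # nn.Parameter registers the tensor, so parameters(), .to() and state_dict() see it;
        # a plain tensor attribute would be invisible to the optimizer.
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(out_features))
        else:
            self.register_parameter("bias", None)
        self.reset_parameters()

    def reset_parameters(self):
        # U(-1/sqrt(fan_in), 1/sqrt(fan_in)), the bound nn.Linear's default init works out to
        bound = 1.0 / math.sqrt(self.weight.shape[1])
        nn.init.uniform_(self.weight, -bound, bound)
        if self.bias is not None:
            nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, x):
        out = x @ self.weight.t()
        return out + self.bias if self.bias is not None else out
```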

In [None]:
# Question 10: Activation Functions
class ActivationFunctions:
    """
    Implement various activation functions from scratch and compare with PyTorch versions.

    Learning: Activation function mathematics, numerical stability
    """
    @staticmethod
    def custom_relu(x):
        # TODO: Implement ReLU from scratch
        pass

    @staticmethod
    def custom_sigmoid(x):
        # TODO: Implement sigmoid with numerical stability considerations
        pass

    @staticmethod
    def custom_tanh(x):
        # TODO: Implement tanh from scratch
        pass

    @staticmethod
    def leaky_relu(x, negative_slope=0.01):
        # TODO: Implement Leaky ReLU
        pass
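A sketch of the activations follows, written as free functions with hypothetical names rather than as the class's static methods. The stable sigmoid picks the algebraically equivalent form for each sign so that `exp()` only sees non-positive arguments and cannot overflow. Tanh is derived from the identity tanh(x) = 2·sigmoid(2x) - 1.

```python
import torch

def relu_from_scratch(x):
    return torch.clamp(x, min=0)

def sigmoid_stable(x):
    # x >= 0: 1 / (1 + e^{-x});  x < 0: e^{x} / (1 + e^{x}).
    # z = e^{-|x|} covers both cases and is always in (0, 1], so no overflow.
    z = torch.exp(-x.abs())
    return torch.where(x >= 0, 1 / (1 + z), z / (1 + z))

def tanh_from_sigmoid(x):
    return 2 * sigmoid_stable(2 * x) - 1

def leaky_relu_from_scratch(x, negative_slope=0.01):
    return torch.where(x >= 0, x, negative_slope * x)
```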

In [None]:


# Question 11: Loss Functions
class CustomLosses:
    """
    Implement common loss functions from scratch.

    Learning: Loss function mathematics, reduction strategies
    """
    @staticmethod
    def mse_loss(predictions, targets, reduction='mean'):
        # TODO: Implement MSE loss
        pass

    @staticmethod
    def cross_entropy_loss(logits, targets):
        # TODO: Implement cross-entropy loss with numerical stability
        pass

    @staticmethod
    def binary_cross_entropy(predictions, targets):
        # TODO: Implement BCE loss
        pass
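A from-scratch sketch of the three losses follows, with hypothetical function names. Cross-entropy is computed as a log-softmax through `torch.logsumexp`, which stays finite for very large logits where a naive `exp()` would overflow. BCE clamps probabilities by an assumed `eps` to avoid `log(0)`.

```python
import torch

def mse_from_scratch(pred, target, reduction="mean"):
    sq = (pred - target) ** 2
    if reduction == "mean":
        return sq.mean()
    if reduction == "sum":
        return sq.sum()
    if reduction == "none":
        return sq
    raise ValueError(f"unknown reduction: {reduction}")

def cross_entropy_from_scratch(logits, targets):
    # log_softmax(z) = z - logsumexp(z); logsumexp is computed stably by torch
    log_probs = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # pick the log-probability of each sample's target class
    return -log_probs.gather(1, targets.unsqueeze(1)).mean()

def bce_from_scratch(probs, targets, eps=1e-7):
    probs = probs.clamp(eps, 1 - eps)  # keep log() away from 0
    return -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)).mean()
```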

In [None]:


# Question 12: Multi-Layer Perceptron
class MLP(nn.Module):
    """
    Create a configurable MLP with:
    - Variable number of hidden layers
    - Dropout
    - Batch normalization option
    - Different activation functions

    Learning: Model architecture design, regularization techniques
    """
    def __init__(self, input_size, hidden_sizes, output_size, dropout=0.0, use_batchnorm=False):
        super().__init__()
        # TODO: Build the network architecture
        pass

    def forward(self, x):
        # TODO: Implement forward pass with optional batchnorm and dropout
        pass
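A sketch of the configurable MLP follows. The class name `SketchMLP` and the extra `activation` parameter (a module class such as `nn.ReLU` or `nn.Tanh`) are assumptions added to cover "different activation functions". Each hidden block is Linear, then optional BatchNorm, then activation, then optional Dropout. The output layer stays linear so it can feed a logits-based loss.

```python
import torch
import torch.nn as nn

class SketchMLP(nn.Module):
    """Hypothetical configurable MLP; `activation` is an assumed extra argument."""
    def __init__(self, input_size, hidden_sizes, output_size,
                 dropout=0.0, use_batchnorm=False, activation=nn.ReLU):
        super().__init__()
        layers, prev = [], input_size
        for h in hidden_sizes:
            layers.append(nn.Linear(prev, h))
            if use_batchnorm:
                layers.append(nn.BatchNorm1d(h))
            layers.append(activation())
            if dropout > 0:
                layers.append(nn.Dropout(dropout))
            prev = h
        layers.append(nn.Linear(prev, output_size))  # raw logits, no activation
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```

In training mode, `BatchNorm1d` needs more than one sample per batch. Call `model.eval()` before inference so that BatchNorm uses its running statistics and Dropout is disabled.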

In [None]:


# Question 13: Custom Dataset Class
class CustomDataset(Dataset):
    """
    Create a dataset class that can handle both regression and classification data.
    Include data augmentation options.

    Learning: Dataset creation, data preprocessing pipeline
    """
    def __init__(self, X, y, transform=None, task_type='classification'):
        # TODO: Initialize dataset
        pass

    def __len__(self):
        # TODO: Return dataset length
        pass

    def __getitem__(self, idx):
        # TODO: Return single sample with optional transforms
        pass

In [None]:


# Question 14: Training Loop Implementation
class Trainer:
    """
    Implement a flexible training loop with:
    - Training and validation phases
    - Metrics tracking
    - Early stopping
    - Learning rate scheduling

    Learning: Training best practices, monitoring, optimization
    """
    def __init__(self, model, train_loader, val_loader, criterion, optimizer, device):
        self.model = model
        self.train_loader = train_loader
        self.val_loader = val_loader
        self.criterion = criterion
        self.optimizer = optimizer
        self.device = device
        self.train_losses = []
        self.val_losses = []

    def train_epoch(self):
        # TODO: Implement one training epoch
        pass

    def validate_epoch(self):
        # TODO: Implement validation epoch
        pass

    def train(self, num_epochs, early_stopping_patience=None):
        # TODO: Implement full training loop with early stopping
        pass
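The training and validation phases plus early stopping can be sketched as two standalone functions, `run_epoch` and `fit`. The names are hypothetical, and they are not a drop-in for the `Trainer` methods. One helper serves both phases by toggling `model.train()` and gradient tracking. A learning-rate scheduler, if used, would typically be stepped once per epoch after validation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def run_epoch(model, loader, criterion, device, optimizer=None):
    """Hypothetical helper: trains when an optimizer is given, otherwise evaluates."""
    training = optimizer is not None
    model.train(training)                       # toggles dropout / batchnorm behaviour
    total, n = 0.0, 0
    with torch.set_grad_enabled(training):      # no graph is built during validation
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            loss = criterion(model(xb), yb)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item() * xb.size(0)
            n += xb.size(0)
    return total / n                            # sample-weighted mean loss

def fit(model, train_loader, val_loader, criterion, optimizer, device,
        num_epochs, patience=None):
    best, bad_epochs, history = float("inf"), 0, []
    for _ in range(num_epochs):
        tr = run_epoch(model, train_loader, criterion, device, optimizer)
        va = run_epoch(model, val_loader, criterion, device)
        history.append((tr, va))
        if va < best:
            best, bad_epochs = va, 0
        else:
            bad_epochs += 1
            if patience is not None and bad_epochs >= patience:
                break                           # early stop: no improvement for `patience` epochs
    return history

# Usage on a synthetic linear-regression problem.
torch.manual_seed(0)
X = torch.randn(128, 3)
Y = X @ torch.tensor([[1.0], [2.0], [-1.0]])
train_dl = DataLoader(TensorDataset(X[:96], Y[:96]), batch_size=16, shuffle=True)
val_dl = DataLoader(TensorDataset(X[96:], Y[96:]), batch_size=32)
net = nn.Linear(3, 1)
opt = torch.optim.SGD(net.parameters(), lr=0.1)
history = fit(net, train_dl, val_dl, nn.MSELoss(), opt, torch.device("cpu"),
              num_epochs=30, patience=3)
```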

In [None]:


# Question 15: Regularization Techniques (INTENTIONAL BUG)
class RegularizedModel(nn.Module):
    """
    BUG ALERT: This model has issues with dropout usage during training/evaluation.
    Find and fix the bugs!
    """
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_size, output_size)
        )

    def forward(self, x):
        # BUG: Always applies dropout, even during evaluation
        self.layers.train()  # BUG HERE
        return self.layers(x)

In [None]:


# Question 16: Weight Initialization
def initialize_weights(model):
    """
    Implement different weight initialization strategies:
    - Xavier/Glorot initialization
    - He initialization
    - Custom initialization based on layer type

    Learning: Initialization importance, different strategies
    """
    # TODO: Implement various initialization schemes
    pass
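A sketch of per-layer initialization follows. The helper name `init_weights_sketch` and its `scheme` switch are hypothetical. He (Kaiming) normal init targets ReLU networks, with std = sqrt(2 / fan_in). Xavier uniform balances fan-in and fan-out, which suits tanh or sigmoid. Normalization layers are reset to the identity transform.

```python
import torch.nn as nn

def init_weights_sketch(model, scheme="he"):
    """Hypothetical helper: choose an initializer per layer type."""
    for m in model.modules():
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            if scheme == "xavier":
                nn.init.xavier_uniform_(m.weight)                       # U(-a, a), a = sqrt(6/(fan_in+fan_out))
            else:
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")  # N(0, 2/fan_in)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm)):
            if m.weight is not None:                                    # affine params may be disabled
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
```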

In [None]:
## Other practice questions

"""
Create a tensor with requires_grad=True, compute y = x^2 + 3x + 1, and call backward().
Use torch.autograd.grad to compute the same gradients without calling .backward().
Implement manual gradient descent to learn y = 2x.
Write a training loop that forgets to zero the gradients, observe the effect, then fix it.
Write a custom autograd function (torch.autograd.Function) for absolute value.
Inspect and interpret gradients using tensor hooks.
Implement a custom Dataset class that loads data from a CSV file.
Use DataLoader to shuffle and batch the dataset with batch size 8.
Apply torchvision transforms (ToTensor, Normalize) to an image dataset.
Chain transforms with transforms.Compose().
Write a collate_fn that pads variable-length sequences.
"""

# SECTION 3: ADVANCED ARCHITECTURES (Questions 17-24)

In [None]:
# Question 17: Convolutional Neural Network
class CNN(nn.Module):
    """
    Build a CNN for image classification with:
    - Multiple conv layers with pooling
    - Batch normalization
    - Global average pooling

    Learning: CNN architecture, feature extraction
    """
    def __init__(self, num_classes=10, input_channels=3):
        super().__init__()
        # TODO: Define CNN architecture
        pass

    def forward(self, x):
        # TODO: Implement forward pass
        pass

In [None]:
# Question 18: Residual Connections
class ResidualBlock(nn.Module):
    """
    Implement a residual block with optional downsampling.

    Learning: Skip connections, gradient flow, identity mapping
    """
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # TODO: Implement residual block
        pass

    def forward(self, x):
        # TODO: Implement forward with residual connection
        pass
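A sketch of a ResNet-style basic block follows, under the assumption of two 3x3 convolutions with BatchNorm. The class name is hypothetical. The skip path is the identity when shapes match. When the stride or channel count changes, a 1x1 convolution projects the input so the addition is shape-compatible.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchResidualBlock(nn.Module):
    """Hypothetical basic block: conv-bn-relu-conv-bn, plus a skip connection."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            # 1x1 projection downsamples / changes channels to match the main path
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))   # the addition gives gradients a direct path
```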

In [None]:


# Question 19: Attention Mechanism
class SimpleAttention(nn.Module):
    """
    Implement a basic attention mechanism.

    Learning: Attention computation, weighted aggregation
    """
    def __init__(self, hidden_size):
        super().__init__()
        # TODO: Define attention parameters
        pass

    def forward(self, query, key, value):
        # TODO: Implement attention computation
        pass
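A single-head scaled dot-product attention sketch follows. Learned Q/K/V projections, the optional boolean `mask` argument (True = may attend), and returning the attention weights are all choices beyond the skeleton's signature. Scaling by 1/sqrt(hidden_size) keeps the logits from growing with the dimension.

```python
import math
import torch
import torch.nn as nn

class SketchAttention(nn.Module):
    """Hypothetical single-head scaled dot-product attention."""
    def __init__(self, hidden_size):
        super().__init__()
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        self.scale = 1.0 / math.sqrt(hidden_size)

    def forward(self, query, key, value, mask=None):
        # query: (B, Lq, H); key and value: (B, Lk, H)
        q, k, v = self.q_proj(query), self.k_proj(key), self.v_proj(value)
        scores = q @ k.transpose(-2, -1) * self.scale          # (B, Lq, Lk)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))  # blocked positions get weight 0
        weights = torch.softmax(scores, dim=-1)                # each row sums to 1
        return weights @ v, weights                            # weighted sum of values
```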

In [None]:


# Question 20: LSTM from Scratch
class CustomLSTM(nn.Module):
    """
    Implement an LSTM cell from scratch using its mathematical formulation.

    Learning: RNN mechanics, gate mechanisms, hidden state evolution
    """
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # TODO: Define LSTM parameters (gates, weights, biases)
        pass

    def forward(self, x, hidden=None):
        # TODO: Implement LSTM forward pass
        pass
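A from-scratch LSTM sketch follows, unrolled over a batch-first sequence. The class name and the fused four-gate projection are implementation choices. The gate order (input, forget, cell candidate, output) matches `nn.LSTM`'s weight layout, so copying `nn.LSTM` weights into it should reproduce its outputs.

```python
import torch
import torch.nn as nn

class SketchLSTM(nn.Module):
    """Hypothetical single-layer LSTM; x has shape (batch, seq_len, input_size)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # One fused projection per input produces all four gates at once.
        self.x2h = nn.Linear(input_size, 4 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 4 * hidden_size, bias=False)

    def forward(self, x, hidden=None):
        batch, seq_len, _ = x.shape
        if hidden is None:
            h = x.new_zeros(batch, self.hidden_size)
            c = x.new_zeros(batch, self.hidden_size)
        else:
            h, c = hidden
        outputs = []
        for t in range(seq_len):
            gates = self.x2h(x[:, t]) + self.h2h(h)
            i, f, g, o = gates.chunk(4, dim=1)
            i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
            g = torch.tanh(g)
            c = f * c + i * g          # forget part of the old memory, write new content
            h = o * torch.tanh(c)      # expose a gated view of the cell state
            outputs.append(h)
        return torch.stack(outputs, dim=1), (h, c)
```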

In [None]:


# Question 21: Sequence-to-Sequence Model
class Seq2Seq(nn.Module):
    """
    Build a simple encoder-decoder model.

    Learning: Sequence modeling, encoder-decoder architecture
    """
    def __init__(self, input_vocab_size, output_vocab_size, hidden_size):
        super().__init__()
        # TODO: Define encoder and decoder
        pass

    def forward(self, src, tgt=None):
        # TODO: Implement seq2seq forward pass
        pass

In [None]:


# Question 22: Variational Autoencoder
class VAE(nn.Module):
    """
    Implement a Variational Autoencoder with:
    - Encoder (recognition network)
    - Decoder (generative network)
    - Reparameterization trick

    Learning: Generative models, variational inference, latent representations
    """
    def __init__(self, input_size, hidden_size, latent_size):
        super().__init__()
        # TODO: Define encoder and decoder networks
        pass

    def encode(self, x):
        # TODO: Encode input to latent parameters
        pass

    def reparameterize(self, mu, logvar):
        # TODO: Implement reparameterization trick
        pass

    def decode(self, z):
        # TODO: Decode latent to reconstruction
        pass

    def forward(self, x):
        # TODO: Full VAE forward pass
        pass
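A compact MLP-based VAE sketch follows, assuming inputs scaled to [0, 1] so that a sigmoid decoder with binary cross-entropy is appropriate. `SketchVAE` and `vae_loss` are hypothetical names. The reparameterization z = mu + sigma * eps moves the randomness into eps ~ N(0, I), so gradients reach `mu` and `logvar` through a deterministic path. The KL term is the closed form for a diagonal Gaussian against N(0, I).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchVAE(nn.Module):
    """Hypothetical MLP VAE for inputs in [0, 1]."""
    def __init__(self, input_size, hidden_size, latent_size):
        super().__init__()
        self.enc = nn.Linear(input_size, hidden_size)
        self.mu = nn.Linear(hidden_size, latent_size)
        self.logvar = nn.Linear(hidden_size, latent_size)
        self.dec1 = nn.Linear(latent_size, hidden_size)
        self.dec2 = nn.Linear(hidden_size, input_size)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)   # sigma = exp(logvar / 2)
        eps = torch.randn_like(std)     # all randomness lives here
        return mu + eps * std

    def decode(self, z):
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        return self.decode(self.reparameterize(mu, logvar)), mu, logvar

def vae_loss(recon, x, mu, logvar):
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    return bce + kld
```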

In [None]:


# Question 23: Transformer Block (INTENTIONAL BUG)
class TransformerBlock(nn.Module):
    """
    BUG ALERT: This transformer block has issues with attention mask application.
    Find and fix the bug!
    """
    def __init__(self, d_model, num_heads, d_ff, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # BUG: Incorrect mask application
        attn_out, _ = self.self_attn(x, x, x, attn_mask=mask)  # BUG HERE - mask shape/application
        x = self.norm1(x + self.dropout(attn_out))
        ff_out = self.feed_forward(x)
        x = self.norm2(x + self.dropout(ff_out))
        return x

In [None]:


# Question 24: Model Ensemble
class ModelEnsemble:
    """
    Create an ensemble of models with different voting strategies.

    Learning: Model combination, prediction aggregation, ensemble methods
    """
    def __init__(self, models):
        self.models = models

    def predict_average(self, x):
        # TODO: Average predictions from all models
        pass

    def predict_weighted(self, x, weights):
        # TODO: Weighted average based on model performance
        pass

    def predict_majority_vote(self, x):
        # TODO: Majority voting for classification
        pass
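The three aggregation strategies can be sketched as one function, assuming every model is a classifier that returns logits of shape (B, C). `ensemble_predict` and its `mode` strings are hypothetical. Averaging and weighting operate on softmax probabilities. Majority voting takes each model's argmax and then the most frequent label per sample.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def ensemble_predict(models, x, weights=None, mode="average"):
    """Hypothetical helper; each model maps x to logits of shape (B, C)."""
    for m in models:
        m.eval()
    probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])  # (M, B, C)
    if mode == "average":
        return probs.mean(dim=0)
    if mode == "weighted":
        w = torch.as_tensor(weights, dtype=probs.dtype)
        w = w / w.sum()                                 # normalise so weights sum to 1
        return (w.view(-1, 1, 1) * probs).sum(dim=0)
    if mode == "vote":
        votes = probs.argmax(dim=-1)                    # (M, B) hard labels
        return torch.mode(votes, dim=0).values          # most frequent label per sample
    raise ValueError(f"unknown mode: {mode}")
```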

# SECTION 4: OPTIMIZATION AND TRAINING (Questions 25-30)

In [None]:
# Question 25: Custom Optimizer
class SGDMomentum(optim.Optimizer):
    """
    Implement SGD with momentum from scratch.

    Learning: Optimizer mechanics, parameter updates, momentum
    """
    def __init__(self, params, lr=0.01, momentum=0.9):
        defaults = dict(lr=lr, momentum=momentum)
        super().__init__(params, defaults)

    def step(self, closure=None):
        # TODO: Implement SGD with momentum step
        pass
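A sketch of the momentum update follows, written as a separate hypothetical class `SketchSGDMomentum` so the skeleton above stays untouched. It uses the same heavy-ball rule as `torch.optim.SGD` with `dampening=0`: v ← μ·v + g, then p ← p − lr·v. The velocity lives in `self.state[p]` so it persists across steps and appears in `state_dict()`.

```python
import torch
from torch.optim import Optimizer

class SketchSGDMomentum(Optimizer):
    """Hypothetical SGD with heavy-ball momentum."""
    def __init__(self, params, lr=0.01, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))

    @torch.no_grad()                    # parameter updates must not be tracked by autograd
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "velocity" not in state:
                    state["velocity"] = torch.zeros_like(p)
                v = state["velocity"]
                v.mul_(group["momentum"]).add_(p.grad)   # v <- mu * v + g
                p.add_(v, alpha=-group["lr"])            # p <- p - lr * v
        return loss
```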

In [None]:


# Question 26: Learning Rate Scheduling
def create_lr_scheduler(optimizer, schedule_type='cosine', **kwargs):
    """
    Create different types of learning rate schedulers.

    Learning: Learning rate scheduling strategies, training optimization
    """
    # TODO: Implement different scheduler types
    pass
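A factory sketch built only from schedulers in `torch.optim.lr_scheduler` follows. The function name, the `schedule_type` strings, and the default hyperparameters are illustrative assumptions. `ReduceLROnPlateau` differs from the others: it expects `scheduler.step(val_loss)` after each validation pass.

```python
import torch
from torch import optim

def create_lr_scheduler_sketch(optimizer, schedule_type="cosine", **kwargs):
    """Hypothetical factory; kwargs supply each scheduler's hyperparameters."""
    sched = optim.lr_scheduler
    if schedule_type == "cosine":
        return sched.CosineAnnealingLR(optimizer, T_max=kwargs.get("T_max", 100))
    if schedule_type == "step":
        return sched.StepLR(optimizer, step_size=kwargs.get("step_size", 30),
                            gamma=kwargs.get("gamma", 0.1))
    if schedule_type == "plateau":
        return sched.ReduceLROnPlateau(optimizer, patience=kwargs.get("patience", 5))
    if schedule_type == "warmup":
        warmup = kwargs.get("warmup_steps", 10)
        # LambdaLR multiplies the base lr by the factor returned for each step
        return sched.LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup))
    raise ValueError(f"unknown schedule_type: {schedule_type}")
```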

In [None]:


# Question 27: Gradient Clipping and Monitoring
class GradientMonitor:
    """
    Monitor and clip gradients during training.

    Learning: Gradient explosion, monitoring training health
    """
    def __init__(self, model):
        self.model = model
        self.grad_norms = []

    def clip_gradients(self, max_norm):
        # TODO: Implement gradient clipping
        pass

    def log_gradient_norms(self):
        # TODO: Log gradient norms for monitoring
        pass
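A sketch of the monitoring and clipping pair follows. The helper `global_grad_norm` is hypothetical. It computes the norm of all gradients concatenated, which is the quantity `torch.nn.utils.clip_grad_norm_` compares against `max_norm`. `clip_grad_norm_` also returns the pre-clipping total norm, so that value can be logged directly.

```python
import torch
import torch.nn as nn

def global_grad_norm(model, norm_type=2.0):
    """Norm of every parameter gradient concatenated into one vector."""
    grads = [p.grad.detach().flatten() for p in model.parameters() if p.grad is not None]
    if not grads:
        return 0.0
    return torch.linalg.vector_norm(torch.cat(grads), ord=norm_type).item()

torch.manual_seed(0)
model = nn.Linear(10, 1)
loss = (1000 * model(torch.randn(32, 10))).pow(2).mean()   # deliberately huge loss
loss.backward()

grad_norm_log = [global_grad_norm(model)]                  # logged before clipping
pre_clip = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # pre-clip norm
grad_norm_log.append(global_grad_norm(model))              # after clipping: at most ~1.0
print(grad_norm_log)
```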

In [None]:


# Question 28: Mixed Precision Training
def setup_mixed_precision_training(model, optimizer):
    """
    Set up automatic mixed precision training.

    Learning: Memory optimization, numerical stability, AMP
    """
    # TODO: Implement AMP setup and usage
    pass

In [None]:


# Question 29: Model Checkpointing and Loading
class ModelCheckpoint:
    """
    Implement model checkpointing with state management.

    Learning: Model persistence, training resumption, state management
    """
    def __init__(self, model, optimizer, scheduler=None):
        self.model = model
        self.optimizer = optimizer
        self.scheduler = scheduler

    def save_checkpoint(self, filepath, epoch, loss, metrics=None):
        # TODO: Save complete training state
        pass

    def load_checkpoint(self, filepath):
        # TODO: Load and restore training state
        pass
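A checkpoint round-trip sketch follows, using plain functions with hypothetical names and dictionary keys. Saving the optimizer (and scheduler) state alongside the model matters for resumption: momentum buffers and step counts are otherwise lost. The checkpoint holds only tensors and plain Python values. That should also keep it loadable under the stricter `weights_only` loading that recent PyTorch releases apply by default.

```python
import os
import tempfile
import torch
import torch.nn as nn

def save_checkpoint(path, model, optimizer, epoch, loss, scheduler=None, metrics=None):
    """Hypothetical helper: bundle everything needed to resume training."""
    torch.save({
        "epoch": epoch,
        "loss": loss,
        "metrics": metrics or {},
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),  # momentum buffers, lr, step counts
        "scheduler_state": scheduler.state_dict() if scheduler is not None else None,
    }, path)

def load_checkpoint(path, model, optimizer, scheduler=None, map_location="cpu"):
    """Restore states in place; returns (epoch to resume from, last saved loss)."""
    ckpt = torch.load(path, map_location=map_location)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    if scheduler is not None and ckpt["scheduler_state"] is not None:
        scheduler.load_state_dict(ckpt["scheduler_state"])
    return ckpt["epoch"] + 1, ckpt["loss"]

# Round trip: take one optimizer step so a momentum buffer exists, save, restore.
model = nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(4, 3)).sum().backward()
opt.step()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ckpt.pt")
    save_checkpoint(path, model, opt, epoch=4, loss=0.123)
    restored = nn.Linear(3, 1)
    restored_opt = torch.optim.SGD(restored.parameters(), lr=0.1, momentum=0.9)
    start_epoch, last_loss = load_checkpoint(path, restored, restored_opt)
```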

In [None]:


# Question 30: Distributed Training Setup
def setup_distributed_training():
    """
    Set up basic distributed training configuration.

    Learning: Multi-GPU training, distributed optimization
    """
    # TODO: Implement distributed training setup
    pass

In [None]:
## Other practice questions


"""
Write a complete training loop with forward → loss → backward → optimizer step.

Add validation loop that computes accuracy and loss.

Save and load model weights with torch.save and torch.load.

Add early stopping logic if validation loss does not improve for 3 epochs.

Add torch.no_grad() context during evaluation. Explain why it's needed.

Add learning rate scheduler (StepLR or ReduceLROnPlateau).

Add tqdm progress bar and print loss, accuracy live.
"""

# SECTION 5: DEBUGGING AND OPTIMIZATION (Questions 31-35)

In [None]:
# Question 31: Memory Profiling
def profile_model_memory(model, input_shape, batch_size=32):
    """
    Profile memory usage of a model during forward and backward passes.

    Learning: Memory optimization, profiling tools
    """
    # TODO: Implement memory profiling
    pass

In [None]:


# Question 32: Model Debugging Tools
class ModelDebugger:
    """
    Create debugging tools for model inspection.

    Learning: Debugging techniques, model introspection
    """
    def __init__(self, model):
        self.model = model
        self.hooks = []

    def register_hooks(self):
        # TODO: Register forward/backward hooks for debugging
        pass

    def check_gradients(self):
        # TODO: Check for gradient issues (vanishing/exploding)
        pass

    def visualize_activations(self, x):
        # TODO: Visualize layer activations
        pass
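Forward hooks and a gradient scan can be sketched as follows. `ActivationRecorder`, `find_gradient_issues`, and the thresholds `low=1e-7` / `high=1e3` are hypothetical and would need tuning per model. The fraction of exact zeros after a ReLU is a cheap dead-unit indicator. Hooks should be removed when done, or they keep firing on every forward pass.

```python
import torch
import torch.nn as nn

class ActivationRecorder:
    """Hypothetical helper: stores per-layer activation statistics via forward hooks."""
    def __init__(self, model):
        self.stats, self.handles = {}, []
        for name, module in model.named_modules():
            if isinstance(module, (nn.Linear, nn.ReLU)):
                self.handles.append(module.register_forward_hook(self._make_hook(name)))

    def _make_hook(self, name):
        def hook(module, inputs, output):
            out = output.detach()
            self.stats[name] = {
                "mean": out.mean().item(),
                "std": out.std().item(),
                "frac_zero": (out == 0).float().mean().item(),  # dead-ReLU indicator
            }
        return hook

    def remove(self):
        for h in self.handles:
            h.remove()
        self.handles.clear()

def find_gradient_issues(model, low=1e-7, high=1e3):
    """Flag parameters whose gradient norm looks vanishing, exploding, or non-finite."""
    issues = {}
    for name, p in model.named_parameters():
        if p.grad is not None:
            g = p.grad.norm().item()
            if g < low or g > high or not torch.isfinite(p.grad).all():
                issues[name] = g
    return issues
```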

In [None]:


# Question 33: Performance Optimization
def optimize_model_inference(model):
    """
    Apply various optimization techniques for inference.

    Learning: JIT compilation, quantization, optimization techniques
    """
    # TODO: Implement TorchScript, quantization, etc.
    pass

In [None]:


# Question 34: Custom Loss with Regularization (INTENTIONAL BUG)
class CustomLossWithRegularization(nn.Module):
    """
    BUG ALERT: This loss function has correctness, efficiency, and numerical-safety issues in its regularization terms.
    Find and fix the bugs!
    """
    def __init__(self, l1_weight=0.01, l2_weight=0.01):
        super().__init__()
        self.l1_weight = l1_weight
        self.l2_weight = l2_weight

    def forward(self, predictions, targets, model):
        # Primary loss
        mse_loss = F.mse_loss(predictions, targets)

        # L1 regularization - BUG: inefficient computation
        l1_reg = sum(p.abs().sum() for p in model.parameters())  # BUG: should use more efficient method

        # L2 regularization - BUG: incorrect computation
        l2_reg = sum(p.pow(2).sum() for p in model.parameters() if p.grad is not None)  # BUG HERE

        # BUG: Potential division by zero
        total_loss = mse_loss + self.l1_weight * l1_reg + self.l2_weight * l2_reg / len(list(model.parameters()))

        return total_loss

In [None]:


# Question 35: Advanced Training Techniques
class AdvancedTrainer(Trainer):
    """
    Extend the basic trainer with advanced techniques:
    - Curriculum learning
    - Progressive resizing
    - Test-time augmentation
    - Model averaging

    Learning: Advanced training strategies, performance optimization
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # TODO: Add advanced training components

    def curriculum_learning_step(self, epoch):
        # TODO: Implement curriculum learning
        pass

    def test_time_augmentation(self, x, num_augmentations=5):
        # TODO: Implement TTA for better predictions
        pass

    def exponential_moving_average_update(self, decay=0.999):
        # TODO: Update EMA of model weights
        pass
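A sketch of the weight EMA follows, kept as a separate hypothetical class `SketchEMA` rather than a `Trainer` method. It holds a frozen deep copy of the model and applies shadow ← decay·shadow + (1 − decay)·param after each optimizer step. Buffers such as BatchNorm running statistics are copied rather than averaged, which is one common choice among several.

```python
import copy
import torch
import torch.nn as nn

class SketchEMA:
    """Hypothetical exponential moving average of model weights."""
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()   # separate copy used for evaluation
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p.detach(), alpha=1 - self.decay)
        for s, b in zip(self.shadow.buffers(), model.buffers()):
            s.copy_(b)                              # e.g. BatchNorm running stats
```

Typical usage is to call `ema.update(model)` right after `optimizer.step()` and to evaluate with `ema.shadow`.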

# LEARNING EXERCISES AND FOLLOW-UP QUESTIONS

In [None]:
"""
FOLLOW-UP QUESTIONS TO DEEPEN YOUR UNDERSTANDING:

Section 1 - Tensor Fundamentals:
1. What's the difference between .view() and .reshape()? When might each fail?
2. How does PyTorch's autograd system handle in-place operations, and why?
3. What are the memory implications of keeping computational graphs?
4. How does broadcasting work internally, and what are its limitations?

Section 2 - Neural Network Fundamentals:
5. Why is proper weight initialization crucial? Compare Xavier vs He initialization.
6. What happens to gradients in very deep networks, and how do residual connections help?
7. How do different optimizers (SGD, Adam, RMSprop) differ in their update rules?
8. What are the trade-offs between different activation functions?

Section 3 - Advanced Architectures:
9. How does the attention mechanism solve the bottleneck problem in seq2seq models?
10. What makes LSTMs better than vanilla RNNs at handling long sequences?
11. How does the reparameterization trick in VAEs enable backpropagation?
12. How do normalization techniques (BatchNorm, LayerNorm, GroupNorm) differ, and when is each appropriate?

Section 4 - Optimization and Training:
13. How do different learning rate schedules affect convergence?
14. What are the benefits and challenges of mixed precision training?
15. How does gradient clipping prevent exploding gradients?
16. What are the considerations for distributed training?

Section 5 - Debugging and Optimization:
17. How can you identify and fix common training issues (overfitting, underfitting)?
18. What are the trade-offs between model accuracy and inference speed?
19. How do different quantization techniques affect model performance?
20. What debugging techniques help identify gradient flow issues?

IMPLEMENTATION CHALLENGES:
- Try to implement each skeleton class completely
- Find and fix all intentional bugs
- Optimize the implementations for better performance
- Add error handling and input validation
- Create comprehensive test cases
- Implement additional features beyond the basic requirements

INTERVIEW PREPARATION TIPS:
- Understand the mathematical foundations behind each implementation
- Be able to explain trade-offs and design decisions
- Practice coding these from scratch without looking at documentation
- Understand when to use which technique and why
- Be prepared to debug and optimize existing code
"""
