# Exercise 4: Debug Architectures with Shape Tracking

Shape mismatches are one of the most common errors you'll encounter in deep learning. The good news? They are also one of the easiest to debug once you know how to trace tensor shapes through your network.

> **Overview**: You'll encounter three broken PyTorch models that crash with shape-related errors. Using progressively sophisticated shape tracking techniques, you'll trace tensor flow, pinpoint where dimensions fail to align, and fix all three models.
> 
> **Scenario**: The HR team's employee attrition model is evolving: adding new features, deepening the architecture, and preparing for production deployment. But each change introduces bugs. Your job is to debug all three issues using shape tracking, one of the most practical debugging techniques in deep learning.
> 
> **Goal**: Master the essential debugging skill of tracing tensor shapes through a network to identify and fix shape mismatch errors.
> 
> **Tools**: Python, PyTorch, NumPy, Pandas, Hugging Face `datasets`
> 
> **Estimated Time**: 20 minutes

## Step 1: Setup

Let's import our libraries and set up the environment.

In [None]:
# Import core libraries
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from datasets import load_dataset
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("Setup complete!")
print(f"PyTorch version: {torch.__version__}")

## Step 2: Load and prepare data

> Note: This step uses the same dataset and preprocessing as [demo 4](/cd1818-intro-to-deep-learning/4-feedforward/demo4-breaking-down-forward.ipynb).

We'll use the [Redsmoothy/HR_Attrition](https://huggingface.co/datasets/Redsmoothy/HR_Attrition) dataset from Hugging Face, which contains employment data for 1,470 employees.

For preprocessing, we'll:
1. Load the dataset
2. Select 10 key features (9 numeric, plus the categorical `OverTime`)
3. Encode categorical variables
4. Normalize features to [0, 1] range
5. Encode the target (Attrition: Yes→1, No→0)
6. Convert to PyTorch tensors

**IMPORTANT: Feel free to skip this section to focus on the debugging task**. Just know that we end up with a dataset where each employee is represented by 10 numeric features.

In [None]:
# 1. Load the dataset
dataset = load_dataset('Redsmoothy/HR_Attrition', split='train')
print(f"✓ Dataset loaded: {len(dataset)} employees found\n")

# Convert to pandas
df = pd.DataFrame(dataset)

# 2. Select features
feature_columns = [
    'Age', 'DistanceFromHome', 'MonthlyIncome', 'TotalWorkingYears',
    'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion',
    'YearsWithCurrManager', 'WorkLifeBalance', 'OverTime'
]
features_df = df[feature_columns].copy()

# 3. Encode categorical variables
features_df['OverTime'] = (features_df['OverTime'] == 'Yes').astype(int)

# 4. Normalize features
feature_mins = features_df.min()
feature_maxs = features_df.max()
features_normalized = (features_df - feature_mins) / (feature_maxs - feature_mins + 1e-8)

# 5. Encode target
target = (df['Attrition'] == 'Yes').astype(int).values

# 6. Convert to tensors
X = torch.tensor(features_normalized.values, dtype=torch.float32)
y = torch.tensor(target, dtype=torch.float32).unsqueeze(1)  # (N,) → (N, 1)

print(f"✓ Data ready: {X.shape[0]} employees, {X.shape[1]} features each")
print(f"✓ Target shape: {y.shape}\n")
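
Steps 3 and 4 above are easy to sanity-check in isolation. A minimal sketch on a made-up three-row DataFrame (the values are illustrative, not taken from the HR dataset) applies the same 0/1 encoding and the same min-max formula, `(x - min) / (max - min + 1e-8)`, and confirms every value lands in [0, 1]:

```python
import pandas as pd

# Toy data (hypothetical values, not from the HR dataset)
toy = pd.DataFrame({
    "Age": [20, 30, 40],
    "OverTime": ["No", "Yes", "No"],
})

# Step 3: encode the categorical column as 0/1
toy["OverTime"] = (toy["OverTime"] == "Yes").astype(int)

# Step 4: min-max normalization; the small epsilon guards against
# division by zero when a column is constant
mins, maxs = toy.min(), toy.max()
toy_normalized = (toy - mins) / (maxs - mins + 1e-8)

print(toy_normalized)
```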

## Step 3: The three debugging challenges

The HR attrition model has been through several updates, and each one introduced a different type of bug. You'll encounter three broken models that fail in different ways.

**Your debugging toolkit**: You'll use progressively sophisticated shape tracking techniques to solve each challenge:
- **Challenge 1**: Basic `print()` statements to trace shapes
- **Challenge 2**: Build a reusable helper function for cleaner tracking
- **Challenge 3**: Comparative tracking to reveal batch-dependent behavior

Each challenge requires you to: (1) add shape tracking, (2) identify the mismatch, and (3) fix the model's `__init__()` or `forward()` method.

Let's dive into the challenges!

### Challenge 1 - Feature expansion gone wrong

The HR team wants to improve predictions by adding two new job satisfaction features to the model. The data science team updated the input layer to accept 12 features instead of 10, but something else broke in the process...

In [None]:
# Simulate the feature expansion by adding 2 random features
extra_features = torch.randn(X.shape[0], 2)
X_expanded = torch.cat([X, extra_features], dim=1)  # Now 12 features

print(f"Original features: {X.shape}")
print(f"Expanded features: {X_expanded.shape}")
print("✓ Added 2 new job satisfaction metrics\n")

##### Part A: Run the model and observe the error

Let's try to run the model and see what error we get.

In [None]:
class BrokenExpandedMLP(nn.Module):
    
    def __init__(self):
        super(BrokenExpandedMLP, self).__init__()
        self.layer1 = nn.Linear(12, 20)
        self.layer2 = nn.Linear(10, 5)
        self.layer3 = nn.Linear(5, 1)
    
    def forward(self, x):
        x = self.layer1(x)
        x = torch.relu(x)
        x = self.layer2(x)
        x = torch.relu(x)
        x = self.layer3(x)
        x = torch.sigmoid(x)
        return x

broken_model = BrokenExpandedMLP()
print("Model with expanded input:")
print(broken_model)
print("\n" + "="*60)

In [None]:
# Try to run the model
sample_batch = X_expanded[:5]

print(f"Input shape: {sample_batch.shape}")
print("\nAttempting forward pass...\n")

try:
    output = broken_model(sample_batch)
    print(f"✓ Success! Output shape: {output.shape}")
except RuntimeError as e:
    print("✖ RuntimeError occurred!")
    print(f"\nError message: {e}")

> **Understanding the error**: The message reports a failed matrix multiplication (`mat1 and mat2 shapes cannot be multiplied (5x20 and 10x5)`), but it doesn't say which layer failed or why. Time to add shape tracking.
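
Why does the message show `10x5` for a layer declared as `nn.Linear(10, 5)`? `nn.Linear` stores its weight as `(out_features, in_features)` and multiplies the input by the transposed weight, so the second matrix in the error is `in_features x out_features`. A small sketch reproducing the same kind of mismatch with a standalone layer (not the exercise model):

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 5)           # expects 10 input features, produces 5
print(layer.weight.shape)          # stored as (out_features, in_features) = (5, 10)

ok = layer(torch.randn(5, 10))     # (batch=5, features=10) -> (5, 5)
print(ok.shape)

error_message = None
try:
    layer(torch.randn(5, 20))      # 20 features, but the layer expects 10
except RuntimeError as e:
    error_message = str(e)         # on recent PyTorch: "mat1 and mat2 shapes cannot be multiplied ..."
print(error_message)
```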

##### Part B: Add tensor shape tracking and run again to debug

Add `print()` statements to track tensor shapes at each step of the forward pass.

In [None]:
class BrokenAttritionMLPWithTracking(nn.Module):
    """
    Same broken model, but with shape tracking to debug the issue.
    """
    
    def __init__(self):
        super(BrokenAttritionMLPWithTracking, self).__init__()
        self.layer1 = nn.Linear(12, 20)
        self.layer2 = nn.Linear(10, 5)
        self.layer3 = nn.Linear(5, 1)
    
    def forward(self, x):
        # TODO: Add print statements to track shapes
        # Hint: Use print(f"<description>: {x.shape}") at strategic points
        # The description should describe what stage the tensor is at when you print its shape
        # Reference: https://docs.pytorch.org/docs/stable/generated/torch.Tensor.shape.html
        
        x = self.layer1(x)
        x = torch.relu(x)
        
        x = self.layer2(x)
        x = torch.relu(x)
        
        x = self.layer3(x)
        x = torch.sigmoid(x)
        
        return x


# Instantiate model with tracking
tracked_model = BrokenAttritionMLPWithTracking()
print("Model with shape tracking created!\n")

In [None]:
# Run with tracking
print("Running forward pass WITH shape tracking:\n")
print("="*60)

try:
    output = tracked_model(sample_batch)
    print("="*60)
    print(f"\n✓ Success! Output: {output.shape}")
except RuntimeError as e:
    print("="*60)
    print("\n✖ Error still occurs, but now we can see exactly WHERE!")
    print(f"\nError: {e}\n")

> **Reading the tracking output**: Notice where the print statements stop. The last shape that successfully printed tells you exactly where the error occurs. Compare that shape to what the next layer expects (you can see this in the `__init__` method or in the error message's matrix dimensions).
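
If a model is built as an `nn.Sequential`, you can get the same "last shape that printed" signal without editing `forward()` at all, by feeding the input through each child module in turn. A sketch under that assumption; `trace_sequential` is a hypothetical helper (not a PyTorch API), and the toy model is deliberately broken:

```python
import torch
import torch.nn as nn

def trace_sequential(model, x):
    """Run x through each child of an nn.Sequential, printing shapes.

    Hypothetical helper: returns the name of the first layer that fails,
    or None if every layer succeeds.
    """
    print(f"input: {tuple(x.shape)}")
    for name, layer in model.named_children():
        try:
            x = layer(x)
        except RuntimeError as e:
            print(f"{name} ({layer}) FAILED: {e}")
            return name
        print(f"{name} ({layer.__class__.__name__}): {tuple(x.shape)}")
    return None

# Toy model with a deliberate mismatch: layer "0" outputs 16 features,
# but layer "2" expects 12
toy = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(12, 4),
    nn.Sigmoid(),
)
failed_at = trace_sequential(toy, torch.randn(3, 8))
```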

##### Part C: Fix the model and run again

Time to define the corrected model with properly aligned dimensions.

In [None]:
class FixedExpandedMLP(nn.Module):
    # TODO: Define the correct model with fixed dimensions
    # Hint: You can copy-paste from BrokenAttritionMLPWithTracking, changing only what's needed
    
    # Add your code here
    pass  # placeholder so the cell runs; add your __init__() and forward() above it

fixed_model_1 = FixedExpandedMLP()
print("Fixed model created!\n")
print(fixed_model_1)

In [None]:
# Test the fixed model
print("Testing the fixed model:\n")
print("="*60)

try:
    output = fixed_model_1(sample_batch)
    print("="*60)
    print("\n✓ SUCCESS! The model runs without errors.")
    print(f"\nOutput shape: {output.shape}")
except RuntimeError as e:
    print("="*60)
    print("\n✖ Still broken. Review your fix and try again.")
    print(f"\nError: {e}")

> **Verification success**: Each layer should now receive exactly the number of features it expects, so the output shape is `torch.Size([5, 1])`: one attrition probability for each of the 5 employees, all processed in parallel.
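
Shape tracking finds mismatches at run time. For simple MLPs you can also catch them before any data flows, by comparing each `nn.Linear` layer's `out_features` with the next one's `in_features`. A sketch with a hypothetical `check_linear_chain` helper; it assumes the Linear layers run in the order they are registered in `__init__()`, which holds for the models in this exercise but not for architectures with skips or branches:

```python
import torch
import torch.nn as nn

def check_linear_chain(model):
    """Return (prev_name, next_name, out_features, in_features) for each
    misaligned pair of consecutive Linear layers.

    Assumes Linear layers execute in registration order.
    """
    linears = [(name, m) for name, m in model.named_modules() if isinstance(m, nn.Linear)]
    problems = []
    for (prev_name, prev), (next_name, nxt) in zip(linears, linears[1:]):
        if prev.out_features != nxt.in_features:
            problems.append((prev_name, next_name, prev.out_features, nxt.in_features))
    return problems

class ToyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(6, 8)
        self.fc2 = nn.Linear(4, 2)   # mismatch: fc1 outputs 8 features
        self.fc3 = nn.Linear(2, 1)

    def forward(self, x):
        return self.fc3(torch.relu(self.fc2(torch.relu(self.fc1(x)))))

print(check_linear_chain(ToyMLP()))   # no forward pass needed
```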

##### TODO: Analysis question

**Question:** Based on the shape tracking you added, explain why activation functions like ReLU don't cause shape mismatches, while Linear layers can. What's fundamentally different about how these two types of operations work?

_Write your answer here:_

### Challenge 2: Deeper network, new problems

To capture more complex patterns in attrition risk, the team decided to add an extra hidden layer, creating a deeper 4-layer architecture. The model instantiates without errors but crashes during the forward pass...

In [None]:
# Use original 10-feature dataset for this challenge
sample_batch = X[:5]

##### Part A: Run the model and observe the error

Let's try to run the model and see what error we get.

In [None]:
class BrokenDeeperMLP(nn.Module):
    def __init__(self):
        super(BrokenDeeperMLP, self).__init__()
        self.layer1 = nn.Linear(10, 30)
        self.layer2 = nn.Linear(30, 15)
        self.layer3 = nn.Linear(20, 10)
        self.layer4 = nn.Linear(10, 1)
    
    def forward(self, x):
        x = self.layer1(x)
        x = torch.relu(x)
        x = self.layer2(x)
        x = torch.relu(x)
        x = self.layer3(x)
        x = torch.relu(x)
        x = self.layer4(x)
        x = torch.sigmoid(x)
        return x

broken_deeper = BrokenDeeperMLP()
print("Deeper model created:")
print(broken_deeper)
print("\n" + "="*60)

In [None]:
# Try to run the model
print(f"Input shape: {sample_batch.shape}")
print("\nAttempting forward pass...\n")

try:
    output = broken_deeper(sample_batch)
    print(f"✓ Success! Output shape: {output.shape}")
except RuntimeError as e:
    print("✖ RuntimeError occurred!")
    print(f"\nError message: {e}")

> **More layers, more places to break**: In deeper networks, simple print statements can become cluttered. Time to build a reusable debugging tool.

##### Part B: Add tensor shape tracking and run again to debug

Instead of plain `print()` statements, build a reusable helper function that formats shape information clearly.

In [None]:
# TODO: Create a helper function for formatted shape tracking
# Hint: Use f-strings to format output as: "name → shape"
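# Format the name field to be 30 characters wide and the shape field to be 20 characters wide.
# Tip: apply the width to str(tensor.shape); a width specifier on the raw torch.Size may raise a TypeError.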
# Reference: https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals

def print_shape_with_arrow(name, tensor):
    """
    Helper function to print tensor shapes in a clean, readable format.
    
    Args:
        name: Description of the operation/layer
        tensor: The tensor whose shape we want to display
    """
    # Add your code here

# Test the helper function
test_tensor = torch.randn(5, 10)
print("\nTesting the helper function:")
print("="*60)
print_shape_with_arrow("Test input", test_tensor)
print("="*60)
print("\n✓ Helper function ready!\n")

In [None]:
class BrokenDeeperMLPWithTracking(nn.Module):
    """
    Same broken model, now using the helper function for tracking.
    """
    
    def __init__(self):
        super(BrokenDeeperMLPWithTracking, self).__init__()
        self.layer1 = nn.Linear(10, 30)
        self.layer2 = nn.Linear(30, 15)
        self.layer3 = nn.Linear(20, 10)
        self.layer4 = nn.Linear(10, 1)
    
    def forward(self, x):
        # TODO: Use your helper function to track shapes at each step
        # Hint: Call print_shape_with_arrow(description, tensor) throughout
        
        x = self.layer1(x)
        x = torch.relu(x)
        
        x = self.layer2(x)
        x = torch.relu(x)
        
        x = self.layer3(x)
        x = torch.relu(x)

        x = self.layer4(x)
        x = torch.sigmoid(x)
        
        return x

tracked_deeper = BrokenDeeperMLPWithTracking()
print("Model with enhanced tracking created!\n")

In [None]:
# Run with enhanced tracking
print("Running forward pass with FORMATTED shape tracking:\n")
print("="*60)

try:
    output = tracked_deeper(sample_batch)
    print("="*60)
    print(f"\n✓ Success! Output: {output.shape}")
except RuntimeError as e:
    print("="*60)
    print(f"\nError: {e}\n")

> **Reading the formatted output**: The helper function makes the shape flow easy to scan. Notice where the prints stop: that's your clue. The first layer whose output never printed is the one that failed, so check its definition in `__init__()` to identify the mismatch.


##### Part C: Fix the model and run again

Time to define the corrected model with properly aligned dimensions.

In [None]:
class FixedDeeperMLP(nn.Module):
    # TODO: Define the correct model with fixed dimensions
    # Hint: You can copy-paste from BrokenDeeperMLPWithTracking, changing only what's needed
    
    # Add your code here
    pass  # placeholder so the cell runs; add your __init__() and forward() above it

fixed_model_2 = FixedDeeperMLP()
print("Fixed deeper model created!\n")
print(fixed_model_2)

In [None]:
# Test the fixed model
print("Testing the fixed deeper model:\n")
print("="*60)

try:
    output = fixed_model_2(sample_batch)
    print("="*60)
    print("\n✓ SUCCESS! The deeper model now runs without errors.")
    print(f"\nOutput shape: {output.shape}")
except RuntimeError as e:
    print("="*60)
    print("\n✖ Still broken. Review the layer dimensions.")
    print(f"\nError: {e}")

> **Verification success**: The dimension chain should now be consistent: `10 → 30 → 15 → 10 → 1`. Your helper function made debugging and identifying the right fix much cleaner than manual inspection.
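
One way to make a chain like `10 → 30 → 15 → 10 → 1` impossible to misalign is to derive every layer's sizes from a single list of widths, so each layer's `in_features` is by construction the previous layer's `out_features`. A minimal sketch; `make_mlp` is a hypothetical helper, not part of the exercise:

```python
import torch
import torch.nn as nn

def make_mlp(widths):
    """Build Linear -> ReLU blocks from consecutive pairs in `widths`
    (needs at least two entries), ending with a Sigmoid for binary output."""
    layers = []
    for in_f, out_f in zip(widths[:-1], widths[1:]):
        layers.append(nn.Linear(in_f, out_f))
        layers.append(nn.ReLU())
    layers[-1] = nn.Sigmoid()   # swap the final ReLU for a Sigmoid
    return nn.Sequential(*layers)

model = make_mlp([10, 30, 15, 10, 1])
out = model(torch.randn(5, 10))
print(model)
print(out.shape)
```

Changing the architecture then means editing one list rather than keeping several hardcoded pairs of numbers in sync.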

##### TODO: Analysis question

**Question:** The error message said "mat1 and mat2 shapes cannot be multiplied (5x15 and 20x10)". You could technically figure out the broken layer by examining the `__init__()` method and mentally tracing which layer outputs 15 features and which expects 20. Why is adding shape tracking still more efficient than this mental tracing approach, especially as networks get deeper?

_Write your answer here:_

### Challenge 3: The batch size 1 mystery

The model has been updated with normalization to improve convergence. It passes all training tests (batch size 32), but crashes in production when processing single employees. This is a subtle bug that only appears with certain batch sizes...

In [None]:
# Use original 10-feature dataset for this challenge
sample_batch = X[:32]

##### Part A: Run the model and observe the error

Let's try to run the model and see what error we get.

In [None]:
class BrokenBatchDependentMLP(nn.Module):
    def __init__(self):
        super(BrokenBatchDependentMLP, self).__init__()
        self.layer1 = nn.Linear(10, 20)
        self.layer2 = nn.Linear(20, 1)
    
    def forward(self, x):
        batch_size = x.shape[0]
        
        x = self.layer1(x)
        x = torch.relu(x)
        
        x = x.reshape(32, 1, -1)  # (batch, 20) → (batch, 1, 20)
        x = x.mean(dim=1)  # Average across dim 1: (batch, 1, 20) → (batch, 20)
        
        x = self.layer2(x)
        x = torch.sigmoid(x)
        return x

broken_batch = BrokenBatchDependentMLP()
print("Batch-dependent model created:")
print(broken_batch)
print("\n" + "="*60)

In [None]:
# Test with batch size 32 (training scenario)
print("Testing with training batch size:")

try:
    output = broken_batch(sample_batch)
    print(f"✓ Works! Output shape: {output.shape}\n")
except RuntimeError as e:
    print(f"✖ Error: {e}\n")

# Test with batch size 1 (production scenario)
print("Testing with batch size 1 (production):")
batch_1 = X[:1]

try:
    output = broken_batch(batch_1)
    print(f"✓ Works! Output shape: {output.shape}")
except RuntimeError as e:
    print(f"✖ Error: {e}")
    print("\n" + "="*60)

> **Batch-dependent behavior**: The model's behavior changes based on input size. This suggests an operation that treats dimensions differently depending on their values.

##### Part B: Add tensor shape tracking and run again to debug

Run both batch sizes and compare outputs side-by-side.

In [None]:
class BrokenBatchDependentMLPWithTracking(nn.Module):
    """
    Same broken model with comparative tracking.
    """
    
    def __init__(self):
        super(BrokenBatchDependentMLPWithTracking, self).__init__()
        self.layer1 = nn.Linear(10, 20)
        self.layer2 = nn.Linear(20, 1)
    
    def forward(self, x, batch_label=""):
        # TODO: Add comparative tracking focused on the batch size
        # Hint: Include the `batch_label` parameter in your print
        # This lets you run the same model with different batch sizes and compare outputs side-by-side.
        
        x = self.layer1(x)
        x = torch.relu(x)
        
        x = x.reshape(32, 1, -1)
        x = x.mean(dim=1)
        
        x = self.layer2(x)
        x = torch.sigmoid(x)
        
        return x

tracked_batch = BrokenBatchDependentMLPWithTracking()

In [None]:
# Run both batch sizes with tracking
print("Comparative shape tracking:\n")
print("="*60)

print("\nBatch size 32:")
print("-" * 40)
try:
    output = tracked_batch(sample_batch, batch_label="Batch=32")
    print(f"  ✓ Success! Output: {output.shape}")
except RuntimeError as e:
    print(f"  ✖ Error: {e}")

print("\nBatch size 1:")
print("-" * 40)
try:
    output = tracked_batch(batch_1, batch_label="Batch=1")
    print(f"  ✓ Success! Output: {output.shape}")
except RuntimeError as e:
    print(f"  ✖ Error: {e}")

print("\n" + "="*60)

> **Reading the comparative output**: With Batch=1, execution stops at the reshape operation, and the error message reports that shape `[32, 1, -1]` is invalid for the input size. Notice the hardcoded `32` in the reshape: does that number make sense for a batch of size 1?
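
To see why only one batch size works, you can isolate the hardcoded reshape. `reshape(32, 1, -1)` asks PyTorch to infer the last dimension as `total_elements / 32`. A small sketch using standalone random tensors with 20 features, standing in for the output of `layer1`:

```python
import torch

results = {}
for batch_size in [1, 16, 32, 64]:
    x = torch.randn(batch_size, 20)     # stand-in for layer1's (batch, 20) output
    try:
        # Same hardcoded reshape as in BrokenBatchDependentMLP.forward()
        results[batch_size] = tuple(x.reshape(32, 1, -1).shape)
    except RuntimeError:
        results[batch_size] = "RuntimeError"

for batch_size, outcome in results.items():
    print(f"batch={batch_size:2d}: {outcome}")
```

Batch size 1 fails because 20 elements can't be split into 32 rows. Batch sizes 16 and 64 are arguably worse: the reshape succeeds but silently produces the wrong feature width (10 or 40 instead of 20), with each new row holding half an employee or two employees concatenated, and the model only fails later at `layer2`.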

##### Part C: Fix the model and run again

Time to define the corrected model with properly aligned dimensions.

In [None]:
class FixedBatchDependentMLP(nn.Module):
    # TODO: Define the correct model with fixed dimensions
    # Hint: You can copy-paste from BrokenBatchDependentMLPWithTracking, changing only what's needed
    
    # Add your code here
    pass  # placeholder so the cell runs; add your __init__() and forward() above it

fixed_model_3 = FixedBatchDependentMLP()
print("Fixed batch-independent model created!\n")
print(fixed_model_3)

In [None]:
# Test with multiple batch sizes
print("Testing fixed model with different batch sizes:\n")
print("="*60)

for batch_size in [1, 5, 32]:
    batch = X[:batch_size]
    print(f"\nBatch size: {batch_size}")
    print("-" * 40)
    
    try:
        output = fixed_model_3(batch)
        print(f"  ✓ Success! Consistent output shape: {output.shape}")
    except RuntimeError as e:
        print(f"  ✖ Error: {e}")

print("\n" + "="*60)
print("\n✓ Model works correctly with all batch sizes!")
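
A batch-size-agnostic reshape derives the batch dimension from the input (`x.shape[0]`) or lets `-1` infer it. It's also worth noticing that averaging over a dimension of size 1 returns the original values, so in this model the reshape-and-mean pair leaves the activations unchanged. A quick sketch checking both points on random tensors:

```python
import torch

for batch_size in [1, 5, 32]:
    x = torch.randn(batch_size, 20)
    # Batch size read from the input instead of hardcoded
    y = x.reshape(x.shape[0], 1, -1).mean(dim=1)
    # Equivalent: let -1 infer the batch dimension, fixing only the feature width
    z = x.reshape(-1, 1, 20).mean(dim=1)
    assert y.shape == x.shape and z.shape == x.shape
    # The mean over a size-1 dimension is the identity
    assert torch.allclose(y, x) and torch.allclose(z, x)
    print(f"batch={batch_size:2d}: {tuple(y.shape)} (values unchanged)")
```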

##### TODO: Analysis question

**Question:** Why is hardcoding values like batch size a dangerous habit in neural networks? Beyond `reshape()`, what other PyTorch operations might behave unexpectedly when dimensions are hardcoded or when certain dimensions equal 1?

_Write your answer here:_

## Step 4: Test with different batch sizes

A robust model should handle any batch size gracefully. Let's verify all three fixed models work across different scenarios, from single predictions to large batches.

In [None]:
# Test all three fixed models comprehensively
test_batch_sizes = [1, 5, 32, 128]

print("Comprehensive batch size testing across all three models:\n")
print("="*70)

# Model 1: Feature expansion fix
print("\nModel 1: Feature Expansion (12 features → 1 output)")
print("-" * 70)
for batch_size in test_batch_sizes:
    batch = X_expanded[:batch_size]
    
    with torch.no_grad():  # No gradients needed for testing
        # Run without printing intermediate shapes
        output = fixed_model_1.layer3(torch.relu(fixed_model_1.layer2(
                 torch.relu(fixed_model_1.layer1(batch)))))
        output = torch.sigmoid(output)
    
    print(f"  Batch={batch_size:3d} → Input: {str(batch.shape):15s} → Output: {str(output.shape):15s} ✓")

# Model 2: Deeper network fix
print("\nModel 2: Deeper Network (10 → 30 → 15 → 10 → 1)")
print("-" * 70)
for batch_size in test_batch_sizes:
    batch = X[:batch_size]
    
    with torch.no_grad():
        output = fixed_model_2.layer4(torch.relu(fixed_model_2.layer3(
                 torch.relu(fixed_model_2.layer2(torch.relu(
                 fixed_model_2.layer1(batch)))))))
        output = torch.sigmoid(output)
    
    print(f"  Batch={batch_size:3d} → Input: {str(batch.shape):15s} → Output: {str(output.shape):15s} ✓")

# Model 3: Batch-dependent bug fix
print("\nModel 3: Batch-Independent (no fixed dimensions)")
print("-" * 70)
for batch_size in test_batch_sizes:
    batch = X[:batch_size]
    
    with torch.no_grad():
        output = fixed_model_3.layer2(torch.relu(fixed_model_3.layer1(batch)))
        output = torch.sigmoid(output)
    
    print(f"  Batch={batch_size:3d} → Input: {str(batch.shape):15s} → Output: {str(output.shape):15s} ✓")

print("\n" + "="*70)
print("\n✓ All three models work correctly across all batch sizes!")

> **Production-ready models are batch-agnostic**: All three fixed models now produce the same output pattern, `(batch, 1)`: the first dimension scales with the input, and the second stays constant. Testing across [1, 5, 32, 128] gives good evidence that no batch-dependent bugs remain at these sizes. (This loop composes each model's layers by hand rather than calling `forward()`, so for Model 3 it's the Challenge 3 test above that exercises `forward()` itself.) This robustness, working correctly whether processing one example or hundreds, is essential for real-world deployment where batch sizes vary unpredictably.
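
The loop above could be factored into a reusable check that calls the model's own `forward()` across several batch sizes and verifies each output is `(batch, 1)`. A sketch; `check_batch_agnostic` is a hypothetical helper, and the toy model stands in for any model with 10 input features:

```python
import torch
import torch.nn as nn

def check_batch_agnostic(model, n_features, batch_sizes=(1, 5, 32, 128)):
    """Map each batch size to True if model(x) returns shape (batch, 1)."""
    model.eval()   # note: switches the model to eval mode (affects dropout/BatchNorm)
    results = {}
    with torch.no_grad():
        for b in batch_sizes:
            try:
                out = model(torch.randn(b, n_features))
                results[b] = tuple(out.shape) == (b, 1)
            except RuntimeError:
                results[b] = False
    return results

toy = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1), nn.Sigmoid())
print(check_batch_agnostic(toy, n_features=10))
```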

##### TODO: Analysis question

**Question:** Based on your experience debugging all three models, why is testing with batch size 1 especially important? What kinds of bugs only appear at this edge case?

_Write your answer here:_

## Conclusion

Congratulations! You've successfully debugged three different types of shape-related errors using progressively sophisticated tracking techniques.

**What you've accomplished:**

- [x] **Debugged a feature expansion mismatch** - Fixed incompatible layer sizes after adding new input features
- [x] **Debugged a deeper network** - Identified misaligned dimensions in a 4-layer architecture
- [x] **Debugged a batch-dependent operation** - Fixed a hardcoded reshape that only worked with a batch size of 32
- [x] **Built reusable debugging tools** - Created helper functions for cleaner shape tracking
- [x] **Verified robustness** - Tested all fixes across multiple batch sizes (1, 5, 32, 128)

**Critical insights:**

- **Shape tracking is your primary debugging tool**: Whether simple prints or helper functions, inspecting shapes reveals exactly where dimensions fail to align
- **Linear layers transform dimensions**: Matrix multiplication changes feature counts and requires strict alignment between consecutive layers
- **Activation functions preserve dimensions**: Element-wise operations like ReLU never cause shape mismatches
- **Some operations are batch-dependent**: Hardcoded dimensions in operations like `reshape()` create bugs that only manifest with specific batch sizes
- **Batch size 1 is the critical edge case**: Many bugs only manifest when the batch dimension equals 1, making it essential for testing
- **Progressive debugging techniques**: Start simple (basic prints), then build tools (helper functions), then analyze systematically (comparative tracking)
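
One classic batch-size-1 bug, shown as a small sketch: calling `squeeze()` without a `dim` argument removes every size-1 dimension, including the batch dimension when there is only one example. Passing `dim` explicitly keeps the result consistent across batch sizes:

```python
import torch

for batch_size in [5, 1]:
    probs = torch.rand(batch_size, 1)        # e.g. a (batch, 1) sigmoid output
    squeezed_all = probs.squeeze()           # drops ALL size-1 dims
    squeezed_last = probs.squeeze(dim=1)     # drops only the feature dim
    print(f"batch={batch_size}: squeeze() -> {tuple(squeezed_all.shape)}, "
          f"squeeze(dim=1) -> {tuple(squeezed_last.shape)}")
```

With a batch of 5 both calls give `(5,)`, but with a batch of 1 `squeeze()` returns a 0-dimensional tensor, which can break indexing or loss computations downstream.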

Shape mismatch errors aren't mysteries: they're clear signals pointing to architectural inconsistencies. By adding targeted shape tracking (instead of logging everything), you turn opaque runtime failures into clear debugging steps, saving hours when building custom architectures or adapting models. When something breaks, you now know exactly how to trace and fix it.

> **Next steps to explore**: Print statements work for models you can edit. For pre-trained models, or when you want cleaner code, PyTorch's forward hooks (together with Python decorators and context managers) let you log shapes without modifying `forward()`, an approach that scales better from development to production.
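
A minimal sketch of the hook-based approach, using `nn.Module.register_forward_hook`, which calls `hook(module, inputs, output)` after a module's forward pass. The hooks are attached from outside the model, so `forward()` stays untouched, and the returned handles let you remove them afterwards. The helper name `add_shape_hooks` and the toy model are illustrative assumptions:

```python
import torch
import torch.nn as nn

def add_shape_hooks(model, log):
    """Attach a forward hook to every leaf module that records its output shape."""
    handles = []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:        # leaf modules only
            def hook(mod, inputs, output, name=name):  # name=name binds the current loop value
                log.append((name, tuple(output.shape)))
                print(f"{name:10s} {mod.__class__.__name__:10s} -> {tuple(output.shape)}")
            handles.append(module.register_forward_hook(hook))
    return handles

toy = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1), nn.Sigmoid())
log = []
handles = add_shape_hooks(toy, log)
toy(torch.randn(4, 10))

for h in handles:
    h.remove()   # detach hooks so later forward passes don't log
```

Because a forward hook only fires after its module's forward succeeds, the last logged entry again tells you where a failure happened.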