# Self-Supervised Neural Networks: An Interactive Lab

## 🎯 Learning Objectives
By the end of this lab, you will:
- Understand the core principles of self-supervised learning (SSL)
- Implement pretext tasks for vision and time-series data
- Evaluate representation quality through transfer learning
- Compare generative vs. discriminative SSL approaches
- Build intuition about when and why SSL works

## 📚 Prerequisites
- Basic understanding of neural networks and backpropagation
- Familiarity with NumPy and Python
- Linear algebra and calculus fundamentals (matrix multiplication, derivatives)

## 🔗 Recommended Reading
Before starting, consider reviewing:
- [A Survey on Self-supervised Learning](https://arxiv.org/abs/2301.05712)
- [Representation Learning: A Review and New Perspectives](https://arxiv.org/abs/1206.5538)
- [Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey](https://arxiv.org/abs/1902.06162)

## Module 1: Introduction to Self-Supervised Learning

Self-supervised learning (SSL) is a paradigm where models learn representations from unlabeled data by solving **pretext tasks**. The labels for these tasks are derived automatically from the data itself (for example, the angle by which an image was rotated), so no human annotation is required.

### Key Concepts

**Two Main Families of SSL:**
1. **Generative/Predictive Methods**: Reconstruct or predict part of the input
   - Examples: Autoencoders, masked language modeling (BERT), image inpainting
   - Learns by minimizing reconstruction error

2. **Discriminative/Contrastive Methods**: Learn to distinguish between different views
   - Examples: SimCLR, MoCo, SwAV
   - Learns by pulling positive pairs together, pushing negatives apart
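
To make the two families concrete, here is a minimal NumPy sketch of how each one can derive its training signal from a single unlabeled sample. The 8-dimensional vector, the masked positions, and the noise scale are illustrative assumptions, not part of the lab's datasets.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(8)  # one unlabeled sample (hypothetical 8-dim vector)

# Generative/predictive: hide part of the input, then ask the model to fill it in.
mask = np.zeros(8, dtype=bool)
mask[2:5] = True                       # positions to hide (arbitrary choice)
masked_input = np.where(mask, 0.0, x)  # what the model would see
reconstruction_target = x[mask]        # the "label" is the hidden part of x itself

# Discriminative/contrastive: two noisy views of the same sample form a positive pair.
view_a = x + rng.normal(scale=0.05, size=8)
view_b = x + rng.normal(scale=0.05, size=8)
other = rng.random(8)                  # a different sample acts as a negative

d_pos = np.linalg.norm(view_a - view_b)
d_neg = np.linalg.norm(view_a - other)
print(f"positive distance {d_pos:.3f}, negative distance {d_neg:.3f}")
```

In both cases the target comes from `x` alone: the masked entries in the first case, and the knowledge that `view_a` and `view_b` share a source in the second.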

### 🤔 Critical Thinking Question 1
**Why might SSL be particularly valuable in domains like medical imaging or astronomy?**

*Think about data availability, labeling costs, and domain expertise requirements.*

In [None]:
# Setup and imports
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns
from typing import Tuple, List, Optional
import json
import os

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

### 📝 Multiple Choice Question 1
**Which of the following is NOT a typical pretext task in self-supervised learning?**

In [None]:
# Multiple choice question implementation
class MultipleChoiceQuestion:
    def __init__(self, question, options, correct_answer, explanation):
        self.question = question
        self.options = options
        self.correct_answer = correct_answer
        self.explanation = explanation
    
    def display(self):
        print(self.question)
        for i, option in enumerate(self.options, 1):
            print(f"{i}. {option}")
    
    def check_answer(self, answer):
        if answer == self.correct_answer:
            print("✅ Correct!")
        else:
            print(f"❌ Incorrect. The correct answer is {self.correct_answer}.")
        print(f"Explanation: {self.explanation}")

mcq1 = MultipleChoiceQuestion(
    "Which of the following is NOT a typical pretext task in self-supervised learning?",
    [
        "Predicting image rotations",
        "Classifying images into predefined categories",
        "Reconstructing masked patches",
        "Predicting next words in a sentence"
    ],
    2,
    "Classification into predefined categories requires labeled data, making it supervised learning, not self-supervised."
)

mcq1.display()
# To check your answer, uncomment and run:
# mcq1.check_answer(YOUR_ANSWER_NUMBER)

## Module 2: Computer Vision - Rotation Prediction

We'll implement a rotation prediction pretext task using the digits dataset. The model learns to predict how much an image has been rotated, developing useful features in the process.
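
As a quick illustration of the rotation primitive this task relies on, the sketch below applies `np.rot90` to a tiny 2x2 array so each quarter-turn is easy to follow by eye. The array `tile` is a made-up example, not taken from the digits dataset.

```python
import numpy as np

tile = np.array([[1, 2],
                 [3, 4]])

# np.rot90 rotates counterclockwise by k * 90 degrees.
for k in range(4):
    print(f"k={k} ({90 * k} degrees):")
    print(np.rot90(tile, k=k))

# Four quarter-turns return the original array.
assert np.array_equal(np.rot90(tile, k=4), tile)
```

The value of `k` doubles as the pretext label: the network only ever sees the rotated pixels and has to infer which `k` produced them.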

### Step 1: Load and Explore the Data

In [None]:
# Load the digits dataset
digits = load_digits()
X, y = digits.data, digits.target

print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(np.unique(y))}")
print(f"Pixel value range: [{X.min()}, {X.max()}]")

# Visualize sample digits
fig, axes = plt.subplots(2, 10, figsize=(12, 3))
for i in range(20):
    ax = axes[i // 10, i % 10]
    ax.imshow(digits.images[i], cmap='gray')
    ax.set_title(f"Label: {digits.target[i]}")
    ax.axis('off')
plt.suptitle("Sample Digits from Dataset")
plt.tight_layout()
plt.show()

# Normalize to [0, 1]
X = X.astype('float32') / 16.0

### 💻 Exercise 1: Implement Data Augmentation (Easy)
Complete the function below to create rotated versions of images. This is your first pretext task!

In [None]:
def create_rotation_dataset(X: np.ndarray, 
                          rotations: Tuple[int, ...] = (0, 90, 180, 270)
                          ) -> Tuple[np.ndarray, np.ndarray]:
    """
    Create a dataset of rotated images for the pretext task.
    
    TODO: Complete this function
    Hint: Use np.rot90() with k parameter for 90-degree rotations
    """
    images = X.reshape(-1, 8, 8)
    rot_images = []
    rot_labels = []
    
    for idx, angle in enumerate(rotations):
        # TODO: Calculate how many 90-degree rotations needed
        k = ___  # FILL THIS
        
        for img in images:
            # TODO: Rotate the image and add to lists
            rotated = ___  # FILL THIS
            rot_images.append(___) # FILL THIS (hint: flatten the rotated image)
            rot_labels.append(___) # FILL THIS (hint: use idx)
    
    return np.array(rot_images, dtype=np.float32), np.array(rot_labels, dtype=np.int64)

# Test your implementation
# rot_X, rot_y = create_rotation_dataset(X[:10])  # Test with first 10 images
# print(f"Rotated dataset shape: {rot_X.shape}")
# print(f"Labels shape: {rot_y.shape}")
# print(f"Unique labels: {np.unique(rot_y)}")

### Solution for Exercise 1

In [None]:
# Solution (wrapped in a string literal so it does not run; remove the triple quotes to use it)
"""
def create_rotation_dataset(X: np.ndarray, 
                          rotations: Tuple[int, ...] = (0, 90, 180, 270)
                          ) -> Tuple[np.ndarray, np.ndarray]:
    images = X.reshape(-1, 8, 8)
    rot_images = []
    rot_labels = []
    
    for idx, angle in enumerate(rotations):
        k = (angle // 90) % 4
        
        for img in images:
            rotated = np.rot90(img, k=k)
            rot_images.append(rotated.flatten())
            rot_labels.append(idx)
    
    return np.array(rot_images, dtype=np.float32), np.array(rot_labels, dtype=np.int64)
"""

### Step 2: Build the Neural Network

Now we'll implement a simple 2-layer neural network from scratch. This helps you understand the fundamentals of SSL without framework abstractions.

### 💻 Exercise 2: Complete the Forward Pass (Medium)

In [None]:
from dataclasses import dataclass

@dataclass
class TwoLayerNet:
    """A simple two-layer neural network for rotation prediction."""
    input_dim: int
    hidden_dim: int
    output_dim: int
    learning_rate: float = 0.5
    
    def __post_init__(self):
        # Initialize weights
        rng = np.random.default_rng(0)
        self.W1 = rng.standard_normal((self.input_dim, self.hidden_dim)) * 0.01
        self.b1 = np.zeros(self.hidden_dim)
        self.W2 = rng.standard_normal((self.hidden_dim, self.output_dim)) * 0.01
        self.b2 = np.zeros(self.output_dim)
    
    def forward(self, X: np.ndarray) -> Tuple[np.ndarray, Tuple]:
        """
        Forward pass through the network.
        
        TODO: Complete the forward pass
        Steps:
        1. Linear transformation: z1 = X @ W1 + b1
        2. Activation: a1 = tanh(z1)
        3. Linear transformation: z2 = a1 @ W2 + b2
        4. Softmax: convert z2 to probabilities
        """
        # Layer 1
        z1 = ___  # FILL: Linear transformation
        a1 = ___  # FILL: Apply tanh activation
        
        # Layer 2
        z2 = ___  # FILL: Linear transformation
        
        # Softmax (stable version)
        exp_scores = np.exp(z2 - np.max(z2, axis=1, keepdims=True))
        probs = ___  # FILL: Normalize exp_scores to get probabilities
        
        cache = (X, z1, a1, z2, probs)
        return probs, cache
    
    def backward(self, cache, y_true):
        """Backward pass (provided for you)."""
        X, z1, a1, z2, probs = cache
        n_samples = X.shape[0]
        
        # Convert labels to one-hot
        one_hot = np.zeros_like(probs)
        one_hot[np.arange(n_samples), y_true] = 1
        
        # Gradients
        dz2 = (probs - one_hot) / n_samples
        dW2 = a1.T.dot(dz2)
        db2 = dz2.sum(axis=0)
        
        da1 = dz2.dot(self.W2.T)
        dz1 = da1 * (1.0 - np.tanh(z1)**2)
        dW1 = X.T.dot(dz1)
        db1 = dz1.sum(axis=0)
        
        return dW1, db1, dW2, db2
    
    def update_params(self, dW1, db1, dW2, db2):
        """Update parameters using gradient descent."""
        self.W1 -= self.learning_rate * dW1
        self.b1 -= self.learning_rate * db1
        self.W2 -= self.learning_rate * dW2
        self.b2 -= self.learning_rate * db2
    
    def train(self, X, y, epochs=20, batch_size=128, verbose=False):
        """Training loop."""
        n_samples = X.shape[0]
        losses = []
        
        for epoch in range(epochs):
            # Shuffle data
            idx = np.random.permutation(n_samples)
            X_shuf, y_shuf = X[idx], y[idx]
            
            epoch_loss = 0
            n_batches = 0
            
            for start in range(0, n_samples, batch_size):
                end = min(start + batch_size, n_samples)
                X_batch = X_shuf[start:end]
                y_batch = y_shuf[start:end]
                
                # Forward and backward
                probs, cache = self.forward(X_batch)
                grads = self.backward(cache, y_batch)
                self.update_params(*grads)
                
                # Calculate loss
                batch_loss = -np.log(probs[np.arange(len(y_batch)), y_batch] + 1e-8).mean()
                epoch_loss += batch_loss
                n_batches += 1
            
            avg_loss = epoch_loss / n_batches
            losses.append(avg_loss)
            
            if verbose and (epoch + 1) % 5 == 0:
                print(f"Epoch {epoch+1}/{epochs}, Loss: {avg_loss:.4f}")
        
        return losses
    
    def predict(self, X):
        """Predict class labels."""
        probs, _ = self.forward(X)
        return probs.argmax(axis=1)
    
    def hidden_representation(self, X):
        """Extract hidden layer features."""
        z1 = X.dot(self.W1) + self.b1
        return np.tanh(z1)
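
One way to build confidence in the provided `backward()` is a finite-difference gradient check. The sketch below is a standalone illustration on made-up logits and labels: it compares the analytic softmax cross-entropy gradient, which has the same form as `dz2`, against central differences. The helper names `softmax` and `cross_entropy` are assumptions introduced here, not part of the lab's classes.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max so np.exp never sees large positive inputs.
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(z, y):
    probs = softmax(z)
    return -np.log(probs[np.arange(len(y)), y]).mean()

rng = np.random.default_rng(0)
z = rng.standard_normal((5, 4))   # hypothetical logits: 5 samples, 4 rotation classes
y = rng.integers(0, 4, size=5)    # hypothetical labels

# Analytic gradient, same form as dz2 in backward().
one_hot = np.zeros_like(z)
one_hot[np.arange(len(y)), y] = 1
analytic = (softmax(z) - one_hot) / len(y)

# Central finite differences on every logit.
numeric = np.zeros_like(z)
eps = 1e-5
for i in range(z.shape[0]):
    for j in range(z.shape[1]):
        z_plus, z_minus = z.copy(), z.copy()
        z_plus[i, j] += eps
        z_minus[i, j] -= eps
        numeric[i, j] = (cross_entropy(z_plus, y) - cross_entropy(z_minus, y)) / (2 * eps)

max_diff = np.abs(analytic - numeric).max()
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

A small maximum difference suggests the analytic expression is right; the same pattern can be extended to `W1` and `W2` if you want to check the full network.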

### 🤔 Critical Thinking Question 2
**Why do we use tanh activation instead of ReLU for this simple network?**

*Consider: gradient flow, bounded outputs, and the historical context of when tanh was popular.*

### Step 3: Train the Rotation Classifier

In [None]:
# Create rotation dataset (using the solution)
def create_rotation_dataset_solution(X: np.ndarray, 
                          rotations: Tuple[int, ...] = (0, 90, 180, 270)
                          ) -> Tuple[np.ndarray, np.ndarray]:
    images = X.reshape(-1, 8, 8)
    rot_images = []
    rot_labels = []
    
    for idx, angle in enumerate(rotations):
        k = (angle // 90) % 4
        for img in images:
            rotated = np.rot90(img, k=k)
            rot_images.append(rotated.flatten())
            rot_labels.append(idx)
    
    return np.array(rot_images, dtype=np.float32), np.array(rot_labels, dtype=np.int64)

# Create rotated dataset
rot_X, rot_y = create_rotation_dataset_solution(X)
print(f"Rotation dataset size: {rot_X.shape}")

# Split data
X_train, X_val, y_train, y_val = train_test_split(
    rot_X, rot_y, test_size=0.2, random_state=42
)

# Train the network (with solution for forward pass)
class TwoLayerNetComplete(TwoLayerNet):
    def forward(self, X: np.ndarray) -> Tuple[np.ndarray, Tuple]:
        z1 = X.dot(self.W1) + self.b1
        a1 = np.tanh(z1)
        z2 = a1.dot(self.W2) + self.b2
        exp_scores = np.exp(z2 - np.max(z2, axis=1, keepdims=True))
        probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
        cache = (X, z1, a1, z2, probs)
        return probs, cache

# Train
net = TwoLayerNetComplete(input_dim=64, hidden_dim=32, output_dim=4, learning_rate=0.3)
losses = net.train(X_train, y_train, epochs=15, batch_size=256, verbose=True)

# Evaluate
val_preds = net.predict(X_val)
val_acc = (val_preds == y_val).mean()
print(f"\nRotation classification accuracy: {val_acc:.3f}")

# Plot training loss
plt.figure(figsize=(8, 4))
plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss for Rotation Prediction')
plt.grid(True)
plt.show()

### 💻 Exercise 3: Analyze Learned Features (Medium-Hard)
Implement a function to visualize what features the network has learned.

In [None]:
def visualize_learned_features(net, X_sample, n_features=8):
    """
    Visualize the activation patterns of hidden units.
    
    TODO: Complete this function
    1. Get hidden representations for sample images
    2. Find images that maximally activate each hidden unit
    3. Visualize these images
    """
    # Get hidden representations
    hidden = ___  # FILL: Use net.hidden_representation()
    
    # TODO: For each of the first n_features hidden units,
    # find the image that activates it most strongly
    
    fig, axes = plt.subplots(2, n_features//2, figsize=(12, 4))
    axes = axes.flatten()
    
    for i in range(min(n_features, hidden.shape[1])):
        # TODO: Find index of maximum activation for hidden unit i
        max_idx = ___  # FILL: np.argmax for column i
        
        # Reshape and display
        img = X_sample[max_idx].reshape(8, 8)
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(f"Unit {i}\nAct: {hidden[max_idx, i]:.2f}")
        axes[i].axis('off')
    
    plt.suptitle("Images that Maximally Activate Hidden Units")
    plt.tight_layout()
    plt.show()

# Test your implementation
# visualize_learned_features(net, X[:100], n_features=8)

### Step 4: Transfer Learning for Digit Classification

Now we'll use the features learned from rotation prediction for the downstream task of digit classification.
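
This evaluation style is often called a linear probe: the encoder stays frozen and only a linear classifier is trained on its outputs. The sketch below is a NumPy-only stand-in that uses softmax regression trained by plain gradient descent. The function name `linear_probe_accuracy` and the two-blob "features" are assumptions for illustration; they do not come from the lab's network.

```python
import numpy as np

def linear_probe_accuracy(train_f, train_y, test_f, test_y, n_classes,
                          epochs=200, lr=0.5):
    """Train softmax regression on frozen features; return test accuracy."""
    W = np.zeros((train_f.shape[1], n_classes))
    b = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[train_y]
    for _ in range(epochs):
        logits = train_f @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - one_hot) / len(train_y)
        W -= lr * (train_f.T @ grad)   # only the probe's weights change
        b -= lr * grad.sum(axis=0)
    preds = (test_f @ W + b).argmax(axis=1)
    return (preds == test_y).mean()

# Hypothetical stand-in for frozen encoder outputs: two well-separated blobs.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
labels = np.repeat([0, 1], 100)
order = rng.permutation(200)
feats, labels = feats[order], labels[order]

probe_acc = linear_probe_accuracy(feats[:150], labels[:150],
                                  feats[150:], labels[150:], n_classes=2)
print(f"probe accuracy: {probe_acc:.3f}")
```

Because the probe is linear, its accuracy reflects how linearly separable the classes are in the feature space, which is the property the comparison against raw pixels is meant to measure.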

In [None]:
# Extract features using the pretrained network
features = net.hidden_representation(X)

# Split for downstream task
X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(
    features, y, test_size=0.3, random_state=1
)

X_train_raw, X_test_raw, y_train_raw, y_test_raw = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Train classifiers
# 'lbfgs' and automatic multiclass handling are the defaults; the multi_class
# argument is deprecated in recent scikit-learn releases, so it is omitted here.
clf_feat = LogisticRegression(max_iter=200)
clf_feat.fit(X_train_d, y_train_d)
feat_acc = clf_feat.score(X_test_d, y_test_d)

clf_base = LogisticRegression(max_iter=200)
clf_base.fit(X_train_raw, y_train_raw)
base_acc = clf_base.score(X_test_raw, y_test_raw)

print(f"Digit classification using SSL features: {feat_acc:.3f}")
print(f"Baseline classification on raw pixels: {base_acc:.3f}")
print(f"Difference (SSL features - raw pixels): {(feat_acc - base_acc)*100:+.1f} percentage points")

# Visualize confusion matrices
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# SSL features confusion matrix
y_pred_feat = clf_feat.predict(X_test_d)
cm_feat = confusion_matrix(y_test_d, y_pred_feat)
sns.heatmap(cm_feat, annot=True, fmt='d', ax=ax1, cmap='Blues')
ax1.set_title(f'SSL Features (Acc: {feat_acc:.3f})')
ax1.set_xlabel('Predicted')
ax1.set_ylabel('True')

# Baseline confusion matrix
y_pred_base = clf_base.predict(X_test_raw)
cm_base = confusion_matrix(y_test_raw, y_pred_base)
sns.heatmap(cm_base, annot=True, fmt='d', ax=ax2, cmap='Blues')
ax2.set_title(f'Raw Pixels (Acc: {base_acc:.3f})')
ax2.set_xlabel('Predicted')
ax2.set_ylabel('True')

plt.tight_layout()
plt.show()

### 📝 Multiple Choice Question 2
**Why might SSL features sometimes underperform raw pixels on small, simple datasets?**

In [None]:
mcq2 = MultipleChoiceQuestion(
    "Why might SSL features sometimes underperform raw pixels on small, simple datasets?",
    [
        "The pretext task is too difficult",
        "Information bottleneck: features may lose task-specific details",
        "SSL always performs worse than supervised learning",
        "The network architecture is too complex"
    ],
    2,
    "The learned representation is a bottleneck optimized for the pretext task: it captures general features but may discard fine-grained details that matter for a simple downstream task. SSL tends to pay off when labels are scarce or the data is complex."
)

mcq2.display()
# mcq2.check_answer(YOUR_ANSWER)

## Module 3: Time Series - Autoencoder

Now we'll explore generative SSL using autoencoders on synthetic time series data.

### Step 1: Generate Synthetic Data

In [None]:
def generate_sine_sequences(n_samples=1000, length=50, freq0=1.0, freq1=3.0, noise_std=0.1):
    """Generate sine sequences with two different frequencies."""
    t = np.linspace(0, 2 * np.pi, length)
    half = n_samples // 2
    
    # Class 0: low frequency
    seq0 = np.sin(freq0 * t)[None, :] * np.ones((half, 1))
    # Class 1: high frequency
    seq1 = np.sin(freq1 * t)[None, :] * np.ones((n_samples - half, 1))
    
    X = np.concatenate([seq0, seq1], axis=0)
    X += np.random.normal(scale=noise_std, size=X.shape)
    y = np.concatenate([np.zeros(half, dtype=int), np.ones(n_samples - half, dtype=int)])
    
    return X.astype(np.float32), y

# Generate data
X_ts, y_ts = generate_sine_sequences()

# Visualize examples
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
for i in range(3):
    plt.plot(X_ts[i], alpha=0.7, label=f"Sample {i+1}")
plt.title("Class 0: Low Frequency")
plt.xlabel("Time")
plt.ylabel("Value")
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
for i in range(3):
    plt.plot(X_ts[-(i+1)], alpha=0.7, label=f"Sample {i+1}")
plt.title("Class 1: High Frequency")
plt.xlabel("Time")
plt.ylabel("Value")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
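
The two classes differ only in frequency, so a frequency-domain view makes the separation explicit. The sketch below rebuilds the noise-free templates on the same time grid as `generate_sine_sequences` and reports the strongest non-DC bin of the real FFT. The helper name `dominant_bin` is an assumption introduced for this illustration.

```python
import numpy as np

length = 50
t = np.linspace(0, 2 * np.pi, length)   # same time grid as generate_sine_sequences

def dominant_bin(signal):
    """Index of the strongest non-DC frequency bin in the real FFT."""
    spectrum = np.abs(np.fft.rfft(signal))
    return int(np.argmax(spectrum[1:]) + 1)

low = np.sin(1.0 * t)    # noise-free class 0 template
high = np.sin(3.0 * t)   # noise-free class 1 template

print("class 0 dominant bin:", dominant_bin(low))
print("class 1 dominant bin:", dominant_bin(high))
```

The templates peak at different bins, which is the structure a good embedding of these sequences should preserve.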

### 💻 Exercise 4: Implement Autoencoder Training (Hard)
Complete the autoencoder implementation, focusing on the reconstruction loss.

In [None]:
@dataclass
class Autoencoder:
    """Simple autoencoder for time series."""
    input_dim: int
    hidden_dim: int
    learning_rate: float = 0.1
    
    def __post_init__(self):
        rng = np.random.default_rng(1)
        self.W_enc = rng.standard_normal((self.input_dim, self.hidden_dim)) * 0.05
        self.b_enc = np.zeros(self.hidden_dim)
        self.W_dec = rng.standard_normal((self.hidden_dim, self.input_dim)) * 0.05
        self.b_dec = np.zeros(self.input_dim)
    
    def forward(self, X):
        """Forward pass: encode then decode."""
        # Encode
        z = X.dot(self.W_enc) + self.b_enc
        h = np.tanh(z)
        
        # Decode
        recon = h.dot(self.W_dec) + self.b_dec
        return h, recon
    
    def compute_loss(self, X, recon):
        """
        TODO: Implement mean squared error loss
        """
        # FILL: Compute MSE between X and recon
        loss = ___
        return loss
    
    def backward(self, X, h, recon):
        """
        TODO: Complete the backward pass
        Hint: Start from dL/d_recon = (recon - X) / n_samples
        """
        n_samples = X.shape[0]
        
        # Gradient of loss w.r.t reconstruction
        d_recon = ___  # FILL: (recon - X) / n_samples
        
        # Decoder gradients
        dW_dec = ___  # FILL: h.T @ d_recon
        db_dec = ___  # FILL: sum over samples
        
        # Propagate to encoder
        dh = ___  # FILL: d_recon @ W_dec.T
        dz = dh * (1.0 - h**2)  # tanh derivative
        
        # Encoder gradients
        dW_enc = ___  # FILL: X.T @ dz
        db_enc = ___  # FILL: sum over samples
        
        return dW_enc, db_enc, dW_dec, db_dec
    
    def update_params(self, dW_enc, db_enc, dW_dec, db_dec):
        self.W_enc -= self.learning_rate * dW_enc
        self.b_enc -= self.learning_rate * db_enc
        self.W_dec -= self.learning_rate * dW_dec
        self.b_dec -= self.learning_rate * db_dec
    
    def train(self, X, epochs=30, batch_size=64, verbose=False):
        n_samples = X.shape[0]
        losses = []
        
        for epoch in range(epochs):
            idx = np.random.permutation(n_samples)
            X_shuf = X[idx]
            
            epoch_loss = 0
            n_batches = 0
            
            for start in range(0, n_samples, batch_size):
                end = min(start + batch_size, n_samples)
                X_batch = X_shuf[start:end]
                
                h, recon = self.forward(X_batch)
                loss = self.compute_loss(X_batch, recon)
                grads = self.backward(X_batch, h, recon)
                self.update_params(*grads)
                
                epoch_loss += loss
                n_batches += 1
            
            avg_loss = epoch_loss / n_batches
            losses.append(avg_loss)
            
            if verbose and (epoch + 1) % 10 == 0:
                print(f"Epoch {epoch+1}/{epochs}, Loss: {avg_loss:.4f}")
        
        return losses
    
    def encode(self, X):
        z = X.dot(self.W_enc) + self.b_enc
        return np.tanh(z)
    
    def reconstruct(self, X):
        h = self.encode(X)
        return h.dot(self.W_dec) + self.b_dec
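
A note on scaling: the hinted gradient `(recon - X) / n_samples` is the exact gradient of the loss `1/(2n) * sum((recon - X)**2)`, while `np.mean((X - recon)**2)` averages over every entry. The two differ by a constant factor of `d / 2`, where `d` is the sequence length. That factor behaves like a change of learning rate and does not move the minimum. The sketch below checks this numerically on random arrays; the shapes and helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 5                          # hypothetical batch size and sequence length
X = rng.standard_normal((n, d))
recon = rng.standard_normal((n, d))

def half_sse_per_sample(recon, X):
    # L = 1/(2n) * sum of squared errors: its gradient is (recon - X) / n
    return 0.5 * np.sum((recon - X) ** 2) / len(X)

def mse(recon, X):
    # np.mean averages over all n * d entries
    return np.mean((recon - X) ** 2)

def numeric_grad(loss_fn, recon, X, eps=1e-6):
    # Central finite differences with respect to every entry of recon.
    grad = np.zeros_like(recon)
    for idx in np.ndindex(recon.shape):
        plus, minus = recon.copy(), recon.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (loss_fn(plus, X) - loss_fn(minus, X)) / (2 * eps)
    return grad

hinted = (recon - X) / n
matches_half_sse = np.allclose(hinted, numeric_grad(half_sse_per_sample, recon, X), atol=1e-6)
matches_scaled_mse = np.allclose(hinted, (d / 2) * numeric_grad(mse, recon, X), atol=1e-6)
print(matches_half_sse, matches_scaled_mse)
```

If you prefer the reported loss and the gradient to match exactly, either report the half-sum loss or scale the gradient by `2 / d`; both are reasonable choices.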

### Train and Evaluate the Autoencoder

In [None]:
# Complete autoencoder implementation (solution)
class AutoencoderComplete(Autoencoder):
    def compute_loss(self, X, recon):
        return np.mean((X - recon)**2)
    
    def backward(self, X, h, recon):
        n_samples = X.shape[0]
        d_recon = (recon - X) / n_samples
        dW_dec = h.T.dot(d_recon)
        db_dec = d_recon.sum(axis=0)
        dh = d_recon.dot(self.W_dec.T)
        dz = dh * (1.0 - h**2)
        dW_enc = X.T.dot(dz)
        db_enc = dz.sum(axis=0)
        return dW_enc, db_enc, dW_dec, db_dec

# Normalize data
X_ts_norm = (X_ts - X_ts.mean(axis=1, keepdims=True)) / (X_ts.std(axis=1, keepdims=True) + 1e-6)

# Split data
X_train_ts, X_test_ts, y_train_ts, y_test_ts = train_test_split(
    X_ts_norm, y_ts, test_size=0.3, random_state=0
)

# Train autoencoder
ae = AutoencoderComplete(input_dim=X_train_ts.shape[1], hidden_dim=16, learning_rate=0.05)
ae_losses = ae.train(X_train_ts, epochs=30, batch_size=128, verbose=True)

# Plot reconstruction quality
n_examples = 4
fig, axes = plt.subplots(n_examples, 2, figsize=(10, 8))

for i in range(n_examples):
    idx = i * (len(X_test_ts) // n_examples)  # Spread samples across the test set without indexing past its end
    original = X_test_ts[idx]
    reconstructed = ae.reconstruct(original.reshape(1, -1)).flatten()
    
    axes[i, 0].plot(original, 'b-', label='Original')
    axes[i, 0].plot(reconstructed, 'r--', label='Reconstructed')
    axes[i, 0].set_ylabel(f"Seq {idx}")
    axes[i, 0].legend()
    axes[i, 0].grid(True)
    
    axes[i, 1].plot(original - reconstructed, 'g-')
    axes[i, 1].set_ylabel("Error")
    axes[i, 1].grid(True)
    
axes[0, 0].set_title("Sequence Reconstruction")
axes[0, 1].set_title("Reconstruction Error")
axes[-1, 0].set_xlabel("Time")
axes[-1, 1].set_xlabel("Time")

plt.tight_layout()
plt.show()

### Transfer Learning: Classification with Embeddings

In [None]:
# Get embeddings
train_emb = ae.encode(X_train_ts)
test_emb = ae.encode(X_test_ts)

# Train classifiers
clf_emb = LogisticRegression(max_iter=200)
clf_emb.fit(train_emb, y_train_ts)
emb_acc = clf_emb.score(test_emb, y_test_ts)

clf_raw = LogisticRegression(max_iter=200)
clf_raw.fit(X_train_ts, y_train_ts)
raw_acc = clf_raw.score(X_test_ts, y_test_ts)

print(f"Classification using embeddings: {emb_acc:.3f}")
print(f"Classification using raw sequences: {raw_acc:.3f}")
print(f"Dimensionality reduction: {X_train_ts.shape[1]} → {train_emb.shape[1]}")

# Visualize embeddings with t-SNE
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=42)
emb_2d = tsne.fit_transform(test_emb[:300])  # Use subset for speed

plt.figure(figsize=(8, 6))
scatter = plt.scatter(emb_2d[:, 0], emb_2d[:, 1], c=y_test_ts[:300], cmap='coolwarm', alpha=0.7)
plt.colorbar(scatter, label='Class')
plt.xlabel('t-SNE 1')
plt.ylabel('t-SNE 2')
plt.title('Autoencoder Embeddings Visualization')
plt.grid(True, alpha=0.3)
plt.show()

## Module 4: Advanced Concepts and Extensions

### 💻 Exercise 5: Implement Contrastive Learning (Advanced)
Implement a simple contrastive loss for the time series data.

In [None]:
def create_augmented_pairs(X, augmentation_noise=0.1):
    """
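    Create positive pairs by adding noise to sequences.
    
    TODO: Complete this function
    - For each sequence, create a positive partner by adding small Gaussian noise
    - Return the anchors and their positive partners (the test code below labels every pair as positive)
    - Optional extension: also build negative pairs from different sequences and return labels (1 for positive, 0 for negative)
    """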
    n_samples = len(X)
    
    # Positive pairs (same sequence with noise)
    X_anchor = X.copy()
    X_positive = ___  # FILL: X + noise
    
    # Negative pairs (different sequences)
    # TODO: Create negative pairs by shuffling indices
    
    return X_anchor, X_positive

def contrastive_loss(embeddings1, embeddings2, labels, margin=1.0):
    """
    Compute contrastive loss.
    
    TODO: Implement the loss
    L = y * d^2 + (1-y) * max(0, margin - d)^2
    where d is the Euclidean distance between embeddings
    """
    # FILL: Compute pairwise distances
    distances = ___
    
    # FILL: Compute loss
    loss = ___
    
    return loss

# Skeleton code for testing
# X_anchor, X_positive = create_augmented_pairs(X_train_ts[:100])
# emb1 = ae.encode(X_anchor)
# emb2 = ae.encode(X_positive)
# labels = np.ones(len(X_anchor))  # All positive pairs
# loss = contrastive_loss(emb1, emb2, labels)
# print(f"Contrastive loss: {loss:.4f}")
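
For reference, here is one possible way to complete the exercise, written as a standalone sketch so it does not overwrite your own attempt. The names `make_pairs` and `contrastive_loss_sketch` are hypothetical, and building negatives with `np.roll` is just one simple choice: it pairs each sequence with its neighbour, which may belong to the same class.

```python
import numpy as np

def make_pairs(X, noise_std=0.1, seed=0):
    """Return (anchors, partners, labels): first half positive, second half negative."""
    rng = np.random.default_rng(seed)
    positives = X + rng.normal(scale=noise_std, size=X.shape)
    negatives = np.roll(X, shift=1, axis=0)   # pair each row with its neighbour
    anchors = np.concatenate([X, X])
    partners = np.concatenate([positives, negatives])
    labels = np.concatenate([np.ones(len(X)), np.zeros(len(X))])
    return anchors, partners, labels

def contrastive_loss_sketch(emb1, emb2, labels, margin=1.0):
    """L = y * d^2 + (1 - y) * max(0, margin - d)^2, averaged over pairs."""
    d = np.linalg.norm(emb1 - emb2, axis=1)
    pos_term = labels * d ** 2
    neg_term = (1 - labels) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(pos_term + neg_term))

# Hypothetical embeddings, used here in place of ae.encode(...)
emb = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
anchors, partners, labels = make_pairs(emb, noise_std=0.0)
loss = contrastive_loss_sketch(anchors, partners, labels)
print(loss)
```

With zero noise the positive pairs coincide, and every negative pair in this toy example is farther apart than the margin, so neither term contributes to the loss.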

## Assessment Module: Open-Ended Questions with Gemini Verification

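This section includes open-ended questions designed for automatic evaluation with the Gemini API. As written, `_call_gemini_api` is a placeholder: it builds the request but returns a simulated response instead of calling the API. Without an API key, the class falls back to rubric-based self-evaluation.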

In [None]:
import json
import os
from typing import Dict, List, Optional
import requests

class OpenEndedAssessment:
    """Handle open-ended questions with AI verification."""
    
    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.getenv('GEMINI_API_KEY')
        self.questions = self._load_questions()
    
    def _load_questions(self) -> List[Dict]:
        """Load assessment questions."""
        return [
            {
                "id": "q1",
                "question": "Explain why rotation prediction is an effective pretext task for learning visual features. What properties of the task make it useful?",
                "rubric": [
                    "Mentions that rotation is a geometric transformation",
                    "Notes that it requires understanding object structure",
                    "Explains that labels are free (self-supervised)",
                    "Discusses invariance/equivariance properties"
                ],
                "sample_answer": "Rotation prediction works because it forces the network to understand spatial structure and object geometry. The task requires recognizing features regardless of orientation, learning rotation-equivariant representations. Labels are automatically generated without human annotation."
            },
            {
                "id": "q2",
                "question": "Compare and contrast autoencoders with contrastive learning methods. When would you choose one over the other?",
                "rubric": [
                    "Identifies autoencoders as generative/reconstructive",
                    "Identifies contrastive as discriminative",
                    "Mentions computational efficiency differences",
                    "Discusses use cases for each"
                ],
                "sample_answer": "Autoencoders learn by reconstruction, capturing all input details including noise. Contrastive methods learn by comparing samples, focusing on discriminative features. Autoencoders are simpler but may learn trivial solutions. Contrastive methods are more robust but require careful augmentation design."
            },
            {
                "id": "q3",
                "question": "Design a novel pretext task for learning representations from text data. Explain your reasoning.",
                "rubric": [
                    "Proposes a specific, implementable task",
                    "Explains how labels are generated automatically",
                    "Justifies why the task would learn useful features",
                    "Considers computational feasibility"
                ],
                "sample_answer": "One novel task could be 'sentence ordering': given shuffled sentences from a paragraph, predict the correct order. This requires understanding discourse structure, temporal relationships, and causal dependencies. Labels come from the original ordering, making it fully self-supervised."
            }
        ]
    
    def evaluate_answer(self, question_id: str, user_answer: str) -> Dict:
        """Evaluate user answer using Gemini API."""
        question_data = next((q for q in self.questions if q['id'] == question_id), None)
        if not question_data:
            return {"error": "Question not found"}
        
        if not self.api_key:
            return self._manual_evaluation(question_data, user_answer)
        
        # Prepare evaluation prompt
        evaluation_prompt = self._create_evaluation_prompt(question_data, user_answer)
        
        # Call Gemini API
        try:
            response = self._call_gemini_api(evaluation_prompt)
            return self._parse_evaluation(response)
        except Exception as e:
            return {"error": f"API call failed: {str(e)}"}
    
    def _create_evaluation_prompt(self, question_data: Dict, user_answer: str) -> str:
        """Create prompt for Gemini evaluation."""
        prompt = f"""
        Evaluate the following answer to a self-supervised learning question.
        
        Question: {question_data['question']}
        
        Evaluation Rubric:
        {json.dumps(question_data['rubric'], indent=2)}
        
        Reference Answer: {question_data['sample_answer']}
        
        User Answer: {user_answer}
        
        Please evaluate the answer and provide:
        1. Score (0-100)
        2. Which rubric points were addressed
        3. What was missing or could be improved
        4. Any misconceptions to correct
        
        Format your response as JSON:
        {{
            "score": <number>,
            "rubric_met": [<list of rubric points addressed>],
            "strengths": "<what was done well>",
            "improvements": "<what could be better>",
            "feedback": "<constructive feedback for the student>"
        }}
        """
        return prompt
    
    def _call_gemini_api(self, prompt: str) -> str:
        """Call Gemini API for evaluation."""
        # This is a placeholder - actual implementation would use the Gemini API
        # For demonstration, we'll simulate the response
        url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key={self.api_key}"
        
        payload = {
            "contents": [{
                "parts": [{"text": prompt}]
            }]
        }
        
        # Simulated response for demonstration
        simulated_response = {
            "score": 85,
            "rubric_met": ["Mentions geometric transformation", "Explains free labels"],
            "strengths": "Good understanding of core concepts",
            "improvements": "Could discuss invariance properties more",
            "feedback": "Strong answer! Consider exploring how rotation prediction relates to downstream tasks."
        }
        
        return json.dumps(simulated_response)
    
    def _parse_evaluation(self, response: str) -> Dict:
        """Parse Gemini API response."""
        try:
            return json.loads(response)
        except json.JSONDecodeError:
            return {"error": "Failed to parse API response"}
    
    def _manual_evaluation(self, question_data: Dict, user_answer: str) -> Dict:
        """Provide manual evaluation guidance when API is not available."""
        return {
            "message": "API key not configured. Please self-evaluate using the rubric.",
            "rubric": question_data['rubric'],
            "sample_answer": question_data['sample_answer'],
            "self_evaluation_guide": [
                "Compare your answer to the sample",
                "Check each rubric point",
                "Award 25 points per rubric item addressed",
                "Consider partial credit for partially addressed items"
            ]
        }

# Initialize assessment system
assessment = OpenEndedAssessment()

# Display questions
print("📝 Open-Ended Assessment Questions\n")
for q in assessment.questions:
    print(f"Question {q['id']}: {q['question']}\n")
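
`_call_gemini_api` above returns a canned response. As a rough sketch of what a live call might look like, the block below builds the same URL and payload with the standard library and pulls the model's text out of the reply. The helper names (`build_gemini_request`, `extract_evaluation`, `call_gemini`) are hypothetical, the `generationConfig` values mirror this lab's `api_configuration`, and the response shape (`candidates[0].content.parts[0].text`) is an assumption to check against the current Gemini API documentation.

```python
import json
import urllib.request
from typing import Any, Dict

API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"
FENCE = "`" * 3  # Markdown code-fence marker that models sometimes emit


def build_gemini_request(prompt: str, api_key: str, model: str = "gemini-pro",
                         temperature: float = 0.3,
                         max_tokens: int = 1000) -> urllib.request.Request:
    """Build (but do not send) a generateContent POST request."""
    url = f"{API_ROOT}/{model}:generateContent?key={api_key}"
    payload = {
        "contents": [{"parts": [{"text": prompt}]}],
        # Assumed field names for sampling settings; verify before relying on them.
        "generationConfig": {"temperature": temperature,
                             "maxOutputTokens": max_tokens},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_evaluation(response_body: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the model's text out of a response body and parse it as JSON."""
    try:
        text = response_body["candidates"][0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError, TypeError):
        return {"error": "Unexpected API response shape"}
    text = text.strip()
    # Strip a surrounding Markdown fence (optionally tagged "json") if present.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {"error": "Failed to parse API response"}


def call_gemini(prompt: str, api_key: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and return the parsed evaluation (needs network access)."""
    request = build_gemini_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_evaluation(json.load(resp))
```

Keeping request construction and response parsing separate from the network call makes both testable without an API key. Errors come back as `{"error": ...}`, matching what `_parse_evaluation` already returns on failure.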

### Answer Submission Interface

In [None]:
def submit_answer(question_id: str, answer: str):
    """Submit and evaluate an answer."""
    print(f"\n📊 Evaluating answer for question {question_id}...\n")
    result = assessment.evaluate_answer(question_id, answer)
    
    if 'error' in result:
        print(f"❌ Error: {result['error']}")
    elif 'message' in result:
        print(f"ℹ️ {result['message']}\n")
        print("Rubric:")
        for item in result['rubric']:
            print(f"  • {item}")
        print(f"\nSample Answer: {result['sample_answer']}")
    else:
        print(f"Score: {result['score']}/100")
        print(f"\n✅ Strengths: {result['strengths']}")
        print(f"\n📈 Areas for Improvement: {result['improvements']}")
        print(f"\n💡 Feedback: {result['feedback']}")

# Example usage:
# submit_answer("q1", "Rotation prediction helps because...")
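
When no API key is configured, the manual-evaluation guide suggests 25 points per rubric item, with partial credit for items only partly addressed. A minimal sketch of that arithmetic follows. `self_score` is a hypothetical helper, and the rubric strings are placeholders rather than the lab's actual rubric.

```python
from typing import Dict, List

POINTS_PER_ITEM = 25  # from the self-evaluation guide: 25 points per rubric item


def self_score(rubric: List[str], credit: Dict[str, float]) -> float:
    """Sum self-awarded credit over rubric items.

    `credit` maps a rubric item to a fraction in [0, 1]:
    1.0 = fully addressed, 0.5 = partially addressed, missing = 0.
    """
    total = 0.0
    for item in rubric:
        # Clamp so a typo such as 2.0 cannot push the score past the maximum.
        fraction = min(max(credit.get(item, 0.0), 0.0), 1.0)
        total += fraction * POINTS_PER_ITEM
    return total


# Placeholder rubric with four items (illustrative only).
rubric = [
    "Mentions geometric transformation",
    "Explains free labels",
    "Placeholder item C",
    "Placeholder item D",
]
# Two items fully addressed, one partially, one not at all.
credit = {rubric[0]: 1.0, rubric[1]: 1.0, rubric[2]: 0.5}
print(self_score(rubric, credit))  # 25 + 25 + 12.5 + 0 = 62.5
```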

### Assessment Configuration File
Save this configuration for external assessment systems:

In [None]:
# Create assessment configuration
assessment_config = {
    "lab_title": "Self-Supervised Neural Networks Lab",
    "version": "2.0",
    "modules": [
        {
            "name": "Introduction to SSL",
            "weight": 0.2,
            "assessments": ["mcq1", "q1"]
        },
        {
            "name": "Vision - Rotation Prediction",
            "weight": 0.3,
            "assessments": ["exercise1", "exercise2", "mcq2"]
        },
        {
            "name": "Time Series - Autoencoder",
            "weight": 0.3,
            "assessments": ["exercise4", "q2"]
        },
        {
            "name": "Advanced Concepts",
            "weight": 0.2,
            "assessments": ["exercise5", "q3"]
        }
    ],
    "grading_scheme": {
        "exercises": {
            "exercise1": {"points": 10, "difficulty": "easy"},
            "exercise2": {"points": 15, "difficulty": "medium"},
            "exercise3": {"points": 15, "difficulty": "medium-hard"},
            "exercise4": {"points": 20, "difficulty": "hard"},
            "exercise5": {"points": 20, "difficulty": "advanced"}
        },
        "mcqs": {
            "mcq1": {"points": 5},
            "mcq2": {"points": 5}
        },
        "open_ended": {
            "q1": {"points": 10, "rubric_items": 4},
            "q2": {"points": 10, "rubric_items": 4},
            "q3": {"points": 10, "rubric_items": 4}
        },
        "total_points": 120
    },
    "api_configuration": {
        "provider": "gemini",
        "model": "gemini-pro",
        "temperature": 0.3,
        "max_tokens": 1000
    }
}

# Save configuration
with open('assessment_config.json', 'w') as f:
    json.dump(assessment_config, f, indent=2)

print("✅ Assessment configuration saved to assessment_config.json")
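
Because the configuration is meant for external systems, a quick consistency check before saving can catch mismatches between the module list and the grading scheme. The sketch below assumes the same dictionary layout as `assessment_config`. `check_assessment_config` and `demo_config` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, List


def check_assessment_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    scheme = config["grading_scheme"]

    # Collect points for every graded item across the three groups.
    graded = {}
    for group in ("exercises", "mcqs", "open_ended"):
        for name, spec in scheme.get(group, {}).items():
            graded[name] = spec["points"]

    points_sum = sum(graded.values())
    if points_sum != scheme["total_points"]:
        problems.append(
            f"total_points is {scheme['total_points']} but items sum to {points_sum}"
        )

    # Module weights are expected to sum to 1 (within floating-point tolerance).
    weight_sum = sum(m["weight"] for m in config["modules"])
    if abs(weight_sum - 1.0) > 1e-9:
        problems.append(f"module weights sum to {weight_sum}, expected 1.0")

    # Every graded item should belong to a module, and vice versa.
    assigned = {a for m in config["modules"] for a in m["assessments"]}
    for name in sorted(set(graded) - assigned):
        problems.append(f"'{name}' is graded but not assigned to any module")
    for name in sorted(assigned - set(graded)):
        problems.append(f"'{name}' is assigned to a module but has no points")
    return problems


# Tiny illustrative configuration: 'ex2' has points but no module.
demo_config = {
    "modules": [
        {"name": "A", "weight": 0.5, "assessments": ["ex1"]},
        {"name": "B", "weight": 0.5, "assessments": ["q1"]},
    ],
    "grading_scheme": {
        "exercises": {"ex1": {"points": 10}, "ex2": {"points": 5}},
        "mcqs": {},
        "open_ended": {"q1": {"points": 10}},
        "total_points": 25,
    },
}
for problem in check_assessment_config(demo_config):
    print(problem)
```

Run against `assessment_config` as defined in this lab, the check would report `exercise3`, which carries 15 points but appears in no module's `assessments` list.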

## Summary and Next Steps

### 🎉 Congratulations!
You've completed the interactive SSL lab. You've learned:
- The principles of self-supervised learning
- How to implement pretext tasks (rotation prediction)
- Generative SSL with autoencoders
- Transfer learning with learned representations
- The trade-offs between different SSL approaches

### 📚 Further Reading
1. **Vision SSL:**
   - [SimCLR Paper](https://arxiv.org/abs/2002.05709)
   - [MoCo Paper](https://arxiv.org/abs/1911.05722)
   - [MAE (Masked Autoencoders)](https://arxiv.org/abs/2111.06377)

2. **Language SSL:**
   - [BERT Paper](https://arxiv.org/abs/1810.04805)
   - [GPT-3 Paper](https://arxiv.org/abs/2005.14165)

3. **Time Series SSL:**
   - [TS2Vec](https://arxiv.org/abs/2106.10466)
   - [TNC (Temporal Neighborhood Coding)](https://arxiv.org/abs/2106.00750)

### 🚀 Challenge Extensions
1. Implement SimCLR for the digits dataset
2. Try masked patch prediction as a pretext task
3. Combine multiple pretext tasks (multi-task SSL)
4. Apply SSL to your own dataset
5. Implement momentum contrast (MoCo)

### 🤝 Join the Community
- Share your results and implementations
- Contribute new pretext tasks
- Help improve this lab

Remember: SSL is about creativity in finding supervision signals within the data itself!