# 🚀 Google Colab Pro Setup Guide
# Tri-Objective Robust XAI for Medical Imaging

This notebook will help you set up the complete project environment on Google Colab Pro.

## 📋 Prerequisites
1. **Google Colab Pro** subscription (for better GPU access)
2. **GitHub repository** up to date: `viraj1011JAIN/tri-objective-robust-xai-medimg`
3. **Data zip file** uploaded to Google Drive
4. **GPU Runtime** enabled (Runtime → Change runtime type → GPU)

## 🎯 What This Notebook Does
1. Mounts Google Drive
2. Clones the GitHub repository
3. Extracts your data from Google Drive
4. Installs PyTorch with CUDA support
5. Installs all project dependencies
6. Verifies GPU and environment setup
7. Runs quick tests to ensure everything works
8. Provides training examples

---

## ⚠️ Important Notes
- **Save checkpoints to Google Drive** regularly to prevent data loss
- **Keep session alive** - Colab disconnects after ~90 minutes of inactivity
- **Use T4/V100/A100 GPU** for best performance (check Runtime type)
- **Update paths** in cells marked with `# TODO` to match your Google Drive structure

## Step 1: Mount Google Drive

This will allow access to your data files stored in Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Verify mount
import os
print("✅ Google Drive mounted successfully!")
print(f"📁 Drive contents: {os.listdir('/content/drive/MyDrive/')[:10]}")  # Show first 10 items

## Step 2: Clone GitHub Repository

Clone the project repository from GitHub.

In [None]:
# Clone the repository
!git clone https://github.com/viraj1011JAIN/tri-objective-robust-xai-medimg.git
%cd tri-objective-robust-xai-medimg

# Verify repository contents
!echo "✅ Repository cloned successfully!"
!echo ""
!echo "📂 Project structure:"
!ls -la

!echo ""
!echo "🔍 Checking key directories:"
!ls -d src/ tests/ configs/ notebooks/ 2>/dev/null && echo "Directories found!" || echo "⚠️ One or more expected directories are missing"
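
The directory check in Step 2 can also be done in Python, which makes it easy to see exactly which directories are absent. This is a minimal sketch: the helper name `missing_dirs` is hypothetical, and the directory names are simply the ones listed in the shell command above.

```python
from pathlib import Path

# Directory names taken from the shell check in Step 2.
EXPECTED_DIRS = ("src", "tests", "configs", "notebooks")


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).is_dir()]


# Run from the repository root (the working directory after `%cd`).
missing = missing_dirs(".")
if missing:
    print(f"Missing directories: {', '.join(missing)}")
else:
    print("All expected directories are present.")
```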

## Step 3: Extract Data from Google Drive

**📍 TODO: Update the `zip_path` variable to match your Google Drive structure!**

Your zip file should contain datasets like ISIC2018, NIH CXR-14, etc.

In [None]:
import zipfile
import os
from pathlib import Path

# TODO: Update this path to your actual zip file location in Google Drive
zip_path = '/content/drive/MyDrive/dissertation_data/medical_imaging_data.zip'

# Create data directory if it doesn't exist
data_dir = Path('./data/raw')
data_dir.mkdir(parents=True, exist_ok=True)

print(f"📦 Extracting data from: {zip_path}")
print(f"📁 Extracting to: {data_dir}")
print("⏳ This may take several minutes depending on data size...")

try:
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        # Get total file count
        file_list = zip_ref.namelist()
        total_files = len(file_list)
        
        print(f"📊 Total files to extract: {total_files}")
        
        # Extract all files
        zip_ref.extractall(data_dir)
        
    print("✅ Data extracted successfully!")
    print("\n📂 Extracted contents:")
    !ls -lh ./data/raw/

    # Check for expected datasets
    print("\n🔍 Checking for expected datasets:")
    expected_datasets = ['isic2018', 'nih_cxr14', 'derm7pt']
    for dataset in expected_datasets:
        dataset_path = data_dir / dataset
        if dataset_path.exists():
            print(f"  ✅ {dataset} found")
        else:
            print(f"  ⚠️  {dataset} not found (might be named differently)")

except FileNotFoundError:
    print(f"❌ ERROR: Zip file not found at {zip_path}")
    print("Please update the zip_path variable to match your Google Drive structure.")
    print("\n💡 Tip: Check your Google Drive path by running:")
    print("!ls -la /content/drive/MyDrive/")
except Exception as e:
    print(f"❌ ERROR during extraction: {str(e)}")
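
Because extraction can take several minutes, it may be worth checking beforehand that the Colab disk can hold the unpacked data. The sketch below uses only the standard library; the helper names and the `safety_factor` margin are illustrative assumptions, not part of the project.

```python
import shutil
import zipfile


def uncompressed_size_bytes(zip_path):
    """Sum the uncompressed sizes recorded in the archive's central directory."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        return sum(info.file_size for info in archive.infolist())


def has_room_for(zip_path, target_dir, safety_factor=1.1):
    """Return True if the filesystem holding target_dir can fit the extracted archive.

    safety_factor is an illustrative margin, not a Colab requirement.
    """
    needed = uncompressed_size_bytes(zip_path) * safety_factor
    free = shutil.disk_usage(target_dir).free
    return free >= needed
```

With the variables from the Step 3 cell, a call such as `has_room_for(zip_path, data_dir)` before `extractall` would report whether the extraction is likely to fit.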

## Step 4: Install PyTorch with CUDA Support

Installing PyTorch optimized for Colab's CUDA environment.

In [None]:
# Install PyTorch built for CUDA 12.1 (adjust the index URL if your Colab runtime reports a different CUDA version)
print("🔧 Installing PyTorch with CUDA support...")
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Verify installation
import torch
print("\n✅ PyTorch installed successfully!")
print(f"📦 PyTorch version: {torch.__version__}")
print(f"🔥 CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"🎮 CUDA version: {torch.version.cuda}")
    print(f"💻 GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
    print(f"🚀 cuDNN version: {torch.backends.cudnn.version()}")
else:
    print("⚠️  WARNING: CUDA not available! Please check Runtime settings:")
    print("   Go to Runtime → Change runtime type → Select GPU")

## Step 5: Install Project Dependencies

Installing all required packages from `requirements.txt`.

In [None]:
print("🔧 Installing project dependencies...")
print("⏳ This may take 2-3 minutes...\n")

# Install requirements
!pip install -q -r requirements.txt

# Install project in editable mode
!pip install -q -e .

print("\n✅ All dependencies installed successfully!")

# Show installed packages (key ones); -i makes the match case-insensitive (e.g. "Pillow")
print("\n📦 Key installed packages:")
!pip list | grep -iE "(torch|numpy|pandas|scikit|pillow|albumentations|timm|pydantic|mlflow)"

## Step 6: Verify Project Setup

Test that all project modules can be imported correctly.

In [None]:
print("🔍 Verifying project imports...\n")

# Test imports from all major modules
try:
    # Losses
    from src.losses.robust_loss import TRADESLoss, MARTLoss, AdversarialTrainingLoss
    print("✅ Robust losses imported successfully")

    # Training
    from src.training.adversarial_trainer import AdversarialTrainer, train_adversarial_epoch, validate_robust
    print("✅ Adversarial trainer imported successfully")

    # Attacks
    from src.attacks import PGD, FGSM, CW, AutoAttack
    from src.attacks.pgd import PGDConfig
    print("✅ Attack modules imported successfully")

    # Models
    from src.models import build_model
    from src.models.resnet import ResNet
    print("✅ Model modules imported successfully")

    # Datasets
    from src.datasets import ISICDataset, ChestXRayDataset
    print("✅ Dataset modules imported successfully")

    # Utils
    from src.utils.config import load_experiment_config
    from src.utils.reproducibility import set_seed
    print("✅ Utility modules imported successfully")

    print("\n🎉 All imports successful! Environment is ready.")

except ImportError as e:
    print(f"❌ Import error: {str(e)}")
    print("Please check that all installation steps completed successfully.")
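
The cell above stops at the first failing import, so later modules are never checked. A per-module report can be sketched with `importlib`, as below. The helper name `check_imports` is hypothetical; the module paths are copied from the import statements in Step 6.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map its name to None on success or to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


# Module paths taken from the import statements in the Step 6 cell.
PROJECT_MODULES = [
    "src.losses.robust_loss",
    "src.training.adversarial_trainer",
    "src.attacks",
    "src.models",
    "src.datasets",
    "src.utils.config",
    "src.utils.reproducibility",
]

for name, error in check_imports(PROJECT_MODULES).items():
    status = "ok" if error is None else f"FAILED ({error})"
    print(f"{name}: {status}")
```

This only confirms that each module loads; it does not check that individual names such as `TRADESLoss` exist inside them.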

## Step 7: Run Quick Tests

Run a subset of tests to ensure Phase 5.1 (Adversarial Training) works correctly.

In [None]:
print("🧪 Running quick tests...\n")

# Run Phase 5.1 adversarial training tests
!pytest tests/test_adversarial_training.py::TestTRADESLoss -v --tb=short --disable-warnings

print("\n" + "="*70)
print("Test Summary:")
print("="*70)
!pytest tests/test_adversarial_training.py::TestTRADESLoss -q --disable-warnings
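
A `!` shell line generally does not raise a Python exception when the command fails, so a failing test run will not stop "Run all". If you want the notebook to halt on test failures, one option is to run the command through `subprocess` and check the exit code. The helper `run_checked` below is a hypothetical sketch, not part of the project.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd, echo its output, and raise RuntimeError if it exits non-zero."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    print(completed.stdout, end="")
    if completed.returncode != 0:
        print(completed.stderr, end="", file=sys.stderr)
        raise RuntimeError(
            f"Command failed with exit code {completed.returncode}: {cmd}"
        )
    return completed.returncode
```

For the test class used in this step, the call would look like `run_checked([sys.executable, "-m", "pytest", "tests/test_adversarial_training.py::TestTRADESLoss", "-q"])`.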

---

## 🎯 Training Examples

Now you're ready to start training! Choose one of the examples below.

### Option 1: Train with TRADES (Recommended)

TRADES balances clean accuracy and robust accuracy using KL divergence.

In [None]:
# Train with TRADES loss on ISIC2018 dataset
!python -m src.training.train_baseline \
    --config configs/experiments/adversarial_training_trades_isic.yaml \
    --device cuda \
    --max_epochs 50 \
    --output_dir ./results/trades_isic \
    --checkpoint_dir ./checkpoints/trades_isic

# Results will be saved to:
# - Checkpoints: ./checkpoints/trades_isic/
# - Results: ./results/trades_isic/
# - Logs: ./logs/

### Option 2: Train with MART

MART focuses on misclassified examples for improved robustness.

In [None]:
# Train with MART loss on ISIC2018 dataset
!python -m src.training.train_baseline \
    --config configs/experiments/adversarial_training_mart_isic.yaml \
    --device cuda \
    --max_epochs 50 \
    --output_dir ./results/mart_isic \
    --checkpoint_dir ./checkpoints/mart_isic

### Option 3: Standard Adversarial Training

Pure adversarial training without additional regularization.

In [None]:
# Train with standard adversarial training on ISIC2018
!python -m src.training.train_baseline \
    --config configs/experiments/adversarial_training_standard_isic.yaml \
    --device cuda \
    --max_epochs 50 \
    --output_dir ./results/standard_at_isic \
    --checkpoint_dir ./checkpoints/standard_at_isic

---

## 💾 Save Results to Google Drive

**Important:** Always save your results back to Google Drive to prevent data loss!

In [None]:
import os
import shutil
from datetime import datetime

# Create timestamped backup directory
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
backup_dir = f'/content/drive/MyDrive/dissertation_results/{timestamp}'

print(f"💾 Saving results to: {backup_dir}")
print("⏳ This may take a few minutes...\n")

# Create backup directory
!mkdir -p {backup_dir}

# Copy checkpoints
if os.path.exists('./checkpoints'):
    print("📂 Copying checkpoints...")
    !cp -r ./checkpoints {backup_dir}/
    print("✅ Checkpoints saved")

# Copy results
if os.path.exists('./results'):
    print("📂 Copying results...")
    !cp -r ./results {backup_dir}/
    print("✅ Results saved")

# Copy logs
if os.path.exists('./logs'):
    print("📂 Copying logs...")
    !cp -r ./logs {backup_dir}/
    print("✅ Logs saved")

# Copy mlruns (MLflow tracking)
if os.path.exists('./mlruns'):
    print("📂 Copying MLflow runs...")
    !cp -r ./mlruns {backup_dir}/
    print("✅ MLflow runs saved")

print(f"\n🎉 All results backed up to Google Drive!")
print(f"📁 Location: {backup_dir}")
print("\n💡 Tip: Download this folder for local analysis")

---

## 🔧 Utility Functions & Tips

### Keep Session Alive

Prevents Colab from disconnecting due to inactivity (useful for long training runs).

In [None]:
# Keep Colab session alive (run this cell to start)
import time
import threading
from IPython.display import display, Javascript

def keep_alive():
    """Keep the Colab session alive by simulating activity."""
    while True:
        display(Javascript('console.log("Keeping session alive...")'))
        time.sleep(60)  # Ping every 60 seconds

# Start keep-alive thread
thread = threading.Thread(target=keep_alive, daemon=True)
thread.start()

print("✅ Keep-alive thread started!")
print("💡 Your session will stay active as long as this notebook is open")

### Monitor GPU Usage

Check GPU utilization and memory during training.

In [None]:
# Monitor GPU usage
!nvidia-smi

# Detailed GPU info
import torch

if torch.cuda.is_available():
    print(f"\n{'='*70}")
    print(" " * 25 + "GPU INFORMATION")
    print(f"{'='*70}")
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"GPU Count: {torch.cuda.device_count()}")
    print(f"Current Device: {torch.cuda.current_device()}")
    print(f"\nMemory Allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
    print(f"Memory Reserved: {torch.cuda.memory_reserved(0) / 1024**3:.2f} GB")
    print(f"Max Memory Allocated: {torch.cuda.max_memory_allocated(0) / 1024**3:.2f} GB")
    print(f"{'='*70}")
else:
    print("⚠️  No GPU available!")

### TensorBoard Integration

Monitor training metrics in real-time using TensorBoard:

In [None]:
# Load TensorBoard extension
%load_ext tensorboard

# Launch TensorBoard (runs in background)
%tensorboard --logdir /content/tri-objective-robust-xai-medimg/logs/

# Alternative: Launch TensorBoard for MLflow runs
# %tensorboard --logdir /content/tri-objective-robust-xai-medimg/mlruns/

### Resume Training from Checkpoint

If your session disconnects, you can resume training from the last checkpoint:

In [None]:
# Resume training from last checkpoint
# Add --resume flag to your training command

# Example 1: Resume TRADES training
!python /content/tri-objective-robust-xai-medimg/scripts/train_adversarial.py \
    --config /content/tri-objective-robust-xai-medimg/configs/experiments/adversarial/trades_cifar10.yaml \
    --resume \
    --checkpoint /content/tri-objective-robust-xai-medimg/checkpoints/last.pt

# Example 2: Resume from specific checkpoint
!python /content/tri-objective-robust-xai-medimg/scripts/train_adversarial.py \
    --config /content/tri-objective-robust-xai-medimg/configs/experiments/adversarial/trades_cifar10.yaml \
    --resume \
    --checkpoint /content/tri-objective-robust-xai-medimg/checkpoints/baseline/epoch_10.pt

# Example 3: Resume from Google Drive checkpoint (if you saved it earlier)
!python /content/tri-objective-robust-xai-medimg/scripts/train_adversarial.py \
    --config /content/tri-objective-robust-xai-medimg/configs/experiments/adversarial/trades_cifar10.yaml \
    --resume \
    --checkpoint /content/drive/MyDrive/tri_objective_results/checkpoints/last.pt
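
If you are not sure which checkpoint file is the newest after a disconnect, a small helper can pick the most recently modified one. This is a sketch under assumptions: the function name `latest_checkpoint` is hypothetical, and the `*.pt` pattern is inferred from the file names used in the examples above.

```python
from pathlib import Path


def latest_checkpoint(checkpoint_dir, pattern="*.pt"):
    """Return the most recently modified file matching pattern under checkpoint_dir, or None."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return None
    # rglob searches sub-directories too (e.g. checkpoints/baseline/).
    candidates = [p for p in root.rglob(pattern) if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can then be passed to the `--checkpoint` argument shown above. Note that modification time reflects when the file was last written or copied, so a checkpoint restored from Drive may look newer than it really is.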

---

## 🔧 Troubleshooting & FAQ

### Common Issues and Solutions

#### **Issue 1: CUDA Out of Memory**

**Error:** `RuntimeError: CUDA out of memory`

**Solutions:**
- Reduce batch size in config file
- Clear GPU cache: `torch.cuda.empty_cache()`
- Restart runtime and re-run setup
- Use gradient checkpointing (if available)

```python
# Clear CUDA cache
import torch
torch.cuda.empty_cache()
print("GPU cache cleared!")
```
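
Reducing the batch size can be automated by halving it until one training step fits in memory. The sketch below is framework-agnostic: `try_step` is a hypothetical callable that you supply to run a single step at a given batch size, and out-of-memory errors are detected by their message text, which is an assumption about how the error is reported.

```python
def largest_working_batch_size(try_step, start=64, minimum=1):
    """Halve the batch size until try_step(batch_size) stops raising an OOM error.

    try_step is a caller-supplied (hypothetical) callable that runs one
    training step at the given batch size.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as exc:
            if "out of memory" not in str(exc).lower():
                raise  # unrelated error: do not hide it
            batch_size //= 2
    raise RuntimeError("Out of memory even at the minimum batch size")
```

In a real run you would likely call `torch.cuda.empty_cache()` inside `try_step` before each attempt, then write the batch size that worked into the config file.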

#### **Issue 2: Data Not Found**

**Error:** `FileNotFoundError: [Errno 2] No such file or directory: '/content/tri-objective-robust-xai-medimg/data/...'`

**Solutions:**
- Check if data extraction completed: `!ls /content/tri-objective-robust-xai-medimg/data/`
- Re-run data extraction cell (Step 3)
- Verify zip file exists in Google Drive
- Check data path in config files

```python
# Verify data directories
import os
data_dir = "/content/tri-objective-robust-xai-medimg/data"
if os.path.exists(data_dir):
    print(f"✅ Data directory exists")
    print(f"Contents: {os.listdir(data_dir)}")
else:
    print(f"❌ Data directory NOT found!")
```

#### **Issue 3: Import Errors**

**Error:** `ModuleNotFoundError: No module named 'src'`

**Solutions:**
- Verify working directory: `!pwd` should show `/content/tri-objective-robust-xai-medimg`
- Change directory: `%cd /content/tri-objective-robust-xai-medimg`
- Re-run dependency installation (Step 5)
- Check PYTHONPATH

```python
# Fix import paths
import sys
import os

project_root = "/content/tri-objective-robust-xai-medimg"
if project_root not in sys.path:
    sys.path.insert(0, project_root)
    print(f"✅ Added {project_root} to PYTHONPATH")

# Verify imports work
try:
    from src.models.resnet import ResNet18
    print("✅ Imports working correctly!")
except ImportError as e:
    print(f"❌ Import error: {e}")
```

#### **Issue 4: Session Disconnects During Training**

**Problem:** Long training runs interrupted by Colab disconnection

**Solutions:**
- Enable keep-alive script (see "Keep Session Alive" section above)
- Save checkpoints frequently (modify config: `save_freq: 5`)
- Save results to Google Drive periodically
- Use Colab Pro+ for longer runtimes (24 hours)
- Split training into smaller epochs and resume

```python
# Quick checkpoint backup to Drive
import os
import shutil
from datetime import datetime

checkpoint_dir = "/content/tri-objective-robust-xai-medimg/checkpoints"
backup_dir = f"/content/drive/MyDrive/tri_objective_checkpoints/backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

if os.path.exists(checkpoint_dir):
    shutil.copytree(checkpoint_dir, backup_dir)
    print(f"✅ Checkpoints backed up to: {backup_dir}")
```
```
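
Frequent timestamped backups can fill up Google Drive. One way to keep this in check is to delete all but the newest few backup folders. The helper below is a hypothetical sketch: it assumes the `backup_YYYYmmdd_HHMMSS` naming used in the snippet above, so that sorting by name is the same as sorting by time. It deletes data, so try it on a scratch folder first.

```python
import shutil
from pathlib import Path


def prune_old_backups(backup_root, keep=3, prefix="backup_"):
    """Delete all but the `keep` newest directories named prefix* under backup_root.

    Relies on timestamped names (YYYYmmdd_HHMMSS) sorting chronologically.
    Returns the names of the deleted directories.
    """
    root = Path(backup_root)
    if not root.is_dir():
        return []
    backups = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name.startswith(prefix)
    )
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        shutil.rmtree(path)
    return [p.name for p in stale]
```

For the layout above, the call would be `prune_old_backups("/content/drive/MyDrive/tri_objective_checkpoints", keep=3)`.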

#### **Issue 5: PyTorch Version Mismatch**

**Error:** `RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA versions`

**Solution:** Reinstall PyTorch with correct CUDA version

```python
# Uninstall existing PyTorch
!pip uninstall -y torch torchvision torchaudio

# Reinstall with CUDA 12.1 (matches Colab's CUDA version)
!pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Verify installation
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"CUDA Version: {torch.version.cuda}")
```

---

## 📚 Additional Resources

- **GitHub Repository:** [viraj1011JAIN/tri-objective-robust-xai-medimg](https://github.com/viraj1011JAIN/tri-objective-robust-xai-medimg)
- **Documentation:** See `docs/` folder in repository
- **Phase Reports:** Check `PHASE_*.md` files for detailed implementation guides

### Quick Commands Reference

```bash
# Check GPU
nvidia-smi

# Run tests
pytest tests/ -v

# Train with TRADES
python scripts/train_adversarial.py --config configs/experiments/adversarial/trades_cifar10.yaml

# Monitor with TensorBoard
tensorboard --logdir logs/ --port 6006

# Clear CUDA cache
python -c "import torch; torch.cuda.empty_cache(); print('Cache cleared')"
```

---

## ✅ Completion Checklist

Before starting training, verify:

- [ ] Google Drive mounted successfully
- [ ] Repository cloned to `/content/tri-objective-robust-xai-medimg`
- [ ] Data extracted to `data/raw/` directory
- [ ] PyTorch with CUDA 12.1 installed
- [ ] All dependencies installed (`pip install -r requirements.txt`)
- [ ] Import test passed
- [ ] Quick test suite passed
- [ ] GPU detected and available
- [ ] Keep-alive script running (optional, for long training)

**Ready to train! 🚀**

---

## 💾 Save Results to Google Drive

**IMPORTANT:** Always save your results back to Google Drive to prevent data loss when Colab session ends!

In [None]:
import os
import shutil
from datetime import datetime

# Create timestamped backup folder
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
backup_dir = f"/content/drive/MyDrive/dissertation_results_backup_{timestamp}"

print(f"💾 Saving results to: {backup_dir}\n")

# Create backup directory
!mkdir -p "{backup_dir}"

# Copy checkpoints
if os.path.exists('./checkpoints'):
    print("📦 Copying checkpoints...")
    shutil.copytree('./checkpoints', f"{backup_dir}/checkpoints", dirs_exist_ok=True)
    print("✅ Checkpoints saved")

# Copy results
if os.path.exists('./results'):
    print("📊 Copying results...")
    shutil.copytree('./results', f"{backup_dir}/results", dirs_exist_ok=True)
    print("✅ Results saved")

# Copy logs
if os.path.exists('./logs'):
    print("📝 Copying logs...")
    shutil.copytree('./logs', f"{backup_dir}/logs", dirs_exist_ok=True)
    print("✅ Logs saved")

# Copy MLflow runs if they exist
if os.path.exists('./mlruns'):
    print("🔬 Copying MLflow runs...")
    shutil.copytree('./mlruns', f"{backup_dir}/mlruns", dirs_exist_ok=True)
    print("✅ MLflow runs saved")

print(f"\n🎉 All results backed up to Google Drive!")
print(f"📁 Location: {backup_dir}")

# Show backup size
!du -sh "{backup_dir}"

---

## 🔄 Resume Training from Checkpoint (Optional)

If your session disconnected, you can resume training from the last checkpoint.

In [None]:
# A fresh runtime has no imports loaded, so import what this cell needs
import os
import shutil

# First, restore checkpoints from Google Drive
backup_dir = "/content/drive/MyDrive/dissertation_results_backup_XXXXXXXX_XXXXXX"  # UPDATE THIS

if os.path.exists(f"{backup_dir}/checkpoints"):
    print("🔄 Restoring checkpoints from Google Drive...")
    shutil.copytree(f"{backup_dir}/checkpoints", './checkpoints', dirs_exist_ok=True)
    print("✅ Checkpoints restored")
else:
    print("⚠️  No checkpoints found. Update backup_dir path.")

# Resume training with the restored checkpoint
!python -m src.training.train_baseline \
    --config configs/experiments/adversarial_training_trades_isic.yaml \
    --device cuda \
    --resume_from ./checkpoints/trades_isic/last.pt \
    --max_epochs 100

---

## 💡 Useful Tips & Troubleshooting

### Keep Session Alive
Colab sessions disconnect after ~90 minutes of inactivity. Run this to keep it alive:

In [None]:
from IPython.display import Javascript, display
import time
import threading

def keep_colab_alive():
    """Prevents Colab from disconnecting due to inactivity"""
    while True:
        try:
            display(Javascript('window.keepAlive = true'))
            time.sleep(60)  # Ping every 60 seconds
        except Exception:
            # Stop quietly on errors; a bare except would also swallow KeyboardInterrupt
            break

# Start keep-alive thread
thread = threading.Thread(target=keep_colab_alive, daemon=True)
thread.start()
print("✅ Keep-alive thread started")

### Monitor GPU Usage

In [None]:
# Check GPU memory usage
!nvidia-smi

# Or use Python
import torch
if torch.cuda.is_available():
    print(f"\n🎮 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 Memory Allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
    print(f"💾 Memory Reserved: {torch.cuda.memory_reserved(0) / 1024**3:.2f} GB")
    print(f"💾 Max Memory Allocated: {torch.cuda.max_memory_allocated(0) / 1024**3:.2f} GB")

### Check Available Disk Space

In [None]:
!df -h /content
!echo ""
!echo "Project size:"
!du -sh /content/tri-objective-robust-xai-medimg
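
The same disk check can be done from Python, which is handy if you want a training or backup cell to warn before it runs out of space. This is a minimal sketch using the standard library; the function name `free_space_gb` is a hypothetical choice.

```python
import shutil


def free_space_gb(path):
    """Return the free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024**3
```

On Colab, `free_space_gb("/content")` reports the space left on the runtime's disk, and `free_space_gb("/content/drive/MyDrive")` can be used the same way once Drive is mounted.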

---

## 🎓 Next Steps

1. **Run Training**: Choose one of the training examples above and start training
2. **Monitor Progress**: Check logs directory or use TensorBoard/MLflow
3. **Save Checkpoints**: Run the backup cell regularly (every few epochs)
4. **Evaluate Results**: After training, use evaluation scripts in `scripts/evaluation/`
5. **Experiment**: Try different hyperparameters by modifying config files

## 📚 Additional Resources

- **Documentation**: See `/docs` folder in the repository
- **Phase 5.1 Details**: Check `PHASE_5.1_COMPLETE.md` for implementation details
- **Test Suite**: Run `pytest tests/` to verify all functionality
- **Scripts**: Explore `scripts/` for evaluation and analysis tools

## ⚠️ Important Reminders

1. **Always back up to Google Drive** before the session ends
2. **Use GPU runtime** (Runtime → Change runtime type → GPU)
3. **Keep session alive** using the keep-alive cell above
4. **Monitor disk space** - Colab has ~100GB limit
5. **Save intermediate checkpoints** every few epochs

---

## 🎉 You're All Set!

Your environment is ready for adversarial training on Google Colab Pro. Happy training! 🚀