# Block 5: Essential Tools for Reproducible Research

**Python Module for Incoming ISE & OR PhD Students**  
Instructor: Will Kirschenman | August 7, 2025 | 2:00 PM - 2:50 PM

---

## Welcome to Block 5! 🔧

Welcome back! In the previous blocks, we've learned Python fundamentals, data manipulation, machine learning, and optimization. Now it's time to learn the **essential tool** that will help you manage all that code: **Git version control**.

By the end of this block, you'll be able to:
- Understand why version control is critical for coding projects
- Use basic Git commands to track your code changes
- Create and manage repositories on GitHub
- Integrate Git into your daily coding workflow
- Build the foundation of your professional coding portfolio

**Our Mission**: Transform you from someone who creates files like `assignment_final_v3_REAL.py` to someone who uses professional version control!

### Why This Matters 🎯

**Without Git**: "Which version of my code actually works? Did I save my changes? How do I merge my teammate's code?"

**With Git**: "I can see exactly what changed, when, and why. I can experiment freely and always go back to working versions."

**This is a fundamental skill for any coding you'll do in your PhD and beyond!**

In [None]:
# 📦 Package Installation & Setup
# Run this cell ONLY if you encounter import errors
# Most packages are pre-installed in Google Colab

import sys
import subprocess

def install_package(package_name):
    """Install a package using pip if not already installed"""
    try:
        __import__(package_name)
        print(f"✅ {package_name} already installed")
    except ImportError:
        print(f"📦 Installing {package_name}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
        print(f"✅ {package_name} installed successfully")

# Core packages used in this notebook (Block 5: Version Control)
# This block primarily uses built-in Python libraries
required_packages = [
    'matplotlib'  # For basic visualizations
]

print("🔍 Checking required packages for Block 5...")
print("=" * 45)

for package in required_packages:
    install_package(package)

print("\n🎉 All packages ready! You can now run all cells without import errors.")
print("💡 Tip: Block 5 primarily uses built-in Python libraries for Git/version control.")
print("💡 In Google Colab, matplotlib is pre-installed, so you likely won't need to install anything!")

# Part 1: Git in Google Colab - What's Possible? 🤔

### Important Context

Google Colab has some limitations for Git:
- **Temporary environment**: Files, repositories, and Git configuration are lost when the runtime is recycled
- **No local development**: You run Git through shell commands in cells, without an editor's built-in Git tools
- **Authentication challenges**: Private repos need special handling

However, Colab DOES provide:
- Basic Git commands via shell (`!git`)
- Built-in GitHub integration for notebooks
- Ability to clone and explore repositories

**For real Git work, you'll use your local machine with VS Code. But let's explore what we CAN do in Colab!**

In [None]:
# Check if Git is available in Colab (it should be pre-installed)
!git --version

print("\n✅ Git is available in Google Colab!")
print("💡 We can use Git commands by prefixing them with '!'")

In [None]:
# Configure Git with your information (for this session only)
# Replace with your actual name and email!
!git config --global user.name "Your Name"
!git config --global user.email "your.email@ncsu.edu"

# Verify configuration
print("Current Git configuration:")
!git config --global user.name
!git config --global user.email

print("\n💡 Note: This configuration is temporary and only for this Colab session")

---

# Part 2: Creating Your First Git Repository 📁

Let's create a simple repository to practice Git commands. We'll build a small Python project for statistical calculations - perfect for ISE/OR students!

In [None]:
# Create a project directory
import os

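# Start from Colab's default directory so re-running this cell doesn't nest projects
os.chdir('/content')

# Clean up if directory exists from previous run
if os.path.exists('stats_project'):
    !rm -rf stats_project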

# Create and enter project directory
!mkdir stats_project
os.chdir('stats_project')

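# Initialize Git repository with 'main' as the first branch
# (-b requires Git 2.28+; without it, the branch may be named 'master'
# and the later `git checkout main` would fail)
!git init -b main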

print("\n✅ Created new Git repository in 'stats_project' directory!")
print("📁 Current directory:", os.getcwd())

In [None]:
# Create our first Python file
stats_code = '''"""Statistical Calculator for ISE/OR Students
A simple module for basic statistical calculations.
"""

import math

def mean(data):
    """Calculate the arithmetic mean of a list of numbers."""
    return sum(data) / len(data)

def variance(data):
    """Calculate the population variance of a list of numbers."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def std_dev(data):
    """Calculate the population standard deviation."""
    return math.sqrt(variance(data))

# Test the functions
if __name__ == "__main__":
    test_data = [2, 4, 6, 8, 10]
    print(f"Test data: {test_data}")
    print(f"Mean: {mean(test_data):.2f}")
    print(f"Variance: {variance(test_data):.2f}")
    print(f"Std Dev: {std_dev(test_data):.2f}")
'''

# Write the file
with open('stats_calc.py', 'w') as f:
    f.write(stats_code)

print("✅ Created stats_calc.py")
print("\n📄 File contents:")
!cat stats_calc.py
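
A quick way to build confidence in a file before committing it is to compare its functions against a trusted reference. The sketch below repeats the three functions from `stats_calc.py` so it runs on its own, then checks them against Python's built-in `statistics` module. The comparison is an illustration added here, not part of the original project.

```python
import math
import statistics

# Copies of the functions written to stats_calc.py, repeated so this runs standalone
def mean(data):
    return sum(data) / len(data)

def variance(data):
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def std_dev(data):
    return math.sqrt(variance(data))

test_data = [2, 4, 6, 8, 10]

# statistics.pvariance / pstdev are the *population* versions, matching the code above
checks = {
    "mean": (mean(test_data), statistics.mean(test_data)),
    "variance": (variance(test_data), statistics.pvariance(test_data)),
    "std_dev": (std_dev(test_data), statistics.pstdev(test_data)),
}

for name, (ours, reference) in checks.items():
    status = "OK" if math.isclose(ours, reference) else "MISMATCH"
    print(f"{name:<9} ours={ours:.4f} reference={reference:.4f} {status}")
```

If any line reports `MISMATCH`, that is a good signal to fix the function before making the commit.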

In [None]:
# Check Git status - this is the most used Git command!
print("🔍 Git Status (what's the current state?):")
print("=" * 50)
!git status

print("\n💡 Git sees our new file but it's not tracked yet!")

In [None]:
# Stage the file (add to staging area)
!git add stats_calc.py

print("✅ Added stats_calc.py to staging area")
print("\n🔍 Git Status after staging:")
!git status

print("\n💡 Now the file is staged and ready to be committed!")
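
The states seen so far (untracked, then staged) also show up in Git's machine-readable status format, `git status --porcelain`, where each line starts with a two-character code: the first character describes the staging area and the second describes the working directory. The sketch below parses a sample of that output. The helper name `classify_status` and the sample text are illustrative assumptions, and only a few common codes are handled.

```python
def classify_status(porcelain_output):
    """Group paths from `git status --porcelain` (v1) output by state.

    Hypothetical helper for illustration; handles only common codes.
    """
    groups = {"untracked": [], "staged": [], "unstaged": []}
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        code, path = line[:2], line[3:]
        if code == "??":
            groups["untracked"].append(path)   # Git is not tracking this file yet
        elif code == "!!":
            continue                           # ignored file; not grouped here
        else:
            if code[0] != " ":
                groups["staged"].append(path)    # change recorded in the staging area
            if code[1] != " ":
                groups["unstaged"].append(path)  # change made but not yet staged
    return groups

# Made-up sample output: one untracked, one staged, one modified-but-unstaged file
sample = "\n".join([
    "?? notes.txt",
    "A  stats_calc.py",
    " M README.md",
])
print(classify_status(sample))
```

In Colab, real output could be passed in with `subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout`.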

In [None]:
# Make our first commit
!git commit -m "Add basic statistical functions"

print("\n🎉 Made your first commit!")
print("\n📝 View commit history:")
!git log --oneline

### 🎯 Mini Exercise: Add More Functions

Let's practice the Git workflow by adding more statistical functions!

In [None]:
# Add new functions to our file
additional_code = '''

def median(data):
    """Calculate the median of a list of numbers."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    else:
        return sorted_data[n//2]

def mode(data):
    """Calculate the mode (most frequent value)."""
    from collections import Counter
    counts = Counter(data)
    max_count = max(counts.values())
    modes = [val for val, count in counts.items() if count == max_count]
    return modes[0] if len(modes) == 1 else modes
'''

# Append to existing file
with open('stats_calc.py', 'a') as f:
    f.write(additional_code)

print("✅ Added median and mode functions")

# Check what changed
print("\n🔍 Git diff (what changed?):")
!git diff
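
`git diff` reports changes in unified diff format: lines starting with `+` were added and lines starting with `-` were removed. Python's standard `difflib` module produces the same style of output, which makes it easy to see the idea without a repository. The two short file versions below are made up for illustration.

```python
import difflib

# "Before" and "after" versions of a file, as lists of lines (made-up content)
old_version = [
    "def mean(data):\n",
    "    return sum(data) / len(data)\n",
]
new_version = old_version + [
    "\n",
    "def median(data):\n",
    '    """Calculate the median of a list of numbers."""\n',
]

# unified_diff yields header lines, a hunk marker, then context/added/removed lines
diff_lines = list(difflib.unified_diff(
    old_version, new_version,
    fromfile="a/stats_calc.py", tofile="b/stats_calc.py",
))
print("".join(diff_lines))
```

Because the new version only appends lines, every changed line in the output starts with `+`, just as in the `git diff` output above.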

In [None]:
# Stage and commit the changes
!git add stats_calc.py
!git commit -m "Add median and mode functions"

print("✅ Committed new changes!")
print("\n📝 Updated commit history:")
!git log --oneline

---

# Part 3: Working with Branches 🌳

Branches let you experiment without affecting your main code. Let's create a feature branch!

In [None]:
# Create and switch to a new branch
!git checkout -b add-advanced-stats

print("\n🌿 Current branches:")
!git branch

print("\n💡 The * indicates which branch you're currently on")

In [None]:
# Add advanced statistical functions on the new branch
advanced_code = '''

def correlation(x, y):
    """Calculate Pearson correlation coefficient between two lists."""
    if len(x) != len(y):
        raise ValueError("Lists must have the same length")
    
    n = len(x)
    mean_x = mean(x)
    mean_y = mean(y)
    
    numerator = sum((x[i] - mean_x) * (y[i] - mean_y) for i in range(n))
    denominator = math.sqrt(sum((x[i] - mean_x)**2 for i in range(n)) * 
                           sum((y[i] - mean_y)**2 for i in range(n)))
    
    return numerator / denominator if denominator != 0 else 0

def confidence_interval(data, confidence=0.95):
    """Calculate a z-based confidence interval for the mean (95% or 99%)."""
    # Two-sided z-scores; assumes a large sample
    z_scores = {0.95: 1.96, 0.99: 2.576}
    if confidence not in z_scores:
        raise ValueError("confidence must be 0.95 or 0.99")
    
    n = len(data)
    m = mean(data)
    std_err = std_dev(data) / math.sqrt(n)
    
    margin = z_scores[confidence] * std_err
    return (m - margin, m + margin)
'''

with open('stats_calc.py', 'a') as f:
    f.write(advanced_code)

print("✅ Added advanced statistical functions on feature branch")

# Commit on the feature branch
!git add stats_calc.py
!git commit -m "Add correlation and confidence interval functions"

print("\n📝 Commits on feature branch:")
!git log --oneline -n 3
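
The fixed z-scores in `confidence_interval` are quantiles of the standard normal distribution. As a sketch of how other confidence levels could be supported, the standard library's `statistics.NormalDist` can compute the quantile directly. The function name `z_confidence_interval` is hypothetical, and it uses the sample standard deviation (`statistics.stdev`), which differs slightly from the population version in `stats_calc.py`.

```python
import math
from statistics import NormalDist, mean, stdev

def z_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the mean (hypothetical helper)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided interval: leave (1 - confidence) / 2 probability in each tail
    z_score = NormalDist().inv_cdf(0.5 + confidence / 2)
    std_err = stdev(data) / math.sqrt(len(data))  # sample standard deviation
    margin = z_score * std_err
    center = mean(data)
    return (center - margin, center + margin)

low, high = z_confidence_interval([2, 4, 6, 8, 10])
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

For small samples like this one, a t-based interval would usually be more appropriate; the normal quantile is kept here to mirror the z-score approach in `stats_calc.py`.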

In [None]:
# Switch back to main branch
!git checkout main

print("✅ Switched to main branch")
print("\n🔍 Notice: The advanced functions aren't here yet!")
print("\nLast 20 lines of stats_calc.py on main:")
!tail -20 stats_calc.py

In [None]:
# Merge the feature branch into main
!git merge add-advanced-stats

print("\n✅ Merged feature branch into main!")
print("💡 main had no new commits of its own, so Git fast-forwarded it (no separate merge commit)")
print("\n📝 Updated history after the merge:")
!git log --oneline -n 4

print("\n🎉 The advanced functions are now in main branch!")
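
A branch is just a movable label pointing at a commit. When `main` has no new commits of its own, merging a feature branch only needs to move the `main` label forward, which Git calls a fast-forward. The toy model below illustrates that idea with plain dictionaries. It is a simplified sketch with made-up commit IDs, not how Git stores data internally.

```python
# Each commit records its parent; None marks the first commit (made-up IDs)
parents = {"c1": None, "c2": "c1", "c3": "c2"}

# Branch names map to the commit they currently point at
branches = {"main": "c2", "add-advanced-stats": "c3"}

def is_ancestor(ancestor, commit):
    """Return True if `ancestor` is reachable by following parents from `commit`."""
    while commit is not None:
        if commit == ancestor:
            return True
        commit = parents[commit]
    return False

def merge(target, source):
    """Merge `source` into `target` in the toy model and report what happened."""
    if is_ancestor(branches[source], branches[target]):
        return "already up to date"          # target already contains source
    if is_ancestor(branches[target], branches[source]):
        branches[target] = branches[source]  # just move the label forward
        return "fast-forward"
    return "needs a merge commit"            # histories diverged; not modeled here

print(merge("main", "add-advanced-stats"))
print(branches)
```

Had `main` gained its own commit after the branch was created, neither tip would be an ancestor of the other, and Git would create a merge commit instead.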

---

# Part 4: GitHub Integration in Colab 🌐

## Colab's Built-in GitHub Features

Google Colab has special integration with GitHub for notebooks:

1. **Open notebooks from GitHub**: File → Open notebook → GitHub tab
2. **Save notebooks to GitHub**: File → Save a copy in GitHub
3. **Browse GitHub repositories**: Direct from Colab interface

### Let's Clone a Real Repository!

In [None]:
# First, let's go back to the content directory
os.chdir('/content')

# Clone a sample Python repository
!git clone https://github.com/microsoft/python-sample-vscode-flask-tutorial.git sample_project

print("\n✅ Cloned a sample repository!")
print("\n📁 Repository contents:")
!ls -la sample_project/

In [None]:
# Explore the repository history
os.chdir('sample_project')

print("📝 Recent commits in the repository:")
!git log --oneline -n 5

print("\n👥 Contributors:")
# Pass HEAD explicitly: without a revision, shortlog may wait for input on stdin
!git shortlog -sn HEAD

print("\n🌿 Branches:")
!git branch -r

### 🔐 Working with Private Repositories

For private repositories, you need authentication. Here's how:

1. **Personal Access Token (Recommended)**:
   - Go to GitHub → Settings → Developer settings → Personal access tokens
   - Generate a token with repo permissions
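   - Use: `git clone https://YOUR_TOKEN@github.com/username/repo.git`
   - The token is saved in plain text in the clone's `.git/config` and can show up in cell output, so clear outputs before sharing the notebook and revoke the token when you're done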

2. **GitHub's Colab Integration**:
   - Use File → Save a copy in GitHub
   - Colab will handle authentication via OAuth

**⚠️ Never commit tokens or passwords to repositories!**
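
One way to keep a token out of the notebook itself is to type it at a hidden prompt instead of pasting it into a cell. The sketch below uses the standard library's `getpass`. The helper names and the `username`/`repo` placeholders are illustrative assumptions. Note that the token still ends up in the clone's remote URL, so it should be treated as sensitive.

```python
import subprocess
from getpass import getpass
from urllib.parse import quote

def build_clone_url(token, owner, repo):
    """Build an HTTPS clone URL that embeds a token (hypothetical helper)."""
    # quote() percent-encodes any characters that are not URL-safe
    return f"https://{quote(token, safe='')}@github.com/{owner}/{repo}.git"

def clone_private_repo(owner, repo, destination):
    """Prompt for a token without echoing it, then clone the repository."""
    token = getpass("GitHub token: ")  # input is not echoed to the screen
    url = build_clone_url(token, owner, repo)
    # Capture Git's messages instead of printing them, since they may include the URL
    result = subprocess.run(["git", "clone", url, destination],
                            capture_output=True, text=True)
    return result.returncode == 0

# Example call (not run here because it needs a real token and repository):
# clone_private_repo("username", "repo", "my_private_repo")
```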

---

# Part 5: Creating a .gitignore File 🚫

A `.gitignore` file tells Git which files to ignore. This is crucial for Python projects!

In [None]:
# Go back to our stats project
os.chdir('/content/stats_project')

# Create a Python .gitignore file
gitignore_content = '''# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/

# Jupyter Notebook
.ipynb_checkpoints/
*.ipynb_checkpoints

# Data files
*.csv
*.xlsx
*.pkl
data/

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Secrets
.env
config.ini
credentials.json
'''

with open('.gitignore', 'w') as f:
    f.write(gitignore_content)

print("✅ Created .gitignore file")

# Add and commit the .gitignore
!git add .gitignore
!git commit -m "Add Python .gitignore file"

print("\n📄 .gitignore contents:")
!head -20 .gitignore

In [None]:
# Test the .gitignore
# Create files that should be ignored
!mkdir -p __pycache__
!touch __pycache__/test.pyc
!touch credentials.json
!mkdir -p data
!touch data/results.csv

print("📁 Created test files that should be ignored")
print("\n🔍 Git status (notice ignored files don't appear):")
!git status

print("\n✅ Git is correctly ignoring the specified file patterns!")
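
To see why those files disappear from `git status`, it helps to match paths against the patterns by hand. The sketch below approximates two simple kinds of `.gitignore` rule with `fnmatch`: patterns ending in `/` match directory names, and other patterns match the file name. This is a deliberately simplified model (real `.gitignore` matching also supports negation, anchoring, and `**`), and the helper `is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase

# A few of the patterns from the .gitignore created above
patterns = ["__pycache__/", "*.py[cod]", "*.csv", "data/", "credentials.json"]

def is_ignored(path, patterns):
    """Rough approximation of .gitignore matching for simple patterns."""
    *directories, filename = path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory rule: ignore anything inside a directory with a matching name
            if any(fnmatchcase(d, pattern[:-1]) for d in directories):
                return True
        elif fnmatchcase(filename, pattern):
            # Rule without a slash: compare against the file name at any depth
            return True
    return False

for path in ["__pycache__/test.pyc", "credentials.json", "data/results.csv", "stats_calc.py"]:
    print(f"{path:<25} ignored={is_ignored(path, patterns)}")
```

For the authoritative answer in a real repository, `git check-ignore -v <path>` reports which rule, if any, causes a path to be ignored.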

---

# Part 6: Best Practices and Pro Tips 💡

## Writing Good Commit Messages

Your future self (and teammates) will thank you for clear commit messages!
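
One widely used convention, the same one shown in the template printed by the cell below, starts the first line with a type prefix: `<type>: <subject>`. As a small sketch, the function below tests whether a message's first line follows that shape. The function name and the 72-character limit are assumptions chosen for illustration, not rules that Git enforces.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# First line must look like "<type>: <subject>" with a non-empty subject
HEADER_PATTERN = re.compile(rf"^({'|'.join(ALLOWED_TYPES)}): (\S.*)$")

def follows_template(message, max_length=72):
    """Return True if the first line matches the template (hypothetical check)."""
    lines = message.strip().splitlines()
    if not lines:
        return False  # empty message
    header = lines[0]
    return len(header) <= max_length and HEADER_PATTERN.match(header) is not None

for msg in ["feat: Add confidence interval calculation", "fixed stuff", "WIP"]:
    print(f"{msg!r:<50} -> {follows_template(msg)}")
```

A message can be clear without the prefix, so treat a check like this as a team convention rather than a measure of quality.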

In [None]:
# Examples of good vs bad commit messages
print("❌ BAD Commit Messages:")
bad_messages = [
    "fixed stuff",
    "asdfasdf",
    "changes",
    "done",
    "WIP"
]
for msg in bad_messages:
    print(f"   - {msg}")

print("\n✅ GOOD Commit Messages:")
good_messages = [
    "Add input validation to statistical functions",
    "Fix divide-by-zero error in correlation calculation",
    "Refactor mean function for better performance",
    "Update documentation with usage examples",
    "Add unit tests for median function"
]
for msg in good_messages:
    print(f"   - {msg}")

print("\n📝 Commit Message Template:")
print("""
<type>: <subject>

<body (optional)>

Types: feat, fix, docs, style, refactor, test, chore

Example:
feat: Add confidence interval calculation

Implements 95% and 99% confidence intervals using z-scores.
Assumes large sample size for simplicity.
""")

## Common Git Commands Reference 📚

In [None]:
# Create a handy reference card
git_commands = {
    "Basics": [
        ("git init", "Initialize new repository"),
        ("git clone <url>", "Clone existing repository"),
        ("git status", "Check current status"),
        ("git add <file>", "Stage specific file"),
        ("git add .", "Stage all changes"),
        ("git commit -m 'message'", "Commit with message"),
    ],
    "Branches": [
        ("git branch", "List branches"),
        ("git branch <name>", "Create new branch"),
        ("git checkout <branch>", "Switch branches"),
        ("git checkout -b <branch>", "Create and switch branch"),
        ("git merge <branch>", "Merge branch into current"),
    ],
    "History": [
        ("git log", "View commit history"),
        ("git log --oneline", "Compact history view"),
        ("git diff", "Show unstaged changes"),
        ("git diff --staged", "Show staged changes"),
    ],
    "Remote": [
        ("git remote -v", "List remote repositories"),
        ("git push", "Push commits to remote"),
        ("git pull", "Pull updates from remote"),
        ("git fetch", "Download remote changes"),
    ]
}

print("🎯 GIT COMMAND REFERENCE CARD")
print("=" * 50)

for category, commands in git_commands.items():
    print(f"\n{category}:")
    for cmd, desc in commands:
        print(f"  {cmd:<30} # {desc}")

---

# Part 7: Your Turn - Practice Exercise! 🎮

Now let's practice what you've learned by creating a small project from scratch.

In [None]:
# Exercise: Create an optimization utilities module
print("🎯 EXERCISE: Create an Optimization Utilities Module")
print("=" * 50)
print("""
Your task:
1. Create a new directory called 'optimization_utils'
2. Initialize a Git repository
3. Create a Python file with at least 2 optimization-related functions
4. Make meaningful commits as you add features
5. Create a README.md file
6. Use at least one branch for a new feature

Follow the instructions in the cells below!
""")

In [None]:
# Step 1: Setup your project
# TODO: Create directory, initialize Git, configure your name/email

# Your code here:
os.chdir('/content')
!mkdir optimization_utils
os.chdir('optimization_utils')
!git init -b main  # name the first branch 'main' (requires Git 2.28+)

print("✅ Project initialized!")

In [None]:
# Step 2: Create your first Python file
# TODO: Create a file with at least one optimization function
# Ideas: golden_section_search, gradient_descent, newton_method

# Your code here:
opt_code = '''"""Optimization utilities for ISE/OR applications"""

def golden_section_search(f, a, b, tol=1e-5):
    """Find minimum of unimodal function using golden section search."""
    gr = (5**0.5 + 1) / 2  # Golden ratio
    
    while abs(b - a) > tol:
        c = b - (b - a) / gr
        d = a + (b - a) / gr
        
        if f(c) < f(d):
            b = d
        else:
            a = c
    
    return (a + b) / 2
'''

with open('optimize.py', 'w') as f:
    f.write(opt_code)

print("✅ Created optimize.py")

# TODO: Stage and commit this file with a good message
!git add optimize.py
!git commit -m "Add golden section search algorithm"

In [None]:
# Step 3: Create a README file
# TODO: Create README.md with project description

readme_content = '''# Optimization Utilities

A collection of optimization algorithms for ISE/OR applications.

## Features

- Golden Section Search for unimodal functions
- More algorithms coming soon!

## Usage

```python
from optimize import golden_section_search

# Define your function
f = lambda x: (x - 2)**2

# Find minimum
min_x = golden_section_search(f, 0, 5)
```

## Author

Your Name - ISE/OR PhD Student
'''

with open('README.md', 'w') as f:
    f.write(readme_content)

# TODO: Commit the README
!git add README.md
!git commit -m "Add README with project description and usage"

print("✅ Created and committed README.md")

In [None]:
# Step 4: Create a feature branch and add another algorithm
# TODO: Create branch, add new function, commit, merge

# Your solution:
!git checkout -b add-gradient-descent

# Add gradient descent to the file
gd_code = '''

def gradient_descent(f, df, x0, learning_rate=0.01, max_iter=1000, tol=1e-6):
    """Simple gradient descent optimizer.
    
    Args:
        f: Objective function
        df: Gradient function
        x0: Initial point
        learning_rate: Step size
        max_iter: Maximum iterations
        tol: Convergence tolerance
    """
    x = x0
    for i in range(max_iter):
        grad = df(x)
        x_new = x - learning_rate * grad
        
        if abs(x_new - x) < tol:
            break
            
        x = x_new
    
    return x
'''

with open('optimize.py', 'a') as f:
    f.write(gd_code)

!git add optimize.py
!git commit -m "Add gradient descent optimization algorithm"

# Merge back to main
!git checkout main
!git merge add-gradient-descent

print("✅ Added gradient descent and merged to main!")
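
Before merging a feature branch, it is worth confirming that the new code behaves as expected. The sketch below repeats both functions from `optimize.py` (docstrings shortened) so it runs standalone, then applies them to the README's example function `f(x) = (x - 2)**2`, whose minimum is at `x = 2`. The derivative `df`, the learning rate of 0.1, and the tolerance used in the check are choices made for this illustration.

```python
def golden_section_search(f, a, b, tol=1e-5):
    """Find minimum of unimodal function using golden section search."""
    gr = (5**0.5 + 1) / 2  # Golden ratio
    while abs(b - a) > tol:
        c = b - (b - a) / gr
        d = a + (b - a) / gr
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2

def gradient_descent(f, df, x0, learning_rate=0.01, max_iter=1000, tol=1e-6):
    """Simple gradient descent optimizer for a scalar variable."""
    x = x0
    for i in range(max_iter):
        x_new = x - learning_rate * df(x)
        if abs(x_new - x) < tol:
            break  # step is smaller than the tolerance: treat as converged
        x = x_new
    return x

f = lambda x: (x - 2) ** 2   # minimum at x = 2
df = lambda x: 2 * (x - 2)   # derivative of f

x_gs = golden_section_search(f, 0, 5)
x_gd = gradient_descent(f, df, x0=0.0, learning_rate=0.1)

for name, x in [("golden section", x_gs), ("gradient descent", x_gd)]:
    print(f"{name:<17} x = {x:.4f}, close to 2: {abs(x - 2) < 1e-3}")
```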

In [None]:
# Step 5: View your project history
print("🎉 Congratulations! Here's your project history:")
print("\n📝 Commit log:")
!git log --oneline --graph --all

print("\n📁 Final project structure:")
!ls -la

print("\n💡 You've successfully:")
print("- Created a Git repository")
print("- Made multiple meaningful commits")
print("- Used branches for feature development")
print("- Written clear commit messages")
print("- Created project documentation")

---

# Block 5 Wrap-Up: Version Control Mastery 🎯

## What You've Accomplished Today

Congratulations! In this 50-minute block, you've:

✅ **Learned Git Fundamentals**
- Understood the three-stage workflow (working → staging → repository)
- Created and managed Git repositories
- Made commits with meaningful messages

✅ **Practiced Essential Commands**
- `git init`, `git add`, `git commit`
- `git status`, `git diff`, `git log`
- `git branch`, `git checkout`, `git merge`

✅ **Explored GitHub Integration**
- Cloned repositories
- Understood remote repositories
- Learned about authentication

✅ **Applied Best Practices**
- Created meaningful `.gitignore` files
- Written clear commit messages
- Used branches for feature development

## The Version Control Mindset

Remember:

**Version control is not just about backing up code - it's about understanding the evolution of your project.**

When you use Git properly, you:
- **Document your thinking**: Each commit explains a decision
- **Enable experimentation**: Branches let you try ideas safely
- **Facilitate collaboration**: Others can understand and build on your work
- **Build credibility**: Your GitHub profile becomes your coding portfolio

## Next Steps for Your PhD Journey

1. **Start Today**: Version control your current coursework
2. **Make It a Habit**: Commit every meaningful change
3. **Build Portfolio**: Keep your GitHub active
4. **Learn More**: Explore advanced Git features
5. **Collaborate**: Join open source projects

## Limitations in Colab

Remember that Colab has limitations:
- No persistent Git state between sessions
- Authentication challenges for private repos
- Can't use full IDE integration

**For real development, use Git on your local machine with VS Code!**

## Essential Resources

- **Pro Git Book**: https://git-scm.com/book (Free!)
- **GitHub Guides**: https://guides.github.com/
- **Oh My Git! (Game)**: https://ohmygit.org/
- **Git Cheat Sheet**: https://education.github.com/git-cheat-sheet-education.pdf

## Looking Ahead

In the final block (Block 6), we'll explore:
- Advanced computational techniques
- AI-powered coding assistants
- Tools that will accelerate your productivity

---

## Thank You! 🙏

Great work in Block 5! You've taken a huge step toward professional software development practices. In the next block, we'll explore cutting-edge tools that will supercharge your coding productivity.

**Remember: Every expert was once a beginner who used version control!**

---

*Questions? Feel free to ask during the break or reach out via email: wkkirsch@ncsu.edu*