# Control Flow for Data Science 🐍📊

A practical guide to Python control flow patterns commonly used in:
- Data Analysis and Preprocessing
- Machine Learning Pipelines
- Feature Engineering
- Model Evaluation
- Data Validation and Cleaning

Each section uses data-science-focused examples written in plain Python, without external libraries, so the core concepts stay visible.

# Table of Contents

1. [Data Validation and Cleaning](#data-validation)
   - Input validation patterns
   - Handling missing values
   - Data type checking

2. [Data Transformation](#data-transformation)
   - Conditional transformations
   - Numerical computations
   - Feature engineering patterns

3. [Iteration Patterns in Data Analysis](#iteration-patterns)
   - Processing data streams
   - Batch processing
   - Window operations

4. [Model Evaluation Loops](#model-evaluation)
   - Cross-validation patterns
   - Hyperparameter tuning
   - Error analysis

5. [Advanced Data Processing](#advanced-processing)
   - Generator patterns for large datasets
   - Memory-efficient processing
   - Error handling in data pipelines

# 1. Data Validation and Cleaning

In data science, validating and cleaning your data is crucial. Here are common control flow patterns for data quality assurance.

In [None]:
# Example dataset: customer records
customers = [
    {"id": 1, "age": 25, "income": 50000},
    {"id": 2, "age": -5, "income": None},  # problematic data
    {"id": 3, "age": 30, "income": 75000},
    {"id": 4, "age": "unknown", "income": 60000},  # problematic data
]

# 1. Basic data validation
def validate_customer(customer):
    """Validate customer data and return list of issues"""
    issues = []
    
    # Check for missing values
    for field in ['age', 'income']:
        if customer[field] is None:
            issues.append(f"Missing {field}")
            
    # Type checking
    if not isinstance(customer['age'], (int, float)):
        issues.append("Age must be numeric")
    
    # Value range validation
    if isinstance(customer['age'], (int, float)) and (customer['age'] < 0 or customer['age'] > 120):
        issues.append("Age out of realistic range")
    
    return issues

# Process each customer record
for customer in customers:
    print(f"\nValidating customer {customer['id']}:")
    issues = validate_customer(customer)
    if issues:
        print("Issues found:", issues)
    else:
        print("Data valid")
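
The checks in `validate_customer` can also be expressed as data rather than as a fixed sequence of `if` statements. The sketch below is one possible variation, not part of the original notebook: `RULES`, `is_number`, and `validate_record` are hypothetical names. It assumes that a record with an absent key should be reported as missing rather than raise `KeyError`, so it reads fields with `dict.get`.

```python
def is_number(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)

# Hypothetical rule table: field name -> list of (predicate, message) pairs.
# A predicate returns True when the value is acceptable.
RULES = {
    "age": [
        (is_number, "Age must be numeric"),
        (lambda v: not is_number(v) or 0 <= v <= 120, "Age out of realistic range"),
    ],
    "income": [
        (is_number, "Income must be numeric"),
    ],
}

def validate_record(record, rules=RULES):
    """Return a list of issues; an empty list means the record passed."""
    issues = []
    for field, checks in rules.items():
        value = record.get(field)  # None if the key is absent
        if value is None:
            issues.append(f"Missing {field}")
            continue  # No point running further checks on a missing value
        for predicate, message in checks:
            if not predicate(value):
                issues.append(message)
    return issues

print(validate_record({"id": 1, "age": 25, "income": 50000}))
print(validate_record({"id": 2, "age": -5, "income": None}))
print(validate_record({"id": 4, "age": "unknown"}))
```

Adding a new check then means adding an entry to the table instead of another branch in the function.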

### Data Cleaning Patterns

Common data cleaning operations in data science:
- Handling missing values (imputation)
- Removing outliers
- Type conversion
- Range normalization

Below are typical control flow patterns for these operations.

In [None]:
# Data cleaning example
def clean_customer_data(customers):
    cleaned_data = []
    stats = {"total": 0, "cleaned": 0, "dropped": 0}
    
    # Calculate mean income for imputation
    valid_incomes = [c['income'] for c in customers if isinstance(c['income'], (int, float))]
    mean_income = sum(valid_incomes) / len(valid_incomes) if valid_incomes else 0
    
    for customer in customers:
        stats['total'] += 1
        try:
            cleaned_customer = customer.copy()
            
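            # Clean age: drop records whose age is non-numeric or negative
            if not isinstance(customer['age'], (int, float)) or customer['age'] < 0:
                stats['dropped'] += 1  # Count the record as dropped before skipping it
                continue  # Skip invalid age records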
            
            # Clean income (impute missing values)
            if customer['income'] is None:
                cleaned_customer['income'] = mean_income
            
            # Add to cleaned dataset
            cleaned_data.append(cleaned_customer)
            stats['cleaned'] += 1
            
        except Exception as e:
            print(f"Error processing customer {customer.get('id')}: {str(e)}")
            stats['dropped'] += 1
            
    return cleaned_data, stats

# Clean the data and print statistics
cleaned_customers, stats = clean_customer_data(customers)
print("\nCleaning Statistics:")
print(f"Total records: {stats['total']}")
print(f"Cleaned records: {stats['cleaned']}")
print(f"Dropped records: {stats['dropped']}")

# Print cleaned data
print("\nCleaned Customer Data:")
for customer in cleaned_customers:
    print(customer)
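
The list at the start of this subsection also mentions type conversion and range handling, which the cleaning function does not demonstrate. Here is a minimal sketch, assuming numeric values may arrive as strings and that out-of-range values should be clipped rather than dropped. `to_float_or_none` and `clip` are hypothetical helper names, and the sample values are made up.

```python
def to_float_or_none(value):
    """Convert a value to float, returning None when conversion fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers None; ValueError covers strings like "unknown"
        return None

def clip(value, low, high):
    """Limit a number to the closed interval [low, high]."""
    return max(low, min(value, high))

raw_ages = ["25", 31, "unknown", None, "140"]
cleaned_ages = []
for raw in raw_ages:
    age = to_float_or_none(raw)
    if age is None:
        continue  # Drop values that cannot be converted
    cleaned_ages.append(clip(age, 0.0, 120.0))

print(cleaned_ages)  # [25.0, 31.0, 120.0]
```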

# 2. Data Transformation

Feature engineering and data transformation are essential skills in data science. Here are common control flow patterns for:
- Feature scaling
- Encoding categorical variables
- Creating derived features
- Handling time series data

In [None]:
# Example dataset: housing data
houses = [
    {"size": 1500, "price": 200000, "rooms": 3, "location": "suburban"},
    {"size": 2200, "price": 350000, "rooms": 4, "location": "urban"},
    {"size": 1200, "price": 180000, "rooms": 2, "location": "rural"},
    {"size": 3000, "price": 450000, "rooms": 5, "location": "urban"}
]

# Feature scaling functions
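def min_max_scale(values):
    """Scale values to range [0,1]"""
    min_val = min(values)
    max_val = max(values)
    span = max_val - min_val
    if span == 0:  # All values identical: avoid division by zero
        return [0.0 for _ in values]
    return [(x - min_val) / span for x in values]

def z_score_scale(values):
    """Standardize values (mean=0, std=1) using the population standard deviation"""
    mean = sum(values) / len(values)
    std = (sum((x - mean) ** 2 for x in values) / len(values)) ** 0.5
    if std == 0:  # All values identical: avoid division by zero
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]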

# Apply scaling to numerical features
sizes = [h['size'] for h in houses]
prices = [h['price'] for h in houses]

scaled_sizes = min_max_scale(sizes)
scaled_prices = z_score_scale(prices)

print("Original vs Scaled Features:")
for i, house in enumerate(houses):
    print(f"\nHouse {i+1}:")
    print(f"Size: {house['size']} → {scaled_sizes[i]:.3f}")
    print(f"Price: {house['price']} → {scaled_prices[i]:.3f}")

In [None]:
# Categorical encoding example
def one_hot_encode(values):
    """Convert categorical values to one-hot encoding"""
    unique_values = sorted(set(values))
    encoding = {val: [1 if val == v else 0 for v in unique_values] 
               for val in unique_values}
    return encoding

# Get locations from houses
locations = [h['location'] for h in houses]

# Create one-hot encoding
location_encoding = one_hot_encode(locations)

print("\nOne-hot Encoding for Locations:")
for location, encoded in location_encoding.items():
    print(f"{location}: {encoded}")

# Apply encoding to dataset
print("\nEncoded Houses:")
for house in houses:
    encoded_location = location_encoding[house['location']]
    print(f"House: {house['location']} → {encoded_location}")
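
One way to use such an encoding is to concatenate it with scaled numeric features into a single numeric row per record. The sketch below is self-contained and illustrative only: `build_feature_rows` is a hypothetical helper, the sample records are made up, and `size` is min-max scaled inline.

```python
def build_feature_rows(records):
    """Return one numeric row per record: [scaled_size, *one_hot_location]."""
    sizes = [r["size"] for r in records]
    lo, hi = min(sizes), max(sizes)
    span = hi - lo

    # Sorted so that the column order is stable between runs
    categories = sorted({r["location"] for r in records})

    rows = []
    for r in records:
        # Guard against a zero span when every size is identical
        scaled = (r["size"] - lo) / span if span else 0.0
        one_hot = [1 if r["location"] == c else 0 for c in categories]
        rows.append([scaled] + one_hot)
    return categories, rows

sample = [
    {"size": 1000, "location": "urban"},
    {"size": 2000, "location": "rural"},
    {"size": 3000, "location": "urban"},
]
categories, rows = build_feature_rows(sample)
print(categories)  # ['rural', 'urban']
for row in rows:
    print(row)
```

Returning `categories` alongside the rows keeps the meaning of each one-hot column available to the caller.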

In [None]:
# Feature engineering example
def create_derived_features(houses):
    """Create new features from existing data"""
    derived_features = []
    
    for house in houses:
        # Calculate price per square foot
        price_per_sqft = house['price'] / house['size']
        
        # Calculate rooms per square foot
        rooms_per_sqft = house['rooms'] / house['size']
        
        # Create density category based on rooms per sqft
        if rooms_per_sqft < 0.001:  # Less than 1 room per 1000 sqft
            density = 'sparse'
        elif rooms_per_sqft < 0.002:  # Less than 1 room per 500 sqft
            density = 'normal'
        else:
            density = 'dense'
        
        derived = {
            'original': house,
            'price_per_sqft': price_per_sqft,
            'rooms_per_sqft': rooms_per_sqft,
            'density': density
        }
        derived_features.append(derived)
    
    return derived_features

# Generate derived features
engineered_data = create_derived_features(houses)

# Display results
print("Engineered Features:")
for house in engineered_data:
    print(f"\nLocation: {house['original']['location']}")
    print(f"Price per sqft: ${house['price_per_sqft']:.2f}")
    print(f"Rooms per sqft: {house['rooms_per_sqft']:.4f}")
    print(f"Density category: {house['density']}")

# 3. Data Aggregation and Time Series Operations

Common data science tasks often involve working with grouped data or time series. Here are examples of aggregation and windowing operations using plain Python.

In [None]:
# Example time series data: daily temperature readings
temperatures = [
    {"date": "2023-01-01", "temp": 12, "city": "New York"},
    {"date": "2023-01-01", "temp": 20, "city": "Los Angeles"},
    {"date": "2023-01-02", "temp": 10, "city": "New York"},
    {"date": "2023-01-02", "temp": 22, "city": "Los Angeles"},
    {"date": "2023-01-03", "temp": 15, "city": "New York"},
    {"date": "2023-01-03", "temp": 21, "city": "Los Angeles"}
]

# Group by operation
def group_by(data, key_func):
    """Group data by a key function"""
    groups = {}
    for item in data:
        key = key_func(item)
        if key not in groups:
            groups[key] = []
        groups[key].append(item)
    return groups

# Aggregate by city
city_groups = group_by(temperatures, lambda x: x['city'])

# Calculate statistics for each city
for city, readings in city_groups.items():
    temps = [r['temp'] for r in readings]
    avg_temp = sum(temps) / len(temps)
    min_temp = min(temps)
    max_temp = max(temps)
    
    print(f"\n{city} Statistics:")
    print(f"Average Temperature: {avg_temp:.1f}°C")
    print(f"Min Temperature: {min_temp}°C")
    print(f"Max Temperature: {max_temp}°C")
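
The `if key not in groups` check in `group_by` can be avoided with `collections.defaultdict` from the standard library. The following sketch shows that variation. `group_values` is a hypothetical name, and the readings are made-up sample data.

```python
from collections import defaultdict

def group_values(records, key, value):
    """Map each distinct records[key] to the list of its records[value]."""
    groups = defaultdict(list)  # Missing keys start as an empty list
    for record in records:
        groups[record[key]].append(record[value])
    return dict(groups)  # Plain dict, so later lookups of unknown keys raise KeyError

readings = [
    {"city": "A", "temp": 10},
    {"city": "B", "temp": 20},
    {"city": "A", "temp": 14},
]
by_city = group_values(readings, "city", "temp")
averages = {city: sum(temps) / len(temps) for city, temps in by_city.items()}
print(by_city)   # {'A': [10, 14], 'B': [20]}
print(averages)  # {'A': 12.0, 'B': 20.0}
```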

In [None]:
# Windowing operations
def sliding_window(data, window_size):
    """Create sliding windows over sequential data"""
    return [data[i:i + window_size] 
            for i in range(len(data) - window_size + 1)]

# Sort temperatures by date for each city
ny_temps = sorted(
    [r for r in temperatures if r['city'] == 'New York'],
    key=lambda x: x['date']
)
la_temps = sorted(
    [r for r in temperatures if r['city'] == 'Los Angeles'],
    key=lambda x: x['date']
)

# Calculate moving averages (window size = 2)
def calc_moving_avg(city_data, window_size=2):
    windows = sliding_window([r['temp'] for r in city_data], window_size)
    moving_avgs = [sum(window) / len(window) for window in windows]
    return list(zip(
        [r['date'] for r in city_data[window_size-1:]],
        moving_avgs
    ))

print("\nMoving Averages (2-day windows):")
print("\nNew York:")
for date, avg in calc_moving_avg(ny_temps):
    print(f"Date: {date}, Average: {avg:.1f}°C")

print("\nLos Angeles:")
for date, avg in calc_moving_avg(la_temps):
    print(f"Date: {date}, Average: {avg:.1f}°C")
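
`sliding_window` builds every window as a list up front, which is fine for small data. For a stream of readings, a generator that keeps only the current window may be preferable. This is a minimal sketch using `collections.deque`. `moving_average` is a hypothetical name, and the function assumes `window_size` is at least 1.

```python
from collections import deque

def moving_average(values, window_size):
    """Yield the mean of each full window as values arrive one at a time."""
    window = deque(maxlen=window_size)  # Oldest item is discarded automatically
    for value in values:
        window.append(value)
        if len(window) == window_size:
            yield sum(window) / window_size

print(list(moving_average([12, 10, 15], 2)))  # [11.0, 12.5]
print(list(moving_average([1, 2], 3)))        # [] (never a full window)
```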

# 4. Model Evaluation Patterns

When working with machine learning models, we often need to implement validation, cross-validation, and error calculation patterns. Here are examples using plain Python control flow.

In [None]:
# Example dataset for model evaluation
import random

# Generate synthetic data
random.seed(42)
data = [
    {"features": [x, x**2], "target": 2*x + 1 + random.uniform(-0.5, 0.5)}
    for x in [x/10 for x in range(-50, 51)]
]

def train_test_split(data, test_size=0.2):
    """Split data into training and test sets"""
    # Create a copy and shuffle
    shuffled = data.copy()
    random.shuffle(shuffled)
    
    # Calculate split point
    split_idx = int(len(data) * (1 - test_size))
    
    return shuffled[:split_idx], shuffled[split_idx:]

def k_fold_split(data, k=5):
    """Split data into k folds for cross-validation"""
    # Create a copy and shuffle
    shuffled = data.copy()
    random.shuffle(shuffled)
    
    # Calculate fold size
    fold_size = len(data) // k
    
    # Create folds
    folds = []
    for i in range(k):
        start_idx = i * fold_size
        end_idx = start_idx + fold_size if i < k-1 else len(data)
        test_fold = shuffled[start_idx:end_idx]
        train_fold = shuffled[:start_idx] + shuffled[end_idx:]
        folds.append((train_fold, test_fold))
    
    return folds

# Example usage
train_data, test_data = train_test_split(data)
print(f"Train set size: {len(train_data)}")
print(f"Test set size: {len(test_data)}")

# Cross-validation example
folds = k_fold_split(data, k=5)
print("\nCross-validation folds:")
for i, (train, test) in enumerate(folds, 1):
    print(f"Fold {i}:")
    print(f"  Train size: {len(train)}")
    print(f"  Test size: {len(test)}")
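
The folds are normally consumed in a loop that fits on the training part and scores on the held-out part. The sketch below illustrates that loop with a deliberately trivial baseline that predicts the training mean. `cross_validate_mean_baseline` is a hypothetical name. The folds are contiguous and unshuffled so the result can be checked by hand, and the function assumes `2 <= k <= len(targets)`.

```python
def cross_validate_mean_baseline(targets, k):
    """Return the MAE of a predict-the-training-mean baseline for each of k folds."""
    fold_size = len(targets) // k
    scores = []
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any leftover items
        end = start + fold_size if i < k - 1 else len(targets)
        test = targets[start:end]
        train = targets[:start] + targets[end:]

        prediction = sum(train) / len(train)  # "Fit" on the training folds only
        mae = sum(abs(t - prediction) for t in test) / len(test)
        scores.append(mae)
    return scores

scores = cross_validate_mean_baseline([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], k=3)
print(scores)  # [3.0, 0.5, 3.0]
print(f"Mean MAE across folds: {sum(scores) / len(scores):.3f}")
```

The middle fold scores best because its training mean (3.5) sits between its two test values. The spread across folds is a reminder to shuffle ordered data before splitting.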

In [None]:
# Model evaluation metrics
def calculate_metrics(true_values, predicted_values):
    """Calculate common regression metrics"""
    n = len(true_values)
    
    # Mean Absolute Error (MAE)
    mae = sum(abs(t - p) for t, p in zip(true_values, predicted_values)) / n
    
    # Mean Squared Error (MSE)
    mse = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values)) / n
    
    # Root Mean Squared Error (RMSE)
    rmse = mse ** 0.5
    
    # R-squared (R²)
    mean_true = sum(true_values) / n
    ss_total = sum((t - mean_true) ** 2 for t in true_values)
    ss_residual = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    # R² is undefined when every true value is identical (ss_total == 0)
    r2 = 1 - (ss_residual / ss_total) if ss_total else float('nan')
    
    return {
        'mae': mae,
        'mse': mse,
        'rmse': rmse,
        'r2': r2
    }

# Example usage with dummy predictions
true_values = [d['target'] for d in test_data]
# Dummy model: predict mean of training data
mean_prediction = sum(d['target'] for d in train_data) / len(train_data)
predicted_values = [mean_prediction] * len(test_data)

# Calculate and display metrics
metrics = calculate_metrics(true_values, predicted_values)
print("\nModel Evaluation Metrics:")
for metric, value in metrics.items():
    print(f"{metric.upper()}: {value:.4f}")
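
A tiny hand-checkable input makes it easier to confirm what each metric measures. The values below are made up for illustration. Returning NaN for R² when all true values are identical is an assumed convention, not something the notebook specifies.

```python
true_values = [1.0, 2.0, 3.0]
predicted_values = [1.0, 2.0, 4.0]
n = len(true_values)

errors = [t - p for t, p in zip(true_values, predicted_values)]  # [0.0, 0.0, -1.0]

mae = sum(abs(e) for e in errors) / n   # 1/3
mse = sum(e ** 2 for e in errors) / n   # 1/3
rmse = mse ** 0.5

mean_true = sum(true_values) / n                           # 2.0
ss_total = sum((t - mean_true) ** 2 for t in true_values)  # 1 + 0 + 1 = 2.0
ss_residual = sum(e ** 2 for e in errors)                  # 1.0
# Assumed convention: NaN when ss_total is zero, since R² is undefined there
r2 = 1 - ss_residual / ss_total if ss_total else float("nan")

print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f} R2={r2:.4f}")
# MAE=0.3333 MSE=0.3333 RMSE=0.5774 R2=0.5000
```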

# Conclusion

This notebook has covered essential Python control flow patterns specifically focused on data science tasks:

1. Data Validation - Input checking and data quality validation
2. Data Transformation - Feature scaling, encoding, and engineering
3. Data Aggregation - Grouping, aggregation, and time series operations
4. Model Evaluation - Train-test splits, cross-validation, and metrics

These patterns form the foundation of data science workflows and can be implemented using plain Python, without external libraries. While libraries like NumPy, Pandas, and Scikit-learn provide optimized implementations of these patterns, understanding the underlying logic helps in:

- Debugging and troubleshooting
- Customizing algorithms for specific needs
- Working with new or unusual data structures
- Optimizing performance in specific cases
- Understanding library implementations better

Remember that these are basic implementations for learning purposes. In production environments, you would typically use established libraries that have been thoroughly tested and optimized.

# Table of Contents
1. [Conditional Statements](#conditional-statements)
   - if/elif/else
   - Ternary operators
2. [Loops and Control Keywords](#loops)
   - for loops
   - while loops
   - break/continue
   - loop else
3. [Comprehensions](#comprehensions)
   - List comprehensions
   - Dictionary comprehensions
   - Generator expressions
4. [Flow Control in Functions](#functions)
   - return
   - yield
   - raise
   - assert
5. [Advanced Control Flow](#advanced)
   - Exception handling
   - Pattern matching
   - Async/await
   - Context managers

# 1. Conditional Statements

## Basic if/elif/else
The fundamental building blocks of decision making in Python.

### Simple if Statement
The most basic form of control flow. Execute code only if a condition is True.

In [4]:
# Simple if statement
age = int(input())
if age >= 18:
    print("You are an adult")

# If with boolean expression
is_student = True
has_id = True
if is_student and has_id:
    print("Welcome to the library")

You are an adult
Welcome to the library


### if-elif-else Chain
When you need to check multiple conditions in sequence. Python will check conditions in order and execute the first matching block.

In [None]:
# Grade calculator example
score = 85

if score >= 90:
    grade = 'A'
elif score >= 80:
    grade = 'B'
elif score >= 70:
    grade = 'C'
elif score >= 60:
    grade = 'D'
else:
    grade = 'F'

print(f"Score {score} is grade {grade}")

# Multiple conditions example
day = "Saturday"
time = 14  # 24-hour format

if day == "Saturday" and time < 12:
    print("Good morning, it's the weekend!")
elif day == "Saturday" and 12 <= time < 18:
    print("Good afternoon, enjoy your Saturday!")
elif day == "Saturday":
    print("Good evening, Saturday night!")
else:
    print("Not Saturday!")
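
When every branch of an `if`/`elif` chain compares the same value against a threshold, the thresholds can be moved into a list and checked in a loop. The sketch below is an alternative formulation of the grade calculator. `GRADE_BANDS` and `grade_for` are hypothetical names.

```python
# (minimum score, grade) pairs, ordered from highest threshold to lowest
GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

def grade_for(score):
    """Return the letter grade for a score using the first matching band."""
    for minimum, grade in GRADE_BANDS:
        if score >= minimum:
            return grade  # First match wins, just like an if/elif chain
    return "F"  # Plays the role of the final else

for s in (95, 85, 60, 12):
    print(s, grade_for(s))
```

The order of the bands matters for the same reason the order of the `elif` branches does: the first condition that holds decides the result.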

### Ternary Operators
Conditional expressions in a single line. Useful for simple if-else assignments, but use sparingly to maintain readability.

In [None]:
# Basic ternary operator
age = 20
status = "adult" if age >= 18 else "minor"
print(status)

# Ternary with function calls
def is_even(n):
    return n % 2 == 0
numbers = [1, 2, 3, 4, 5]
parity = ["even" if is_even(n) else "odd" for n in numbers]
print(parity)

# Nested ternary (use sparingly!)
num = 0
result = "positive" if num > 0 else "zero" if num == 0 else "negative"
print(result)

# 2. Loops and Iteration

Python provides two main types of loops: `for` and `while`. Each has its specific use cases and control mechanisms.

### For Loops
The `for` loop is used to iterate over sequences (lists, tuples, dictionaries, sets, or strings) and other iterables.

Key points:
- Most common type of loop in Python
- Can iterate over any iterable object
- Built-in `range()` function for number sequences
- `enumerate()` for index access

In [None]:
# Basic for loop with list
fruits = ['apple', 'banana', 'orange']
for fruit in fruits:
    print(f"I like {fruit}")

# Using range()
for i in range(3):  # 0 to 2
    print(f"Count: {i}")

# enumerate() for index and value
for index, fruit in enumerate(fruits):
    print(f"{index+1}. {fruit}")

# Looping through strings
for char in "Python":
    print(char.upper(), end=' ')  # P Y T H O N

### Advanced For Loop Patterns
Common patterns and techniques with for loops:
- Nested loops
- Dictionary iteration
- Multiple sequence iteration with `zip()`
- List comprehension alternative

In [None]:
# Nested loops - Multiplication table
for i in range(1, 4):
    for j in range(1, 4):
        print(f"{i} x {j} = {i*j}")
    print("---")

# Dictionary iteration
student_scores = {
    'Alice': 95,
    'Bob': 82,
    'Charlie': 88
}
for name, score in student_scores.items():
    print(f"{name}: {'A' if score >= 90 else 'B'}")

# Using zip() for parallel iteration
names = ['Alice', 'Bob', 'Charlie']
ages = [24, 32, 28]
cities = ['New York', 'San Francisco', 'Chicago']
for name, age, city in zip(names, ages, cities):
    print(f"{name} is {age} years old and lives in {city}")

# List comprehension: a compact alternative to a for loop that builds a list
squares = [x**2 for x in range(5)]
print(squares)  # [0, 1, 4, 9, 16]
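
One detail worth knowing about `zip()` is that it stops at the shortest input, which can silently drop data when the sequences differ in length. The sketch below shows that behaviour next to `itertools.zip_longest` from the standard library. The lists are made up for illustration, with one deliberately too short.

```python
from itertools import zip_longest

names = ["Alice", "Bob", "Charlie"]
ages = [24, 32]  # One value short on purpose

# zip() stops as soon as the shortest input is exhausted
truncated = list(zip(names, ages))
print(truncated)  # [('Alice', 24), ('Bob', 32)]

# zip_longest() pads the shorter inputs with fillvalue instead
padded = list(zip_longest(names, ages, fillvalue=None))
print(padded)  # [('Alice', 24), ('Bob', 32), ('Charlie', None)]
```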

### While Loops
While loops continue executing as long as a condition remains True. Best used when:
- You don't know the number of iterations in advance
- You need to loop until a specific condition is met
- You're implementing event loops or infinite loops

⚠️ Remember to include a way to eventually break the loop to avoid infinite execution!

In [None]:
# Basic while loop - countdown
count = 5
while count > 0:
    print(count)
    count -= 1
print("Blast off!")

# While loop with user input
total = 0
while total < 10:
    num = 2  # Simulating user input
    total += num
    print(f"Total is now {total}")

# While with break condition
attempts = 0
while True:
    attempts += 1
    if attempts == 3:
        print("Maximum attempts reached")
        break
    print(f"Attempt {attempts}")

### Advanced While Loop Patterns
Common patterns and techniques with while loops:
- Using while with else
- Nested while loops
- Sentinel values
- Event loops
- Input validation

In [None]:
# While with else (runs when loop completes normally)
numbers = [1, 3, 5, 7, 9]
i = 0
while i < len(numbers):
    if numbers[i] % 2 == 0:
        print("Found even number:", numbers[i])
        break
    i += 1
else:
    print("No even numbers found")

# Nested while for matrix traversal
row = 0
while row < 3:
    col = 0
    while col < 3:
        print(f"({row},{col})", end=" ")
        col += 1
    print()  # new line
    row += 1

# Sentinel value pattern
def process_numbers():
    numbers = []
    while True:
        num = input("Enter a number (or 'q' to quit): ")
        if num == 'q':  # sentinel value
            break
        numbers.append(int(num))
    return numbers

# Input validation
def get_positive_number():
    while True:
        try:
            num = float(input("Enter a positive number: "))
            if num > 0:
                return num
            print("Please enter a positive number.")
        except ValueError:
            print("That's not a valid number.")

### Loop Control Statements
Special statements that modify loop behavior:
- `break`: Exit the loop immediately
- `continue`: Skip to the next iteration
- `pass`: Do nothing (placeholder)

These statements work in both `for` and `while` loops.

In [None]:
# break example - exit when condition met
for num in range(1, 11):
    if num == 5:
        break
    print(num, end=' ')
print("\nLoop ended at 5")

# continue example - skip even numbers
for num in range(1, 6):
    if num % 2 == 0:
        continue
    print(num, end=' ')
print("\nOnly odd numbers printed")

# pass example - placeholder for future code
for num in range(5):
    if num == 2:
        pass  # TODO: add special handling for 2
    print(num, end=' ')
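
`break` also interacts with a loop's optional `else` clause: the `else` block runs only when the loop finishes without hitting `break`. The sketch below shows a search written that way with a `for` loop. `first_negative` is a hypothetical name.

```python
def first_negative(values):
    """Return the first negative number, or None if there is none."""
    for value in values:
        if value < 0:
            found = value
            break
    else:
        # Runs only when the loop finished without hitting break,
        # including when values is empty
        found = None
    return found

print(first_negative([4, 2, -7, -1]))  # -7
print(first_negative([4, 2]))          # None
print(first_negative([]))              # None
```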
