# Notebook 5: Functions and Modules

Welcome to your fifth Python notebook! Now you'll learn how to organize your code using functions and modules - essential skills for writing clean, reusable data science code.

**Learning Objectives:**
- Define and call functions
- Use parameters and return values
- Understand variable scope
- Import and use modules
- Apply functions to data science problems

## Basic Function Definition

A function is a block of reusable code that performs a specific task:

In [1]:
# Simple function with no parameters
def greet():
    print("Hello, Data Scientist!")

# Call the function
greet()
greet()  # Can call multiple times

Hello, Data Scientist!
Hello, Data Scientist!


## Functions with Parameters and Return Values

Functions become powerful when they can accept input and return output:

In [2]:
# Function that returns a value - essential for data science!
def calculate_average(numbers):
    """Calculate the average of a list of numbers."""
    if len(numbers) == 0:
        return 0  # Avoid a ZeroDivisionError when the list is empty
    return sum(numbers) / len(numbers)

# Test with real data science scenario
student_scores = [85, 92, 78, 96, 88]
avg_score = calculate_average(student_scores)
print(f"Average score: {avg_score:.1f}")

# Function with multiple parameters
def analyze_data(data, show_details=False):
    """Analyze a dataset and optionally show details."""
    result = {
        'count': len(data),
        'min': min(data) if data else 0,
        'max': max(data) if data else 0,
        'avg': calculate_average(data)
    }
    
    if show_details:
        print(f"Dataset: {data}")
        for key, value in result.items():
            print(f"{key}: {value}")
    
    return result

# Returning a dict of summary statistics, with optional printing, is a common notebook pattern
analysis = analyze_data(student_scores, show_details=True)

Average score: 87.8
Dataset: [85, 92, 78, 96, 88]
count: 5
min: 78
max: 96
avg: 87.8
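
Variable scope, listed in the learning objectives, decides where a name can be used. The sketch below is a minimal illustration rather than part of the original cells; the names `bonus_points`, `add_bonus`, and `adjusted` are hypothetical. A name assigned inside a function is local to that function, while the function can still read names defined at the top level of the notebook.

```python
# Hypothetical example: all names here are for illustration only.
bonus_points = 5  # global (notebook-level) variable

def add_bonus(score):
    """Return the score plus the notebook-level bonus."""
    # 'adjusted' is local to this function; 'bonus_points' is read from the global scope
    adjusted = score + bonus_points
    return adjusted

print(add_bonus(85))  # 90

# 'adjusted' only exists while add_bonus runs, so using it out here fails.
try:
    print(adjusted)
except NameError as error:
    print(f"NameError: {error}")
```

Passing values in as parameters and handing results back with `return`, as `calculate_average` does, usually keeps functions easier to test than relying on global variables.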


## Modules and Imports - Essential for Data Science

Understanding imports matters because most data science notebooks begin with a block of them.

In [3]:
# Standard-library modules: they ship with Python, so nothing needs installing
import math
import random

# The "Big 3" data science imports (you'll learn these next!)
# import numpy as np              # Coming in next notebook!
# import matplotlib.pyplot as plt # Coming soon!
# import pandas as pd             # Most popular data manipulation library

# Using built-in modules
print("Using math module:")
data = [1, 4, 9, 16, 25]
sqrt_data = [math.sqrt(x) for x in data]
print(f"Original: {data}")
print(f"Square roots: {sqrt_data}")

# Random numbers - used a lot in ML for data splitting
print("\nRandom examples:")
random.seed(42)  # For reproducible results
random_sample = random.sample(data, 3)
print(f"Random sample of 3: {random_sample}")

# Import specific functions
from math import sqrt, pi
print("\nUsing imported functions directly:")
print(f"sqrt(16) = {sqrt(16)}")
print(f"pi = {pi:.4f}")

Using math module:
Original: [1, 4, 9, 16, 25]
Square roots: [1.0, 2.0, 3.0, 4.0, 5.0]

Random examples:
Random sample of 3: [1, 25, 9]

Using imported functions directly:
sqrt(16) = 4.0
pi = 3.1416
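
The third import style, `import ... as`, gives a module a shorter alias, which is what the commented-out `import numpy as np` line does. As a minimal sketch that runs without installing anything, the example below applies the same style to the standard-library `statistics` module. The alias `stats` and the variable `scores` are chosen here for illustration.

```python
import statistics as stats  # the alias keeps later calls short

scores = [85, 92, 78, 96, 88]  # same values as student_scores above

print(f"Mean: {stats.mean(scores)}")      # Mean: 87.8
print(f"Median: {stats.median(scores)}")  # Median: 88
```

As a rough guide, plain `import module` keeps it obvious where each function comes from, `from module import name` suits a few frequently used names, and `import module as alias` is the convention for libraries with long names.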


---

## 🔧 Common Function Errors and Troubleshooting

Understanding errors helps you become a better programmer!

In [4]:
# Common Function Errors - Learn to Fix Them!

# 1. NameError - Function not defined or called before definition
def demonstrate_name_error():
    print("This function is properly defined")
    
# This would cause an error: undefined_function()
# Fix: Make sure the function is defined (and its cell has been run) before calling it

# 2. TypeError - Wrong number of arguments
def calculate_bmi(weight, height):
    """Calculate BMI given weight (kg) and height (m)"""
    return weight / (height ** 2)

# This works:
bmi = calculate_bmi(70, 1.75)
print(f"BMI: {bmi:.1f}")

# This would cause an error: calculate_bmi(70)  # Missing height argument
# Fix: Provide all required arguments or give the parameter a default value

# 3. Better function with error handling
def safe_divide(a, b):
    """Safely divide two numbers"""
    if b == 0:
        print(f"Warning: Cannot divide {a} by zero")
        return None
    return a / b

# Test with different scenarios
print(f"10 / 2 = {safe_divide(10, 2)}")
print(f"10 / 0 = {safe_divide(10, 0)}")

# 4. Function with input validation - common in data science
def analyze_scores(scores):
    """Analyze test scores with validation"""
    # Input validation
    if not scores:
        return {"error": "No scores provided"}
    
    if not all(isinstance(score, (int, float)) for score in scores):
        return {"error": "All scores must be numbers"}
    
    # Analysis
    return {
        "count": len(scores),
        "average": sum(scores) / len(scores),
        "min": min(scores),
        "max": max(scores),
        "range": max(scores) - min(scores)
    }

# Test with valid and invalid data
print("Valid data:", analyze_scores([85, 92, 78, 96]))
print("Invalid data:", analyze_scores([85, "invalid", 78]))
print("Empty data:", analyze_scores([]))

BMI: 22.9
10 / 2 = 5.0
10 / 0 = None
Valid data: {'count': 4, 'average': 87.75, 'min': 78, 'max': 96, 'range': 18}
Invalid data: {'error': 'All scores must be numbers'}
Empty data: {'error': 'No scores provided'}
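
The first two errors in the cell above are only described in comments. The sketch below, offered as one possible way to explore them, triggers each error inside `try`/`except` so the message can be read without stopping the notebook. `calculate_bmi` is repeated so the block runs on its own, and `undefined_function` is deliberately left undefined.

```python
def calculate_bmi(weight, height):
    """Calculate BMI given weight (kg) and height (m)."""
    return weight / (height ** 2)

# 1. NameError: the name has never been defined.
try:
    undefined_function()
except NameError as error:
    print(f"NameError: {error}")

# 2. TypeError: a required argument is missing.
try:
    calculate_bmi(70)
except TypeError as error:
    print(f"TypeError: {error}")
```

Each message names the problem directly: the undefined name in the first case, the missing `height` argument in the second. Reading the last line of an error message first is usually the quickest route to the fix.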


---

## 📝 Mini-Challenge: Build Your Data Science Toolkit

### Challenge 1: Temperature Converter
Create functions that data scientists use when working with international temperature data:

In [5]:
# Challenge 1: Temperature Converter Functions

def celsius_to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit"""
    return (celsius * 9/5) + 32

def fahrenheit_to_celsius(fahrenheit):
    """Convert Fahrenheit to Celsius"""
    return (fahrenheit - 32) * 5/9

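def analyze_temperature_data(temperatures, unit="celsius"):
    """Analyze a list of temperatures given in Celsius or Fahrenheit"""
    # Guard against empty input, which would otherwise divide by zero
    if not temperatures:
        return {"error": "No temperatures provided"}

    unit = unit.lower()
    if unit == "fahrenheit":
        # Convert to Celsius for analysis
        celsius_temps = [fahrenheit_to_celsius(temp) for temp in temperatures]
    elif unit == "celsius":
        celsius_temps = temperatures
    else:
        # Reject unknown units instead of silently treating them as Celsius
        return {"error": f"Unknown unit: {unit}"}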
    
    analysis = {
        "count": len(celsius_temps),
        "average_c": sum(celsius_temps) / len(celsius_temps),
        "min_c": min(celsius_temps),
        "max_c": max(celsius_temps)
    }
    
    # Add Fahrenheit equivalents
    analysis["average_f"] = celsius_to_fahrenheit(analysis["average_c"])
    analysis["min_f"] = celsius_to_fahrenheit(analysis["min_c"])
    analysis["max_f"] = celsius_to_fahrenheit(analysis["max_c"])
    
    return analysis

# Test your functions
cities_temp_c = [20, 25, 18, 30, 22]  # Celsius
cities_temp_f = [68, 77, 64, 86, 72]  # Fahrenheit

print("=== TEMPERATURE ANALYSIS ===")
print("Celsius data:", analyze_temperature_data(cities_temp_c, "celsius"))
print("\nFahrenheit data:", analyze_temperature_data(cities_temp_f, "fahrenheit"))

=== TEMPERATURE ANALYSIS ===
Celsius data: {'count': 5, 'average_c': 23.0, 'min_c': 18, 'max_c': 30, 'average_f': 73.4, 'min_f': 64.4, 'max_f': 86.0}

Fahrenheit data: {'count': 5, 'average_c': 23.0, 'min_c': 17.77777777777778, 'max_c': 30.0, 'average_f': 73.4, 'min_f': 64.0, 'max_f': 86.0}
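
One way to gain confidence in the two converters is to check them against known reference points and then do a round trip. This is a minimal sketch using plain `assert` statements, not a full test suite. The converter functions are repeated so the block runs on its own, and `math.isclose` is used because floating-point arithmetic can introduce tiny rounding differences.

```python
import math

def celsius_to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit."""
    return (celsius * 9 / 5) + 32

def fahrenheit_to_celsius(fahrenheit):
    """Convert Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9

# Known reference points: water freezes at 0 °C / 32 °F and boils at 100 °C / 212 °F.
assert celsius_to_fahrenheit(0) == 32
assert celsius_to_fahrenheit(100) == 212
assert fahrenheit_to_celsius(32) == 0

# Round trip: converting there and back should return the starting value.
for temp_c in [20, 25, 18, 30, 22]:
    round_trip = fahrenheit_to_celsius(celsius_to_fahrenheit(temp_c))
    assert math.isclose(round_trip, temp_c)

print("All temperature checks passed")
```

If any `assert` fails, Python raises an `AssertionError` and the final `print` never runs, so a silent pass is easy to tell from a failure.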


### Challenge 2: Data Cleaning Function
Create a function that cleans messy data - a common task in data science:

In [6]:
# Challenge 2: Data Cleaning Function
def clean_numeric_data(raw_data):
    """
    Clean a list of mixed data types and extract valid numbers
    Returns: dictionary with cleaned data and cleaning report
    """
    cleaned_numbers = []
    invalid_items = []
    
    for item in raw_data:
        try:
            # Try to convert to float
            if isinstance(item, str):
                item = item.strip()  # Remove whitespace
                if item == '':
                    invalid_items.append("empty string")
                    continue
            
            number = float(item)
            cleaned_numbers.append(number)
        except (ValueError, TypeError):
            invalid_items.append(item)
    
    # Generate cleaning report
    report = {
        "original_count": len(raw_data),
        "cleaned_count": len(cleaned_numbers),
        "invalid_count": len(invalid_items),
        "cleaned_data": cleaned_numbers,
        "invalid_items": invalid_items,
        "success_rate": len(cleaned_numbers) / len(raw_data) * 100 if raw_data else 0
    }
    
    return report

# Test with messy data (common in real datasets!)
messy_data = [85, "92", 78.5, "96", "", None, "invalid", 88, "85.5", "   90   "]

cleaning_result = clean_numeric_data(messy_data)

print("=== DATA CLEANING REPORT ===")
print(f"Original data: {messy_data}")
print(f"Cleaned numbers: {cleaning_result['cleaned_data']}")
print(f"Invalid items: {cleaning_result['invalid_items']}")
print(f"Success rate: {cleaning_result['success_rate']:.1f}%")

# Use the cleaned data for analysis
if cleaning_result['cleaned_data']:
    avg = sum(cleaning_result['cleaned_data']) / len(cleaning_result['cleaned_data'])
    print(f"Average of cleaned data: {avg:.1f}")

=== DATA CLEANING REPORT ===
Original data: [85, '92', 78.5, '96', '', None, 'invalid', 88, '85.5', '   90   ']
Cleaned numbers: [85.0, 92.0, 78.5, 96.0, 88.0, 85.5, 90.0]
Invalid items: ['empty string', None, 'invalid']
Success rate: 70.0%
Average of cleaned data: 87.9
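
One caveat with cleaning based on `float()`: strings such as `"nan"` and `"inf"`, and the booleans `True` and `False`, all convert without raising an error, so they would count as valid numbers. The sketch below shows one possible stricter check. The helper name `to_finite_number` is hypothetical and not part of the notebook's code.

```python
import math

def to_finite_number(item):
    """Return item as a finite float, or None if it is not a usable number.

    Hypothetical helper for illustration only.
    """
    if isinstance(item, bool):  # True/False would otherwise become 1.0/0.0
        return None
    try:
        number = float(item)  # float() ignores surrounding whitespace in strings
    except (ValueError, TypeError):
        return None
    # Reject nan, inf, and -inf, which float() accepts
    return number if math.isfinite(number) else None

for raw in ["92", "nan", "inf", True, None, " 90 "]:
    print(f"{raw!r} -> {to_finite_number(raw)}")
```

Whether values like `nan` should be rejected depends on the dataset; some pipelines keep them as explicit missing-value markers, so treat this as one option rather than a rule.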


---

## ✅ Self-Assessment Checklist

Before moving to the next notebook, make sure you can:

**Function Basics:**
- [ ] Define functions with `def` keyword
- [ ] Call functions and understand return values
- [ ] Use parameters and default values
- [ ] Write docstrings to document functions
- [ ] Handle errors in functions gracefully

**Module Usage:**
- [ ] Import built-in modules (math, random, etc.)
- [ ] Use different import styles (import, from...import, as)
- [ ] Understand when to use each import style
- [ ] Know the standard data science imports

**Data Science Applications:**
- [ ] Write functions for data cleaning and validation
- [ ] Create reusable analysis functions
- [ ] Handle different data types in functions
- [ ] Debug common function errors

**Best Practices:**
- [ ] Use meaningful function and parameter names
- [ ] Keep functions focused on single tasks
- [ ] Add error handling where appropriate
- [ ] Test functions with different inputs

**Data Science Connection:** Functions are essential for:
- Creating reusable analysis workflows
- Cleaning and validating data consistently
- Building custom calculations and transformations
- Organizing complex data science projects
- Making your code readable and maintainable

---

## 🚀 What's Next?

In the next notebook, you'll learn about:
- **NumPy**: The foundation of scientific computing in Python
- **Arrays**: Efficient numerical operations on large datasets
- **Mathematical functions**: Statistical and mathematical operations
- **Performance**: Why NumPy is faster than pure Python

pandas, matplotlib, and scikit-learn all depend on NumPy, and most other ML libraries work with its arrays!