# Notebook 4: Dictionaries and Advanced Operations

Welcome to your fourth Python notebook! In this notebook you'll learn about dictionaries, one of Python's most powerful data structures for organizing and accessing data efficiently.

**Learning Objectives:**
- Create and manipulate dictionaries
- Use dictionaries for data organization
- Work with nested data structures
- Understand when to use lists vs dictionaries
- Apply advanced data manipulation techniques

## Introduction to Dictionaries

A dictionary is a collection of key-value pairs. Think of it like a real dictionary where you look up a word (key) to find its definition (value).

In [None]:
# Creating dictionaries
student = {
    "name": "Alice",
    "age": 20,
    "major": "Computer Science",
    "gpa": 3.8
}

print("Student dictionary:", student)
print("Type:", type(student))

# Empty dictionary
empty_dict = {}
print("Empty dictionary:", empty_dict)
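
Curly braces are not the only way to build a dictionary. The sketch below shows three other built-in options; the variable names (`student_kw`, `fields`, `values`, `attendance`) are made up for illustration and are not used elsewhere in this notebook.

```python
# dict() with keyword arguments (keys must be valid Python identifiers)
student_kw = dict(name="Alice", age=20)

# dict() from two parallel lists, paired up with zip()
fields = ["name", "age", "major"]
values = ["Alice", 20, "Computer Science"]
student_zip = dict(zip(fields, values))

# dict.fromkeys() gives every key the same starting value
attendance = dict.fromkeys(["Mon", "Tue", "Wed"], 0)

print("Keyword arguments:", student_kw)
print("From zip():", student_zip)
print("From fromkeys():", attendance)
```

`dict(zip(...))` is handy when keys and values arrive as separate columns, which is common when reading tabular data.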

## Accessing Dictionary Values

Use keys to access values in dictionaries:

In [None]:
student = {
    "name": "Alice",
    "age": 20,
    "major": "Computer Science",
    "gpa": 3.8
}

# Accessing values using keys
print(f"Student name: {student['name']}")
print(f"Student age: {student['age']}")
print(f"Student GPA: {student['gpa']}")
print()

# Safe access using .get() method
print(f"Major: {student.get('major')}")
print(f"Phone (missing key): {student.get('phone')}")  # .get() returns None for a missing key
print(f"Phone with default: {student.get('phone', 'Not provided')}")
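
To see why `.get()` is called "safe", it helps to compare it with square-bracket access on a key that does not exist. This is a small sketch using a shortened copy of the `student` dictionary; the `phone` variable is introduced here only for the demonstration.

```python
student = {"name": "Alice", "age": 20}

# Square brackets raise KeyError when the key is absent
try:
    phone = student["phone"]
except KeyError as err:
    phone = None
    print(f"KeyError for missing key: {err}")

# .get() returns None (or the default you pass) instead of raising
print("With .get():", student.get("phone"))
print("With a default:", student.get("phone", "Not provided"))
```

Use square brackets when a missing key is a real error you want to hear about, and `.get()` when a missing key is expected.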

## Modifying Dictionaries

Dictionaries are mutable: you can change, add, and remove items.

In [None]:
# Starting with basic info
person = {
    "name": "Bob",
    "age": 25
}
print("Original:", person)

# Adding new key-value pairs
person["city"] = "New York"
person["job"] = "Data Analyst"
print("After adding:", person)

# Modifying existing values
person["age"] = 26  # Birthday!
print("After birthday:", person)

# Removing items
del person["city"]  # Remove city
print("After removing city:", person)

# Remove and get the value
job = person.pop("job")
print(f"Removed job: {job}")
print("Final person:", person)
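
Beyond single-key assignment, `del`, and `.pop()`, a few other built-in methods cover common update patterns. The sketch below is one possible illustration; the `nickname` key is hypothetical and chosen only because it is not in the dictionary.

```python
person = {"name": "Bob", "age": 25}

# update() adds or overwrites several keys at once
person.update({"age": 26, "city": "New York"})

# setdefault() inserts a key only if it is not already present
person.setdefault("job", "Data Analyst")  # inserted, "job" was missing
person.setdefault("name", "Robert")       # ignored, "name" already exists

# pop() with a default avoids KeyError when the key is missing
nickname = person.pop("nickname", None)

print("After updates:", person)
print("Nickname:", nickname)
```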

## Dictionary Methods

Dictionaries have many useful methods for data manipulation:

In [None]:
# Sample data - like a small database
inventory = {
    "apples": 50,
    "bananas": 30,
    "oranges": 25,
    "grapes": 40
}

print("Inventory:", inventory)
print()

# Get all keys, values, and items
print("All fruits (keys):", list(inventory.keys()))
print("All quantities (values):", list(inventory.values()))
print("All items (key-value pairs):", list(inventory.items()))
print()

# Dictionary length
print(f"Number of fruit types: {len(inventory)}")

# Check if key exists
print(f"Do we have apples? {'apples' in inventory}")
print(f"Do we have mangoes? {'mangoes' in inventory}")
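
The views returned by `.keys()`, `.values()`, and `.items()` combine well with built-ins such as `sorted()`, `min()`, and `sum()`. A minimal sketch on the same inventory data (the names `by_quantity`, `lowest_stock`, and `total_stock` are illustrative):

```python
inventory = {"apples": 50, "bananas": 30, "oranges": 25, "grapes": 40}

# Sort (fruit, quantity) pairs by quantity, smallest first
by_quantity = sorted(inventory.items(), key=lambda item: item[1])

# Find the key whose value is smallest
lowest_stock = min(inventory, key=inventory.get)

# Add up all the values
total_stock = sum(inventory.values())

print("Sorted by quantity:", by_quantity)
print("Lowest stock:", lowest_stock)
print("Total items in stock:", total_stock)
```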

## Iterating Through Dictionaries

Different ways to loop through dictionary data:

In [None]:
grades = {
    "Math": 85,
    "Science": 92,
    "English": 78,
    "History": 88
}

# Method 1: Iterate through keys and look up each value
print("Method 1 - Keys (with value lookup):")
for subject in grades:
    print(f"{subject}: {grades[subject]}")
print()

# Method 2: Iterate through key-value pairs
print("Method 2 - Key-value pairs:")
for subject, grade in grades.items():
    print(f"{subject}: {grade}")
print()

# Method 3: Iterate through values only
print("Method 3 - Values only:")
for grade in grades.values():
    print(f"Grade: {grade}")
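
Two practical points about looping are worth a quick sketch: you can control the order with `sorted()`, and you should not add or remove keys while looping over the dictionary itself, because Python raises `RuntimeError` when the size changes mid-loop. Looping over a snapshot made with `list()` avoids this. The threshold of 80 below is an arbitrary value chosen for the example.

```python
grades = {"Math": 85, "Science": 92, "English": 78, "History": 88}

# Iterate in alphabetical order of subject
for subject in sorted(grades):
    print(f"{subject}: {grades[subject]}")

# list(grades) copies the keys first, so deleting inside the loop is safe
for subject in list(grades):
    if grades[subject] < 80:
        del grades[subject]

print("Grades of 80 or above:", grades)
```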

## Dictionary Comprehensions

Like list comprehensions, but for dictionaries:

In [None]:
# Create a dictionary of squares
squares = {x: x**2 for x in range(1, 6)}
print("Squares:", squares)

# Convert temperatures from Celsius to Fahrenheit
celsius_temps = {"morning": 20, "noon": 30, "evening": 25}
fahrenheit_temps = {time: (temp * 9/5) + 32 for time, temp in celsius_temps.items()}
print("Celsius:", celsius_temps)
print("Fahrenheit:", fahrenheit_temps)

# Filter dictionary based on condition
scores = {"Alice": 85, "Bob": 92, "Charlie": 78, "Diana": 96}
high_scores = {name: score for name, score in scores.items() if score >= 90}
print("All scores:", scores)
print("High scores (>=90):", high_scores)
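
Comprehensions can also swap keys and values or compute a label for each entry. The following is a small sketch on the same `scores` data; the pass mark of 80 and the names `names_by_score` and `results` are assumptions made for this example.

```python
scores = {"Alice": 85, "Bob": 92, "Charlie": 78, "Diana": 96}

# Swap keys and values (only safe when every value is unique,
# otherwise later entries overwrite earlier ones)
names_by_score = {score: name for name, score in scores.items()}

# Use a conditional expression to compute each value
results = {
    name: ("pass" if score >= 80 else "retake")
    for name, score in scores.items()
}

print("Names by score:", names_by_score)
print("Results:", results)
```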

## Nested Dictionaries - Complex Data Structures

Dictionaries can contain other dictionaries, creating complex data structures:

In [None]:
# Student database with nested information
students_db = {
    "student_001": {
        "name": "Alice Johnson",
        "age": 20,
        "major": "Computer Science",
        "grades": {
            "Math": 85,
            "Physics": 92,
            "Programming": 96
        },
        "contact": {
            "email": "alice@email.com",
            "phone": "555-0123"
        }
    },
    "student_002": {
        "name": "Bob Smith",
        "age": 21,
        "major": "Data Science",
        "grades": {
            "Statistics": 88,
            "Python": 94,
            "Machine Learning": 90
        },
        "contact": {
            "email": "bob@email.com",
            "phone": "555-0456"
        }
    }
}

# Accessing nested data
print("Alice's name:", students_db["student_001"]["name"])
print("Alice's Math grade:", students_db["student_001"]["grades"]["Math"])
print("Bob's email:", students_db["student_002"]["contact"]["email"])
print()

# Processing nested data
print("Student Summary:")
for student_id, info in students_db.items():
    name = info["name"]
    major = info["major"]
    grades = info["grades"]
    avg_grade = sum(grades.values()) / len(grades)
    
    print(f"{name} ({major}): Average grade = {avg_grade:.1f}")
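
Chained square brackets raise `KeyError` as soon as any level is missing. One common way to read optional nested fields is to chain `.get()` with an empty dictionary as the default. The sketch below uses a trimmed copy of `students_db`; the ID `student_999` is hypothetical and deliberately absent.

```python
# Trimmed copy of the student database, for illustration only
students_db = {
    "student_001": {
        "name": "Alice Johnson",
        "contact": {"email": "alice@email.com"},
    }
}

# Each .get() falls back to {} so the next .get() still works
email = students_db.get("student_001", {}).get("contact", {}).get("email", "Not provided")
phone = students_db.get("student_001", {}).get("contact", {}).get("phone", "Not provided")
unknown = students_db.get("student_999", {}).get("contact", {}).get("email", "Not provided")

print("Email:", email)
print("Phone:", phone)
print("Unknown student's email:", unknown)
```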

## Combining Lists and Dictionaries

Real-world data often combines both structures:

In [None]:
# List of dictionaries - like a database table
employees = [
    {"name": "Alice", "department": "Engineering", "salary": 75000, "years": 3},
    {"name": "Bob", "department": "Marketing", "salary": 65000, "years": 5},
    {"name": "Charlie", "department": "Engineering", "salary": 80000, "years": 7},
    {"name": "Diana", "department": "Sales", "salary": 70000, "years": 2}
]

print("Employee Database:")
for emp in employees:
    print(f"{emp['name']}: {emp['department']}, ${emp['salary']:,}, {emp['years']} years")
print()

# Dictionary with lists as values
department_data = {
    "Engineering": ["Alice", "Charlie"],
    "Marketing": ["Bob"],
    "Sales": ["Diana"]
}

print("Employees by Department:")
for dept, emp_list in department_data.items():
    print(f"{dept}: {', '.join(emp_list)}")
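
The two shapes above are related: a dictionary of lists can be built from a list of dictionaries by grouping on one field. A minimal sketch, using a shortened copy of the `employees` records and `setdefault()` to create each list on first use (`by_department` is an illustrative name):

```python
# Shortened employee records, keeping only the fields needed here
employees = [
    {"name": "Alice", "department": "Engineering"},
    {"name": "Bob", "department": "Marketing"},
    {"name": "Charlie", "department": "Engineering"},
    {"name": "Diana", "department": "Sales"},
]

by_department = {}
for emp in employees:
    # setdefault() returns the existing list, or inserts and returns a new one
    by_department.setdefault(emp["department"], []).append(emp["name"])

print("Grouped by department:", by_department)
```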

## Data Analysis with Dictionaries

Dictionaries are excellent for counting and grouping data:

In [None]:
# Analyzing survey responses
responses = ["Yes", "No", "Yes", "Maybe", "Yes", "No", "Yes", "Maybe", "Yes", "No"]

# Count responses
response_counts = {}
for response in responses:
    if response in response_counts:
        response_counts[response] += 1
    else:
        response_counts[response] = 1

print("Survey Results:")
for response, count in response_counts.items():
    percentage = (count / len(responses)) * 100
    print(f"{response}: {count} responses ({percentage:.1f}%)")
print()

# Alternative using .get() method
word_text = "the quick brown fox jumps over the lazy dog"
word_counts = {}

for word in word_text.split():
    word_counts[word] = word_counts.get(word, 0) + 1

print("Word frequency:")
for word, count in word_counts.items():
    print(f"'{word}': {count}")
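
The standard library also offers `collections.Counter`, a dictionary subclass built for exactly this kind of counting. The sketch below repeats both counts with it; whether to use `Counter` or a plain dictionary is a matter of preference, and the manual loops above are still worth understanding.

```python
from collections import Counter

responses = ["Yes", "No", "Yes", "Maybe", "Yes", "No", "Yes", "Maybe", "Yes", "No"]
response_counts = Counter(responses)

word_text = "the quick brown fox jumps over the lazy dog"
word_counts = Counter(word_text.split())

print("Response counts:", dict(response_counts))
print("Most common response:", response_counts.most_common(1))
print("Count of 'the':", word_counts["the"])
print("Count of a missing word:", word_counts["cat"])  # Counter returns 0, no KeyError
```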

## When to Use Lists vs Dictionaries

Understanding when to use each data structure is important:

In [None]:
# Use LISTS when:
# - Order matters
# - You need to access items by position
# - Items may repeat (dictionary keys, by contrast, must be unique)

test_scores = [85, 92, 78, 96, 88]  # Order matters (chronological)
shopping_list = ["milk", "bread", "eggs", "milk"]  # Duplicates allowed

print("Test scores (ordered):", test_scores)
print("First score:", test_scores[0])
print("Shopping list:", shopping_list)
print()

# Use DICTIONARIES when:
# - You need fast lookups by key
# - Data has natural key-value relationships
# - Position doesn't matter; you organize by meaningful keys instead
#   (dictionaries keep insertion order in Python 3.7+, but have no index positions)

student_grades = {"Alice": 85, "Bob": 92, "Charlie": 78}  # Fast lookup by name
config_settings = {"debug": True, "max_users": 100, "timeout": 30}  # Key-value pairs

print("Student grades:", student_grades)
print("Alice's grade:", student_grades["Alice"])  # Fast lookup
print("Configuration:", config_settings)
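
To make the lookup difference concrete, here is a small sketch that finds one student's grade in both structures. The `records` list of tuples is an assumed starting format, not something defined earlier in the notebook.

```python
records = [("Alice", 85), ("Bob", 92), ("Charlie", 78)]

# List: scan item by item until the name matches
bob_from_list = None
for name, grade in records:
    if name == "Bob":
        bob_from_list = grade
        break

# Dictionary: build it once, then look up directly by key
grade_lookup = dict(records)
bob_from_dict = grade_lookup["Bob"]

print("Found by scanning the list:", bob_from_list)
print("Found by dictionary lookup:", bob_from_dict)
```

With three records the difference is invisible, but the list scan grows with the number of items while the dictionary lookup stays roughly constant on average.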

## Practice Exercises

These exercises will help you master dictionaries for data science:

### Exercise 1: Data Aggregation
Given sales data, calculate total sales by product category.

In [None]:
# Sales transactions: [product_category, amount]
sales_data = [
    ["Electronics", 1200],
    ["Clothing", 800],
    ["Electronics", 1500],
    ["Books", 300],
    ["Clothing", 600],
    ["Electronics", 900],
    ["Books", 250],
    ["Clothing", 750]
]

# Your code here: calculate total sales by category
category_totals = {}

for category, amount in sales_data:
    # Add your logic here
    pass

print("Sales by Category:")
for category, total in category_totals.items():
    print(f"{category}: ${total:,}")

# Bonus: Find the category with highest sales
# top_category = max(category_totals, key=category_totals.get)
# print(f"\nTop category: {top_category} (${category_totals[top_category]:,})")

### Exercise 2: Data Cleaning and Transformation
Clean and organize messy customer data.

In [None]:
# Raw customer data: each string holds name, age, city, email separated by commas
raw_customers = [
    "Alice,25,New York,alice@email.com",
    "Bob,30,Los Angeles,bob@email.com",
    "Charlie,28,Chicago,charlie@email.com",
    "Diana,35,Houston,diana@email.com"
]

# Your code here: convert to list of dictionaries
customers = []

for customer_string in raw_customers:
    # Split the string and create a dictionary
    parts = customer_string.split(",")
    customer_dict = {
        # Add your code here
    }
    customers.append(customer_dict)

print("Cleaned customer data:")
for customer in customers:
    print(customer)

# Bonus: Find average age
# ages = [int(customer["age"]) for customer in customers]  # split() yields strings, so convert to int
# avg_age = sum(ages) / len(ages)
# print(f"\nAverage age: {avg_age:.1f}")

### Exercise 3: Nested Data Analysis
Analyze a complex dataset with nested structures.

In [None]:
# Company data with departments and employees
company_data = {
    "Engineering": {
        "employees": ["Alice", "Bob", "Charlie"],
        "budget": 500000,
        "projects": ["Web App", "Mobile App", "API"]
    },
    "Marketing": {
        "employees": ["Diana", "Eve"],
        "budget": 200000,
        "projects": ["Campaign A", "Campaign B"]
    },
    "Sales": {
        "employees": ["Frank", "Grace", "Henry", "Ivy"],
        "budget": 300000,
        "projects": ["Q1 Sales", "Q2 Sales"]
    }
}

print("Company Analysis:")
print("=" * 40)

total_employees = 0
total_budget = 0
total_projects = 0

for dept_name, dept_info in company_data.items():
    num_employees = len(dept_info["employees"])
    budget = dept_info["budget"]
    num_projects = len(dept_info["projects"])
    
    print(f"{dept_name}:")
    print(f"  Employees: {num_employees}")
    print(f"  Budget: ${budget:,}")
    print(f"  Projects: {num_projects}")
    print(f"  Budget per employee: ${budget/num_employees:,.0f}")
    print()
    
    total_employees += num_employees
    total_budget += budget
    total_projects += num_projects

print("Company Totals:")
print(f"Total employees: {total_employees}")
print(f"Total budget: ${total_budget:,}")
print(f"Total projects: {total_projects}")
print(f"Average budget per employee: ${total_budget/total_employees:,.0f}")

### Exercise 4: Data Transformation Pipeline
Create a data processing pipeline using dictionaries.

In [None]:
# Raw sensor data: timestamp, temperature, humidity
sensor_readings = [
    ("2024-01-01 08:00", 22.5, 65),
    ("2024-01-01 09:00", 23.1, 62),
    ("2024-01-01 10:00", 24.8, 58),
    ("2024-01-01 11:00", 26.2, 55),
    ("2024-01-01 12:00", 27.5, 52),
    ("2024-01-01 13:00", 28.1, 50)
]

# Step 1: Convert to structured format
structured_data = []
for timestamp, temp, humidity in sensor_readings:
    reading = {
        "timestamp": timestamp,
        "temperature_c": temp,
        "humidity_percent": humidity,
        "temperature_f": (temp * 9/5) + 32,  # Convert to Fahrenheit
        "comfort_level": "comfortable" if 20 <= temp <= 25 and 40 <= humidity <= 60 else "uncomfortable"
    }
    structured_data.append(reading)

print("Processed Sensor Data:")
for reading in structured_data:
    print(f"{reading['timestamp']}: {reading['temperature_c']}°C ({reading['temperature_f']:.1f}°F), "
          f"{reading['humidity_percent']}% humidity - {reading['comfort_level']}")

# Step 2: Calculate statistics
temperatures = [reading["temperature_c"] for reading in structured_data]
humidities = [reading["humidity_percent"] for reading in structured_data]

stats = {
    "temperature": {
        "min": min(temperatures),
        "max": max(temperatures),
        "avg": sum(temperatures) / len(temperatures)
    },
    "humidity": {
        "min": min(humidities),
        "max": max(humidities),
        "avg": sum(humidities) / len(humidities)
    }
}

print("\nDaily Statistics:")
print(f"Temperature: {stats['temperature']['min']}°C - {stats['temperature']['max']}°C (avg: {stats['temperature']['avg']:.1f}°C)")
print(f"Humidity: {stats['humidity']['min']}% - {stats['humidity']['max']}% (avg: {stats['humidity']['avg']:.1f}%)")

# Step 3: Count comfort levels
comfort_counts = {}
for reading in structured_data:
    level = reading["comfort_level"]
    comfort_counts[level] = comfort_counts.get(level, 0) + 1

print("\nComfort Level Distribution:")
for level, count in comfort_counts.items():
    percentage = (count / len(structured_data)) * 100
    print(f"{level.title()}: {count} readings ({percentage:.1f}%)")

## Key Takeaways

1. **Dictionaries** store key-value pairs for fast lookups and data organization
2. **Keys** must be unique and hashable, which in practice means immutable types (strings, numbers, tuples of immutable values)
3. **Dictionary methods** like .get(), .keys(), .values(), .items() are essential
4. **Dictionary comprehensions** provide concise data transformation
5. **Nested dictionaries** handle complex, hierarchical data
6. **Combining lists and dictionaries** creates powerful data structures
7. **Use dictionaries** for counting, grouping, and organizing data
8. **Choose the right structure**: lists for ordered data, dictionaries for key-value relationships
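
The rule about keys can be checked directly. In this sketch the grid coordinates and labels are invented for illustration: a tuple works as a key, a list does not, and repeating a key keeps only the last value.

```python
# Tuples of immutable values are valid keys
grid = {(0, 0): "origin", (1, 2): "marker"}
print("Value at (1, 2):", grid[(1, 2)])

# Lists are mutable, so Python refuses to use them as keys
try:
    bad = {[0, 0]: "origin"}
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Keys are unique: a repeated key keeps only the last value assigned
settings = {"timeout": 30}
settings["timeout"] = 60
print("Settings:", settings)
```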

Dictionaries are fundamental to data science work. You'll use them for data cleaning, transformation, analysis, and organizing results. Combined with lists, they form the backbone of most data processing tasks. In the next notebook, we'll learn about functions and code organization to make your data science code more efficient and reusable.