# Week 2, Day 1: Dictionaries and Key-Value Data

## Programming Concept: Dictionaries

Today we'll learn about **dictionaries**, one of Python's most powerful data structures. Lists store data in order and look it up by numeric index (0, 1, 2...). Dictionaries instead store data under **keys**, which can be strings, numbers, tuples, or any other hashable (immutable) type.

### Key Programming Concepts:
- **Key-value pairs**: Each piece of data has a unique identifier (key) and a value
- **Hash tables**: Dictionaries use hashing for fast lookups (O(1) average time)
- **Mutable mapping**: You can add, modify, and remove key-value pairs
- **Insertion order**: Since Python 3.7, dictionaries are guaranteed to remember the order in which keys were inserted (earlier versions made no ordering promise)
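Here is a minimal sketch of the concepts above. The `lookup` dictionary and its contents are made up for illustration. It shows that keys can be any hashable type, that an unhashable list is rejected as a key, that entries can be added, modified, and removed, and that iteration follows insertion order.

```python
# Keys can be any hashable type: strings, numbers, tuples, ...
lookup = {"P001": 85, 42: "answer", (1, 2): "tuple key"}

# Lists are mutable and therefore unhashable, so they cannot be keys
try:
    lookup[[1, 2]] = "list key"
except TypeError as err:
    print(f"TypeError: {err}")

# Mutable mapping: add, modify, and remove entries
lookup["P002"] = 92   # add a new key (goes to the end)
lookup["P001"] = 90   # modify an existing key (keeps its position)
del lookup[42]        # remove a key

# Python 3.7+: iteration follows insertion order
print(list(lookup))   # ['P001', (1, 2), 'P002']
```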

### Why Dictionaries Matter:
- **Database-like structure**: Store related information together
- **Fast lookups**: Looking up a value by key takes constant time on average, while searching an unsorted list takes time proportional to its length
- **Real-world modeling**: Many data relationships are naturally key-value pairs
- **JSON compatibility**: Dictionaries with string keys map directly onto JSON objects, the format most web APIs use
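The JSON round trip can be sketched with the standard-library `json` module. The `participant` record below is invented for illustration. Note the caveat in the last line: JSON object keys are always strings, so non-string keys do not survive the round trip unchanged.

```python
import json

participant = {"id": "P001", "score": 85, "groups": ["A", "pilot"]}

# Serialize the dictionary to a JSON string (e.g., to send to a web API)
payload = json.dumps(participant)
print(payload)  # {"id": "P001", "score": 85, "groups": ["A", "pilot"]}

# Parse the JSON string back into an equal dictionary
restored = json.loads(payload)
print(restored == participant)  # True

# Caveat: JSON object keys are always strings, so int keys become strings
print(json.loads(json.dumps({1: "one"})))  # {'1': 'one'}
```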

## Exercise 1: Basic Dictionary Operations

Let's start with creating and manipulating dictionaries. We'll work with a dataset of research participants and their test scores.

```python
# Create a dictionary of participant IDs and their scores
participant_scores = {
    "P001": 85,
    "P002": 92,
    "P003": 78,
    "P004": 96,
    "P005": 88
}

print("Participant scores:")
print(participant_scores)
print(f"Number of participants: {len(participant_scores)}")
```

```
Participant scores:
{'P001': 85, 'P002': 92, 'P003': 78, 'P004': 96, 'P005': 88}
Number of participants: 5
```


```python
# Task 1a: Access individual scores
# Get P003's score and store it in a variable called p003_score
p003_score = participant_scores.get("P003")
# Print both the participant ID and their score
print(f"P003's score: {p003_score}")
```

```
P003's score: 78
```


```python
# Task 1b: Add new participants
# Add two new participants: P006 with score 89, P007 with score 94
participant_scores["P006"] = 89
participant_scores["P007"] = 94
# Print the updated dictionary
print(participant_scores)
```

```
{'P001': 85, 'P002': 92, 'P003': 78, 'P004': 96, 'P005': 88, 'P006': 89, 'P007': 94}
```


```python
# Task 1c: Update an existing score
# P002 retook the test and got a score of 98
old_score = participant_scores["P002"]
participant_scores["P002"] = 98
# Print the change and the updated dictionary
print(f"P002: {old_score} -> {participant_scores['P002']}")
print(participant_scores)
```

```
P002: 92 -> 98
{'P001': 85, 'P002': 98, 'P003': 78, 'P004': 96, 'P005': 88, 'P006': 89, 'P007': 94}
```


## Exercise 2: Dictionary Methods and Iteration

Dictionaries have powerful built-in methods for working with keys, values, and key-value pairs.
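Before the tasks, here is a short sketch of a few everyday operations. It uses a small stand-in dictionary (`demo_scores`, hypothetical values) so that `participant_scores` is not modified. The sketch covers looping over pairs with `.items()`, membership tests (which check keys), ranking keys by value with `sorted(..., key=...)`, and removing an entry with `.pop()`.

```python
demo_scores = {"P001": 85, "P002": 98, "P003": 78}

# Iterate over key-value pairs with .items()
for pid, score in demo_scores.items():
    print(f"{pid}: {score}")

# Membership tests check keys, not values
print("P002" in demo_scores)  # True
print(98 in demo_scores)      # False

# Sort participant IDs by their score (highest first)
ranked = sorted(demo_scores, key=demo_scores.get, reverse=True)
print(ranked)                 # ['P002', 'P001', 'P003']

# Remove an entry and get its value back
removed = demo_scores.pop("P003")
print(removed, demo_scores)   # 78 {'P001': 85, 'P002': 98}
```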

```python
# Task 2a: Explore dictionary methods
# Print all participant IDs (keys)
print(participant_scores.keys())
# Print all scores (values)
print(participant_scores.values())
# Print all key-value pairs
print(participant_scores.items())
```

```
dict_keys(['P001', 'P002', 'P003', 'P004', 'P005', 'P006', 'P007'])
dict_values([85, 98, 78, 96, 88, 89, 94])
dict_items([('P001', 85), ('P002', 98), ('P003', 78), ('P004', 96), ('P005', 88), ('P006', 89), ('P007', 94)])
```


```python
# Task 2b: Calculate statistics
# Find the highest score and which participant got it.
# max() on a dictionary compares its keys, so pass key=participant_scores.get
# to compare participants by their scores instead.
highest_score = max(participant_scores.values())
best_participant = max(participant_scores, key=participant_scores.get)
print(f"Highest score: {highest_score} by participant {best_participant}")

# Find the lowest score and which participant got it
lowest_score = min(participant_scores.values())
worst_participant = min(participant_scores, key=participant_scores.get)
print(f"Lowest score: {lowest_score} by participant {worst_participant}")

# Calculate the average score
average_score = sum(participant_scores.values()) / len(participant_scores)
print(f"Average score: {average_score}")
```

```
Highest score: 98 by participant P002
Lowest score: 78 by participant P003
Average score: 89.71428571428571
```


```python
# Task 2c: Safe access with .get()
# Try to get the score for participant "P999" (who doesn't exist)
# Use the .get() method with a default value of 0
# Compare this to what happens if you try participant_scores["P999"]
```
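One possible answer to Task 2c, sketched on a small stand-in dictionary (`sample_scores`, hypothetical) so it runs on its own. In the notebook you would use `participant_scores` instead. `.get()` falls back to the default, while square-bracket access raises a `KeyError`.

```python
sample_scores = {"P001": 85, "P002": 98, "P003": 78}

# .get() returns the default instead of raising when the key is missing
p999_score = sample_scores.get("P999", 0)
print(f"P999's score (via .get): {p999_score}")  # 0

# Square-bracket access raises KeyError for a missing key
try:
    sample_scores["P999"]
except KeyError as err:
    print(f"KeyError raised for key {err}")  # KeyError raised for key 'P999'
```

Note that `.get()` never adds the missing key to the dictionary; it only reads.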



## Exercise 3: Nested Dictionaries

Real-world data often requires storing multiple pieces of information about each entity. We'll use nested dictionaries to store more complex participant data.
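Nested values are reached by chaining lookups, one per level. Below is a minimal sketch on a small hypothetical dictionary (`nested`) with the same shape as the participant data. It also shows a common pattern for missing keys: chain `.get()` calls with an empty dictionary as the outer default, so a missing participant yields `None` instead of a `KeyError`.

```python
nested = {
    "P001": {"age": 25, "score": 85},
    "P002": {"age": 32, "score": 98},
}

# Chained indexing reaches into the inner dictionary
print(nested["P002"]["score"])  # 98

# Chaining .get() avoids a KeyError when either level is missing
print(nested.get("P999", {}).get("age"))  # None

# Assigning to a nested key modifies the inner dictionary in place
nested["P001"]["score"] = 88
print(nested["P001"])  # {'age': 25, 'score': 88}
```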

```python
# Create a more complex dataset with nested dictionaries
participants = {
    "P001": {
        "age": 25,
        "score": 85,
        "group": "A",
        "completion_time": 45
    },
    "P002": {
        "age": 32,
        "score": 98,
        "group": "B",
        "completion_time": 38
    },
    "P003": {
        "age": 28,
        "score": 78,
        "group": "A",
        "completion_time": 52
    }
}

print("Complex participant data:")
for participant_id, data in participants.items():
    print(f"{participant_id}: {data}")
```

```
Complex participant data:
P001: {'age': 25, 'score': 85, 'group': 'A', 'completion_time': 45}
P002: {'age': 32, 'score': 98, 'group': 'B', 'completion_time': 38}
P003: {'age': 28, 'score': 78, 'group': 'A', 'completion_time': 52}
```


```python
# Task 3a: Access nested data
# Get P002's age and completion time
p002_age = participants["P002"]["age"]
p002_completion_time = participants["P002"]["completion_time"]
# Print them in a formatted string
print(f"P002 is {p002_age} years old and has a completion time of {p002_completion_time}")
```

```
P002 is 32 years old and has a completion time of 38
```


```python
# Task 3b: Add a new participant with complete data
# Add P004: age 29, score 91, group "B", completion_time 41
participants["P004"] = {"age": 29,
                        "score": 91,
                        "group": "B",
                        "completion_time": 41}

# Print the updated participants dictionary
print(participants)
```

```
{'P001': {'age': 25, 'score': 85, 'group': 'A', 'completion_time': 45}, 'P002': {'age': 32, 'score': 98, 'group': 'B', 'completion_time': 38}, 'P003': {'age': 28, 'score': 78, 'group': 'A', 'completion_time': 52}, 'P004': {'age': 29, 'score': 91, 'group': 'B', 'completion_time': 41}}
```


```python
# Task 3c: Group analysis
# Calculate the average score for group A and group B separately
# Hint: Loop through participants and check their group
group_A = []
group_B = []

# Only the inner dictionaries are needed, so loop over .values()
for data in participants.values():
    if data["group"] == "A":
        group_A.append(data["score"])
    elif data["group"] == "B":
        group_B.append(data["score"])

group_A_avg = sum(group_A) / len(group_A)
group_B_avg = sum(group_B) / len(group_B)

print(f"Group A average score: {group_A_avg}")
print(f"Group B average score: {group_B_avg}")
```

```
Group A average score: 81.5
Group B average score: 94.5
```
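The loop above needs one list and one `if` branch per group. As an alternative sketch, `collections.defaultdict(list)` creates an empty list the first time each group is seen, so the same code handles any number of groups. The `sample_participants` dictionary is a trimmed copy of the scores and groups from above (ages and times omitted).

```python
from collections import defaultdict

sample_participants = {
    "P001": {"score": 85, "group": "A"},
    "P002": {"score": 98, "group": "B"},
    "P003": {"score": 78, "group": "A"},
    "P004": {"score": 91, "group": "B"},
}

# Missing groups get a fresh empty list automatically
scores_by_group = defaultdict(list)
for data in sample_participants.values():
    scores_by_group[data["group"]].append(data["score"])

# Average each group's list of scores
group_averages = {group: sum(s) / len(s) for group, s in scores_by_group.items()}
print(group_averages)  # {'A': 81.5, 'B': 94.5}
```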


## Exercise 4: Dictionary Comprehensions

Dictionary comprehensions are a powerful way to create dictionaries from other data structures, similar to list comprehensions.
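The general form is `{key_expr: value_expr for item in iterable if condition}`, where the `if` clause is optional. The sketch below uses small made-up dictionaries. It builds a dictionary from a range, transforms the values of an existing dictionary (scores to proportions of 100), and filters with a trailing `if`.

```python
# Build a dictionary from a range
squares = {n: n ** 2 for n in range(1, 6)}
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Transform the values of an existing dictionary
raw_scores = {"P001": 85, "P002": 92}
proportions = {pid: score / 100 for pid, score in raw_scores.items()}
print(proportions)  # {'P001': 0.85, 'P002': 0.92}

# Filter with a trailing if-clause
even_squares = {n: sq for n, sq in squares.items() if n % 2 == 0}
print(even_squares)  # {2: 4, 4: 16}
```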

```python
# Task 4a: Create a dictionary from lists
# You have two lists: participant IDs and their corresponding reaction times
participant_ids = ["P001", "P002", "P003", "P004", "P005"]
reaction_times = [245, 198, 267, 223, 251]

# Create a dictionary using zip() and dict()
times_from_zip = dict(zip(participant_ids, reaction_times))
print(times_from_zip)

# Then create the same dictionary using a dictionary comprehension
times_from_comprehension = {pid: rt for pid, rt in zip(participant_ids, reaction_times)}
print(times_from_comprehension)
```

```
{'P001': 245, 'P002': 198, 'P003': 267, 'P004': 223, 'P005': 251}
{'P001': 245, 'P002': 198, 'P003': 267, 'P004': 223, 'P005': 251}
```


```python
# Task 4b: Filter and transform data
# Create a new dictionary containing only participants with scores >= 85
# The values should be the scores converted to letter grades:
# 90-100: "A", 80-89: "B", 70-79: "C", below 70: "F"

def to_letter_grade(score):
    """Convert a numeric score to a letter grade."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    return "F"

# Filter (if score >= 85) and transform (to_letter_grade) in one comprehension
high_scorer_grades = {
    pid: to_letter_grade(score)
    for pid, score in participant_scores.items()
    if score >= 85
}
print(high_scorer_grades)
```

```
{'P001': 'B', 'P002': 'A', 'P004': 'A', 'P005': 'B', 'P006': 'B', 'P007': 'A'}
```


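```python
# Task 4c: Count occurrences
# Given a list of test results, count how many times each score appears
test_results = [85, 92, 78, 85, 96, 88, 92, 85, 91, 78, 96, 88]

# Create a dictionary where keys are scores and values are counts
# Try to do this both with a regular loop and with a dictionary comprehension

# Approach 1: regular loop (a single pass over the list)
loop_counts = {}
for score in test_results:
    if score in loop_counts:
        loop_counts[score] += 1
    else:
        loop_counts[score] = 1
print(loop_counts)

# Approach 2: dictionary comprehension
# dict.fromkeys() yields each unique score once, in first-seen order;
# .count() rescans the whole list for every score, so this is slower on large inputs
comprehension_counts = {
    score: test_results.count(score) for score in dict.fromkeys(test_results)
}
print(comprehension_counts)
```

```
{85: 3, 92: 2, 78: 2, 96: 2, 88: 2, 91: 1}
{85: 3, 92: 2, 78: 2, 96: 2, 88: 2, 91: 1}
```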
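Counting is common enough that the standard library ships `collections.Counter`, a `dict` subclass that tallies hashable items in one pass. The sketch below reuses the `test_results` list from Task 4c. `most_common()` orders ties by first appearance, and looking up a missing key returns 0 instead of raising.

```python
from collections import Counter

test_results = [85, 92, 78, 85, 96, 88, 92, 85, 91, 78, 96, 88]

counts = Counter(test_results)
print(counts.most_common(2))    # [(85, 3), (92, 2)]
print(counts[85], counts[100])  # 3 0  (a missing key counts as 0)
print(dict(counts) == {85: 3, 92: 2, 78: 2, 96: 2, 88: 2, 91: 1})  # True
```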


## Exercise 5: Practical Application

Let's combine everything we've learned to solve a real-world data analysis problem.

```python
# Dataset: Survey responses from multiple participants across different questions
survey_data = {
    "participant_001": {"Q1": 4, "Q2": 3, "Q3": 5, "Q4": 4, "Q5": 3},
    "participant_002": {"Q1": 5, "Q2": 5, "Q3": 4, "Q4": 5, "Q5": 4},
    "participant_003": {"Q1": 3, "Q2": 4, "Q3": 3, "Q4": 4, "Q5": 5},
    "participant_004": {"Q1": 2, "Q2": 3, "Q3": 4, "Q4": 3, "Q5": 4},
    "participant_005": {"Q1": 5, "Q2": 4, "Q3": 5, "Q4": 5, "Q5": 3}
}

# Task 5a: Calculate average response for each question
# Create a dictionary where keys are question IDs (Q1, Q2, etc.)
# and values are the average responses for that question

# Create a dictionary to store question averages
question_averages = {}

# Get all question IDs (Q1, Q2, etc.) from the first participant
# (assumes every participant answered the same questions)
questions = list(survey_data["participant_001"].keys())

# For each question, average the responses across all participants
for question in questions:
    responses = [answers[question] for answers in survey_data.values()]
    question_averages[question] = sum(responses) / len(responses)

print(question_averages)
```


```python
# Task 5b: Find participants with highest and lowest overall satisfaction
# Calculate the total score for each participant (sum of all their responses)
# Find who had the highest and lowest total scores
```
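One possible approach to Task 5b, sketched below. It copies `survey_data` from above so the cell runs on its own. A dictionary comprehension computes each participant's total, and `max`/`min` with `key=totals.get` pick the participant (key) whose total (value) is largest or smallest. If two participants tied, the one listed first would be reported.

```python
survey_data = {
    "participant_001": {"Q1": 4, "Q2": 3, "Q3": 5, "Q4": 4, "Q5": 3},
    "participant_002": {"Q1": 5, "Q2": 5, "Q3": 4, "Q4": 5, "Q5": 4},
    "participant_003": {"Q1": 3, "Q2": 4, "Q3": 3, "Q4": 4, "Q5": 5},
    "participant_004": {"Q1": 2, "Q2": 3, "Q3": 4, "Q4": 3, "Q5": 4},
    "participant_005": {"Q1": 5, "Q2": 4, "Q3": 5, "Q4": 5, "Q5": 3}
}

# Total score per participant: sum of all their responses
totals = {pid: sum(answers.values()) for pid, answers in survey_data.items()}

# Compare participants by their totals, not by their IDs
highest = max(totals, key=totals.get)
lowest = min(totals, key=totals.get)
print(f"Highest total: {highest} ({totals[highest]})")  # participant_002 (23)
print(f"Lowest total: {lowest} ({totals[lowest]})")     # participant_004 (16)
```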



```python
# Task 5c: Create a summary report
# Write a function that takes the survey_data and returns a summary dictionary containing:
# - Total number of participants
# - Average response across all questions and participants
# - Most positive question (highest average)
# - Most critical question (lowest average)

def analyze_survey(data):
    # Your code here
    pass

# Call your function and print the results
```
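A sketch of one way to write `analyze_survey`. The summary key names (`num_participants`, `overall_average`, and so on) are our own choice, not prescribed by the task. Like Task 5a, the sketch assumes the data is non-empty and that every participant answered the same questions. Ties for most positive or most critical go to the question listed first (here Q3 and Q4 tie at 4.2, and Q1, Q2, and Q5 tie at 3.8).

```python
survey_data = {
    "participant_001": {"Q1": 4, "Q2": 3, "Q3": 5, "Q4": 4, "Q5": 3},
    "participant_002": {"Q1": 5, "Q2": 5, "Q3": 4, "Q4": 5, "Q5": 4},
    "participant_003": {"Q1": 3, "Q2": 4, "Q3": 3, "Q4": 4, "Q5": 5},
    "participant_004": {"Q1": 2, "Q2": 3, "Q3": 4, "Q4": 3, "Q5": 4},
    "participant_005": {"Q1": 5, "Q2": 4, "Q3": 5, "Q4": 5, "Q5": 3}
}

def analyze_survey(data):
    """Summarize responses stored as {participant: {question: response}}."""
    # Question IDs come from the first participant (same questions assumed)
    questions = next(iter(data.values())).keys()
    question_averages = {
        q: sum(answers[q] for answers in data.values()) / len(data)
        for q in questions
    }
    # Flatten every individual response into one list
    all_responses = [r for answers in data.values() for r in answers.values()]
    return {
        "num_participants": len(data),
        "overall_average": sum(all_responses) / len(all_responses),
        "most_positive_question": max(question_averages, key=question_averages.get),
        "most_critical_question": min(question_averages, key=question_averages.get),
    }

summary = analyze_survey(survey_data)
print(summary)
# {'num_participants': 5, 'overall_average': 3.96,
#  'most_positive_question': 'Q3', 'most_critical_question': 'Q1'}
```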


## Challenge: Advanced Dictionary Operations

For those who want to push further:

```python
# Challenge 1: Merge dictionaries with conflict resolution
# You have two datasets that need to be combined
dataset1 = {"P001": 85, "P002": 92, "P003": 78}
dataset2 = {"P002": 94, "P003": 81, "P004": 89}

# Create a function that merges them, taking the higher score when there's a conflict
```
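One possible solution sketch for Challenge 1. The function name `merge_keep_higher` is hypothetical. It copies the first dictionary so neither input is modified, then walks the second. For a key already present it keeps the larger score; for a new key, `merged.get(pid, score)` falls back to the incoming score itself.

```python
dataset1 = {"P001": 85, "P002": 92, "P003": 78}
dataset2 = {"P002": 94, "P003": 81, "P004": 89}

def merge_keep_higher(d1, d2):
    """Merge two score dictionaries, keeping the higher score on key conflicts."""
    merged = dict(d1)  # shallow copy so d1 is not modified
    for pid, score in d2.items():
        # For a key missing from merged, this is max(score, score) == score
        merged[pid] = max(score, merged.get(pid, score))
    return merged

merged = merge_keep_higher(dataset1, dataset2)
print(merged)  # {'P001': 85, 'P002': 94, 'P003': 81, 'P004': 89}
```

Without the conflict rule, `dataset1 | dataset2` (Python 3.9+) would simply let the second dictionary's values win.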



```python
# Challenge 2: Invert a dictionary
# Given a dictionary mapping participant IDs to scores,
# create a new dictionary mapping scores to lists of participant IDs
# (since multiple participants might have the same score)

scores = {"P001": 85, "P002": 92, "P003": 85, "P004": 78, "P005": 92}
```
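A minimal sketch for Challenge 2. `dict.setdefault(key, [])` returns the existing list for a score, or inserts and returns a new empty list the first time that score appears, so each participant ID can be appended in a single line.

```python
scores = {"P001": 85, "P002": 92, "P003": 85, "P004": 78, "P005": 92}

inverted = {}
for pid, score in scores.items():
    # Get (or create) the list for this score, then add the participant
    inverted.setdefault(score, []).append(pid)

print(inverted)  # {85: ['P001', 'P003'], 92: ['P002', 'P005'], 78: ['P004']}
```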

