# Week 2, Class 2: Data Structures: Tuples, Dictionaries & Collections

## 1. Tuples: Immutable Sequences
A tuple is an ordered, immutable (unchangeable) collection of items. Like lists, tuples can hold items of different data types and allow duplicate values.

Tuples are usually written by enclosing items in parentheses `()`, with items separated by commas. It is the comma, not the parentheses, that creates the tuple, so a single-element tuple requires a trailing comma: `(item,)`.

In [None]:
# Example 1: An empty tuple
empty_tuple = ()
print(f"Empty tuple: {empty_tuple}, Type: {type(empty_tuple)}")

In [None]:
# Example 1.1: An empty tuple created with the tuple() constructor
empty_tuple = tuple()
print(f"Empty tuple: {empty_tuple}, Type: {type(empty_tuple)}")

In [None]:
# Example 2: A tuple of coordinates (fixed position)
coordinates: tuple[float, float] = (10.5, 20.3)
print(f"Coordinates: {coordinates}")

In [None]:
# Example 3: A tuple of mixed data types (e.g., experiment result)
experiment_result: tuple[str, int, float] = ("Run_A", 5, 98.7)
print(f"Experiment result: {experiment_result}")

In [None]:
# Example 4: A tuple with a single element (note the comma!)
single_element_tuple = (123,)
print(f"Single element tuple: {single_element_tuple}, Type: {type(single_element_tuple)}")
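
To make the role of the trailing comma concrete, here is a minimal sketch comparing the same value with and without it. The variable names are illustrative only.

```python
# Parentheses alone only group an expression; the comma is what makes a tuple.
not_a_tuple = (123)      # just the integer 123
one_item_tuple = (123,)  # a tuple containing one integer
also_a_tuple = 123,      # the parentheses are optional here

print(f"(123)  -> {type(not_a_tuple)}")
print(f"(123,) -> {type(one_item_tuple)}")
print(f"Same tuple with or without parentheses: {one_item_tuple == also_a_tuple}")
```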

### Accessing Tuple Elements (Indexing and Slicing)

Accessing elements in a tuple works exactly like lists, using zero-based indexing and slicing.

In [None]:
data_point = (2024, "Spectrometer", 15.7, "nm")

print(f"Year: {data_point[0]}")
print(f"Measurement: {data_point[2]}")
print(f"Last element: {data_point[-1]}")

In [None]:
# Slicing a tuple
subset = data_point[1:3]
print(f"Subset (instrument, value): {subset}")

### Immutability of Tuples

Once created, you cannot change the contents of a tuple. This is their defining characteristic and a key difference from lists.

In [None]:
my_tuple = (1, 2, 3)
my_tuple[0] = 10  # Raises TypeError: tuples do not support item assignment

In [None]:
my_tuple.append(4)  # Raises AttributeError: tuples have no append() method
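
If you want to see the error without stopping the notebook, one option is to catch it. The sketch below also shows a common workaround: since a tuple cannot be changed in place, you build a new tuple instead. The names `reading` and `extended` are hypothetical.

```python
reading = (1, 2, 3)  # hypothetical example tuple

try:
    reading[0] = 10  # item assignment is not supported on tuples
except TypeError as error:
    print(f"TypeError: {error}")

# Tuples have no append(); concatenation creates a new tuple instead.
extended = reading + (4,)
print(f"Original: {reading}, New tuple: {extended}")
```

Note that `reading` itself is unchanged; `extended` is a separate object.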

**When to use Tuples?**
* When you have a collection of items that should not change (e.g., coordinates, fixed configurations).
* As return values from functions, where the order and number of returned items are fixed.
* As keys in dictionaries (unlike lists, tuples can be dictionary keys because they are immutable, provided every element inside the tuple is itself immutable).
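
The last two use cases can be sketched as follows. The function `min_max` and the dictionary `grid_temperatures` are made up for illustration and are not part of any library.

```python
def min_max(values: list[float]) -> tuple[float, float]:
    """Return the smallest and largest value as a fixed-size tuple."""
    return (min(values), max(values))

# The returned tuple can be unpacked into separate names in one step.
lowest, highest = min_max([15.7, 12.1, 18.4])
print(f"Lowest: {lowest}, Highest: {highest}")

# Tuples as dictionary keys: (row, column) positions on a hypothetical sensor grid.
grid_temperatures: dict[tuple[int, int], float] = {
    (0, 0): 21.5,
    (0, 1): 22.1,
}
print(f"Temperature at (0, 1): {grid_temperatures[(0, 1)]}")
```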

## 2. Dictionaries: Key-Value Pairs
A dictionary is a mutable collection of key-value pairs. Keys must be unique and immutable (like strings, numbers, or tuples), while values can be of any data type and can be duplicated. Since Python 3.7, dictionaries preserve insertion order; earlier versions gave no ordering guarantee.

* **Insertion ordered (Python 3.7+)**: Items have no positional index; you access values using their keys. From Python 3.7 onwards, dictionaries preserve the order in which items were inserted. Earlier versions did not guarantee any order.
* **Mutable**: You can add, modify, or remove key-value pairs.
* **Keys must be unique**: No two items can have the same key.
* **Keys must be immutable**: More precisely, keys must be *hashable*. Strings, numbers, and tuples of immutable items qualify; lists, dictionaries, and other mutable types do not.
* **Values can be anything**: Values can be of any data type and can be duplicated.
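
A short sketch of the key rules above, using made-up keys and values: immutable objects work as keys, a list does not, and assigning to an existing key overwrites its value rather than creating a duplicate.

```python
# Strings, numbers, and tuples are all acceptable keys.
valid_keys = {"sample_id": 1, 42: "answer", (10.5, 20.3): "site A"}

list_key_rejected = False
try:
    invalid = {[10.5, 20.3]: "site A"}  # a list is mutable, so it cannot be a key
except TypeError as error:
    list_key_rejected = True
    print(f"TypeError: {error}")

# Keys are unique: assigning to an existing key replaces the old value.
valid_keys["sample_id"] = 2
print(f"Number of items is still {len(valid_keys)}: {valid_keys}")
```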

Dictionaries are defined by enclosing key-value pairs in curly braces `{}`, with keys and values separated by a colon `:`, and pairs separated by commas.

In [None]:
# Example 1: An empty dictionary
empty_dict = {}
print(f"Empty dictionary: {empty_dict}, Type: {type(empty_dict)}")

In [None]:
# Example 1.1: An empty dictionary created with the dict() constructor
empty_dict = dict()
print(f"Empty dictionary: {empty_dict}, Type: {type(empty_dict)}")

In [None]:
# Example 2: Storing experimental parameters
experiment_params: dict[str, float] = {
    "temperature": 298.15, # Key: "temperature", Value: 298.15
    "pressure": 101.3,
    "duration_hours": 2.5
}
print(f"Experiment parameters: {experiment_params}")

In [None]:
# Example 3: Storing metadata for a sample
sample_metadata: dict[str, str | int] = {
    "sample_id": "XYZ-001",
    "collection_date": "2025-07-22",
    "analyst": "Dr. Smith",
    "batch_number": 5
}
print(f"Sample metadata: {sample_metadata}")

### Accessing Values in Dictionaries

You access values by referring to their corresponding key using square brackets `[]`.

In [None]:
sensor_data = {
    "humidity": 65.2,
    "light_intensity": 500,
    "battery_status": "Good"
}

print(f"Current humidity: {sensor_data['humidity']}")
print(f"Battery status: {sensor_data['battery_status']}")

# Using the .get() method (safer for potentially missing keys)
# If the key doesn't exist, .get() returns None by default, or a specified default value
print(f"Light intensity (using .get()): {sensor_data.get('light_intensity')}")
print(f"Last calibration date (using .get() with default): {sensor_data.get('last_calibration', 'Not available')}")

# Accessing a non-existent key using [] will raise a KeyError
print(sensor_data['non_existent_key'])
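
Besides `.get()`, two other common ways to deal with a key that may be missing are a membership test with `in` and a `try`/`except KeyError` block. The following is a minimal sketch that reuses the sensor dictionary from above; the fallback value `None` is just one possible choice.

```python
sensor_data = {"humidity": 65.2, "light_intensity": 500, "battery_status": "Good"}

# Option 1: check for the key before accessing it.
if "last_calibration" in sensor_data:
    print(f"Last calibration: {sensor_data['last_calibration']}")
else:
    print("No calibration date recorded")

# Option 2: attempt the access and handle the KeyError.
try:
    value = sensor_data["non_existent_key"]
except KeyError as error:
    print(f"Missing key: {error}")
    value = None  # fallback chosen for this example
```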

### Adding and Modifying Dictionary Items

You can add new key-value pairs or change existing values by assigning a new value to a key.

In [None]:
analysis_report = {
    "sample_id": "S-005",
    "result_A": 12.3,
    "status": "Pending"
}
print(f"Initial report: {analysis_report}")

# Add a new key-value pair
analysis_report["analyst"] = "Dr. Lee"
print(f"After adding analyst: {analysis_report}")

# Modify an existing value
analysis_report["status"] = "Completed"
print(f"After updating status: {analysis_report}")

# Add another result
analysis_report["result_B"] = 5.67
print(f"After adding result_B: {analysis_report}")

### Removing Dictionary Items

* `del dictionary[key]`: Deletes the item with the specified key.
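* `dictionary.pop(key)`: Removes the item with the specified key and returns its value. Raises `KeyError` if the key is not found, unless you pass a default as a second argument: `dictionary.pop(key, default)`.
* `dictionary.popitem()`: Removes and returns the last inserted key-value pair as a tuple (in Python 3.7+; earlier versions removed an arbitrary pair).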
* `dictionary.clear()`: Removes all items from the dictionary.

In [None]:
sensor_config = {
    "name": "Temperature Sensor",
    "location": "Lab 1",
    "calibration_date": "2025-01-15",
    "status": "Active"
}
print(f"Initial config: {sensor_config}")

# Remove an item using del
del sensor_config["status"]
print(f"After del 'status': {sensor_config}")

# Remove and get value using pop()
removed_location = sensor_config.pop("location")
print(f"After pop('location'): {sensor_config}, Removed: {removed_location}")

# Clear all items
sensor_config.clear()
print(f"After clear(): {sensor_config}")
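
The cell above does not exercise `popitem()`, so here is a small sketch of it on a fresh dictionary, along with `pop()` called with an optional second argument that is returned when the key is missing. The key `"firmware"` is invented for the example.

```python
sensor_config = {
    "name": "Temperature Sensor",
    "location": "Lab 1",
    "status": "Active",
}

# popitem() returns the most recently inserted pair as a (key, value) tuple.
last_key, last_value = sensor_config.popitem()
print(f"popitem() removed: {last_key} = {last_value}")

# pop() with a default avoids a KeyError when the key does not exist.
firmware = sensor_config.pop("firmware", "unknown")
print(f"Firmware: {firmware}, Remaining config: {sensor_config}")
```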

### Iterating Through Dictionaries

You can iterate through dictionaries in several ways:

* Iterate through **keys** (default): `for key in dictionary:`
* Iterate through **values**: `for value in dictionary.values():`
* Iterate through **key-value pairs**: `for key, value in dictionary.items():`

In [None]:
experiment_results = {
    "run_001": 25.7,
    "run_002": 26.1,
    "run_003": 25.9
}

print("Iterating through keys:")
for run_id in experiment_results: # Default iteration is over keys
    print(run_id)

In [None]:
print("Iterating through values:")
for temp_value in experiment_results.values():
    print(temp_value)

In [None]:
print("Iterating through key-value pairs:")
for run_id, temp_value in experiment_results.items():
    print(f"Run ID: {run_id}, Temperature: {temp_value}°C")
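
Iterating over `.items()` is often combined with a running comparison. As one possible illustration, the sketch below finds the run with the highest temperature; the names `hottest_run` and `hottest_temp` are hypothetical.

```python
experiment_results = {
    "run_001": 25.7,
    "run_002": 26.1,
    "run_003": 25.9,
}

hottest_run = None
hottest_temp = float("-inf")  # start below any possible reading

for run_id, temp_value in experiment_results.items():
    if temp_value > hottest_temp:
        hottest_run, hottest_temp = run_id, temp_value

print(f"Highest temperature: {hottest_temp}°C in {hottest_run}")
```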

## 3. The `collections` Module: Specialized Data Structures

Python's standard library includes a module called `collections` that provides specialized container datatypes. These are often more efficient or provide more convenient functionality than general-purpose `dict`, `list`, or `tuple`.

We'll look at two very useful ones for scientific data: `Counter` and `defaultdict`.

### 3.1. `Counter`: Counting Hashable Objects

A `Counter` is a `dict` subclass for counting hashable objects. It's incredibly useful for frequency analysis, like counting occurrences of words, characters, or experimental categories.

In [None]:
from collections import Counter

# Example 1: Counting elements in a list (e.g., types of samples)
sample_types_list = ["control", "treatment_A", "control", "treatment_B", "treatment_A", "control"]
type_counts = Counter(sample_types_list)
print(f"Sample type counts: {type_counts}")

# Accessing counts
print(f"Count of 'control': {type_counts['control']}")
print(f"Count of 'treatment_C' (not present): {type_counts['treatment_C']}") # Returns 0 for missing items

In [None]:
# Example 2: Counting characters in a DNA sequence
dna_sequence_str = "ATGCATGCATGGCA"
base_counts = Counter(dna_sequence_str)
print(f"DNA base counts: {base_counts}")

# Most common elements
print(f"Two most common bases: {base_counts.most_common(2)}")
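
Because a `Counter` is a dictionary, its counts can be updated and used in ordinary arithmetic. The sketch below adds more observations with `update()` and computes a relative frequency; the extra fragment `"GGC"` is made up for the example.

```python
from collections import Counter

base_counts = Counter("ATGCATGCATGGCA")

# update() adds counts from another iterable to the existing tallies.
base_counts.update("GGC")

total = sum(base_counts.values())
gc_fraction = (base_counts["G"] + base_counts["C"]) / total
print(f"Total bases: {total}")
print(f"GC fraction: {gc_fraction:.2f}")
```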

### 3.2. `defaultdict`: Dictionaries with Default Values

A `defaultdict` is another `dict` subclass that calls a factory function (e.g., `list`, `int`, `float`) to supply missing values. This is incredibly useful when you want to append items to a list associated with a key, or count occurrences, without having to check if the key already exists every time.

In [None]:
from collections import defaultdict

# Example 1: Grouping data by category (e.g., experimental runs by status)
# If a key is accessed and not found, it creates an empty list for that key
grouped_runs = defaultdict(list)

run_data = [
    ("Run_1", "Success"),
    ("Run_2", "Failure"),
    ("Run_3", "Success"),
    ("Run_4", "Pending"),
    ("Run_5", "Failure")
]

for run_id, status in run_data:
    grouped_runs[status].append(run_id)

print(f"Runs grouped by status: {grouped_runs}")

In [None]:
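# For contrast: dict() on the same pairs maps each run ID to its status.
# It does not group runs by status the way the defaultdict above does.
status_by_run = dict(run_data)
print(status_by_run)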

In [None]:
# Example 2: Counting with defaultdict (similar to Counter, but manual)
# If a key is accessed and not found, it creates an integer 0 for that key
item_counts = defaultdict(int)
items = ["apple", "banana", "apple", "orange", "banana", "apple"]

for item in items:
    item_counts[item] += 1

print(f"Item counts with defaultdict: {item_counts}")

`defaultdict` simplifies code by removing the need for `if key in dict:` checks when building collections.
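
To see what is being saved, the sketch below builds the same grouping twice: once with a plain `dict` and an explicit membership check, and once with `defaultdict(list)`. It also shows one caveat worth knowing: merely reading a missing key from a `defaultdict` inserts that key. The status `"Cancelled"` is invented for the demonstration.

```python
from collections import defaultdict

run_data = [
    ("Run_1", "Success"),
    ("Run_2", "Failure"),
    ("Run_3", "Success"),
]

# Plain dict: the key must be created before appending to its list.
plain_groups: dict[str, list[str]] = {}
for run_id, status in run_data:
    if status not in plain_groups:
        plain_groups[status] = []
    plain_groups[status].append(run_id)

# defaultdict: the empty list is created automatically on first access.
default_groups = defaultdict(list)
for run_id, status in run_data:
    default_groups[status].append(run_id)

print(f"Same result: {plain_groups == default_groups}")

# Caveat: looking up a missing key adds it with the default value.
size_before = len(default_groups)
_ = default_groups["Cancelled"]
size_after = len(default_groups)
print(f"Keys before lookup: {size_before}, after lookup: {size_after}")
```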

## Summary and Key Takeaways

* **Tuples** are ordered, immutable collections defined with `()`. They are good for fixed data sets and as dictionary keys.
* **Dictionaries** are mutable collections of unique key-value pairs, defined with `{}`. They are excellent for mapping and storing structured metadata.
* Access dictionary values by `dictionary[key]` or `dictionary.get(key)`.
* Iterate through dictionaries using `keys()`, `values()`, or `items()`.
* The **`collections` module** provides specialized data structures:
    * **`Counter`** is a convenient way to count occurrences of hashable items.
    * **`defaultdict`** simplifies building dictionaries where values are collections (like lists or counts) by providing default values for new keys.

## Exercises

Complete the following exercises in a new Python script or a new Jupyter Notebook.

1.  **Tuple for Chemical Formula:**
    * Create a tuple `water_molecule: tuple[str, int] = ("H2O", 18)`. The first element is the formula, the second is its molar mass.
    * Print the chemical formula.
    * Try to change the molar mass to `18.015` by assigning to the second element, and observe the error. Explain why this fails.

2.  **Experiment Log Dictionary:**
    * Create a dictionary called `experiment_log` to store details for a single experiment.
    * Add the following key-value pairs:
        * `"experiment_id"`: `"Exp-2025-A"`
        * `"date"`: `"2025-07-23"`
        * `"scientist"`: `"Dr. Jane Doe"`
        * `"temperature_c"`: `25.0`
        * `"pressure_atm"`: `1.2`
    * Print the entire `experiment_log` dictionary.
    * Update the `temperature_c` to `25.5`.
    * Add a new entry: `"result_code"`: `"SUCCESS"`.
    * Print the updated `experiment_log`.
    * Access and print the `scientist`'s name.

3.  **Data Analysis with Dictionary Iteration:**
    * You have a dictionary of sensor readings:
        `sensor_readings: dict[str, float] = {"sensor_A": 10.5, "sensor_B": 12.1, "sensor_C": 9.8, "sensor_D": 11.2}`
    * Iterate through the dictionary and print each sensor ID and its reading in the format: "Sensor [ID]: [Reading] units".
    * Calculate the average of all sensor readings. Print the average.

4.  **Using `Counter` for Sample Analysis:**
    * A list of material types found in a geological survey:
        `materials: list[str] = ["quartz", "feldspar", "mica", "quartz", "calcite", "mica", "quartz", "feldspar"]`
    * Use `collections.Counter` to count the occurrences of each material.
    * Print the `Counter` object.
    * Print the count of "quartz".
    * Print the two most common materials.

5.  **Using `defaultdict` for Grouping Data:**
    * You have a list of (student_id, grade) pairs:
        `grades_data: list[tuple[str, str]] = [("S001", "A"), ("S002", "B"), ("S001", "B"), ("S003", "A"), ("S002", "C")]`
    * Use `collections.defaultdict(list)` to group the grades by student ID. The result should be a dictionary where each key is a student ID and its value is a list of grades for that student.
    * Print the resulting `defaultdict` object.
    * Print the grades for student "S001".