# Python Data Structures: Lists, Tuples, Sets, and Dictionaries

In this notebook we’ll learn:
- What each built-in data structure is for
- How to create, access, and modify them
- Common methods and operations
- Practical tips & pitfalls
- Practice exercises

> A “data structure” is a way of organizing values in a program. Several structures can usually solve the same problem, but some fit it more naturally or run more efficiently than others.

## 0) Quick Overview

- **List**: ordered, mutable collection; use for sequences you’ll edit.
- **Tuple**: ordered, immutable collection; use for fixed records or as dict keys.
- **Set**: unordered collection of unique elements; use for removing duplicates, membership tests, and set algebra.
- **Dictionary**: key → value mapping; use to label data or store attributes.

We’ll also cover:
- Copying vs referencing
- Comprehensions
- When to use which

## 1) Lists

Use a **list** to store multiple values in one variable, with the ability to **add**, **remove**, and **access** items.

### Creation (examples from your notes)

In [1]:
students = ["santiaguito", "juliansito", "danilito"]
primes_up_to_20 = [2, 3, 5, 7, 11, 13, 17, 19]
negative_integers_greater_than_minus_5 = [-4, -3, -2, -1]
collection_of_different_types = [None, 3, "4", 0.15]
empty_list = []

students, primes_up_to_20, negative_integers_greater_than_minus_5, collection_of_different_types, empty_list


(['santiaguito', 'juliansito', 'danilito'],
 [2, 3, 5, 7, 11, 13, 17, 19],
 [-4, -3, -2, -1],
 [None, 3, '4', 0.15],
 [])

### Indexing & Slicing


In [3]:
numbers = [10, 20, 30, 40, 50]
numbers[0], numbers[-1], numbers[1:4], numbers[::-1]


(10, 50, [20, 30, 40], [50, 40, 30, 20, 10])

In [7]:
numbers[0:3], numbers[::2]

([10, 20, 30], [10, 30, 50])

### Common list methods

Full list of methods: https://www.w3schools.com/python/python_lists_methods.asp

In [23]:
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
print(fruits)
print(fruits.count('apple'))      # 2
print(fruits.count('tangerine'))  # 0
print(fruits.index('banana'))     # 3
print(fruits.index('banana', 4))  # 6 (search starting at position 4)

fruits.reverse()
print(fruits)

fruits.append('grape')
print(fruits)

fruits.sort()
print(fruits)

last = fruits.pop()
print(last, fruits)

thislist = ["apple", "banana", "cherry"]
thislist.remove("banana")
print(thislist)

thislist = ["apple", "banana", "cherry", "banana", "kiwi"]
thislist.remove("banana")
print(thislist)


['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
2
0
3
6
['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange']
['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange', 'grape']
['apple', 'apple', 'banana', 'banana', 'grape', 'kiwi', 'orange', 'pear']
pear ['apple', 'apple', 'banana', 'banana', 'grape', 'kiwi', 'orange']
['apple', 'cherry']
['apple', 'cherry', 'banana', 'kiwi']


In [24]:
thislist = ["apple", "banana", "cherry"]
del thislist[0]
print(thislist)

['banana', 'cherry']


In [26]:
thislist = ["apple", "banana", "cherry"]
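del thislist        # deletes the variable itself, not just its contents
print(thislist)     # raises NameError because the name no longer exists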

NameError: name 'thislist' is not defined

In [27]:
thislist = ["apple", "banana", "cherry"]
thislist.clear()
print(thislist)

[]


### Join Lists

In [28]:
list1 = ["a", "b", "c"]
list2 = [1, 2, 3]

list3 = list1 + list2
print(list3)

['a', 'b', 'c', 1, 2, 3]


In [9]:
list1 = ["a", "b" , "c"]
list2 = [1, 2, 3]

for x in list2:
    list1.append(x)

print(list1)

['a', 'b', 'c', 1, 2, 3]


In [30]:
list1 = ["a", "b" , "c"]
list2 = [1, 2, 3]

list1.extend(list2)
print(list1)

['a', 'b', 'c', 1, 2, 3]


### Mutability: assigning and replacing items


In [6]:
fruits = ['orange', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'banana']
fruits[2] = 456    # Lists can be mutated
fruits

['orange', 'apple', 456, 'banana', 'pear', 'apple', 'banana']

### Sorting & copying

In [17]:
scores = [4, 2, 9, 1, 9]
scores_sorted = sorted(scores)      # returns new list
scores.sort(reverse=True)           # in-place sort, descending
scores, scores_sorted


([9, 9, 4, 2, 1], [1, 2, 4, 9, 9])
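
The overview promised a look at copying vs referencing. The sketch below is a minimal illustration, not the only way to copy a list. The variable names are made up for this example. It shows that plain assignment creates a second name for the same list, `.copy()` creates a new outer list that still shares nested items, and `copy.deepcopy` creates a fully independent copy.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                 # same list object, just a second name
shallow = original.copy()        # new outer list, inner lists still shared
deep = copy.deepcopy(original)   # fully independent copy, inner lists included

original.append([5, 6])          # visible through `alias` only
original[0][0] = 99              # visible wherever the inner list is shared

print(alias)    # [[99, 2], [3, 4], [5, 6]]
print(shallow)  # [[99, 2], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
print(alias is original, shallow is original)  # True False
```

For a flat list of numbers or strings, a shallow copy (`.copy()`, `list(x)`, or `x[:]`) is usually enough. The difference only matters when the list contains other mutable objects.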

### List Comprehension

In [1]:
notes = ["A", "B", "C", "D", "E", "F", "G"]
lower = [n.lower() for n in notes]
lengths = [len(n) for n in notes]
filtered = [n for n in notes if n in ("A", "E")]
numbers = [1,2,3,4,5,6,7,8]
numbers_below_five = [n for n in numbers if n<5]
lower, lengths, filtered, numbers_below_five

(['a', 'b', 'c', 'd', 'e', 'f', 'g'],
 [1, 1, 1, 1, 1, 1, 1],
 ['A', 'E'],
 [1, 2, 3, 4])
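
A comprehension is a compact form of a loop that appends to a list. As a rough sketch (the variable names here are illustrative), the two versions below build the same result:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8]

# Comprehension: expression, then the loop, then an optional filter
squares_of_even = [n * n for n in numbers if n % 2 == 0]

# Equivalent explicit loop
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

print(squares_of_even)                  # [4, 16, 36, 64]
print(squares_of_even == squares_loop)  # True
```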

## 2) Tuples

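A **tuple** behaves much like a list but is **immutable**: once created, its items cannot be reassigned, added, or removed. That makes tuples a good fit for fixed records and, as long as every element is hashable, for dictionary keys.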


In [9]:
fruits = ('orange', 'apple', 'banana')
try:
    fruits[0] = 3
except TypeError as e:
    print("TypeError:", e)


TypeError: 'tuple' object does not support item assignment


In [10]:
# Creating tuples & unpacking
t = (440, "A4")
singleton_wrong = (5)     # this is just int 5
singleton_right = (5,)    # tuple with one element

track = ("MySong", 120, "C major")
name, bpm, key = track

type(singleton_wrong), singleton_right, type(singleton_right), name, bpm, key


(int, (5,), tuple, 'MySong', 120, 'C major')
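
The use of tuples as dictionary keys is mentioned above. A minimal sketch of that idea follows. The `frequencies` mapping and its contents are hypothetical example data, not part of the original notes.

```python
# Hypothetical example data: (note name, octave) -> frequency in Hz
frequencies = {
    ("A", 4): 440.0,
    ("A", 5): 880.0,
}

print(frequencies[("A", 4)])      # 440.0
print(("A", 5) in frequencies)    # True

try:
    bad = {["A", 4]: 440.0}       # a list is mutable, so it cannot be a key
except TypeError as e:
    print("TypeError:", e)
```

The tuple works as a key because it cannot change after creation. A list in the same position raises `TypeError`.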

## 3) Sets

Use a **set** to represent a mathematical set:
- **Unique** elements (duplicates removed)
- Fast membership checks
- Set algebra: union, intersection, difference, symmetric difference


In [11]:
# Examples from your notes (and expanded)
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
print(basket)             # duplicates removed
print('orange' in basket) # True
print('crabgrass' in basket)  # False

basket.add(345)
print(basket)

basket.add("banana")      # already present; no change
print(basket)

try:
    basket.add(["una", "lista"])   # lists are unhashable → error
except TypeError as e:
    print("TypeError:", e)


{'apple', 'banana', 'orange', 'pear'}
True
False
{'apple', 'banana', 'orange', 345, 'pear'}
{'apple', 'banana', 'orange', 345, 'pear'}
TypeError: unhashable type: 'list'


In [12]:
# Set algebra (from your notes)
a = set('abracadabra')
b = set('alacazam')
print("a =", a)
print("b =", b)

print("a.isdisjoint(b) ->", a.isdisjoint(b))
print("a.issubset(b)  ->", a.issubset(b))
print("a ∪ b          ->", a.union(b))
print("a ∩ b          ->", a.intersection(b))
print("a − b          ->", a.difference(b))
print("a △ b          ->", a.symmetric_difference(b))


a = {'r', 'b', 'c', 'a', 'd'}
b = {'z', 'l', 'c', 'a', 'm'}
a.isdisjoint(b) -> False
a.issubset(b)  -> False
a ∪ b          -> {'r', 'z', 'b', 'l', 'c', 'a', 'd', 'm'}
a ∩ b          -> {'c', 'a'}
a − b          -> {'b', 'd', 'r'}
a △ b          -> {'r', 'l', 'z', 'b', 'd', 'm'}
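
The same set algebra can also be written with operators. The sketch below reuses `a` and `b` from the cell above and sorts the results only to make the printed order predictable. One difference worth knowing: the methods accept any iterable, while the operators require both sides to be sets.

```python
a = set("abracadabra")
b = set("alacazam")

print(sorted(a | b))  # union: ['a', 'b', 'c', 'd', 'l', 'm', 'r', 'z']
print(sorted(a & b))  # intersection: ['a', 'c']
print(sorted(a - b))  # difference: ['b', 'd', 'r']
print(sorted(a ^ b))  # symmetric difference: ['b', 'd', 'l', 'm', 'r', 'z']

# Methods accept any iterable, here a plain string
print(sorted(a.union("xyz")))  # ['a', 'b', 'c', 'd', 'r', 'x', 'y', 'z']

# Operators do not
try:
    a | "xyz"
except TypeError as e:
    print("TypeError:", e)
```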


In [13]:
# De-duplicating with sets (note: the order of the result is arbitrary)
genres = ["rock", "pop", "rock", "jazz", "pop", "electronic"]
unique_genres = list(set(genres))
unique_genres


['pop', 'electronic', 'jazz', 'rock']

## 4) Dictionaries

### What is a Dictionary?

A **dictionary** stores data as **key → value** pairs.

- Keys are used to **identify** values
- Keys must be **hashable** (e.g. strings, integers, or tuples whose elements are all hashable)
- Values can be **any Python object**
- Dictionaries are **mutable**

Think of a dictionary like:
- A real dictionary (word → definition)
- A contact list (name → phone)
- A row in a table (column_name → value)



In [12]:
# Basic dictionary example
student = {
    "name": "Ana",
    "age": 22,
    "grade": 4.5,
    "passed": True
}

student


{'name': 'Ana', 'age': 22, 'grade': 4.5, 'passed': True}

### Accessing, Adding, Updating, and Removing Entries

In [13]:
# Access by key
print(student["name"])
print(student["grade"])

Ana
4.5


In [14]:
# Safe access (avoids KeyError)
print(student.get("email"))
print(student.get("email", "not provided"))

None
not provided


In [15]:
# Add / update
print(student)
student["age"] = 23
student["email"] = "ana@example.com"
print(student)


{'name': 'Ana', 'age': 22, 'grade': 4.5, 'passed': True}
{'name': 'Ana', 'age': 23, 'grade': 4.5, 'passed': True, 'email': 'ana@example.com'}


In [16]:
# Remove
removed = student.pop("passed")
print(student)
print(removed)

{'name': 'Ana', 'age': 23, 'grade': 4.5, 'email': 'ana@example.com'}
True


### Iterating Over Dictionaries

Looping over keys, values, or key-value pairs is very common in data workflows.


In [17]:
# Iterate keys
for key in student:
    print(key)


name
age
grade
email


In [18]:
# Iterate values
for value in student.values():
    print(value)


Ana
23
4.5
ana@example.com


In [19]:
# Iterate key-value pairs
for key, value in student.items():
    print(f"{key} -> {value}")


name -> Ana
age -> 23
grade -> 4.5
email -> ana@example.com
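
Iteration over `.items()` combines naturally with comprehensions, which work for dictionaries and sets as well as lists. The following is a small sketch. The `student` dict is recreated here so the block runs on its own, and the derived names are illustrative.

```python
student = {"name": "Ana", "age": 23, "grade": 4.5, "email": "ana@example.com"}

# Dict comprehension: keep only the entries whose value is a string
text_fields = {key: value for key, value in student.items() if isinstance(value, str)}
print(text_fields)  # {'name': 'Ana', 'email': 'ana@example.com'}

# Set comprehension: distinct key lengths (sorted only for display)
key_lengths = {len(key) for key in student}
print(sorted(key_lengths))  # [3, 4, 5]
```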


### Lists of Dictionaries (Tables of Data)

A list of dictionaries, one per record, is a very common way to represent tabular data.


In [23]:
events = [
    {"user_id": 1, "event": "login"},
    {"user_id": 2, "event": "purchase", "amount": 49.9},
    {"user_id": 1, "event": "logout"}
]

events


[{'user_id': 1, 'event': 'login'},
 {'user_id': 2, 'event': 'purchase', 'amount': 49.9},
 {'user_id': 1, 'event': 'logout'}]

This is conceptually similar to a table, with `null` wherever a record has no value for that key:

| user_id | event     | amount |
|--------:|-----------|--------|
| 1       | login     | null   |
| 2       | purchase  | 49.9   |
| 1       | logout    | null   |
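
Working with such a "table" usually means looping over the records. The sketch below shows one possible approach, assuming the same `events` list as above. Because not every record has an `"amount"` key, `.get()` with a default avoids a `KeyError`.

```python
events = [
    {"user_id": 1, "event": "login"},
    {"user_id": 2, "event": "purchase", "amount": 49.9},
    {"user_id": 1, "event": "logout"},
]

# Filter rows: event names for one user
user_1_events = [e["event"] for e in events if e["user_id"] == 1]
print(user_1_events)  # ['login', 'logout']

# Aggregate a column that is missing in some rows
total_amount = sum(e.get("amount", 0) for e in events)
print(total_amount)   # 49.9

# Collect every column name that appears in any row
columns = sorted({key for e in events for key in e})
print(columns)        # ['amount', 'event', 'user_id']
```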


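### Dictionaries and JSON: A Direct Relationship

A JSON (JavaScript Object Notation) object maps **almost 1:1** to a Python dictionary, and the other JSON types have direct Python counterparts. The main differences: JSON object keys must be strings, and JSON has no tuple or set type.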

| JSON         | Python |
|--------------|--------|
| object       | dict   |
| array        | list   |
| string       | str    |
| number       | int / float |
| true / false | True / False |
| null         | None   |
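
The mapping is close but not exact. The sketch below uses a made-up `record` and the standard `json` module to show what changes in a round trip: non-string keys become strings, tuples come back as lists, and sets cannot be serialized at all.

```python
import json

record = {1: "one", "point": (3, 4)}   # int key and tuple value

restored = json.loads(json.dumps(record))
print(restored)            # {'1': 'one', 'point': [3, 4]}
print(restored == record)  # False

try:
    json.dumps({"tags": {"jazz", "live"}})   # sets have no JSON equivalent
except TypeError as e:
    print("TypeError:", e)
```

If a set needs to be stored as JSON, converting it to a list first (for example `sorted(tags)`) is a common workaround.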


In [28]:
import json

# Python dict → JSON string
print(student)
json_text = json.dumps(student, indent=2)
print(json_text)

{'name': 'Ana', 'age': 23, 'grade': 4.5, 'email': 'ana@example.com'}
{
  "name": "Ana",
  "age": 23,
  "grade": 4.5,
  "email": "ana@example.com"
}


In [27]:
# JSON string → Python dict
parsed = json.loads(json_text)
print(parsed)
type(parsed)


{'name': 'Ana', 'age': 23, 'grade': 4.5, 'email': 'ana@example.com'}


dict

### Why JSON Is Extremely Important

JSON is everywhere because it is:
- Human-readable
- Language-independent
- Easy to serialize/deserialize
- Supported by almost every system

JSON is used in:
- REST APIs
- Kafka messages
- Configuration files
- Event tracking
- Cloud services


In [29]:
# Example: API-like JSON payload
api_payload = {
    "request_id": "abc-123",
    "status": "success",
    "data": {
        "user_id": 42,
        "roles": ["admin", "editor"]
    }
}

json.dumps(api_payload, indent=2)


'{\n  "request_id": "abc-123",\n  "status": "success",\n  "data": {\n    "user_id": 42,\n    "roles": [\n      "admin",\n      "editor"\n    ]\n  }\n}'

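### Key Takeaways

- Dictionaries model **structured, labeled data**
- They map directly onto JSON objects, so converting between the two is straightforward
- They appear throughout data workflows: API payloads, configuration files, event records
- Being comfortable with dictionaries makes JSON-based data interchange much easier
- Many modern data systems exchange JSON, which Python reads and writes as dictionaries and lists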


## 5) Choosing the Right Structure

- **List**: sequence with order that you’ll modify (append/remove/sort).
- **Tuple**: fixed-size record that should not change; hashable when all its elements are, so it can serve as a dict key or set element.
- **Set**: unique items, membership tests, set operations.
- **Dict**: labeled data/attributes, quick lookups by key.

> Always consider correctness and efficiency for your use case.


## 6) Practice (Mini Exercises)

1) Lists  
- Create a list `temps = [18, 21, 19, 23, 22]`.  
- Append `20`, remove `19`, and print the reversed list.

2) Tuples  
- Make a tuple `album = ("Kind of Blue", 1959, "Miles Davis")`.  
- Unpack it into `title, year, artist` and print: `title (year) - artist`.

3) Sets  
- Given `tags = ["jazz", "live", "jazz", "trio", "improv", "live"]`, create a set of unique tags.  
- Check if `"fusion"` is in the set; add it if not.

4) Dicts  
- Create `user = {"name": "Vanessa", "role": "student"}`.  
- Add `"courses": ["Python", "SQL"]`, update `"role"` to `"analyst"`, and print keys & values.

5) Comprehensions  
- Build a list of `len(word)` for `["data", "science", "ai", "ml", "python"]`.


## 7) Challenge Exercises

6) Frequency Counter (Dict)  
- Given `words = ["a", "b", "a", "c", "b", "a"]`, build a dict of counts like `{"a": 3, "b": 2, "c": 1}` using a loop (no libraries).

7) Unique & Sorted (Set + List)  
- Given `emails = ["a@x.com","b@x.com","a@x.com","c@x.com"]`, build a **sorted** list of unique emails.

8) Nested Lookup (Dict of Dicts)  
- Create `catalog = {"guitar": {"price": 800, "stock": 5}, "piano": {"price": 5000, "stock": 2}}`.  
- Increase guitar stock by 2 and print the updated entry.

9) Safe Get (Dict)  
- Using `song = {"title": "Blue", "bpm": 92}`, print `song.get("key", "Unknown")` and then set `song["key"] = "D"` and print again.

10) De-dup with Order (List + Set)  
- From `names = ["Ana", "Luis", "Ana", "Sofia", "Luis", "Karla"]`, build a new list keeping **first occurrences only** and preserving order.
