# Appendix J — Python Foundations: A Deep Dive
## *Python for AI/ML: A Complete Learning Journey*

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/timothy-watt/python-for-ai-ml/blob/main/APP_J_Python_Foundations.ipynb)
&nbsp;&nbsp;[![Back to TOC](https://img.shields.io/badge/Back_to-Table_of_Contents-1B3A5C?style=flat-square)](https://colab.research.google.com/github/timothy-watt/python-for-ai-ml/blob/main/Python_for_AIML_TOC.ipynb)

---

**Who this appendix is for:** Learners who are brand new to programming, or who
want to solidify their Python foundations before tackling Chapter 1.
If you can already write a for loop, define a function, and read a Python
error message without panic — you can skip straight to Chapter 0.

**What this appendix covers:** The foundational Python concepts taught in books
like *Python Crash Course*, *Automate the Boring Stuff with Python*, and
*Learning Python* — condensed, practical, and applied to the kinds of data
you'll actually work with in the main chapters.

**What it deliberately omits:** Type hints, dataclasses, generators, decorators,
and OOP — those are covered in Chapters 1 and 2 once you're ready for them.

### Contents

| Section | Topic |
|---------|-------|
| J.1 | How Python works — the mental model |
| J.2 | Variables, types, and the object model |
| J.3 | Numbers, strings, and basic operators |
| J.4 | String methods and formatting |
| J.5 | Control flow — if, elif, else |
| J.6 | Loops — for, while, break, continue |
| J.7 | Functions |
| J.8 | Lists in depth |
| J.9 | Tuples and when to use them |
| J.10 | Dictionaries in depth |
| J.11 | Sets |
| J.12 | Mutability and the reference model |
| J.13 | Error handling |
| J.14 | Reading and writing files |
| J.15 | Putting it together — mini project |


---

## Setup

No third-party libraries needed for this appendix — pure Python throughout.


In [None]:
# This appendix uses only Python built-ins and the standard library.
# If you see a NameError anywhere, it usually means a cell ran out of order.
# Use Runtime → Restart and run all to reset.

import sys
print(f'Python {sys.version}')
print('Ready.')


---

## J.1 — How Python Works: The Mental Model

Before writing code, it helps to understand what Python actually *is* and
what happens when you run it.

**Python is an interpreted language.** Unlike C or Java, there is no separate
compile step for you to run before executing your code. Python translates what
you wrote into internal instructions (bytecode) and then executes them one
statement at a time. This is why you can open a Python shell and type one line
at a time.

**A Jupyter notebook is a Python shell with extra powers.** Each cell is a
mini-program. When you press Shift+Enter, Python executes that cell and shows
you the result. Cells share memory: a variable defined in one cell is
available in every cell you run afterwards, wherever that cell sits in the notebook.

**Python is case-sensitive.** `salary`, `Salary`, and `SALARY` are three
different variables. `print` is a built-in function; `Print` is not.

**Indentation is syntax.** In Python, the indented block *is* the code block.
Four spaces (or one tab, but don't mix them) is the standard indent.
Getting this wrong causes an `IndentationError` — Python's way of saying
the structure of your code doesn't make sense.

**Comments start with `#`.** Python ignores everything from `#` to the end
of the line. Use them liberally — code is read far more often than it's written.
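
A quick sketch of the case-sensitivity rule above. The names and values are illustrative only, and the failing line is left commented out so the cell runs cleanly.

```python
salary = 95000      # lowercase name
Salary = 120000     # a different variable: the capital S matters

print(salary)               # 95000
print(Salary)               # 120000
print(salary == Salary)     # False -- two separate names, two separate values

# SALARY was never defined, so uncommenting the next line raises an error:
# print(SALARY)   # NameError: name 'SALARY' is not defined
```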


In [None]:
# J.1 -- Your first Python cells

# This is a comment. Python ignores it.
print('Hello, world!')          # prints to the output area below

# The last expression in a cell is automatically displayed
2 + 2


In [None]:
# J.1 -- Indentation matters

salary = 95000

if salary > 80000:
    print('High earner')        # this line is INSIDE the if block
    print('Well done!')         # so is this one
print('This always prints')     # this line is OUTSIDE — note no indent

# Try removing the indent from the second print and see what happens


---

## J.2 — Variables, Types, and the Object Model

**A variable is a name that refers to a value.**
When you write `salary = 95000`, you're telling Python:
*create an integer object with value 95000, and attach the name `salary` to it.*

This matters because Python variables are *references*, not boxes.
The variable doesn't contain the value — it points to it.
Section J.12 covers why this distinction is critical.

### Built-in types

| Type | Example | Description |
|------|---------|-------------|
| `int` | `42`, `-7`, `1_000_000` | Whole numbers (unlimited size in Python) |
| `float` | `3.14`, `-0.5`, `1e6` | Numbers with a fractional part (64-bit floating point) |
| `str` | `'hello'`, `"world"` | Text — immutable sequence of characters |
| `bool` | `True`, `False` | Logical values (a subtype of int: True==1, False==0) |
| `NoneType` | `None` | The absence of a value — Python's null |

Use `type()` to inspect any value's type. Use `isinstance()` to check it.
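
One case where the two checks disagree, sketched using the `bool` row from the table: because `bool` is a subtype of `int`, `isinstance()` accepts `True` as an integer, while an exact `type()` comparison does not. The variable names here are illustrative.

```python
flag = True

print(type(flag) is int)        # False -- the exact type is bool
print(isinstance(flag, int))    # True  -- bool is a subtype of int
print(True + True)              # 2     -- True behaves like 1 in arithmetic

# To accept numbers but reject booleans, exclude bool explicitly
is_number = isinstance(flag, (int, float)) and not isinstance(flag, bool)
print(is_number)                # False
```

Whether a boolean should count as a number depends on the data you are validating, so treat the last check as one option rather than a rule.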


In [None]:
# J.2 -- Variables and types

respondent_id   = 12345          # int
years_exp       = 7.5            # float
country         = 'Germany'      # str
uses_python     = True           # bool
salary          = None           # NoneType -- not yet known

print(type(respondent_id))       # <class 'int'>
print(type(years_exp))           # <class 'float'>
print(type(country))             # <class 'str'>
print(type(uses_python))         # <class 'bool'>
print(type(salary))              # <class 'NoneType'>


In [None]:
# J.2 -- isinstance() is safer than type() for checks

salary = 95000

print(isinstance(salary, int))        # True
print(isinstance(salary, (int, float)))  # True -- checks against a tuple of types
print(isinstance(salary, str))        # False

# Type conversion (casting)
salary_str  = str(salary)             # '95000'
salary_back = int(salary_str)         # 95000
salary_f    = float(salary)           # 95000.0

print(salary_str,  type(salary_str))
print(salary_back, type(salary_back))
print(salary_f,    type(salary_f))

# Careful with float precision
print(0.1 + 0.2)                      # 0.30000000000000004 -- floating point!
print(round(0.1 + 0.2, 2))            # 0.3 -- use round() when it matters


In [None]:
# J.2 -- Variable naming rules

# VALID names
salary = 95000
years_experience = 7
_private = 'convention: internal use'
MAX_SALARY = 500000   # convention: constants in ALL_CAPS

# Python naming conventions (PEP 8)
# variables and functions: snake_case
# classes:                 PascalCase
# constants:               ALL_CAPS
# 'private' internals:     _leading_underscore

# Multiple assignment
x = y = z = 0
a, b, c = 1, 2, 3          # tuple unpacking
first, *rest = [10, 20, 30, 40]   # starred assignment

print(f'x={x}, y={y}, z={z}')
print(f'a={a}, b={b}, c={c}')
print(f'first={first}, rest={rest}')


---

## J.3 — Numbers and Operators

Python has a full set of arithmetic, comparison, and logical operators.
Understanding operator precedence (which operations happen first) saves
a lot of debugging time.
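
To make the precedence point concrete, here is a short sketch with arbitrary values. It relies on three rules: `**` is applied before a leading minus sign, `*` and `/` are applied before `+` and `-`, and comparisons are applied after arithmetic. Parentheses override all of them.

```python
print(2 + 3 * 4)        # 14 -- multiplication happens before addition
print((2 + 3) * 4)      # 20 -- parentheses force the addition first
print(-2 ** 2)          # -4 -- exponentiation happens before the unary minus
print((-2) ** 2)        # 4

salary = 95000
bonus = 5000
# Arithmetic first, then the comparison: (salary + bonus) >= 100000
reaches_six_figures = salary + bonus >= 100000
print(reaches_six_figures)      # True
```

When an expression is hard to read, adding parentheses costs nothing and removes the guesswork.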

### Arithmetic operators

| Operator | Meaning | Example | Result |
|----------|---------|---------|--------|
| `+` | Addition | `3 + 2` | `5` |
| `-` | Subtraction | `10 - 3` | `7` |
| `*` | Multiplication | `4 * 5` | `20` |
| `/` | Division (always float) | `7 / 2` | `3.5` |
| `//` | Floor division | `7 // 2` | `3` |
| `%` | Modulo (remainder) | `7 % 2` | `1` |
| `**` | Exponentiation | `2 ** 8` | `256` |

### Comparison operators (return `True` or `False`)

| Operator | Meaning |
|----------|----------|
| `==` | Equal to |
| `!=` | Not equal to |
| `<`, `>` | Less than, greater than |
| `<=`, `>=` | Less/greater than or equal |

### Logical operators

| Operator | Meaning | Short-circuits? |
|----------|---------|----------------|
| `and` | Both must be True | Yes — stops at first False |
| `or` | At least one True | Yes — stops at first True |
| `not` | Inverts the boolean | No |
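
The short-circuit column can be checked directly. In this sketch (values are illustrative), the division on the right-hand side would raise `ZeroDivisionError` if Python evaluated it, so the fact that the code runs shows the operand was skipped.

```python
divisor = 0

# 'and' stops at the first falsy operand, so the division never runs
ratio_ok = divisor != 0 and 10 / divisor > 1
print(ratio_ok)                 # False

# 'or' stops at the first truthy operand, so the division never runs
skipped = divisor == 0 or 10 / divisor > 1
print(skipped)                  # True

# 'and' / 'or' return one of their operands, not necessarily True or False
print(0 or 'fallback')          # fallback
print('first' and 'second')     # second
print(None and 'never shown')   # None
```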


In [None]:
# J.3 -- Arithmetic operators in practice

salary      = 95000
tax_rate    = 0.25
bonus_pct   = 0.10

take_home   = salary * (1 - tax_rate)
bonus       = salary * bonus_pct
total       = take_home + bonus

print(f'Gross salary:  ${salary:>10,.2f}')
print(f'Take-home:     ${take_home:>10,.2f}')
print(f'Bonus:         ${bonus:>10,.2f}')
print(f'Total:         ${total:>10,.2f}')

# Useful number operations
print()
print(f'7 / 2  = {7 / 2}')    # 3.5  -- true division
print(f'7 // 2 = {7 // 2}')   # 3    -- floor division (discard remainder)
print(f'7 % 2  = {7 % 2}')    # 1    -- remainder
print(f'2**10  = {2**10}')     # 1024 -- two to the power ten

# Augmented assignment
years = 5
years += 1     # same as: years = years + 1
years *= 2     # same as: years = years * 2
print(f'years = {years}')  # 12


In [None]:
# J.3 -- Comparisons and logical operators

salary     = 95000
years_exp  = 7
uses_python = True

# Comparisons
print(salary > 80000)               # True
print(salary == 100000)             # False
print(salary != 100000)             # True
print(50000 <= salary <= 150000)    # True -- Python allows chained comparisons!

# Logical operators
print(salary > 80000 and uses_python)       # True and True  -> True
print(salary > 80000 and years_exp > 10)    # True and False -> False
print(salary < 50000 or uses_python)        # False or True  -> True
print(not uses_python)                      # not True       -> False

# Short-circuit evaluation -- useful for guarding against None
name = None
display = name or 'Anonymous'    # name is falsy, so evaluates to 'Anonymous'
print(display)

# Truthy and falsy values
# Falsy: None, False, 0, 0.0, '', [], (), {}, set()
# Almost everything else is truthy
for val in [None, 0, '', [], 42, 'hello', [1,2]]:
    print(f'  bool({str(val):<10}) = {bool(val)}')


---

## J.4 — String Methods and Formatting

Strings are **immutable sequences of characters**. You can't change a character
in place — string methods always return a *new* string.

Python offers three main ways to format strings:
- **f-strings** (Python 3.6+): `f'Hello {name}'` — the modern standard
- **`.format()`**: `'Hello {}'.format(name)` — still common in older code
- **`%` formatting**: `'Hello %s' % name` — legacy, avoid in new code

F-strings are the right choice for all new code.
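
For comparison, here is the same line produced with each of the three styles. The name and salary are illustrative values.

```python
name = 'Aisha'
salary = 128500.75

with_fstring = f'{name} earns ${salary:,.2f}'
with_format = '{} earns ${:,.2f}'.format(name, salary)
with_percent = '%s earns $%.2f' % (name, salary)    # no thousands separator here

print(with_fstring)     # Aisha earns $128,500.75
print(with_format)      # Aisha earns $128,500.75
print(with_percent)     # Aisha earns $128500.75
```

The `%` style has no `,` option for thousands separators, which is one practical reason to prefer the other two.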


In [None]:
# J.4 -- String basics: creation, indexing, slicing

job_title = 'Senior Data Scientist'

# Length
print(len(job_title))         # 21

# Indexing (zero-based)
print(job_title[0])           # 'S'  -- first character
print(job_title[-1])          # 't'  -- last character
print(job_title[-9:])         # 'Scientist' -- last 9 characters

# Slicing: [start:stop:step]  (stop is exclusive)
print(job_title[0:6])         # 'Senior'
print(job_title[7:11])        # 'Data'
print(job_title[::2])         # every second character
print(job_title[::-1])        # reversed string

# Strings are immutable -- this would raise a TypeError:
# job_title[0] = 'J'  # TypeError: 'str' object does not support item assignment


In [None]:
# J.4 -- The most useful string methods

raw = '   Senior Data Scientist   '

print(raw.strip())            # remove leading/trailing whitespace
print(raw.strip().lower())    # lowercase
print(raw.strip().upper())    # uppercase
print(raw.strip().title())    # Title Case

sentence = 'python,java,sql,rust'
langs = sentence.split(',')   # split on delimiter -> list
print(langs)                  # ['python', 'java', 'sql', 'rust']
print(', '.join(langs))       # join list back to string with separator

title = 'Senior Data Scientist'
print(title.replace('Senior', 'Staff'))   # returns new string
print(title.startswith('Senior'))         # True
print(title.endswith('Engineer'))         # False
print('Data' in title)                    # True -- membership test
print(title.count('e'))                   # count occurrences
print(title.find('Data'))                 # index of first occurrence (or -1)


In [None]:
# J.4 -- F-strings: the modern way to format output

name        = 'Aisha'
salary      = 128500.75
years       = 8
uses_python = True

# Basic interpolation
print(f'Name: {name}')
print(f'{name} has {years} years of experience.')

# Format specifiers inside {}
print(f'Salary:    ${salary:,.2f}')     # comma separator, 2 decimal places
print(f'Salary:    ${salary:>12,.0f}')  # right-aligned in 12-char field
print(f'Years:     {years:03d}')        # zero-padded integer
print(f'Python:    {str(uses_python):<8}')  # left-aligned in 8-char field

# Expressions inside f-strings
print(f'Monthly:   ${salary / 12:,.2f}')
print(f'Senior:    {years >= 5}')

# Multi-line f-strings
summary = (
    f'--- Respondent Profile ---\n'
    f'  Name:    {name}\n'
    f'  Salary:  ${salary:,.0f}\n'
    f'  Exp:     {years} years\n'
)
print(summary)


---

## J.5 — Control Flow: if, elif, else

Control flow lets your program make decisions. The `if` statement evaluates
a condition and runs a block of code only if it's `True`.

```
if condition:
    # runs when condition is True
elif other_condition:
    # runs when first was False but this is True
else:
    # runs when all conditions above were False
```

You can have as many `elif` branches as you need, but at most one `else`.
Python evaluates them top to bottom and runs the **first** branch that matches.
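
Because only the first matching branch runs, the order of the tests matters. A sketch with illustrative thresholds: the first chain checks from highest to lowest, the second puts the loosest test first and so never reaches its `elif`.

```python
salary = 200000

# Thresholds checked from highest to lowest
if salary >= 150000:
    tier = 'Top earner'
elif salary >= 100000:
    tier = 'High earner'
else:
    tier = 'Standard'
print(tier)                 # Top earner

# Same thresholds, loosest test first
if salary >= 100000:
    tier_misordered = 'High earner'
elif salary >= 150000:      # never reached: any such salary already matched above
    tier_misordered = 'Top earner'
else:
    tier_misordered = 'Standard'
print(tier_misordered)      # High earner
```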


In [None]:
# J.5 -- Basic if/elif/else

salary = 112000

if salary >= 150000:
    tier = 'Top earner'
elif salary >= 100000:
    tier = 'High earner'
elif salary >= 60000:
    tier = 'Mid earner'
else:
    tier = 'Entry level'

print(f'${salary:,} → {tier}')

# Nested conditions
years_exp   = 9
uses_python = True

if salary > 100000:
    if years_exp >= 5 and uses_python:
        print('Likely senior Python role')
    else:
        print('High salary, mixed profile')
else:
    print('Mid/junior profile')


In [None]:
# J.5 -- The ternary (one-line) conditional expression

salary = 95000

# Full form
if salary >= 100000:
    label = 'high'
else:
    label = 'standard'

# Ternary equivalent -- use for simple assignments only
label = 'high' if salary >= 100000 else 'standard'
print(label)

# Practical: classify a list of salaries in one expression
salaries = [45000, 78000, 125000, 62000, 200000]
tiers = ['high' if s >= 100000 else 'mid' if s >= 60000 else 'low'
         for s in salaries]
print(list(zip(salaries, tiers)))


---

## J.6 — Loops: for, while, break, continue

Loops let you repeat code. Python has two kinds:

- **`for` loops** — iterate over a sequence (list, string, range, etc.)
  Use when you know what you're iterating over.
- **`while` loops** — repeat while a condition is True.
  Use when you don't know in advance how many iterations you need.

Two important loop control keywords:
- **`break`** — exit the loop immediately
- **`continue`** — skip the rest of the current iteration and go to the next


In [None]:
# J.6 -- for loops: iterating sequences

languages = ['Python', 'SQL', 'JavaScript', 'Rust', 'Go']

# Basic iteration
for lang in languages:
    print(lang)

print()

# range() generates a sequence of numbers
for i in range(5):          # 0, 1, 2, 3, 4
    print(i, end=' ')
print()

for i in range(2, 10, 2):   # start, stop, step
    print(i, end=' ')
print()

# enumerate() gives you index AND value
for i, lang in enumerate(languages):
    print(f'  {i}: {lang}')


In [None]:
# J.6 -- for loops: useful built-in helpers

names   = ['Alice', 'Ben', 'Caro']
salaries = [95000, 112000, 87500]

# zip() pairs up two sequences
for name, sal in zip(names, salaries):
    print(f'{name:<8} ${sal:>10,}')

print()

# Nested loops -- iterate over rows and columns
skills = ['Python', 'SQL']
levels = ['beginner', 'advanced']

for skill in skills:
    for level in levels:
        print(f'  {skill} / {level}')


In [None]:
# J.6 -- while loops, break, continue

# while loop: keep going until condition is False
balance = 1000
year    = 0
while balance < 2000:
    balance *= 1.07     # 7% annual growth
    year += 1
print(f'Doubled in {year} years (balance: ${balance:,.2f})')

print()

# break -- exit immediately when found
salaries = [45000, 78000, 130000, 92000, 200000]
for sal in salaries:
    if sal > 100000:
        print(f'First salary over $100k: ${sal:,}')
        break

# continue -- skip invalid entries
raw_salaries = [45000, -1, 78000, None, 130000]
valid = []
for sal in raw_salaries:
    if sal is None or sal <= 0:
        continue              # skip bad data, go to next iteration
    valid.append(sal)
print(f'Valid salaries: {valid}')

# else clause on loops -- runs if loop completed WITHOUT a break
target = 250000
for sal in salaries:
    if sal == target:
        print(f'Found {target}')
        break
else:
    print(f'{target} not found in list')


---

## J.7 — Functions

A function is a named, reusable block of code. You define it once and call it
as many times as you need. Good functions:

- Do **one thing** and do it well
- Have a clear name that says what they do (a verb is usually right)
- Accept inputs as **parameters** and return outputs with `return`
- Have a docstring explaining what they do

If a function doesn't have a `return` statement, it implicitly returns `None`.
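
The implicit `None` is easy to miss when a function prints instead of returning. In this sketch, both function names are hypothetical and exist only to contrast the two behaviours.

```python
def show_take_home(gross, tax_rate):
    """Print take-home pay. There is no return statement."""
    print(f'Take-home: ${gross * (1 - tax_rate):,.2f}')

def compute_take_home(gross, tax_rate):
    """Return take-home pay so the caller can keep using it."""
    return gross * (1 - tax_rate)

shown = show_take_home(100000, 0.25)        # prints, but hands back nothing
computed = compute_take_home(100000, 0.25)

print(shown)        # None
print(computed)     # 75000.0
```

If a later calculation fails with a `TypeError` mentioning `NoneType`, a missing `return` is a likely cause.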


In [None]:
# J.7 -- Defining and calling functions

def greet(name):
    """Return a personalised greeting string."""
    return f'Hello, {name}!'

print(greet('Aisha'))
print(greet('Ben'))

# Functions with multiple parameters
def salary_after_tax(gross, tax_rate):
    """Return take-home pay after applying tax_rate (0.0–1.0)."""
    return gross * (1 - tax_rate)

print(f'${salary_after_tax(95000, 0.25):,.2f}')
print(f'${salary_after_tax(130000, 0.30):,.2f}')

# Calling with keyword arguments (order doesn't matter)
print(f'${salary_after_tax(tax_rate=0.28, gross=100000):,.2f}')


In [None]:
# J.7 -- Default parameter values

def salary_after_tax(gross, tax_rate=0.25, include_bonus=False, bonus_pct=0.10):
    """Return take-home pay with optional bonus."""
    net = gross * (1 - tax_rate)
    if include_bonus:
        net += gross * bonus_pct
    return net

# Call with only required argument -- defaults fill the rest
print(f'${salary_after_tax(95000):,.2f}')                        # default tax
print(f'${salary_after_tax(95000, tax_rate=0.30):,.2f}')         # custom tax
print(f'${salary_after_tax(95000, include_bonus=True):,.2f}')    # with bonus

# IMPORTANT: never use a mutable default (list, dict) -- it's a classic bug
# BAD:  def add_lang(lang, langs=[]):  langs.append(lang); return langs
# GOOD: def add_lang(lang, langs=None):
#           if langs is None: langs = []
#           langs.append(lang); return langs
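
The commented BAD/GOOD pair above can be run to see the bug in action. This is a minimal sketch; `add_lang_buggy` is a hypothetical name for the BAD version, used only so that both variants can sit in the same cell.

```python
def add_lang_buggy(lang, langs=[]):
    """The default list is created once, when the function is defined."""
    langs.append(lang)
    return langs

def add_lang(lang, langs=None):
    """Create a fresh list on each call when none is supplied."""
    if langs is None:
        langs = []
    langs.append(lang)
    return langs

first_buggy = add_lang_buggy('Python')
second_buggy = add_lang_buggy('SQL')
print(second_buggy)                     # ['Python', 'SQL'] -- the first call leaked in
print(first_buggy is second_buggy)      # True -- both calls returned the same list

first = add_lang('Python')
second = add_lang('SQL')
print(first, second)                    # ['Python'] ['SQL']
```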


In [None]:
# J.7 -- Multiple return values, *args, **kwargs

def salary_stats(salaries):
    """Return (min, max, mean) of a salary list."""
    if not salaries:
        return None, None, None
    return min(salaries), max(salaries), sum(salaries) / len(salaries)

data = [45000, 95000, 130000, 78000, 200000]
low, high, avg = salary_stats(data)   # unpack the tuple
print(f'Min: ${low:,}  Max: ${high:,}  Mean: ${avg:,.0f}')

# *args -- accept any number of positional arguments
def total(*amounts):
    """Sum any number of values."""
    return sum(amounts)

print(total(100, 200, 300))           # 600
print(total(1, 2, 3, 4, 5, 6))       # 21

# **kwargs -- accept any number of keyword arguments
def display_profile(**fields):
    """Print any named fields."""
    for key, value in fields.items():
        print(f'  {key}: {value}')

display_profile(name='Aisha', salary=128500, country='Germany')


In [None]:
# J.7 -- Variable scope: local vs global

TAX_RATE = 0.25   # module-level (global) constant

def compute_tax(gross):
    # TAX_RATE is readable inside the function (it lives in the global scope)
    tax = gross * TAX_RATE    # 'tax' is LOCAL -- only exists inside this function
    return tax

print(compute_tax(100000))    # 25000.0
# print(tax)  # NameError: name 'tax' is not defined -- it doesn't exist out here

# The golden rule: functions should receive data via parameters and
# return results -- don't rely on or modify globals inside functions.

# Lambda functions -- one-expression anonymous functions
double = lambda x: x * 2
is_high_earner = lambda sal: sal > 100000

print(double(5))
print(is_high_earner(95000))
print(sorted([3, 1, 4, 1, 5], key=lambda x: -x))  # sort descending


---

## J.8 — Lists in Depth

A list is an **ordered, mutable** sequence of items. Items can be of any type,
and you can mix types in one list (though it's rarely a good idea).

Lists are the workhorse data structure in Python. Understanding them deeply —
especially the difference between methods that *modify in place* and those
that *return a new list* — prevents a whole class of bugs.
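
The most common form of that bug is assigning the result of an in-place method. A short sketch with illustrative data:

```python
salaries = [95000, 45000, 130000]

# Bug: list.sort() sorts in place and returns None
result = salaries.sort()
print(result)       # None
print(salaries)     # [45000, 95000, 130000] -- the original list was reordered

# sorted() leaves its input alone and returns a new list
salaries = [95000, 45000, 130000]
ordered = sorted(salaries)
print(ordered)      # [45000, 95000, 130000]
print(salaries)     # [95000, 45000, 130000]
```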


In [None]:
# J.8 -- Creating and accessing lists

languages  = ['Python', 'SQL', 'JavaScript', 'Rust']
salaries   = [95000, 112000, 87500, 145000, 62000]
mixed      = [42, 'hello', True, None, 3.14]  # valid but usually avoid

# Indexing and slicing (same as strings)
print(languages[0])         # 'Python'
print(languages[-1])        # 'Rust'
print(salaries[1:3])        # [112000, 87500] -- indices 1 and 2
print(salaries[::-1])       # reversed copy

# Length and membership
print(len(languages))       # 4
print('SQL' in languages)   # True
print('Go' not in languages)  # True

# Lists are mutable -- you CAN change elements in place
languages[2] = 'TypeScript'
print(languages)


In [None]:
# J.8 -- Methods that MODIFY the list in place (all return None except pop())

langs = ['Python', 'SQL']

langs.append('Rust')           # add one item to the end
print('after append:', langs)

langs.extend(['Go', 'Java'])   # add multiple items
print('after extend:', langs)

langs.insert(1, 'JavaScript')  # insert at specific index
print('after insert:', langs)

langs.remove('Java')           # remove first occurrence by value
print('after remove:', langs)

popped = langs.pop()           # remove and return last item
print(f'popped: {popped}, list: {langs}')

popped_idx = langs.pop(1)      # remove and return item at index
print(f'popped index 1: {popped_idx}, list: {langs}')

langs.sort()                   # sort in place (alphabetical for strings)
print('after sort:', langs)

langs.reverse()                # reverse in place
print('after reverse:', langs)

langs.clear()                  # remove all items
print('after clear:', langs)


In [None]:
# J.8 -- Methods that RETURN a new value (don't modify the original)

salaries = [95000, 45000, 130000, 78000, 200000, 62000]

# sorted() returns a NEW sorted list -- original unchanged
sorted_sals = sorted(salaries)
sorted_desc = sorted(salaries, reverse=True)
print('Original:   ', salaries)
print('Sorted asc: ', sorted_sals)
print('Sorted desc:', sorted_desc)

# list() creates a shallow copy
salaries_copy = list(salaries)    # avoid the name 'copy': J.12 imports a module called copy
salaries_copy.append(999)
print('Original unchanged:', salaries)   # no 999 here -- the copy is independent

# Other useful functions
print(f'min:   {min(salaries):,}')
print(f'max:   {max(salaries):,}')
print(f'sum:   {sum(salaries):,}')
print(f'len:   {len(salaries)}')
print(f'mean:  {sum(salaries)/len(salaries):,.0f}')
print(f'index of max: {salaries.index(max(salaries))}')


In [None]:
# J.8 -- List comprehensions: the Pythonic way to build lists

salaries = [95000, 45000, 130000, 78000, 200000, 62000]

# Traditional loop approach
high_earners_loop = []
for s in salaries:
    if s > 100000:
        high_earners_loop.append(s)

# List comprehension -- reads like English: 'give me s for each s in salaries IF s > 100000'
high_earners = [s for s in salaries if s > 100000]

# Transformation: convert to $k
in_thousands = [s / 1000 for s in salaries]

# Transformation + filter
high_k = [s / 1000 for s in salaries if s > 100000]

print('High earners:    ', high_earners)
print('In thousands:    ', in_thousands)
print('High in $k:      ', high_k)

# Nested comprehension -- flatten a list of lists
survey_batches = [[95000, 78000], [130000, 62000], [200000, 45000]]
all_salaries   = [s for batch in survey_batches for s in batch]
print('Flattened:       ', all_salaries)


---

## J.9 — Tuples and When to Use Them

A tuple is an **ordered, immutable** sequence. Once created, it cannot be changed.
Use tuples when the data should not change: coordinates, RGB colours,
database records, or anything you want to use as a dictionary key.

The comma makes the tuple, not the parentheses:
`x = 1,` is a tuple; `x = (1)` is just an integer in parentheses.
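
Both points can be verified in a few lines. This is a sketch with illustrative values; the failing assignment is left commented out so the cell runs cleanly.

```python
not_a_tuple = (1)
one_tuple = (1,)
print(type(not_a_tuple))    # <class 'int'>
print(type(one_tuple))      # <class 'tuple'>

# Immutability: item assignment is rejected
point = (3.5, 7.2)
# point[0] = 0.0    # TypeError: 'tuple' object does not support item assignment

# To "change" a tuple, build a new one from pieces of the old one
moved = (0.0,) + point[1:]
print(moved)                # (0.0, 7.2)
print(point)                # (3.5, 7.2) -- original untouched
```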


In [None]:
# J.9 -- Tuples: creation, access, unpacking

# Creation
point       = (3.5, 7.2)            # coordinates
rgb_blue    = (0, 0, 255)           # colour
respondent  = ('Aisha', 'Germany', 128500)  # record
single      = (42,)                 # single-element tuple -- note the comma!
also_tuple  = 1, 2, 3               # parentheses optional

# Access -- same as lists
print(respondent[0])     # 'Aisha'
print(respondent[-1])    # 128500

# Unpacking
name, country, salary = respondent
print(f'{name} in {country} earns ${salary:,}')

# Swap two variables elegantly with tuple unpacking
a, b = 10, 20
a, b = b, a
print(f'a={a}, b={b}')   # a=20, b=10

# Tuples of hashable items are hashable -- can be used as dict keys (lists cannot)
location_salary = {('US', 'New York'): 145000, ('Germany', 'Berlin'): 82000}
print(location_salary[('US', 'New York')])

# Named tuples -- tuple with named fields (no extra libraries needed)
from collections import namedtuple
Respondent = namedtuple('Respondent', ['name', 'country', 'salary'])
r = Respondent('Ben', 'UK', 98000)
print(f'{r.name} ({r.country}): ${r.salary:,}')


---

## J.10 — Dictionaries in Depth

A dictionary stores **key → value** mappings. It's Python's most versatile
data structure. Keys must be hashable (strings, numbers, tuples);
values can be anything.

Since Python 3.7, dictionaries preserve insertion order.

Dictionaries are used everywhere in data work: to represent a row of data,
store configuration, count occurrences, group items, and cache results.
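
A brief sketch of the two rules above, key order and hashable keys, using illustrative data. The failing line is commented out so the cell runs.

```python
row = {}
row['name'] = 'Aisha'
row['country'] = 'Germany'
row['salary'] = 128500
print(list(row))            # ['name', 'country', 'salary'] -- insertion order

# Updating an existing key keeps its original position
row['name'] = 'Ben'
print(list(row))            # ['name', 'country', 'salary']

# A tuple works as a key; a list does not
by_location = {('Germany', 'Berlin'): 82000}
# by_location[['Germany', 'Berlin']] = 82000   # TypeError: unhashable type: 'list'
print(('Germany', 'Berlin') in by_location)    # True
```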


In [None]:
# J.10 -- Creating and accessing dictionaries

respondent = {
    'name':        'Aisha',
    'country':     'Germany',
    'salary':      128500,
    'years_exp':   9,
    'uses_python': True,
    'languages':   ['Python', 'SQL', 'R'],
}

# Access by key
print(respondent['name'])        # 'Aisha'
print(respondent['salary'])      # 128500

# KeyError if key doesn't exist:
# print(respondent['age'])  # KeyError: 'age'

# Safe access with .get() -- returns None (or a default) if missing
print(respondent.get('age'))           # None
print(respondent.get('age', 'unknown'))  # 'unknown'

# Check membership
print('salary' in respondent)     # True
print('age' in respondent)        # False

# Modify and add
respondent['salary'] = 135000     # update existing
respondent['remote'] = True       # add new key
print(respondent['salary'])


In [None]:
# J.10 -- Iterating over dictionaries

respondent = {'name': 'Aisha', 'country': 'Germany', 'salary': 128500}

# Keys (default iteration)
for key in respondent:
    print(key)

print()

# Values
for value in respondent.values():
    print(value)

print()

# Key-value pairs -- the most common pattern
for key, value in respondent.items():
    print(f'  {key:<15} {value}')

print()

# Dictionary comprehension
salaries = {'Alice': 95000, 'Ben': 112000, 'Caro': 78000, 'Dan': 145000}
high_earners = {name: sal for name, sal in salaries.items() if sal > 100000}
in_thousands = {name: sal/1000 for name, sal in salaries.items()}
print('High earners:', high_earners)
print('In $k:       ', in_thousands)


In [None]:
# J.10 -- Useful dictionary patterns

# Counting occurrences
countries = ['US', 'India', 'Germany', 'US', 'UK', 'India', 'US', 'Germany']

counts = {}
for c in countries:
    counts[c] = counts.get(c, 0) + 1    # .get with default 0 avoids KeyError
print('Counts:', counts)

# Grouping items
data = [('Alice','US',95000), ('Ben','UK',112000),
        ('Caro','US',78000),  ('Dan','UK',145000)]

by_country = {}
for name, country, salary in data:
    if country not in by_country:
        by_country[country] = []
    by_country[country].append(salary)
print('By country:', by_country)

# Merging dicts (Python 3.9+)
defaults = {'tax_rate': 0.25, 'currency': 'USD', 'remote': False}
overrides = {'tax_rate': 0.30, 'remote': True}
merged = defaults | overrides    # overrides wins on conflicts
print('Merged:', merged)

# .setdefault() -- set a key only if it doesn't exist
config = {'model': 'GBM'}
config.setdefault('n_estimators', 100)
config.setdefault('model', 'RF')    # won't overwrite -- already set
print('Config:', config)


---

## J.11 — Sets

A set is an **unordered collection of unique items**. Duplicates are automatically
removed. Sets are useful for:
- Removing duplicates from a list
- Fast membership testing (much faster than lists for large data)
- Set operations: union, intersection, difference

Sets are mutable but their elements must be hashable
(strings, numbers, tuples — not lists or dicts).
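
One practical consequence of "unordered": converting a list to a set removes duplicates but may also lose the original order. The sketch below shows two ways to get a predictable result; the `dict.fromkeys()` approach relies on dictionaries preserving insertion order (Python 3.7+).

```python
languages_raw = ['Python', 'SQL', 'Python', 'Java', 'SQL', 'Python']

# set() removes duplicates but makes no promise about order
unique_any_order = set(languages_raw)
print(len(unique_any_order))        # 3

# sorted() turns the set back into a list with a predictable order
print(sorted(unique_any_order))     # ['Java', 'Python', 'SQL']

# dict.fromkeys() keeps the first occurrence of each item, in original order
unique_in_order = list(dict.fromkeys(languages_raw))
print(unique_in_order)              # ['Python', 'SQL', 'Java']

# Elements must be hashable: tuples are fine, lists are not
pairs = {('Python', 3), ('SQL', 2)}
# bad = {['Python', 3]}             # TypeError: unhashable type: 'list'
```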


In [None]:
# J.11 -- Sets: creation and basic operations

# From a list -- duplicates removed automatically
languages_raw  = ['Python', 'SQL', 'Python', 'Java', 'SQL', 'Python']
languages_set  = set(languages_raw)
print(languages_set)            # {'Python', 'SQL', 'Java'} -- order not guaranteed
print(len(languages_set))       # 3

# Literal syntax
team_a = {'Python', 'SQL', 'Spark', 'Scala'}
team_b = {'Python', 'R', 'SQL', 'Tableau'}

# Membership test -- O(1) on average, even for huge sets
print('Python' in team_a)       # True
print('Go' in team_a)           # False

# Set operations
print('Union (all skills):      ', team_a | team_b)
print('Intersection (shared):  ', team_a & team_b)
print('Difference (a not b):   ', team_a - team_b)
print('Symmetric diff (unique):', team_a ^ team_b)

# Subset / superset
ml_core = {'Python', 'SQL'}
print('ml_core subset of team_a:', ml_core <= team_a)   # True
print('team_a superset of ml_core:', team_a >= ml_core) # True


---

## J.12 — Mutability and the Reference Model

This is the concept that trips up more Python beginners than any other.
Understanding it will save you hours of debugging.

**In Python, a variable is a name that points to an object in memory.**
When you write `b = a`, you're not copying the value — you're creating
a second name that points to the *same object*.

- **Immutable types** (int, float, str, tuple, bool): can't be changed in place.
  When you do `x = x + 1`, Python creates a *new* integer object.
  Multiple names pointing to the same immutable object is safe — it can't change.

- **Mutable types** (list, dict, set): *can* be changed in place.
  If two names point to the same list and one modifies it, *both names see the change*.
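
A compact way to see the difference is to compare `==` (same contents) with `is` (same object). The lists below are illustrative.

```python
a = [95000, 112000]
b = a                   # second name for the same list object
c = [95000, 112000]     # a separate list that happens to hold equal values

print(a == b, a is b)   # True True
print(a == c, a is c)   # True False

a.append(78000)
print(b)                # [95000, 112000, 78000] -- b is the same object as a
print(c)                # [95000, 112000]        -- c is independent
```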


In [None]:
# J.12 -- The reference model: immutable types are safe

a = 42
b = a           # b points to the same integer object as a
b = b + 1       # creates a NEW integer (43) and attaches b to it
print(f'a={a}, b={b}')   # a=42, b=43 -- a is unchanged

# Same with strings
s1 = 'hello'
s2 = s1
s2 = s2.upper()           # creates a new string 'HELLO'
print(f's1={s1!r}, s2={s2!r}')  # s1 unchanged

# id() returns an object's identity (in CPython, its memory address)
x = 100
y = x
print(f'id(x)={id(x)}, id(y)={id(y)}, same object: {x is y}')
y = 200
print(f'after y=200: id(x)={id(x)}, id(y)={id(y)}, same object: {x is y}')


In [None]:
# J.12 -- Mutable types: the alias trap

# THE BUG: two names, one list
salaries_a = [95000, 112000, 78000]
salaries_b = salaries_a       # b is an ALIAS -- same object!

salaries_b.append(200000)     # modifies the shared list
print('salaries_a:', salaries_a)  # [95000, 112000, 78000, 200000] -- changed!
print('salaries_b:', salaries_b)  # [95000, 112000, 78000, 200000]
print('Same object?', salaries_a is salaries_b)  # True

print()

# THE FIX: make an explicit copy
salaries_a = [95000, 112000, 78000]
salaries_c = salaries_a.copy()    # option 1: .copy()
salaries_d = list(salaries_a)     # option 2: list()
salaries_e = salaries_a[:]        # option 3: full slice

salaries_c.append(200000)
print('salaries_a after modifying salaries_c:', salaries_a)  # unchanged
print('Same object?', salaries_a is salaries_c)               # False


In [None]:
# J.12 -- Functions and mutability: the parameter trap

# Functions receive references -- modifying a mutable argument modifies the original
def add_bonus_salary(salary_list, bonus):
    salary_list.append(bonus)    # modifies the ORIGINAL list
    return salary_list

my_salaries = [95000, 112000]
result = add_bonus_salary(my_salaries, 200000)
print('my_salaries:', my_salaries)  # [95000, 112000, 200000] -- modified!

print()

# Safe version: work on a copy inside the function
def add_bonus_salary_safe(salary_list, bonus):
    new_list = salary_list.copy()   # work on a copy
    new_list.append(bonus)
    return new_list

my_salaries = [95000, 112000]
result = add_bonus_salary_safe(my_salaries, 200000)
print('my_salaries:  ', my_salaries)   # unchanged
print('result:       ', result)        # new list with bonus

# Deep copy: for nested structures (list of lists, list of dicts)
import copy
nested = [[1, 2], [3, 4]]
shallow = nested.copy()       # copies outer list, but inner lists are shared!
deep    = copy.deepcopy(nested)  # fully independent copy

shallow[0].append(99)
print('nested after shallow copy modification:', nested)   # inner list changed!
deep[0].append(99)
print('nested after deep copy modification:   ', nested)   # no further change -- deep is independent


---

## J.13 — Error Handling

Errors in Python come in two kinds:

- **Syntax errors** — detected before code runs. Python can't parse your code.
  Fix the typo and try again.
- **Exceptions** — occur while code runs. The code is syntactically valid but
  something goes wrong at runtime (file not found, wrong type, division by zero).

The `try/except` block lets you catch exceptions and handle them gracefully
instead of crashing.

### Common exceptions

| Exception | Typical cause |
|-----------|---------------|
| `TypeError` | Wrong type — `'2' + 2` |
| `ValueError` | Right type, wrong value — `int('hello')` |
| `KeyError` | Dict key doesn't exist — `d['missing']` |
| `IndexError` | List index out of range — `lst[100]` |
| `AttributeError` | Object has no such attribute — `None.upper()` |
| `FileNotFoundError` | File doesn't exist |
| `ZeroDivisionError` | Dividing by zero |
| `ImportError` | Import failed; a missing module raises the subclass `ModuleNotFoundError` |
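
To see several rows of this table in action, the sketch below triggers each exception on purpose and prints what was raised. The names `triggers` and `caught` are hypothetical helpers for this illustration; `FileNotFoundError` and `ImportError` are left out so the example does not depend on the file system or on installed packages.

```python
empty_dict = {}
empty_list = []
nothing = None

# Map each exception name to a tiny function that triggers it.
triggers = {
    'TypeError':         lambda: '2' + 2,
    'ValueError':        lambda: int('hello'),
    'KeyError':          lambda: empty_dict['missing'],
    'IndexError':        lambda: empty_list[100],
    'AttributeError':    lambda: nothing.upper(),
    'ZeroDivisionError': lambda: 1 / 0,
}

caught = {}
for expected, trigger in triggers.items():
    try:
        trigger()
    except Exception as e:      # broad on purpose: we want to see what was raised
        caught[expected] = type(e).__name__
        print(f'{expected:<18} raised {type(e).__name__}: {e}')
```

In real code, catch the narrowest exception type you expect rather than `Exception`; the broad clause here exists only to report what each line raises.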


In [None]:
# J.13 -- try/except basics

# Without error handling -- crashes on bad input
# int('hello')  # ValueError

# Basic try/except
def safe_to_int(value):
    try:
        return int(value)
    except ValueError:
        return None

print(safe_to_int('42'))      # 42
print(safe_to_int('hello'))   # None
print(safe_to_int(3.9))       # 3 -- int() truncates floats toward zero

# Catching multiple exception types
raw_salaries = ['95000', 'N/A', None, '130000', '']

valid = []
for raw in raw_salaries:
    try:
        valid.append(int(raw))
    except (TypeError, ValueError):
        pass    # silently skip unparseable values

print('Valid salaries:', valid)


In [None]:
# J.13 -- else, finally, and raising exceptions

def load_survey_row(row: dict) -> dict:
    """
    Parse a raw survey row dict.
    Raises ValueError if required fields are missing or invalid.
    """
    try:
        salary = float(row['salary'])
        if salary <= 0:
            raise ValueError(f'Salary must be positive, got {salary}')
        years  = int(row['years_exp'])
    except KeyError as e:
        raise ValueError(f'Missing required field: {e}') from e
    except (TypeError, ValueError) as e:
        raise ValueError(f'Invalid field value: {e}') from e
    else:
        # runs ONLY if no exception was raised in try
        return {'salary': salary, 'years_exp': years}
    finally:
        # runs ALWAYS -- use for cleanup (closing files, DB connections)
        pass  # nothing to clean up here

# Test it
good = {'salary': '95000', 'years_exp': '7'}
bad1 = {'salary': 'N/A',   'years_exp': '7'}
bad2 = {'years_exp': '7'}   # missing salary

print(load_survey_row(good))

for bad in [bad1, bad2]:
    try:
        load_survey_row(bad)
    except ValueError as e:
        print(f'Error: {e}')


---

## J.14 — Reading and Writing Files

Python makes file I/O straightforward. The `open()` function returns a file
object. Always use the `with` statement — it guarantees the file is closed
even if an error occurs inside the block.

```python
with open('filename.txt', mode) as f:
    # do something with f
# file is automatically closed here
```
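
The closing guarantee can be checked directly. The sketch below raises an error in the middle of a `with` block and then inspects the file object's `closed` attribute. The file name `with_demo.txt` is a made-up scratch file, placed in the system temp directory via `tempfile` so the example runs on any operating system.

```python
import os
import tempfile

# Hypothetical scratch file in a location that is writable on any OS
path = os.path.join(tempfile.gettempdir(), 'with_demo.txt')

try:
    with open(path, 'w') as f:
        f.write('partial line\n')
        raise RuntimeError('something went wrong mid-write')
except RuntimeError as e:
    print('Caught:', e)

print('File closed?', f.closed)   # True -- closed despite the exception
os.remove(path)                   # clean up the scratch file
```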

### File modes

| Mode | Meaning |
|------|----------|
| `'r'` | Read (default). File must exist. |
| `'w'` | Write. Creates file, **overwrites** if exists. |
| `'a'` | Append. Creates file, adds to end if exists. |
| `'x'` | Exclusive create. Fails if file already exists. |
| `'rb'`, `'wb'` | Binary read/write (for non-text files). |
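
The difference between the text modes is easiest to see side by side. This is a minimal sketch using a made-up scratch file (`modes_demo.txt`) in the system temp directory; it creates the file with `'x'`, appends with `'a'`, overwrites with `'w'`, and then shows `'x'` refusing to touch an existing file.

```python
import os
import tempfile

# Hypothetical scratch file in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'modes_demo.txt')
if os.path.exists(path):
    os.remove(path)                 # start from a clean slate

with open(path, 'x') as f:          # 'x': create, fail if the file exists
    f.write('first\n')

with open(path, 'a') as f:          # 'a': add to the end
    f.write('second\n')

with open(path, 'r') as f:
    after_append = f.read()
print(repr(after_append))           # 'first\nsecond\n'

with open(path, 'w') as f:          # 'w': discard the old contents
    f.write('replaced\n')

with open(path, 'r') as f:
    after_write = f.read()
print(repr(after_write))            # 'replaced\n'

exclusive_failed = False
try:
    with open(path, 'x') as f:      # the file exists now, so 'x' refuses
        f.write('never written\n')
except FileExistsError:
    exclusive_failed = True
    print("'x' refused to overwrite the existing file")

os.remove(path)                     # clean up the scratch file
```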


In [None]:
# J.14 -- Writing and reading text files

import os

# Write a simple CSV manually
rows = [
    'name,country,salary,years_exp',
    'Aisha,Germany,128500,9',
    'Ben,UK,98000,5',
    'Caro,US,145000,12',
    'Dan,India,32000,3',
]

filepath = '/tmp/survey_sample.csv'

with open(filepath, 'w') as f:
    for row in rows:
        f.write(row + '\n')    # \n is the newline character

print(f'Wrote {len(rows)} lines to {filepath}')

# Read it back
with open(filepath, 'r') as f:
    content = f.read()          # read entire file as one string

print('--- File contents ---')
print(content)


In [None]:
# J.14 -- Reading line by line and parsing CSV

filepath = '/tmp/survey_sample.csv'

# Read line by line (memory-efficient for large files)
with open(filepath, 'r') as f:
    header = f.readline().strip()      # first line is header
    columns = header.split(',')
    print('Columns:', columns)
    print()
    for line in f:                     # iterate remaining lines
        values = line.strip().split(',')
        row = dict(zip(columns, values))
        print(row)

print()

# Using the csv module -- handles quoted fields, special characters
import csv

with open(filepath, 'r', newline='') as f:
    reader = csv.DictReader(f)         # each row is a dict automatically
    for row in reader:
        salary = int(row['salary'])
        print(f"{row['name']:<8} ${salary:>10,}")


In [None]:
# J.14 -- File operations: check existence, paths, cleanup

import csv   # needed by load_csv_safe below
import os

filepath = '/tmp/survey_sample.csv'

# Check if a file exists before opening
if os.path.exists(filepath):
    size_bytes = os.path.getsize(filepath)
    print(f'{filepath} exists ({size_bytes} bytes)')
else:
    print(f'{filepath} not found')

# Path manipulation
dirname   = os.path.dirname(filepath)
basename  = os.path.basename(filepath)
name, ext = os.path.splitext(basename)
print(f'dir={dirname!r}  base={basename!r}  name={name!r}  ext={ext!r}')

# Safe file loading pattern
def load_csv_safe(filepath: str) -> list[dict]:
    """Load CSV as list of dicts; return empty list if file missing."""
    if not os.path.exists(filepath):
        print(f'Warning: {filepath} not found, returning empty list')
        return []
    with open(filepath, 'r', newline='') as f:
        return list(csv.DictReader(f))

data = load_csv_safe(filepath)
print(f'Loaded {len(data)} rows')
print(data[0])


---

## J.15 — Putting It Together: Mini Project

This project ties together everything covered in this appendix.
You'll build a small salary analysis tool using only Python built-ins —
no NumPy, no Pandas. This is exactly the kind of code you could write
on day one of a Python course, and it's the foundation everything else builds on.

**The task:** Given a list of survey respondent records, produce a report showing:
- Overall salary statistics (min, max, mean, median)
- Statistics grouped by country
- The top 5 highest-paid respondents
- Save the report to a text file
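
The cells that follow compute these statistics by hand. As a hedged aside, the standard library's `statistics` module offers `mean` and `median` as well, which makes it a convenient cross-check for hand-written versions. The salaries below are made up for the example.

```python
import statistics

sample = [95000, 112000, 78000, 130000]   # made-up salaries

print('min:   ', min(sample))                  # 78000
print('max:   ', max(sample))                  # 130000
print('mean:  ', statistics.mean(sample))
print('median:', statistics.median(sample))    # 103500.0 (average of the two middle values)
```

With an even number of values there is no single middle element, so the median is the average of the two middle values after sorting.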


In [None]:
# J.15 -- Mini project: salary analysis with pure Python

import random

# ── Step 1: Generate sample data ─────────────────────────────────
random.seed(42)   # fixed seed so every run generates the same data

COUNTRIES  = ['US', 'UK', 'Germany', 'India', 'Canada']
BASE_SAL   = {'US': 120000, 'UK': 85000, 'Germany': 78000,
               'India': 22000, 'Canada': 95000}

def generate_respondents(n: int = 200) -> list[dict]:
    """Generate n synthetic survey respondents."""
    respondents = []
    for i in range(n):
        country  = random.choice(COUNTRIES)
        base     = BASE_SAL[country]
        salary   = int(base * random.uniform(0.6, 1.8))
        years    = random.randint(0, 25)
        respondents.append({
            'id':       i + 1,
            'country':  country,
            'salary':   salary,
            'years_exp': years,
            'uses_python': random.choice([True, False]),
        })
    return respondents

respondents = generate_respondents(200)
print(f'Generated {len(respondents)} respondents')
print('First 3:', respondents[:3])


In [None]:
# J.15 -- Step 2: Analysis functions

def compute_stats(salaries: list[float]) -> dict:
    """Return min, max, mean, median for a list of salaries."""
    if not salaries:
        return {}
    n       = len(salaries)
    sorted_s = sorted(salaries)
    median  = sorted_s[n // 2] if n % 2 else (sorted_s[n//2-1] + sorted_s[n//2]) / 2
    return {
        'n':      n,
        'min':    min(salaries),
        'max':    max(salaries),
        'mean':   sum(salaries) / n,
        'median': median,
    }

def group_by_country(respondents: list[dict]) -> dict[str, list]:
    """Group respondents by country."""
    groups = {}
    for r in respondents:
        groups.setdefault(r['country'], []).append(r)
    return groups

def top_n(respondents: list[dict], n: int = 5) -> list[dict]:
    """Return top n highest-paid respondents."""
    return sorted(respondents, key=lambda r: r['salary'], reverse=True)[:n]

# Compute overall stats
all_salaries = [r['salary'] for r in respondents]
overall      = compute_stats(all_salaries)
print('Overall stats:')
for k, v in overall.items():
    print(f'  {k:<8} {v:>12,.0f}')


In [None]:
# J.15 -- Step 3: Build and print the report

def build_report(respondents: list[dict]) -> str:
    """Build a formatted salary analysis report as a string."""
    lines = []
    lines.append('=' * 55)
    lines.append('  SALARY ANALYSIS REPORT — SO 2025 (sample)')
    lines.append('=' * 55)

    # Overall
    all_sals = [r['salary'] for r in respondents]
    stats    = compute_stats(all_sals)
    lines.append(f'\nTotal respondents: {stats["n"]}')
    lines.append(f'  Min salary:    ${stats["min"]:>10,.0f}')
    lines.append(f'  Max salary:    ${stats["max"]:>10,.0f}')
    lines.append(f'  Mean salary:   ${stats["mean"]:>10,.0f}')
    lines.append(f'  Median salary: ${stats["median"]:>10,.0f}')

    # By country
    lines.append('\n' + '-' * 55)
    lines.append(f'  {"Country":<12} {"n":>5} {"Median":>12} {"Mean":>12}')
    lines.append('-' * 55)
    groups = group_by_country(respondents)
    for country in sorted(groups.keys()):
        sals = [r['salary'] for r in groups[country]]
        s    = compute_stats(sals)
        lines.append(f'  {country:<12} {s["n"]:>5} '
                     f'${s["median"]:>10,.0f} ${s["mean"]:>10,.0f}')

    # Top 5
    lines.append('\n' + '-' * 55)
    lines.append('  TOP 5 HIGHEST-PAID RESPONDENTS')
    lines.append('-' * 55)
    for rank, r in enumerate(top_n(respondents, 5), 1):
        lines.append(f'  {rank}. {r["country"]:<10} ${r["salary"]:>10,}  '
                     f'({r["years_exp"]} yrs exp)')

    lines.append('\n' + '=' * 55)
    return '\n'.join(lines)

report = build_report(respondents)
print(report)

# Save to file
with open('/tmp/salary_report.txt', 'w', encoding='utf-8') as f:   # the title contains a non-ASCII dash
    f.write(report)
print('\nReport saved to /tmp/salary_report.txt')


---

## Concept Check Questions

Test your understanding before moving to Chapter 0.

**Q1.** What is the difference between `=` and `==` in Python?

<details><summary>Answer</summary>

`=` is **assignment** — it binds a name to a value: `x = 5`.
`==` is **comparison** — it tests equality and returns `True` or `False`: `x == 5`.

</details>

**Q2.** What does `salaries = my_list` do, and why might it cause a bug?

<details><summary>Answer</summary>

It creates an **alias** — both names point to the same list object in memory.
Modifying through either name affects the shared object.
To make an independent copy: `salaries = my_list.copy()` or `list(my_list)`.

</details>

**Q3.** What is the difference between `list.sort()` and `sorted(list)`?

<details><summary>Answer</summary>

`list.sort()` modifies the list **in place** and returns `None`.
`sorted(list)` returns a **new sorted list** and leaves the original unchanged.
Use `sorted()` when you need to keep the original order.
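
A quick sketch of the difference, using made-up salaries:

```python
salaries = [95000, 78000, 112000]

ordered = sorted(salaries)     # new list; the original is untouched
print(ordered)                 # [78000, 95000, 112000]
print(salaries)                # [95000, 78000, 112000]

result = salaries.sort()       # sorts in place and returns None
print(result)                  # None
print(salaries)                # [78000, 95000, 112000]
```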

</details>

**Q4.** Why should you never use a mutable default argument like `def f(lst=[])`?

<details><summary>Answer</summary>

Default argument values are evaluated **once** when the function is defined,
not each time it's called. The same list object is reused across all calls,
so `lst.append(x)` accumulates values across calls.
Instead, default to `None` with `def f(lst=None):` and create the list inside the body with `if lst is None: lst = []`.
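
A minimal sketch of the bug and of the `None`-default fix; the function names are hypothetical.

```python
def add_item_buggy(item, items=[]):     # the default list is created once, at definition time
    items.append(item)
    return items

def add_item_fixed(item, items=None):
    if items is None:
        items = []                      # a fresh list on every call
    items.append(item)
    return items

first = add_item_buggy('a')
second = add_item_buggy('b')
print(second)                  # ['a', 'b'] -- 'a' leaked in from the first call
print(first is second)         # True -- both calls returned the same list

print(add_item_fixed('a'))     # ['a']
print(add_item_fixed('b'))     # ['b']
```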

</details>

**Q5.** What is the difference between `dict.get('key')` and `dict['key']`?

<details><summary>Answer</summary>

`dict['key']` raises a `KeyError` if the key doesn't exist.
`dict.get('key')` returns `None` (or a specified default) if the key is missing.
Use `.get()` when a missing key is an expected, normal condition.
Use `['key']` when a missing key is a bug that should raise an error.
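
A short sketch with a made-up survey row:

```python
row = {'country': 'UK'}            # no 'salary' key

print(row.get('salary'))           # None
print(row.get('salary', 0))        # 0 -- explicit default

try:
    row['salary']
except KeyError as e:
    print('KeyError:', e)          # KeyError: 'salary'
```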

</details>

**Q6.** What does a `with` statement guarantee when working with files?

<details><summary>Answer</summary>

The file is **always closed** when the `with` block exits — even if an exception
is raised inside the block. Without `with`, forgetting to call `f.close()` or
having an exception before `close()` can leave file handles open.

</details>

**Q7.** What is the difference between a list and a tuple?

<details><summary>Answer</summary>

Lists are **mutable** (can be changed in place) and use square brackets `[]`.
Tuples are **immutable** (cannot be changed) and use parentheses `()`.
Use tuples for data that should not change: records, coordinates, dict keys.
Use lists for collections you'll add to, remove from, or sort.
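
The sketch below shows both consequences of immutability: a tuple works as a dict key, and it rejects in-place changes. The coordinates and the dict name are made up for the example.

```python
point = (52.52, 13.40)                         # made-up coordinates
office_by_location = {point: 'Berlin'}         # a tuple works as a dict key
print(office_by_location[(52.52, 13.40)])      # Berlin

try:
    point[0] = 0.0                             # tuples reject item assignment
except TypeError as e:
    print('TypeError:', e)

try:
    bad = {[52.52, 13.40]: 'Berlin'}           # a list cannot be a dict key
except TypeError as e:
    print('TypeError:', e)                     # unhashable type: 'list'
```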

</details>

**Q8.** What will `0.1 + 0.2 == 0.3` evaluate to, and why?

<details><summary>Answer</summary>

`False`. Floating-point numbers are stored in binary, and not all decimal
fractions can be represented exactly. `0.1 + 0.2` evaluates to
`0.30000000000000004` due to floating-point precision.
Use `round(0.1 + 0.2, 10) == 0.3` or `math.isclose()` for float comparisons.
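
Both comparison options can be checked in a few lines:

```python
import math

total = 0.1 + 0.2
print(total)                       # 0.30000000000000004
print(total == 0.3)                # False
print(math.isclose(total, 0.3))    # True -- equal within a small relative tolerance
print(round(total, 10) == 0.3)     # True
```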

</details>


---

## Appendix J Summary

You've covered the complete Python foundation. Here's what you now know:

| Topic | Key insight |
|-------|-------------|
| Variables and types | Variables are names pointing to objects, not boxes containing values |
| Numbers and operators | `/` always gives float; `//` is floor division; `%` is remainder |
| Strings | Immutable — methods return new strings. F-strings are the modern format choice |
| Control flow | Python evaluates `if/elif/else` top-to-bottom, runs the first matching branch |
| Loops | `for` iterates sequences; `while` runs until condition is False; `break` exits |
| Functions | Arguments are passed as references; mutable defaults are a common bug; copy before mutating an argument |
| Lists | Mutable ordered sequences; `.sort()` modifies in place; `sorted()` returns new |
| Tuples | Immutable ordered sequences; use as dict keys; unpack elegantly |
| Dictionaries | Key→value mapping; `.get()` is safe; `.items()` for iteration |
| Sets | Unordered unique values; fast membership test; union/intersection operations |
| Mutability | Mutable objects are shared via references — copy explicitly when needed |
| Error handling | `try/except` catches exceptions; `finally` always runs; `raise` when invalid |
| File I/O | Always use `with open()` — guarantees the file is closed |

### Where to go next

You're ready for **Chapter 0 — Orientation and Setup**, which walks you through
the Colab environment and the Stack Overflow 2025 dataset you'll use throughout
the book. Everything in this appendix is the foundation; Chapters 1 and 2 build
the intermediate Python layer (type hints, OOP, generators, error handling patterns)
on top of it.

---

*End of Appendix J — Python for AI/ML*  
[![Back to TOC](https://img.shields.io/badge/Back_to-Table_of_Contents-1B3A5C?style=flat-square)](https://colab.research.google.com/github/timothy-watt/python-for-ai-ml/blob/main/Python_for_AIML_TOC.ipynb)
