### 1\. Reshaping Data: From Lists to Lookups

This is the **most common pattern** on the exams. You're almost always given data as a `list` of records (for example, a list of dictionaries), and your first job is to reshape it into a `dict` for fast lookups.

**Use Case**: You have a list of songs, and you want to group them by artist. Or, you have a list of connections `[(user_a, user_b), ...]` and you want to be able to quickly find all friends for any given user.

**Your Go-To Tool**: `collections.defaultdict`

  * `defaultdict(list)`: The perfect tool for grouping items. When you access a key for the first time, it automatically creates an empty list for you.
  * `defaultdict(set)`: Useful when you need to store a collection of unique items, like a user's connections.
  * `defaultdict(int)`: The simplest way to create a counter (see next section).

**Common Syntax:**

In [1]:
from collections import defaultdict

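# Example: Grouping songs by artist from a list of song records
# (sample data for illustration; on an exam, the records are given to you)
records_list = [
    {'artist_name': 'Taylor Swift', 'song_title': 'Cruel Summer'},
    {'artist_name': 'Jung Kook, Latto', 'song_title': 'Seven'},
]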
songs_by_artist = defaultdict(list)

for record in records_list:
    # Handle cases with multiple artists by splitting the string
    artists = record['artist_name'].split(',')
    for artist_name in artists:
        clean_name = artist_name.strip()
        songs_by_artist[clean_name].append(record['song_title'])

# The result is a dictionary like: {'Taylor Swift': ['Cruel Summer', ...], ...}
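
The second use case above (finding all friends for a user) follows the same shape with `defaultdict(set)`. Here is a minimal sketch, assuming the connections are undirected and using made-up user names:

```python
from collections import defaultdict

# Hypothetical sample data: each tuple is an undirected connection.
connections = [("ana", "ben"), ("ben", "cy"), ("ana", "ben")]

friends = defaultdict(set)
for user_a, user_b in connections:
    # Record the connection in both directions so either user can be looked up.
    friends[user_a].add(user_b)
    friends[user_b].add(user_a)

print(sorted(friends["ben"]))  # ['ana', 'cy']
```

Because the values are sets, the repeated `("ana", "ben")` pair is stored only once. One thing to watch: looking up a missing key such as `friends["zoe"]` quietly inserts an empty set, so use `friends.get("zoe", set())` if you want to avoid that side effect.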


-----

### 2\. Counting and Aggregation

Another universal task is counting things: how many times a word appears, how many songs an artist has, the number of visits per country.

**Use Case**: You have a long text and need to count word frequencies. You have check-in data and need to count visits by country.

**Your Go-To Tools**: `collections.defaultdict(int)` or `collections.Counter`

  * A `defaultdict(int)` starts every new key's count at 0, which is very convenient.
  * A `Counter` is often even more direct. You can initialize it from a list of items to count them all at once.

**Common Syntax:**

In [None]:
from collections import Counter, defaultdict

# --- Using defaultdict(int) ---
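# Example: Counting visits by country
# (sample data for illustration; each visit is a dict with a 'country_code' key)
all_visits = [{'country_code': 'US'}, {'country_code': 'KR'}, {'country_code': 'US'}]
country_counts = defaultdict(int)
for visit in all_visits:
    country_counts[visit['country_code']] += 1
# Result: defaultdict(<class 'int'>, {'US': 2, 'KR': 1})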

# --- Using Counter ---
# Example: Counting words in a list
all_words = ['hello', 'world', 'hello', 'python']
word_counts = Counter(all_words)
# Result: Counter({'hello': 2, 'world': 1, 'python': 1})

# Example: Counting bigrams (pairs of words)
bigrams = [('in', 'the'), ('the', 'end'), ('in', 'the')]
bigram_counts = Counter(bigrams)
# Result: Counter({('in', 'the'): 2, ('the', 'end'): 1})
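
In practice you usually have to build the bigram list yourself before counting it. Here is one possible sketch, assuming the text is already cleaned and split into words; the sample sentence is made up:

```python
from collections import Counter

# Hypothetical sample text, already cleaned and split into words.
words = "in the end it was in the end".split()

# Pair each word with the word that follows it.
bigrams = list(zip(words, words[1:]))
bigram_counts = Counter(bigrams)

# most_common(n) returns the n highest-count (item, count) pairs.
top_two = bigram_counts.most_common(2)
print(top_two)
```

Here `('in', 'the')` and `('the', 'end')` each appear twice, so they are the two pairs returned. If an exam specifies how to break ties, don't rely on `most_common` for that; sort the items explicitly with a tuple key, as in the next section.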

-----

### 3\. Advanced Sorting

You will **definitely** be asked to sort data. It's rarely a simple sort. Usually, you need to sort by one value in descending order and then break ties by sorting by another value in ascending order.

**Use Case**: Rank artists first by the number of songs they have (most to least), and if there's a tie, rank them by name alphabetically.

**Your Go-To Tool**: `sorted()` with a `lambda` function as the `key`.

**Common Syntax:**

The key to multi-level sorting is to return a **tuple** from your lambda function. Python sorts by the first element of the tuple, then the second to break ties, and so on.

In [None]:
# data is a list of (artist_name, song_count, total_streams) tuples
# (made-up sample values for illustration)
data = [('Taylor Swift', 10, 5000), ('Jung Kook', 12, 4500), ('Aurora', 10, 3000)]

# Sort by song_count (descending), then artist_name (ascending)
# To sort a number in descending order, just negate it!
sorted_data = sorted(data, key=lambda item: (-item[1], item[0]))

# item[1] is song_count. -item[1] makes it sort descending.
# item[0] is artist_name. It sorts ascending by default.

This single line of code is one of the most powerful and frequently tested concepts. Practice it\!
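
The same tuple-key idea works directly on the counts from the previous section. Here is a small sketch of the artist-ranking use case, assuming one list entry per song; the single-letter artist names are placeholders:

```python
from collections import Counter

# Hypothetical sample data: one entry per song, naming its artist.
song_artists = ["B", "A", "C", "A", "B", "D"]
song_counts = Counter(song_artists)

# .items() yields (artist, count) pairs.
# Sort by count descending (negated), then by artist name ascending.
ranking = sorted(song_counts.items(), key=lambda pair: (-pair[1], pair[0]))
print(ranking)  # [('A', 2), ('B', 2), ('C', 1), ('D', 1)]
```

Negation only works for numbers. If you need a string key in descending order, one option is to sort twice: first by the tie-breaker, then by the primary key. Python's sort is stable, so tied items keep the order from the first pass.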

-----

### 4\. String Cleaning & Regular Expressions

Raw text data is always messy. You'll need to clean it before you can analyze it. This usually involves a pipeline of several small steps.

**Use Case**: You're given raw song lyrics with punctuation, mixed case, and parenthetical notes like `(Ooh!)` that you need to remove.

**Your Go-To Tools**: Standard string methods and the `re` (regular expressions) module.

**Common Syntax:**

In [None]:
import re

raw_text = "  (Yeah!) Python's my favorite--I give it a 10/10! \n"

# 1. Make lowercase and remove leading/trailing whitespace
text = raw_text.lower().strip()
# -> "(yeah!) python's my favorite--i give it a 10/10!"

# 2. Remove parenthetical phrases
text = re.sub(r'\(.*?\)', '', text)
# -> " python's my favorite--i give it a 10/10!"

# 3. Remove punctuation (keep letters, numbers, spaces, and apostrophes)
stripped = re.sub(r"[^a-z0-9' ]", '', text)
# -> " python's my favoritei give it a 1010"
# (Note: deleting punctuation outright glues neighboring words together!)

# 4. A better way: replace every other character with a space, then split and rejoin
text = re.sub(r"[^a-z0-9']", ' ', text)   # Replace with space
words = text.split()                      # Split into words
clean_text = ' '.join(words)              # Rejoin with single spaces
# -> "python's my favorite i give it a 10 10"
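
On an exam, these steps usually end up inside one helper function. Here is a minimal sketch that wraps the pipeline above; the name `clean_lyrics` is just an illustrative choice, and it assumes you want to keep apostrophes and digits:

```python
import re

def clean_lyrics(raw_text):
    """Lowercase, drop parenthetical notes, and turn punctuation into single spaces."""
    text = raw_text.lower().strip()
    text = re.sub(r"\(.*?\)", "", text)       # remove parenthetical phrases
    text = re.sub(r"[^a-z0-9']", " ", text)   # every other character becomes a space
    return " ".join(text.split())             # collapse runs of whitespace

print(clean_lyrics("  (Yeah!) Python's my favorite--I give it a 10/10! \n"))
# python's my favorite i give it a 10 10
```

Replacing with a space and then using `split()` and `join()` means you never have to worry about double spaces or leftover whitespace at the ends.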

-----

### 5\. Implementing Mathematical Formulas

Often, you'll be given a mathematical formula, like for cosine similarity or a Naive Bayes score, and asked to translate it into code.

**Use Case**: Calculate the "cosine similarity" between two vectors $v_0$ and $v_1$. The formula is $\cos(\theta) = \frac{v_0 \cdot v_1}{\|v_0\| \, \|v_1\|}$.
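
It can help to expand the two pieces of that formula before coding them. As a notational assumption (not part of the formula above), write $v[k]$ for the count of word $k$ in vector $v$, taken to be 0 when the word is absent:

$$v_0 \cdot v_1 = \sum_{k} v_0[k] \, v_1[k], \qquad \|v\| = \sqrt{\sum_{k} v[k]^2}$$

Each sum becomes a `sum()` over a generator expression, and the square root becomes `math.sqrt`.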

**Your Go-To Tools**: The `math` module and `sum()` with generator expressions.

**Common Syntax:**

In [None]:
import math

# Assume v0 and v1 are dictionaries mapping words to counts (sample values below)
v0 = {'python': 3, 'code': 5}
v1 = {'python': 1, 'data': 2}
# Only words present in both vectors contribute to the dot product
# (a missing word counts as 0).
dot_product = sum(v0[k] * v1[k] for k in v0.keys() & v1.keys())

# Calculate magnitude (norm) of v0
mag_v0 = math.sqrt(sum(v**2 for v in v0.values()))

# Calculate magnitude (norm) of v1
mag_v1 = math.sqrt(sum(v**2 for v in v1.values()))

# Calculate cosine similarity
if mag_v0 > 0 and mag_v1 > 0:
    cosine_sim = dot_product / (mag_v0 * mag_v1)
else:
    cosine_sim = 0.0

**Key takeaway**: Don't be intimidated by the formulas. Break them down piece by piece: numerator, denominator, sums, square roots, etc. The Python code is usually very direct.
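
Here is one way to package the pieces above into a reusable function. This is a sketch under the same assumption as before (vectors are dicts mapping words to counts); the function name and sample vectors are illustrative:

```python
import math

def cosine_similarity(v0, v1):
    """Cosine similarity of two sparse vectors stored as {word: count} dicts."""
    # Numerator: only words present in both vectors contribute.
    dot_product = sum(v0[k] * v1[k] for k in v0.keys() & v1.keys())
    # Denominator: product of the two magnitudes.
    mag_v0 = math.sqrt(sum(count ** 2 for count in v0.values()))
    mag_v1 = math.sqrt(sum(count ** 2 for count in v1.values()))
    if mag_v0 == 0 or mag_v1 == 0:
        return 0.0  # avoid dividing by zero for an empty or all-zero vector
    return dot_product / (mag_v0 * mag_v1)

a = {"python": 3, "code": 4}
b = {"data": 2}
print(cosine_similarity(a, a))  # 1.0
print(cosine_similarity(a, b))  # 0.0
```

Checking a vector against itself (which should give 1.0) and against a vector with no shared words (which should give 0.0) is a quick sanity test for any similarity function.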

-----

Good luck with your studies. Go back through these exams and try to label each problem with one or more of the patterns above. You'll see just how repeatable they are. You've got this\!

Best,
Your Professor