
# TECHIN 509 — Melody Representation (Python Structures)

This notebook answers the assignment using simple, readable Python structures. The focus is on **how to represent melodies**, not on clever code, and each explanation is paired with a runnable example.



## 1) How to store a sequence of notes like `C D E F G` in Python?

### Design choices (summary)
- **Note** = minimal musical atom. Represented by a small record with:
  - `pitch` (e.g., `"C"`, `"C#"`, `"D"`),
  - `octave` (e.g., `4` for the 4th octave),
  - `duration` in beats, assuming the quarter note gets the beat (e.g., `1.0` = quarter note, `0.5` = eighth, `2.0` = half),
  - optional `velocity` (MIDI-like loudness, default 100).
- **Melody** = an **ordered** sequence of `Note` objects → a Python **`list`** preserves order naturally.
- **Many melodies** (a dataset) = `List[List[Note]]` or a labeled `Dict[str, List[Note]]`.

**Why this works:** Music is a time-ordered sequence. A Python `list` is the simplest ordered container: it is easy to read, supports constant-time indexing, and appends in amortized constant time.


In [None]:

from dataclasses import dataclass, asdict
from typing import List, Dict, Any

@dataclass
class Note:
    pitch: str           # e.g., "C", "C#", "D", ...
    octave: int          # typical range 0-8
    duration: float      # beats: 1.0 = quarter, 0.5 = eighth, 2.0 = half
    velocity: int = 100  # optional performance parameter

# Example melody: C D E F G (all quarter notes, octave 4)
melody: List[Note] = [
    Note("C", 4, 1.0),
    Note("D", 4, 1.0),
    Note("E", 4, 1.0),
    Note("F", 4, 1.0),
    Note("G", 4, 1.0),
]

# Show both object view and JSON-like dicts for storage/serialization
melody_dicts: List[Dict[str, Any]] = [asdict(n) for n in melody]
melody, melody_dicts[:2]  # preview



**Guiding questions:**  
- If a melody is like a sentence, **what is each word?** → A **`Note`**.  
- Which Python type **keeps items in order**? → A **`list`**.  
- How to group several melodies? → `List[List[Note]]` or a named `Dict[str, List[Note]]`.
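
The last answer can be sketched as follows. This is a minimal illustration, not the only way to do it: the titles `"scale_up"` and `"cadence"` and the variable name `songbook` are made up for the example, and the `Note` definition is repeated so the cell runs on its own.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Note:
    pitch: str
    octave: int
    duration: float
    velocity: int = 100


# Hypothetical titles, chosen only for illustration.
songbook: Dict[str, List[Note]] = {
    "scale_up": [Note("C", 4, 1.0), Note("D", 4, 1.0), Note("E", 4, 1.0)],
    "cadence": [Note("G", 4, 0.5), Note("B", 4, 0.5), Note("C", 5, 2.0)],
}

# Look up one melody by name; its notes keep their order.
first_pitches = [n.pitch for n in songbook["scale_up"]]
print(first_pitches)

# Dropping the labels gives the unlabeled List[List[Note]] form.
unlabeled: List[List[Note]] = list(songbook.values())
print(len(unlabeled))
```

A labeled dictionary is convenient when melodies need to be looked up by name; the plain nested list is enough when only the notes matter.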



## 2) Preferred storage format for a collection of melodies

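I prefer **JSON/JSONL** for readability and ease of preprocessing. Both support nested structures naturally, diff cleanly in version control, and are language-agnostic; JSONL additionally stores one record per line, so a collection can be read one melody at a time. CSV is awkward for nested lists. MIDI is compact but binary, so it is less convenient to inspect by hand during early analysis or teaching.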

### Example JSON structure


In [None]:

import json

piece = {
    "title": "simple_scale",
    "bpm": 120,
    "key": "C",
    "melody": [asdict(n) for n in melody]
}

json_preview = json.dumps(piece, indent=2)
preview_lines = json_preview.splitlines()
print("\n".join(preview_lines[:12]))  # show a short preview
print("...")
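
For a whole collection, one possible JSONL layout is one melody per line. The sketch below is an assumption about how that could look, not a fixed format: the helper names `melody_to_jsonl_line` and `melody_from_jsonl_line` and the two titles are hypothetical, and the `Note` definition is repeated so the cell is self-contained. It also checks that the notes survive a round trip.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Note:
    pitch: str
    octave: int
    duration: float
    velocity: int = 100


def melody_to_jsonl_line(title: str, notes: List[Note]) -> str:
    """Serialize one melody as a single JSON line (hypothetical helper)."""
    return json.dumps({"title": title, "melody": [asdict(n) for n in notes]})


def melody_from_jsonl_line(line: str) -> List[Note]:
    """Rebuild the Note objects from one JSONL line (hypothetical helper)."""
    record = json.loads(line)
    return [Note(**d) for d in record["melody"]]


# Illustrative data only.
melodies: Dict[str, List[Note]] = {
    "scale_up": [Note("C", 4, 1.0), Note("D", 4, 1.0)],
    "cadence": [Note("B", 4, 0.5), Note("C", 5, 2.0)],
}

# One line per melody; in practice this text would be written to a .jsonl file.
jsonl_text = "\n".join(melody_to_jsonl_line(t, m) for t, m in melodies.items())

# Reading it back line by line restores equal Note objects.
restored = [melody_from_jsonl_line(line) for line in jsonl_text.splitlines()]
round_trip_ok = restored == list(melodies.values())
print(round_trip_ok)
```

Because each line is an independent JSON document, a reader can process one melody at a time without loading the whole collection.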



## 3) Flattening and combining melodies

Given multiple melodies, we sometimes want **one long sequence** of notes (e.g., for training). Recommended tools:

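- `list.extend(...)`: mutates the target list in place, so no intermediate copies are made.
- `itertools.chain.from_iterable(...)`: lazy and readable; notes are yielded one at a time, and nothing is copied until you wrap the result in `list(...)`.

Avoid repeatedly using `+` inside a loop: every `+` copies the whole accumulated list into a new one, so the total work grows roughly quadratically with the number of notes.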
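
The difference between these approaches can be observed directly. The following is a small sketch in which plain integers stand in for `Note` objects; all variable names are illustrative.

```python
from itertools import chain

parts = [[1, 2], [3], [4, 5, 6]]  # plain ints stand in for Note objects

# `acc = acc + part` builds a brand-new list on every pass.
acc_plus = []
first_plus = acc_plus
for part in parts:
    acc_plus = acc_plus + part
rebuilt = acc_plus is not first_plus

# `extend` grows the same list object in place.
acc_extend = []
first_extend = acc_extend
for part in parts:
    acc_extend.extend(part)
same_object = acc_extend is first_extend

# chain.from_iterable is lazy: items are produced one at a time on demand.
lazy = chain.from_iterable(parts)
first_item = next(lazy)
remaining = list(lazy)

print(rebuilt, same_object, first_item, remaining)
```

Both loops end with the same contents; only the `+` version throws away and rebuilds the accumulator on each iteration.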


In [None]:

from itertools import chain

Dataset = List[List[Note]]

def flatten_with_chain(dataset: Dataset) -> List[Note]:
    return list(chain.from_iterable(dataset))

def flatten_with_comprehension(dataset: Dataset) -> List[Note]:
    return [n for m in dataset for n in m]

# Demo dataset
dataset: Dataset = [
    melody,
    [Note("A", 4, 0.5), Note("B", 4, 0.5), Note("C", 5, 2.0)],
]

flat1 = flatten_with_chain(dataset)
flat2 = flatten_with_comprehension(dataset)

len(flat1), len(flat2), flat1[:3], flat2[-3:]


In [None]:

# Accumulating from an empty list (recommended: extend)
all_notes: List[Note] = []
for m in dataset:
    all_notes.extend(m)

len(all_notes), all_notes[:5]



## 4) What information to extract to *teach/train* a music composer?

Useful features from a set of melodies:
- **Pitch-based**: MIDI numbers, **interval sequence** (transposition-invariant), pitch range, pitch histogram.
- **Rhythm-based**: duration histogram, common rhythmic patterns (n-grams).
- **Motivic/structural**: repeated cells and variants.
- **Global metadata**: BPM, time signature, phrase boundaries (if available).

### Minimal, runnable feature extractors


In [None]:

from collections import Counter

PITCH_TO_SEMITONE = {
    "C":0,"C#":1,"Db":1,"D":2,"D#":3,"Eb":3,"E":4,
    "F":5,"F#":6,"Gb":6,"G":7,"G#":8,"Ab":8,"A":9,
    "A#":10,"Bb":10,"B":11
}

def to_midi_number(note: Note) -> int:
    """MIDI convention: C4 = 60."""
    return 12 * (note.octave + 1) + PITCH_TO_SEMITONE[note.pitch]

def intervals(mel: List[Note]) -> List[int]:
    mids = [to_midi_number(n) for n in mel]
    return [mids[i+1] - mids[i] for i in range(len(mids)-1)]

def duration_histogram(mel: List[Note]) -> Dict[float, int]:
    return dict(Counter(n.duration for n in mel))

iv = intervals(melody)                 # C-D-E-F-G -> [2, 2, 1, 2]
dur_hist = duration_histogram(melody)  # {1.0: 5}

iv, dur_hist
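
A few of the other listed features can be sketched in the same style. This is a minimal illustration under stated assumptions: `pitch_range` and `rhythm_ngrams` are hypothetical helper names, the pitch table is cut down to natural notes for brevity, and `Note` is repeated so the cell runs on its own. The last lines check the claim that interval sequences are transposition-invariant.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    pitch: str
    octave: int
    duration: float
    velocity: int = 100


# Natural notes only, to keep the sketch short.
SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi(note: Note) -> int:
    """MIDI convention: C4 = 60."""
    return 12 * (note.octave + 1) + SEMITONE[note.pitch]


def intervals(mel: List[Note]) -> List[int]:
    mids = [midi(n) for n in mel]
    return [b - a for a, b in zip(mids, mids[1:])]


def pitch_range(mel: List[Note]) -> int:
    """Distance in semitones between the lowest and highest note."""
    mids = [midi(n) for n in mel]
    return max(mids) - min(mids)


def rhythm_ngrams(mel: List[Note], size: int = 2) -> Counter:
    """Count every run of `size` consecutive durations."""
    durs = [n.duration for n in mel]
    return Counter(tuple(durs[i:i + size]) for i in range(len(durs) - size + 1))


scale = [Note(p, 4, 1.0) for p in "CDEFG"]

# The same contour starting on G: G A B C D (C and D move to octave 5).
transposed = [
    Note("G", 4, 1.0), Note("A", 4, 1.0), Note("B", 4, 1.0),
    Note("C", 5, 1.0), Note("D", 5, 1.0),
]

same_intervals = intervals(scale) == intervals(transposed)
span = pitch_range(scale)
bigrams = rhythm_ngrams(scale, 2)
print(same_intervals, span, bigrams)
```

The two melodies differ in every MIDI number but share one interval sequence, which is why intervals are a useful key-independent feature.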



## 5) Justification of choices (concise)
- **Expressive & extensible**: `Note` record captures essentials (pitch, octave, duration, velocity) and can grow (articulations, ties, etc.).
- **Order preserved**: `list` fits musical time naturally and is easy to index/slice.
- **Composable datasets**: `List[List[Note]]` or labeled dictionaries keep the hierarchy simple.
- **Engineering-friendly**: JSON/JSONL are human-readable and tool-friendly; conversion to/from MIDI can be added later.
- **Efficient ops**: Use `extend`/`itertools.chain` for linear-time flattening; avoid O(n^2) concatenations.
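
The extensibility point can be illustrated with a short sketch. The extra fields `articulation` and `tied_to_next` are hypothetical names chosen for the example; the idea is only that new fields with defaults leave older records loadable.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Note:
    pitch: str
    octave: int
    duration: float
    velocity: int = 100
    # Hypothetical extensions; the names are illustrative only.
    articulation: Optional[str] = None  # e.g. "staccato"
    tied_to_next: bool = False


# A record saved before the new fields existed still loads unchanged.
old_record = {"pitch": "C", "octave": 4, "duration": 1.0, "velocity": 100}
old_note = Note(**old_record)

# New notes can opt in to the extra fields by keyword.
new_note = Note("D", 4, 0.5, articulation="staccato")

print(old_note)
print(asdict(new_note))
```

Giving every added field a default keeps existing melodies and stored JSON valid without a migration step.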
