{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TECHIN 509: Bigram Melody Generator\n",
    "\n",
    "This project uses Python and a bigram model, which predicts each note from the note immediately before it, to generate musical melodies as specified in the `README.md` file.\n",
    "\n",
    "We will follow modular programming principles, splitting the program into several clear functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 1 & 2: Data Preparation\n",
    "\n",
    "First, we define the sample dataset mentioned in the `README.md`.\n",
    "\n",
    "This dataset is a list of lists, where each inner list represents a single melody as a sequence of note names such as `'E4'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Our sample dataset (based on the README examples)\n",
    "melody_dataset = [\n",
    "    ['E4', 'F#4', 'G4', 'A4', 'B4', 'A4', 'G4', 'F#4', 'E4'],\n",
    "    ['C4', 'D4', 'E4', 'C4'],\n",
    "    ['C4', 'E4', 'G4', 'E4', 'C4'],\n",
    "    ['G4', 'A4', 'G4', 'F#4', 'E4', 'D4', 'E4'],\n",
    "    ['C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4', 'C5']\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 5 (Clean Up): Melody Preprocessing\n",
    "\n",
    "As required by Part 5 of the `README.md`, we add a **start (`^`)** and an **end (`$`)** token to every melody. These boundary tokens let the model count which notes tend to open a melody and which tend to close it.\n",
    "\n",
    "We'll define the `preprocess_melodies` function to handle this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess_melodies(melodies):\n",
    "    \"\"\"\n",
    "    Adds start ('^') and end ('$') tokens to each melody in the dataset.\n",
    "\n",
    "    Args:\n",
    "        melodies (list): The list of melodies (list of lists).\n",
    "\n",
    "    Returns:\n",
    "        list: The list of melodies with tokens added.\n",
    "    \"\"\"\n",
    "    processed = []\n",
    "    for melody in melodies:\n",
    "        # Add '^' to the beginning and '$' to the end of the list\n",
    "        processed.append(['^'] + melody + ['$'])\n",
    "    return processed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 2: Build the Bigram Model\n",
    "\n",
    "Now we'll write the `build_bigram_model` function. This function will iterate through all the processed melodies and count the occurrences of each \"current note\" to \"next note\" transition.\n",
    "\n",
    "The model will use a nested dictionary structure:\n",
    "`{ 'current_note': { 'next_note_1': count, 'next_note_2': count } }`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_bigram_model(processed_melodies):\n",
    "    \"\"\"\n",
    "    Builds a bigram model from the (processed) melody data.\n",
    "\n",
    "    Args:\n",
    "        processed_melodies (list): The list of melodies from preprocess_melodies.\n",
    "\n",
    "    Returns:\n",
    "        dict: The bigram model.\n",
    "    \"\"\"\n",
    "    model = {}\n",
    "    \n",
    "    for melody in processed_melodies:\n",
    "        # Iterate through each (current, next) pair in the melody\n",
    "        # We stop at len(melody) - 1 to avoid an index out of bounds error\n",
    "        for i in range(len(melody) - 1):\n",
    "            current_note = melody[i]\n",
    "            next_note = melody[i+1]\n",
    "            \n",
    "            # 1. Check if 'current_note' is already a key in the model\n",
    "            if current_note not in model:\n",
    "                model[current_note] = {}\n",
    "            \n",
    "            # 2. Check if 'next_note' is already in the entry for 'current_note'\n",
    "            if next_note not in model[current_note]:\n",
    "                model[current_note][next_note] = 0\n",
    "            \n",
    "            # 3. Increment the count\n",
    "            model[current_note][next_note] += 1\n",
    "            \n",
    "    return model"
   ]
  },
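  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a cross-check on the counting logic, the same model can be sketched with the standard library's `collections.defaultdict` and `collections.Counter`. This is only an illustrative alternative, not the approach the `README.md` asks for; the function name `build_bigram_model_counter` and the toy melody are assumptions made for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter, defaultdict\n",
    "\n",
    "\n",
    "def build_bigram_model_counter(processed_melodies):\n",
    "    \"\"\"Hypothetical variant: counts (current, next) pairs with Counter.\"\"\"\n",
    "    model = defaultdict(Counter)\n",
    "    for melody in processed_melodies:\n",
    "        # zip pairs each note with its successor\n",
    "        for current_note, next_note in zip(melody, melody[1:]):\n",
    "            model[current_note][next_note] += 1\n",
    "    return model\n",
    "\n",
    "\n",
    "# Toy melody (an assumption for illustration), already wrapped in '^' and '$'\n",
    "toy_model = build_bigram_model_counter([['^', 'C4', 'D4', 'C4', 'D4', '$']])\n",
    "print(dict(toy_model['C4']))  # {'D4': 2}\n",
    "print(dict(toy_model['D4']))  # {'C4': 1, '$': 1}"
   ]
  },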
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 3: Generate New Melodies\n",
    "\n",
    "This is the most exciting part! We'll write the `generate_melody` function.\n",
    "\n",
    "1.  Start with the `^` (start) token.\n",
    "2.  Look up all possible next notes from `^` and their weights (counts) in the model.\n",
    "3.  Use `random.choices()` to make a **weighted random selection**.\n",
    "4.  Set this new note as the \"current note\" and repeat the process.\n",
    "5.  If the `$` (end) token is chosen, or we hit the max length, the melody is complete."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_melody(model, max_length=20):\n",
    "    \"\"\"\n",
    "    Generates a new melody using the bigram model.\n",
    "\n",
    "    Args:\n",
    "        model (dict): The model generated by build_bigram_model.\n",
    "        max_length (int): Maximum number of notes to generate; guarantees\n",
    "            termination even if '$' is never drawn.\n",
    "\n",
    "    Returns:\n",
    "        str: A space-separated string of notes.\n",
    "    \"\"\"\n",
    "    # Generation begins from the start token, which is not part of the output\n",
    "    melody = []\n",
    "    current_note = '^'\n",
    "    \n",
    "    for _ in range(max_length):\n",
    "        # 1. Check if the current note is in the model (prevents dead ends)\n",
    "        if current_note not in model:\n",
    "            break \n",
    "        \n",
    "        # 2. Get all possible next notes and their weights\n",
    "        transitions = model[current_note]\n",
    "        next_notes = list(transitions.keys())\n",
    "        weights = list(transitions.values())\n",
    "        \n",
    "        # 3. Make a weighted random choice\n",
    "        # random.choices returns a list, so we take the first element [0]\n",
    "        next_note = random.choices(next_notes, weights, k=1)[0]\n",
    "        \n",
    "        # 4. Check if we've reached the end token\n",
    "        if next_note == '$':\n",
    "            break # The melody ends naturally\n",
    "        \n",
    "        # 5. Append the note and continue\n",
    "        melody.append(next_note)\n",
    "        current_note = next_note\n",
    "        \n",
    "    # Join all the notes into a single string\n",
    "    return \" \".join(melody)"
   ]
  },
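  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the weighted selection in isolation, the sketch below takes the counts that follow `^` in the sample dataset, converts them to probabilities, and checks them against repeated draws from `random.choices()`. The variable names, the seed, and the number of draws are arbitrary choices for this example; a separate `random.Random` instance is used so the global generator that `generate_melody` relies on is left untouched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "# Counts of the notes that follow '^' in the sample dataset\n",
    "start_transitions = {'E4': 1, 'C4': 3, 'G4': 1}\n",
    "\n",
    "# Normalizing the counts gives the probability of each opening note\n",
    "total = sum(start_transitions.values())\n",
    "probabilities = {note: count / total for note, count in start_transitions.items()}\n",
    "print(probabilities)  # {'E4': 0.2, 'C4': 0.6, 'G4': 0.2}\n",
    "\n",
    "# Draw many samples; the share of 'C4' should land near 0.6\n",
    "rng = random.Random(509)  # arbitrary seed, only to make the run repeatable\n",
    "draws = rng.choices(\n",
    "    list(start_transitions),\n",
    "    weights=list(start_transitions.values()),\n",
    "    k=10_000,\n",
    ")\n",
    "print(draws.count('C4') / len(draws))"
   ]
  },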
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 4 & 5: Show Results and Analysis\n",
    "\n",
    "Finally, we'll write two helper functions:\n",
    "\n",
    "1.  `print_model_readable`: To print the model clearly, as requested in the `README.md`.\n",
    "2.  `find_most_common_transition`: To iterate through the model and find the most-learned note transition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def print_model_readable(model):\n",
    "    \"\"\"\n",
    "    Prints the Bigram model in a human-readable format.\n",
    "    \"\"\"\n",
    "    print(\"--- Bigram Model ---\")\n",
    "    for current_note, transitions in model.items():\n",
    "        # Format the dictionary for easier reading\n",
    "        transition_str = \", \".join(f\"'{note}': {count}\" for note, count in transitions.items())\n",
    "        print(f\"{current_note} → {{{transition_str}}}\")\n",
    "    print(\"-\" * 20)\n",
    "\n",
    "def find_most_common_transition(model):\n",
    "    \"\"\"\n",
    "    Finds the most common note transition (pair) in the model.\n",
    "\n",
    "    Transitions involving the '^' and '$' tokens are counted as well, and\n",
    "    ties are resolved in favor of the first pair encountered.\n",
    "\n",
    "    Returns:\n",
    "        tuple: ((current_note, next_note), count).\n",
    "    \"\"\"\n",
    "    most_common_pair = (None, None)\n",
    "    max_count = 0\n",
    "    \n",
    "    # Iterate through every entry in the model\n",
    "    for current_note, transitions in model.items():\n",
    "        for next_note, count in transitions.items():\n",
    "            if count > max_count:\n",
    "                max_count = count\n",
    "                most_common_pair = (current_note, next_note)\n",
    "                \n",
    "    return most_common_pair, max_count"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Main Program: Executing All Steps\n",
    "\n",
    "Now we'll chain all the functions together to run the full process and print the final results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. Preprocess the data\n",
    "print(\"Preprocessing data...\")\n",
    "processed_data = preprocess_melodies(melody_dataset)\n",
    "# print(processed_data) # (Uncomment to view)\n",
    "\n",
    "# 2. Build the model\n",
    "print(\"Building bigram model...\")\n",
    "bigram_model = build_bigram_model(processed_data)\n",
    "\n",
    "# 3. (Part 4) Print the readable model\n",
    "print_model_readable(bigram_model)\n",
    "\n",
    "# 4. (Part 4) Generate 3 sample melodies\n",
    "print(\"--- Generated Melodies ---\")\n",
    "for i in range(3): # Generate 3 melodies\n",
    "    new_song = generate_melody(bigram_model)\n",
    "    print(f\"{i+1}. {new_song}\")\n",
    "print(\"-\" * 20)\n",
    "\n",
    "# 5. (Part 5) Analysis: Find the most common transition\n",
    "common_pair, count = find_most_common_transition(bigram_model)\n",
    "print(\"--- Analysis ---\")\n",
    "print(f\"Most common transition: {common_pair[0]} → {common_pair[1]} (Count: {count})\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
