# AAI-511-Final

### Download and Unzip the Dataset

* !pip install kaggle
* !kaggle datasets download -d blanderbuss/midi-classic-music
* !unzip midi-classic-music.zip -d midi-classic-music

### Install Required Libraries

* !pip install mido
* !pip install pretty_midi
* !pip install tensorflow
* !pip install pandas

# Data Collection

This project uses MIDI files of compositions by four well-known classical composers: Bach, Beethoven, Chopin, and Mozart. The files come from the Kaggle dataset `blanderbuss/midi-classic-music` downloaded above and are extracted under `midi-classic-music/midiclassics`. The data is organized into folders, each named after a composer, and each folder contains multiple MIDI files representing various pieces by that composer.
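
Because the data is organized as one folder per composer, a quick file count per folder is a cheap sanity check before any parsing. The sketch below is illustrative only: `count_midi_files` is a hypothetical helper that is not part of the notebook, and it assumes the files end in a lowercase `.mid` extension, as the loading code in this document does.

```python
from pathlib import Path

def count_midi_files(dataset_path, composers):
    """Return {composer: number of .mid files under that composer's folder}."""
    counts = {}
    for composer in composers:
        composer_dir = Path(dataset_path) / composer
        if composer_dir.is_dir():
            # rglob descends into subfolders, like os.walk over the folder
            counts[composer] = sum(
                1 for p in composer_dir.rglob("*.mid") if p.is_file()
            )
        else:
            # A missing folder is reported as zero files instead of raising
            counts[composer] = 0
    return counts
```

Calling `count_midi_files('midi-classic-music/midiclassics', ['Bach', 'Beethoven', 'Chopin', 'Mozart'])` would return one count per composer. A count of zero suggests the folder name or extraction path does not match.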

# Data Pre-processing

The following steps were taken to preprocess the data:
1. Load MIDI Files
2. Handle Missing or Corrupt Data
3. Normalize Data
4. Data Augmentation

### Load MIDI Files

In [82]:
import os
import pretty_midi
import numpy as np
import warnings

# Suppress specific warnings
warnings.filterwarnings("ignore", message="Tempo, Key or Time signature change events found on non-zero tracks")

# Define the path to the extracted dataset
dataset_path = 'midi-classic-music/midiclassics'

# List of required composers
required_composers = ['Bach', 'Beethoven', 'Chopin', 'Mozart']

# Load one MIDI file and return its non-drum notes as an array with
# one row per note: [start, end, pitch, velocity]. Returns None on failure.
def preprocess_midi(file_path):
    try:
        midi_data = pretty_midi.PrettyMIDI(file_path)
        notes = []
        for instrument in midi_data.instruments:
            if not instrument.is_drum:
                for note in instrument.notes:
                    notes.append([note.start, note.end, note.pitch, note.velocity])
        return np.array(notes)
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
        return None

# Function to process MIDI files for the specified composers
def process_composers(dataset_path, composers):
    all_notes = []
    for composer in composers:
        composer_path = os.path.join(dataset_path, composer)
        if os.path.exists(composer_path):
            for root, dirs, files in os.walk(composer_path):
                for file in files:
                    if file.endswith('.mid'):
                        midi_file_path = os.path.join(root, file)
                        print(f'Reading MIDI file: {midi_file_path}')
                        notes = preprocess_midi(midi_file_path)
                        if notes is not None:
                            all_notes.append(notes)
        else:
            print(f'Folder for {composer} does not exist in the dataset path')
    return all_notes

# Process MIDI files for the specified composers
all_notes = process_composers(dataset_path, required_composers)

# Example of checking the processed notes
if all_notes:
    print("Example notes from one of the MIDI files:")
    print(all_notes[0])
else:
    print("No notes were processed.")

Reading MIDI file: midi-classic-music/midiclassics/Bach/Bwv0997 Partita for Lute 1mov.mid
Reading MIDI file: midi-classic-music/midiclassics/Bach/Bwv0535 Prelude and Fugue.mid
Reading MIDI file: midi-classic-music/midiclassics/Bach/Bwv0806 English Suite n1 05mov.mid
Reading MIDI file: midi-classic-music/midiclassics/Bach/Bwv0998 Prelude Fugue Allegro for Lute 3mov.mid
Reading MIDI file: midi-classic-music/midiclassics/Bach/Jesu Joy of Man Desiring.mid
Reading MIDI file: midi-classic-music/midiclassics/Bach/Prelude and Fugue in C Sharp BWV 872.mid
Reading MIDI file: midi-classic-music/midiclassics/Bach/Bwv0582 Passacaglia and Fugue.mid
Reading MIDI file: midi-classic-music/midiclassics/Bach/Bwv0527 Sonate en trio n3.mid
Reading MIDI file: midi-classic-music/midiclassics/Bach/Bwv0806 English Suite n1 03mov .mid
Reading MIDI file: midi-classic-music/midiclassics/Bach/Bwv0560 Short Prelude and Fugue n8 (Spurious).mid
Reading MIDI file: midi-classic-music/midiclassics/Bach/Bwv0811 English S
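
`process_composers` returns a flat list, so it does not record which composer each note sequence came from. If a later step needs that information, one option is to return `(composer, notes)` pairs instead. The sketch below is an assumption about how that could look, not the notebook's code. `process_composers_labeled` and its `loader` parameter are hypothetical names, and `loader` stands in for a function such as `preprocess_midi`. Unlike the original, it sorts file names so the order is reproducible.

```python
import os

def process_composers_labeled(dataset_path, composers, loader):
    """Walk each composer folder and return a list of (composer, notes) pairs.

    `loader` is any callable that maps a file path to a note array,
    or to None when the file cannot be read.
    """
    labeled = []
    for composer in composers:
        composer_path = os.path.join(dataset_path, composer)
        # os.walk yields nothing for a missing folder, so it is skipped
        for root, _dirs, files in os.walk(composer_path):
            for name in sorted(files):
                if name.endswith('.mid'):
                    notes = loader(os.path.join(root, name))
                    if notes is not None:
                        labeled.append((composer, notes))
    return labeled
```

With the notebook's definitions in scope, `process_composers_labeled(dataset_path, required_composers, preprocess_midi)` would produce the same sequences as `process_composers`, each paired with its folder name.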

### Handle Missing or Corrupt Data

In [113]:
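# process_composers already skips files that failed to load (None),
# so this filter acts as a final safeguard against None entries
cleaned_notes = [notes for notes in all_notes if notes is not None]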

if not cleaned_notes:
    print("No valid notes found after cleaning.")
else:
    print(f"Number of valid note sequences: {len(cleaned_notes)}")

Number of valid note sequences: 1528
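
The `None` check does not catch one case. When a file loads but has no non-drum notes, `preprocess_midi` returns an empty array, not `None`, so the sequence stays in the list. A minimal sketch of a stricter filter follows. `drop_empty_sequences` is a hypothetical helper, and it is not known whether any file in this dataset actually hits this case.

```python
def drop_empty_sequences(sequences):
    """Keep only sequences that are not None and hold at least one note."""
    return [seq for seq in sequences if seq is not None and len(seq) > 0]
```

The function works on plain lists as well as NumPy arrays, because it relies only on `len`.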


### Normalize Data

In [115]:
def normalize_notes(notes_list):
    # Scale velocity to [0, 1]; start, end, and pitch are left unchanged
    max_velocity = 127  # Upper bound of the MIDI velocity range (0-127)
    normalized_notes = []
    for notes in notes_list:
        normalized = [[note[0], note[1], note[2], note[3] / max_velocity] for note in notes]
        normalized_notes.append(normalized)
    return normalized_notes

normalized_notes = normalize_notes(cleaned_notes)
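
The same velocity scaling can be written as a single NumPy column operation, which avoids the per-note Python loop. This is a sketch of an equivalent formulation, not the notebook's method. `normalize_velocity` and `MAX_VELOCITY` are hypothetical names, and the function assumes each sequence has four columns in the order `[start, end, pitch, velocity]`.

```python
import numpy as np

MAX_VELOCITY = 127  # upper bound of the MIDI velocity range

def normalize_velocity(notes):
    """Return a float copy of an (n_notes, 4) array with velocity scaled to [0, 1]."""
    arr = np.asarray(notes, dtype=float)
    if arr.size == 0:
        # Keep a consistent 2-D shape for sequences without notes
        return arr.reshape(0, 4)
    out = arr.copy()           # leave the caller's array untouched
    out[:, 3] /= MAX_VELOCITY  # column 3 holds velocity
    return out
```

Unlike `normalize_notes`, this returns one array per sequence, not a list of lists, so downstream code that expects lists would need adjusting.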

### Data Augmentation

In [117]:
def transpose_notes(notes_list, semitones):
    transposed_notes = []
    for notes in notes_list:
        transposed = [[note[0], note[1], note[2] + semitones, note[3]] for note in notes]
        transposed_notes.append(transposed)
    return transposed_notes

# Build the augmented set: six transposed copies of every sequence plus the originals
augmented_notes = []
for semitone in range(-3, 4):  # Transpose from -3 to +3 semitones
    if semitone != 0:
        augmented_notes.extend(transpose_notes(normalized_notes, semitone))
augmented_notes.extend(normalized_notes)  # Include original notes
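
MIDI pitch numbers range from 0 to 127, and `transpose_notes` adds the offset without checking that range. A note near either end could therefore be shifted to an invalid pitch. The sketch below shows one way to guard against this by skipping a transposition when any note would leave the range. `transpose_in_range` is a hypothetical helper, and skipping the whole sequence, as opposed to clipping or dropping single notes, is an assumed design choice.

```python
def transpose_in_range(notes, semitones, low=0, high=127):
    """Transpose one note sequence by `semitones`.

    Returns None if any transposed pitch would fall outside [low, high].
    """
    transposed = []
    for start, end, pitch, velocity in notes:
        new_pitch = pitch + semitones
        if not low <= new_pitch <= high:
            return None  # reject the copy so the piece is not distorted
        transposed.append([start, end, new_pitch, velocity])
    return transposed
```

A caller would append the result to the augmented set only when it is not `None`.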

# Feature Extraction

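Each note carries four attributes. The code below keeps the first three as features:
- Start Time: The starting time of each note, in seconds.
- End Time: The ending time of each note, in seconds.
- Pitch: The MIDI pitch number of the note.
- Velocity: The velocity (intensity) of the note, scaled to [0, 1] during normalization. It is not included in the feature array.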

In [119]:
def extract_features(notes_list):
    features = []
    for notes in notes_list:
        for note in notes:
            features.append([note[0], note[1], note[2]])  # Using start, end, pitch as features
    return np.array(features)

features = extract_features(augmented_notes)
print("Extracted features:")
print(features)

Extracted features:
[[3.75000000e-01 5.62500000e-01 6.90000000e+01]
 [5.62500000e-01 7.50000000e-01 7.10000000e+01]
 [7.50000000e-01 1.12500000e+00 7.20000000e+01]
 ...
 [4.76691124e+02 4.77352235e+02 7.60000000e+01]
 [4.77357791e+02 4.78018902e+02 7.40000000e+01]
 [4.77357791e+02 4.78018902e+02 7.70000000e+01]]
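
The extraction step flattens every sequence into one table of notes and leaves out the normalized velocity. If velocity is wanted as a fourth feature, a flag can make that choice explicit. The sketch below is an illustrative variant, not the notebook's code. `extract_note_features` and `include_velocity` are hypothetical names.

```python
def extract_note_features(notes_list, include_velocity=False):
    """Flatten note sequences into rows of [start, end, pitch].

    With include_velocity=True each row also carries velocity as a
    fourth value.
    """
    width = 4 if include_velocity else 3
    # Slicing keeps the leading columns of every note row
    return [list(note[:width]) for notes in notes_list for note in notes]
```

Wrapping the result in `np.array(...)` gives an array shaped like the `features` array printed above, with three or four columns depending on the flag.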
