This notebook helps you define and run pipelines that process your data, including data augmentation, slicing, stretching, and encoding. Before using it, collate your original `.xml` files with `1.1. Collate Files.ipynb`.

A pipeline is a data processing module that transforms an input data type into an output data type. The idea, along with bits and pieces of the code, is borrowed from [the Magenta project](https://github.com/tensorflow/magenta/tree/master/magenta/pipelines).
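To make the idea concrete, here is a minimal sketch in plain Python, not the Magenta API. The `Stage` class and `run_chain` helper are hypothetical names used only for illustration. The sketch assumes that each stage maps one input to a list of outputs, so an augmentation stage can fan out into several results.

```python
from typing import Callable, Iterable, List


class Stage:
    """A hypothetical pipeline stage: one input in, a list of outputs out."""

    def __init__(self, name: str, fn: Callable[[object], Iterable[object]]):
        self.name = name
        self.fn = fn

    def transform(self, item: object) -> List[object]:
        return list(self.fn(item))


def run_chain(stages: List[Stage], item: object) -> List[object]:
    """Feed every output of one stage into the next stage."""
    items = [item]
    for stage in stages:
        items = [out for it in items for out in stage.transform(it)]
    return items


# Toy stages that work on a list of MIDI pitches (a stand-in for a NoteSequence).
transpose = Stage(
    "transpose",
    lambda pitches: [[p + t for p in pitches] for t in (-1, 0, 1)])
encode = Stage(
    "encode",
    lambda pitches: [" ".join(str(p) for p in pitches)])

chain_result = run_chain([transpose, encode], [60, 64, 67])
print(chain_result)  # ['59 63 66', '60 64 67', '61 65 68']
```

One input chord becomes three encoded strings, which is the same fan-out behaviour that makes augmentation stages multiply the size of a dataset.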


**INSTRUCTIONS**
 
First, adjust the pipeline definitions inside `pipeline_graph_def`. Then run `build_dataset`. This creates four files in two sets: a training file and an evaluation file for the inputs, and a training file and an evaluation file for the targets.

**DEPENDENCIES**

In [1]:
import data_process
from data_process import PerformanceExtractor, PerformanceParser, TranspositionPipeline

from magenta.protobuf import music_pb2
from magenta.pipelines import pipelines_common, dag_pipeline, note_sequence_pipelines

import os

**PARAMETERS**

In [2]:
pipeline_config = dict()

pipeline_config['mode'] = "train"

pipeline_config['data_source_dir'] = "./data/collated/A/"
pipeline_config['data_target_dir'] = "./data/processed/heron/"

# How many steps per quarter note
pipeline_config['steps_per_quarter'] = 4

pipeline_config['min_events'] = 1

pipeline_config['max_events'] = 10000

# Inclusive.
pipeline_config['MIN_MIDI_PITCH'] = 0

# Inclusive.
pipeline_config['MAX_MIDI_PITCH'] = 127
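
As a rough illustration of what `steps_per_quarter` controls, the sketch below converts a time in seconds to a quantized step index. The tempo argument `qpm` and the round-half-up rule are assumptions made for this example; the quantizer used by the pipeline may handle tempo and rounding differently.

```python
def seconds_to_steps(seconds: float, qpm: float, steps_per_quarter: int) -> int:
    """Round a time in seconds to the nearest quantized step (half rounds up)."""
    steps_per_second = steps_per_quarter * qpm / 60.0
    return int(seconds * steps_per_second + 0.5)


# Assumed tempo of 120 quarter notes per minute: one quarter note lasts 0.5 s,
# so with 4 steps per quarter each step lasts 0.125 s.
quarter_note_steps = seconds_to_steps(0.5, qpm=120, steps_per_quarter=4)
slightly_late_steps = seconds_to_steps(0.26, qpm=120, steps_per_quarter=4)
print(quarter_note_steps, slightly_late_steps)  # 4 2
```

A larger `steps_per_quarter` gives a finer time grid and longer event sequences for the same music.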

**DEFINITIONS**

In [3]:
def pipeline_graph_def(collection_name,
                       config):
    """Returns the Pipeline instance which creates the RNN dataset.

    Args:
        collection_name: name of the collection being processed (for
            example 'inputs'); used to build unique pipeline names
        config: dict() with configuration settings

    Returns:
        A pipeline.Pipeline instance.
    """
    
    # Transpose from an octave down (-12) to 11 semitones up: 24 shifts in total.
    transposition_range = range(-12, 12)

    # Read settings from the `config` argument rather than the global dict.
    mode = config['mode']
    key = 'arrangement' + '_' + collection_name

    quantizer = note_sequence_pipelines.Quantizer(
        steps_per_quarter=config['steps_per_quarter'],
        name='Quantizer_' + key)
        # input_type=music_pb2.NoteSequence
        # output_type=music_pb2.NoteSequence

    transposer = TranspositionPipeline(
        transposition_range if mode == 'train' else [0],
        min_pitch=config['MIN_MIDI_PITCH'],
        max_pitch=config['MAX_MIDI_PITCH'],
        name='Transposer_' + key)
        # input_type=music_pb2.NoteSequence
        # output_type=music_pb2.NoteSequence

    perf_extractor = PerformanceExtractor(
        min_events=config['min_events'],
        max_events=config['max_events'],
        num_velocity_bins=0,
        name='PerformanceExtractor_' + key)
        # input_type = music_pb2.NoteSequence
        # output_type = magenta.music.MetricPerformance

    perf_parser = PerformanceParser(
        name='PerformanceParser_' + key)
        # input_type = magenta.music.MetricPerformance
        # output_type = str

    # Reverse
    # Meter
    
    dag = {}
    dag[quantizer] = dag_pipeline.DagInput(music_pb2.NoteSequence)
    dag[transposer] = quantizer
    dag[perf_extractor] = transposer
    dag[perf_parser] = perf_extractor
    dag[dag_pipeline.DagOutput(key)] = perf_parser
        
    return dag_pipeline.DAGPipeline(dag)
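
The transposition stage is the one that multiplies the dataset. The sketch below shows the general technique on plain pitch lists. It assumes that a transposition is dropped entirely when any shifted note leaves the inclusive range `[min_pitch, max_pitch]`; whether `TranspositionPipeline` in `data_process` behaves exactly this way is an assumption, and `transpose_pitches` and `augment` are hypothetical helper names.

```python
from typing import List, Optional, Tuple


def transpose_pitches(pitches: List[int], amount: int,
                      min_pitch: int = 0,
                      max_pitch: int = 127) -> Optional[List[int]]:
    """Shift every pitch by `amount`; return None if any note leaves the range."""
    shifted = [p + amount for p in pitches]
    if any(p < min_pitch or p > max_pitch for p in shifted):
        return None
    return shifted


def augment(pitches: List[int], transposition_range,
            min_pitch: int = 0,
            max_pitch: int = 127) -> Tuple[List[List[int]], int]:
    """Return all valid transpositions plus the number that were skipped."""
    results, skipped = [], 0
    for amount in transposition_range:
        shifted = transpose_pitches(pitches, amount, min_pitch, max_pitch)
        if shifted is None:
            skipped += 1
        else:
            results.append(shifted)
    return results, skipped


mid_results, mid_skipped = augment([60, 64, 67], range(-12, 12))
high_results, high_skipped = augment([120, 125], range(-12, 12))
print(len(mid_results), mid_skipped)    # 24 0
print(len(high_results), high_skipped)  # 15 9
```

Material in the middle of the keyboard survives all 24 shifts, while material near the edges of the pitch range loses some of them.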

# Build Dataset

In [5]:
data_process.build_dataset(pipeline_config, pipeline_graph_def)

INFO: Transposition pipeline ignores Key Signatures, Pitch Names and Chord Symbols.
INFO:tensorflow:

Completed.

INFO:tensorflow:Processed 99 inputs total. Produced 2376 outputs.
INFO:tensorflow:DAGPipeline_PerformanceExtractor_arrangement_inputs_performance_lengths_in_bars:
  [1,10): 744
  [10,20): 696
  [20,30): 576
  [30,40): 264
  [40,50): 96
INFO:tensorflow:DAGPipeline_PerformanceExtractor_arrangement_inputs_performances_discarded_more_than_1_program: 0
INFO:tensorflow:DAGPipeline_PerformanceExtractor_arrangement_inputs_performances_discarded_too_short: 0
INFO:tensorflow:DAGPipeline_PerformanceExtractor_arrangement_inputs_performances_truncated: 0
INFO:tensorflow:DAGPipeline_PerformanceExtractor_arrangement_inputs_performances_truncated_timewise: 0
INFO:tensorflow:DAGPipeline_Transposer_arrangement_inputs_skipped_due_to_range_exceeded: 0
INFO:tensorflow:DAGPipeline_Transposer_arrangement_inputs_transpositions_generated: 2376
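
The counts in the log can be cross-checked with a little arithmetic. The numbers below are copied from the output above; the only assumption is that every input produced one output per transposition, which is consistent with the zero skip and discard counters.

```python
num_inputs = 99
transposition_range = range(-12, 12)

# One output per input per transposition.
expected_outputs = num_inputs * len(transposition_range)

# Histogram of performance lengths in bars, copied from the log.
length_histogram = {
    "[1,10)": 744,
    "[10,20)": 696,
    "[20,30)": 576,
    "[30,40)": 264,
    "[40,50)": 96,
}
histogram_total = sum(length_histogram.values())

print(len(transposition_range), expected_outputs, histogram_total)  # 24 2376 2376
```

Each histogram bin is also a multiple of 24, as expected when every original piece appears once per transposition.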

# Build Vocabulary

In [7]:
data_process.build_vocab(pipeline_config)

INFO: Vocabulary built.
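
For readers unfamiliar with this step, the following is a generic sketch of how a vocabulary can be built from whitespace-separated token strings. It is not the implementation of `data_process.build_vocab`; the function name `build_vocab_sketch`, the special tokens, and the example token format are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def build_vocab_sketch(lines: Iterable[str],
                       specials: Sequence[str] = ("<pad>", "<unk>")) -> Dict[str, int]:
    """Map each token to an integer id: specials first, then by frequency."""
    counts = Counter(tok for line in lines for tok in line.split())
    # Sort by descending count, breaking ties alphabetically for determinism.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(list(specials) + ordered)}


# Hypothetical encoded performances, one per line.
example_lines = ["ON60 SHIFT4 OFF60", "ON62 SHIFT4 OFF62"]
vocab = build_vocab_sketch(example_lines)
print(vocab["SHIFT4"], len(vocab))  # 2 7
```

Sorting ties alphabetically keeps the ids stable between runs, which matters when the same vocabulary must be reused for training and evaluation.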
