# List Generator

This Python notebook looks through the corpora that ship with music21. For each composer requested, the code generates a formatted text file containing the information needed for chord phrase frequency analysis and saves it to the folder `datasets`.

## Text Files Structure

Each text file lists the roman numerals of every piece by one composer that has an `.rntxt` file in the corpus.

* A comma (`,`) separates the data for different roman numerals in the same piece
* A newline (`\n`) separates the data for different pieces by the same composer

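As a minimal sketch of this layout, the snippet below builds and parses such a file body with plain string operations. The helper names `encode_dataset` and `decode_dataset` are hypothetical and are not part of the notebook or of music21.

```python
from typing import List


def encode_dataset(pieces: List[List[str]]) -> str:
    """Join the numerals of a piece with ',' and the pieces with a newline."""
    return "\n".join(",".join(piece) for piece in pieces)


def decode_dataset(text: str) -> List[List[str]]:
    """Invert encode_dataset: one list of numerals per non-empty line."""
    return [line.split(",") for line in text.splitlines() if line]


# Hypothetical sample data: two short pieces by the same composer
pieces = [["vi", "V", "I"], ["i", "iv", "V", "i"]]
text = encode_dataset(pieces)
print(text)
```

Because neither separator can occur inside a roman numeral, decoding needs nothing more than two `split` calls.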

## Table of Contents:

* [Method 1 - Record Only Order and Roman Numeral](#method-1)
    * [a. Listing One Song in Proper Format](#method-1-part-a)
    * [b. Listing Multiple Songs in Proper Format](#method-1-part-b)
    * [c. Create the File for One Composer](#method-1-part-c)
    * [d. Create Files for Several Composers](#method-1-part-d)
* [Method 2 - Record Order, Roman Numeral, and Inversion](#method-2)
    * [a. Listing One Song in Proper Format](#method-2-part-a)
    * [b. Listing Multiple Songs in Proper Format](#method-2-part-b)
    * [c. Create the File for One Composer](#method-2-part-c)
    * [d. Create Files for Several Composers](#method-2-part-d)

In [1]:
from music21 import corpus, stream

# Method 1 - Record Only Order and Roman Numeral <a class="anchor" name="method-1"></a>

In this first approach, I import pieces from the music21 corpus that already have roman text and save them to separate text files. As the title implies, these files record only the relative order of the roman numerals, for simplicity. This means disregarding parts of the corpus that we may later wish to take into account, including duration, root note, and key changes.

## a. Listing One Song in Proper Format <a class="anchor" name="method-1-part-a"></a>

To begin with, I load a single song and generate a string with all of its roman numerals. The song is taken from the music21 module reference page for `romanText.translate`.

In [2]:
# http://web.mit.edu/music21/doc/moduleReference/moduleRomanTextTranslate.html
monteverdi = corpus.parse('monteverdi/madrigal.3.1.rntxt')

I loop through all elements of class `RomanNumeral` and read each one's `romanNumeral` attribute, the string that represents the roman numeral. As the output below shows, this string carries no inversion figures. The strings are then joined with commas and printed, which gives a good idea of what one song looks like in this encoding.

In [3]:
monteverdi_chords = []
for part in monteverdi.parts:
    # Each element yielded here is a single RomanNumeral object
    for rn in part.flat.getElementsByClass('RomanNumeral'):
        monteverdi_chords.append(rn.romanNumeral)
print(",".join(monteverdi_chords))

vi,V,I,IV,I,V,V,i,V,i,VI,V,i,V,i,i,i,I,IV,ii,vii,I,vi,I,I,IV,vii,I,vi,vi,I,IV,IV,I,I,IV,ii,V,I,V,V,ii,ii,vi,IV,V,I,I,V,V,ii,VI,VI,iv,V,V,i,V,I,V,IV,I,vi,V,I,I,i,i,VII,VII,iv,VI,i,V,i,ii,V,I,vi,V,vi,V,vi,i,V,I,V,ii,V,I,V,vi,i,i,V,i,V,I,i,V,i,V,i,V,I,ii,I,ii,V,I,I,vi,V,II,V,I,IV,ii,vii,V,i,V,I,i,V,i,V,I,ii,I,ii,V,I,ii,I,ii,V,I,I,vi,vi,II,V,I,vi,V,i,V,I,i,i,i,i,i,ii,V,i,V,i,i,i,v,IV,III,V,i,V,i,i,II,III,v,IV,i,#vii,i,i,V,V,V,I,#vii,I,I,#vii,I,V,V,i,VII,v,i,V,V,i,V,I,I,V,vi,V,IV,ii,V,i,V,i,I,IV,IV,I,ii,I,ii,V,i,V,i,V,vi,ii,V,IV,vii,iii,IV,V,V,i,V,iii,vi,ii,V,I,I,vi,ii,I,IV,V,I,V,I,IV,V,vi,iii,i,V,i,VII,v,iv,III,iv,iv,III,I,ii,v,i,V,I


Next, I generalize the above algorithm into a function that takes a song and returns a comma-separated string.

In [4]:
def simple_format_score(s: stream.Score) -> str:
    """
    Given a Score, generates a comma-separated string of the roman numerals
    in the order that they appear in the score.
    """
    chord_order = []
    
    for part in s.parts:
        for rn in part.flat.getElementsByClass('RomanNumeral'):
            chord_order.append(rn.romanNumeral)
    return ",".join(chord_order)

## b. Listing Multiple Songs in Proper Format <a class="anchor" name="method-1-part-b"></a>

Now that I have a function that generates the properly formatted string for a single song, the next goal is to repeat this process for multiple songs by the same composer.

One way to do this would be to manually find the files in the composer's folder that end in `.rntxt` and loop through the list of paths. It is much more convenient to do this automatically, so the function below uses `corpus.search` with the `fileExtensions` argument to find them.

In [5]:
def simple_format_scores_for_composer(composer="monteverdi"):
    """
    Generates the contents of the composer's dataset: the chord progression
    of each song found, one song per line.
    """
    MAX_NUM_DOTS = 20
    
    # 1. Search for all files that end with rntxt
    metadatas = corpus.search(query=composer, fileExtensions="rntxt")
    num_files = len(metadatas)
    print("Number of rntxt files found:", num_files)
    
    file_contents = []
    
    num_parsed = 0
    for m in metadatas:
        # Format the score and append it to a list
        formatted_score = simple_format_score(m.parse())
        file_contents.append( formatted_score )
        
        # Print Progress
        num_parsed += 1
        num_dots = int(num_parsed / num_files * MAX_NUM_DOTS)
        ending = '\r' if num_parsed != num_files else '\n'
        print('[' + ('.' * num_dots) + (' ' * (MAX_NUM_DOTS - num_dots)) + ']', end=ending)
        
    return "\n".join(file_contents)

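The progress display in the loop above can be looked at in isolation. The sketch below pulls the same arithmetic into a helper, hypothetically named `render_progress`, and drives it with a made-up file count instead of real corpus files.

```python
def render_progress(num_parsed: int, num_files: int, max_dots: int = 20) -> str:
    """Return a fixed-width bar: dots for finished work, spaces for the rest."""
    num_dots = int(num_parsed / num_files * max_dots)
    return "[" + "." * num_dots + " " * (max_dots - num_dots) + "]"


num_files = 4  # assumed count, for illustration only
for done in range(1, num_files + 1):
    # '\r' moves the cursor back to the line start so the next bar overwrites
    # this one; the final bar ends with '\n' so later output starts on a new line
    ending = "\r" if done != num_files else "\n"
    print(render_progress(done, num_files), end=ending)
```

The bar keeps the same width on every call, which is what lets the carriage return overwrite it cleanly.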
## c. Create the File for One Composer <a class="anchor" name="method-1-part-c"></a>

In [6]:
import os

def generate_simple_dataset(composer: str):
    """
    Saves a text file with the composer's repertoire, ignoring inversions.

    Given the name of the composer, searches the local corpus for files
    with the extension ".rntxt" and builds a string in which the roman
    numerals of each song are separated by commas and the songs are
    separated by a newline character. The string is then written to a
    text file in the "datasets" folder.

    Parameters
    ----------
    composer : str
        The last name of the composer whose .rntxt files are in music21
    """
    print("Generating Dataset for:", composer)
    file_contents = simple_format_scores_for_composer(composer)

    # Skip writing when the contents are empty
    if not file_contents:
        print('Dataset empty')
        return

    # Create the output folder if needed, then write the dataset
    os.makedirs("datasets", exist_ok=True)
    file_name = "datasets/simple-dataset-" + composer + ".txt"
    with open(file_name, "w") as f:
        f.write(file_contents)

    # Print the location
    print("Dataset written to:", file_name, end="\n\n")

## d. Create Files for Several Composers <a class="anchor" name="method-1-part-d"></a>

In [7]:
generate_simple_dataset('monteverdi')
generate_simple_dataset('bach')

Generating Dataset for: monteverdi
Number of rntxt files found: 46
[....................]
Dataset written to: datasets/simple-dataset-monteverdi.txt

Generating Dataset for: bach
Number of rntxt files found: 20
[....................]
Dataset written to: datasets/simple-dataset-bach.txt

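The introduction names chord phrase frequency analysis as the purpose of these files. As a rough illustration only, and not an analysis this notebook performs, the sketch below counts consecutive chord pairs in dataset text with `collections.Counter`. The function name `count_phrases` and the inline sample text are hypothetical.

```python
from collections import Counter


def count_phrases(text: str, n: int = 2) -> Counter:
    """Count every run of n consecutive chords, never crossing a piece boundary."""
    counts = Counter()
    for line in text.splitlines():
        chords = line.split(",") if line else []
        for i in range(len(chords) - n + 1):
            counts[tuple(chords[i:i + n])] += 1
    return counts


# Made-up sample in the dataset format: two pieces, one per line
sample = "vi,V,I,IV,I,V\ni,V,i,V,I"
pair_counts = count_phrases(sample)
print(pair_counts.most_common(3))
```

Processing the text line by line keeps the last chord of one piece from being paired with the first chord of the next.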


# Method 2 - Record Order, Roman Numeral, and Inversion <a class="anchor" name="method-2"></a>

## a. Listing One Song in Proper Format <a class="anchor" name="method-2-part-a"></a>

As in Method 1, I begin by loading a single song, the same one taken from the music21 module reference page for `romanText.translate`.

In [8]:
# http://web.mit.edu/music21/doc/moduleReference/moduleRomanTextTranslate.html
monteverdi = corpus.parse('monteverdi/madrigal.3.1.rntxt')

I loop through all elements of class `RomanNumeral` and get the string that represents the roman numeral.
The new addition is that every roman numeral is followed by its inversion in parentheses. For example, I in root position is `"I(0)"`. The results are then joined with commas and printed, which gives a good idea of what one song looks like in this encoding.

In [9]:
monteverdi_chords = []
for part in monteverdi.parts:
    for rn in part.flat.getElementsByClass('RomanNumeral'):
        rn_data = "{}({})".format(rn.romanNumeral,rn.inversion())
        monteverdi_chords.append(rn_data)
        
print(",".join(monteverdi_chords))

vi(0),V(0),I(0),IV(0),I(0),V(1),V(1),i(0),V(0),i(0),VI(0),V(0),i(2),V(0),i(0),i(0),i(0),I(0),IV(0),ii(0),vii(1),I(0),vi(0),I(0),I(0),IV(1),vii(1),I(0),vi(0),vi(0),I(0),IV(0),IV(0),I(0),I(0),IV(1),ii(0),V(0),I(0),V(0),V(0),ii(1),ii(0),vi(0),IV(0),V(0),I(0),I(0),V(0),V(0),ii(1),VI(0),VI(0),iv(1),V(0),V(0),i(0),V(0),I(0),V(0),IV(1),I(0),vi(0),V(0),I(0),I(0),i(0),i(0),VII(0),VII(0),iv(1),VI(0),i(2),V(0),i(0),ii(0),V(0),I(0),vi(1),V(0),vi(0),V(0),vi(1),i(0),V(0),I(0),V(0),ii(0),V(0),I(0),V(0),vi(0),i(1),i(0),V(0),i(2),V(0),I(0),i(0),V(0),i(0),V(0),i(0),V(0),I(0),ii(0),I(1),ii(1),V(0),I(0),I(0),vi(0),V(3),II(1),V(0),I(1),IV(0),ii(1),vii(1),V(0),i(2),V(0),I(0),i(0),V(0),i(0),V(0),I(0),ii(0),I(1),ii(1),V(0),I(0),ii(0),I(1),ii(1),V(0),I(0),I(0),vi(0),vi(0),II(1),V(0),I(1),vi(0),V(0),i(2),V(0),I(0),i(0),i(0),i(1),i(0),i(1),ii(1),V(0),i(2),V(0),i(0),i(0),i(0),v(1),IV(1),III(1),V(0),i(2),V(0),i(0),i(1),II(1),III(1),v(0),IV(0),i(1),#vii(1),i(0),i(0),V(0),V(1),V(0),I(1),#vii(1),I(0),I(0),#vii(1),I(1

Next, I generalize the above algorithm into a function that takes a song and returns a comma-separated string.

In [10]:
def inv_format_score(s: stream.Score) -> str:
    """
    Given a Score, generates a comma-separated string of the roman numerals,
    each followed by its inversion in parentheses, in the order that they
    appear in the score.
    """
    chord_order = []
    
    for part in s.parts:
        for rn in part.flat.getElementsByClass('RomanNumeral'):
            rn_data = "{}({})".format(rn.romanNumeral,rn.inversion())
            chord_order.append(rn_data)
    return ",".join(chord_order)

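Tokens in the `numeral(inversion)` form can be split back into their two parts when the dataset is read later. The following is a minimal sketch, assuming every token follows exactly that form; `parse_token` and `TOKEN_PATTERN` are hypothetical names, not part of the notebook or of music21.

```python
import re
from typing import Tuple

# A numeral (anything except parentheses) followed by digits in parentheses
TOKEN_PATTERN = re.compile(r"^(?P<numeral>[^()]+)\((?P<inversion>\d+)\)$")


def parse_token(token: str) -> Tuple[str, int]:
    """Split a token such as 'V(1)' into ('V', 1); reject malformed tokens."""
    match = TOKEN_PATTERN.match(token)
    if match is None:
        raise ValueError("not in numeral(inversion) form: " + repr(token))
    return match.group("numeral"), int(match.group("inversion"))


# Made-up line in the Method 2 format
line = "vi(0),V(1),#vii(1),i(2)"
parsed = [parse_token(token) for token in line.split(",")]
print(parsed)
```

Raising on a malformed token, rather than skipping it, makes a cut-off or corrupted line visible as soon as the file is read.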
## b. Listing Multiple Songs in Proper Format <a class="anchor" name="method-2-part-b"></a>

Now that I have a function that generates the properly formatted string for a single song, the next goal is to repeat this process for multiple songs by the same composer.

One way to do this would be to manually find the files in the composer's folder that end in `.rntxt` and loop through the list of paths. It is much more convenient to do this automatically, so the function below uses `corpus.search` with the `fileExtensions` argument to find them.

In [11]:
def inv_format_scores_for_composer(composer="monteverdi"):
    """
    Generates the contents of the composer's dataset: the chord progression
    of each song found, with inversions, one song per line.
    """
    MAX_NUM_DOTS = 20
    
    # 1. Search for all files that end with rntxt
    metadatas = corpus.search(query=composer, fileExtensions="rntxt")
    num_files = len(metadatas)
    print("Number of rntxt files found:", num_files)
    
    file_contents = []
    
    num_parsed = 0
    for m in metadatas:
        # Format the score and append it to a list
        formatted_score = inv_format_score(m.parse())
        file_contents.append( formatted_score )
        
        # Print Progress
        num_parsed += 1
        num_dots = int(num_parsed / num_files * MAX_NUM_DOTS)
        ending = '\r' if num_parsed != num_files else '\n'
        print('[' + ('.' * num_dots) + (' ' * (MAX_NUM_DOTS - num_dots)) + ']', end=ending)
        
    return "\n".join(file_contents)

## c. Create the File for One Composer <a class="anchor" name="method-2-part-c"></a>

In [12]:
import os

def generate_inv_dataset(composer: str):
    """
    Saves a text file with the composer's repertoire, including inversions.

    Given the name of the composer, searches the local corpus for files
    with the extension ".rntxt" and builds a string in which the data on
    each roman numeral is separated by a comma and each song is separated
    by a newline character. The string is then written to a text file in
    the "datasets" folder.
    """
    print("Generating Dataset for:", composer)
    file_contents = inv_format_scores_for_composer(composer)

    # Skip writing when the contents are empty
    if not file_contents:
        print('Dataset empty')
        return

    # Create the output folder if needed, then write the dataset
    os.makedirs("datasets", exist_ok=True)
    file_name = "datasets/inv-dataset-" + composer + ".txt"
    with open(file_name, "w") as f:
        f.write(file_contents)

    # Print the location
    print("Dataset written to:", file_name, end="\n\n")

## d. Create Files for Several Composers <a class="anchor" name="method-2-part-d"></a>

In [13]:
generate_inv_dataset('monteverdi')
generate_inv_dataset('bach')

Generating Dataset for: monteverdi
Number of rntxt files found: 46
[....................]
Dataset written to: inv-dataset-monteverdi.txt

Generating Dataset for: bach
Number of rntxt files found: 20
[....................]
Dataset written to: inv-dataset-bach.txt


