<a href="https://colab.research.google.com/github/tuomaseerola/music21/blob/master/corpus_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Music21 Corpus Analysis
*Music and Science (Year 2 UG Module, February 2020)*

Tuomas Eerola, Durham University, UK. The Colab idea is a fantastic one from Myke Cuthbert. Many of the demos are built around the examples in _music21_.

**This is a companion to the first music21 demo in Music and Science Module.**


---





## 1 Build music21 environment in Colab

First we build a virtual machine that will be able to run _music21_ in your browser.


### 1.1 Install Music21

In [0]:
!pip install --upgrade music21

In [0]:
# Note: this cell uses `opus133`, which is only defined in section 1.6; run it after that cell.
opus133violin = opus133.getElementById('1st Violin') # select only the 1st violin part
opus133violin.measures(1,8).show() # look at the first 8 bars

### 1.2 Install musescore to display scores


In [0]:
!add-apt-repository ppa:mscore-ubuntu/mscore-stable -y
!apt-get update
!apt-get install musescore


### 1.3 Modify the environment

In [0]:
!apt-get install xvfb
#
!sh -e /etc/init.d/x11-common start


### 1.4 Final touches...

In [0]:
import os
os.putenv('DISPLAY', ':99.0')
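
A small caveat on the cell above: according to the Python documentation, `os.putenv` changes the environment inherited by child processes but does not update `os.environ`, so Python code that reads `os.environ` will not see the new value. A minimal sketch of the alternative, assigning through `os.environ`, with the display number `:99.0` taken from the cell above:

```python
import os

# Assigning through os.environ updates both the mapping that Python
# code reads and the environment inherited by child processes.
os.environ['DISPLAY'] = ':99.0'

# Read the value back to confirm that it is visible from Python.
print(os.environ.get('DISPLAY'))
```

Either form should work for launching MuseScore, since that runs as a child process; the `os.environ` form is simply easier to inspect afterwards.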

### 1.5 ....and you are off.

In [0]:
from music21 import *
us = environment.UserSettings()
us['musescoreDirectPNGPath'] = '/usr/bin/mscore'
us['directoryScratch'] = '/tmp'

### 1.6 Test that everything works

In [0]:
opus133 = corpus.parse('beethoven/opus133.mxl') # we "parse" one specific work from the corpus
opus133.measures(1, 4).show() # Show first 4 bars

## 5 Corpus analysis
Up to this point we have extracted some interesting musical properties from single pieces of music. Now it is time to apply the analysis to a corpus, a collection of pieces.

### 5.1 Corpus and metadata
Let's work with the built-in corpus of *music21*. The system has a neat architecture for searching, combining and loading all pieces with certain metadata. The pieces themselves also typically contain different types of metadata.

In [0]:
# What other information besides the score do we have about a piece of music?
opus3no1 = corpus.parse('corelli/opus3no1/1grave') # Get one Corelli Sonata
print(opus3no1.metadata.all()) # Show metadata

partStream = opus3no1.parts.stream() # Show the list of instruments
for p in partStream:
    print(p.id)

opus3no1.measures(1,4).show()        # Plot first 4 bars

In [0]:
# OK, let's select a corpus of all works composed by Giovanni Palestrina
corpus1 = corpus.search(composer='palestrina')
print(corpus1)

# Let's select works within the corpus titled 'Kyrie'
corpus2 = corpus1.search(title='Kyrie')
print(corpus2)

# Even more specific: which of Palestrina's Kyries have the word "Papae" in their parent title?
corpus3 = corpus2.search(parentTitle='Papae')
print(corpus3)

s=corpus.parse(corpus3[0]) # is this the famous Missa Papae Marcelli?
s.measures(1,7).show()
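
The successive narrowing in the cell above can be pictured as repeated filtering of a table of metadata records. The sketch below is a plain-Python illustration, not music21's implementation: the `entries` list, its field values and the `narrow` helper are all hypothetical, and matching is assumed to be a case-insensitive substring test.

```python
def narrow(entries, **fields):
    """Keep the entries whose metadata contains every requested value."""
    kept = []
    for entry in entries:
        if all(str(value).lower() in str(entry.get(field, '')).lower()
               for field, value in fields.items()):
            kept.append(entry)
    return kept

# Hypothetical metadata records (not taken from the real corpus)
entries = [
    {'composer': 'Palestrina', 'title': 'Kyrie', 'parentTitle': 'Missa Papae Marcelli'},
    {'composer': 'Palestrina', 'title': 'Gloria', 'parentTitle': 'Missa Papae Marcelli'},
    {'composer': 'Palestrina', 'title': 'Kyrie', 'parentTitle': 'Missa Brevis'},
    {'composer': 'J.S. Bach', 'title': 'Kyrie', 'parentTitle': 'Mass in B minor'},
]

step1 = narrow(entries, composer='palestrina')   # all Palestrina records
step2 = narrow(step1, title='Kyrie')             # only the Kyries among them
step3 = narrow(step2, parentTitle='Papae')       # parent title contains "Papae"
print(len(step1), len(step2), len(step3))
```

Each step searches only the results of the previous one, which is why the order of the searches does not change the final selection.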


In [0]:
# You could display all full titles of Palestrina's Kyries:
for work in corpus2:
    score = corpus.parse(work)
    print(score.metadata.parentTitle)


### 5.2 Corpus analysis of keys

Were Bach chorales written in specific keys? Perhaps the keys with only one or two sharps and flats were regularly utilised since they are easier to perform and notate? Are there more chorales in major mode than in minor? Let's look at the key distribution across chorales.

In [0]:
chorales = corpus.search(composer='J.S. Bach',numberOfParts=4)
print(chorales)


In [0]:
from music21 import *
import matplotlib.pyplot as plt # Load some extra plotting libraries

chorales = corpus.search(composer='J.S. Bach',numberOfParts=4)

key_counts = {}   # tonic (upper case = major, lower case = minor) -> number of chorales
mode_counts = {}  # mode ('major' or 'minor') -> number of chorales
maxlen = len(chorales)
for counter, chorale in enumerate(chorales, start=1):
   print('Analysing', counter,'/',maxlen, chorale.metadata.title,'...')
   score = corpus.parse(chorale)
   k = score.analyze('key')   # analyse once; avoids shadowing the music21 `key` module
   tonic = k.tonicPitchNameWithCase
   key_counts[tonic] = key_counts.get(tonic, 0) + 1
   mode_counts[k.mode] = mode_counts.get(k.mode, 0) + 1


In [0]:
# Plot the results
ind = list(range(len(key_counts)))
fig, ax = plt.subplots()
ax.bar(ind, list(key_counts.values()))
ax.set_title('Frequency of Each Key')
ax.set_ylabel('Frequency')
plt.xticks(ind, list(key_counts.keys()), rotation='horizontal',size=12)
plt.show()

print(mode_counts) # print the frequency of major and minor keys
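
The tallying step in the analysis loop can also be delegated to `collections.Counter` from the standard library. A minimal sketch, using a short hypothetical list of key labels (upper case for major, lower case for minor) in place of the real analysis results:

```python
from collections import Counter

# Hypothetical key labels, standing in for the results of the key analysis
labels = ['G', 'a', 'G', 'D', 'g', 'a', 'G']

key_counts = Counter(labels)   # tally each tonic
mode_counts = Counter('major' if k.isupper() else 'minor' for k in labels)

print(key_counts.most_common())   # keys sorted from most to least frequent
print(dict(mode_counts))
```

Sorting the keys by frequency before plotting usually makes the bar chart easier to read than the order in which the keys happened to be encountered.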


<p style="border:3px; border-style:solid; border-color:#335EFF; padding: 1em;">
<span style="color:blue"><B>LEARNING TASK</B>: How do you interpret the key data of chorales? Note that within Bach chorales, there are also some modal chorales and sometimes the definition of the key is ambiguous.</span>
</p>

### Clarity of key (optional)
In the example above, we analysed the most likely key in each chorale. There are some chorales where the key is ambiguous. This can be explored through the correlation coefficient of the best-fitting key, which underlies the key analysis (music21 also offers a related `tonalCertainty()` measure).
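
To give an idea of where such a correlation coefficient comes from, here is a plain-Python sketch of profile-based key finding: the pitch-class distribution of a piece is correlated with a reference profile rotated to each of the 24 keys. The sketch assumes the Krumhansl-Kessler profiles; music21 may use a different set of weights by default, and the helper names and the toy distribution are hypothetical.

```python
import math

# Krumhansl-Kessler key profiles, starting from the tonic
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ['C', 'C#', 'D', 'E-', 'E', 'F', 'F#', 'G', 'A-', 'A', 'B-', 'B']

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_keys(histogram):
    """Return (correlation, tonic, mode) for all 24 keys, best first."""
    results = []
    for tonic in range(12):
        for mode, profile in (('major', MAJOR), ('minor', MINOR)):
            # Rotate the profile so that its first value sits on the tonic
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            results.append((pearson(histogram, rotated), NAMES[tonic], mode))
    return sorted(results, reverse=True)

# Toy pitch-class distribution: an idealised piece in G major
g_major_like = [MAJOR[(pc - 7) % 12] for pc in range(12)]

for coefficient, tonic, mode in rank_keys(g_major_like)[:3]:
    print(tonic, mode, round(coefficient, 3))
```

When the best coefficient is only slightly higher than the runner-up, or is low overall, the key of the piece is ambiguous in the sense discussed here.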

In [0]:
chorales = corpus.search(composer='J.S. Bach',numberOfParts=4)
print(chorales)

c=[]       # correlation coefficient of the best-fitting key for each chorale
title=[]   # chorale titles, in the same order
counter=1; maxlen = len(chorales)
for chorale in chorales:
   print('Analysing', counter,'/',maxlen, chorale.metadata.title,'...')
   score = corpus.parse(chorale)
   tc = score.analyze('key')
   c.append(tc.correlationCoefficient) # correlation with the best-fitting key
   title.append(score.metadata.title)
   counter +=1


In [0]:
# Let's see what the correlations look like

fig, ax = plt.subplots()
ax.plot(c, 'bo:',markersize=5,markerfacecolor='r')
fig.set_size_inches(20, 10)
ax.set_ylabel('Correlation coefficient',size=15)
ax.set_xlabel('Index in corpus',size=15)
plt.show()


In [0]:
# There are a few chorales where the correlation is lower than in the others, say under 0.85.
# Let's look at those chorales

ambiguous = [ n for n,i in enumerate(c) if i < 0.85 ] # get indices of tonally ambiguous chorales.
print('Ambiguous keys can be found in:',ambiguous)

# let's look at one of these
score = corpus.parse(chorales[ambiguous[3]])
tc = score.analyze('key')
print(score.metadata.title,':',tc.correlationCoefficient)
score.show()

# This is G minor Dorian

### 5.3 Corpus analysis of vocal range

Are the basses expected to sing over a larger range than tenors? Has the vocal range tended to be the same for SATB works over the centuries? Of course, we do not always know what pitch the score was originally mapped onto, but at least the vocal ranges should be comparable between the soprano, alto, tenor and bass voices.

Let's explore the vocal ranges.

In [0]:
import statistics
# Start with Bach chorales
chorales = corpus.search(composer='J.S. Bach',numberOfParts=4)

soprano_range = []
alto_range = []
tenor_range = []
bass_range = []
for chorale in chorales:                                          # Loop across chorales
    s = corpus.parse(chorale)
    for el in s.recurse().parts:                                  # Loop across the parts
        name = el.partName or ''                                  # partName may be missing (None)
        semitones = el.analyze('range').semitones                 # Range of this part in semitones
        if 'Soprano' in name:
            soprano_range.append(semitones)
        if 'Alto' in name:
            alto_range.append(semitones)
        if 'Tenor' in name:
            tenor_range.append(semitones)
        if 'Bass' in name:
            bass_range.append(semitones)
# Summarise the results
print('Soprano', round(statistics.mean(soprano_range),2))
print('Alto', round(statistics.mean(alto_range),2))
print('Tenor', round(statistics.mean(tenor_range),2))
print('Bass', round(statistics.mean(bass_range),2))
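
The range of a part is the distance in semitones between its lowest and its highest note. A plain-Python sketch of that calculation, using MIDI note numbers (60 = middle C) and hypothetical pitches rather than a real chorale:

```python
import statistics

def semitone_range(midi_pitches):
    """Distance in semitones between the lowest and the highest pitch."""
    return max(midi_pitches) - min(midi_pitches)

# Hypothetical soprano lines from three pieces, as MIDI note numbers
soprano_lines = [
    [67, 69, 71, 72, 74, 72, 71],   # G4 up to D5
    [65, 67, 69, 77, 76, 74],       # F4 up to F5
    [64, 66, 67, 69, 71, 73],       # E4 up to C#5
]

ranges = [semitone_range(line) for line in soprano_lines]
print(ranges, round(statistics.mean(ranges), 2))
```

Because the mean is sensitive to a few unusually wide parts, it can be worth looking at the median or the full list of ranges as well.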

<p style="border:3px; border-style:solid; border-color:#335EFF; padding: 1em;">
<span style="color:blue"><B>LEARNING TASK</B>: Which part had the largest range and why? Is this specific to the Bach chorales, or would similar results be evident in another polyphonic vocal corpus? You could try Monteverdi: replace 'J.S. Bach' with 'Monteverdi', change 'numberOfParts' to 6, and replace 'Soprano' with 'Canto' and 'Bass' with 'Basso'.
</span>
</p>

## 6 Corpus search

Sometimes the more useful approach is not to summarise an entire collection of music in terms of a specific feature but to search for a musical excerpt. Let's search the corpus for a theme that we have in mind. First we will select a suitable corpus of music and then search for a theme with or without the rhythms.

In [0]:
from music21 import *
## Select all Bach Chorales
#chorales = corpus.search('bach', fileExtensions='xml')
chorales = corpus.search(composer='J.S. Bach')
#print(chorales)            # shows how many pieces there are in the corpus
chorales2 = corpus.search('bwv364') # Let's get the Dorian chorale again
bwv364=corpus.parse(chorales2[0]) # parse the first hit
bwv364.measures(0, 4).show() # display the notation

### 6.1 Theme search (pitch only)

In [0]:
# define a theme to search. Let's first try a simple theme (G F E) without considering the rhythm.
searchList = [note.Note('G'), note.Note('F'), note.Note('E')] # define a search pattern G-F-E
s = bwv364.recurse().notes                 # prepares the piece for the search
p = search.noteNameSearch(s, searchList) # executes the search

# show where the exact matches were
for notePosition in p:
    startingNote = s[notePosition]
    startingMeasure = startingNote.measureNumber
    startingBeat = startingNote.beat
    startingPart = startingNote.getContextByClass('Part')
    print(startingNote, startingMeasure, startingBeat, startingPart)
# The report below shows the voice, the bar and the beat in which the theme occurs:
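
Conceptually, a pitch-only search slides the pattern along the sequence of note names and reports every position where all the names agree. The following is a plain-Python sketch of that idea, not music21's own code; `find_name_matches` and the melody are hypothetical.

```python
def find_name_matches(names, pattern):
    """Return every index at which `pattern` starts inside `names`."""
    width = len(pattern)
    return [start for start in range(len(names) - width + 1)
            if names[start:start + width] == pattern]

# Hypothetical melody as note names (octaves ignored)
melody = ['C', 'G', 'F', 'E', 'D', 'G', 'F', 'E', 'C']
print(find_name_matches(melody, ['G', 'F', 'E']))
```

The returned indices play the same role as the note positions in the loop above: each one points to the first note of a match.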

### 6.2 Theme search (pitch and rhythm)
The search above was simple and unrealistic. Let's search for a real theme with note durations.
Let's find the theme from "Vom Himmel hoch, da komm ich her" ("From Heaven Above to Earth I Come"), which was supposedly composed by Luther in 1539. The theme was used in Bach's Christmas Oratorio, but which chorale does it come from? Here we want to preserve the rhythm as well.

In [0]:
# Define the theme ("From Heaven Above to Earth I Come")
theme = converter.parse("tinynotation: 4/4 g4_From f#_hea- e_ven f#_above d_to e_earth f#_I g_come")
theme.show()

In [0]:
# Here we want to preserve the approximate rhythm, but I will make all notes equally long (crotchets).
searchStream2 = stream.Stream([key.KeySignature(1),
                               note.Note('G4', type='quarter'),
                               note.Note('F#4', type='quarter'),
                               note.Note('E4', type='quarter'),
                               note.Note('F#4', type='quarter'),
                               note.Note('D4', type='quarter'),
                               note.Note('E4', type='quarter'),
                               note.Note('F#4', type='quarter'),
                               note.Note('G4', type='quarter')])

import time

target2=[]                      # number of matches per chorale
t = time.time()
for i in range(100): # loop through the first 100 chorales ...
    tmp = chorales[i].parse()
    s = tmp.recurse().notes
    target1=[]                  # number of matches per transposition
    for semitones in range(12): # loop through all 12 transpositions (0 to 11 semitones up)
        s2=searchStream2.transpose(semitones)
        entryPoints = search.noteNameRhythmicSearch(s, s2.notes)
        target1.append(len(entryPoints))
    target2.append(sum(target1))
    #print(i,target2[i])
elapsed = time.time() - t
print('Done! This search took',round(elapsed,1),'seconds')
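
The logic of the loop above (try the theme at each of the 12 transpositions and look for an exact match of pitch and duration) can be sketched in plain Python. This is an illustration under simplifying assumptions, not music21's implementation: notes are hypothetical `(MIDI pitch, quarter length)` tuples, and pitches are compared as pitch classes, so enharmonic spellings are treated as equal, unlike a search on note names.

```python
def find_transposed(melody, theme):
    """Search `melody` for `theme` at each of the 12 transpositions.

    Both arguments are lists of (midi_pitch, quarter_length) tuples.
    Returns {semitones_up: [start indices]} for transpositions with hits.
    """
    width = len(theme)
    # Reduce the melody to (pitch class, duration) pairs
    reduced = [(p % 12, d) for p, d in melody]
    hits = {}
    for shift in range(12):
        target = [((p + shift) % 12, d) for p, d in theme]
        starts = [i for i in range(len(reduced) - width + 1)
                  if reduced[i:i + width] == target]
        if starts:
            hits[shift] = starts
    return hits

# The search theme in G major, all crotchets (quarter length 1.0)
theme = [(p, 1.0) for p in (67, 66, 64, 66, 62, 64, 66, 67)]

# Hypothetical melody: a minim, then the theme a whole tone higher
melody = [(60, 2.0)] + [(p + 2, 1.0) for p, _ in theme]
print(find_transposed(melody, theme))
```

An alternative design would compare successive intervals instead of pitches, which removes the need for the loop over transpositions; that is a different technique from the one used in this notebook.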

In [0]:
# Display results
hits=[i for i, x in enumerate(target2) if x]
print("These works contain the theme:",hits)

catalog = stream.Opus()

for i in range(0,len(hits)):
    tmp=chorales[hits[i]].parse()
    incipit = tmp.measures(0,3)
    catalog.insert(0, incipit.implode())
catalog.show() # Display the works that contain the theme

### 6.3 Search for the same theme in other collections

In [0]:
# Was the theme from "Vom Himmel hoch, da komm ich her" used earlier?

palestrina = corpus.search('palestrina')
print(palestrina)

# let's allow some rhythmic variations and remove the note durations from the search
searchStream3 = stream.Stream([key.KeySignature(1),
                               note.Note('G4'),
                               note.Note('F#4'),
                               note.Note('E4'),
                               note.Note('F#4'),
                               note.Note('D4'),
                               note.Note('E4'),
                               note.Note('F#4'),
                               note.Note('G4')])
import time

target2=[]                      # number of matches per work
t = time.time()
for i in range(100):            # loop through the first 100 works
    tmp = palestrina[i].parse()
    s = tmp.recurse().notes
    target1=[]                  # number of matches per transposition
    for semitones in range(12): # unison to major seventh
        s2=searchStream3.transpose(semitones)
        entryPoints = search.noteNameSearch(s, s2.notes)
        target1.append(len(entryPoints))
    target2.append(sum(target1))
#    print(i,target2[i])
elapsed = time.time() - t       # time this search (not the previous one)
print('Done! This search took',round(elapsed,1),'seconds')

hits=[i for i, x in enumerate(target2) if x]
print("These works contain the theme:",hits)


In [0]:
# display one example
print(hits)
tmp=palestrina[hits[3]].parse()        # We are looking at the fourth example in the list above
s = tmp.recurse().notes                 # prepares the piece for the search
target1=[]                              # number of matches per transposition
for semitones in range(12): # unison to major seventh
    s2=searchStream3.transpose(semitones)
    entryPoints = search.noteNameSearch(s, s2.notes)
    target1.append(len(entryPoints))
tr=[i for i, x in enumerate(target1) if x] # transpositions that produced matches

# show where the exact matches were
s2=searchStream3.transpose(tr[0])
p = search.noteNameSearch(s, s2.notes) # executes the search
for notePosition in p:
    startingNote = s[notePosition]
    startingMeasure = startingNote.measureNumber
    startingBeat = startingNote.beat
    startingPart = startingNote.getContextByClass('Part')
    print(startingNote, startingMeasure, startingBeat, startingPart)
tmp.measures(10, 12).show()


The soprano part from Palestrina's Agnus shows a rhythmic variation of the same pattern as our search theme. Is this a coincidence or an actual precursor of the theme? We could calculate how common this pattern was in earlier music, or how likely it is to arise when the notes are scrambled randomly, but these are advanced topics.
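
One simple way to approach the scrambling idea is a shuffle test: count the matches in the real note sequence, then shuffle the same notes many times and see how often the shuffled versions contain at least as many matches. The sketch below is an illustration under strong simplifying assumptions (note names only, no rhythm, every ordering treated as equally likely); the melody, the pattern and the helper names are hypothetical.

```python
import random

def count_matches(names, pattern):
    """Number of positions at which `pattern` occurs in `names`."""
    width = len(pattern)
    return sum(1 for i in range(len(names) - width + 1)
               if names[i:i + width] == pattern)

def shuffle_test(names, pattern, trials=1000, seed=0):
    """Return the observed match count and the proportion of shuffled
    sequences that contain at least as many matches."""
    rng = random.Random(seed)      # fixed seed for repeatable results
    observed = count_matches(names, pattern)
    pool = list(names)             # copy, so the input stays untouched
    at_least = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if count_matches(pool, pattern) >= observed:
            at_least += 1
    return observed, at_least / trials

# Hypothetical melody and search pattern
melody = ['G', 'F#', 'E', 'F#', 'D', 'E', 'F#', 'G', 'A', 'B', 'A', 'G']
pattern = ['G', 'F#', 'E', 'F#']
observed, proportion = shuffle_test(melody, pattern)
print(observed, proportion)
```

A small proportion suggests that the pattern is unlikely to arise from these notes by chance alone. Real melodies are far from random orderings, however, so a more convincing baseline would come from other music of the same period.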

## References

* Cuthbert, M. S., & Ariza, C. (2010). music21: A Toolkit for Computer-Aided Musicology and Symbolic Music Data. In J. Stephen Downie and Remco C. Veltkamp (Eds.). 11th International Society for Music Information Retrieval Conference (ISMIR 2010), August 9-13, 2010, Utrecht, Netherlands. pp. 637-642. [link](http://ismir2010.ismir.net/proceedings/ismir2010-108.pdf)

* Cuthbert, M. S., Ariza, C., & Friedland, L. (2011). Feature Extraction and Machine Learning on Symbolic Music using the music21 Toolkit. In 12th International Society for Music Information Retrieval Conference (ISMIR 2011). pp. 387-392. [link](http://ismir2011.ismir.net/papers/PS3-6.pdf)
