<a href="https://colab.research.google.com/github/tuomaseerola/music_and_science_seminar/blob/master/structure_discovery.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Structure discovery

Examples of structural analysis.

_Tuomas Eerola, 2/3/2023_


In [None]:
# pip install librosa soundfile matplotlib scikit-learn
import io
import warnings
from urllib.request import urlopen  # standard library; no need for six

import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
import sklearn.preprocessing  # used later to standardise the MFCCs
import soundfile as sf
import IPython.display as ipd

%matplotlib inline
warnings.filterwarnings('ignore')

### Select the piece and downsample for easier processing


In [None]:
url = "https://raw.githubusercontent.com/tuomaseerola/music_and_science_seminar/master/Vivaldi.wav"
#url = "https://raw.githubusercontent.com/tuomaseerola/music_and_science_seminar/master/medtner_op_8_no_1_shatskes.wav"
# from : https://www.medtner.org.uk/mp3/Medtner%20-%20Op%208%20no%201%20-%20Shatskes.mp3
y, sr = sf.read(io.BytesIO(urlopen(url).read()))
# Downsample to 22050 Hz to keep processing fast if the file is long
target_sr = 22050
y = librosa.resample(y, orig_sr=sr, target_sr=target_sr)
sr = target_sr
print(sr)
plt.figure(figsize=(18, 4))
librosa.display.waveshow(y, sr=sr)  # sr must be passed by keyword in recent librosa
ipd.display(ipd.Audio(data=y, rate=sr))

## Look at the spectrum first

In [None]:
hop_length = 1024
D = librosa.amplitude_to_db(np.abs(librosa.stft(y, hop_length=hop_length)),ref=np.max)
plt.rcParams['figure.figsize'] = (18,4)
librosa.display.specshow(D, y_axis='log', sr=sr, hop_length=hop_length,x_axis='time',cmap='jet')
plt.show()

## Chromagram representation
Let's collapse the spectrum into a chromagram: the energy in all frequency bins that belong to the same pitch class (C, C#, D, ...) is summed across octaves, leaving 12 values per frame. This can be done in many ways; here the chromagram is computed from a Constant-Q Transform (CQT), which is one of the most common choices.


In [None]:
C = librosa.feature.chroma_cqt(y=y, sr=sr, n_chroma=12)
plt.figure(figsize=(18, 4))
librosa.display.specshow(C, y_axis='chroma',x_axis='time')
plt.title('CQT Chromagram')
plt.show()
ipd.display(ipd.Audio(data=y, rate=sr))
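
To make the octave folding concrete, here is a minimal sketch on toy data. It assumes semitone-spaced bins starting on C and a simple max-normalisation per frame; the array `cqt_mag` and its values are hypothetical, and `librosa.feature.chroma_cqt` does more than this (tuning estimation, finer bin resolution, and so on).

```python
import numpy as np

# Hypothetical toy input: magnitudes for 36 semitone-spaced bins
# (3 octaves starting on C) over 4 frames.
n_octaves, n_frames = 3, 4
cqt_mag = np.zeros((12 * n_octaves, n_frames))
cqt_mag[0, :] = 1.0    # C in the lowest octave
cqt_mag[12, :] = 0.5   # C one octave higher
cqt_mag[31, :] = 2.0   # G in the top octave (31 % 12 == 7)

# Fold the octaves: bin b contributes to pitch class b % 12.
chroma_toy = cqt_mag.reshape(n_octaves, 12, n_frames).sum(axis=0)

# Normalise each frame by its maximum so that frames are comparable
# (an assumption of this sketch; other normalisations are possible).
chroma_toy = chroma_toy / chroma_toy.max(axis=0, keepdims=True)

print(chroma_toy[:, 0].round(2))
```

Both C bins end up in the same row of `chroma_toy`, which is why a chromagram cannot tell octaves apart.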

### Simple structure segmentation -- based on pitch-classes
Here we segment the piece with an agglomerative clustering algorithm that is constrained to produce `k` temporally contiguous segments. Let's use pitch-class (chroma) information first and pick an arbitrary number of segments (8?).

In [None]:
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
bounds = librosa.segment.agglomerative(chroma, 8)
bound_times = librosa.frames_to_time(bounds, sr=sr)
print(bound_times)
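
The idea behind temporally constrained clustering can be sketched in a few lines. This is a simplified illustration, not librosa's implementation (which, as far as I understand, delegates to scikit-learn's agglomerative clustering with a temporal connectivity constraint). The function `contiguous_segments` and the toy data are hypothetical.

```python
import numpy as np

def contiguous_segments(features, k):
    """Merge neighbouring frames bottom-up until k segments remain.

    features: array of shape (n_features, n_frames).
    Returns the start frame of each segment (the first is always 0).
    """
    X = features.T
    # Each segment is stored as (start_frame, n_frames, mean_vector).
    segs = [(i, 1, X[i].astype(float)) for i in range(len(X))]
    while len(segs) > k:
        # Ward-style cost of merging each pair of neighbouring segments.
        costs = [
            (a[1] * b[1]) / (a[1] + b[1]) * np.sum((a[2] - b[2]) ** 2)
            for a, b in zip(segs[:-1], segs[1:])
        ]
        j = int(np.argmin(costs))
        a, b = segs[j], segs[j + 1]
        n = a[1] + b[1]
        # Replace the two neighbours with their merged segment.
        segs[j:j + 2] = [(a[0], n, (a[1] * a[2] + b[1] * b[2]) / n)]
    return np.array([s[0] for s in segs])

# Toy feature matrix: three blocks of frames with distinct values.
toy = np.concatenate(
    [np.zeros((2, 5)), np.ones((2, 4)), 3 * np.ones((2, 6))], axis=1
)
toy_bounds = contiguous_segments(toy, 3)
print(toy_bounds)
```

Because only neighbouring segments may merge, every cluster is a contiguous stretch of time. This loop is quadratic in the number of frames, so it is only suitable for illustration.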

Let's print the chromagram and the potential boundaries determined by the clustering.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.transforms as mpt
fig, ax = plt.subplots()
trans = mpt.blended_transform_factory(
    ax.transData, ax.transAxes)
librosa.display.specshow(chroma, y_axis='chroma', x_axis='time', ax=ax)
ax.vlines(bound_times, 0, 1, color='lime', linestyle='--',
          linewidth=2, alpha=0.9, label='Segment boundaries',
          transform=trans)
ax.legend()
ax.set(title='Chromagram')
ipd.display(ipd.Audio(data=y, rate=sr))
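
The boundary positions are found as frame indices and only then converted to seconds. A minimal sketch of that conversion, assuming the default hop length of 512 samples that the feature extractors use when none is given (the helper `frames_to_seconds` and the example frames are hypothetical):

```python
def frames_to_seconds(frames, sr=22050, hop_length=512):
    """Convert frame indices to seconds: t = frame * hop_length / sr."""
    return [f * hop_length / sr for f in frames]

example_frames = [0, 431, 862]  # hypothetical boundary frames
print(frames_to_seconds(example_frames))

# Features computed with a different hop (e.g. 1024 samples) need the
# same hop passed to the conversion, or the boundaries land in the wrong place.
print(frames_to_seconds(example_frames, hop_length=1024))
```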

## What about other representations -- MFCCs?
_Mel-frequency cepstral coefficients_ (MFCCs) are a popular representation of the audio signal that incorporates some perceptual processing, namely that frequency is represented on the mel scale. They are used in many applications, such as speech recognition and genre recognition.

In [None]:
mfccs = librosa.feature.mfcc(y=y, sr=sr)  # y must be passed by keyword in recent librosa
mfccs = sklearn.preprocessing.scale(mfccs, axis=1)

bounds = librosa.segment.agglomerative(mfccs, 8)
bound_times = librosa.frames_to_time(bounds, sr=sr)

fig, ax = plt.subplots()
trans = mpt.blended_transform_factory(ax.transData, ax.transAxes)
# Rows are MFCC coefficient indices, not mel-frequency bins, so no mel axis here
librosa.display.specshow(mfccs, x_axis='time', ax=ax)
ax.vlines(bound_times, 0, 1, color='lime', linestyle='--',
          linewidth=2, alpha=0.9, label='Segment boundaries',
          transform=trans)
ax.legend()
ax.set(title='MFCCs')
ipd.display(ipd.Audio(data=y, rate=sr))

How does this correspond to your intuition about the segments?
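
One detail in the cell above is the standardisation step before clustering. The sketch below shows on made-up numbers what scaling along `axis=1` does: each coefficient (row) gets zero mean and unit variance over time. The matrix `fake_mfcc` is a hypothetical stand-in, not real MFCC output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for an MFCC matrix: 3 coefficients x 200 frames,
# with very different offsets and spreads per coefficient.
fake_mfcc = rng.normal(loc=[[-300.0], [80.0], [5.0]],
                       scale=[[60.0], [20.0], [2.0]],
                       size=(3, 200))

# Standardise each row (coefficient) across time.
mean = fake_mfcc.mean(axis=1, keepdims=True)
std = fake_mfcc.std(axis=1, keepdims=True)
scaled = (fake_mfcc - mean) / std

print(scaled.mean(axis=1).round(6), scaled.std(axis=1).round(6))
```

Without this step, the coefficient with the largest numerical range would dominate the distances that the clustering relies on.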

## Another representation to calculate the segmentation from? -- Tonnetz
Tonnetz (German for "tone network") is a way to represent tonal centroids in six dimensions, popularised by neo-Riemannian music theorists.

In [None]:
tonnez = librosa.feature.tonnetz(y=y, sr=sr)  # y must be passed by keyword in recent librosa

bounds = librosa.segment.agglomerative(tonnez, 8)
bound_times = librosa.frames_to_time(bounds, sr=sr)

fig, ax = plt.subplots()
trans = mpt.blended_transform_factory(ax.transData, ax.transAxes)
librosa.display.specshow(tonnez, y_axis='tonnetz', x_axis='time', ax=ax,cmap='Accent')
ax.vlines(bound_times, 0, 1, color='lime', linestyle='--',
          linewidth=3, alpha=0.9, label='Segment boundaries',
          transform=trans)
ax.legend()
ax.set(title='Tonal centroids (Tonnetz)')
ipd.display(ipd.Audio(data=y, rate=sr))

Happy with the results?
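
For reference, the six tonal-centroid dimensions can be approximated as a fixed linear projection of the chromagram onto three circles (fifths, minor thirds, major thirds), each contributing a sine and a cosine coordinate. The sketch below follows the tonal centroid transform as it is usually described, with radii 1, 1 and 0.5. The function `tonal_centroid` is hypothetical, and librosa's own implementation may differ in detail, such as the ordering or phase of the dimensions.

```python
import numpy as np

def tonal_centroid(chroma):
    """Project 12-bin chroma (12, n_frames) onto 6 tonal-centroid dimensions."""
    chroma = np.asarray(chroma, dtype=float)
    # L1-normalise each frame so the result does not depend on loudness.
    total = chroma.sum(axis=0, keepdims=True)
    chroma = chroma / np.where(total == 0, 1.0, total)

    pc = np.arange(12)
    # Angular step per semitone on each of the three interval circles.
    fifths, minor3, major3 = 7 * np.pi / 6, 3 * np.pi / 2, 2 * np.pi / 3
    basis = np.vstack([
        np.sin(pc * fifths), np.cos(pc * fifths),
        np.sin(pc * minor3), np.cos(pc * minor3),
        0.5 * np.sin(pc * major3), 0.5 * np.cos(pc * major3),
    ])
    return basis @ chroma

# A C major triad (C, E, G) and a completely flat chroma vector.
c_major = np.zeros((12, 1))
c_major[[0, 4, 7]] = 1.0
flat = np.ones((12, 1))

print(tonal_centroid(c_major).ravel().round(3))
print(tonal_centroid(flat).ravel().round(3))
```

A flat chroma vector has no tonal direction, so its centroid sits at the origin, whereas a triad is pulled towards one region of the space.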

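## One more segmentation based on ... rhythm?
This time we analyse how rhythmic periodicities are distributed over time, using so-called tempograms. A tempogram shows, for each moment in the piece, how strongly the onset pattern repeats at each tempo.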

In [None]:
tempogram = librosa.feature.tempogram(y=y, sr=sr)

bounds = librosa.segment.agglomerative(tempogram, 8)
bound_times = librosa.frames_to_time(bounds, sr=sr)

fig, ax = plt.subplots()
trans = mpt.blended_transform_factory(ax.transData, ax.transAxes)
librosa.display.specshow(tempogram, y_axis='tempo', x_axis='time', ax=ax)
ax.vlines(bound_times, 0, 1, color='lime', linestyle='--',
          linewidth=3, alpha=0.9, label='Segment boundaries',
          transform=trans)
ax.legend()
ax.set(title='Tempogram')
ipd.display(ipd.Audio(data=y, rate=sr))
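
The principle behind a tempogram is autocorrelation of an onset strength envelope: if onsets recur every `n` frames, the envelope correlates strongly with a copy of itself shifted by `n` frames. A minimal sketch on an idealised, hypothetical onset envelope (a real tempogram repeats this in short windows along the piece):

```python
import numpy as np

sr, hop_length = 22050, 512
frames_per_beat = 20                    # hypothetical pulse spacing
onset_env = np.zeros(400)
onset_env[::frames_per_beat] = 1.0      # one idealised onset per beat

# Autocorrelation for non-negative lags (index 0 is lag 0).
ac = np.correlate(onset_env, onset_env, mode='full')[len(onset_env) - 1:]

# Strongest lag, skipping lag 0 where any signal matches itself.
best_lag = int(np.argmax(ac[1:])) + 1

# Convert the lag in frames to beats per minute.
bpm = 60.0 * sr / (hop_length * best_lag)
print(best_lag, round(bpm, 1))
```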

There are also more direct techniques, such as recurrence matrices (see [https://librosa.org/doc/latest/segment.html](https://librosa.org/doc/latest/segment.html) for a list of techniques and examples).
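
As a rough illustration of the recurrence idea, the sketch below builds a cosine self-similarity matrix from a toy feature sequence with an A-B-A layout. The data are hypothetical, and `librosa.segment.recurrence_matrix` offers a more complete version (nearest-neighbour linking, affinity weighting and so on).

```python
import numpy as np

# Hypothetical feature sequence with an A-B-A layout: 2 features x 12 frames.
A = np.array([[1.0], [0.0]]) * np.ones((1, 4))
B = np.array([[0.0], [1.0]]) * np.ones((1, 4))
feats = np.concatenate([A, B, A], axis=1)

# Cosine similarity between every pair of frames.
unit = feats / np.linalg.norm(feats, axis=0, keepdims=True)
ssm = unit.T @ unit

print(ssm.round(1))
```

Repeated sections show up as bright off-diagonal blocks: here the first and last four frames are similar to each other but not to the middle four.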