### Harmonic distance calculation

In [3]:
import pandas as pd
from sklearn.metrics import pairwise_distances

#### Spotify mapping of key index to key

In [4]:
note_table = [ 'C', 'C#/Db', 'D', 'D#/Eb', 'E', 'F', 'F#/Gb', 'G', 'G#/Ab', 'A', 'A#/Bb', 'B' ]

#### Circle of fifths
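On the circle of fifths, adjacent keys are harmonically close. Stepping through the key indices in fifths (7 semitones, modulo 12) visits every key exactly once, and because it is a circle, the last key, 'F', is again next to 'C'.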

In [5]:
fifth_table = [ 7 * i % 12 for i in range(12) ]
[(i, note_table[i]) for i in fifth_table]

[(0, 'C'),
 (7, 'G'),
 (2, 'D'),
 (9, 'A'),
 (4, 'E'),
 (11, 'B'),
 (6, 'F#/Gb'),
 (1, 'C#/Db'),
 (8, 'G#/Ab'),
 (3, 'D#/Eb'),
 (10, 'A#/Bb'),
 (5, 'F')]

#### Mapping of distance between keys to harmonic distance
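That is, if the difference in key index is 7, e.g. 'C' -> 'G', the harmonic distance is only 1, because 'G' is just one fifth above 'C'. Likewise, going from 'D' to 'C' is a difference of -2 in key index (10 modulo 12), which also happens to be two fifths downwards, so the harmonic distance is -2. The values are folded into the range -5 to 6, so negative values mean fifths downwards.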

In [6]:
inv_fifth_table = [ (fifth_table.index(i)+5)%12 - 5 for i in range(12) ]
inv_fifth_table

[0, -5, 2, -3, 4, -1, 6, 1, -4, 3, -2, 5]
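
As a quick sanity check of the two examples above, the following sketch rebuilds both tables and looks up the offsets for 'C' -> 'G' and 'D' -> 'C'. It only restates the definitions from the cells above; the variable names `c_to_g` and `d_to_c` are introduced here for illustration.

```python
# Rebuild the tables from the cells above so this check runs on its own.
fifth_table = [7 * i % 12 for i in range(12)]
inv_fifth_table = [(fifth_table.index(i) + 5) % 12 - 5 for i in range(12)]

# 'C' (0) -> 'G' (7): key-index offset 7, one fifth up.
c_to_g = inv_fifth_table[(7 - 0) % 12]
# 'D' (2) -> 'C' (0): key-index offset -2, i.e. 10 modulo 12, two fifths down.
d_to_c = inv_fifth_table[(0 - 2) % 12]

print(c_to_g, d_to_c)  # 1 -2
```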

#### Distance function for comparing keys, and a distance table between all keys

In [7]:
def key_distance(key1: int, key2: int) -> int:
  # Signed number of fifths from key1 to key2, in the range -5..6.
  return inv_fifth_table[(key2 - key1) % 12]
pd.DataFrame(
  [ [ key_distance(i, j) for j in range(12) ] for i in range(12) ],
  index = note_table,
  columns = note_table)

,C,C#/Db,D,D#/Eb,E,F,F#/Gb,G,G#/Ab,A,A#/Bb,B
C,0,-5,2,-3,4,-1,6,1,-4,3,-2,5
C#/Db,5,0,-5,2,-3,4,-1,6,1,-4,3,-2
D,-2,5,0,-5,2,-3,4,-1,6,1,-4,3
D#/Eb,3,-2,5,0,-5,2,-3,4,-1,6,1,-4
E,-4,3,-2,5,0,-5,2,-3,4,-1,6,1
F,1,-4,3,-2,5,0,-5,2,-3,4,-1,6
F#/Gb,6,1,-4,3,-2,5,0,-5,2,-3,4,-1
G,-1,6,1,-4,3,-2,5,0,-5,2,-3,4
G#/Ab,4,-1,6,1,-4,3,-2,5,0,-5,2,-3
A,-3,4,-1,6,1,-4,3,-2,5,0,-5,2
A#/Bb,2,-3,4,-1,6,1,-4,3,-2,5,0,-5
B,-5,2,-3,4,-1,6,1,-4,3,-2,5,0
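
The table is signed: swapping the two keys flips the sign, except on the diagonal (0) and for the tritone (6 in both directions). The sketch below checks this, under the assumption that only the magnitude matters once the value is squared inside a distance metric. The helper names `pairs`, `magnitudes_symmetric` and `same_sign` are introduced here for illustration.

```python
# Rebuild the tables and the distance function so this check runs on its own.
fifth_table = [7 * i % 12 for i in range(12)]
inv_fifth_table = [(fifth_table.index(i) + 5) % 12 - 5 for i in range(12)]

def key_distance(key1: int, key2: int) -> int:
    return inv_fifth_table[(key2 - key1) % 12]

pairs = [(i, j) for i in range(12) for j in range(12)]

# Magnitudes agree in both directions, so squared distances are symmetric.
magnitudes_symmetric = all(
    abs(key_distance(i, j)) == abs(key_distance(j, i)) for i, j in pairs
)

# Values whose sign does not flip when the keys are swapped.
same_sign = sorted({key_distance(i, j) for i, j in pairs
                    if key_distance(i, j) == key_distance(j, i)})

print(magnitudes_symmetric, same_sign)  # True [0, 6]
```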


#### Metric that calculates Euclidean distance but is aware of the cyclic key distance

In [8]:
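def key_aware_metric(X, Y, key_index=None):
  # Without a key column, fall back to the plain Euclidean distance.
  if key_index is None:
    return (sum((X - Y)**2))**.5

  # Ordinary columns contribute their squared difference as usual.
  ordinary_columns = [ i for i in range(len(X)) if i != key_index ]
  sum2 = sum((X[ordinary_columns] - Y[ordinary_columns])**2)
  # The key column contributes the squared circle-of-fifths distance instead.
  sum2 += key_distance(int(X[key_index]), int(Y[key_index]))**2
  return sum2**.5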
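
To illustrate how the key column interacts with ordinary columns, here is a minimal sketch that calls the metric directly on two NumPy vectors. The first component is a hypothetical numeric feature invented for this example (it is not taken from the data above); the second is the key index. The definitions are restated so the block runs on its own.

```python
import numpy as np

fifth_table = [7 * i % 12 for i in range(12)]
inv_fifth_table = [(fifth_table.index(i) + 5) % 12 - 5 for i in range(12)]

def key_distance(key1: int, key2: int) -> int:
    return inv_fifth_table[(key2 - key1) % 12]

def key_aware_metric(X, Y, key_index=None):
    if key_index is None:
        return (sum((X - Y) ** 2)) ** .5
    ordinary_columns = [i for i in range(len(X)) if i != key_index]
    sum2 = sum((X[ordinary_columns] - Y[ordinary_columns]) ** 2)
    sum2 += key_distance(int(X[key_index]), int(Y[key_index])) ** 2
    return sum2 ** .5

# Hypothetical vectors: [some_feature, key]. Keys are 'C' (0) and 'A' (9).
song_a = np.array([4.0, 0.0])
song_b = np.array([0.0, 9.0])

# Plain Euclidean: the key difference counts as 9 semitone steps.
plain = key_aware_metric(song_a, song_b)
# Key-aware: 'C' -> 'A' is 3 fifths, so the result is sqrt(4**2 + 3**2).
aware = key_aware_metric(song_a, song_b, key_index=1)

print(plain, aware)
```

With these inputs the key-aware distance comes out as 5.0, while the plain distance is dominated by the raw key-index difference.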

#### And a variant of `pairwise_distances` that recognizes the `key` column and applies the `key_aware_metric`:

In [9]:
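def key_aware_pairwise_distances(df):
  # Locate the 'key' column; without one, the metric is plain Euclidean.
  try:
    key_index = df.columns.values.tolist().index('key')
  except ValueError:
    key_index = None
  # Extra keyword arguments are forwarded to the callable metric.
  return pairwise_distances(df, metric=key_aware_metric, key_index=key_index)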

#### Example:

In [22]:
df = (
  pd.DataFrame([
      [ f'Song in {key}', note_table.index(key) ] 
      for key in ['C', 'C#/Db', 'B', 'G']
    ],
    columns=['song', 'key'])
  .set_index('song')
)
df

song,key
Song in C,0
Song in C#/Db,1
Song in B,11
Song in G,7


Notice how, with the plain Euclidean metric, 'C' and 'C#' are considered close while 'G' is considered far away. Also, 'B' and 'C' are not considered close, because the key index does not wrap around at the end of the scale:

In [23]:
pd.DataFrame(
  pairwise_distances(df),
  index=df.index,
  columns=df.index)  

song,Song in C,Song in C#/Db,Song in B,Song in G
Song in C,0.0,1.0,11.0,7.0
Song in C#/Db,1.0,0.0,10.0,6.0
Song in B,11.0,10.0,0.0,4.0
Song in G,7.0,6.0,4.0,0.0


In contrast, 'C' and 'C#' are now considered harmonically far apart, while 'C' and 'G' are close. 'C' and 'B' are now just as far apart as 'C' and 'C#', since both pairs are a single half-tone apart, i.e. scale wrapping works:

In [24]:
pd.DataFrame(
  key_aware_pairwise_distances(df),
  index=df.index,
  columns=df.index)  

song,Song in C,Song in C#/Db,Song in B,Song in G
Song in C,0.0,5.0,5.0,1.0
Song in C#/Db,5.0,0.0,2.0,6.0
Song in B,5.0,2.0,0.0,4.0
Song in G,1.0,6.0,4.0,0.0
