![](..\MBIT_logo.png)

# Deej-A.I.

[Robert Dargavel Smith](mailto:teticio@gmail.com) - Advanced Machine Learning end of Masters project ([MBIT School](http://mbitschool.academy), Madrid, Spain)

### Motivation

There are a number of automatic DJ tools around, which cleverly match the tempo of one song with another and mix the beats. To be honest, I have always found that kind of DJ rather boring: the better they are technically, the more it sounds just like one interminable song. In my book, it's not about how you play, but <i>what</i> you play. I have collected many rare records over the years and done a bit of deejaying on the radio and in clubs. I can almost instantly tell whether I am going to like a song or not just by listening to it for a few seconds. Or, if a song is playing, one that would go well with it usually comes to mind immediately. I thought that artificial intelligence could be applied to this "music intuition" as a music recommendation system based on simply *listening* to a song (and, of course, having an encyclopaedic knowledge of music).

Some years ago, the iPod had a very cool feature called *Genius*, which created a playlist on-the-fly based on a few example songs. Apple decided to remove this functionality (although it is still available in iTunes), presumably in a move to persuade people to subscribe to their music streaming service. Of course, Spotify now offers this functionality but, personally, I find the recommendations that it makes to be, at best, music I already know and, at worst, rather commercial and uncreative. I have a large library of music and I miss having a simple way to say "keep playing songs like this" (especially when I am driving) and something to help me discover new music, even within my own collection. I spent some time looking for an alternative solution but didn't find anything.

### Implementation details

A common approach is to use music genres to classify music, but I find this to be too simplistic and constraining. Is *Roxanne* by The Police reggae, pop or rock? And what about all the constantly evolving subdivisions of electronic music? I felt it necessary to find a higher dimensional, more continuous description of music and one that did not require labeling each track (i.e., an unsupervised learning approach).

The first thing I did was to [scrape](Spewtify.ipynb) as many playlists from Spotify as possible. (Unfortunately, I had the idea to work on this after a [competition](https://labs.spotify.com/2018/05/30/introducing-the-million-playlist-dataset-and-recsys-challenge-2018/) to do something similar had already been closed, in which access to a million songs was granted.) The idea was that grouping by playlists would give some context or meaning to the individual songs - for example, "80s disco music" or "My favourite songs for the beach". People tend to make playlists of songs by similar artists, with a similar mood, style, genre or for a particular purpose (e.g., for a workout in the gym). Unfortunately, the Spotify API doesn't make it particularly easy to download playlists, so the method was rather crude: I searched for all the playlists with the letter 'a' in the name, the letter 'b', and so on, up to 'Z'. In this way, I managed to grab 240,000 playlists comprising 4 million unique songs. I deliberately excluded all playlists curated by Spotify as these were particularly commercial (I believe that artists can pay to feature in them).

Then, I created an embedding ("[Track2Vec](Track2Vec.ipynb)") of these songs using the Word2Vec algorithm by considering each song as a "word" and each playlist as a "sentence". (If you can believe me, I had the same idea independently of [these guys](https://spandan-madan.github.io/Spotify/).) I found 100 dimensions to be a good size. Given a particular song, the model was able to convincingly suggest Spotify songs by the same artist or similar, or from the same period and genre. As the number of unique songs was huge, I limited the "vocabulary" to those which appeared in at least 10 playlists, leaving me with 450,000 tracks.

One nice thing about the Spotify API is that it provides a URL for most songs, which allows you to download a 30 second sample as an MP3. I downloaded all of these MP3s and converted them to a [Mel Spectrogram](Get_spectrograms.ipynb) - a compact representation of each song, which supposedly reflects how the human ear responds to sound. In the same way as a human being can think of related music just by listening to a few seconds of a song, I thought that a window of just 5 seconds would be enough to capture the gist of a song. Even with such a limited representation, the zipped size of all the spectrograms came to 4.5 gigabytes!

The next step was to try to use the information gleaned from Spotify to extract features from the spectrograms in order to meaningfully relate them to each other. I trained a convolutional neural network to reproduce as closely as possible (in cosine proximity) the Track2Vec vector (output $y$) corresponding to a given spectrogram (input $x$). I tried both [one dimensional](Speccy_1D.ipynb) (in the time axis) and [two dimensional](Speccy_2D.ipynb) convolutional networks and compared the results to a baseline model. The baseline model tried to come up with the closest Track2Vec vector without actually listening to the music. This lead to a song that, in theory, everybody should either like (or hate) a little bit ;-) ([SBTRKT - Sanctuary](https://p.scdn.co/mp3-preview/5ac546c1bcbb1d0a6dbeced979dc95361ffc2530?cid=194086cb37be48ebb45b9ba4ce4c5936)), with a cosine proximity of 0.52. The best score I was able to obtain with the validation data before overfitting set in was 0.70. With a 300-dimensional embedding, the validation score was better, but so was that of the baseline: I felt it was more important to have a lower baseline score and a bigger difference between the two, reflecting a latent representation with more diversity and capacity for discrimination. The score, of course, is still very low, but it is not really reasonable to expect that a spectrogram can capture the similarities between songs that human beings group together based on cultural and historical factors. Also, some songs were quite badly represented by the 5 second window (for example, in the case of "Don't stop me now" by Queen, this section corresponded to Brian May's guitar solo...). I played around with an [Auto-Encoder](Speccy_AE.ipynb) and a [Variational Auto-Encoder](Speccy_VAE.ipynb) in the hope of forcing the internal latent representation of the spectrograms to be more continuous, disentangled and therefore meaningful. The initial results appeared to indicate that a two dimensional convolutional network is better at capturing the information contained in the spectrograms. I also considered training a Siamese network to directly compare two spectrograms. I've left these ideas for possible future research.

Finally, with a library of MP3 files, I mapped each MP3 to a series of Track2Vec vectors for each 5 second time slice. Most songs vary significantly from beginning to end and so the slice by slice recommendations are all over the place. In the same way as we can apply a Doc2Vec model to compare similar documents, I calculated a "Mp3ToVec" vector for each mp3, including each constituent Track2Vec vector according to its *TF-IDF* (Term Frequency, Inverse Document Frequency) weight. This scheme gives more importance to recommendations which are frequent *and* specific to a particular song. As this is an $O(n^2)$ algorithm, it was necessary to break the library of MP3s into batches of 100 (my 8,000 MP3s would have taken 10 days to process otherwise!). I checked that this had a negligible impact on the calculated vectors.

### Results

You can see the some of the results at the end of this [workbook](Deej-A.I.ipynb#Some-example-playlists) and judge them for yourself. It is particularly good at recognizing classical music, spoken word, hip-hop and electronic music. In fact, I was so surprised by how well it worked, that I started to wonder how much was due to the TF-IDF algorithm and how much was due to the neural network. So I created another base-line model using the neural network with randomly initialized weights to map the spectrograms to vectors. I found that this base-line model was good at spotting genres and structurally similar songs, but, when in doubt, would propose something totally inappropriate. In these cases, the trained neural net seemed to choose something that had a similar energy, mood or instrumentation. In many ways, this was exactly what I was looking for: a creative approach that transcended rigid genre boundaries. By playing around with the $\epsilon$ parameter which determines whether two vectors are the same or not, for the purposes of the TF-IDF algorithm, it is possible to find a good trade-off between the genre (global) and the "feel" (local) characteristics. I also compared the results to playlists generated by Genius in iTunes and, although it is very subjective, I felt that Genius was sticking to genres even if the songs didn't quite go together, and came up with less "inspired" choices. Perhaps a crowd sourced "Coca Cola" test is called for to be the final judge.

Certainly, given the limitations of data, computing power and time, I think that the results serve as a proof of concept.

### Applications

Apart from the original idea of an automatic (radio as opposed to club) DJ, there are several other interesting things you can do. For example, as the vector mapping is continuous, you can easily create a playlist which smoothly "joins the dots" between one song and another, passing through as many waypoints as you like. For example, you could travel from soul to techno via funk and drum 'n' bass. Or from rock to opera :-).

Another simple idea is to listen to music using a microphone and to propose a set of next songs to play on the fly. Rather than comparing with the overall MP3ToVec, it might be more appropriate to just take into account the beginning of each song, so that the music segues more naturally from one track to another.

### Try it out for yourself

Once you have installed the required python packages with

```python
pip install -r requirements.txt
```

you can process your library of MP3s (and M4As). Simply run the following command and wait...


```python
python Mp3ToVec.py Pickles mp3tovec --scan c:/your_music_library
```

It will create a directory called "Pickles" and, within the subdirectory "mp3tovecs" a file called "mp3tovec". Once this has completed, you can try it out with

```python
python Deej-A.I.py Pickles mp3tovec
```

Then go to [http://localhost:8050](http://localhost:8050) in your browser.  If you add the parameter `--demo 5`, you don't have to wait until the end of each song. Finally, there are a couple of controls you can fiddle with (as it is currently programmed, these only take effect after the next song if one is already playing). "Keep on" determines the number of previous tracks to take into account in the generation of the playlist and "Drunk" specifies how much randomness to throw into the mix.

### Import libraries

In [2]:
import keras
from keras.models import load_model
import gensim
from gensim.models.callbacks import CallbackAny2Vec
import csv
import os
import time
import numpy as np
import scipy
import librosa
import pygame
import pickle
from tqdm import tqdm
import random
from subprocess import Popen
from mutagen.id3 import ID3
from mutagen.mp4 import MP4
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import subprocess
%pylab inline

Using TensorFlow backend.


pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
Populating the interactive namespace from numpy and matplotlib


`%matplotlib` prevents importing * from pylab and numpy
  "\n`%matplotlib` prevents importing * from pylab and numpy"


### Load the spectrogram to embedding model and Track2Vec embedding

In [3]:
model = load_model('../speccy_model')

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


In [4]:
play_songs        = False
verbose           = False
show_spectrograms = False
mp3_directory     = 'H:/Music' # directory to find MP3s
dump_directory    = '../Pickles' # directory to store pickled results
sr                = 22050
n_fft             = 2048
hop_length        = 512
n_mels            = model.layers[0].input_shape[1]
slice_size        = model.layers[0].input_shape[2]
slice_time        = slice_size * hop_length / sr

In [2]:
class logger(CallbackAny2Vec):
    None
    
embedding_model = gensim.models.Word2Vec.load('word2vec.model')
embedding = embedding_model.wv.syn0

  


In [3]:
tracks = {}
with open('tracks.csv', "r", encoding='utf-8') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=';')
    for row in spamreader:
        columns = str(row)[2:-2].split(';')
        tracks[columns[0]] = [columns[1] + ' - ' +
                              columns[2], columns[3]] # title - artist, url

### Map MP3s to a series of Track2Vec vectors by slice

In [25]:
def walkmp3s(folder):
    for dirpath, dirs, files in os.walk(folder, topdown=False):
        for filename in files:
            if filename[-3:].lower() == 'mp3' or filename[-3:].lower() == 'm4a':
                yield filename, os.path.abspath(os.path.join(dirpath, filename))

num_files = 0
for filename, full_path in walkmp3s(mp3_directory):
    num_files += 1
try:
    with tqdm(walkmp3s(mp3_directory), total=num_files, unit="file") as t:
        for filename, full_path in t:
            if verbose:
                print(filename)
            pickle_filename = (full_path[:-3]).replace('\\', '_').replace('/', '_').replace(':','_') + 'p'
            if pickle_filename in os.listdir(dump_directory):
                continue
            if play_songs:
                yn = str(input('Skip (y/n)?'))
                if yn == 'y' or yn == 'Y':
                    continue
            try:
                y, sr = librosa.load(full_path, mono=True)
            except:
                print(f'Skipping {full_path}')
                continue
            if y.shape[0] < slice_size:
                print(f'Skipping {full_path}')
                continue
            S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels, fmax=sr/2)
            if play_songs and filename[3:].lower() == 'mp3':
                pygame.mixer.init(sr)
                pygame.mixer.music.load(full_path)
                pygame.mixer.music.play()
            track_matrix = []
            start = time.time()
            slice = 0
            while (slice + 1) * slice_size < S.shape[1]:
                log_S = librosa.power_to_db(S[:, slice * slice_size : (slice+1) * slice_size], ref=np.max)
                if np.max(log_S) - np.min(log_S) != 0:
                    log_S = (log_S - np.min(log_S)) / (np.max(log_S) - np.min(log_S))
                x = np.expand_dims(np.expand_dims(log_S, axis=2), axis=0)
                y_pred = model.predict(x)
                track_matrix.append(y_pred[0])
                most_similar = embedding_model.wv.most_similar(positive=[y_pred[0]], topn=1)
                if verbose:
                    print(f'{slice_time*slice//60:.0f}m{slice_time*slice%60:02.0f}s : {most_similar[0][0]} : \
                            {tracks[most_similar[0][0]][0]} [{most_similar[0][1]:.2f}] : {tracks[most_similar[0][0]][1]}')
                if show_spectrograms:
                    imgplot = plt.imshow(log_S)
                    plt.show()
                slice += 1
                if play_songs:
                    while time.time() - start < slice_time * slice:
                        pass
            if verbose:
                print()
            pickle.dump((full_path, np.array(track_matrix)), open(dump_directory + '/' + pickle_filename, 'wb'))
except KeyboardInterrupt:
    t.close() # stop the progress bar from sprawling all over the place after a keyboard interrupt
    pygame.mixer.quit() # stop the music from playing
    raise
t.close()
pygame.mixer.quit()

Populating the interactive namespace from numpy and matplotlib


 15%|█▌        | 1165/7716 [00:13<01:13, 88.97file/s]

Skipping H:\Music\Compilations\Desert Blues_ Ambiances Du Sahara [Disc\1-09 Ere Mela Mela_Meche Neu.m4a
Skipping H:\Music\Compilations\Desert Blues_ Ambiances Du Sahara [Disc\1-10 Duniya.m4a


 15%|█▌        | 1174/7716 [00:13<01:31, 71.55file/s]

Skipping H:\Music\Compilations\Desert Blues_ Ambiances Du Sahara [Disc\1-11 Tono.m4a
Skipping H:\Music\Compilations\Desert Blues_ Ambiances Du Sahara [Disc\1-12 Agne Anko.m4a


 56%|█████▌    | 4285/7716 [00:48<00:38, 89.03file/s]

Skipping H:\Music\Luiz Carlos Vinhas\O Som Psicodélico De L.C.V_\06 Song To My Father.mp3


 58%|█████▊    | 4512/7716 [1:02:01<11:09:23, 12.54s/file]

Skipping H:\Music\Meirelles E Os Copa 5\O Som\01 Quintessencia.mp3
Skipping H:\Music\Meirelles E Os Copa 5\O Som\02 Solitude.mp3


 59%|█████▊    | 4514/7716 [1:02:01<7:49:39,  8.80s/file] 

Skipping H:\Music\Meirelles E Os Copa 5\O Som\03 Blue Bottle's.mp3
Skipping H:\Music\Meirelles E Os Copa 5\O Som\04 Nordeste.mp3


 59%|█████▊    | 4516/7716 [1:02:01<5:29:45,  6.18s/file]

Skipping H:\Music\Meirelles E Os Copa 5\O Som\05 Contemplacao.mp3
Skipping H:\Music\Meirelles E Os Copa 5\O Som\06 Tania.mp3


 60%|█████▉    | 4595/7716 [1:20:18<8:36:10,  9.92s/file] 

Skipping H:\Music\Milton Banana Trio\Milton Banana Trio\09 Take it Easy my Brother Charles.mp3


100%|██████████| 7716/7716 [18:23:47<00:00, 17.31s/file]


### Load MP3 data from pickle dumps created in previous step (or previously)

In [6]:
dropout = 1000/7716 # only process a selection of MP3s
#random.seed(789123)
mp3s = {}
for filename in os.listdir(dump_directory):
    if random.uniform(0, 1) > dropout or filename == 'idfs.p' or filename == 'mp3tovec.p':
        continue
    unpickled = pickle.load(open(dump_directory + '/' + filename, 'rb'))
    mp3s[unpickled[0]] = unpickled[1]
print(f'{len(mp3s)} MP3s')

959 MP3s


### Map each MP3 to a new vector space of "MP3ToVec" (Doc2Vec) vectors
If we now think of an MP3 as a "document" containing a bunch of "words" of Track2Vec (Word2Vec) vectors, we can try to come up with a suitable <i>MP3ToVec</i> (Doc2Vec) vector space. The complication here is that we can't use the original labels, as these correspond to Spotify tracks which may or may not be in our music library. Instead, we consider any two tracks within a small distance $\epsilon$ in the Track2Vec vector space to be identical. Then we define the "MP3ToVec" vector to be a weighted sum of the Track2Vec vectors that it comprises. The weighting we use is TF-IDF, where tf is the term frequency and idf the inverse document frequency.

$$\mathrm{tfidf}(t,d,D) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t,D) \\
\mathrm{tf}(t,d) = f_{t, d} \\
\mathrm{idf}(t, D) =  \log \frac{N}{|\{d \in D: t \in d\}|}$$

where $f_{t, d}$ is the number of times that the track (word) $t$ appears in MP3 (document) $d$, $D$ is the set or "corpus" of all MP3s (all documents) and $N$ is the total number of MP3s (documents) in $D$.

This is not exactly equivalent to the standard TF-IDF method in the following sense. When several track (word) vectors are within a distance of $\epsilon$, they are <i>each</i> counted as repetitions of the same track (word). This means that $\sum_{t \in T} \mathrm{tf}(t,d) = \sum_{t \in T} f_{t, d}$, where $T$ is the set of all tracks (words) in the "vocabulary", can be greater than the total number of tracks (words) in a given MP3 (document) $d$. This also applies to the term $\sum_{d \in D} f_{t, d}$ in the definition of $\mathrm{idf}(t,D)$ which, when summed up over all the tracks (words) $t$ in $T$, can be greater than the total number of tracks (words) in all the MP3s (documents) in the corpus $D$. As a result, the weightings depend somewhat on the setting of $\epsilon$. However, a similar effect would be obtained with the standard TF-IDF method if we were to consider hypenated words as separate words (that did not otherwise appear in isolation).

In [7]:
mp3_vecs = []
mp3_indices = {}
for mp3 in mp3s:
    mp3_indices[mp3] = []
    for mp3_vec in mp3s[mp3]:
        mp3_indices[mp3].append(len(mp3_vecs))
        mp3_vecs.append(mp3_vec / np.linalg.norm(mp3_vec)) # normalize
num_mp3_vecs = len(mp3_vecs)

In [8]:
# this takes up a lot of memory
cos_distances = np.ndarray((num_mp3_vecs, num_mp3_vecs), dtype=np.float16)

In [9]:
# this needs speeding up
try:
    with tqdm(mp3_vecs, unit="vector") as t:
        for i, mp3_vec_i in enumerate(t):
            for j , mp3_vec_j in enumerate(mp3_vecs):
                if i > j:
                    cos_distances[i, j] = cos_distances[j, i] # I've been here before
                elif i < j:
                    cos_distances[i, j] = 1 - np.dot(mp3_vec_i, mp3_vec_j)
                else:
                    cos_distances[i, j] = 0 # i == j
except KeyboardInterrupt:
    t.close() # stop the progress bar from sprawling all over the place after a keyboard interrupt
    raise
t.close()

100%|██████████| 77006/77006 [3:54:31<00:00,  5.47vector/s]


In [10]:
epsilon_distance = 0.01 # should be small, but not too small
idfs = []
#if os.path.isfile(dump_directory + '/mp3tovecs/idfs.p'):
#    idfs = pickle.load(open(dump_directory + '/mp3tovecs/idfs.p', 'rb'))
try:
    with tqdm(range(len(idfs), num_mp3_vecs), unit="vector") as t:
        for i in t:
            idf = 0
            for mp3 in mp3s:
                for j in mp3_indices[mp3]:
                    if cos_distances[i, j] < epsilon_distance:
                        idf += 1 
                        break
            idfs.append(-np.log(idf / len(mp3s)))
#            if i % 100 == 99:
#                pickle.dump(idfs, open(dump_directory + '/mp3tovecs/idfs.p', 'wb'))
except KeyboardInterrupt:
    t.close() # stop the progress bar from sprawling all over the place after a keyboard interrupt
    raise
t.close()
#pickle.dump(idfs, open(dump_directory + '/mp3tovecs/idfs.p', 'wb'))

100%|██████████| 77006/77006 [2:38:14<00:00,  7.74vector/s]


In [11]:
mp3tovec = {}
#if os.path.isfile(dump_directory + '/mp3tovecs/mp3tovec.p'):
#    mp3tovec = pickle.load(open(dump_directory + '/mp3tovecs/mp3tovec.p', 'rb'))
try:
    with tqdm(mp3s, unit="mp3") as t:
        for mp3 in t:
#            if mp3 in mp3tovec:
#                continue
            vec = 0
            for i in mp3_indices[mp3]:
                tf = 0
                for j in mp3_indices[mp3]:
                    if cos_distances[i, j] < epsilon_distance:
                        tf += 1
                vec += mp3_vecs[i] * tf * idfs[i]
                mp3tovec[mp3] = vec
#            pickle.dump(mp3tovec, open(dump_directory + '/mp3tovecs/mp3tovec.p', 'wb'))
except KeyboardInterrupt:
    t.close() # stop the progress bar from sprawling all over the place after a keyboard interrupt
    raise
t.close()
pickle.dump(mp3tovec, open(dump_directory + '/mp3tovecs/mp3tovec.p', 'wb'))

100%|██████████| 959/959 [01:28<00:00, 10.87mp3/s]


In [5]:
def most_similar(positive=[], negative=[], topn=5):
    if isinstance(positive, str):
        positive = [positive] # broadcast to list
    if isinstance(negative, str):
        negative = [negative] # broadcast to list
    mp3_vec_i = np.sum([mp3tovec[i] for i in positive] + [-mp3tovec[i] for i in negative], axis=0)
    similar = []
    for track_j in mp3tovec:
        if track_j in positive or track_j in negative:
            continue
        mp3_vec_j = mp3tovec[track_j]
        cos_proximity = np.dot(mp3_vec_i, mp3_vec_j) / (np.linalg.norm(mp3_vec_i) * np.linalg.norm(mp3_vec_j))
        similar.append((track_j, cos_proximity))
    return sorted(similar, key=lambda x:-x[1])[:topn]

def make_playlist(seed_tracks, size=10, lookback=1):
    playlist = seed_tracks
    while len(playlist) < size:
        candidates = most_similar(positive=playlist[-lookback:], topn=5)
        for i in range(5):
            if not candidates[i][0] in playlist:
                break
        playlist.append(candidates[i][0])
    return playlist

def get_track_details(tracks):
    details = []
    for file in tracks:
        artist = track = album = None
        if file[-3:].lower() == 'm4a':
            audio = MP4(file)
            try:
                artist = audio['\xa9ART'][0]
            except:
                pass
            try:
                track = audio['\xa9nam'][0]
            except:
                pass
            try:
                album = audio['\xa9alb'][0]
            except:
                pass
        elif file[-3:].lower() == 'mp3':
            try:
                audio = ID3(file)
                try:
                    artist = audio['TPE1'].text[0]
                except:
                    pass
                try:
                    track = audio["TIT2"].text[0]
                except:
                    pass
                try:
                    album = audio["TALB"].text[0]
                except:
                    pass
            except:
                pass
        if (artist, track, album) == (None, None, None):
            artist = file
        details.append((artist, track, album))
    return details

def print_track_details(track_details):
    for i, track_detail in enumerate(track_details):
        if len(track_detail) == 1:
            print(f'{i+1}. {track_detail[0]}')
        else:
            print(f'{i+1}. {track_detail[0]} - {track_detail[1]} ({track_detail[2]})')

def play_playlist(playlist):
    try:
        for i, track in enumerate(playlist):
            with Popen(["C:\\Program Files (x86)\\Windows Media Player\\wmplayer.exe", track]) as p:
                start = time.time()
                # quick way to get duration of song in seconds
                args=("ffprobe", "-show_entries", "format=duration", "-i", track)
                popen = Popen(args, stdout = subprocess.PIPE)
                popen.wait()
                output = popen.stdout.read()
                duration = output.decode('utf-8')[19:]
                duration = float(duration[:duration.find('\r')])
                while time.time() - start < duration and p.poll() == None:
                    pass
                p.terminate()
    except KeyboardInterrupt:
        p.terminate()
        raise
    p.terminate()
    
def most_similar_by_vec(positive=[], negative=[], topn=5):
    if isinstance(positive, str):
        positive = [positive] # broadcast to list
    if isinstance(negative, str):
        negative = [negative] # broadcast to list
    mp3_vec_i = np.sum([i for i in positive] + [-i for i in negative], axis=0)
    similar = []
    for track_j in mp3tovec:
        mp3_vec_j = mp3tovec[track_j]
        cos_proximity = np.dot(mp3_vec_i, mp3_vec_j) / (np.linalg.norm(mp3_vec_i) * np.linalg.norm(mp3_vec_j))
        similar.append((track_j, cos_proximity))
    return sorted(similar, key=lambda x:-x[1])[:topn]

def join_the_dots(tracks, n=5): # create a musical journey between given track "waypoints"
    playlist = []
    start = tracks[0]
    start_vec = mp3tovec[start]
    for end in tracks[1:]:
        end_vec = mp3tovec[end]
        playlist.append(start)
        for i in range(n-1):
            candidates = most_similar_by_vec(positive=[(n-i+1)/n * start_vec + (i+1)/n * end_vec], topn=10)
            for j in range(10):
                if not candidates[j][0] in playlist and candidates[j][0] != start and candidates[j][0] != end:
                    break
            playlist.append(candidates[j][0])
        start = end
        start_vec = end_vec
    playlist.append(end)
    return playlist

In [6]:
mp3tovec = pickle.load(open(dump_directory + '/mp3tovecs/mp3tovec.p', 'rb'))

### Some example playlists taken from the same music library

In [20]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Alfred Brendel\\Schubert_ The Last 3 Piano Sonatas [Disc\\1-05 Schubert_ Piano Sonata #20 In A.m4a"
], size=20, lookback=3)))

1. Alfred Brendel - Schubert: Piano Sonata #20 In A, D 959 - 1. Allegro (Schubert: The Last 3 Piano Sonatas [Disc 1])
2. Alfred Brendel - Schubert: Piano Sonata #21 In B Flat, D 960 - 1. Molto Moderato (Schubert: The Last 3 Piano Sonatas [Disc 2])
3. Alfred Brendel - Schubert: Piano Sonata #20 In A, D 959 - 4. Rondo: Allegretto (Schubert: The Last 3 Piano Sonatas [Disc 1])
4. Alfred Brendel - Schubert: Klavierstück In E Flat, D 946/2 (Schubert: The Last 3 Piano Sonatas [Disc 2])
5. Alfred Brendel - Schubert: Piano Sonata #19 In C Minor, D 958 - 3. Menuetto: Allegro (Schubert: The Last 3 Piano Sonatas [Disc 1])
6. Alfred Brendel - Schubert: Piano Sonata #19 In C Minor, D 958 - 1. Allegro (Schubert: The Last 3 Piano Sonatas [Disc 1])
7. Alfred Brendel - Schubert: Klavierstück In C, D 946/3 (Schubert: The Last 3 Piano Sonatas [Disc 2])
8. Alfred Brendel - Schubert: Klavierstück In E Flat Minor, D 946/1 (Schubert: The Last 3 Piano Sonatas [Disc 2])
9. Joep Beving - The Gift (Prehension)
10

In [42]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Compilations\\Shapes_ Circles (Compiled by Robert Luis\\1-12 They Reminisce Over You (T.R.O..m4a"
], size=10, lookback=3)))

1. Quantic Y Su Conjunto Los Miticos Del Ritmo - They Reminisce Over You (T.R.O.Y) (Shapes: Circles (Compiled by Robert Luis))
2. Rosendo Martínez Y Su Orquesta - El Alegron (Cartagena!)
3. Hamid El Kasri - Chbakrou (Soirées gnawa neurasys remaster, vol. 11)
4. Galileo y Su Banda - Cali Pachanguero (Greatest Salsa Classics of Colombia, Vol. 1)
5. Elis Regina - Bala Com Bala (Sounds From the Verve Hi-Fi)
6. The Latin Brothers - Sobre las Olas (Greatest Salsa Classics of Colombia, Vol. 1)
7. Joe Arroyo - Las Cajas (Greatest Salsa Classics of Colombia, Vol. 1)
8. Fania All Stars Feat. Celia Cr - Bamboleo (Beginners Guide To World Music)
9. Fania All-Stars - Guasasa (Fania DJ Series: DJ Muro)
10. The Latin Brothers - Las Caleñas Son Como las Flores (Greatest Salsa Classics of Colombia, Vol. 1)


In [22]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\Luciano Pavarotti, Richard Bonynge_ Lond\\Pavarotti_ Greatest Hits [Disc 1]\\1-12 Verdi_ Rigoletto - La Donna E M.mp3"
], size=10, lookback=3)))

1. Luciano Pavarotti, Richard Bonynge; London Symphony Orchestra - Verdi: Rigoletto - La Donna E Mobile (Pavarotti: Greatest Hits [Disc 1])
2. Luciano Pavarotti; Bologna Community Theatre Orchestra - Valente: Passione (Pavarotti: Greatest Hits [Disc 2])
3. Luciano Pavarotti; Bologna Community Theatre Orchestra - Cardillo: Core 'ngrato (Pavarotti: Greatest Hits [Disc 2])
4. Luciano Pavarotti, Herbert Von Karajan; Berlin Philharmonic Orchestra - Puccini: La Boheme - Che Gelida Manina (Pavarotti: Greatest Hits [Disc 1])
5. Luciano Pavarotti, Nicola Rescigno; National Philharmonic Orchestra - Puccini: Tosca - Recondita Armonia (Pavarotti: Greatest Hits [Disc 1])
6. Luciano Pavarotti, Leone Magiera; New Philharmonia Orchestra - Ponchielli: La Gioconda - Cielo E Mar! (Pavarotti: Greatest Hits [Disc 1])
7. Luciano Pavarotti, Leone Magiera; Vienna State Opera Orchestra - Bizet: Carmen - La Fleur Que Tu M'avais Jetee (Pavarotti: Greatest Hits [Disc 1])
8. Luciano Pavarotti, Richard Bonynge; New

In [23]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Roni Size\\Breakbeat Era - Ultra Obscene\\Terrible Funk.mp3"
], size=10, lookback=3)))

1. Roni Size - Terrible Funk (Breakbeat Era - Ultra Obscene)
2. Paul SG - Not Forgotten (Bukem in Session)
3. DJ Fresh - X Project (Escape from Planet Monday)
4. LTJ Bukem - Listen (Producer 05)
5. Tayla - Bang The Drums (Producer 04)
6. Drumagick - Live @ Clublife, Radio 3 (Belgrade) (Drumagick Live At Radio 3, Belgrade)
7. Drumagick - Full Energy (Microsoft Windows Sponsored Songs)
8. jx3p - indian summer mix (Indian Summer Mix)
9. Talvin Singh - Butterfly (OK)
10. Grooverider - None (None)


In [24]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Banda sinfónica municipal de Madrid\\El pasodoble\\19 Paquito Chocolatero.m4a"
], size=10, lookback=3)))

1. Banda sinfónica municipal de Madrid - Paquito Chocolatero (El pasodoble)
2. Banda sinfónica municipal de Madrid - Puenteareas (El pasodoble)
3. Banda sinfónica municipal de Madrid - Amparito Roca (El pasodoble)
4. Banda sinfónica municipal de Madrid - Pepita Creus (El pasodoble)
5. Banda sinfónica municipal de Madrid - Gerona (El pasodoble)
6. Banda sinfónica municipal de Madrid - Liria (El pasodoble)
7. Banda sinfónica municipal de Madrid - Suspiros de España (El pasodoble)
8. Banda sinfónica municipal de Madrid - Viva el Rumbo (El pasodoble)
9. Banda sinfónica municipal de Madrid - Gerona (El pasodoble)
10. Banda sinfónica municipal de Madrid - Liria (El pasodoble)


In [28]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Pete Rock & C.L. Smooth\\Mecca & The Soul Brother\\06 Straighten It Out.mp3"
], size=10, lookback=3)))

1. Pete Rock and C.L. Smooth - Straighten It Out (Mecca & The Soul Brother)
2. Gamma & Defesis - Slang Teacher (Sound01: A Big Dada Sampler)
3. Gamma & Defesis - Slang Teacher (Well Deep: Ten Years of Big Dada)
4. Galliano - Stoned Again (In Pursuit Of The 13th Note)
5. H:\Music\Unknown Artist\Unknown Album\Ud14Wv1BtQrT.128.mp3 - None (None)
6. Red Cloud & Digital Hemp - Afro Latin Concrete (Desert Island Mix Part 2)
7. A Tribe Called Quest - Footprints (People's Instinctive Travels and the Paths of Rhythm)
8. Cypress Hill - How I Could Just Kill A Man (Cypress Hill)
9. Cypress Hill - How I Could Just Kill a Man (Original Album Classics: Cypress Hill)
10. Cypress Hill - Break It Up (Original Album Classics: Cypress Hill)


In [29]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Estrella Fernández\\Unknown Album\\Estiramientos de pie + meditación se.mp3"
], size=10, lookback=3)))

1. Estrella Fernández - Estiramientos de pie + meditación sentados (respiración y cuerpo) (None)
2. Estrella - Body Scan (None)
3. Estrella Fernández - Movimientos tumbados (MBCT)
4. H:\Music\Unknown Artist\Unknown Album\MBSR-Session06-11-audio.mp3 - None (None)
5. H:\Music\Unknown Artist\Unknown Album\MBSR-Session01-04-audio.mp3 - None (None)
6. H:\Music\Unknown Artist\Unknown Album\MBSR-Session03-03-audio.mp3 - None (None)
7. Carola García Díaz - Yoga de pie 40´ (Mindfulness)
8. H:\Music\Unknown Artist\Unknown Album\MBSR-Session06-09-audio.mp3 - None (None)
9. H:\Music\Unknown Artist\Unknown Album\MBSR-Session06-14-audio.mp3 - None (None)
10. H:\Music\Unknown Artist\Unknown Album\MBSR-Session08-03-audio.mp3 - None (None)


In [30]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Unknown Artist\\Unknown Album\\06 Atlas Shrugged - 06.mp3"
], size=10, lookback=3)))

1. H:\Music\Unknown Artist\Unknown Album\06 Atlas Shrugged - 06.mp3 - None (None)
2. H:\Music\Unknown Artist\Unknown Album\04 Atlas Shrugged - 04.mp3 - None (None)
3. H:\Music\Unknown Artist\Unknown Album\65 Atlas Shrugged - 65 1.mp3 - None (None)
4. H:\Music\Unknown Artist\Unknown Album\05 Atlas Shrugged - 05.mp3 - None (None)
5. H:\Music\Unknown Artist\Unknown Album\03 Atlas Shrugged - 03.mp3 - None (None)
6. H:\Music\Unknown Artist\Unknown Album\30 Atlas Shrugged - 30 1.mp3 - None (None)
7. H:\Music\Unknown Artist\Unknown Album\29 Atlas Shrugged - 29 1.mp3 - None (None)
8. H:\Music\Unknown Artist\Unknown Album\32 Atlas Shrugged - 32 1.mp3 - None (None)
9. H:\Music\Unknown Artist\Unknown Album\31 Atlas Shrugged - 31 1.mp3 - None (None)
10. H:\Music\Unknown Artist\Unknown Album\18 Atlas Shrugged - 18.mp3 - None (None)


In [31]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Dee Edwards\\Gilles Peterson Digs America Vol.2\\06 Why Can't There Be Love.mp3"
], size=10, lookback=3)))

1. Dee Edwards - Why Can't There Be Love (Gilles Peterson Digs America Vol.2)
2. H:\Music\Unknown Artist\Unknown Album\03 Genuine Pt 1.mp3 - None (None)
3. Sandra Sá - Guarde Minha Voz (Terca Sapphire)
4. Byrdie Green - Return of the Prodigal Son (Afrodisia 3)
5. H:\Music\Unknown Artist\Unknown Album\14  I Learned the Hard Way.mp3 - None (None)
6. Letta Mbulu - Mahlalela (Club Africa 2)
7. Janis Joplin - Trust Me (The Essential Janis Joplin)
8. H:\Music\Unknown Artist\Unknown Album\10 Let Them Knock.mp3 - None (None)
9. Janis Joplin - Cry Baby (The Essential Janis Joplin)
10. Janis Joplin - Bye, Bye Baby (The Essential Janis Joplin)


In [32]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\The Clash\\Story of the Clash, Volume 1 (Disc 1)\\1-10 I Fought the Law.m4a"
], size=10, lookback=3)))

1. The Clash - I Fought the Law (Story of the Clash, Volume 1 (Disc 1))
2. The Clash - English Civil War (Story of the Clash, Volume 1 (Disc 2))
3. The Clash - Complete Control (Story of the Clash, Volume 1 (Disc 2))
4. The Clash - Tommy Gun (Story of the Clash, Volume 1 (Disc 2))
5. Radiohead - Just (The Bends)
6. The Clash - Safe European Home (Story of the Clash, Volume 1 (Disc 2))
7. The Stranglers - Duchess (Greatest Hits 1977-1990)
8. Various Artists - Feel FL↑P『Theme From Spider Man』 (元気でSka～!?-Dramatic Ska-)
9. The Sex Pistols - Anarchy In The UK (The Great Rock 'N' Roll Swindle)
10. The Zutons - Valerie (Tired of Hanging Around)


In [33]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Freemasons\\Love On My Mind\\01 Love On My Mind (Club Mix).m4a"
], size=10, lookback=3)))

1. Freemasons - Love On My Mind (Club Mix) (Love On My Mind)
2. Freemasons - Love On My Mind ft. Amanda Wilson (Original) (Ultra.10)
3. Buraka Som Sistema & Deize Tigron - Aqui Para Voces (feat. Deize Tigrona) (Black Diamond)
4. Hatiras - Spaced Invader (Bugged Out! Classics)
5. AR Rahman and Madhumitha - Millionaire (Slumdog Millionaire (Original Motion Picture Soundtrack))
6. David Guetta - One Love (Calvin Harris Mix) (One Love (Remixes) [feat. Estelle])
7. Bob Sinclar - Love Generation (Western Dream)
8. EZ Rollers - Hope & Inspiration (Bukem in Session)
9. Tim Deluxe - We All Love Sax (The Little Ginger Club Kid)
10. Buraka Som Sistema - IC19 (Black Diamond)


In [67]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Janis Joplin\\The Essential Janis Joplin\\15 Kozmic Blues (Live).m4a"
], size=10, lookback=3)))

1. Janis Joplin - Kozmic Blues (Live) (The Essential Janis Joplin)
2. Janis Joplin - To Love Somebody (Live) (The Essential Janis Joplin)
3. Jannis of Jakarta Records - None (None)
4. Ananda Shankar - Dancing Drums (A Life in Music)
5. Paul Jackson - Funk Times Three (Black Octopus)
6. Amsterdam Klezmer Band - Ludacris (Remixed)
7. Corona Del Mar High School Jazz Ensemble - Us (School Me! Volume One 1968-1975)
8. Joni Haastrup - Greetings (Wake Up Your Mind)
9. None - afrique mix (None)
10. Martin Kratochvil & Jazz Q - Toledo (Prog Is Not a Four Letter Word)


In [35]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Vangelis\\Blade Runner\\12 Tears In Rain.mp3"
], size=10, lookback=3)))

1. Vangelis - Tears In Rain (Blade Runner)
2. The Quantic Soul Orchestra - Interlude (Tropidelico (Tru Thoughts))
3. H:\Music\Unknown Artist\Unknown Album\MBSR-Session07-02-audio.mp3 - None (None)
4. H:\Music\Unknown Artist\Unknown Album\MBSR-Session05-02-audio.mp3 - None (None)
5. Robert Smith - Objeto Comestible (MBCT)
6. H:\Music\Unknown Artist\Unknown Album\MBSR-Session04-07-audio.mp3 - None (None)
7. H:\Music\Unknown Artist\Unknown Album\MBSR-Session08-02-audio.mp3 - None (None)
8. H:\Music\Unknown Artist\Unknown Album\21-Practica-Perdon-uno-mismo-MAM-09-.MP3 - None (None)
9. H:\Music\Unknown Artist\Unknown Album\13-Encuentra-tu-voz-Compasiva-MAM-08.MP3 - None (None)
10. H:\Music\Unknown Artist\Unknown Album\23-Practica-Amigo-Compasivo-MAM-09-1.MP3 - None (None)


In [36]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Sven Väth\\In the Mix_ The Sound of the Sixteenth S\\11 Battery.m4a"
], size=10, lookback=3)))

1. Patrick Specke & Daze Maxim - Battery (In the Mix: The Sound of the Sixteenth Season (Bonus Track Version))
2. Fudge Fingas - Dindins4dada (R³)
3. Massive Attack - Paradise Circus (Breakage's Tight Rope Remix) (Heligoland (Deluxe Version))
4. Yes Please aka Mr Vasovski - More Than I Wished For (Extended Trumpet Mix) (Hed Kandi Beach House 2011)
5. Tim Wright - The Crab (In the Mix: The Sound of the Sixteenth Season (Bonus Track Version))
6. John Tejada - Cipher (In the Mix: The Sound of the Sixteenth Season (Bonus Track Version))
7. Sven Väth - In the Mix: The Sound of the Sixteenth Season (Continuous DJ Mix 1) (In the Mix: The Sound of the Sixteenth Season (Bonus Track Version))
8. H:\Music\Unknown Artist\Unknown Album\sven väth mixmag.m4a - None (None)
9. Âme - Rej (Rej)
10. Tim Green - Body Language, Vol. 18 (Continuous Mix) (Get Physical Music Presents: Body Language, Vol. 18 by Tim Green)


In [37]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Nina Simone\\Little Girl Blue\\14 African Mailman.m4a"
], size=10, lookback=3)))

1. Nina Simone - African Mailman (Little Girl Blue)
2. Cantina Y Su Combo - Santa Marta Cumbia (Cartagena!)
3. The Ipanemas - Malandro Quando Vaza (Gilles Peterson Brazilika)
4. Johnny Raducanu - Blues Minor (Romanian Jazz)
5. Manfred Fest Trio - Quem E Homem Nao Chora (Manfred Fest Trio)
6. Tim Maia - Guiné Bissau, Moçambique E Angola Racional (Tim Maia Racional Volume 2)
7. Dona Onete - Moreno Morenado (Rolê: New Sounds of Brazil)
8. Peret - La Medallona (De los Cobardes Nunca Se Ha Escrito Nada)
9. João Donato - Cala Boca Menino (Quem E Quem)
10. Chicago Afrobeat Project - Media Man (A Move To Silent Unrest)


In [54]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Amadou & Mariam\\Dimanche À Bamako\\06 Artistiya.m4a"
], size=10, lookback=3))) # lookback 5

1. Amadou & Mariam - Artistiya (Dimanche À Bamako)
2. Montefiori Cocktail - Gipsy Woman (The Trip Created By Tom Middleton (Disc 1))
3. Montefiori Cocktail - Gipsy Woman (La Da De La Da Da) (Raccolta No. 1)
4. Daft Punk - Get Lucky (Radio Edit) [feat. Pharrell Williams] (Get Lucky (Radio Edit) [feat. Pharrell Williams] - Single)
5. The Quantic Soul Orchestra - Tropidelico (Tropidelico (Tru Thoughts))
6. Daft Punk - Get Lucky (feat. Pharrell Williams & Nile Rodgers) (Random Access Memories)
7. T.P. Orchestre Poly-Rythmo - Gendamou Na Wii We Gnannin (The Kings Of Benin Urban Groove 1972 - 80)
8. Maria Bethania - Mano Caetano (Edit) (None)
9. Jurassic 5 - Canto De Ossanha (Feedback)
10. Mono Mono - Eme Kowa Iasa Ile Wa (Nigeria Special: Modern Highlife, Afro-Sounds & Nigerian Blues 1970-6)


In [39]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Les Barons\\Beginner's Guide To World Lounge\\1-08 Batucada de Bahia.mp3"
], size=10, lookback=5))) # lookback 5

1. Les Barons - Batucada de Bahia (Beginner's Guide To World Lounge)
2. KYOTO JAZZ MASSIVE - Endless Flight (Kyoto Jazz Massive 10th Anniversary)
3. Llorca - Sabotage (Little Computer People)
4. Daniel Ibbotson - Celebrate (Coming Home (... Warming Up Your Living Area))
5. Classen Collective - Close To Greatness (Deep Joy Mix) (JCR playlist by Jazzanova)
6. Mr. Scruff - Chicken In A Box (Mr. Scruff)
7. Blackjoy - Moustache (Knights of the Playboy Mansion (Mixed by Bob Sinclar & Dimitri from Paris))
8. Peace Orchestra - Shining (Uptight4s Cold Weathe (Soundselection 05.02)
9. Gershon Kingsley - Pop Corn (Music to Moog By)
10. Jazzanova - Fedime's Flight (Desert Island Mix)


In [62]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Jimmy Smith\\The Best Of Jimmy Smith\\10 Got My Mojo Working.mp3"
], size=10, lookback=3)))

1. Jimmy Smith - Got My Mojo Working (The Best Of Jimmy Smith)
2. Trudy Pitts And Pat Martino - The Spanish Flea (Legends Of Acid Jazz)
3. Hugo Montenegro - The Good, The Bad And The Ugly (The Good, The Bad And The Ugly)
4. Trudy Pitts And Pat Martino - Fiddlin' (Legends Of Acid Jazz)
5. Klaus Doldinger Quartett - Fiesta (Exotic Jazz)
6. Charly Antolini - Senor Hoche (Exotic Jazz)
7. Deodato - September 13 (Preludes And Rhapsodies)
8. Fela Kuti - It's No Possible (Expensive Shit + He Miss Road (feat. Africa 70))
9. Ebo Taylor - Love & Death (Love & Death)
10. Ebo Taylor - Love and Death (Taken from the Album "Love & Death") (Strut Africa)


In [55]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Alexandre Desplat\\Lust, Caution\\22 Wong Chia Chi's Theme.mp3"
], size=10, lookback=3)))

1. Alexandre Desplat - Wong Chia Chi's Theme (Lust, Caution)
2. Max Richter - The Trees (The Blue Notebooks)
3. Maxence Cyrin - Smokeblech II (Modern Rhapsodies)
4. John Williams - The Scavenger (Star Wars: El Despertar de la Fuerza (Banda Sonora Original))
5. Maxence Cyrin - Triangle (Novö Piano)
6. Maxence Cyrin - No Cars Go (Novö Piano)
7. Maxence Cyrin - D.A.N.C.E (Novö Piano)
8. Maxence Cyrin - Lithium (Novö Piano)
9. Maxence Cyrin - Unfinished Sympathy (Modern Rhapsodies)
10. Michael Nyman - The Heart Asks Pleasure First (The Piano)


In [7]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Astor Piazzolla\\Años De Soledad\\01 Libertango.mp3"
], size=10, lookback=3)))

1. Astor Piazzolla - Libertango (Años De Soledad)
2. Yo-Yo Ma - Libertango (Essential Yo-Yo Ma)
3. The Hastings Street Jazz Experience - Ja Mil (Spiritual Jazz: Esoteric, Modal And Deep Jazz From The Underground 1968-77)
4. René Aubry - Steppe II (Steppe)
5. David McCallum - Edge (A Part of Me / A Bit More of Me)
6. MICHEL GARRICK BAND - Fire Opal And Blue Poppies-A Sequence Of Visions:Fire Opal (Kind Of Jazz-Jazz Rock)
7. Derek & The Dominos - Nobody Knows You When You're Down and Out (Layla and Other Assorted Love Songs)
8. Musicians From The Summer Program For Youthful Musicians - Brougham (Gilles Peterson Digs America Vol.2)
9. Nathan Davis - Makatuka (Best Of 1965-76)
10. Minnie Riperton - Les Fleur (INCredible Sound of Gilles Peterson)


In [12]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Fela Kuti\\The Best of Fela\\04 Water No Get Enemy.m4a"
], size=10, lookback=3)))

1. Fela Kuti - Water No Get Enemy (The Best of Fela)
2. Fela Kuti - Water No Get Enemy (The Best of the Black President)
3. Brasil Show - Pot-Pourri: Gostava Tanto De Voce-Que Pena-Toda Menina Baiana-Alem Do Horizonte-Malandro-Se Quiser Chorar Por Mim (Voo Livre)
4. Brasil Show - Voce Pode (Voo Livre)
5. Fela Kuti - It's No Possible (Expensive Shit + He Miss Road (feat. Africa 70))
6. Ebo Taylor - Love & Death (Love & Death)
7. Ebo Taylor - Love and Death (Taken from the Album "Love & Death") (Strut Africa)
8. Deodato - Also Sprach Zarathustra (Preludes And Rhapsodies)
9. Manu Dibango - Wouri (Africadelic)
10. Romano Rizzati - Vocal '700 (Shake Your Booty V1 Euro Groove)


In [14]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Johnny Osbourne\\Truth and Rights\\Truth and Rights.mp3"
], size=10, lookback=3)))

1. Johnny Osbourne - Truth and Rights (Truth and Rights)
2. Fat Freddy's Drop - This Room (Gilles Peterson Presents - The BBC Sessions - Vol. 1)
3. 10cc - Dreadlock Holiday (The Very Best of 10cc)
4. Peret - Un Tiempo para Todo (De los Cobardes Nunca Se Ha Escrito Nada)
5. Fat Freddys Drop - Midnight Marauders - live (Record-Play presents - Fat Freddys Drop live mix 2006)
6. Cymande - To You (Second Time Around)
7. Cymande - The Recluse (Promised Heights)
8. Fat Freddy's Drop - This Room (Based on a True Story)
9. Fat Freddy's Drop - Cay's Crays (Based on a True Story)
10. The Egg - Say You Will (Forwards)


In [16]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Jean Michel Jarre\\Equinoxe\\01 Equinoxe, Pt. 1.m4a"
], size=10, lookback=3)))

1. Jean Michel Jarre - Equinoxe, Pt. 1 (Equinoxe)
2. Jean-Michel Jarre - Equinoxe Part 1 (Equinoxe)
3. Jean Michel Jarre - Oxygène Part I (Oxygène)
4. Jean-Michel Jarre - Equinoxe Part 2 (Equinoxe)
5. Jean Michel Jarre - Equinoxe, Pt. 2 (Equinoxe)
6. John Williams - Farewell and The Trip (Star Wars: El Despertar de la Fuerza (Banda Sonora Original))
7. John Williams - Rey's Theme (Star Wars: El Despertar de la Fuerza (Banda Sonora Original))
8. John Williams - The Jedi Steps and Finale (Star Wars: El Despertar de la Fuerza (Banda Sonora Original))
9. John Williams - Main Title and The Attack on the Jakku Village (Star Wars: El Despertar de la Fuerza (Banda Sonora Original))
10. London Symphony Orchestra & Sir Adrian Boult - Variations On an Original Theme, 'Enigma', Op. 36: IX. Nimrod (100 Best Relaxing Classics)


In [28]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Orquesta Sinfónica De España, Orquesta F\\Clásicos de España\\2-11 Asturias.m4a"
], size=10, lookback=3)))

1. Orquesta Sinfónica De España, Orquesta Filarmónica De España, Isaac Albéniz, Enrique Granados, Salvador Bacarisse, Fernando Sor - Asturias (Clásicos de España)
2. Pepe Romero - Asturias (Guitar Solos)
3. Michael Andrews - The Artifact and Living (Donnie Darko (Soundtrack from the Motion Picture))
4. Michael Andrews - The Artifact & Living (Donnie Darko)
5. Joep Beving - 432 (Prehension)
6. Michael Andrews - Liquid Spear Waltz (Donnie Darko)
7. Sigur Rós - Heysátan (Takk...)
8. Gonzales - Paristocrats (Solo Piano)
9. Gonzales - CM Blues (Solo Piano)
10. Gonzales - Manifesto (Solo Piano)


In [34]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Aníbal Troilo\\20 Tangos Famosos\\11 La Viruta (feat. Orquesta de Juan.m4a"
], size=10, lookback=3)))

1. Juan D'Arienzo - La Viruta (feat. Orquesta de Juan D'Arienzo) (20 Tangos Famosos)
2. Juan D'Arienzo - El Esquinazo (feat. Orquesta de Juan D'Arienzo) (20 Tangos Famosos)
3. Juan D'Arienzo - El Porteñito (feat. Orquesta de Juan D'Arienzo) (20 Tangos Famosos)
4. Juan D'Arienzo - Hotel Victoria (feat. Orquesta de Juan D'Arienzo) (20 Tangos Famosos)
5. Trudy Pitts And Pat Martino - Steppin' In Minor (Legends Of Acid Jazz)
6. Fats Waller - Ain't Misbehavin' (Remastered) (Piano Bar - 30 Jazz Hits (Remastered))
7. Juan D'Arienzo - Re Fa Si (feat. Orquesta de Juan D'Arienzo) (20 Tangos Famosos)
8. serge gainsbourg - None (inédits 57-68)
9. Makoto Miura - Interlude (Adieu Tristesse) (Be Bop Or Do Something)
10. Artie Shaw - Moonglow (Music From The Films Of Woody Allen [Disc 1])


In [37]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Mulatu Astatke\\Mulatu Of Ethiopia\\01 Mulatu 2.mp3"
], size=10, lookback=3)))

1. Mulatu Astatke - Mulatu (Mulatu of Ethiopia)
2. The Whitefield Brothers - Sem Yelesh (Earthology)
3. Mulatu Astatke - Kasalefkut-Hulu (Mulatu of Ethiopia)
4. Nathan Davis - Makatuka (Best Of 1965-76)
5. Minnie Riperton - Les Fleur (INCredible Sound of Gilles Peterson)
6. Musicians From The Summer Program For Youthful Musicians - Brougham (Gilles Peterson Digs America Vol.2)
7. Antonio Machín - Angelitos Negros (Angelitos Negros)
8. David McCallum - Edge (A Part of Me / A Bit More of Me)
9. The Love Affair - Never In My Life (Gilles Peterson Digs America Vol.2)
10. Jimmy Smith - Goldfinger (The Best Of Jimmy Smith)


In [44]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Natacha Atlas, Transglobal Underground\\Whatever Lola Wants _ Original Motion Pi\\21 Whatever Lola Wants (Lola Gets).mp3"
], size=10, lookback=3)))

1. Natacha Atlas - Whatever Lola Wants (Lola Gets) (Whatever Lola Wants : Original Motion Picture Soundtrack)
2. OUM - Oum Maysan (Sweerty)
3. Corinne Bailey Rae - Breathless (Corinne Bailey Rae)
4. BEBETECK - ￐﾿ￕ￩ﾤ　￐﾿ￕ￩ﾤ (Jazz Connection: around the Shibuya Corner)
5. Sarah Vaughan - Dreamy (Remastered) (Piano Bar - 30 Jazz Hits (Remastered))
6. Duffy - Stepping Stone (Rockferry)
7. Amadou & Mariam - M'bifé (Dimanche À Bamako)
8. Aloe Blacc - Mama Hold My Hand (Good Things)
9. Amy Winehouse & Tony Bennett - Body and Soul (Lioness: Hidden Treasures)
10. Juan Zelada - I Can't Love (High Ceilings & Collarbones)


In [31]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Joby Talbot\\Franklyn\\01 Gonna Kill a Man.m4a"
], size=10, lookback=3)))

1. Joby Talbot - Gonna Kill a Man (Franklyn)
2. Joby Talbot - End Credits (Franklyn)
3. Krister Linder - Look At Me (Metropia (Motion Picture Soundtrack))
4. Thomas Newman - Any Other Name (American Beauty)
5. Lars Horntveth - Pooka Soundtrack (Pooka)
6. John Williams - Rey Meets BB-8 (Star Wars: El Despertar de la Fuerza (Banda Sonora Original))
7. John Williams - Maz's Counsel (Star Wars: El Despertar de la Fuerza (Banda Sonora Original))
8. John Williams - Torn Apart (Star Wars: El Despertar de la Fuerza (Banda Sonora Original))
9. John Williams - Han and Leia (Star Wars: El Despertar de la Fuerza (Banda Sonora Original))
10. John Williams - Finn's Confession (Star Wars: El Despertar de la Fuerza (Banda Sonora Original))


In [11]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Echo & the Bunnymen\\Ocean Rain\\The Killing Moon.mp3"
], size=10, lookback=3)))

1. Echo & the Bunnymen - The Killing Moon (Ocean Rain)
2. Simple Minds - Don´t You Forget About Me (Live - In The City Of Light)
3. Simple Minds - Alive And Kicking (Live - In The City Of Light)
4. Simple Minds - Promise You A Miracle (Live - In The City Of Light)
5. Simple Minds - Oh Jungleland (Live - In The City Of Light)
6. Simple Minds - Sanctify Yourself (Live - In The City Of Light)
7. The Clash - (White Man) In Hammersmith Palais (Story of the Clash, Volume 1 (Disc 2))
8. Simple Minds - Love Song Medley: Sun City / Dance To The Music (Live - In The City Of Light)
9. Duran Duran - A View to a Kill (The Singles Box 1986 - 1995)
10. Simple Minds - Ghost Dancing (Live - In The City Of Light)


In [17]:
print_track_details(get_track_details(make_playlist([
    "H:\\Music\\Compilations\\Maxima FM Compilation, Vol. 11\\1-07 No Superstar (Full Vocal Radio.m4a"
], size=10, lookback=3)))

1. Remady - No Superstar (Full Vocal Radio Mix) (Maxima FM Compilation, Vol. 11)
2. Maroon 5 - Moves Like Jagger (feat. Christina Aguilera) (Moves Like Jagger (feat. Christina Aguilera) - Single)
3. Shawn Wolf - Moves Like Jagger (I've Got the Moves Like Jagger) (Moves Like Jagger (I've Got the Moves Like Jagger) - Single)
4. Usher - DJ Got Us Fallin' In Love (feat. Pitbull) [feat. Pitbull] (Versus)
5. Pitbull - Timber (feat. Ke$ha) (Timber (feat. Ke$ha) - Single)
6. Olav Basoski - Waterman (Radio Mix) (Dream Dance Vol. 38)
7. Milk, Sugar & Vaya Con Dios - Hey (Nah Neh Nah) (Milk & Sugar Radio Version) (Hey (Nah Neh Nah))
8. Carl Cox - Ain't It Funky Now- (Second Sign)
9. Shapeshifters - Back To Basics (Steve Lawler's Return To Rehab Mix) (Back to Basics)
10. Freemasons - Love On My Mind (Club Mix) (Love On My Mind)


### Let's listen to the results

In [None]:
play_playlist(make_playlist([
    "H:\\Music\\Orquesta Sinfónica De España, Orquesta F\\Clásicos de España\\2-11 Asturias.m4a"
], size=10, lookback=3))

### Find a particular song, artist or album

In [19]:
for mp3 in mp3tovec:
    if mp3.lower().find('moondog') != -1:
        print('"'+ mp3.replace('\\', '\\\\') + '"')

"H:\\Music\\Moondog\\The German Years 1977-1999 CD1\\Bird's Lament.mp3"
"H:\\Music\\Moondog\\The German Years\\01 Bird`s Lament.m4a"
"H:\\Music\\Moondog\\Moondog _ Moondog 2\\Stamping Ground.mp3"


### Joining the dots: a musical journey

Using the MP3ToVec vectors we can smoothly "interpolate" between one MP3 and another. In this way we can create a muiscal "journey" that passes through several "waypoints".

In [18]:
print_track_details(get_track_details(join_the_dots([
    "H:\\Music\\Aretha Franklin\\I Never Loved a Man the Way I Love You\\01 Respect.mp3", # soul
    "H:\\Music\\James Brown\\The Godfather - The Very Best of James B\\02 I Got You (I Feel Good).m4a", # funk
    "H:\\Music\\Jurassic 5\\Jurassic 5 LP\\Lesson 6_ The Lecture.mp3", # hip-hop
    "H:\\Music\\Roni Size\\Breakbeat Era - Ultra Obscene\\Terrible Funk.mp3", # drum 'n' bass
    "H:\\Music\\Sven Väth\\In the Mix_ The Sound of the Sixteenth S\\14 Eclipse.m4a", # techno
], n=7)))

1. Aretha Franklin - Respect (I Never Loved a Man the Way I Love You)
2. Dee Edwards - Why Can't There Be Love (Gilles Peterson Digs America Vol.2)
3. Linda Jones - I Just Can’t Live My Life (Without You Babe)  [7" Mix] (Northern Soul (The Soundtrack) [Extended Version])
4. Carnegie Mellon Jazz Band - The First Thing I Do (None)
5. Helene Smith - Pot Can't Talk About the Kettle (Mad Men (Music from the Series) Vol. 2)
6. Tom Jones - Ain't No Sunshine (None)
7. Janis Joplin - Trust Me (The Essential Janis Joplin)
8. James Brown - I Got You (I Feel Good) (The Godfather - The Very Best of James Brown)
9. Chubby Checker - Let's Twist Again (Mad Men (Music from the Series) Vol. 2)
10. Cut Chemist Feat. Hymnal - What's The Altitude (Paul's Second Compilation)
11. Nicolas Repac - Swing Swing (Swing swing)
12. Cut Chemist - Lesson 6 - The Lecture (Original Un-Edited Version) [Cut Chemist] (The Ultimate Lessons)
13. Coldcut - Mag (70 Minutes Of Madness - Journeys By DJ Special Release)
14. Cut 

In [None]:
play_playlist(join_the_dots([
    "H:\\Music\\Aretha Franklin\\I Never Loved a Man the Way I Love You\\01 Respect.mp3", # soul
    "H:\\Music\\James Brown\\The Godfather - The Very Best of James B\\02 I Got You (I Feel Good).m4a", # funk
    "H:\\Music\\Jurassic 5\\Jurassic 5 LP\\Lesson 6_ The Lecture.mp3", # hip-hop
    "H:\\Music\\Roni Size\\Breakbeat Era - Ultra Obscene\\Terrible Funk.mp3", # drum 'n' bass
    "H:\\Music\\Sven Väth\\In the Mix_ The Sound of the Sixteenth S\\14 Eclipse.m4a", # techno
], n=7))

### Output similar songs on the fly based on 5 second samples

In [441]:
#full_path = "H:\\Music\\Q-Tip\\Amplified\\Breathe And Stop.mp3"
#full_path = "H:\\Music\\PFM\\Producer 02\\02 Dreams.mp3"
full_path = "H:\\Music\\Jurassic 5\\Jurassic 5 LP\\Lesson 6_ The Lecture.mp3"
try:
    pickle_filename = (full_path[:-3]).replace('\\', '_').replace('/', '_').replace(':','_') + 'p'
    y, sr = librosa.load(full_path, mono=True)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels, fmax=sr/2)
    pygame.mixer.init(sr)
    pygame.mixer.music.load(full_path)
    pygame.mixer.music.play()
    track_matrix = []
    start = time.time()
    slice = 0
    while (slice + 1) * slice_size < S.shape[1]:
        log_S = librosa.power_to_db(S[:, slice * slice_size : (slice+1) * slice_size], ref=np.max)
        if np.max(log_S) - np.min(log_S) != 0:
            log_S = (log_S - np.min(log_S)) / (np.max(log_S) - np.min(log_S))
        x = np.expand_dims(np.expand_dims(log_S, axis=2), axis=0)
        y_pred = model.predict(x)
        track_matrix.append(y_pred[0])
        most_similar = most_similar_by_vec(positive=[y_pred[0]], topn=1)
        print(f'{slice_time*slice//60:.0f}m{slice_time*slice%60:02.0f}s : {most_similar[0][0]} [{most_similar[0][1]:.2f}]')
        slice += 1
        while time.time() - start < slice_time * slice:
            pass
except KeyboardInterrupt:
    pygame.mixer.quit() # stop the music from playing
    raise
pygame.mixer.quit()

0m00s : H:\Music\Justin Hurwitz\Whiplash (Original Motion Picture Soundt\07 What's Your Name.m4a [0.99]
0m05s : H:\Music\Compilations\Mad Men Music (Music To Watch The Boys &\50 The Driving Instructor.m4a [0.99]
0m10s : H:\Music\Serge Gainsbourg\inédits 57-68\07 - serge gainsbourg - inédits 57-6.mp3 [0.99]
0m15s : H:\Music\Compilations\Mad Men Music (Music To Watch The Boys &\50 The Driving Instructor.m4a [0.98]
0m20s : H:\Music\Compilations\70 Minutes Of Madness - Journeys By DJ S\12 Stratus Static.m4a [0.99]
0m25s : H:\Music\Cypress Hill\Black Sunday\03 Insane In The Brain.mp3 [0.99]
0m30s : H:\Music\Cypress Hill\Original Album Classics_ Cypress Hill\2-05 Lick a Shot.m4a [0.99]
0m35s : H:\Music\Serge Gainsbourg\inédits 57-68\07 - serge gainsbourg - inédits 57-6.mp3 [0.99]
0m40s : H:\Music\Kool Kyle\No Frills 12_\Getting Over.mp3 [0.98]
0m45s : H:\Music\Compilations\Mad Men Music (Music To Watch The Boys &\50 The Driving Instructor.m4a [0.98]
0m50s : H:\Music\Serge Gainsbourg\inédits 

### Determine MP3ToVec for a song not in the library

(Note that this does not effect the exisiting vectors, even though it should.)

In [22]:
mp3s = {}
for vec in mp3tovec:
    pickle_filename = (vec[:-3]).replace('\\', '_').replace('/', '_').replace(':','_') + 'p'
    unpickled = pickle.load(open(dump_directory + '/' + pickle_filename, 'rb'))
    mp3s[unpickled[0]] = unpickled[1]
print(f'{len(mp3s)} MP3s')

7692 MP3s


In [23]:
full_path = "738d01610631b9b78ca839d61db74df3aa1db49d.mp3" # Luis Fonsi - Despacito (30 second sample)
full_path = "H:\\Music\\Joaquin Claussell & Kerri Chandler\\Gilles Peterson In Brazil_ Da Hora\\2-05 Escrávos De Jô (Robust Horns Mi.mp3"
pickle_filename = (full_path[:-3]).replace('\\', '_').replace('/', '_').replace(':','_') + 'p'
y, sr = librosa.load(full_path, mono=True)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels, fmax=sr/2)
track_matrix = []
start = time.time()
slice = 0
while (slice + 1) * slice_size < S.shape[1]:
    log_S = librosa.power_to_db(S[:, slice * slice_size : (slice+1) * slice_size], ref=np.max)
    if np.max(log_S) - np.min(log_S) != 0:
        log_S = (log_S - np.min(log_S)) / (np.max(log_S) - np.min(log_S))
    x = np.expand_dims(np.expand_dims(log_S, axis=2), axis=0)
    y_pred = model.predict(x)
    track_matrix.append(y_pred[0])
    slice += 1

In [24]:
epsilon_distance = 0.01
new_idfs = []
new_vecs = np.array(track_matrix)
for vec_i in new_vecs:
    idf = 1 # because each new vector is in the new mp3 by definition
    for mp3 in mp3s:
        for vec_j in mp3s[mp3]:
            if 1 - np.dot(vec_i, vec_j) / (np.linalg.norm(vec_i) * np.linalg.norm(vec_j)) < epsilon_distance:
                idf += 1
                break
    new_idfs.append(-np.log(idf / (len(mp3s) + 1))) # N + 1
vec = 0
for i, vec_i in enumerate(track_matrix):
    tf = 0
    for vec_j in new_vecs:
        if 1 - np.dot(vec_i, vec_j) / (np.linalg.norm(vec_i) * np.linalg.norm(vec_j)) < epsilon_distance:
            tf += 1
    vec += vec_i * tf * new_idfs[i]

In [26]:
print(f'Songs most similar to {full_path}:')
similar = most_similar_by_vec([vec], topn=5)
for i in similar:
    print(f'{i[0]} [{i[1]:.2f}]')

Songs most similar to H:\Music\Joaquin Claussell & Kerri Chandler\Gilles Peterson In Brazil_ Da Hora\2-05 Escrávos De Jô (Robust Horns Mi.mp3:
H:\Music\Joaquin Claussell & Kerri Chandler\Gilles Peterson In Brazil_ Da Hora\2-05 Escrávos De Jô (Robust Horns Mi.mp3 [1.00]
H:\Music\Unknown Artist\TINSEL TOWNY 2 BONUS CD\09 Track 9.mp3 [0.99]
H:\Music\Sven Väth\In the Mix_ The Sound of the Sixteenth S\24 Universal Love.m4a [0.99]
H:\Music\Bob Sinclar & Dimitri from Paris\Knights of the Playboy Mansion (Mixed by\06 Coma Cat.m4a [0.99]
H:\Music\DJ Tayla\Producer 04\01 Timefields.mp3 [0.99]


### Check similarity of MP3ToVecs

In [18]:
mp3tovec_old = pickle.load(open(dump_directory + '/mp3tovecs/mp3tovec_001.p', 'rb'))
mp3tovec_new = pickle.load(open(dump_directory + '/mp3tovecs/mp3tovec_001v2.p', 'rb'))
i = 0
for vec in mp3tovec:
    if vec in mp3tovec_new:
        vec_i = mp3tovec_old[vec]
        vec_j = mp3tovec_new[vec]
        dist = np.dot(vec_i, vec_j) / (np.linalg.norm(vec_i) * np.linalg.norm(vec_j))
        print(f'{vec} {dist}')
        i += 1
        if i >= 50:
            break

H:\Music\The Pharcyde\Bizarre Ride II The Pharcyde\08 On The DL.mp3 1.0
H:\Music\JJ DOOM\Key to the Kuffs\12 Still Kaps (feat. Khujo Goodie).m4a 1.0
H:\Music\Compilations\Bukem in Session\05 Alone This Way (No Need to Stay).m4a 0.9999999403953552
H:\Music\The Clash\Story of the Clash, Volume 1 (Disc 2)\2-15 Police & Thieves.m4a 0.9999999403953552
H:\Music\James Mason\Electric Soul\I Want Your Love.mp3 1.0
H:\Music\Koop\JCR playlist by Jazzanova\08 Bright Nights (Rima Fusion Mix).mp3 1.0
H:\Music\Ben Westbeech\There's More to Life Than This\08 Sugar.m4a 1.0
H:\Music\Jeff Mills\Blue Potential_ Live With Montpellier Ph\11 4 Art.m4a 0.9999999403953552
H:\Music\Reuben Wilson\Got To Get Your Own\07 Got To Get Your Own.mp3 1.0
H:\Music\40 Winks\The Lucid Effect\Sleep Ritual.mp3 1.0
H:\Music\United Future Organization\UFOs For Real_ Scene 3\12 The Planet Plan (Carl Craig Remix.mp3 0.9999999403953552
H:\Music\Lily Allen\Sheezus (Special Edition)\02 L8 Cmmr.m4a 1.0
H:\Music\Eddie Russ\Take A Loo

### Visualize MP3ToVec embedding with t-SNE

In [27]:
# importing bokeh library for interactive data visualization
import pandas as pd
import bokeh.plotting as bp
from bokeh.models import HoverTool, BoxSelectTool
from bokeh.plotting import figure, show, output_notebook

# defining the chart
output_notebook()
fig = bp.figure(plot_width=700, plot_height=600, title="A map of MP3ToVec vectors",
                tools="pan,wheel_zoom,box_zoom,reset,hover,previewsave",
                x_axis_type=None, y_axis_type=None, min_border=1)

# getting a list of word vectors. limit to 1000. each is of 200 dimensions
word_vectors = [mp3tovec[w] for w in list(mp3tovec.keys())]

# dimensionality reduction. converting the vectors to 2d vectors
from sklearn.manifold import TSNE
tsne_model = TSNE(n_components=2, verbose=1, random_state=0)
tsne_w2v = tsne_model.fit_transform(word_vectors)

# putting everything in a dataframe
tsne_df = pd.DataFrame(tsne_w2v, columns=['x', 'y'])
tsne_df['words'] = list(mp3tovec.keys())

# plotting. the corresponding word appears when you hover on the data point.
fig.scatter(x='x', y='y', source=tsne_df)
hover = fig.select(dict(type=HoverTool))
hover.tooltips={"word": "@words"}
show(fig)

[t-SNE] Computing 91 nearest neighbors...
[t-SNE] Indexed 7692 samples in 0.108s...
[t-SNE] Computed neighbors for 7692 samples in 11.507s...
[t-SNE] Computed conditional probabilities for sample 1000 / 7692
[t-SNE] Computed conditional probabilities for sample 2000 / 7692
[t-SNE] Computed conditional probabilities for sample 3000 / 7692
[t-SNE] Computed conditional probabilities for sample 4000 / 7692
[t-SNE] Computed conditional probabilities for sample 5000 / 7692
[t-SNE] Computed conditional probabilities for sample 6000 / 7692
[t-SNE] Computed conditional probabilities for sample 7000 / 7692
[t-SNE] Computed conditional probabilities for sample 7692 / 7692
[t-SNE] Mean sigma: 20.096825
[t-SNE] KL divergence after 250 iterations with early exaggeration: 80.889595
[t-SNE] KL divergence after 1000 iterations: 1.667775


In [28]:
# last batch is smaller than the others...
mp3tovec_test = pickle.load(open('Pickles/mp3tovecs/mp3tovec_78.p', 'rb'))
for i in mp3tovec_test: print(i)

H:\Music\Nicolas Repac\Swing swing\10 Tambours battants.mp3
H:\Music\Ferry Corsten\It's Time\It's Time (Luke Slater's Rockers Sho.mp3
H:\Music\Compilations\Strut Africa\10 Kruman Dey (Taken from the Album.m4a
H:\Music\Joaquin Claussell & Kerri Chandler\Gilles Peterson In Brazil_ Da Hora\2-05 Escrávos De Jô (Robust Horns Mi.mp3
