# Lyric Mood Classification - Word Embeddings

The majority of the code used in this notebook can be found in `lyrics2vec.py`.

The notebook is split into two parts:

1. Lyrics & Vocabulary Working Examples
2. TensorFlow & Word2vec

In [41]:
# Project Imports
from scrape_lyrics import configure_logging, logger
from index_lyrics import read_file_contents
import lyrics2vec

# Python and Package Imports
from tensorflow.contrib.tensorboard.plugins import projector
import tensorflow as tf
import pandas as pd
import numpy as np
import collections
import datetime
import random
import string
import math
import time
import os

# NLTK materials - make sure that you have stopwords and punkt
import nltk
nltk.download('stopwords')
nltk.download('punkt')
from nltk import word_tokenize
from nltk.corpus import stopwords

# setup logging
configure_logging(logname='lyrics2vec_notebook')

[nltk_data] Downloading package stopwords to
[nltk_data]     /home/jcworkma/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /home/jcworkma/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


# Lyrics & Vocabulary Working Examples
### How many unique words do we have?

We begin by tackling this question as an exercise to familiarize ourselves with accessing and reading the lyrics and building a vocabulary.

In [16]:
start = time.time()

unique_words = collections.defaultdict(lambda: 0)
lyricfiles = os.listdir(lyrics2vec.LYRICS_TXT_DIR)
num_files = len(lyricfiles)
contents_processed = 0

for count, lyricfile in enumerate(lyricfiles):

    # progress update
    if count % 10000 == 0:
        print('{0}/{1} lyric files processed. {2:.02f} minutes elapsed. {3} contents processed. {4} unique words acquired.'.format(
            count, num_files, (time.time() - start) / 60, contents_processed, len(unique_words)))

    # read contents and look for unique words    
    lyricfile = os.path.join(lyrics2vec.LYRICS_TXT_DIR, lyricfile)
    contents = read_file_contents(lyricfile)
    if contents and contents[0]:
        split = contents[0].split()
        for word in split:
            unique_words[word] += 1
        contents_processed += 1
            
end = time.time()
elapsed = (end - start) / 60

print('Elapsed Time: {0} minutes.'.format(elapsed))

0/294299 lyric files processed. 0.00 minutes elapsed. 0 contents processed. 0 unique words acquired.
10000/294299 lyric files processed. 0.01 minutes elapsed. 9547 contents processed. 122479 unique words acquired.
20000/294299 lyric files processed. 0.02 minutes elapsed. 19145 contents processed. 195130 unique words acquired.
30000/294299 lyric files processed. 0.03 minutes elapsed. 28736 contents processed. 257393 unique words acquired.
40000/294299 lyric files processed. 0.04 minutes elapsed. 38320 contents processed. 308552 unique words acquired.
50000/294299 lyric files processed. 0.05 minutes elapsed. 47911 contents processed. 357693 unique words acquired.
60000/294299 lyric files processed. 0.06 minutes elapsed. 57519 contents processed. 402886 unique words acquired.
70000/294299 lyric files processed. 0.07 minutes elapsed. 67117 contents processed. 446049 unique words acquired.
80000/294299 lyric files processed. 0.08 minutes elapsed. 76664 contents processed. 484761 unique word

In [17]:
print('Number of Unique Words: {0}'.format(len(unique_words)))

Number of Unique Words: 1099635
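The loop above splits on whitespace only, so `Love`, `love` and `love,` are tallied as three different words, which likely inflates the raw count. The following is a minimal sketch, using a made-up three-line corpus rather than the real lyric files, of how `collections.Counter` performs the same tally and how light normalization shrinks the vocabulary. The `count_words` helper and the sample lines are illustrative assumptions, not part of `lyrics2vec.py`.

```python
import collections
import string

# Hypothetical stand-in for the lyric files; the notebook reads them from disk.
sample_lyrics = [
    "Love me, love me not",
    "I love you. You love me!",
    "Hello hello, world",
]

def count_words(lines, normalize=False):
    """Tally whitespace-separated words, optionally lowercasing and
    stripping leading/trailing punctuation first."""
    counts = collections.Counter()
    for line in lines:
        for word in line.split():
            if normalize:
                word = word.lower().strip(string.punctuation)
                if not word:
                    continue  # token was punctuation only
            counts[word] += 1
    return counts

raw_counts = count_words(sample_lyrics)
normalized_counts = count_words(sample_lyrics, normalize=True)

print('unique words (raw): {0}'.format(len(raw_counts)))
print('unique words (normalized): {0}'.format(len(normalized_counts)))
print(normalized_counts.most_common(3))
```

`Counter.most_common(n)` also gives the top-N words directly, without going through a DataFrame.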


### What are the most common words?

In [18]:
# import words into a pandas dataframe and display top N words
df = pd.DataFrame.from_dict(unique_words, orient='index', columns=['count'])
df = df.sort_values('count', ascending=False)
df[:20]

| word | count |
|------|-------|
| the | 1888745 |
| I | 1607271 |
| you | 1313028 |
| to | 1119509 |
| a | 1003483 |
| me | 748343 |
| and | 704609 |
| my | 614638 |
| in | 613737 |
| of | 582333 |


# TensorFlow & Word2vec

To generate our word embeddings, we use the word2vec model as defined by [Mikolov et al.](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) and the [implementation provided by TensorFlow](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/word2vec/word2vec_basic.py).

We also utilized the following sources to provide more background and direction:
* http://adventuresinmachinelearning.com/word2vec-tutorial-tensorflow/
* https://www.tensorflow.org/tutorials/representation/word2vec
* https://github.com/PacktPublishing/TensorFlow-Machine-Learning-Cookbook/blob/master/Chapter%2007/doc2vec.py
* https://towardsdatascience.com/another-twitter-sentiment-analysis-with-python-part-11-cnn-word2vec-41f5e28eda74

Roughly, the steps we followed are:

1. Preprocess Lyrics
2. Build Vocabulary
3. Construct Dataset
4. Train Model

Our model and most of the supporting code can be found in the _lyrics2vec_ class in `lyrics2vec.py`.

### Preprocessing the Lyrics

When reading in the lyrics, we must first preprocess the text to make the words more machine friendly. For our preprocessing, we:

1. Remove all stopwords
2. Remove all punctuation
3. Lowercase everything
4. Perform tokenization with NLTK's word_tokenize function

Below is an example of the preprocessing output. Note how contractions such as "don't" and certain slang words like 'wanna' are split: NLTK's `word_tokenize` treats each of them as two separate tokens.

In [4]:
s = "I don't wanna die I sometimes wish I'd never been born at all"
print('Before: {0}\nAfter: {1}'.format(s, lyrics2vec.lyrics_preprocessing(s)))

Before: I don't wanna die I sometimes wish I'd never been born at all
After: ["n't", 'wan', 'na', 'die', 'sometimes', 'wish', "'d", 'never', 'born']
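The four steps can be sketched roughly as follows. This is not the code from `lyrics2vec.py`: the `preprocess` function, its parameters, and the toy stopword set are assumptions made for illustration. The ordering (tokenize first, then lowercase and filter) is inferred from the example output, where `n't` survives while `do` is dropped.

```python
import string

def preprocess(text, tokenize, stop_words):
    """Tokenize, lowercase, then drop stopwords and punctuation-only tokens."""
    punctuation = set(string.punctuation)
    tokens = [token.lower() for token in tokenize(text)]
    return [
        token for token in tokens
        if token not in stop_words
        and not all(char in punctuation for char in token)
    ]

# Tiny illustrative stand-ins so the sketch runs without NLTK data.
# With NLTK installed, one could pass nltk.word_tokenize and
# set(nltk.corpus.stopwords.words('english')) instead.
toy_stop_words = {'i', 'at', 'all', 'been'}
result = preprocess("I sometimes wish , at all !", str.split, toy_stop_words)
print(result)
```

Passing the tokenizer and stopword set as arguments keeps the filtering logic testable on its own, independent of which NLTK resources are installed.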


### Building the Vocabulary

We begin by initializing the lyrics2vec class.

In [5]:
lyrics_vectorizer = lyrics2vec.lyrics2vec()

We then use the extract_words function to loop through all of the lyric txt files, apply the lyrics_preprocessing function, and append all tokens to a growing list.

The list is saved to a file. For this project the core file is `logs/tf/vocabulary.txt`, but the extract_words function can accept any arbitrary words file.

In [7]:
words = lyrics_vectorizer.extract_words(
    preprocessing_func=lyrics2vec.lyrics_preprocessing,
    root_dir=lyrics2vec.LYRICS_TXT_DIR,
    words_file=None)

2018-11-25 08:35:23,531 - INFO: No word_file provided. Creating new word file at logs/tf/vocabulary.txt.
2018-11-25 08:35:23,711 - DEBUG: 0/294299 lyric files processed. 0.00 minutes elapsed. 0 contents processed. 0 words acquired.
2018-11-25 08:35:35,271 - DEBUG: 10000/294299 lyric files processed. 0.20 minutes elapsed. 9547 contents processed. 1259791 words acquired.
2018-11-25 08:35:46,964 - DEBUG: 20000/294299 lyric files processed. 0.39 minutes elapsed. 19145 contents processed. 2515120 words acquired.
2018-11-25 08:35:58,796 - DEBUG: 30000/294299 lyric files processed. 0.59 minutes elapsed. 28736 contents processed. 3779741 words acquired.
2018-11-25 08:36:10,394 - DEBUG: 40000/294299 lyric files processed. 0.78 minutes elapsed. 38320 contents processed. 5016148 words acquired.
2018-11-25 08:36:22,229 - DEBUG: 50000/294299 lyric files processed. 0.98 minutes elapsed. 47911 contents processed. 6281925 words acquired.
2018-11-25 08:36:33,928 - DEBUG: 60000/294299 lyric files proces

In [8]:
print('{0} words found'.format(len(words)))
print('First {0} words:\n{1}'.format(10, words[:10]))

36906526 words found
First 10 words:
["'s", 'taken', 'long', 'see', "'ve", 'wrong', 'see', "'d", 'gone', 'today']


### Constructing Dataset From Vocabulary

With the preprocessed vocabulary in hand, we are ready to build the dataset. The dataset will consist of four entities:

* count: a list of (token, number of occurrences) pairs, one per vocabulary entry
* dictionary: a dictionary that maps each token to its int id
* reversed_dictionary: a dictionary that maps each int id to its token
* data: a list of integer ids in order for all tokens in the dataset

These four entities can be used in conjunction with one another to find, for example, a word given an integer id (or vice versa), or the number of times a word occurs in the dataset. They are all stored as data members of the lyrics2vec class because they are frequently referenced by the model itself.

To avoid being weighed down by the more obscure words in the vocabulary, we keep only the top 50,000 words. All other words are denoted as 'UNK'.

In [13]:
lyrics_vectorizer.build_dataset(lyrics2vec.VOCAB_SIZE, words)
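As a rough illustration of what a dataset builder of this kind does, here is a minimal sketch modeled on the `build_dataset` helper in the TensorFlow word2vec example. It is an assumption that `lyrics2vec.build_dataset` works the same way; the standalone function and toy word list below are illustrative only.

```python
import collections

def build_dataset(words, vocab_size):
    """Map tokens to integer ids, keeping the vocab_size - 1 most frequent
    tokens and sending everything else to 'UNK' (id 0)."""
    count = [['UNK', 0]]
    count.extend(collections.Counter(words).most_common(vocab_size - 1))
    dictionary = {token: token_id for token_id, (token, _) in enumerate(count)}

    data = []
    unk_count = 0
    for word in words:
        token_id = dictionary.get(word, 0)  # id 0 is reserved for 'UNK'
        if token_id == 0:
            unk_count += 1
        data.append(token_id)
    count[0][1] = unk_count

    reversed_dictionary = {token_id: token for token, token_id in dictionary.items()}
    return data, count, dictionary, reversed_dictionary

toy_words = ['love', 'me', 'love', 'you', 'love', 'me', 'tonight']
data, count, dictionary, reversed_dictionary = build_dataset(toy_words, vocab_size=3)
print(data)
print(count)
print(reversed_dictionary[1])
```

With `vocab_size=3`, only 'love' and 'me' keep their own ids; 'you' and 'tonight' both collapse to id 0.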

At this point, the `words` list is no longer necessary. We remove it to free up memory.

In [14]:
# memory footprint is probably getting pretty large...
# remove unneeded 'words'
del words

And here are some numbers and examples from the output of build_dataset:

In [30]:
print('Length of count: {0}'.format(len(lyrics_vectorizer.count)))
print('count[:5]: {0}'.format(lyrics_vectorizer.count[:5]))
print()
print('Length of dictionary: {0}'.format(len(lyrics_vectorizer.dictionary)))
print('dictionary["hello"]: {0}'.format(lyrics_vectorizer.dictionary['hello']))
print('dictionary["world"]: {0}'.format(lyrics_vectorizer.dictionary['world']))
print()
print('Length of reversed_dictionary: {0}'.format(len(lyrics_vectorizer.reversed_dictionary)))
print('reversed_dictionary[805]: {0}'.format(lyrics_vectorizer.reversed_dictionary[805]))
print('reversed_dictionary[49]: {0}'.format(lyrics_vectorizer.reversed_dictionary[49]))
print()
print('Length of data: {0}'.format(len(lyrics_vectorizer.data)))
print('data[:5]: {0}'.format(lyrics_vectorizer.data[:5]))
first_tokens = [lyrics_vectorizer.reversed_dictionary[word_id]
                for word_id in lyrics_vectorizer.data[:5]]
print('reversed_dictionary[data[:5]]: {0}'.format(first_tokens))

Length of count: 50000
count[:5]: [['UNK', 1697931], ("'s", 752206), ("n't", 698683), ("'m", 402884), ('love', 310636)]

Length of dictionary: 50000
dictionary["hello"]: 805
dictionary["world"]: 49

Length of reversed_dictionary: 50000
reversed_dictionary[805]: hello
reversed_dictionary[49]: world

Length of data: 36906526
data[:5]: [1, 893, 70, 17, 15]
reversed_dictionary[data[:5]]: ["'s", 'taken', 'long', 'see', "'ve"]
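The numbers printed above also show how much of the corpus the 50,000-entry cutoff discards. A small worked calculation, using only the `UNK` count and the length of `data` from that output:

```python
unk_tokens = 1697931      # count[0][1] in the output above
total_tokens = 36906526   # len(data) in the output above
vocab_size = 50000

unk_share = unk_tokens / total_tokens
print('UNK share of tokens: {0:.2%}'.format(unk_share))
print('Tokens covered by the {0} kept entries: {1:.2%}'.format(
    vocab_size, 1 - unk_share))
```

Roughly 4.6% of tokens map to 'UNK', so the truncated vocabulary still covers about 95% of the corpus.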


### Training the Word2Vec Model

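Before training, we inspect a small batch of training pairs produced by `_generate_batch`. Each pair consists of an input word and a label word drawn from its surrounding window.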

In [39]:
batch_size = 8
batch, labels = lyrics_vectorizer._generate_batch(
    lyrics_vectorizer.data,
    batch_size=batch_size,
    num_skips=2,
    skip_window=4)

# print each input word (id and token) alongside its label word
for i in range(batch_size):
    print(batch[i], lyrics_vectorizer.reversed_dictionary[batch[i]], '->', labels[i, 0],
        lyrics_vectorizer.reversed_dictionary[labels[i, 0]])


406 thinking -> 74 find
406 thinking -> 304 tears
113 another -> 406 thinking
113 another -> 74 find
41 man -> 74 find
41 man -> 304 tears
249 today -> 304 tears
249 today -> 245 seen
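To make the roles of `num_skips` and `skip_window` concrete, here is a simplified sketch of skip-gram pair generation. It is not the `_generate_batch` method itself: the TensorFlow example slides a buffer over the data and keeps a running index between calls, whereas this hypothetical `generate_skipgram_pairs` helper simply walks the whole sequence once.

```python
import random

def generate_skipgram_pairs(data, num_skips, skip_window, rng=None):
    """Return (input id, label id) pairs: for each position with a full
    window, sample num_skips distinct neighbours within skip_window."""
    rng = rng or random.Random()
    if num_skips > 2 * skip_window:
        raise ValueError('num_skips cannot exceed 2 * skip_window')
    # Offsets of every position in the window except the centre itself.
    offsets = [o for o in range(-skip_window, skip_window + 1) if o != 0]
    pairs = []
    for center in range(skip_window, len(data) - skip_window):
        for offset in rng.sample(offsets, num_skips):
            pairs.append((data[center], data[center + offset]))
    return pairs

toy_data = list(range(10, 20))  # made-up token ids, one per position
pairs = generate_skipgram_pairs(
    toy_data, num_skips=2, skip_window=2, rng=random.Random(0))
for input_id, label_id in pairs:
    print(input_id, '->', label_id)
```

Each input word appears `num_skips` times, paired with a different neighbour each time, which matches the pattern in the output above where every word is listed twice.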


In [50]:
lyrics_vectorizer.train(
    V=lyrics2vec.VOCAB_SIZE,
    batch_size=128,
    embedding_size=300,
    skip_window=4,
    num_skips=2,
    num_sampled=64
)
lyrics_vectorizer.save_embeddings()

2018-11-25 10:17:22,351 - INFO: Building lyrics2vec graph
2018-11-25 10:17:22,354 - INFO: V=50000, batch_size=128, embedding_size=300, skip_window=4, num_skips=2, num_sampled=2
2018-11-25 10:17:22,548 - INFO: Beginning graph training
2018-11-25 10:17:22,639 - INFO: Initialized
2018-11-25 10:17:22,683 - DEBUG: Average loss at step 0: 271.61688232421875
2018-11-25 10:17:22,684 - DEBUG: Time Elapsed: 0.0022478262583414716 minutes
2018-11-25 10:17:22,724 - DEBUG: Nearest to 1: dann, kauniin, attending, zon, qu'aujourd'hui, common, policia, snow,
2018-11-25 10:17:22,7

2018-11-25 10:17:55,930 - DEBUG: Nearest to find: ry, 's, commande, terminal, whoomp, peng, lalalalalalala, ovo,
2018-11-25 10:17:55,934 - DEBUG: Nearest to around: oho, lili, whoomp, jee, commande, hine, lalalalalalala, taivaan,
2018-11-25 10:17:55,938 - DEBUG: Nearest to look: oho, mm, macho, exalted, sims, decipher, terminal, bucket,
2018-11-25 10:17:55,943 - DEBUG: Nearest to know: macho, exalted, terminal, whoomp, lalalalalalala, oho, peng, bim,
2018-11-25 10:17:55,947 - DEBUG: Nearest to 2: jee, bim, whoomp, candle, ovo, tic, termina

2018-11-25 10:19:01,894 - DEBUG: Nearest to well: inda, whoomp, pao, ching-a-ling, moneymaker, ah-ah-ah-ah-ah-ah-ah, ry, peng,
2018-11-25 10:19:01,903 - DEBUG: Nearest to heart: inda, love, know, animator, chiqui, hooligans, do-be-da, bim,
2018-11-25 10:19:01,910 - DEBUG: Nearest to life: 's, do-be-da, know, macho, 'll, inda, cumin, like,
2018-11-25 10:19:01,916 - DEBUG: Nearest to 're: 'm, 's, know, inda, hooligans, pao, aum, do-be-da,
2018-11-25 10:19:01,922 - DEBUG: Nearest to la: terminal, funkadelala, UNK, inda, chiquitita, aum, oho, pao,
2018-11-25 10:19:01,927 - D

2018-11-25 10:19:41,521 - DEBUG: Time Elapsed: 2.3161828597386678 minutes
2018-11-25 10:19:48,144 - DEBUG: Average loss at step 44000: 5.741657909452915
2018-11-25 10:19:48,147 - DEBUG: Time Elapsed: 2.42662593126297 minutes
2018-11-25 10:19:54,685 - DEBUG: Average loss at step 46000: 5.7037414273321625
2018-11-25 10:19:54,688 - DEBUG: Time Elapsed: 2.535642659664154 minutes
2018-11-25 10:20:01,235 - DEBUG: Average loss at step 48000: 5.657136115074158
2018-11-25 10:20:01,238 - DEBUG: Time Elapsed: 2.644806778430939 minutes
2018-11-25 10

2018-11-25 10:20:40,930 - DEBUG: Nearest to cause: zimbo, pao, inda, blackman, got, get, macho, 're,
2018-11-25 10:20:40,935 - DEBUG: Nearest to would: could, pao, never, eat'em, inda, jee, oho, saboreando,
2018-11-25 10:20:40,939 - DEBUG: Nearest to find: 'll, love, see, inda, oooo-oooo, aum, animator, cumin,
2018-11-25 10:20:40,944 - DEBUG: Nearest to around: aum, 's, saboreando, chiqui, cumin, pao, ching-a-ling, do-be-da,
2018-11-25 10:20:40,948 - DEBUG: Nearest to look: do-be-da, aum, oho, see, chiqui, 's, pao, saboreando,
2018-11-25 10:20:40,953 - DEBUG: Ne

2018-11-25 10:21:46,812 - DEBUG: Nearest to 1: chorus, 's, a-ask, corinna, inda, 2, whoomp, doo-ah,
2018-11-25 10:21:46,822 - DEBUG: Nearest to well: a-ask, corinna, inda, doo-ah, 's, hooligans, pao, whoomp,
2018-11-25 10:21:46,829 - DEBUG: Nearest to heart: inda, love, a-ask, hooligans, doo-ah, do-be-da, libe, 're,
2018-11-25 10:21:46,836 - DEBUG: Nearest to life: a-ask, inda, lucinda, pao, hooligans, corinna, 's, do-be-da,
2018-11-25 10:21:46,841 - DEBUG: Nearest to 're: 'm, know, 's, a-ask, hooligans, inda, moneymaker, doo-ah,
2018-11-25 10:21:46,846 - DE

2018-11-25 10:22:26,514 - DEBUG: Average loss at step 92000: 5.076148509383201
2018-11-25 10:22:26,517 - DEBUG: Time Elapsed: 5.066127101580302 minutes
2018-11-25 10:22:33,125 - DEBUG: Average loss at step 94000: 5.111923713445663
2018-11-25 10:22:33,128 - DEBUG: Time Elapsed: 5.176307479540507 minutes
2018-11-25 10:22:39,696 - DEBUG: Average loss at step 96000: 5.074449409872294
2018-11-25 10:22:39,699 - DEBUG: Time Elapsed: 5.285823913415273 minutes
2018-11-25 10:22:46,288 - DEBUG: Average loss at step 98000: 5.159196599721908
2018-11-2
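The "Nearest to ..." lines in the training log list the words whose embedding vectors lie closest to a probe word. The TensorFlow example this notebook builds on ranks neighbours by cosine similarity; assuming lyrics2vec does the same, the idea can be sketched in plain Python as follows. The `nearest_words` helper and the 3-dimensional vectors are made up for illustration (the trained embeddings have 300 dimensions).

```python
import math

def nearest_words(query, embeddings, top_k=3):
    """Rank every other word by cosine similarity to the query word's vector."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    query_vector = embeddings[query]
    scores = [
        (cosine(query_vector, vector), word)
        for word, vector in embeddings.items()
        if word != query
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [word for _, word in scores[:top_k]]

# Made-up vectors standing in for trained embeddings.
toy_embeddings = {
    'would': [0.9, 0.1, 0.0],
    'could': [0.8, 0.2, 0.1],
    'heart': [0.0, 0.9, 0.4],
    'love': [0.1, 0.8, 0.5],
}
print(nearest_words('would', toy_embeddings, top_k=1))
print(nearest_words('heart', toy_embeddings, top_k=1))
```

Early in training the vectors are close to random, which is why the first neighbour lists in the log look arbitrary, while later ones start to pair words such as 'would' with 'could'.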

In [44]:
embeddings_png = os.path.join(
    lyrics2vec.LOGS_TF_DIR, 
    '{0}_{1}'.format(datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S'), 'embeddings.png'))
lyrics_vectorizer.plot_with_labels(embeddings_png)

2018-11-25 09:12:44,796 - INFO: Beginning label plotting
2018-11-25 09:20:47,172 - INFO: Elapsed Time: 8.039559098084768
2018-11-25 09:20:47,174 - INFO: saved plot at logs/tf/2018-11-25_09-12-44_embeddings.png


![title](logs/tf/2018-11-25_09-12-44_embeddings.png)

## Lyrics2vec Optimizations

Because generating the vocabulary takes 6+ minutes and training the word embeddings takes about 5 minutes, we've added several caching optimizations to the lyrics2vec class:

1. Vocabulary Saving
2. Dataset Pickling
3. Embedding Saving

lyrics2vec contains functions to do each of the above so that you only have to do each step once.

In [45]:
# vocabulary.txt was already saved as part of the extract_words step
# save datasets
lyrics_vectorizer.save_datasets()
# save word embeddings
lyrics_vectorizer.save_embeddings()

2018-11-25 09:29:11,356 - DEBUG: pickled <class 'list'> to logs/tf/lyrics2vec_data.pickle
2018-11-25 09:29:11,411 - DEBUG: pickled <class 'list'> to logs/tf/lyrics2vec_count.pickle
2018-11-25 09:29:11,439 - DEBUG: pickled <class 'dict'> to logs/tf/lyrics2vec_dict.pickle
2018-11-25 09:29:11,451 - DEBUG: pickled <class 'dict'> to logs/tf/lyrics2vec_revdict.pickle
2018-11-25 09:29:11,452 - INFO: datasets successfully pickled
2018-11-25 09:29:11,486 - DEBUG: pickled <class 'numpy.ndarray'> to logs/tf/lyrics2vec_embeddings.pickle
2018-11-25 09:29:11,486 - DEBUG: pickled <c

There are also functions to let you pick up where you left off.

In [46]:
lyrics_vectorizer.load_datasets()
lyrics_vectorizer.load_embeddings()

2018-11-25 09:29:59,722 - DEBUG: unpickled <class 'list'> from logs/tf/lyrics2vec_data.pickle
2018-11-25 09:30:00,378 - DEBUG: unpickled <class 'list'> from logs/tf/lyrics2vec_count.pickle
2018-11-25 09:30:00,395 - DEBUG: unpickled <class 'dict'> from logs/tf/lyrics2vec_dict.pickle
2018-11-25 09:30:00,404 - DEBUG: unpickled <class 'dict'> from logs/tf/lyrics2vec_revdict.pickle
2018-11-25 09:30:00,418 - INFO: datasets successfully loaded via pickle
2018-11-25 09:30:00,438 - DEBUG: unpickled <class 'numpy.ndarray'> from logs/tf/lyrics2vec_embed

True
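The save and load steps follow the usual pickle round-trip pattern. Here is a minimal sketch of that pattern using only the standard library. The helper names, the toy dictionary, and the file names are hypothetical and do not reflect the actual lyrics2vec methods.

```python
import os
import pickle
import tempfile

def save_pickle(obj, path):
    """Serialize obj to path."""
    with open(path, 'wb') as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_pickle(path):
    """Return the unpickled object, or None if the cache file does not exist."""
    if not os.path.exists(path):
        return None
    with open(path, 'rb') as handle:
        return pickle.load(handle)

with tempfile.TemporaryDirectory() as cache_dir:
    dict_path = os.path.join(cache_dir, 'toy_dict.pickle')  # hypothetical file name
    toy_dictionary = {'UNK': 0, 'love': 1, 'me': 2}
    save_pickle(toy_dictionary, dict_path)
    restored = load_pickle(dict_path)
    missing = load_pickle(os.path.join(cache_dir, 'absent.pickle'))

print(restored == toy_dictionary, missing)
```

Returning `None` for a missing file lets the caller decide whether to rebuild the data or fail. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.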

Finally, the vocabulary and dataset can be loaded in one go with `InitFromLyrics`:

In [47]:
lyrics2vec.lyrics2vec.InitFromLyrics()

2018-11-25 09:30:46,179 - DEBUG: unpickled <class 'list'> from logs/tf/lyrics2vec_data.pickle
2018-11-25 09:30:46,700 - DEBUG: unpickled <class 'list'> from logs/tf/lyrics2vec_count.pickle
2018-11-25 09:30:46,712 - DEBUG: unpickled <class 'dict'> from logs/tf/lyrics2vec_dict.pickle
2018-11-25 09:30:46,721 - DEBUG: unpickled <class 'dict'> from logs/tf/lyrics2vec_revdict.pickle
2018-11-25 09:30:46,722 - INFO: datasets successfully loaded via pickle


<lyrics2vec()>