# Model results by trivial testing

We can run a trivial test to see how the trained recommender models perform.

In [1]:
import gensim

First, load both our recommender models saved from earlier.

In [2]:
model_cbow = gensim.models.Word2Vec.load('dependencies_recommender_w2v_cbow')
model_skipgram = gensim.models.Word2Vec.load('dependencies_recommender_w2v_skipgram')

Then, as a test dataset, we load the whole dependencies dataset we extracted into a list of lists, with one inner list per entry.

In [3]:
import csv

In [4]:
with open('extracted_dependencies.csv', 'r') as csv_file:
    entries_csv = csv.reader(csv_file, delimiter=',')
    test_entries = list(entries_csv)
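To illustrate the shape of the loaded data, here is a minimal sketch that parses a small in-memory CSV the same way. The package names in `sample_csv` are made up for illustration and are not taken from the real dataset.

```python
import csv
import io

# Hypothetical sample content; the real extracted_dependencies.csv is not shown here.
sample_csv = "numpy,scipy,pandas\nflask\nrequests,urllib3\n"

# csv.reader yields one list of strings per row; rows may differ in length.
sample_entries = list(csv.reader(io.StringIO(sample_csv), delimiter=','))

print(sample_entries)
```

Each inner list corresponds to one row of the file, so an entry with a single dependency becomes a one-element list.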

# Testing the results

For simplicity, we are going to use a trivial testing method. We randomly select a test_entry from test_entries, randomly pop one item out of the test_entry, and then check whether the suggestions made by the recommender model contain the popped item.

We can choose how many of these tests to run.
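The idea of a single test can be sketched independently of the trained models. The helper `hit_after_removal` and the stand-in `toy_recommend` below are hypothetical names used only for illustration; the real test further down calls the word2vec models instead.

```python
import random

def hit_after_removal(entry, recommend, top_n, rng):
    """Remove one random item from a copy of entry and report whether
    recommend(context, top_n) returns it among its suggestions."""
    context = list(entry)  # copy, so the caller's list is left untouched
    missing = context.pop(rng.randrange(len(context)))
    suggestions = recommend(context, top_n)
    return missing in suggestions

# Hypothetical stand-in recommender: always suggests the same fixed items.
def toy_recommend(context, top_n):
    return ['numpy', 'scipy', 'pandas'][:top_n]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(hit_after_removal(['numpy', 'scipy'], toy_recommend, 3, rng))
print(hit_after_removal(['flask', 'jinja2'], toy_recommend, 3, rng))
```

The first call counts as a hit whichever item is removed, because both items are among the fixed suggestions; the second call is a miss for the opposite reason.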

In [5]:
test_iterations = 100

We can also specify how many suggestions for the missing entry each model returns. Let's set the limit to fifty suggestions.

In [6]:
top_values = 50

Now we can begin testing both models. We are going to need the random module to randomly select the indexes of the chosen entries and items.

In [7]:
from random import randint

As a measurement, the ratio of successful to unsuccessful suggestions will be calculated at the end.

In [8]:
successful_suggestions_cbow = 0
successful_suggestions_skipgram = 0

The loop below has to skip some entries, which shows that I did not preprocess the dataset well. I forgot to exclude the transactions that contain only one item, which are useless for analyzing relations between dependencies. The lesson is to always think about how the data is selected: if the right choices are not made, the model and its usage can suffer greatly.
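One way to avoid the problem up front would be to drop the short entries before any sampling happens. This is only a sketch: `keep_multi_item` is a hypothetical helper, and the sample data is made up.

```python
def keep_multi_item(entries):
    """Return only the entries that contain at least two items."""
    return [entry for entry in entries if len(entry) >= 2]

# Made-up sample; real entries come from the extracted CSV.
sample = [['numpy', 'scipy'], ['flask'], [], ['requests', 'urllib3', 'idna']]
print(keep_multi_item(sample))
```

Filtering once like this keeps the sampling loop free of special cases, at the cost of one extra pass over the dataset.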

In [9]:
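i = 0
while i < test_iterations:
    # Draw a random entry without replacement.
    test_entry = test_entries.pop(randint(0, len(test_entries)-1))
    # Entries with fewer than two items leave no context after removal.
    if len(test_entry) <= 1:
        continue
    missing_dependency = test_entry.pop(randint(0, len(test_entry)-1))

    # predict_output_word returns (word, probability) pairs, or None when
    # no context word is in the vocabulary, hence the `or []` fallback.
    predictions_cbow = model_cbow.predict_output_word(test_entry, topn=top_values) or []
    predicted_modules_cbow = [x[0] for x in predictions_cbow]
    predictions_skipgram = model_skipgram.predict_output_word(test_entry, topn=top_values) or []
    predicted_modules_skipgram = [x[0] for x in predictions_skipgram]

    # A test counts as successful when the removed item is among the suggestions.
    if missing_dependency in predicted_modules_cbow:
        successful_suggestions_cbow += 1
    if missing_dependency in predicted_modules_skipgram:
        successful_suggestions_skipgram += 1
    i += 1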

Note: the word2vec model sometimes emits a RuntimeWarning. The reason is unknown to me and needs further inspection.
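As a starting point for that inspection, the warnings could be recorded rather than just printed, using the standard `warnings` module. This is a sketch only: `call_and_collect_warnings` is a hypothetical helper, and `noisy` merely simulates a function that warns; it does not reproduce the actual word2vec warning.

```python
import warnings

def call_and_collect_warnings(func, *args, **kwargs):
    """Call func and return its result together with any warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # do not suppress repeated warnings
        result = func(*args, **kwargs)
    return result, [(w.category.__name__, str(w.message)) for w in caught]

# Stand-in for a model call that emits a warning (illustrative only).
def noisy():
    warnings.warn('overflow encountered', RuntimeWarning)
    return 42

result, caught = call_and_collect_warnings(noisy)
print(result, caught)
```

Wrapping the model calls this way would make it possible to log which test entry triggered each warning.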

We can print the statistics for the CBOW variant:

In [10]:
print('number of successful CBOW recommender suggestions:', successful_suggestions_cbow)
print('number of failed CBOW recommender suggestions:', test_iterations - successful_suggestions_cbow)

number of successful CBOW recommender suggestions: 83
number of failed CBOW recommender suggestions: 17


And also for the Skip-gram variant:

In [11]:
print('number of successful Skip-gram recommender suggestions:', successful_suggestions_skipgram)
print('number of failed Skip-gram recommender suggestions:', test_iterations - successful_suggestions_skipgram)

number of successful Skip-gram recommender suggestions: 68
number of failed Skip-gram recommender suggestions: 32


# Final comparison

In [12]:
print('The limit of', top_values, 'suggestions per one recommendation was used')
print('The CBOW model was successful in', (successful_suggestions_cbow / test_iterations)*100, '% of cases')
print('The Skip-gram model was successful in', (successful_suggestions_skipgram / test_iterations)*100, '% of cases')


The limit of 50 suggestions per one recommendation was used
The CBOW model was successful in 83.0 % of cases
The Skip-gram model was successful in 68.0 % of cases
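With only 100 tests per model, these percentages carry some sampling noise. As a rough, hedged sketch, the success counts printed above can be turned into a rate with a simple binomial standard error. The helper `hit_rate_with_error` is a hypothetical name, and the normal-approximation formula is an assumption added here, not part of the original evaluation.

```python
import math

def hit_rate_with_error(successes, trials):
    """Return the success rate and its binomial standard error."""
    rate = successes / trials
    std_error = math.sqrt(rate * (1 - rate) / trials)
    return rate, std_error

# Counts taken from the output above: 83 and 68 successes out of 100 tests.
for name, successes in [('CBOW', 83), ('Skip-gram', 68)]:
    rate, err = hit_rate_with_error(successes, 100)
    print(f'{name}: {rate:.2f} +/- {err:.3f}')
```

Since the random selection differs between runs, repeating the test with more iterations would give a steadier comparison.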
