# Tutorial: Using Doc2Vec to calculate similarity between documents

## Overview
Duration: 1min

This tutorial will show you how to use Doc2Vec to calculate similarity scores between pairs of documents. Doc2Vec is an NLP technique that represents each document as a vector; it is a generalization of the Word2Vec method.

Before you start, follow the instructions in [README.md](https://github.com/ucals/bettertogether/blob/master/README.md) to download the PDF documents onto your local machine. This corpus, which will be located in the `pdfs/` folder, contains ~1,000 documents. They are essays written by OMSCS CS-6460 students in the Fall 2018 class.

### What you'll learn

- How to preprocess text, getting it ready for NLP techniques, using the [Gensim library](https://radimrehurek.com/gensim/)
- How to train a Doc2Vec model, also using the [Gensim library](https://radimrehurek.com/gensim/)
- How to test it by eye, using the [TextRazor](https://www.textrazor.com/) classification API to check whether the similarities from our trained model are plausible

### What you'll need

- The corpus: a collection of PDF files corresponding to Assignments 2, 3 and 4 from OMSCS CS-6460 students
- Some basic Python knowledge

### Steps:

1. Preprocess the text: 10min
2. Train Doc2Vec model: 5min
3. Test the result: 10min

## Step 1: Preprocess the text

The first step is to preprocess the text, getting it ready to train the Doc2Vec model. We will do this by combining all assignment documents into a single file called `unigram_sentences_all_assignment_X.txt`, where X is the assignment number (the run shown here uses Assignment 1). In this file, each line is one sentence from the assignment documents. We extract the text of each PDF, split it into sentences, and preprocess each sentence with gensim's `preprocess_string` (which tokenizes the text into individual words, removes punctuation, converts to lowercase, etc.) to get a list of words. Then, we join the words of each sentence into one string (separated by spaces) and write it to the file.

In [1]:
# Basic setup
import logging, gensim, os, textract
import itertools as it
from gensim.parsing.preprocessing import preprocess_string, remove_stopwords

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

In [2]:
assignment = "Assignment 1"
save_files = True

assignment_code = assignment.lower().replace(' ', '_')
path = os.path.join(os.getcwd(), "pdfs")

In [3]:
unigram_sentences_filepath = os.path.join(path, 'unigram_sentences_all_' + assignment_code + '.txt')

if save_files:
    with open(unigram_sentences_filepath, 'w', encoding='utf_8') as f:
        for file in os.listdir(path):
            if assignment in str(file):
                text = textract.process(os.path.join(path, file)).decode("utf-8")
                for sent in gensim.summarization.textcleaner.get_sentences(text):
                    processed = preprocess_string(sent)
                    if len(processed) > 0:
                        f.write(" ".join(processed) + "\n")
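
To make the preprocessing step more concrete, here is a minimal standard-library sketch of the kind of cleaning described above. It is only a simplified stand-in: `simple_clean` and the tiny `STOPWORDS` set are hypothetical names introduced for illustration, and gensim's `preprocess_string` also stems words (which is why tokens such as `technolog` appear in the outputs of this tutorial).

```python
import re

# Hypothetical, deliberately tiny stopword list; gensim ships a much larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that"}

def simple_clean(sentence, min_len=3):
    """Lowercase, strip punctuation and digits, drop stopwords and short tokens."""
    text = sentence.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # punctuation and digits become spaces
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_len]

print(simple_clean("The students used 3 mobile devices, and social media!"))
# ['students', 'used', 'mobile', 'devices', 'social', 'media']
```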

Gensim's LineSentence class provides a convenient iterator for working with other gensim components. It streams the documents/sentences from disk, so that you never have to hold the entire corpus in RAM at once. This allows you to scale your modeling pipeline up to potentially very large corpora:

In [4]:
unigram_sentences = gensim.models.word2vec.LineSentence(unigram_sentences_filepath)
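
The streaming idea can be sketched with plain Python. The class below is not gensim's implementation; `LineTokens` is a hypothetical name, and the sketch only shows why a re-iterable object that reopens the file on each pass keeps memory usage low while still allowing several passes over the corpus.

```python
import os
import tempfile

class LineTokens:
    """Re-iterable stream of whitespace-split lines from a text file."""

    def __init__(self, filepath):
        self.filepath = filepath

    def __iter__(self):
        # Reopen the file on every pass so the corpus can be scanned repeatedly
        # without ever loading all of it into memory.
        with open(self.filepath, encoding="utf-8") as f:
            for line in f:
                tokens = line.split()
                if tokens:  # this sketch skips empty lines
                    yield tokens

# Demo on a throwaway file with two preprocessed sentences.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("student interview\ndirect interest\n")
    demo_path = tmp.name

stream = LineTokens(demo_path)
first_pass = list(stream)
second_pass = list(stream)  # a second iteration works because the file is reopened
os.remove(demo_path)

print(first_pass)
# [['student', 'interview'], ['direct', 'interest']]
```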

Now, we need to apply *phrase modeling* to combine tokens that together represent meaningful multi-word concepts (bi-grams and tri-grams). After applying it, `new york` would become `new_york`; `new york times` would become `new_york_times`. 
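
How does the phrase model decide which pairs to join? As far as we understand Gensim's default scorer, a pair of adjacent tokens is scored by how much more often it occurs together than its individual counts would suggest, and it is joined when the score exceeds a threshold. The sketch below assumes that formula and Gensim's defaults (`min_count=5`, `threshold=10.0`); the function name `phrase_score` and all counts are hypothetical.

```python
def phrase_score(count_a, count_b, count_ab, vocab_size, min_count=5):
    """Score a candidate bigram; higher means the pair co-occurs more than chance."""
    return (count_ab - min_count) / (count_a * count_b) * vocab_size

THRESHOLD = 10.0  # assumed default threshold

# Hypothetical counts, for illustration only.
frequent_pair = phrase_score(count_a=20, count_b=15, count_ab=12, vocab_size=1000)
rare_pair = phrase_score(count_a=20, count_b=15, count_ab=5, vocab_size=1000)

print(round(frequent_pair, 2), frequent_pair > THRESHOLD)  # 23.33 True
print(round(rare_pair, 2), rare_pair > THRESHOLD)          # 0.0 False
```

Subtracting `min_count` keeps pairs seen only a handful of times from being joined, no matter how rare their individual words are.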

First, let's create bi-grams from the uni-grams:

In [5]:
bigram_model_filepath = os.path.join(path, 'bigram_model_all_' + assignment_code)
bigram_model = gensim.models.Phrases(unigram_sentences)
if save_files:
    bigram_model.save(bigram_model_filepath)

2018-11-06 11:37:38,715 : INFO : collecting all words and their counts
2018-11-06 11:37:38,718 : INFO : PROGRESS: at sentence #0, processed 0 words and 0 word types
2018-11-06 11:37:38,847 : INFO : PROGRESS: at sentence #10000, processed 46813 words and 32555 word types
2018-11-06 11:37:38,888 : INFO : collected 42725 word types from a corpus of 64536 words (unigram + bigrams) and 13742 sentences
2018-11-06 11:37:38,888 : INFO : using 42725 counts as vocab in Phrases<0 vocab, min_count=5, threshold=10.0, max_vocab_size=40000000>
2018-11-06 11:37:38,889 : INFO : saving Phrases object under /Users/carlossouza/Dropbox/2018/OMSCS/CS-6460-Education-Technology/bettertogether/pdfs/bigram_model_all_assignment_1, separately None
2018-11-06 11:37:38,950 : INFO : saved /Users/carlossouza/Dropbox/2018/OMSCS/CS-6460-Education-Technology/bettertogether/pdfs/bigram_model_all_assignment_1


In [6]:
bigram_sentences_filepath = os.path.join(path, 'bigram_sentences_all_' + assignment_code + '.txt')

if save_files:
    with open(bigram_sentences_filepath, 'w', encoding='utf_8') as f:
        for unigram_sentence in unigram_sentences:
            bigram_sentence = " ".join(bigram_model[unigram_sentence])
            f.write(bigram_sentence + "\n")
            
bigram_sentences = gensim.models.word2vec.LineSentence(bigram_sentences_filepath)



We can check that it worked by examining a small slice of our file:

In [7]:
bigram_sentences = gensim.models.word2vec.LineSentence(bigram_sentences_filepath)

for bigram_sentence in it.islice(bigram_sentences, 3000, 3010):
    print(u' '.join(bigram_sentence))
    print(u'')

student interview

medic profession gain insight method effect futur

direct interest

student conclud human patient simul

prefer technolog advanc medic educ physician provid interview

student ad color gener consensu medic commun simul

supplement replac true human human interact medic educ cadav

dissect virtual_realiti inform real_life

project interest recent ask compani work

“doximity” profession platform physician lower skill healthcar worker rn



As you can see, the algorithm combined some tokens, forming bi-grams like `virtual_realiti` and `real_life`.

Now, let's use the bi-grams to create tri-grams:

In [8]:
trigram_model_filepath = os.path.join(path, 'trigram_model_all_' + assignment_code)
trigram_model = gensim.models.Phrases(bigram_sentences)
if save_files:
    trigram_model.save(trigram_model_filepath)

2018-11-06 11:37:49,378 : INFO : collecting all words and their counts
2018-11-06 11:37:49,379 : INFO : PROGRESS: at sentence #0, processed 0 words and 0 word types
2018-11-06 11:37:49,499 : INFO : PROGRESS: at sentence #10000, processed 44481 words and 32970 word types
2018-11-06 11:37:49,547 : INFO : collected 43344 word types from a corpus of 61388 words (unigram + bigrams) and 13742 sentences
2018-11-06 11:37:49,547 : INFO : using 43344 counts as vocab in Phrases<0 vocab, min_count=5, threshold=10.0, max_vocab_size=40000000>
2018-11-06 11:37:49,548 : INFO : saving Phrases object under /Users/carlossouza/Dropbox/2018/OMSCS/CS-6460-Education-Technology/bettertogether/pdfs/trigram_model_all_assignment_1, separately None
2018-11-06 11:37:49,618 : INFO : saved /Users/carlossouza/Dropbox/2018/OMSCS/CS-6460-Education-Technology/bettertogether/pdfs/trigram_model_all_assignment_1


In [9]:
trigram_sentences_filepath = os.path.join(path, 'trigram_sentences_all_' + assignment_code + '.txt')

if save_files:
    with open(trigram_sentences_filepath, 'w', encoding='utf_8') as f:
        for bigram_sentence in bigram_sentences:
            trigram_sentence = " ".join(trigram_model[bigram_sentence])
            f.write(trigram_sentence + "\n")
            
trigram_sentences = gensim.models.word2vec.LineSentence(trigram_sentences_filepath)
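
Why does running the phrase model twice produce tri-grams? The second pass sees the joined bi-gram as a single token, so it can join that token with its neighbour. Here is a minimal sketch of this idea; `merge_phrases` and the phrase sets are hypothetical, and the greedy left-to-right merge is a simplification rather than Gensim's actual code.

```python
def merge_phrases(tokens, phrases):
    """Greedily join adjacent token pairs that appear in `phrases` with '_'."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # both tokens were consumed
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Hypothetical phrases "learned" by the first and second pass.
bigram_phrases = {("new", "york")}
trigram_phrases = {("new_york", "times")}

first_pass = merge_phrases(["read", "new", "york", "times"], bigram_phrases)
second_pass = merge_phrases(first_pass, trigram_phrases)

print(first_pass)   # ['read', 'new_york', 'times']
print(second_pass)  # ['read', 'new_york_times']
```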



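Finally, we apply both phrase models to the full text of each assignment and write one preprocessed assignment per line:

In [10]: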
trigram_assignments_filepath = os.path.join(path, 'trigram_transformed_assignments_all_' + assignment_code + '.txt')

if save_files:
    with open(trigram_assignments_filepath, 'w', encoding='utf_8') as f:
        for file in os.listdir(path):
            if assignment in str(file):
                text = textract.process(os.path.join(path, file)).decode("utf-8")
                unigram_assignment = preprocess_string(text)
                bigram_assignment = bigram_model[unigram_assignment]
                trigram_assignment = trigram_model[bigram_assignment]
                f.write(" ".join(trigram_assignment) + "\n")

Let's examine a slice of our tri-gram file to check that it worked as well:

In [11]:
for trigram_sentence in it.islice(trigram_sentences, 1000, 1010):
    print(u' '.join(trigram_sentence))
    print(u'')

explor multipl type

game competit strategi consid audienc modifi depth

content order cover benefici

audienc

program experi didn’t want basic

creat tool

gamifi learn program python includ leaderboard user

sens accomplish encourag learn

like cool

project took account audienc like fun wai learn python



The tri-gram model can join three tokens into phrases like `simul_base_learn` (simulation-based learning), although this particular slice happens not to contain one.

### Comparing the original version with the preprocessed version of an assignment

Let's see exactly what we did by randomly selecting an assignment and comparing a slice (the first 2000 characters) of its original version with a slice of its preprocessed version:

In [12]:
from random import randrange

file_list = []
for file in os.listdir(path):
    if assignment in str(file):
        file_list.append(file)

random_index = randrange(len(file_list))
random_filename = file_list[random_index]

with open(trigram_assignments_filepath) as f:
    trigram_assignments_list = f.readlines()

trigram_assignments_list = [x.strip() for x in trigram_assignments_list]

In [13]:
original = textract.process(os.path.join(path, random_filename)).decode("utf-8")
print(original[:2000])

HW1
Luke Rucks
John Yahiro tackled the problem of ghost writing, creating a web app that can identify
assignments written by someone other than the person who claimed to have wrote it. He used
cosine similarity and Dale-Chall readability in combination for this task. Cosign similarity
compares the relative frequency of terms appearing in a given writing sample, whereas DaleChall measures the frequency of use of words outside the most common 3000 words used by 4th
graders. He ended up having a false-positive ghost-written identification rate of 2.7%, while his
thresholds were chosen so that there would be no false negatives. The biggest question regards
his data: there were only 3 students who had written 3 papers, and because of the general lack of
data, there was no test-train split, so his results (though impressive) were almost certainly
massively overfit to the limited data. As a follow-up to this, I would like to see both an
application that could do more and one which is more rob

In [14]:
processed = trigram_assignments_list[random_index]
print(processed[:2000])

luke ruck john yahiro tackl problem ghost write creat web_app identifi assign written person claim wrote cosin similar dale chall readabl combin task cosign similar compar rel frequenc term appear given write sampl dalechal measur frequenc us word outsid common word grader end have fals posit ghost written identif rate threshold chosen fals neg biggest question regard data student written paper gener lack data test train split result impress certainli massiv overfit limit data follow like applic robust actual statist reliabl solut need data provid john addit applic love classif allow probabl distribut written given paper base catalog previous complet on help catch student shirk work person help cheat david_murphi us adapt_learn teach music_theori train student learn thing like rd th chord progress increas difficulti rel student capabl iphon app student identifi chord type harmoni distanc base sound sight app increas difficulti increas tempo tempo varianc base note varianc base student’

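As you can see, the preprocessing worked as intended: the text is lowercased and stemmed, punctuation, numbers and stopwords are removed, and phrases such as `web_app` and `adapt_learn` are joined into single tokens.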

## Step 2: Train Doc2Vec model

Before we train our model, let's define a function `read_corpus` that opens the train/test file (UTF-8 encoded), reads it line by line, tokenizes each line with gensim's `simple_preprocess` (which lowercases the text and splits it into individual words), and yields a list of words. To train the model, we also need to associate a tag with each line of the training corpus. In our case, the tag is the student's name, given by the `get_student_name` function, which looks up the file name in the `file_list` built in cell 12:

In [15]:
def read_corpus(fname, tokens_only=False):
    with open(fname, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if tokens_only:
                yield gensim.utils.simple_preprocess(line)
            else:
                # For training data, add tags
                yield gensim.models.doc2vec.TaggedDocument(gensim.utils.simple_preprocess(line), [get_student_name(i)])
                
def get_student_name(file_index):
    return file_list[file_index].split(" -")[0]
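
Note that the tagging above relies on line `i` of the corpus file corresponding to `file_list[i]`, which only holds if `os.listdir` returns the files in the same order both times (Python documents that order as arbitrary). One way to make this assumption explicit is to pair names and lines directly and check that the counts match. The sketch below uses only the standard library; `TaggedDoc`, `student_name`, `tag_documents` and the student names are hypothetical stand-ins, not gensim objects.

```python
from collections import namedtuple

# Stand-in for gensim's TaggedDocument, for illustration only.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])

def student_name(filename):
    """Extract the student's name from 'Name - Assignment N.pdf'."""
    return filename.split(" -")[0]

def tag_documents(filenames, lines):
    """Pair each corpus line with the student whose file produced it."""
    if len(filenames) != len(lines):
        raise ValueError("number of files and corpus lines must match")
    return [TaggedDoc(line.split(), [student_name(name)])
            for name, line in zip(filenames, lines)]

# Hypothetical file names and preprocessed lines.
filenames = ["Jane Doe - Assignment 1.pdf", "John Roe - Assignment 1.pdf"]
lines = ["educ technolog research", "adapt_learn music_theori"]

docs = tag_documents(filenames, lines)
print(docs[0].tags, docs[0].words)
# ['Jane Doe'] ['educ', 'technolog', 'research']
```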

In [16]:
train_corpus = list(read_corpus(trigram_assignments_filepath))
test_corpus = list(read_corpus(trigram_assignments_filepath, tokens_only=True))

Now, we'll instantiate a Doc2Vec model with a vector size of 300 dimensions that iterates over the training corpus 100 times. We set the minimum word count to 5 in order to discard words with very few occurrences. (Without a variety of representative examples, retaining such infrequent words can often make a model worse!) Typical iteration counts in published 'Paragraph Vectors' results, using tens of thousands to millions of docs, are 10-20. More iterations take more time and eventually reach a point of diminishing returns.

In [17]:
model = gensim.models.doc2vec.Doc2Vec(vector_size=300, min_count=5, epochs=100)
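
To see what the `min_count` setting does, here is a small standard-library sketch of vocabulary pruning: count every token in the corpus and keep only those that occur at least `min_count` times. The `build_vocab` function and the toy documents are hypothetical and only illustrate the idea, not Gensim's internals.

```python
from collections import Counter

def build_vocab(docs, min_count=5):
    """Keep only the words that occur at least `min_count` times in the corpus."""
    counts = Counter(token for doc in docs for token in doc)
    return {word: count for word, count in counts.items() if count >= min_count}

# Toy corpus: 'learn' occurs 6 times, 'python' 5, 'game' 2, 'leaderboard' 1.
toy_docs = [
    ["learn", "python"] * 3,
    ["learn", "python", "game"] * 2,
    ["learn", "leaderboard"],
]

vocab = build_vocab(toy_docs, min_count=5)
print(vocab)
# {'learn': 6, 'python': 5}
```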

Now, let's build a vocabulary. Essentially, the vocabulary is a dictionary (accessible via `model.wv.vocab`) of all of the unique words extracted from the training corpus along with the count (e.g., `model.wv.vocab['penalty'].count` for counts for the word `penalty`).

In [18]:
model.build_vocab(train_corpus)

2018-11-06 11:38:30,942 : INFO : collecting all words and their counts
2018-11-06 11:38:30,943 : INFO : PROGRESS: at example #0, processed 0 words (0/s), 0 word types, 0 tags
2018-11-06 11:38:30,958 : INFO : collected 5569 word types and 180 unique tags from a corpus of 180 examples and 59963 words
2018-11-06 11:38:30,959 : INFO : Loading a fresh vocabulary
2018-11-06 11:38:30,965 : INFO : effective_min_count=5 retains 1884 unique words (33% of original 5569, drops 3685)
2018-11-06 11:38:30,966 : INFO : effective_min_count=5 leaves 53740 word corpus (89% of original 59963, drops 6223)
2018-11-06 11:38:30,972 : INFO : deleting the raw counts dictionary of 5569 items
2018-11-06 11:38:30,973 : INFO : sample=0.001 downsamples 51 most-common words
2018-11-06 11:38:30,974 : INFO : downsampling leaves estimated 47285 word corpus (88.0% of prior 53740)
2018-11-06 11:38:30,984 : INFO : estimated required memory for 1884 words and 300 dimensions: 5715600 bytes
2018-11-06 11:38:30,985 : INFO : re

Now, let's train our model. It should take ~10 seconds to run:

In [19]:
%time model.train(train_corpus, total_examples=model.corpus_count, epochs=model.epochs)

2018-11-06 11:38:34,462 : INFO : training model with 3 workers on 1884 vocabulary and 300 features, using sg=0 hs=0 sample=0.001 negative=5 window=5
2018-11-06 11:38:34,549 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:34,554 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:34,562 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:34,564 : INFO : EPOCH - 1 : training on 59963 raw words (47511 effective words) took 0.1s, 495580 effective words/s
2018-11-06 11:38:34,651 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:34,652 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:34,656 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:34,657 : INFO : EPOCH - 2 : training on 59963 raw words (47449 effective words) took 0.1s, 556451 effective words/s
2018-11-06 11:38:34,724 : INFO : worker

2018-11-06 11:38:35,996 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:35,997 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:35,997 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:35,998 : INFO : EPOCH - 21 : training on 59963 raw words (47401 effective words) took 0.1s, 736468 effective words/s
2018-11-06 11:38:36,064 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:36,066 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:36,067 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:36,068 : INFO : EPOCH - 22 : training on 59963 raw words (47510 effective words) took 0.1s, 703405 effective words/s
2018-11-06 11:38:36,136 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:36,137 : INFO : worker thread finished; awaiting finish of 1 more threads
2018

2018-11-06 11:38:37,303 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:37,305 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:37,309 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:37,309 : INFO : EPOCH - 41 : training on 59963 raw words (47412 effective words) took 0.1s, 693291 effective words/s
2018-11-06 11:38:37,370 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:37,371 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:37,373 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:37,374 : INFO : EPOCH - 42 : training on 59963 raw words (47518 effective words) took 0.1s, 764907 effective words/s
2018-11-06 11:38:37,451 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:37,452 : INFO : worker thread finished; awaiting finish of 1 more threads
2018

2018-11-06 11:38:38,700 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:38,701 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:38,702 : INFO : EPOCH - 61 : training on 59963 raw words (47398 effective words) took 0.1s, 731757 effective words/s
2018-11-06 11:38:38,764 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:38,765 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:38,766 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:38,766 : INFO : EPOCH - 62 : training on 59963 raw words (47413 effective words) took 0.1s, 762978 effective words/s
2018-11-06 11:38:38,823 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:38,825 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:38,828 : INFO : worker thread finished; awaiting finish of 0 more threads
2018

2018-11-06 11:38:39,934 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:39,934 : INFO : EPOCH - 81 : training on 59963 raw words (47533 effective words) took 0.1s, 767864 effective words/s
2018-11-06 11:38:39,996 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:39,996 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:39,997 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:39,998 : INFO : EPOCH - 82 : training on 59963 raw words (47554 effective words) took 0.1s, 785563 effective words/s
2018-11-06 11:38:40,054 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-11-06 11:38:40,059 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-11-06 11:38:40,061 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-11-06 11:38:40,062 : INFO : EPOCH - 83 : training on 59963 raw words (47536 effective word

CPU times: user 16.4 s, sys: 497 ms, total: 16.9 s
Wall time: 6.64 s


In [20]:
model_path = os.path.join(os.getcwd(), "models")
doc2vec_model_filepath = os.path.join(model_path, 'doc2vec_model_' + assignment_code)
if save_files:
    model.save(doc2vec_model_filepath)

2018-11-06 11:38:42,951 : INFO : saving Doc2Vec object under /Users/carlossouza/Dropbox/2018/OMSCS/CS-6460-Education-Technology/bettertogether/models/doc2vec_model_assignment_1, separately None
2018-11-06 11:38:43,007 : INFO : saved /Users/carlossouza/Dropbox/2018/OMSCS/CS-6460-Education-Technology/bettertogether/models/doc2vec_model_assignment_1


## Step 3: Test the result

That's it! Now, let's test the result by first looking at which students are most similar to **Carlos Souza**:

In [21]:
test_student = "Carlos Souza"

similar_doc = model.docvecs.most_similar(test_student) 
print("Most similar documents of " + test_student + " for " + assignment)
for item in similar_doc:
    print(str(item[1])  + " \t" + str(item[0]))

2018-11-06 11:38:46,811 : INFO : precomputing L2-norms of doc weight vectors


Most similar documents of Carlos Souza for Assignment 1
0.38691163063049316 	Andrew Wolfe
0.3522692918777466 	Navnit Belur
0.34605270624160767 	Imad Rahman
0.33395418524742126 	Richard Doan
0.3208683729171753 	Huan Chen
0.3152480125427246 	Langston Chandler
0.30462008714675903 	Shelby Allen
0.30134904384613037 	Mark Wahnish
0.2821483612060547 	Braden Whited
0.2779601514339447 	Daniel Fecher
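
The scores above are cosine similarities between document vectors: 1.0 means the vectors point in the same direction, 0.0 means they are orthogonal. The sketch below shows the computation and the ranking with plain Python; the function names and the two-dimensional toy vectors are hypothetical (our real vectors have 300 dimensions).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(target, vectors, topn=10):
    """Rank all other tags by cosine similarity to `target`, highest first."""
    scores = [(name, cosine_similarity(vectors[target], vec))
              for name, vec in vectors.items() if name != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical toy vectors.
toy_vectors = {"A": [1.0, 0.0], "B": [1.0, 1.0], "C": [0.0, 1.0]}

ranking = rank_by_similarity("A", toy_vectors)
print(ranking[0][0], round(ranking[0][1], 4))  # B 0.7071
print(ranking[1][0], ranking[1][1])            # C 0.0
```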


Our model says that, based on **Assignment 1**, **Carlos Souza**'s most similar student is **Andrew Wolfe**. Let's check whether that is plausible by retrieving the main topics from Carlos Souza's Assignment 1 and Andrew Wolfe's Assignment 1 and comparing them. We will do this using an NLP API called TextRazor.

First, let's define a function `print_top_entities` to retrieve main topics of an assignment, and initialize TextRazor:

In [22]:
def print_top_entities(student, limit=15):
    text = textract.process(os.path.join(path, student + " - " + assignment + ".pdf")).decode("utf-8")
    response = client.analyze(text)
    entities = list(response.entities())
    entities.sort(key=lambda x: x.relevance_score, reverse=True)
    seen = set()
    index = 0
    for entity in entities:
        if entity.id not in seen:
            print(entity.id, entity.relevance_score, entity.confidence_score)
            seen.add(entity.id)
            index += 1
            if index >= limit:
                break

In [23]:
import textrazor

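# Read the API key from an environment variable rather than hard-coding it
# (the variable name TEXTRAZOR_API_KEY is our own choice)
textrazor.api_key = os.environ["TEXTRAZOR_API_KEY"]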
client = textrazor.TextRazor(extractors=["entities", "topics"])

Now, let's compare Carlos Souza's main topics and Andrew Wolfe's main topics in Assignment 1:

In [24]:
print_top_entities("Carlos Souza")

Artificial intelligence 0.7157 5.663
Educational technology 0.6323 2.001
Learning theory (education) 0.6189 2.685
Learning 0.5775 7.618
Web application 0.5759 4.967
Gender 0.5633 4.598
Internet forum 0.5609 2.506
Prototype 0.5608 3.904
Mentorship 0.5398 1.392
Research 0.5227 7.77
Programming language 0.5191 15.69
Conditional (computer programming) 0.4961 3.225
Computer programming 0.4799 9.991
Computer 0.4741 15.63
Computer science 0.4592 10.05


In [25]:
print_top_entities("Andrew Wolfe")

Educational technology 0.8022 3.868
Research 0.6652 6.994
Watson (computer) 0.654 7.869
Teaching method 0.6258 1.319
Software 0.5865 10.98
Information technology 0.5729 3.136
Computer science 0.5696 17.79
Artificial intelligence 0.5579 16.76
Statistics 0.5565 11.62
IBM 0.5542 18.71
Intelligence 0.5393 5.171
Science 0.5307 13.74
Georgia Institute of Technology 0.5217 5.195
Library 0.5169 3.375
Education 0.4798 1.484


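As you can see, the two essays share several of their main topics: Educational technology, Research, Artificial intelligence and Computer science appear in both lists. This overlap is consistent with the similarity our model reported, although a by-eye comparison like this is only a rough sanity check.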
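
To quantify the overlap rather than judging it by eye, we can compute the Jaccard index (shared topics divided by all distinct topics) of the two lists printed above. This is a rough check we add here for illustration; it is not part of the TextRazor API.

```python
# Topic names copied from the two outputs above.
carlos_topics = {
    "Artificial intelligence", "Educational technology", "Learning theory (education)",
    "Learning", "Web application", "Gender", "Internet forum", "Prototype",
    "Mentorship", "Research", "Programming language",
    "Conditional (computer programming)", "Computer programming", "Computer",
    "Computer science",
}
andrew_topics = {
    "Educational technology", "Research", "Watson (computer)", "Teaching method",
    "Software", "Information technology", "Computer science",
    "Artificial intelligence", "Statistics", "IBM", "Intelligence", "Science",
    "Georgia Institute of Technology", "Library", "Education",
}

shared = carlos_topics & andrew_topics
jaccard = len(shared) / len(carlos_topics | andrew_topics)

print(sorted(shared))
# ['Artificial intelligence', 'Computer science', 'Educational technology', 'Research']
print(round(jaccard, 3))
# 0.154
```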

### Next steps

- If you need support, drop me a note at [souza@gatech.edu](mailto:souza@gatech.edu). I will be glad to help.

### Further readings

- [Distributed Representations of Sentences and Documents](https://cs.stanford.edu/~quocle/paragraph_vector.pdf)
- [Doc2Vec Tutorial on the Lee Dataset](https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb)
- [Modern NLP in Python](https://github.com/skipgram/modern-nlp-in-python/blob/master/executable/Modern_NLP_in_Python.ipynb)
- [Gensim - Topic Modelling for Humans - tutorials](https://radimrehurek.com/gensim/tutorial.html)
- [TextRazor API tutorials](https://www.textrazor.com/tutorials)
