This script tests how the Universal Sentence Encoder (USE) behaves in certain situations. It supports two functions: you can either calculate the USE similarity value for two phrases, or import an Excel file with two columns, text1 and text2, and calculate the similarity for each row.

**Important!**
If you're running this script for the first time, open "Anaconda Prompt" and run "pip install tensorflow" and "pip install tensorflow_hub".
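
To check whether the packages are already installed before running the cells, a small sketch like the following may help. The helper name `missing_packages` is hypothetical, and the inclusion of `openpyxl` is an assumption: pandas typically relies on it to read and write .xlsx files, though the original instructions do not mention it.

```python
import importlib.util

def missing_packages(names):
    """Return the names in <names> that cannot be found in this environment."""
    # find_spec looks a top-level package up without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]

# Packages this notebook imports. openpyxl is an assumption: pandas usually
# needs it for .xlsx files.
required = ["pandas", "numpy", "tensorflow", "tensorflow_hub", "openpyxl"]
missing = missing_packages(required)
if missing:
    print("Install these first: " + ", ".join(missing))
else:
    print("All required packages are available.")
```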

In [5]:
# load packages and USE
import pandas as pd 
import numpy as np
import tensorflow_hub as hub
USE = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
# If loading fails because the model cannot be located, go to the folder where
# the model was downloaded and delete the model's folder (the one with a long,
# meaningless name). This forces the program to re-download the model from the
# URL above.
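
Finding the downloaded model folder by hand can be tedious, so here is a minimal sketch for locating it. It assumes that tensorflow_hub honours the `TFHUB_CACHE_DIR` environment variable and otherwise caches models in a `tfhub_modules` folder inside the system temp directory; the helper name `tfhub_cache_dir` is hypothetical. If your setup stores models elsewhere, the printed path will not apply.

```python
import os
import tempfile

def tfhub_cache_dir(environ=os.environ):
    """Return the directory where tensorflow_hub is assumed to cache models."""
    # Assumption: TFHUB_CACHE_DIR overrides the default location, which is a
    # "tfhub_modules" folder inside the system temp directory.
    default = os.path.join(tempfile.gettempdir(), "tfhub_modules")
    return environ.get("TFHUB_CACHE_DIR") or default

cache_dir = tfhub_cache_dir()
print(cache_dir)
if os.path.isdir(cache_dir):
    # Each cached model is expected to sit in a subfolder with a long name.
    for entry in os.listdir(cache_dir):
        print("  ", entry)
```

The sketch only lists the folders; deleting the one that belongs to the broken download is left as a manual step.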

In [6]:
# define functions
def USE_similarity(USE, sentence1, sentence2):
    """
    Return the similarity between <sentence1> and <sentence2>, computed as the
    inner product of their Universal Sentence Encoder embeddings. <USE> is an
    already-loaded pretrained USE model.
    """
    USE_output = np.array(USE([sentence1, sentence2]))
    similarity = np.inner(USE_output[0], USE_output[1])
    return similarity

def USE_similarity_excel(USE, filename):
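    """
    Run USE_similarity on each row of the Excel file <filename>.xlsx, which
    must have the columns text1 and text2. Save the result to
    <filename>_output.xlsx and return it as a DataFrame.
    """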
    df = pd.read_excel(f'{filename}.xlsx')
    df['similarity'] = df.apply(lambda x: USE_similarity(USE, x.text1, x.text2), axis=1)
    df.to_excel(f'{filename}_output.xlsx')
    return df
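
USE_similarity uses a plain inner product rather than an explicit cosine similarity. The two agree only when both vectors have unit length, which is assumed (not verified here) to be approximately true for USE embeddings. The sketch below illustrates the relationship on toy vectors, using only the standard library so it runs without the model; the helper names `inner` and `cosine_similarity` are hypothetical.

```python
import math

def inner(u, v):
    """Inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Inner product divided by the product of the vector lengths."""
    return inner(u, v) / (math.sqrt(inner(u, u)) * math.sqrt(inner(v, v)))

u = [3.0, 4.0]
v = [4.0, 3.0]
print(inner(u, v))              # raw inner product, depends on vector length
print(cosine_similarity(u, v))  # length-independent similarity

# After scaling both vectors to unit length, the inner product matches the cosine.
u_unit = [x / math.sqrt(inner(u, u)) for x in u]
v_unit = [x / math.sqrt(inner(v, v)) for x in v]
print(abs(inner(u_unit, v_unit) - cosine_similarity(u, v)) < 1e-9)
```

If the embeddings turned out not to be normalized, dividing by the vector lengths as in `cosine_similarity` would make the scores comparable across sentence pairs.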

In [7]:
USE_similarity(USE, 'apple', 'banana')

0.48818177

In [8]:
# Pass the filename without its extension; the file must be an Excel .xlsx file (CSV is not supported).
USE_similarity_excel(USE, 'test')

Unnamed: 0,text1,text2,similarity
0,apple,banana,0.488182
1,banana,watermelon,0.483925
2,pear,peach,0.421444
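
Because USE_similarity_excel appends ".xlsx" itself, passing "test.xlsx" would look for "test.xlsx.xlsx". One possible way to accept both forms is sketched below; `excel_paths` is a hypothetical helper, not part of the original script, and it only builds the paths without touching any file.

```python
from pathlib import Path

def excel_paths(filename):
    """Return (input_path, output_path) for <filename>, with or without '.xlsx'."""
    path = Path(filename)
    # Drop a trailing ".xlsx" so both "test" and "test.xlsx" are accepted.
    if path.suffix == ".xlsx":
        path = path.with_suffix("")
    return Path(f"{path}.xlsx"), Path(f"{path}_output.xlsx")

print(excel_paths("test"))
print(excel_paths("test.xlsx"))
```

Both calls yield the same pair of paths, which could then be passed to pd.read_excel and DataFrame.to_excel in place of the f-strings.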
