# **Exploring the Sentence Transformers Library**

In [3]:
%%capture
!pip install sentence_transformers==0.4.0

In [4]:
from sentence_transformers import SentenceTransformer

In [5]:
model = SentenceTransformer('bert-base-nli-mean-tokens')

100%|██████████| 405M/405M [00:46<00:00, 8.74MB/s]
Some weights of the model checkpoint at /root/.cache/torch/sentence_transformers/sbert.net_models_bert-base-nli-mean-tokens/0_BERT were not used when initializing BertModel: ['classifier.bias', 'classifier.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [6]:
sentence = 'dallas is a beautiful city'

In [7]:
sentence_representation = model.encode(sentence)

In [8]:
print(sentence_representation.shape)

(768,)
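
The `mean-tokens` part of the model name refers to mean pooling: the per-token BERT vectors are averaged into a single fixed-size sentence vector, which is why one sentence yields a `(768,)` array. The sketch below illustrates that pooling step with NumPy on made-up numbers. The function name `mean_pool` and the toy arrays are assumptions for illustration and are not part of the library's API.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: array of shape (batch, tokens, hidden)
    attention_mask:   array of shape (batch, tokens), 1 = real token, 0 = padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# Toy data (hypothetical values): 1 sentence, 4 token slots, hidden size 3.
# The last two slots are padding and must not affect the average.
token_embeddings = np.array([[[1.0, 2.0, 3.0],
                              [3.0, 4.0, 5.0],
                              [9.0, 9.0, 9.0],
                              [9.0, 9.0, 9.0]]])
attention_mask = np.array([[1, 1, 0, 0]])

pooled = mean_pool(token_embeddings, attention_mask)
print(pooled.shape)  # (1, 3)
```

With a hidden size of 768 instead of 3, the same operation would produce the 768-dimensional vector shown above.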


# **Computing Sentence Similarity**

In [9]:
import scipy

In [10]:
from sentence_transformers import util

In [11]:
# Reloading is optional if `model` from the previous section is still in memory.
model = SentenceTransformer('bert-base-nli-mean-tokens')


In [12]:
sentence1 = 'It was a great day'
sentence2 = 'Today was awesome'

In [13]:
sentence1_representation = model.encode(sentence1)
sentence2_representation = model.encode(sentence2)

In [14]:
cosine_sim = util.pytorch_cos_sim(sentence1_representation, sentence2_representation)

In [15]:
print(cosine_sim)

tensor([[0.9313]])
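
Cosine similarity is the dot product of two vectors divided by the product of their norms, so it ranges from -1 to 1 and ignores vector length. As a minimal sketch of what such a score measures, the helper below computes it with plain NumPy. The name `cosine_similarity` and the two toy vectors are assumptions for illustration; this is not the library's implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Return cos(angle) between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors (hypothetical values) that are 45 degrees apart.
u = [1.0, 0.0]
v = [1.0, 1.0]
print(round(cosine_similarity(u, v), 4))  # 0.7071
```

Passing the two NumPy arrays returned by `model.encode` to this helper should give approximately the same value as the tensor printed above.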


# **Finding a Similar Sentence with BERT**

In [16]:
import numpy as np

In [17]:
master_dict = [
    'How to cancel my order?',
    'Please let me know about the cancellation policy?',
    'Do you provide refund?',
    'what is the estimated delivery date of the product?',
    'why my order is missing?',
    'how do i report the delivery of the incorrect items?',
]

In [18]:
inp_question = 'When is my product getting delivered?'

In [19]:
inp_question_representation = model.encode(inp_question, convert_to_tensor=True)

In [20]:
master_dict_representation = model.encode(master_dict, convert_to_tensor=True)

In [21]:
similarity = util.pytorch_cos_sim(inp_question_representation, master_dict_representation)

In [24]:
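# `similarity` is a torch tensor of shape (1, len(master_dict)); use its own argmax and convert to a plain int index.
print('The most similar question in the master dictionary to given input question is: ', master_dict[int(similarity.argmax())])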

The most similar question in the master dictionary to given input question is:  what is the estimated delivery date of the product?
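
Returning only the single best match hides how close the runners-up are. One possible extension, sketched below with NumPy, ranks the candidates by score and returns the top `k` along with their similarities. The helper name `top_k` and the scores are made up for illustration and are not values produced by the model.

```python
import numpy as np

def top_k(scores, candidates, k=3):
    """Return the k highest-scoring candidates as (text, score) pairs."""
    scores = np.asarray(scores, dtype=float).ravel()
    # Negate so that argsort orders from highest to lowest score.
    order = np.argsort(-scores)[:k]
    return [(candidates[i], float(scores[i])) for i in order]

candidates = [
    'How to cancel my order?',
    'Do you provide refund?',
    'what is the estimated delivery date of the product?',
]
# Hypothetical similarity row of shape (1, 3), not real model output.
scores = [[0.31, 0.22, 0.87]]

ranked = top_k(scores, candidates, k=2)
for text, score in ranked:
    print(f'{score:.2f}  {text}')
```

To apply this to the `similarity` tensor computed above, convert it first with `similarity.cpu().numpy()`.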
