This notebook walks through three steps: 1) using CLIP to embed sentences, 2) comparing those embeddings to "anchor" embeddings, and 3) calculating a "certainty" score for a sentence. It uses made-up sentences for now. See if you can improve on this!


Set up step: Load the libraries needed to use CLIP (ETA 2 min)

---



In [1]:
# Use the Long-CLIP model to handle longer context (up to 248 tokens instead of CLIP's usual 77)

import torch
from transformers import CLIPTokenizer, CLIPTextModel, CLIPModel, CLIPConfig, CLIPProcessor
import pandas as pd

model_id = 'zer0int/LongCLIP-GmP-ViT-L-14'
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_model = CLIPTextModel.from_pretrained(model_id)  # text encoder used for every embedding below

# Full CLIP model and processor, loaded with the longer text context
config = CLIPConfig.from_pretrained(model_id)
config.text_config.max_position_embeddings = 248

model = CLIPModel.from_pretrained(model_id, config=config)
processor = CLIPProcessor.from_pretrained(model_id, config=config)



Set up step: Define a cosine similarity function (We'll use it later on)

In [2]:
# Cosine similarity of two 1-D text embeddings; dividing by the norms means the inputs do not need to be normalized first
def cosine_similarity(a, b):
    cos_sim = torch.dot(a, b) / (torch.norm(a) * torch.norm(b))
    return cos_sim
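
As a quick sanity check on the formula above, here is a minimal sketch in plain Python, with lists standing in for tensors. The name `cosine_similarity_py` and the example vectors are made up for illustration and are not part of the notebook. It shows that the result depends only on direction, not on length.

```python
import math

def cosine_similarity_py(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# A scaled copy points in the same direction, so the similarity is 1.
same_direction = cosine_similarity_py([1.0, 2.0], [10.0, 20.0])
# Perpendicular vectors have similarity 0.
orthogonal = cosine_similarity_py([1.0, 0.0], [0.0, 3.0])
# Opposite directions give -1, so the value can be negative.
opposite = cosine_similarity_py([1.0, 2.0], [-1.0, -2.0])

print(same_direction, orthogonal, opposite)
```

The negative case matters later on: any score built from these similarities has to cope with values below zero.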

Set up step: Define a function for estimating certainty (We'll use it later on)

In [3]:
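def estimate_certainty(embedding):
    # Relies on the globals certainAnchorEmbedding and uncertainAnchorEmbedding, which are defined in a later cell
    # squeeze(0) drops the batch dimension so that torch.dot receives 1-D inputs
    certain_sim = cosine_similarity(embedding.squeeze(0), certainAnchorEmbedding)  # taking .mean(dim=0) over the token embeddings instead of using pooler_output is an option
    uncertain_sim = cosine_similarity(embedding.squeeze(0), uncertainAnchorEmbedding)
    score = certain_sim / (certain_sim + uncertain_sim)
    return score  # between 0 and 1 only when both similarities are positive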
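
To see how the ratio in `estimate_certainty` behaves, here is a small sketch on plain numbers. The similarity values are invented for illustration. `softmax_score` and its `temperature` parameter are a hypothetical alternative, not something the notebook uses. When the two similarities are close, the ratio stays very near 0.5, which may be part of why the toy sentence at the end of this notebook scores about 0.506.

```python
import math

def ratio_score(certain_sim, uncertain_sim):
    """The notebook's formula: the 'certain' share of the total similarity."""
    return certain_sim / (certain_sim + uncertain_sim)

def softmax_score(certain_sim, uncertain_sim, temperature=0.05):
    """Hypothetical alternative: two-way softmax over the similarities.

    Always strictly between 0 and 1. The temperature is an assumed tuning
    knob; smaller values exaggerate small differences.
    """
    c = math.exp(certain_sim / temperature)
    u = math.exp(uncertain_sim / temperature)
    return c / (c + u)

# Two close, positive similarities: the ratio barely moves off 0.5.
close_ratio = ratio_score(0.82, 0.80)
close_softmax = softmax_score(0.82, 0.80)

# One negative similarity: the ratio leaves the [0, 1] range.
negative_ratio = ratio_score(0.30, -0.10)
negative_softmax = softmax_score(0.30, -0.10)

print(close_ratio, close_softmax)
print(negative_ratio, negative_softmax)
```

Whether a softmax is the right replacement is an open design question. The point is only that the ratio compresses small differences and is not guaranteed to stay between 0 and 1.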

Set up step: Define the CERTAIN and UNCERTAIN anchor points

In [58]:
# You could play with these. The point here is that they are "anchors" and should be generic statements (i.e., not about a particular object)

certainAnchorSentences = [
    "I'm absolutely certain.",
    "I know exactly what this is.",
    "For sure.",
    "Definitely.",
    "I’ve used this before.",
    "I’m 100% confident.",
    "I'm positive I’ve seen this before.",
    "I know",
    "I'm sure",
    "I believe"
]
uncertainAnchorSentences = [
    "I honestly have no idea.",
    "I’m not sure.",
    "I’m guessing.",
    "I don't know.",
    "Maybe",
    "Might",
    "Could",
    "May",
    "Don't know",
    "I've never seen this before"
]



Turn the anchor sentences into an anchor embedding (Think of these as known landmarks on a map. The embedding is the coordinate location on that map)

In [59]:
# Get an embedding for each of the sentences in uncertainAnchorSentences and average the embedding

# Tokenize the uncertain anchor sentences
inputs = tokenizer(uncertainAnchorSentences, padding=True, return_tensors="pt")

# Get the text embeddings
with torch.no_grad():
    text_embeddings = text_model(**inputs).pooler_output

# Average the embeddings
uncertainAnchorEmbedding = torch.mean(text_embeddings, dim=0)

# Do the same for certainAnchorSentences

# Tokenize the certain anchor sentences
inputs = tokenizer(certainAnchorSentences, padding=True, return_tensors="pt")

# Get the text embeddings
with torch.no_grad():
    text_embeddings = text_model(**inputs).pooler_output

# Average the embeddings
certainAnchorEmbedding = torch.mean(text_embeddings, dim=0)
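
One choice hidden in the averaging step: a plain mean lets longer embedding vectors pull the anchor toward themselves. The sketch below uses made-up 2-D vectors and hypothetical helper names in plain Python to show the difference between averaging raw vectors and averaging unit-length vectors. Whether this matters for CLIP's `pooler_output` is an assumption that would need checking against the real embeddings.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (the same operation as torch.mean(x, dim=0))."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def unit(v):
    """Scale a vector to length 1 without changing its direction."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Two invented "embeddings": one long, one short, pointing in different directions.
long_vec = [10.0, 0.0]
short_vec = [0.0, 1.0]

# Raw mean: the long vector dominates the result.
plain_mean = mean_vector([long_vec, short_vec])
# Mean of unit vectors: each direction counts equally.
unit_mean = mean_vector([unit(long_vec), unit(short_vec)])

print(plain_mean, unit_mean)
```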


Get an embedding for a sentence and calculate a similarity (In other words: how close on the map is the toy sentence to the certain / uncertain landmark?)

I would expect a sentence like "I have never heard of a French press before" to get a low certainty score, but I'm not sure whether this method will capture that!

In [66]:
toySentence = ["I am certain this is a"] # play around with this. Try out different sentences and see if you can tell whether this is working!
# toySentenceList # you could also make a list of sentences and write a line of code to loop through the list, giving you a certainty score for each sentence in the list

In [None]:
# Sanity check on the embedding shapes (a notebook cell displays only its last expression)
certainAnchorEmbedding.shape
uncertainAnchorEmbedding.unsqueeze(0).shape

torch.Size([1, 768])

In [67]:
# Get an embedding for the toy sentence
inputs = tokenizer(toySentence, padding=True, return_tensors="pt")

with torch.no_grad():
    toySentenceEmbedding = text_model(**inputs).pooler_output

# Calculate the certainty score
certainty_score = estimate_certainty(toySentenceEmbedding)
print(f"Certainty score for '{toySentence}': {certainty_score}")


Certainty score for '['I am certain this is a']': 0.5063498616218567
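
The comment in the toy sentence cell suggests looping over a list of sentences. Below is a minimal sketch of that loop in plain Python. `score_sentences` and `toy_embed` are hypothetical names. `toy_embed` is a stand-in that just counts two cue words so the example runs without the model, and the anchor vectors are invented to match it.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def score_sentences(sentences, embed, certain_anchor, uncertain_anchor):
    """Return (sentence, score) pairs using the notebook's ratio formula.

    `embed` is any callable that maps one sentence to a vector.
    """
    results = []
    for sentence in sentences:
        vec = embed(sentence)
        certain_sim = cosine(vec, certain_anchor)
        uncertain_sim = cosine(vec, uncertain_anchor)
        results.append((sentence, certain_sim / (certain_sim + uncertain_sim)))
    return results

def toy_embed(sentence):
    """Stand-in embedder for demonstration only: counts two cue words."""
    words = sentence.lower().split()
    return [1.0 + words.count("certain"), 1.0 + words.count("maybe")]

toy_results = score_sentences(
    ["I am certain this is a kettle", "Maybe it is a kettle"],
    toy_embed,
    certain_anchor=[2.0, 1.0],
    uncertain_anchor=[1.0, 2.0],
)
for sentence, score in toy_results:
    print(f"{score:.3f}  {sentence}")
```

In the notebook itself, `embed` could be a small wrapper that tokenizes one sentence, runs `text_model`, and returns `pooler_output[0].tolist()`, with the two anchor embeddings converted to lists in the same way. Treat that wiring as an untested assumption.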
