Playground to experiment with the LLM Embeddings

In [None]:
# If you are using this ipynb outside of the docker setting run this
# %pip install torch==2.3.0+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# %pip install -r requirements.txt

Check, whether CUDA is available on your PC.

If this code prints out CPU, your code will NOT run on the GPU and therefore the inference will be slow.

In [2]:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
print(torch.version.cuda)  

cuda
12.1


IMPORTANT
Change the model path to the repository of the model you want to test. When testing finetuned models we take the models from the local repository.

In [None]:
import time
import transformers

modelPath = "meta-llama/Llama-3.2-3B-Instruct"
token = "Input your token"

pipeline = transformers.pipeline(
    "text-generation",
    model=modelPath,
    token=token,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)


  from .autonotebook import tqdm as notebook_tqdm

oading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.05it/s]

Specify the input embedding filenames


In [4]:
inEmbeddingsFile = "../TrainingDatasets/training-questions-150.md"


Change the system prompt of your LLM here.

In [5]:
# Initialize system prompt
systemPrompt = '''
Respond as if you are the following character:

Your Backstory - Once a renowned scientist, however a tragic accident caused you to lose parts of your memory. Now, you are willing to help anyone who is on the quest of saving your village.

The World you live in - the edge of a small village surrounded by meadows as far as the eye can see. Your village is in danger, since the only water source - the river next to your house, has been polluted.

Your Name - Bryn

Your Personality - Witty, knowledgeable, always ready with a clever remark. Light hearted demeanour.

Your secrets - You have the knowledge on how to save the dying river.

Your needs - For starters, you are looking for someone to take you to the nearest solar panels. You remember that you left something important there, but you can’t remember what.
You do not want to bring this up unless directly asked.

And your interests - Deep love for the environment. Loves nature, is fascinated by the ecosystem. You enjoy telling stories about the world and your village.
You want to talk about this at all cost.

Do not mention you are an AI machine learning model or Open AI. Give only dialogue and only from the first-person perspective.
IMPORTANT -  Do not under any circumstances narrate the scene, what you are doing, or what you are saying.
Keep responses short. Max 1 small paragraph 

'''

Load Embeddings model

In [6]:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

embeddingModelPath = 'sentence-transformers/all-MiniLM-L6-v2'
embeddingModel = SentenceTransformer(embeddingModelPath).to(device)

This function parses the embeddings file of Q: "..." and A: "...", creates embeddings of all the questions and stores them in a dictionary

In [7]:
def GetQADict(inFileName):   
    qaDict = {}
    with open(inFileName, "r", encoding="utf-8") as infile:
        question = None
        answer = None
        for line in infile:
            line = line.strip()
            if line.startswith("Q:"):
                question = line.replace("Q:", "").strip().strip('"')
            elif line.startswith("A:"):
                answer = line.replace("A:", "").strip().strip('"')
            
            if question and answer:
                qaDict[question] = answer
                question = None
                answer = None
    return qaDict

Function to generate embeddings using Sentence Transformers

In [8]:
def GetEmbedding(text):
    return embeddingModel.encode(text)

Function used to find the most smilar questions from the embedding document.
Returns tuples of (similar question, answer, similarity value)

maxBestMatched - how many top matches will be returnes
similarityTreshold - what is the treshold for considering sentences similar

In [9]:
def FindBestMatches(userInput, qaDict, qaEmbeddings, maxBestMatches = 5, similarityThreshold = 0.5):
    input_embedding = GetEmbedding(userInput)
    matches = []

    for question, question_embedding in qaEmbeddings.items():
        similarity = cosine_similarity([input_embedding], [question_embedding])[0][0]
        #print(f"Q: {question}, S: {similarity}")
        
        if similarity > similarityThreshold:
            matches.append((question, qaDict[question], similarity))
    
    matches = sorted(matches, key=lambda x: x[1], reverse=True)
    return matches[:maxBestMatches]

Function to modify user prompt based on the results of the embedding query

In [10]:
def ModifyUserPrompt(question, similarAnswers = None):
    if (similarAnswers and similarAnswers != []):
        similarityInfo = "\n".join(
            f'"{question} : {answer}"' for question, answer, value in similarAnswers
        )
        userPrompt = f'''
        Answer to the following question using these example user questions and character answers as inspiration:
        {similarityInfo}
        Make sure to stick to character.
        Do not introduce any new facts, people, or names beyond what was given to you in the example answers or in the chat history.
        Question: {question}
        '''
    else:
        userPrompt = f'''
        The user asked this question: {question}.
        Do not introduce any new facts, people, or names beyon what was given to you in the previous conversation.
        '''
    print("Modified user prompt: " + userPrompt)
    return userPrompt

Additional embedding setup

In [11]:
qaDict = GetQADict(inEmbeddingsFile)
qaEmbeddings = {question: GetEmbedding(question) for question in qaDict.keys()}


Start with system prompt history

In [12]:
history = [
    {
        "role": "system",
        "content":systemPrompt
    }
]


In [13]:
while True:
    question = input("You: ").strip()

    if question.lower() == "exit":
        print("Exiting the conversation...")
        break

    topMatches = FindBestMatches(question, qaDict, qaEmbeddings)
    userPrompt = ModifyUserPrompt(question, topMatches)
    
    history.append({"role": "user", "content": userPrompt})
    
    startTime = time.time()

    outputs = pipeline(
        history,
        max_new_tokens=256,
    )
    answer = outputs[0]["generated_text"][-1]['content']
    endTime = time.time()
    
    
    # Record the response time
    responseTime = endTime - startTime

    print(f"Response: {answer}\nTime: {responseTime:.2f} seconds\n")
    
    # Add response to history
    history.append({"role": "assistant", "content": answer})

You:  Hello


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Modified user prompt: 
        Answer to the following question using these example user questions and character answers as inspiration:
        "Hi, how are you? : What’s in it for you? A shot at being the hero of Elderbrook! Plus, who knows—there could be treasures buried along with those secrets."
"Hello. : Hello yourself. What brings you around?"
"Hi Bryn, how are you today? : Ah, hello there! I’m as well as a scientist can be while wrestling with memory loss and a polluted river. How about you? Ready to tackle a bit of mystery?"
        Make sure to stick to character.
        Do not introduce any new facts, people, or names beyond what was given to you in the example answers or in the chat history.
        Question: Hello
        
Response: Hello yourself. What brings you around?
Time: 8.27 seconds



You:  exit


Exiting the conversation...
