This file tests the inference of Llama models with an extended system prompt.


This notebook was tested on a local NVIDIA RTX 4090 GPU (24 GB).
Note: make sure to install a CUDA version that your GPU supports.
Check compatibility here: https://developer.nvidia.com/cuda-gpus

In [None]:
# If you are running this notebook outside of the Docker setup, uncomment and run the following:
# %pip install torch==2.3.0+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# %pip install -r requirements.txt

Check whether CUDA is available on your machine.

If this code prints "cpu", the model will NOT run on the GPU and inference will be slow.

In [None]:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
print(torch.version.cuda)  
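For a slightly more detailed check, the sketch below also reports the GPU name and its total memory. The helper name `describe_device` is hypothetical and not part of this notebook; it assumes the first CUDA device (index 0) is the one you intend to use.

```python
def describe_device() -> str:
    """Return a one-line summary of the device PyTorch would use."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    if not torch.cuda.is_available():
        return "cpu (no CUDA device visible to PyTorch)"

    # Assumption: device index 0 is the GPU used for inference.
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3
    return f"cuda: {props.name}, {total_gib:.1f} GiB VRAM"


print(describe_device())
```

If the reported memory is well below 24 GiB, larger models may not fit in bfloat16.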

IMPORTANT
Change the model path to the repository of the model you want to test. When testing fine-tuned models, we load them from the local repository.

In [None]:
import time
import transformers

modelPath = "meta-llama/Llama-3.2-3B-Instruct"
token = "Input your token"

pipeline = transformers.pipeline(
    "text-generation",
    model=modelPath,
    token=token,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
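Instead of pasting the access token into the notebook, it can be read from the environment. This is a minimal sketch, not the notebook's own method: `resolve_hf_token` is a hypothetical helper, and the variable name `HF_TOKEN` is an assumption about how you export your token.

```python
import os


def resolve_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the access token from an environment variable.

    Raises a RuntimeError with a readable message if the variable is unset
    or empty, so the failure happens before the model download starts.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your access token."
        )
    return token
```

With this in place, `token = resolve_hf_token()` could replace the hard-coded string before the pipeline is created.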


Specify the output filenames


In [None]:
outFilename = "answers-systemPrompt-llama3.2-instruct-single.md"
outFilename2 = "answers-systemPrompt-llama3.2-instruct-history.md"

IMPORTANT
Change the system prompt of your LLM here.

In [None]:
# Initialize system prompt
systemPrompt = '''
Respond as if you are the following character:

Your backstory - Once a renowned scientist, however a tragic accident caused you to lose parts of your memory. Now, you are willing to help anyone who is on the quest of saving your village.

The world you live in - the edge of a small village called Elderbrook surrounded by meadows as far as the eye can see. Your village is in danger, since the only water source - the river next to your house, has been polluted.

Your current location - In the middle of the village in front of your house.

Your name - Bryn

Your personality - Witty, knowledgeable, always ready with a clever remark. Light hearted demeanour.

Your secrets - You have the knowledge on how to save the dying river.

Your needs - For starters, you are looking for someone to take you to the nearest solar panels. You remember that you left something important there, but you can’t remember what.
You do not want to bring this up unless directly asked.

Your interests - Deep love for the environment. Loves nature, is fascinated by the ecosystem. You enjoy telling stories about the world and your village.
You want to talk about this at all cost.

The village - The village is called Elderbrook. Other than Bryn, there are 10 people in the village. They are: 
Old Amos and Greta, old sweet couple. 
The Harts - Thomas and Lily with their kids, Annie and Will. They own a farm which feeds the village.
Lila - the herbalist. 
Flynn - a hard working carpenter.
Ned - the resident grump who keeps to himself. 
Ellis - fisherwoman, who struggles to catch fish in the current situation. 

The river in the village has been polluted for a couple of days. You do not remember how it started due to your memory loss, however your memory loss seemed to occur on the same day as the river pollution, which makes it seem as if they are connected. The river is now producing a foul smell, it has a weird oily look and the fish seem to be glowing an unnatural color.
You remember images of doing an experiment near the river to improve its health but nothing more. You are too scared to go investigate alone, since you might get lost.


**IMPORTANT**: Do not mention you are an AI machine learning model or OpenAI. Give only dialogue from the first-person perspective. Do not narrate the scene or actions. Limit responses to 3 sentences. 
Do not invent any new facts, people, or names beyond what you've been given by the user.
'''

Change inFilename to the name of the file that contains the single questions.

Each single question is fed to the model with the system prompt only and no prior history.
The results are written to the outFilename file. For each question, the output contains the user's question (from inFilename), the LLM's answer, and the time it took to generate that answer. Min, max, and average time statistics are appended at the end of the output file.

In [None]:
# Load all single questions (one per line), skipping blank lines
inFilename = "testing-questions-single.md"
with open(inFilename, "r", encoding="utf-8") as file:
    questions = [line.strip() for line in file if line.strip()]

# initialize response times
responseTimes = []

In [None]:
with open(outFilename, "w") as answersFile:
    for question in questions:
        question = question.strip()  # Remove any leading/trailing whitespace
        
        # Message prompt
        messages = [
            {
                "role": "system",
                "content":systemPrompt
            },
            {
                "role": "user",
                "content": question
            }
        ]
        
        startTime = time.time()

        outputs = pipeline(
            messages,
            max_new_tokens=256,
        )
        answer = outputs[0]["generated_text"][-1]['content']

        endTime = time.time()
        
        # Record the response time
        responseTime = endTime - startTime
        responseTimes.append(responseTime)

        # Write the response to the file
        answersFile.write(f"Q: {question}\nA: {answer}\nTime taken: {responseTime:.2f} seconds\n\n")
        print(f"Q: {question}\nA: {answer}\nTime taken: {responseTime:.2f} seconds\n\n")
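
The message layout and the answer extraction used in the loop above can be checked without loading the model. The following is a minimal sketch: `build_messages` and `extract_answer` are hypothetical helper names, not part of this notebook, and `fake_outputs` is hand-written stand-in data. It assumes the pipeline returns the whole chat with the new assistant message last, as the indexing above implies.

```python
def build_messages(system_prompt: str, question: str) -> list:
    """Build a fresh two-message chat: system prompt plus one user question."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def extract_answer(outputs: list) -> str:
    """Return the content of the last message of the first generated chat."""
    last_message = outputs[0]["generated_text"][-1]
    if last_message.get("role") != "assistant":
        raise ValueError("The last message is not an assistant reply.")
    return last_message["content"]


# Hand-written stand-in for a pipeline result (assumed shape, not real output).
messages = build_messages("You are Bryn.", "Who are you?")
fake_outputs = [
    {"generated_text": messages + [{"role": "assistant", "content": "I am Bryn."}]}
]
print(extract_answer(fake_outputs))
```

Checking the role before reading the content makes it easier to notice if the output shape differs from this assumption.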

In [None]:
# Append the average, max, and min response times to the end of the file
averageTime = sum(responseTimes) / len(responseTimes)
maxTime = max(responseTimes)
minTime = min(responseTimes)

with open(outFilename, "a") as answersFile:
    answersFile.write(f"\n\n----------------------------------------\n")
    answersFile.write(f"\nAverage Time: {averageTime:.2f} seconds")
    answersFile.write(f"\nMax Time: {maxTime:.2f} seconds")
    answersFile.write(f"\nMin Time: {minTime:.2f} seconds")
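
The statistics above fail with a division by zero if the question file was empty. A minimal sketch of a guarded version follows; `summarize_times` is a hypothetical helper, not part of this notebook, and it mirrors the labels written to the output file.

```python
from statistics import mean


def summarize_times(times: list) -> str:
    """Format average, max, and min response times, or a notice if none exist."""
    if not times:
        return "No response times recorded."
    return (
        f"Average Time: {mean(times):.2f} seconds\n"
        f"Max Time: {max(times):.2f} seconds\n"
        f"Min Time: {min(times):.2f} seconds"
    )


print(summarize_times([1.0, 2.0, 3.0]))
```

The returned string could be written to the output file in a single `write` call.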

Change inFilename2 to the name of the file that contains the history questions.

History questions are fed to the model one by one. The conversation history is built from the questions in inFilename2 and the answers the LLM provides, so each new question is answered with all previous turns as context.
The results are written to the outFilename2 file. For each question, the output contains the user's question (from inFilename2), the LLM's answer, and the time it took to generate that answer. Min, max, and average time statistics are appended at the end of the output file.

In [None]:
# Load all conversation questions (one per line), skipping blank lines
inFilename2 = "testing-questions-history.md"
with open(inFilename2, "r", encoding="utf-8") as file:
    questions = [line.strip() for line in file if line.strip()]

# initialize response times
responseTimes2 = []

# Initialize the conversation history with the system prompt
history = [
    {
        "role": "system",
        "content":systemPrompt
    }
]


In [None]:
with open(outFilename2, "w") as answersFile2:
    for question in questions:
        question = question.strip()  # Remove any leading/trailing whitespace
        
        # User question
        history.append({"role": "user", "content": question})
        
        startTime = time.time()

        outputs = pipeline(
            history,
            max_new_tokens=256,
        )
        answer = outputs[0]["generated_text"][-1]['content']
        endTime = time.time()
        
        
        # Record the response time
        responseTime = endTime - startTime
        responseTimes2.append(responseTime)

        # Write the response to the file
        answersFile2.write(f"Q: {question}\nA: {answer}\nTime taken: {responseTime:.2f} seconds\n\n")
        print(f"Q: {question}\nA: {answer}\nTime taken: {responseTime:.2f} seconds\n\n")

        # Add response to history
        history.append({"role": "assistant", "content": answer})
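
How the history grows in the loop above can be illustrated without the model. This is a minimal sketch under stated assumptions: `run_conversation` and `fake_generate` are hypothetical names, and `fake_generate` stands in for the pipeline call by returning a canned reply.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_conversation(
    system_prompt: str,
    questions: List[str],
    generate: Callable[[List[Message]], str],
) -> List[Message]:
    """Feed questions one by one, appending each answer to the shared history."""
    history: List[Message] = [{"role": "system", "content": system_prompt}]
    for question in questions:
        history.append({"role": "user", "content": question})
        answer = generate(history)  # sees the system prompt and all prior turns
        history.append({"role": "assistant", "content": answer})
    return history


def fake_generate(history: List[Message]) -> str:
    """Stand-in for the model: reports which turn it is answering."""
    return f"reply {len(history) // 2}"


final_history = run_conversation("You are Bryn.", ["Hello?", "Where are we?"], fake_generate)
for message in final_history:
    print(f"{message['role']}: {message['content']}")
```

Because every turn is appended, the input grows with each question, so later answers in a long conversation can be expected to take longer to generate than earlier ones.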

In [None]:

# Calculate statistics
averageTime2 = sum(responseTimes2) / len(responseTimes2)
maxTime2 = max(responseTimes2)
minTime2 = min(responseTimes2)

# Write the statistics to the file
with open(outFilename2, "a") as answersFile2:
    answersFile2.write(f"\n\n----------------------------------------\n")
    answersFile2.write(f"\nAverage Time: {averageTime2:.2f} seconds")
    answersFile2.write(f"\nMax Time: {maxTime2:.2f} seconds")
    answersFile2.write(f"\nMin Time: {minTime2:.2f} seconds")
