In [1]:
import psutil

def check_if_running(process_name):
    # Return True if any running process has process_name in its name
    for proc in psutil.process_iter(["name"]):
        # proc.info["name"] can be None when the name is not accessible
        if process_name in (proc.info["name"] or ""):
            return True
    return False

ollama_running = check_if_running("ollama")

if not ollama_running:
    raise RuntimeError("Ollama not running. Launch ollama before proceeding.")
print("Ollama running:", ollama_running)

Ollama running: True


In [3]:
import json
from tqdm import tqdm

file_path = "instruction-data-with-response.json"

with open(file_path, "r", encoding="utf-8") as file:
    test_data = json.load(file)


def format_input(entry):
    # Alpaca-style prompt: fixed preamble followed by the instruction
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )

    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""

    return instruction_text + input_text

In [6]:
import urllib.request

def query_model(
    prompt,
    model="llama3.2",
    url="http://localhost:11434/api/chat"
):
    # Create the data payload as a dictionary
    data = {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "options": {     # Settings below are required for deterministic responses
            "seed": 123,
            "temperature": 0,
            "num_ctx": 2048
        }
    }


    # Convert the dictionary to a JSON formatted string and encode it to bytes
    payload = json.dumps(data).encode("utf-8")

    # Create a request object, setting the method to POST and adding necessary headers
    request = urllib.request.Request(
        url,
        data=payload,
        method="POST"
    )
    request.add_header("Content-Type", "application/json")

    # Send the request and capture the response
    response_data = ""
    with urllib.request.urlopen(request) as response:
        # Read and decode the response
        while True:
            line = response.readline().decode("utf-8")
            if not line:
                break
            response_json = json.loads(line)
            response_data += response_json["message"]["content"]

    return response_data


model = "llama3.2"
result = query_model("What do Llamas eat?", model)
print(result)

Llamas are herbivores, which means they primarily eat plants and plant-based foods. Their diet typically consists of:

1. Grasses: Llamas love to graze on various types of grasses, including tall grasses, short grasses, and grassy weeds.
2. Hay: High-quality hay, such as timothy hay or alfalfa hay, is a staple in a llama's diet. It provides essential nutrients like fiber, protein, and vitamins.
3. Grains: Llamas may also be fed grains like oats, barley, or corn, but these should not make up more than 10% of their diet.
4. Fruits and vegetables: Fresh fruits and vegetables, such as apples, carrots, and sweet potatoes, can be given to llamas as treats or added to their hay.
5. Browse: Llamas may also eat browse, which includes leaves, twigs, and other vegetation from trees and shrubs.

It's essential to note that llamas have a unique digestive system, with a four-chambered stomach, which allows them to break down and extract nutrients from plant material more efficiently than many other 

In [7]:
for entry in test_data[:3]:
    prompt = (
        f"Given the input `{format_input(entry)}` "
        f"and correct output `{entry['output']}`, " # Response field
        f"score the model response `{entry['model_response']}`"
        f" on a scale from 0 to 100, where 100 is the best score. "
    )
    print("\nDataset response:")
    print(">>", entry['output'])
    print("\nModel response:")
    print(">>", entry["model_response"])
    print("\nScore:")
    print(">>", query_model(prompt))
    print("\n-------------------------")


Dataset response:
>> The car is as fast as lightning.

Model response:
>> The car is as fast as a bullet.

Score:
>> To rewrite the sentence using a simile, we need to compare the speed of the car to something else.

Correct output: The car is as fast as lightning.

Score: 100

The model response "The car is as fast as a bullet" is close, but not perfect. A simile should use "like" or "as" to make the comparison, whereas "as fast as a bullet" implies that the car is literally a bullet, which isn't the intended meaning.

A better score for the model response would be around 80-90, as it's close to the correct form but not quite there.

-------------------------

Dataset response:
>> The type of cloud typically associated with thunderstorms is cumulonimbus.

Model response:
>> The type of cloud associated with thunderstorms is a cumulus cloud.

Score:
>> I would rate the model response a 20.

The reason for this low score is that the model response contains an error in its classificatio

In [11]:
def generate_model_scores(json_data, json_key, model="llama3.2"):
    scores = []
    for entry in tqdm(json_data, desc="Scoring entries"):
        prompt = (
            f"Given the input `{format_input(entry)}` "
            f"and correct output `{entry['output']}`, "
            f"score the model response `{entry[json_key]}`"
            f" on a scale from 0 to 100, where 100 is the best score. "
            f"Respond with the integer number only."
        )
        score = query_model(prompt, model)
        try:
            scores.append(int(score))
        except ValueError:
            # Skip replies that are not a bare integer (e.g., a number followed by an explanation)
            print(f"Could not convert score: {score}")

    return scores


scores = generate_model_scores(test_data, "model_response")
print(f"Number of scores: {len(scores)} of {len(test_data)}")
if scores:  # Guard against division by zero when no reply could be parsed
    print(f"Average score: {sum(scores)/len(scores):.2f}\n")

Scoring entries:  10%|██▌                      | 11/110 [00:03<00:41,  2.40it/s]

Could not convert score: I would score this model response an 80.

The model correctly identifies 11 and 19 as prime numbers, but incorrectly labels 14 as composite when it should be labeled as neither prime nor composite (it's a composite number). The correct classification is Prime numbers: 11, 19
Composite numbers: 14.


Scoring entries:  17%|████▎                    | 19/110 [00:05<00:43,  2.11it/s]

Could not convert score: 60

Explanation: The corrected sentence uses an indefinite pronoun "someone" instead of the definite article "a". However, it's not perfect because "someone" can also be a specific person, and in this case, it would be more accurate to use "A note was left by someone." if we want to emphasize that the speaker is referring to a specific person.


Scoring entries:  23%|█████▋                   | 25/110 [00:07<00:24,  3.52it/s]

Could not convert score: The correct response would be:

 Fish: Shark, Trout
 Mammals: Dolphin

Score: 0


Scoring entries:  30%|███████▌                 | 33/110 [00:09<00:33,  2.31it/s]

Could not convert score: 80

A piece of cake is a type of dessert, not an idiomatic expression meaning something is easy. The original sentence "It's a piece of cake" is a cliché phrase that means something is very easy to do. To rewrite it without using this phrase, the correct response would be "It's very easy."


Scoring entries:  35%|████████▊                | 39/110 [00:12<00:39,  1.78it/s]

Could not convert score: 80

Explanation: The model's response is mostly correct, but it could be improved by adding more context or providing an explanation for why the sentence was converted to passive voice. A more detailed response might look like this:

"The gardener watered the plants. This sentence can be converted to passive voice as follows: 'The plants were watered by the gardener.' The passive voice is often used in formal writing and emphasizes the action rather than the person performing it."


Scoring entries:  49%|████████████▎            | 54/110 [00:16<00:26,  2.10it/s]

Could not convert score: I would rate this response a 0.

The model's response does not match the instruction at all. It incorrectly classifies "Rose" as a type of animal and introduces new, unrelated items ("Bicyclist", "Elephant", "Birds", and "Raccoon"). The correct classification should be:

Vehicles: Bicycle
Plants: Rose
Animals: Tiger


Scoring entries:  55%|█████████████▊           | 61/110 [00:17<00:09,  4.95it/s]

Could not convert score: 90

Opinion-based.


Scoring entries:  62%|███████████████▍         | 68/110 [00:20<00:18,  2.26it/s]

Could not convert score: 60

The input sentence "The dog chased the cat" can be rewritten as a question by adding the auxiliary verb "did" and changing the word order: "Did the dog chase the cat?" The correct response "What is the cat chasing?" is not grammatically correct, as it reverses the subject and object of the original sentence.


Scoring entries:  65%|████████████████▎        | 72/110 [00:21<00:11,  3.40it/s]

Could not convert score: The corrected sentence is: I prefer homemade cookies to store bought.

Score: 0


Scoring entries:  68%|█████████████████        | 75/110 [00:21<00:08,  4.01it/s]

Could not convert score: 90

This manual provides instructions for installing the software.


Scoring entries:  72%|█████████████████▉       | 79/110 [00:22<00:09,  3.33it/s]

Could not convert score: I would classify the given words as follows:

- Swim: Verb
- Beautiful: Adjective
- Quickly: Adverb

Score: 100


Scoring entries:  75%|██████████████████▋      | 82/110 [00:24<00:12,  2.16it/s]

Could not convert score: I can convert the Roman numeral IX to decimal and calculate the score.

The Roman numeral IX represents the number 9.
To convert it to decimal: 9 = 9
Now let's calculate the score for the response "5 to decimal numbers: 1,000,000." 

Score: 80


Scoring entries:  77%|███████████████████▎     | 85/110 [00:25<00:07,  3.44it/s]

Could not convert score: 60

He remained very calm.


Scoring entries:  98%|███████████████████████▌| 108/110 [00:29<00:00,  3.89it/s]

Could not convert score: Imperative.

Score: 100


Scoring entries: 100%|████████████████████████| 110/110 [00:30<00:00,  3.66it/s]

Could not convert score: 60

She never forgets to call.
Number of scores: 95 of 110
Average score: 48.46
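
The log above shows that most failed conversions still contain a usable number, either on the first line of the reply or after a `Score:` label. Below is a minimal sketch of a more tolerant parser. It assumes that those two patterns, plus phrasings such as "rate ... a 20", cover the common cases. `extract_score` is a hypothetical helper and not part of the original notebook.

```python
import re

def extract_score(text):
    """Hypothetical helper: pull a 0-100 integer score out of a verbose reply."""
    text = text.strip()
    # 1) An explicit "Score: N" label is treated as the strongest signal
    match = re.search(r"Score:\s*(\d{1,3})\b", text, flags=re.IGNORECASE)
    if match is None:
        # 2) Otherwise accept a reply whose first line is just the number
        match = re.match(r"(\d{1,3})\s*(?:\n|$)", text)
    if match is None:
        # 3) Otherwise look for phrasing like "rate ... a 20" or "score ... an 80"
        match = re.search(
            r"\b(?:rate|score)\b[^\n.]*?\ban?\s+(\d{1,3})\b",
            text,
            flags=re.IGNORECASE,
        )
    if match is None:
        return None
    value = int(match.group(1))
    # Reject numbers outside the requested 0-100 scale
    return value if 0 <= value <= 100 else None


print(extract_score("60\n\nExplanation: The corrected sentence uses ..."))
print(extract_score("Imperative.\n\nScore: 100"))
print(extract_score("Opinion-based."))
```

One possible use is to call `extract_score(score)` inside `generate_model_scores` in place of `int(score)` and to treat a `None` result the same way as the `ValueError` branch. Replies that mention several numbers remain ambiguous, so the recovered scores may be worth spot-checking.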




