# Machine learning for physical systems (TUHH, Prof. Roland Aydin, Marius Tacke)
## Homework 5 (submission until 13.01.2025): Prompt engineering

Names of all group members: Joshua Windle

**How to submit: This file has to be uploaded into your homework group directory on stud.ip by the submission date given above (end of day).**

This is one of multiple homework projects in this course. Successful completion of these projects can earn you a bonus for the exam. You will work on these projects in groups of two to four students; submissions from individual students will not be accepted. To form a group, join one of the homework groups in stud.ip. Each group will have a separate homework submission directory in stud.ip. Append your solution as code and markdown cells to this Jupyter notebook. Put the names of all group members into the cell above. Execute your code and save your file including all cells' outputs. Upload this pre-executed Jupyter notebook to your homework group directory on stud.ip. In case stud.ip restricts the upload of your Jupyter notebook, change its extension to ".txt". Do not change the name of the notebook. Any other form of submission, such as source code outside of the Jupyter notebook, additional text files, or Word documents, will not be accepted. Your score for this homework project will be communicated to you via email.

In Exercise 9, we familiarized ourselves with accessing LLMs via Hugging Face and attempted to predict values of concrete compressive strength, similar to the third homework project. In this homework project, we will explore the wide range of prompt engineering techniques to improve our initial performance.

Your task is to model the concrete compressive strength dataset using an LLM of your choice. Since you are not required (but are free) to fine-tune the LLM, and an LLM's context window is typically too small to accommodate all data points in this dataset, you may randomly select a subset of data points to model. We would like you to explore how different prompts influence the LLM's performance.

Here are some ideas to inspire you:

- Try zero-shot prompting.
- Try few-shot prompting and present all your few shots at once.
- Try few-shot prompting and present your few shots in batches.
- Implement chain-of-thought prompting.
- Ask the model to generate new artificial input features and use them as the basis for its analysis.
- Ask the model to perform a similarity analysis and compute a weighted average of the few shots as a prediction.
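
As a point of reference for the last idea in this list, the computation the model is asked to carry out can be written down directly. The following is a minimal sketch, assuming Euclidean distance on unscaled features and inverse-distance weights. The function name `weighted_average_prediction` and the toy numbers are illustrative assumptions and are not taken from the dataset.

```python
import math


def weighted_average_prediction(query, shots, eps=1e-9):
    """Predict a target as the inverse-distance weighted mean of the shots.

    query: list of floats (features of the point to predict).
    shots: list of (features, target) pairs with known targets.
    eps:   small constant that avoids division by zero for exact matches.
    """
    weights = []
    for features, _ in shots:
        distance = math.dist(query, features)  # Euclidean distance
        weights.append(1.0 / (distance + eps))
    total = sum(weights)
    return sum(w * target for w, (_, target) in zip(weights, shots)) / total


# Toy example (made-up numbers): the query lies halfway between two shots.
toy_shots = [([1.0, 0.0], 10.0), ([-1.0, 0.0], 30.0)]
print(weighted_average_prediction([0.0, 0.0], toy_shots))
```

With the real dataset the features span very different ranges (for example aggregate mass versus age in days), so standardizing them before measuring distance would likely be necessary.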

Your task includes defining and testing at least six different prompting strategies. You may use up to three of the strategies provided above. Additionally, you must come up with at least three strategies not mentioned in this task description. Be creative and experiment with what you think might work!

To determine how specific your observations are to the model you are using, you need to test at least two different models from different series. Different parameter or version numbers do not count as different models here.

Include a short report reflecting which strategies you implemented, which models you used, which approaches worked well, whether that confirmed or contradicted your expectations, and how the LLMs performed compared to the regression models of the third homework project.

Additionally, we are curious to see who can develop the best-performing prompt. Therefore, we invite you to participate in a prompt engineering competition with this homework: The group whose model prediction results in the lowest mean squared error (MSE) will be rewarded with cookies during the final lecture. To participate in this competition, you need to use the provided split of the concrete compressive strength dataset ("Concrete_Train_Data.csv" and "Concrete_Test_Data.csv") and clearly mention your final smallest MSE on the test data in the very last cell of your homework notebook. Happy prompting!
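
The competition metric is the mean squared error, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. A minimal sketch of how it could be computed from two equally long lists of true and predicted strengths follows; the helper name `mean_squared_error` is chosen here for illustration, and `sklearn.metrics.mean_squared_error` would serve the same purpose.

```python
def mean_squared_error(y_true, y_pred):
    """Return the mean of the squared differences between two sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one value is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)
```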

In [2]:
#!pip install

In [3]:
from huggingface_hub import login
import os
import copy
import pandas as pd
import sklearn.model_selection
import json
import torch
import transformers

In [7]:
login(token=os.environ.get("HF_TOKEN"))  # Read the token from the environment instead of hardcoding it; prompts if unset.

Load and preprocess dataset

In [5]:
# Colab doesn't store the data file, so let's load it from GitHub if it isn't present locally.
try:
  input_file = os.path.join("data", "Concrete_Data.xls")
  data = pd.read_excel(input_file)
except FileNotFoundError:
  raw_url = "https://github.com/worwin/M1807-MLPS/blob/main/HW5%20-%20Prompt%20Engineering/data/Concrete_Data.xls?raw=true"
  data = pd.read_excel(raw_url)

data.dropna(inplace=True)
data.drop_duplicates(inplace=True)

X = data.drop(columns=["Concrete compressive strength(MPa, megapascals) "])
y = data[["Concrete compressive strength(MPa, megapascals) "]]

X_train, X_temp, y_train, y_temp = sklearn.model_selection.train_test_split(X,      y,      test_size=0.3, random_state=42)
X_val,   X_test, y_val,   y_test = sklearn.model_selection.train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

y_train = pd.DataFrame(y_train, index=y_train.index, columns=y.columns)
y_val   = pd.DataFrame(y_val,   index=y_val.index,   columns=y.columns)
y_test  = pd.DataFrame(y_test,  index=y_test.index,  columns=y.columns)

In [9]:
train_batches = 2
batch_size    = 2

X_train_sub = X_train.sample(n=train_batches*batch_size, random_state=42)
y_train_sub = y_train.loc[X_train_sub.index]

X_train_sub_batches = [X_train_sub[i:i + batch_size] for i in range(0, len(X_train_sub), batch_size)]
y_train_sub_batches = [y_train_sub[i:i + batch_size] for i in range(0, len(y_train_sub), batch_size)]

X_val_sub = X_val.sample(n=batch_size, random_state=42)
y_val_sub = y_val.loc[X_val_sub.index]

def to_string(row):
    return {col: row[col] for col in row.index}

messages = [{"role": "system", "content": "You are a helpful assistant. Your task is to predict missing values of concrete "
                                          "compressive strength. Known data are presented in form of a dialogue. Continue "
                                          "this dialogue. Provide your predictions in exactly the format of the dialogue."}]
for X_train_sub_batch, y_train_sub_batch in zip(X_train_sub_batches, y_train_sub_batches):
    messages.append({"role": "user",      "content": json.dumps(X_train_sub_batch.apply(to_string, axis=1).to_dict())})
    messages.append({"role": "assistant", "content": json.dumps(y_train_sub_batch.apply(to_string, axis=1).to_dict())})

messages.append({"role": "user", "content": json.dumps(X_val_sub.apply(to_string, axis=1).to_dict())})

model_name = "meta-llama/Llama-3.2-3B-Instruct"
cache_dir = "./cache"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = transformers.AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path = model_name,
    cache_dir = cache_dir,
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = transformers.AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path = model_name,
    cache_dir                     = cache_dir,
    torch_dtype                   = torch.float16,
    use_safetensors               = True
).to(device)

encoded_messages = tokenizer.apply_chat_template(
    conversation          = messages,
    add_generation_prompt = True,
    tokenize              = True,
    padding               = True,
    return_tensors        = "pt",
    return_dict           = True,
).to(device)

encoded_response = model.generate(
    input_ids         = encoded_messages.data["input_ids"],
    attention_mask    = encoded_messages.data["attention_mask"],
    max_new_tokens    = 500,
    pad_token_id      = tokenizer.pad_token_id
)

response = tokenizer.decode(encoded_response[0])

print(response)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 02 Jan 2025

You are a helpful assistant. Your task is to predict missing values of concrete compressive strength. Known data are presented in form of a dialogue. Continue this dialogue. Provide your predictions in exactly the format of the dialogue.<|eot_id|><|start_header_id|>user<|end_header_id|>

{"957": {"Cement (component 1)(kg in a m^3 mixture)": 143.0, "Blast Furnace Slag (component 2)(kg in a m^3 mixture)": 169.4, "Fly Ash (component 3)(kg in a m^3 mixture)": 142.7, "Water  (component 4)(kg in a m^3 mixture)": 190.7, "Superplasticizer (component 5)(kg in a m^3 mixture)": 8.4, "Coarse Aggregate  (component 6)(kg in a m^3 mixture)": 967.4, "Fine Aggregate (component 7)(kg in a m^3 mixture)": 643.5, "Age (day)": 28.0}, "414": {"Cement (component 1)(kg in a m^3 mixture)": 190.34, "Blast Furnace Slag (component 2)(kg in a m^3 mixture)": 0.0, "Fly Ash (component 3)(kg in a 
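
The printed response contains the whole prompt followed by the model's reply, because `generate` returns the input tokens as well. To score the predictions, the reply has to be isolated and parsed. The following is a minimal sketch under two assumptions: the reply is decoded separately by slicing off the prompt tokens (shown as a comment, since it needs the loaded model), and the model answers with a JSON object shaped like the assistant turns of the dialogue. The helper name `parse_predictions` and the numbers in the example are made up for illustration.

```python
import json

# Assumed decoding step for the cell above (not executed here):
# prompt_length = encoded_messages["input_ids"].shape[1]
# reply_text = tokenizer.decode(encoded_response[0][prompt_length:],
#                               skip_special_tokens=True)


def parse_predictions(reply_text):
    """Extract {row_id: predicted_strength} from the first JSON object in a reply.

    Returns an empty dict if no parseable JSON object is found, so the caller
    can detect a malformed reply and retry or skip it.
    """
    start = reply_text.find("{")
    if start == -1:
        return {}
    try:
        # raw_decode tolerates trailing text after the JSON object.
        parsed, _ = json.JSONDecoder().raw_decode(reply_text[start:])
    except json.JSONDecodeError:
        return {}
    predictions = {}
    for row_id, row in parsed.items():
        # Each row is expected to be {"<target column>": value}.
        value = next(iter(row.values())) if isinstance(row, dict) and row else row
        try:
            predictions[row_id] = float(value)
        except (TypeError, ValueError):
            continue  # skip entries that are not numeric
    return predictions


# Made-up reply text, only to illustrate the expected shape.
example = 'Sure. {"606": {"Concrete compressive strength(MPa, megapascals) ": 25.5}}'
print(parse_predictions(example))
```

The parsed values can then be aligned with `y_val_sub` by row index to compute the error on the validation subset.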

Implementing zero-shot prompting


In [6]:
batch_size = 2

X_val_sub = X_val.sample(n=batch_size, random_state=42)

def to_string(row):
    return {col: row[col] for col in row.index}

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Your task is to predict "
        "missing values of concrete compressive strength. Known data are "
        "presented in form of a dialogue. Continue this dialogue. Provide your "
        "predictions in exactly the format of the dialogue."
    },
    {
        "role": "user",
        "content": json.dumps(X_val_sub.apply(to_string, axis=1).to_dict())
    }
]

model_name = "meta-llama/Llama-3.2-3B-Instruct"
cache_dir = "./cache"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = transformers.AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path = model_name,
    cache_dir = cache_dir,
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = transformers.AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path = model_name,
    cache_dir = cache_dir,
    torch_dtype = torch.float16,
    use_safetensors = True
).to(device)

encoded_messages = tokenizer.apply_chat_template(
    conversation          = messages,
    add_generation_prompt = True,
    tokenize              = True,
    padding               = True,
    return_tensors        = "pt",
    return_dict           = True,
).to(device)

encoded_response = model.generate(
    input_ids         = encoded_messages.data["input_ids"],
    attention_mask    = encoded_messages.data["attention_mask"],
    max_new_tokens    = 500,
    pad_token_id      = tokenizer.pad_token_id
)

response = tokenizer.decode(encoded_response[0])

print(response)



<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 02 Jan 2025

You are a helpful assistant. Your task is to predict missing values of concrete compressive strength. Known data are presented in form of a dialogue. Continue this dialogue. Provide your predictions in exactly the format of the dialogue.<|eot_id|><|start_header_id|>user<|end_header_id|>

{"606": {"Cement (component 1)(kg in a m^3 mixture)": 236.0, "Blast Furnace Slag (component 2)(kg in a m^3 mixture)": 0.0, "Fly Ash (component 3)(kg in a m^3 mixture)": 0.0, "Water  (component 4)(kg in a m^3 mixture)": 194.0, "Superplasticizer (component 5)(kg in a m^3 mixture)": 0.0, "Coarse Aggregate  (component 6)(kg in a m^3 mixture)": 968.0, "Fine Aggregate (component 7)(kg in a m^3 mixture)": 885.0, "Age (day)": 14.0}, "273": {"Cement (component 1)(kg in a m^3 mixture)": 231.75, "Blast Furnace Slag (component 2)(kg in a m^3 mixture)": 0.0, "Fly Ash (component 3)(kg in a m^3 
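
The zero-shot cell differs from the few-shot cell only in the message list: it contains no solved user/assistant pairs before the final query. A small helper could build both variants from the same inputs. The sketch below is one possible way to do that; the function name `build_messages` and the shortened system prompt are assumptions made for illustration.

```python
import json


def build_messages(system_prompt, shot_batches, query):
    """Assemble a chat history: system prompt, solved batches, open query.

    shot_batches: list of (inputs, targets) pairs, each a JSON-serializable dict.
                  An empty list yields a zero-shot prompt.
    query:        JSON-serializable dict of inputs whose targets are unknown.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for inputs, targets in shot_batches:
        messages.append({"role": "user", "content": json.dumps(inputs)})
        messages.append({"role": "assistant", "content": json.dumps(targets)})
    messages.append({"role": "user", "content": json.dumps(query)})
    return messages


# Zero-shot: no solved batches, only the query (one feature shown for brevity).
zero_shot = build_messages(
    "Predict the concrete compressive strength.",
    [],
    {"606": {"Age (day)": 14.0}},
)
print([m["role"] for m in zero_shot])
```

Passing the batches built from `X_train_sub_batches` and `y_train_sub_batches` instead of an empty list would reproduce the few-shot dialogue structure.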