# Machine learning for physical systems (TUHH, Prof. Roland Aydin, Marius Tacke)
## Homework 5 (submission until 13.01.2025): Prompt engineering

Names of all group members: Joshua Windle

**How to submit: This file has to be uploaded into your homework group directory on stud.ip by the submission date given above (end of day).**

This is one of multiple homework projects in this course. Successful completion of these projects can earn you a bonus for the exam. You will work on these projects in groups of two to four students; submissions from individual students will not be accepted. To form a group, join one of the homework groups in stud.ip. Each group will have a separate homework submission directory in stud.ip. Append your solution as code and markdown cells to this Jupyter notebook. Put the names of all group members into the cell above. Execute your code and save your file including all cells' outputs. Upload this pre-executed Jupyter notebook to your homework group directory on stud.ip. In case stud.ip restricts the upload of your Jupyter notebook, change its extension to ".txt". Do not change the name of the notebook. Any other form of submission, such as source code outside of the Jupyter notebook, additional text files, or Word documents, will not be accepted. Your score for this homework project will be communicated to you via email.

In Exercise 9, we familiarized ourselves with accessing LLMs via Hugging Face and attempted to predict values of concrete compressive strength, similar to the third homework project. In this homework project, we will explore the wide range of prompt engineering techniques to improve our initial performance.

Your task is to model the concrete compressive strength dataset using an LLM of your choice. Since you are not required (but are free) to fine-tune the LLM, and an LLM's context window is typically too small to accommodate all data points in this dataset, you may randomly select a subset of data points to model. We would like you to explore how different prompts influence the LLM's performance.
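
As a rough illustration of the subset idea above, the following sketch draws a few random rows and renders them as plain text that fits into a prompt. The helper name `sample_rows_as_text`, the column names, and the toy values are assumptions made for this example and are not taken from the dataset.

```python
import pandas as pd


def sample_rows_as_text(df: pd.DataFrame, n: int, seed: int = 0) -> str:
    """Draw n random rows and render one 'name=value, ...' line per row."""
    subset = df.sample(n=n, random_state=seed)
    lines = []
    for _, row in subset.iterrows():
        # One decimal keeps the prompt short; adjust the precision as needed.
        lines.append(", ".join(f"{col}={row[col]:.1f}" for col in df.columns))
    return "\n".join(lines)


# Toy stand-in for the concrete dataset (values invented for illustration).
toy = pd.DataFrame({
    "cement": [300.0, 250.0, 400.0, 350.0],
    "water": [180.0, 190.0, 160.0, 170.0],
    "age": [28.0, 7.0, 90.0, 28.0],
    "strength": [35.0, 20.0, 60.0, 45.0],
})

print(sample_rows_as_text(toy, n=2))
```

Fixing `seed` makes the selected subset reproducible, which helps when comparing prompts on the same rows.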

Here are some ideas to inspire you:

- Try zero-shot prompting.
- Try few-shot prompting and present all your few shots at once.
- Try few-shot prompting and present your few shots in batches.
- Implement chain-of-thought prompting.
- Ask the model to generate new artificial input features and use them as the basis for its analysis.
- Ask the model to perform a similarity analysis and compute a weighted average of the few shots as a prediction.
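
The first three ideas in the list differ mainly in how the prompt text is assembled. A minimal sketch, assuming each sample is a dictionary of feature names to values; the function names and the prompt wording are hypothetical and only one of many possible phrasings:

```python
from typing import Dict, List, Tuple

Sample = Dict[str, float]
Shot = Tuple[Sample, float]  # (features, known strength in MPa)


def describe(sample: Sample) -> str:
    """Render a sample as 'name: value' pairs."""
    return ", ".join(f"{name}: {value}" for name, value in sample.items())


def zero_shot_prompt(query: Sample) -> str:
    """Ask for a prediction without showing any examples."""
    return (
        "Predict the concrete compressive strength in MPa.\n"
        f"Mixture: {describe(query)}\n"
        "Answer with a single number."
    )


def few_shot_prompt(shots: List[Shot], query: Sample) -> str:
    """Show all known examples at once, then ask for the prediction."""
    examples = "\n".join(
        f"Mixture: {describe(x)} -> Strength: {y} MPa" for x, y in shots
    )
    return f"Known examples:\n{examples}\n\n{zero_shot_prompt(query)}"


def batches(shots: List[Shot], size: int) -> List[List[Shot]]:
    """Split the shots into consecutive batches for multi-turn prompting."""
    return [shots[i:i + size] for i in range(0, len(shots), size)]


# Invented values, for illustration only.
shots = [({"cement": 300.0, "age": 28.0}, 35.0),
         ({"cement": 400.0, "age": 90.0}, 60.0),
         ({"cement": 250.0, "age": 7.0}, 20.0)]
print(few_shot_prompt(shots, {"cement": 350.0, "age": 28.0}))
```

For batched few-shot prompting, each element of `batches(shots, size)` could be sent as its own message before the final query.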

Your task includes defining and testing at least six different prompting strategies. You may use up to three of the strategies provided above. Additionally, you must come up with at least three strategies not mentioned in this task description. Be creative and experiment with what you think might work!

To determine how specific your observations are to the model you are using, you need to test at least two different models from different series. Different parameter or version numbers do not count as different models here.
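
One way to organize this comparison is a grid over models and prompting strategies. The sketch below is only an outline: `fake_model_a` and `fake_model_b` are hypothetical stand-ins that return canned text, and in practice each would wrap a call to a real LLM from a different series.

```python
from itertools import product
from typing import Callable, Dict, Tuple


# Hypothetical stand-ins for two LLMs; they ignore the prompt on purpose.
def fake_model_a(prompt: str) -> str:
    return "35.0"


def fake_model_b(prompt: str) -> str:
    return "I estimate 40.5 MPa."


models: Dict[str, Callable[[str], str]] = {
    "model_a": fake_model_a,
    "model_b": fake_model_b,
}

# Each strategy maps a textual mixture description to a full prompt.
strategies: Dict[str, Callable[[str], str]] = {
    "zero_shot": lambda m: f"Predict the strength in MPa for: {m}",
    "chain_of_thought": lambda m: (
        f"Think step by step, then give the strength in MPa for: {m}"
    ),
}


def run_grid(mixture: str) -> Dict[Tuple[str, str], str]:
    """Query every model with every strategy and collect the raw replies."""
    results = {}
    for (m_name, model), (s_name, build) in product(
        models.items(), strategies.items()
    ):
        results[(m_name, s_name)] = model(build(mixture))
    return results


grid = run_grid("cement=300, water=180, age=28")
for key, answer in grid.items():
    print(key, "->", answer)
```

Keeping the raw replies per (model, strategy) pair makes it easier to see whether an observation holds for both models or only for one.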

Include a short report reflecting which strategies you implemented, which models you used, which approaches worked well, whether that confirmed or contradicted your expectations, and how the LLMs performed compared to the regression models of the third homework project.

Additionally, we are curious to see who can develop the best-performing prompt. Therefore, we invite you to participate in a prompt engineering competition with this homework: The group whose model prediction results in the lowest mean squared error (MSE) will be rewarded with cookies during the final lecture. To participate in this competition, you need to use the provided split of the concrete compressive strength dataset ("Concrete_Train_Data.csv" and "Concrete_Test_Data.csv") and clearly mention your final smallest MSE on the test data in the very last cell of your homework notebook. Happy prompting!
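
The competition metric is the mean squared error, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. Since an LLM replies with free text, the number has to be extracted first. A minimal sketch, assuming the first number in the reply is the prediction (which may not hold for chain-of-thought answers); `parse_strength` and the example replies are hypothetical:

```python
import re
from typing import List, Optional

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_strength(answer: str) -> Optional[float]:
    """Return the first number in the model's reply, or None if there is none."""
    match = _NUMBER.search(answer)
    return float(match.group()) if match else None


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented replies and targets, for illustration only.
replies = ["About 30.0 MPa", "42.5"]
preds = [parse_strength(r) for r in replies]
print(mean_squared_error([32.0, 40.5], preds))  # (2**2 + 2**2) / 2 = 4.0
```

Replies for which `parse_strength` returns `None` need separate handling, for example a retry or a fallback value, before the MSE is computed.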

In [1]:
from huggingface_hub import login
import copy
import json
import keras
import matplotlib.pyplot
import numpy
import os
import pandas            # TA uses pandas instead of pd
import pandas as pd
import sklearn.svm
import sklearn.tree
import sklearn.metrics
import sklearn.pipeline
import sklearn.ensemble
import sklearn.neighbors
import sklearn.linear_model
import sklearn.preprocessing
import sklearn.model_selection
import torch
import transformers

In [2]:
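# Read the access token from the HF_TOKEN environment variable instead of
# hard-coding it; if the variable is unset, login() prompts for the token.
login(token=os.environ.get("HF_TOKEN"))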

Load and preprocess dataset

In [3]:
# Colab doesn't store the data file, so let's load it from GitHub if it isn't present locally.
input_file = os.path.join("data", "Concrete_Data.xls")
try:
    data = pd.read_excel(input_file)
except FileNotFoundError:
    raw_url = "https://github.com/worwin/M1807-MLPS/blob/main/HW5%20-%20Prompt%20Engineering/data/Concrete_Data.xls?raw=true"
    data = pd.read_excel(raw_url)

data.dropna(inplace=True)
data.drop_duplicates(inplace=True)

X = data.drop(columns=["Concrete compressive strength(MPa, megapascals) "])
y = data[["Concrete compressive strength(MPa, megapascals) "]]

X_train, X_temp, y_train, y_temp = sklearn.model_selection.train_test_split(X,      y,      test_size=0.3, random_state=42)
X_val,   X_test, y_val,   y_test = sklearn.model_selection.train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

y_train = pd.DataFrame(y_train, index=y_train.index, columns=y.columns)
y_val   = pd.DataFrame(y_val,   index=y_val.index,   columns=y.columns)
y_test  = pd.DataFrame(y_test,  index=y_test.index,  columns=y.columns)

In [None]:
batch_size = 2
X_val_sub = X_val.sample(n=batch_size, random_state=42)

predictions = {
    606: 33.67, 
    273: 32.8
}

zero_shot_df = pd.DataFrame.from_dict(predictions, orient="index", columns=["Concrete compressive strength(MPa, megapascals) "])

X_val_with_predictions = X_val_sub.copy()

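# Use .loc to select every row that has a prediction (index labels 606 and 273);
# R² is undefined when only a single sample is evaluated.
X_val_with_predictions = X_val_with_predictions.loc[list(predictions)]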

# Merge predictions
X_val_with_predictions = X_val_with_predictions.merge(
    zero_shot_df, left_index=True, right_index=True, how="left"
)

# Ensure y_val is the ground truth for these indices
y_val_for_mse = y_val.loc[X_val_with_predictions.index]

# Compute metrics
mse_llm = sklearn.metrics.mean_squared_error(
    y_val_for_mse,
    X_val_with_predictions["Concrete compressive strength(MPa, megapascals) "]
)
r2_llm = sklearn.metrics.r2_score(
    y_val_for_mse,
    X_val_with_predictions["Concrete compressive strength(MPa, megapascals) "]
)

print(f"LLM MSE: {mse_llm}")
print(f"LLM R²: {r2_llm}")



Implementing Zero Shot

In [None]:
# Benchmarking from HW3

# --- Standardize features and target ---
scaler_X = sklearn.preprocessing.StandardScaler()
X_train = pandas.DataFrame(scaler_X.fit_transform(X_train), columns=X.columns)
X_val   = pandas.DataFrame(scaler_X.transform(    X_val),   columns=X.columns)
X_test  = pandas.DataFrame(scaler_X.transform(    X_test),  columns=X.columns)

scaler_y = sklearn.preprocessing.StandardScaler()
y_train = pandas.Series(scaler_y.fit_transform(y_train.values.reshape(-1, 1)).flatten())
y_val   = pandas.Series(scaler_y.transform(      y_val.values.reshape(-1, 1)).flatten())

# --- Linear Regression ---
model_linear = sklearn.linear_model.LinearRegression()
model_linear.fit(X_train, y_train)
y_test_pred_linear = model_linear.predict(X_test)
y_test_pred_linear = numpy.reshape(y_test_pred_linear, (-1,1))
y_test_pred_linear = scaler_y.inverse_transform(y_test_pred_linear)
mse_linear = sklearn.metrics.mean_squared_error(y_test, y_test_pred_linear)
r2_linear  = sklearn.metrics.r2_score(          y_test, y_test_pred_linear)

# --- Polynomial Regression ---
model_poly = sklearn.pipeline.make_pipeline(sklearn.preprocessing.PolynomialFeatures(degree=2), sklearn.linear_model.LinearRegression())
model_poly.fit(X_train, y_train)
y_test_pred_poly = model_poly.predict(X_test)
y_test_pred_poly = numpy.reshape(y_test_pred_poly, (-1,1))
y_test_pred_poly = scaler_y.inverse_transform(y_test_pred_poly)
mse_poly = sklearn.metrics.mean_squared_error(y_test, y_test_pred_poly)
r2_poly  = sklearn.metrics.r2_score(          y_test, y_test_pred_poly)

# --- Support Vector Regression ---
model_svr = sklearn.svm.SVR(kernel='rbf')
model_svr.fit(X_train, y_train)
y_test_pred_svr = model_svr.predict(X_test)
y_test_pred_svr = numpy.reshape(y_test_pred_svr, (-1,1))
y_test_pred_svr = scaler_y.inverse_transform(y_test_pred_svr)
mse_svr = sklearn.metrics.mean_squared_error(y_test, y_test_pred_svr)
r2_svr  = sklearn.metrics.r2_score(          y_test, y_test_pred_svr)

# --- Decision Tree Regression ---
model_dec_tree = sklearn.tree.DecisionTreeRegressor(min_samples_leaf=5, random_state=42)
model_dec_tree.fit(X_train, y_train)
y_test_pred_dec_tree = model_dec_tree.predict(X_test)
y_test_pred_dec_tree = numpy.reshape(y_test_pred_dec_tree, (-1,1))
y_test_pred_dec_tree = scaler_y.inverse_transform(y_test_pred_dec_tree)
mse_dec_tree = sklearn.metrics.mean_squared_error(y_test, y_test_pred_dec_tree)
r2_dec_tree  = sklearn.metrics.r2_score(          y_test, y_test_pred_dec_tree)

# --- Random Forest Regression ---
model_rand_forest = sklearn.ensemble.RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=42)
model_rand_forest.fit(X_train, y_train)
y_test_pred_rand_forest = model_rand_forest.predict(X_test)
y_test_pred_rand_forest = numpy.reshape(y_test_pred_rand_forest, (-1,1))
y_test_pred_rand_forest = scaler_y.inverse_transform(y_test_pred_rand_forest)
mse_rand_forest = sklearn.metrics.mean_squared_error(y_test, y_test_pred_rand_forest)
r2_rand_forest  = sklearn.metrics.r2_score(          y_test, y_test_pred_rand_forest)

# --- K-Nearest Neighbors Regression ---
model_knn = sklearn.neighbors.KNeighborsRegressor(n_neighbors=5)
model_knn.fit(X_train, y_train)
y_test_pred_knn = model_knn.predict(X_test)
y_test_pred_knn = numpy.reshape(y_test_pred_knn, (-1,1))
y_test_pred_knn = scaler_y.inverse_transform(y_test_pred_knn)
mse_knn = sklearn.metrics.mean_squared_error(y_test, y_test_pred_knn)
r2_knn  = sklearn.metrics.r2_score(          y_test, y_test_pred_knn)

# --- Neural Network Regression ---
model_nn = keras.models.Sequential()
model_nn.add(keras.Input(shape=(X.shape[1],)))
model_nn.add(keras.layers.Dense(10, activation="relu"))
model_nn.add(keras.layers.Dense(6,  activation="relu"))
model_nn.add(keras.layers.Dense(y.shape[1],  activation="linear"))
model_nn.compile(optimizer="adam", loss="mean_squared_error")
model_nn.fit(X_train, y_train, epochs=100, validation_data=(X_val, y_val), verbose=0)
y_test_pred_nn = model_nn.predict(X_test)
y_test_pred_nn = scaler_y.inverse_transform(y_test_pred_nn)
mse_nn = sklearn.metrics.mean_squared_error(y_test, y_test_pred_nn)
r2_nn  = sklearn.metrics.r2_score(          y_test, y_test_pred_nn)

# --- Plot the results ---
models     = ["Linear Regression",
              "Polynomial Regression",
              "Support Vector Regression",
              "Decision Tree Regression",
              "Random Forest Regression",
              "K-Nearest-Neighbors Regression",
              "Neural Network Regression"]

mse_values = [mse_linear,
              mse_poly,
              mse_svr,
              mse_dec_tree,
              mse_rand_forest,
              mse_knn,
              mse_nn]

r2_values  = [r2_linear,
              r2_poly,
              r2_svr,
              r2_dec_tree,
              r2_rand_forest,
              r2_knn,
              r2_nn]

matplotlib.pyplot.figure()
matplotlib.pyplot.bar(models, mse_values)
matplotlib.pyplot.xticks(rotation=45, ha="right")
matplotlib.pyplot.ylabel("Mean Squared Error")
matplotlib.pyplot.title("Mean Squared Error of the models")

matplotlib.pyplot.figure()
matplotlib.pyplot.bar(models, r2_values)
matplotlib.pyplot.xticks(rotation=45, ha="right")
matplotlib.pyplot.ylabel("R2 Score")
matplotlib.pyplot.title("R2 Score of the models")

matplotlib.pyplot.show()
