#### Task 3: Prompt Engineering for Large Language Models (LLMs)

In [2]:
#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#
#                                   ES335- Machine Learning- Assignment 1
#
# This file is used to create the dataset for the mini-project. The dataset is created by reading the data from
# the Combined folder. The data is then split into training, testing, and validation sets. This split is supposed
# to be used for all the modeling purposes.
#
#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

# Library imports
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import os
import tsfel
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from langchain_groq.chat_models import ChatGroq

# Groq API and Models 
Groq_Token = "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"  # Do not share this key with anyone
groq_models = {"llama3-70b": "llama3-70b-8192", "mixtral": "mixtral-8x7b-32768", "gemma-7b": "gemma-7b-it","llama3.1-70b":"llama-3.1-70b-versatile","llama3-8b":"llama3-8b-8192","llama3.1-8b":"llama-3.1-8b-instant","gemma-9b":"gemma2-9b-it"}


# Constants
time = 10
offset = 100
folders = ["LAYING","SITTING","STANDING","WALKING","WALKING_DOWNSTAIRS","WALKING_UPSTAIRS"]
classes = {"WALKING":1,"WALKING_UPSTAIRS":2,"WALKING_DOWNSTAIRS":3,"SITTING":4,"STANDING":5,"LAYING":6}

combined_dir = os.path.join("Combined")

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
                                                # Train Dataset
#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

X_train=[]
y_train=[]
dataset_dir = os.path.join(combined_dir,"Train")

for folder in folders:
    files = os.listdir(os.path.join(dataset_dir,folder))

    for file in files:

        df = pd.read_csv(os.path.join(dataset_dir,folder,file),sep=",",header=0)
        df = df[offset:offset+time*50]
        X_train.append(df.values)
        y_train.append(classes[folder])

X_train = np.array(X_train)
y_train = np.array(y_train)


#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
                                                # Test Dataset
#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

X_test=[]
y_test=[]
dataset_dir = os.path.join(combined_dir,"Test")

for folder in folders:
    files = os.listdir(os.path.join(dataset_dir,folder))
    for file in files:

        df = pd.read_csv(os.path.join(dataset_dir,folder,file),sep=",",header=0)
        df = df[offset:offset+time*50]
        X_test.append(df.values)
        y_test.append(classes[folder])

X_test = np.array(X_test)
y_test = np.array(y_test)

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
                                                # Final Dataset
#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

# USE THE BELOW GIVEN DATA FOR TRAINING and TESTING purposes

# concatenate the training and testing data
X = np.concatenate((X_train,X_test))
y = np.concatenate((y_train,y_test))

# split the data into training and testing sets. Change the seed value to obtain different random splits.
seed = 4
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=seed,stratify=y)

print("Training data shape: ",X_train.shape)
print("Testing data shape: ",X_test.shape)

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Training data shape:  (126, 500, 3)
Testing data shape:  (54, 500, 3)


In [15]:
#function to convert activity label back to activity name
def get_activity_class(activity_number):
    activity_classes = {
        1: "WALKING",
        2: "WALKING_DOWNSTAIRS",
        3: "WALKING_UPSTAIRS",
        4: "SITTING",
        5: "STANDING",
        6: "LAYING"
    }
    
    return activity_classes.get(activity_number, "Invalid input")

#### Zero-Shot Learning

In Zero-Shot Learning, we are going to start by giving the LLM, the raw acceleration data and ask it to give a prediction without giving it any data for training itself. We have tried two ways in which we are giving it input:

    a) Raw Accelerometer Data: In this we are giving the model only the raw accelerometer data in the 3 Axes as input and asking it to predicti the activity class. We have explicitly mentioned in our prompts about the 6 activity class, so it will be easy to use it further for permorfance calculation.

    b) TSFEL Data: In this we are creating a more feature-rich input data by using the TSFEL featurizer library. We are applying the feature extraction function to the 'X_train' variable. This changes the shape of the variable from (126, 500, 3) to (126, 1152). But considering that we are going to pass this data to the LLM, we have to take care of token size as well. So after failing to get an output from all the 1152 features, we decided to reduce the number of features to only 400. So the input data now has only the first 400 features that were obtained after using the TSFEL library.



##### Using Raw Accelerometer Data:

In [107]:
predicted_label_list = []
actual_label_list = []


for i in range(0,125):
    
    subject = i

    subject_data = X_train[subject]
    subject_label = y_train[subject]


    # Format the data for the model
    prompt = f"""Here is a sequence of accelerometer data\nData: {subject_data}\n
    
    There are 6 categories of activity that you can choose from namely 'LAYING', 'SITTING', 'STANDING', 'WALKING', 'WALKING_DOWNSTAIRS', 'WALKING_UPSTAIRS'
    What activity class does this data represent?
    Don't give explaination, just one-word answer"""

    # To use Groq LLMs 
    model_name = "llama3-70b" # We can choose any model from the groq_models dictionary
    llm = ChatGroq(model=groq_models[model_name], api_key=Groq_Token, temperature=0) # type: ignore
    answer = llm.invoke(prompt)

    #print(answer.content)
    actual_label = get_activity_class(subject_label)
    #print(actual_label)
    predicted_label_list.append(answer.content)
    actual_label_list.append(actual_label)

In [108]:
#Performance Calculation

accuracy = accuracy_score(actual_label_list, predicted_label_list)
print(f"Accuracy: {accuracy * 100:.2f}%")

precision = precision_score(actual_label_list, predicted_label_list, average='macro')
print(f"Precision: {precision * 100:.2f}%")

recall = recall_score(actual_label_list, predicted_label_list, average='macro')
print(f"Recall: {recall * 100:.2f}%")

f1 = f1_score(actual_label_list, predicted_label_list, average='macro')
print(f"F1 Score: {f1 * 100:.2f}%")

Accuracy: 28.00%
Precision: 9.33%
Recall: 27.78%
F1 Score: 13.97%


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


##### Using TSFEL Features:

In [68]:
#Using the TSFEL library to extract features
cfg = tsfel.get_features_by_domain()
X_tsfel = tsfel.time_series_features_extractor(cfg, X_train, fs = 50)

*** Feature extraction started ***



*** Feature extraction finished ***


In [72]:
X_tsfel.shape

(126, 1152)

In [60]:
predicted_label_list = []
actual_label_list = []


for i in range(0,125):
    
    subject = i

    subject_data = X_tsfel.iloc[subject][:400]
    subject_label = y_train[subject]


    # Format the data for the model
    prompt = f"""Here is a sequence of Featurized Accelerometer Data \nData: {subject_data}\n
    
    There are 6 categories of activity that you can choose from namely 'LAYING', 'SITTING', 'STANDING', 'WALKING', 'WALKING_DOWNSTAIRS', 'WALKING_UPSTAIRS'
    What activity class does this data represent?
    Don't give explaination, just one-word answer"""

    # To use Groq LLMs 
    model_name = "llama3-70b" # We can choose any model from the groq_models dictionary
    llm = ChatGroq(model=groq_models[model_name], api_key=Groq_Token, temperature=0) # type: ignore
    answer = llm.invoke(prompt)

    #print(answer.content)
    actual_label = get_activity_class(subject_label)
    #print(actual_label)
    predicted_label_list.append(answer.content)
    actual_label_list.append(actual_label)

In [61]:
#Performance Calculation

accuracy = accuracy_score(actual_label_list, predicted_label_list)
print(f"Accuracy: {accuracy * 100:.2f}%")

precision = precision_score(actual_label_list, predicted_label_list, average='macro')
print(f"Precision: {precision * 100:.2f}%")

recall = recall_score(actual_label_list, predicted_label_list, average='macro')
print(f"Recall: {recall * 100:.2f}%")

f1 = f1_score(actual_label_list, predicted_label_list, average='macro')
print(f"F1 Score: {f1 * 100:.2f}%")

Accuracy: 22.40%
Precision: 11.49%
Recall: 22.22%
F1 Score: 11.97%


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


#### Few-Shot Learning

In case of Few-shot learning, we need to give the LLM some prior data, using which it can train itself and then give a data that it has never seen before for prediction. Similar to Zero-Shot, we have tried to give the data in two ways. But before that we need to pre-process the labels. Since, we are going to give the LLM all the 'X_train' and 'y_train' data directly, changing the shape of 'y_train' may help the model to understand the context of data a bit better. Also it helps pass the labels of a particular instance inside loop quite easy.

In [13]:
y_train_expanded = np.expand_dims(y_train, axis=-1)  # Shape becomes (126, 1)
y_train_expanded = np.expand_dims(y_train_expanded, axis=-1)  # Shape becomes (126, 1, 1)
y_train_broadcasted = np.tile(y_train_expanded, (1, 500, 1))  # Shape becomes (126, 500, 1)

new_y_train = y_train_broadcasted

##### Using Raw accelerometer data:

In [76]:
predicted_label_list = []
actual_label_list = []

for i in range(1, 54):
    
    test_sub = i

    train_sub_data = X_train
    train_sub_label = new_y_train

    test_sub_data = X_test[test_sub]
    test_sub_label = get_activity_class(y_test[test_sub])

    # Format the data for the model
    prompt = f"""Here is a sequence of accelerometer data and corresponding activity label:\nData: {train_sub_data}\nLabel: {train_sub_label}\n
    
    The data that is of shape (126, 500, 3) which means there are 126 instances of 500 data points in each of the 3 axes.
    The label is the corresponding activity the human is currently doing.
    Shape of label is (126, 500, 1) which means for each instance there is only 1 label.

    The next sequence of accelerometer data is this: \nData: {test_sub_data}\n
    Your job is to predict the activity class of this data using the data and corresponding label that we gave earlier
    There are 6 categories of activity that you can choose from namely 'LAYING', 'SITTING', 'STANDING', 'WALKING', 'WALKING_DOWNSTAIRS', 'WALKING_UPSTAIRS'
    What activity class does this data represent?
    Give one-word answer"""

    # To use Groq LLMs 
    model_name = "llama3-70b" # We can choose any model from the groq_models dictionary
    llm = ChatGroq(model=groq_models[model_name], api_key=Groq_Token, temperature=0) # type: ignore
    answer = llm.invoke(prompt)

    predicted_label_list.append(answer.content)
    actual_label_list.append(test_sub_label)

In [77]:
#Performance Calculation

accuracy = accuracy_score(actual_label_list, predicted_label_list)
print(f"Accuracy: {accuracy * 100:.2f}%")

precision = precision_score(actual_label_list, predicted_label_list, average='macro')
print(f"Precision: {precision * 100:.2f}%")

recall = recall_score(actual_label_list, predicted_label_list, average='macro')
print(f"Recall: {recall * 100:.2f}%")

f1 = f1_score(actual_label_list, predicted_label_list, average='macro')
print(f"F1 Score: {f1 * 100:.2f}%")

Accuracy: 28.30%
Precision: 9.55%
Recall: 27.78%
F1 Score: 14.14%


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


##### Using TSFEL Features:

In [95]:
predicted_label_list = []
actual_label_list = []

for i in range(0, 52):
    
    test_sub = i

    train_sub_data = X_tsfel.iloc[:, :400]
    train_sub_label = y_train

    test_sub_data = X_test[test_sub]
    test_sub_label = get_activity_class(y_test[test_sub])

    # Format the data for the model
    prompt = f"""Here is a sequence of featurized accelerometer data and corresponding activity label:\nData: {train_sub_data}\nLabel: {train_sub_label}\n
    
    The data is of shape (126 , 400) which means for each of the 126 subjects, there are 400 features.
    The label is the activity the human is currently doing corresponding to the given data.

    The next sequence of accelerometer data is this: \nData: {test_sub_data}\n
    Your job is to predict the activity class of this data using the data and corresponding label that we gave earlier
    There are 6 categories of activity that you can choose from namely 'LAYING', 'SITTING', 'STANDING', 'WALKING', 'WALKING_DOWNSTAIRS', 'WALKING_UPSTAIRS'
    What activity class does this data represent?
    Give only one-word answer"""

    # To use Groq LLMs 
    model_name = "llama3-70b" # We can choose any model from the groq_models dictionary
    llm = ChatGroq(model=groq_models[model_name], api_key=Groq_Token, temperature=0) # type: ignore
    answer = llm.invoke(prompt)

    predicted_label_list.append(answer.content)
    actual_label_list.append(test_sub_label)

In [96]:
#Performance Calculation

accuracy = accuracy_score(actual_label_list, predicted_label_list)
print(f"Accuracy: {accuracy * 100:.2f}%")

precision = precision_score(actual_label_list, predicted_label_list, average='macro')
print(f"Precision: {precision * 100:.2f}%")

recall = recall_score(actual_label_list, predicted_label_list, average='macro')
print(f"Recall: {recall * 100:.2f}%")

f1 = f1_score(actual_label_list, predicted_label_list, average='macro')
print(f"F1 Score: {f1 * 100:.2f}%")

Accuracy: 28.85%
Precision: 7.39%
Recall: 18.52%
F1 Score: 10.21%


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


##### Experiment: Changing the labels to Text

We tried to change the labels from numeric to textual data. Since we are giving these inputs to LLM, it seems logical to have more text data so that it can understand the context much better. But the result which we saw was completely opposite. The accuracy dropped by using text labels instead of numeric ones.

In [99]:
y_train_label = []
for i in range(0,52):
    y_train_label.append(get_activity_class(y_train[i]))

In [100]:
predicted_label_list = []
actual_label_list = []

for i in range(0, 52):
    
    test_sub = i

    train_sub_data = X_tsfel.iloc[:, :400]
    train_sub_label = y_train_label

    test_sub_data = X_test[test_sub]
    test_sub_label = get_activity_class(y_test[test_sub])

    # Format the data for the model
    prompt = f"""Here is a sequence of featurized accelerometer data and corresponding activity label:\nData: {train_sub_data}\nLabel: {train_sub_label}\n
    
    The data is of shape (126 , 400) which means for each of the 126 subjects, there are 400 features.
    The label is the activity the human is currently doing corresponding to the given data.

    The next sequence of accelerometer data is this: \nData: {test_sub_data}\n
    Your job is to predict the activity class of this data using the data and corresponding label that we gave earlier
    There are 6 categories of activity that you can choose from namely 'LAYING', 'SITTING', 'STANDING', 'WALKING', 'WALKING_DOWNSTAIRS', 'WALKING_UPSTAIRS'
    What activity class does this data represent?
    Give only one-word answer"""

    # To use Groq LLMs 
    model_name = "llama3-70b" # We can choose any model from the groq_models dictionary
    llm = ChatGroq(model=groq_models[model_name], api_key=Groq_Token, temperature=0) # type: ignore
    answer = llm.invoke(prompt)

    predicted_label_list.append(answer.content)
    actual_label_list.append(test_sub_label)

In [101]:
# Calculate accuracy
accuracy = accuracy_score(actual_label_list, predicted_label_list)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Calculate precision
precision = precision_score(actual_label_list, predicted_label_list, average='macro')
print(f"Precision: {precision * 100:.2f}%")

# Calculate recall
recall = recall_score(actual_label_list, predicted_label_list, average='macro')
print(f"Recall: {recall * 100:.2f}%")

# Calculate F1 score
f1 = f1_score(actual_label_list, predicted_label_list, average='macro')
print(f"F1 Score: {f1 * 100:.2f}%")

Accuracy: 19.23%
Precision: 11.33%
Recall: 18.52%
F1 Score: 8.12%


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


#### Question 1

##### Performance Analysis:

The performance metrics of each of the tests are listed below:

a) Zero-Shot with Raw data: 
- Accuracy: 28.00%
- Precision: 9.33%
- Recall: 27.78%
- F1 Score: 13.97%

b) Zero-Shot with TSFEL features:
- Accuracy: 22.40%
- Precision: 11.49%
- Recall: 22.22%
- F1 Score: 11.97%

c) Few-Shot with Raw data:
- Accuracy: 28.30%
- Precision: 9.55%
- Recall: 27.78%
- F1 Score: 14.14%

d) Few-Shot with TSFEL features:
- Accuracy: 28.85%
- Precision: 7.39%
- Recall: 18.52%
- F1 Score: 10.21%

**Conclusion:** We can observe that the performance using only the raw data is coming out to be better in both Zero-shot and Few-shot. Performance difference between Zero-shot and Few-shot is almost negligible. Both are performing similarly.

**Important Observation:** 
- While performing the task, it was observed that the model was predicting all the given data as either WALKING or STANDING. 
- On further investigating the outputs given by the LLM, it was observed that it was only focusing on the magnitude of the features to predict the activity. For e.g., when a data showed that the magnitude of acceleration is relatively high, it thinks that the activity is some kind of dynamic activity and by-default thinks that it must be WALKING.
- Similarly, when a given data has relatively smaller values, the model thinks that the activity resembles a static activity and by-default it predicts the activity as STANDING.
- This could be the reason, that even in Few-shot learning , the model isn't trying to learn the patterns from the data given by us. It is just predicting the activity using its own interpretitions and hence the lower performance scores.

#### Question 2

##### Performance against Decision Tree:

a) Few-Shot with Raw data:
- Accuracy: 28.30%
- Precision: 9.55%
- Recall: 27.78%
- F1 Score: 14.14%

b) Few-Shot with TSFEL features:
- Accuracy: 28.85%
- Precision: 7.39%
- Recall: 18.52%
- F1 Score: 10.21%

c) Decision tree with Raw data:
- Accuracy: 61%
- Precision: 56%
- Recall: 61%
- F1 Score: 58%

d) Decision tree with TSFEL features:
- Accuracy: 89%
- Precision: 89%
- Recall: 89%
- F1 Score: 89%


**Conclusion:** 
- Decision Tree is performing much better as compared to Few-Shot learning. 
- This is due the reason discussed in the previous question i.e., the LLM model is not actually learning any patterns by the data provided by us. It is just predicting the activity using the magnitude of given features. If the activity resembles a dynamic activity it defaults to WALKING and similarly, if the activity resembles a static activity, it defaults to STANDING.
- Decision Tree, on the other hand, is actually learning using the data and then giving the predictions accordingly.


#### Question 3

##### Limitations:
Some of the limitations of Zero-Shot ond Few-Shot Learning that we observed were:
- Performance of the Zero-Shot and Few-Shot Learning is nowhere near the performance of decision trees. Decision Tree was performing much better.
- Generating the output using LLM also takes a lot of time some of the scripts ran for more than 15 min. Whereas, Decision tree was much more efficient and also giving better accuracy.
- The LLM was not showing any signs of learning from the given data in case of Few-Shot learning. It was just defaulting to two values.
- The size of data which can be given as input is also a bottleneck in case of Zero-Shot and Few-Shot learning. The LLM model that we used i.e., LLaMa3-70b has a token limit of 6000. Decision Tree was easily trained on the TSFEL features (1152 Features), but when the same data was given to LLaMa3, it was hitting the token limit. So we had to reduce the size of data which we were passing to the LLM.


#### Question 4

For this task, we gave the LLM model only the first 16 instances of 'X_train' which didn't contain data of 'WALKING_UPSTAIRS' class. The data and the corresponding labels were given to train the LLM and then from the 'X_test' we chose some indices which contains only those data which belong to the 'WALKING_UPSTAIRS' class.

**Observations:** 

- The model was predicting WALKING_UPSTAIRS as either WALKING or STANDING.
- The observations are in-line with what we have observed so far in this Task i.e., the LLM is defaulting to only 2 values.

In [22]:
predicted_label_list = []
actual_label_list = []
index = [0, 8, 21, 34, 37]

for i in index:
    
    test_sub = i

    train_sub_data = X_train[:16]
    train_sub_label = new_y_train[:16]

    test_sub_data = X_test[test_sub]
    test_sub_label = get_activity_class(y_test[test_sub])

    # Format the data for the model
    prompt = f"""Here is a sequence of accelerometer data and corresponding activity label:\nData: {train_sub_data}\nLabel: {train_sub_label}\n
    
    The data that is of shape (16, 500, 3) which means there are 16 instances of 500 data points in each of the 3 axes.
    The label is the corresponding activity the human is currently doing.
    Shape of label is (16, 500, 1) which means for each instance there is only 1 label.

    The next sequence of accelerometer data is this: \nData: {test_sub_data}\n
    Your job is to predict the activity class of this data using the data and corresponding label that we gave earlier
    There are 6 categories of activity that you can choose from namely 'LAYING', 'SITTING', 'STANDING', 'WALKING', 'WALKING_DOWNSTAIRS', 'WALKING_UPSTAIRS'
    What activity class does this data represent?
    Give one-word answer"""

    # To use Groq LLMs 
    model_name = "llama3-70b" # We can choose any model from the groq_models dictionary
    llm = ChatGroq(model=groq_models[model_name], api_key=Groq_Token, temperature=0) # type: ignore
    answer = llm.invoke(prompt)

    predicted_label_list.append(answer.content)
    actual_label_list.append(test_sub_label)

In [23]:
predicted_label_list

['STANDING', 'WALKING', 'WALKING', 'STANDING', 'WALKING']

In [24]:
actual_label_list

['WALKING_UPSTAIRS',
 'WALKING_UPSTAIRS',
 'WALKING_UPSTAIRS',
 'WALKING_UPSTAIRS',
 'WALKING_UPSTAIRS']

In [26]:
#Making 10 instances of random data
instances = 10
datapoints = 500
axes = 3

# Random accelerometer data
data = np.random.uniform(low=-10, high=10, size=(instances, datapoints, axes))

print(data.shape)
#print(data)

(10, 500, 3)


#### Question 5

In this task, we have generated a random dataset of shape (10, 500, 3) which means 10 instances of 500 datapoints in each of the 3 axes. We applied Zero-shot learning for this task because applying Few-shot doesn't seem to show any benefit from our experiments. After applying the Zero-Shot, we can observe the same pattern, that the LLM is just defaulting to classify it as WALKING.

In [28]:
predicted_label_list = []
actual_label_list = []


for i in range(0,9):
    
    subject = i

    subject_data = data[subject]


    # Format the data for the model
    prompt = f"""Here is a sequence of accelerometer data\nData: {subject_data}\n
    
    There are 6 categories of activity that you can choose from namely 'LAYING', 'SITTING', 'STANDING', 'WALKING', 'WALKING_DOWNSTAIRS', 'WALKING_UPSTAIRS'
    What activity class does this data represent?
    Don't give explaination, just one-word answer"""

    # To use Groq LLMs 
    model_name = "llama3-70b" # We can choose any model from the groq_models dictionary
    llm = ChatGroq(model=groq_models[model_name], api_key=Groq_Token, temperature=0) # type: ignore
    answer = llm.invoke(prompt)

    print(answer.content)
    #actual_label = get_activity_class(subject_label)
    #print(actual_label)
    predicted_label_list.append(answer.content)
    #actual_label_list.append(actual_label)

WALKING
WALKING
WALKING
WALKING
WALKING
WALKING
WALKING
WALKING
WALKING
