#Document Question Generation and Question Answering using LLAMA-2
This project was developed in the **Vanguard Center** at **Mohammed VI Polytechnic University (UM6P)** by **Oussama Zaid**. Its main aim is to build a system that generates questions from documents in any language and provides accurate long-form answers in French, making it easier to understand and interact with complex information.

## Technologies Used

This project harnesses the power of cutting-edge technologies to achieve its goals:

- **LLAMA-2**: Llama 2 is a second-generation open-source large language model (LLM) from Meta. It is one of the most capable open-source LLMs and comes in three sizes: 7B, 13B, and 70B parameters.

- **Langchain**: LangChain connects Llama-2 to external data sources (PDFs in our case), so the model can ground question generation and answering in the documents' content, leading to more context-aware and accurate responses.

- **Faiss**: A library developed by Facebook AI that enables efficient similarity search. Given a set of vectors (embeddings), we index them with Faiss; then, given another vector (the query vector), we search the index for the most similar vectors. Because only the most relevant chunks are retrieved for each query, we can use as many PDFs as we want as input.

- **HuggingFace Transformers**: HuggingFace Transformers is a library of pre-trained deep learning models for natural language processing (NLP) and computer vision (CV) tasks. The models are based on the transformer architecture, which allows them to understand, generate, and manipulate human language and images with strong contextual awareness.

Through the fusion of these advanced technologies, this project aims to reshape the way we engage with written content, presenting it in a more accessible, comprehensible, and user-friendly manner. By developing a robust Document Question Generation and Question Answering system, users can seamlessly extract knowledge from documents and attain insights efficiently and intuitively.
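
The similarity search described in the Faiss item above can be illustrated without the library itself. The following is a minimal pure-Python sketch, not the Faiss API: the helper names `cosine` and `search` and the toy 3-dimensional vectors are assumptions made for illustration, and cosine similarity is only one common choice of metric (the metric an actual index uses depends on how it is built).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def search(index, query, k=2):
    """Return the k (position, score) pairs most similar to the query."""
    scored = [(i, cosine(vec, query)) for i, vec in enumerate(index)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings" (hypothetical values, one vector per text chunk)
index = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

top = search(index, query, k=2)
print([i for i, _ in top])  # positions of the two closest chunks
```

A library such as Faiss performs the same kind of lookup, but with index structures designed to stay fast when there are many vectors.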



#Environment setup
First, verify that you're using a CUDA environment. If not, go to Runtime > Change runtime type > T4.

In [None]:
from torch import cuda

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'
device

'cuda:0'

###Install the required libraries

In [None]:
!pip install -qU transformers
!pip install -qU accelerate
!pip install -qU einops
!pip install -qU langchain
!pip install -qU xformers
!pip install -qU bitsandbytes
!pip install -qU faiss-gpu
!pip install -qU sentence_transformers
!pip install -qU pypdf
!pip install -qU googletrans==4.0.0-rc1


####Import necessary libraries


In [None]:
import re                                                                       # Regular expressions
import json                                                                     # JSON parsing

import torch                                                                    # PyTorch module
from torch import cuda, bfloat16                                                # GPU acceleration

import transformers                                                             # Transformers library & modules
from transformers import StoppingCriteria, StoppingCriteriaList
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

from langchain.document_loaders import PyPDFLoader                              # PDF Loader
from langchain.llms import HuggingFacePipeline                                  # HuggingFace pipeline
from langchain.text_splitter import RecursiveCharacterTextSplitter              # Text splitter
from langchain.embeddings import HuggingFaceEmbeddings                          # Embeddings generator
from langchain.vectorstores import FAISS                                        # Vector store
from langchain.chains import ConversationalRetrievalChain                       # Document retrieval chain

from googletrans import Translator                                              # Google translator

####Model Configuration

In [None]:
# Define the model ID. This is the identifier for the model we're going to use.
model_id = '4bit/Llama-2-7b-chat-hf' # This is a 4-bit quantized version of the model. If it doesn't work, use the original model `meta-llama/Llama-2-7b-chat-hf` together with the `bitsandbytes` library (see the configuration below)

# Define the Hugging Face authentication token. This is required to access models from Hugging Face's model hub.
hf_auth = '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'

# Create a configuration for BitsAndBytes. BitsAndBytes is a library for quantization of neural network parameters.
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# Create a configuration for our model using the AutoConfig class from the transformers library.
model_config = AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)
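
To see why the 4-bit setting in the BitsAndBytes configuration matters on a single GPU, here is a rough back-of-the-envelope sketch. It counts the weights only, using a nominal 7 billion parameters; the helper `weight_memory_gb` is a hypothetical name, and the estimate ignores quantization constants, activations, and the generation cache.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 7e9  # nominal parameter count of the 7B model (assumption)

fp16_gb = weight_memory_gb(n_params, 16)
nf4_gb = weight_memory_gb(n_params, 4)

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the space of 16-bit weights, which leaves room on the GPU for the embedding model and for generation.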




####Model Loading

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,  # The model ID defined earlier.
    trust_remote_code=True,  # This indicates that we trust the remote code. Be careful with this setting in a production environment.
    config=model_config,  # The model configuration defined earlier.
    quantization_config=bnb_config,  # The BitsAndBytes configuration defined earlier.
    device_map='auto',  # This sets the device map to 'auto', which means the library will automatically select the best device to run the model (CPU or GPU).
    use_auth_token=hf_auth  # The Hugging Face authentication token defined earlier.
)

# Set our model to evaluation mode. This is necessary before using the model for inference because it tells PyTorch to disable features like dropout that are used during training but not during inference.
model.eval()

# Load our tokenizer which is responsible for turning our input text into a format that the model can understand.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,  # The model ID defined earlier.
    use_auth_token=hf_auth  # The Hugging Face authentication token defined earlier.
)

print(f"Model loaded on {device}")




Model loaded on cuda:0


####Defining a stopping criteria
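To control the model's behaviour, we set up custom stopping criteria: generation stops as soon as the output ends with one of the stop sequences defined below, which keeps the model from running on past its answer and keeps the output length under control.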

In [None]:
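# Define stop sequences and convert them to device tensors.
# add_special_tokens=False keeps the tokenizer from prepending the BOS token,
# which would otherwise keep the comparison below from matching generated text.
stop_list = ['\nHuman:', '\n```\n']
stop_token_ids = [tokenizer(x, add_special_tokens=False)['input_ids'] for x in stop_list]
stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]

# Define custom stopping criteria class
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            # Compare the last len(stop_ids) generated tokens with the stop sequence
            if len(input_ids[0]) >= len(stop_ids) and torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False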

# Set up stopping criteria for text generation
stopping_criteria = StoppingCriteriaList([StopOnTokens()])
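
The check inside `StopOnTokens` is a suffix comparison: generation stops when the most recent tokens equal one of the stop sequences. The sketch below shows the same logic on plain Python lists so it can be run without a model. The function name `ends_with_stop_sequence` and the token ids are made up for illustration and are not real Llama-2 ids.

```python
def ends_with_stop_sequence(generated_ids, stop_sequences):
    """Return True if the generated ids end with any of the stop sequences."""
    for stop_ids in stop_sequences:
        n = len(stop_ids)
        # Only compare when enough tokens have been generated
        if n and len(generated_ids) >= n and generated_ids[-n:] == stop_ids:
            return True
    return False

# Hypothetical token ids, for illustration only
stop_sequences = [[13, 29950, 7889, 29901], [13, 28956, 13]]

still_generating = [1, 450, 1234, 13, 29950]
finished = [1, 450, 1234, 13, 29950, 7889, 29901]

print(ends_with_stop_sequence(still_generating, stop_sequences))
print(ends_with_stop_sequence(finished, stop_sequences))
```

The first call prints `False` because only part of a stop sequence has appeared; the second prints `True` because the last four ids match the first stop sequence exactly.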

####Pipeline & LLM initialization

In [None]:
# Initialize the text generation pipeline
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task='text-generation',
    stopping_criteria=stopping_criteria,
    temperature=0.1,
    max_new_tokens=2048,
    repetition_penalty=1.1
)

# Initialize language model manager (llm) pipeline
llm = HuggingFacePipeline(pipeline=generate_text)

####Document Loading & Preprocessing
In this section, we load our PDF document(s), then split them into 500-character chunks.
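
To make the effect of a chunk size of 500 and an overlap of 20 concrete, here is a simplified sketch that cuts fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); the function `chunk_text` is a hypothetical stand-in that only illustrates the two parameters.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=20):
    """Split text into windows of at most chunk_size characters, each
    starting chunk_size - chunk_overlap characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

# A 1200-character dummy document
text = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(text)

print([len(c) for c in chunks])
print(chunks[0][-20:] == chunks[1][:20])  # consecutive chunks share 20 characters
```

The overlap means a sentence that straddles a chunk boundary is still partly visible in both neighbouring chunks, which helps retrieval when the relevant passage falls near a cut.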

#####Documents preprocessing

We'll test the algorithm on 3 large PDF documents (up to 600 pages each) to see whether the model can read them and generate relevant questions.

In [None]:
# pdf_files = ["/content/Marketing digital.pdf", "/content/Marketing.pdf", "/content/Marketing2.pdf"]  # add your pdf file paths here

pdf_files = ["/content/Marketing.pdf"]

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)

# embedding model
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)


To keep the system flexible, we accept PDFs in any language and then translate the model's answers into the desired language, in our case French.

In [None]:
# Function to translate text
from googletrans import Translator

def translate(text, target_language):
    translator = Translator()
    translated = translator.translate(text, dest=target_language)
    return translated.text

For each PDF document, we split the text, generate embeddings and question/answer pairs, then store the data in JSON format so it is easy to parse later.
First, we create a function that extracts exactly the information we need from the LLM's output, formats it, and translates it from any language into French.

In [None]:
def split_and_store(text):
    # Split the text into sentences based on the numbered format
    sentences = re.split(r'\d+\.', text)
    sentences = [sentence.strip() for sentence in sentences if sentence.strip()]

    # Split each sentence by the "Answer" delimiter and organize into question-answer pairs
    qa_pairs = []
    for sentence in sentences:
        parts = re.split(r'\bAnswer\b', sentence)
        if len(parts) == 2:
            translated_question = translate(parts[0].strip(), "fr")
            translated_answer = translate(parts[1].strip(), "fr")
        else:
            # If "Answer" delimiter is missing, use "?" as delimiter
            parts = sentence.split("?")
            if len(parts) == 2:
                translated_question = translate(parts[0].strip(), "fr")
                translated_answer = translate(parts[1].strip(), "fr")
            else:
                continue  # Skip this sentence if neither "Answer" nor "?" found

        qa_pairs.append({"question": translated_question, "answer": translated_answer})

    # Create a dictionary to store question-answer pairs indexed by numbers
    qa_dict = {str(index): pair for index, pair in enumerate(qa_pairs, start=1)}

    return qa_dict

In [None]:
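# Stricter variant that replaces the definition above: only items containing
# the "Correct answer" delimiter are kept.
def split_and_store(text):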
    # Split the text into sentences based on the numbered format
    sentences = re.split(r'\d+\.', text)
    sentences = [sentence.strip() for sentence in sentences if sentence.strip()]

    # Split each sentence by the "Correct answer" delimiter and organize into question-answer pairs
    qa_pairs = []
    for sentence in sentences:
        parts = re.split(r'\bCorrect answer\b', sentence)
        if len(parts) == 2:
            question = translate(parts[0].strip(), "fr")
            answer = translate(parts[1].strip().strip(": "), "fr")
            qa_pairs.append({"question": question, "answer": answer.strip(":")})

    # Create a dictionary to store question-answer pairs indexed by numbers
    qa_dict = {str(index): pair for index, pair in enumerate(qa_pairs, start=1)}

    return qa_dict
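
To show what the parsing step does to a model reply, the sketch below applies the same two regular expressions to a hand-written sample and leaves out the translation call so it runs offline. The function name `parse_qa` and the sample text are assumptions for illustration; real model output may be formatted differently.

```python
import json
import re

def parse_qa(text):
    """Split numbered 'question ... Correct answer: ...' items into a dict."""
    items = [s.strip() for s in re.split(r'\d+\.', text) if s.strip()]
    pairs = []
    for item in items:
        parts = re.split(r'\bCorrect answer\b', item)
        if len(parts) == 2:
            pairs.append({
                "question": parts[0].strip(),
                "answer": parts[1].strip().strip(": "),
            })
    return {str(i): pair for i, pair in enumerate(pairs, start=1)}

# Hypothetical model output, written by hand for illustration
sample = (
    "1. What is a questionnaire used for? "
    "Correct answer: It collects data from many participants.\n"
    "2. What is a target audience? "
    "Correct answer: The group a campaign is meant to reach."
)

qa = parse_qa(sample)
print(json.dumps(qa, indent=4, ensure_ascii=False))
```

One caveat of this approach: the pattern `\d+\.` matches any number followed by a period, so an answer that contains something like "3.5" or ends a sentence with a year would be cut at that point.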

We prompt the model to give us 10 questions and their long answers from the provided documents.
The process can take some time, depending on the hardware used.

In [None]:
#questions generation from document store
chat_history = []
prompt = """
              You are an expert exam author, based on the information from the document provided, write 10 relevant questions followed by their correct, long and detailed answers, numbered from 1 to 10 and never mention yourself or the document, never say 'as exam author', never say 'According to the document', the questions must be about the general concepts, don't dive too much into details.
         """

# Initialize a dictionary to store all JSON data
all_json_data = {}

for pdf_file in pdf_files:
    # Load and split each file separately
    loader = PyPDFLoader(pdf_file)
    documents = loader.load_and_split()
    all_splits = text_splitter.split_documents(documents)

    # Generate embeddings and initialize vectorstore and chain for each file separately
    vectorstore = FAISS.from_documents(all_splits, embeddings)
    chain = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)

    # Generate questions for each file separately and regenerate if JSON data is empty (up to 2 attempts)
    for attempt in range(2):
        result = chain({"question": prompt, "chat_history": chat_history})
        answer = result['answer']

        # Translate and format the questions and answers into JSON for each file separately
        qa_dict = split_and_store(answer)

        if qa_dict:  # If JSON data is not empty, break the loop
            break

    # Store the JSON data for each file in the dictionary
    all_json_data[pdf_file] = qa_dict

# Convert the dictionary with all JSON data into a JSON string
all_json_string = json.dumps(all_json_data, indent=4, ensure_ascii=False)

Model output

In [None]:
print(all_json_string)

{
    "/content/Marketing.pdf": {
        "1": {
            "question": "Quel est le but de l'utilisation d'un questionnaire dans la recherche d'enquête",
            "answer": "Un questionnaire est utilisé dans la recherche enquête pour recueillir des données des participants à travers une série de questions.Le but d'utiliser un questionnaire est de collecter des informations auprès d'un grand nombre de personnes dans un temps relativement court et d'obtenir des données plus précises et fiables que ce qui ne serait possible par le biais d'entretiens personnels ou d'autres méthodes."
        },
        "2": {
            "question": "Comment l'utilisation d'un questionnaire aide-t-elle à améliorer la qualité des données d'enquête",
            "answer": "L'utilisation d'un questionnaire permet d'améliorer la qualité des données d'enquête en permettant une approche plus systématique et structurée de la collecte de données.Cela peut inclure l'utilisation de questions standardisées, l'él

From each PDF, we generated 10 relevant questions and their long-form answers.

#Scoring Algorithm
The second part is the scoring algorithm, where we give the user's answer to a given question from the PDFs a score out of 10.

In [None]:
import re


def extract_score(text):
    # find the score out of 10 in both "X out of 10" and "X/10" formats
    score_matches = re.findall(r'(\d{1,2})(?: out of 10|/10)', text)

    if score_matches:
        scores = [int(match) for match in score_matches]
        return max(scores)  # Return the highest score if multiple are found
    else:
        return 5  # Default score if no match is found


def get_chain(pdf_files):
    all_splits = []
    for pdf_file in pdf_files:
        # Load and split each file separately
        loader = PyPDFLoader(pdf_file)
        documents = loader.load_and_split()
        all_splits.extend(text_splitter.split_documents(documents))
    # Generate embeddings and initialize vectorstore and chain for all files together
    vectorstore = FAISS.from_documents(all_splits, embeddings)
    chain = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)
    return chain


def meaning_similarity_score(student_answer, original_answer, chain):
    chat_history = []

    scoring_prompt = f"""Compare the meaning of the following two sentences and provide a score/10,
                         if although the words used are similar, the ideas expressed in the two sentences
                         are different, give it a very low score:
                      **First Sentence:**
                      "{student_answer}"

                      **Second Sentence:**
                      "{original_answer}"
                      """
    scoring_result = chain({"question": scoring_prompt, "chat_history": chat_history})
    scoring_answer = scoring_result['answer']
    score = extract_score(scoring_answer)

    return score


# Answer comparison scoring
def answer_comparison_scoring(question, student_answer, original_answer, chain):
    chat_history = []

    scoring_prompt = f"""
                    You are an expert exam corrector. You must provide a numerical score out of 10 for the student's answer in relation to the given question and the correct answer.
                    Question: {question} <end of question>
                    Student Answer: {student_answer} <end of student answer>
                    Correct Answer: {original_answer} <end of original answer>
                    """

    scoring_result = chain({"question": scoring_prompt, "chat_history": chat_history})
    scoring_answer = scoring_result['answer']
    score = extract_score(scoring_answer)

    return score


# Model based scoring
def model_based_scoring(question, student_answer, chain):
    chat_history = []

    scoring_prompt = f"You are an expert exam corrector, you will give a score out of 10 to the answer: {student_answer} of the question: {question}, please be strict and give bad scoring if the answer is wrong or doesn't exist in the document provided, and a good one if the answer is correct"

    scoring_result = chain({"question": scoring_prompt, "chat_history": chat_history})
    scoring_answer = scoring_result['answer']
    score = extract_score(scoring_answer)

    return score


def calculate_score(question, student_answer, original_answer, chain):
    total_score = 0

    for _ in range(3):
        meaning_similarity = float(meaning_similarity_score(student_answer, original_answer, chain))
        answer_comparison = float(answer_comparison_scoring(question, student_answer, original_answer, chain))
        model_scoring = float(model_based_scoring(question, student_answer, chain))

        # Calculate the total score for this trial
        trial_score = 0.2 * meaning_similarity + 0.2 * answer_comparison + 0.6 * model_scoring
        # Accumulate the trial score
        total_score += trial_score

    # Calculate the average score over the trials
    average_score = total_score / 3

    return average_score
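
The behaviour of `extract_score` is easiest to see on a few sample replies. The sketch below restates the function compactly so it can be run on its own; the reply strings are invented for illustration and the `default` parameter is an addition for readability.

```python
import re

def extract_score(text, default=5):
    """Return the highest 'X out of 10' or 'X/10' score found, else the default."""
    matches = re.findall(r'(\d{1,2})(?: out of 10|/10)', text)
    return max(int(m) for m in matches) if matches else default

# Hypothetical model replies, for illustration
print(extract_score("I would give this answer 7 out of 10."))           # 7
print(extract_score("Score: 6/10. A stricter grader might say 4/10."))  # 6
print(extract_score("The answer is mostly correct."))                   # 5
print(extract_score("I would rate it 7.5/10."))                         # 5
```

Two properties are worth keeping in mind. A reply with no recognizable score silently becomes 5, and a decimal score such as "7.5/10" is read as 5, because the pattern only captures the digits immediately before "/10".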

In [None]:
question = "Quel est l'objectif principal d'une campagne de planification des médias?"
original_answer = "L'objectif principal d'une campagne de planification des médias est de déterminer la combinaison optimale des canaux médiatiques, des périodes, des formats et des durées pour atteindre le public cible avec le bon message au bon moment et dans le budget alloué."
student_answer = "Determiner la combinaison optimale des canaux mediatiques, des formats et des durees pour atteintre le public cible avec le bon message."

all_pdfs_chain = get_chain(pdf_files)
calculate_score(question, student_answer, original_answer, all_pdfs_chain)



6.599999999999999

The algorithm gave it a score of 7 (after rounding), which seems reasonable since my answer did not cover every idea in the original answer.
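
The weighting in `calculate_score` can be followed by hand. The sketch below uses invented sub-scores for three trials (the actual values behind the result above were not recorded) to show how the 0.2 / 0.2 / 0.6 weights and the averaging combine; `weighted_trial_score` is a hypothetical helper name.

```python
def weighted_trial_score(meaning, comparison, model):
    """Combine the three sub-scores with the 0.2 / 0.2 / 0.6 weights."""
    return 0.2 * meaning + 0.2 * comparison + 0.6 * model

# Hypothetical sub-scores for three trials: (meaning, comparison, model)
trials = [(7, 8, 6), (8, 7, 6), (6, 7, 7)]

trial_scores = [weighted_trial_score(*t) for t in trials]
average = sum(trial_scores) / len(trial_scores)

print([round(s, 2) for s in trial_scores])
print(round(average, 2))
```

Because the weights sum to 1, the result stays on the 0 to 10 scale, and the model-based score counts three times as much as either of the other two. Rounding the final value also hides floating-point noise of the kind visible in `6.599999999999999`.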

In [None]:
# PyPDF2 is not in the install list above; install it first with `!pip install -qU PyPDF2`
import PyPDF2
import os

def merge_pdfs(pdf_files, output_filename="merged.pdf", output_directory="/content/"):
    # Create a PDF merger object
    pdf_merger = PyPDF2.PdfMerger()

    # Loop through the list and append each PDF to the merger
    for pdf_file in pdf_files:
        pdf_merger.append(pdf_file)

    # Ensure the output directory exists, or create it if it doesn't
    os.makedirs(output_directory, exist_ok=True)

    # Construct the full path for the merged PDF
    output_path = os.path.join(output_directory, output_filename)

    # Write the merged PDF to the output file
    pdf_merger.write(output_path)

    # Close the merger
    pdf_merger.close()

    return output_path

In [None]:
merged = merge_pdfs(pdf_files)

In [None]:
merged

In [None]:
prompt = """
              You are an expert exam author, based on the information from the document provided, write 10 relevant questions followed by their correct, long and detailed answers, numbered from 1 to 10 and never mention yourself or the document, never say 'as exam author', never say 'According to the document', the questions must be about the general concepts, don't dive too much into details.
         """

all_json_data = {}

all_pdfs_chain = get_chain([merged])

result = all_pdfs_chain({"question": prompt, "chat_history": chat_history})
answer = result['answer']

# Key the result by the merged file's path (pdf_file is a leftover loop variable)
all_json_data[merged] = split_and_store(answer)
all_json_string2 = json.dumps(all_json_data, indent=4, ensure_ascii=False)

In [None]:
import json
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def merge_jsons(json1, json2, threshold=0.8):
    # Accept either a list of pairs or a dict of pairs as returned by split_and_store
    pairs1 = list(json1.values()) if isinstance(json1, dict) else list(json1)
    pairs2 = list(json2.values()) if isinstance(json2, dict) else list(json2)

    merged_json = []

    for pair1 in pairs1:
        similar_question_found = False
        for pair2 in pairs2:
            # Build bag-of-words count vectors for the two questions
            count_matrix = CountVectorizer().fit_transform([pair1["question"], pair2["question"]])

            # Compute cosine similarity
            cosine_sim = cosine_similarity(count_matrix)

            # Check if similarity is above the threshold
            if cosine_sim[0][1] > threshold:
                # If similar, drop the question from json1; the json2 version is kept below
                similar_question_found = True
                break

        if not similar_question_found:
            # If no similar question found, add it to the merged JSON
            merged_json.append({"question": pair1["question"], "answer": pair1["answer"]})

    # Add all questions from json2 to merged JSON
    for pair2 in pairs2:
        merged_json.append({"question": pair2["question"], "answer": pair2["answer"]})

    return merged_json
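
The duplicate check in `merge_jsons` compares word-count vectors of two questions. The sketch below reproduces that idea with the standard library only, so the 0.8 threshold can be tried on sample questions without scikit-learn. The function `bow_cosine` is a hypothetical stand-in; it lowercases the text and keeps tokens of two or more word characters, which approximates the default tokenization of `CountVectorizer`.

```python
import math
import re
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between word-count vectors of two strings."""
    # Lowercase and keep tokens of two or more word characters
    ca = Counter(re.findall(r'\b\w\w+\b', a.lower()))
    cb = Counter(re.findall(r'\b\w\w+\b', b.lower()))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical questions, for illustration
q1 = "What is the main goal of a media planning campaign?"
q2 = "What is the main goal of a media planning campaign"
q3 = "How does a questionnaire improve survey data quality?"

print(round(bow_cosine(q1, q2), 2))  # near-identical wording
print(round(bow_cosine(q1, q3), 2))  # no shared words
```

Since the comparison is purely lexical, two questions that ask the same thing in different words would score low and both be kept.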

In [None]:
reserve = {}

categorie = "Data Analysis"
description = "Data warehousing, data visualization, Power BI"

prompt = f"generate 10 questions and their correct, long and detailed answers about {categorie} and its description: {description}, numerated from '1.' to '10.'"

all_pdfs_chain = get_chain(pdf_files)

result = all_pdfs_chain({"question": prompt, "chat_history": chat_history})
answer = result['answer']

# Key the reserve questions by category (the chain spans all PDFs, so no single file applies)
reserve[categorie] = split_and_store(answer)
reserve_json_string = json.dumps(reserve, indent=4, ensure_ascii=False)

In [None]:
print(reserve_json_string)

In [None]:
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def calculate_score(question, student_answer, original_answer, chain, similarity_threshold=0.7):
    total_score = 0

    # Create a TF-IDF vectorizer
    vectorizer = TfidfVectorizer()

    # Fit and transform the text data
    tfidf_matrix = vectorizer.fit_transform([student_answer, original_answer])

    # Calculate cosine similarity between the two TF-IDF representations
    similarity_score = cosine_similarity(tfidf_matrix[0], tfidf_matrix[1])[0][0]

    # If the similarity is below the threshold, assign a score of 0
    if similarity_score < similarity_threshold:
        return 0

    for _ in range(3):
        meaning_similarity = float(meaning_similarity_score(student_answer, original_answer, chain))
        answer_comparison = float(answer_comparison_scoring(question, student_answer, original_answer, chain))
        model_scoring = float(model_based_scoring(question, student_answer, chain))

        # Calculate the total score for this trial
        trial_score = 0.2 * meaning_similarity + 0.2 * answer_comparison + 0.6 * model_scoring
        # Accumulate the trial score
        total_score += trial_score

    # Calculate the average score over the trials
    average_score = total_score / 3

    return average_score
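
The TF-IDF gate added in this version returns 0 whenever the student's wording is too far from the reference. To get a feel for how strict a 0.7 threshold is, the sketch below computes a TF-IDF cosine by hand for two short texts. It is an approximation written for illustration, not scikit-learn's code: `tfidf_cosine` is a hypothetical name, and it assumes raw term counts, the smoothed inverse document frequency `ln((1 + n) / (1 + df)) + 1`, and tokens of two or more word characters.

```python
import math
import re
from collections import Counter

def tfidf_cosine(doc_a, doc_b):
    """Cosine similarity of two texts under a smoothed TF-IDF weighting."""
    docs = [Counter(re.findall(r'\b\w\w+\b', d.lower())) for d in (doc_a, doc_b)]
    n_docs = len(docs)
    vocab = set(docs[0]) | set(docs[1])
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    idf = {w: math.log((1 + n_docs) / (1 + sum(w in d for d in docs))) + 1
           for w in vocab}
    vecs = [{w: d[w] * idf[w] for w in d} for d in docs]
    dot = sum(vecs[0][w] * vecs[1][w] for w in vecs[0].keys() & vecs[1].keys())
    norms = [math.sqrt(sum(x * x for x in v.values())) for v in vecs]
    return dot / (norms[0] * norms[1]) if all(norms) else 0.0

# Hypothetical reference and student answers, for illustration
reference = "reach the target audience with the right message"
student = "deliver the right message to the intended public"

print(round(tfidf_cosine(reference, reference), 2))
print(round(tfidf_cosine(reference, student), 2))
```

In this example the paraphrase lands well below 0.7 even though it expresses a similar idea, so the gate would score it 0 before the model is consulted. Whether that strictness is desirable depends on the use case; a lower threshold is one option to consider.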