# Accelerating Cleantech Advancements through NLP-Powered Text Mining and Knowledge Extraction
### Notebook 7: Stage 3 - Model

Authors: Muhammed K. Ç., Karsanth P., Andrea V.


In [None]:
# Optional installs (uncomment on a fresh Colab runtime)
#!pip install transformers[torch]
#!pip install sentencepiece
#!pip install tensorflow==2.14
#!pip install nltk
#!pip install transformers datasets sacrebleu

from pathlib import Path

import nltk
import pandas as pd
import torch
from google.colab import drive
from nltk.translate.bleu_score import corpus_bleu

#nltk.download('punkt')

# Mount Google Drive
drive.mount('/content/drive')




In [None]:
file_path = Path("/cleantech_media_dataset_v1_20231109.csv")

In [None]:
df_x = pd.read_csv(file_path, sep=',')

In [None]:
df = df_x.copy()
df

Unnamed: 0.1,Unnamed: 0,title,date,author,content,domain,url
0,1280,Qatar to Slash Emissions as LNG Expansion Adva...,2021-01-13,,"[""Qatar Petroleum ( QP) is targeting aggressiv...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...
1,1281,India Launches Its First 700 MW PHWR,2021-01-15,,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...
2,1283,New Chapter for US-China Energy Trade,2021-01-20,,"[""New US President Joe Biden took office this ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...
3,1284,Japan: Slow Restarts Cast Doubt on 2030 Energy...,2021-01-22,,"[""The slow pace of Japanese reactor restarts c...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...
4,1285,NYC Pension Funds to Divest Fossil Fuel Shares,2021-01-25,,"[""Two of New York City's largest pension funds...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...
...,...,...,...,...,...,...,...
9602,82339,Strata Clean Energy Nets $ 300 Million in Fund...,2023-11-06,,['Strata Clean Energy has closed a $ 300 milli...,solarindustrymag,https://solarindustrymag.com/strata-clean-ener...
9603,82340,Orsted Deploying SparkCognition Renewable Suit...,2023-11-07,,['Global renewable energy developer Ørsted is ...,solarindustrymag,https://solarindustrymag.com/orsted-deploying-...
9604,82341,Veolia Has Plans for 5 MW of Solar in Arkansas,2023-11-07,,"['Veolia North America, a provider of environm...",solarindustrymag,https://solarindustrymag.com/veolia-has-plans-...
9605,82342,"SunEdison: Too Big, Too Fast?",2023-11-08,,['Once the self-proclaimed “ leading renewable...,solarindustrymag,http://www.solarindustrymag.com/online/issues/...


# 1. Extract key sentences with BERT Extractive Summarizer


In this chapter, the `content` column is condensed with the BERT Extractive Summarizer so that it becomes more manageable for the T5 model, which is later used to generate questions and answers. T5 is used here with a 512-token input limit, so each article is cut to its first 512 words before it is summarized. Note that this is a word limit, not a token limit: a subword tokenizer produces at least as many tokens as there are words, so a summary can still exceed 512 tokens.
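
The `truncate_text` helper in the cell below counts whitespace-separated words, while the T5 limit is expressed in tokens. The following is a minimal sketch of a token-based budget, not the method used in this notebook. The names `truncate_to_token_budget` and `toy_tokenize` are hypothetical, and the toy tokenizer only stands in for a real subword tokenizer (in practice one could pass something like the `tokenize` method of the T5 tokenizer loaded later). Counting tokens word by word is an approximation of how a real tokenizer would split the full text.

```python
from typing import Callable, List


def truncate_to_token_budget(
    text: str,
    tokenize: Callable[[str], List[str]],
    max_tokens: int = 512,
) -> str:
    """Keep the longest prefix of whole words whose token count fits max_tokens."""
    kept: List[str] = []
    used = 0
    for word in text.split():
        cost = len(tokenize(word))  # tokens this word would consume
        if used + cost > max_tokens:
            break
        kept.append(word)
        used += cost
    return " ".join(kept)


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a subword tokenizer: pieces of at most 4 characters."""
    return [word[i:i + 4] for i in range(0, len(word), 4)]


sample = "renewable electricity generation " * 3
truncated = truncate_to_token_budget(sample, toy_tokenize, max_tokens=10)
print(truncated)  # renewable electricity generation
```

Each of the three words costs three toy tokens, so only the first three words fit into a budget of 10 even though the text contains nine words.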

In [None]:
!pip install bert-extractive-summarizer


In [None]:
from summarizer import Summarizer
import pandas as pd

# Initialize the BERT model
model = Summarizer()

# Function to truncate text to a certain number of words
def truncate_text(text, max_words=512):
    # Split the text into words and truncate to the maximum number of words
    words = text.split()[:max_words]
    return ' '.join(words)

# Function to summarize text
def summarize_text(text):
    try:
        truncated_text = truncate_text(text)
        return model(truncated_text, min_length=60)
    except Exception as e:
        print(f"Error in summarization: {e}")
        return ""


# Optional: randomly select 300 rows for a quick test run
#df_sub = df.sample(n=300, random_state=42)  # Change random_state for a different random selection

# Apply the summarization to the 'content' column of the subset
df['summary'] = df['content'].apply(summarize_text)

# Display the DataFrame with summaries
print(df[['title', 'summary']])


The last 5000 lines of the streaming output were truncated.


                                                  title  \
0     Qatar to Slash Emissions as LNG Expansion Adva...   
1                  India Launches Its First 700 MW PHWR   
2                 New Chapter for US-China Energy Trade   
3     Japan: Slow Restarts Cast Doubt on 2030 Energy...   
4        NYC Pension Funds to Divest Fossil Fuel Shares   
...                                                 ...   
9602  Strata Clean Energy Nets $ 300 Million in Fund...   
9603  Orsted Deploying SparkCognition Renewable Suit...   
9604     Veolia Has Plans for 5 MW of Solar in Arkansas   
9605                      SunEdison: Too Big, Too Fast?   
9606  Vikings Solar-Plus-Storage Development Nets Fi...   

                                                summary  
0     ["Qatar Petroleum ( QP) is targeting aggressiv...  
1     ["• Nuclear Power Corp. of India Ltd. ( NPCIL)...  
2     ["New US President Joe Biden took office this ...  
3     ["The slow pace of Japanese reactor restarts c...  
4



In [None]:
file_path = '/summary_dataset_1.csv'  # Adjust to your own output path
df.to_csv(file_path, index=False)


In [None]:
df

Unnamed: 0.1,Unnamed: 0,title,date,author,content,domain,url,summary
0,1280,Qatar to Slash Emissions as LNG Expansion Adva...,2021-01-13,,"[""Qatar Petroleum ( QP) is targeting aggressiv...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Qatar Petroleum ( QP) is targeting aggressiv..."
1,1281,India Launches Its First 700 MW PHWR,2021-01-15,,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)..."
2,1283,New Chapter for US-China Energy Trade,2021-01-20,,"[""New US President Joe Biden took office this ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""New US President Joe Biden took office this ..."
3,1284,Japan: Slow Restarts Cast Doubt on 2030 Energy...,2021-01-22,,"[""The slow pace of Japanese reactor restarts c...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""The slow pace of Japanese reactor restarts c..."
4,1285,NYC Pension Funds to Divest Fossil Fuel Shares,2021-01-25,,"[""Two of New York City's largest pension funds...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Two of New York City's largest pension funds..."
...,...,...,...,...,...,...,...,...
9602,82339,Strata Clean Energy Nets $ 300 Million in Fund...,2023-11-06,,['Strata Clean Energy has closed a $ 300 milli...,solarindustrymag,https://solarindustrymag.com/strata-clean-ener...,['Strata Clean Energy has closed a $ 300 milli...
9603,82340,Orsted Deploying SparkCognition Renewable Suit...,2023-11-07,,['Global renewable energy developer Ørsted is ...,solarindustrymag,https://solarindustrymag.com/orsted-deploying-...,['Global renewable energy developer Ørsted is ...
9604,82341,Veolia Has Plans for 5 MW of Solar in Arkansas,2023-11-07,,"['Veolia North America, a provider of environm...",solarindustrymag,https://solarindustrymag.com/veolia-has-plans-...,"['Veolia North America, a provider of environm..."
9605,82342,"SunEdison: Too Big, Too Fast?",2023-11-08,,['Once the self-proclaimed “ leading renewable...,solarindustrymag,http://www.solarindustrymag.com/online/issues/...,['Once the self-proclaimed “ leading renewable...


# 2. Generate a question and an answer for each sentence using T5

In the second phase, the encoder-decoder model T5 is used to generate a question and an answer from the key sentences extracted for each article in the Cleantech dataset.

## Train the T5 model to generate questions and answers

Because the pretrained T5 model did not generate appropriate questions for the Cleantech dataset, we fine-tuned it on a portion of the Stanford Question Answering Dataset (SQuAD v2.0). SQuAD comprises over 100,000 entries, so we restricted training to a 5% sample and only 2 epochs. This decision was driven primarily by our limited GPU resources and the time that more extensive fine-tuning would require. These restrictions are likely to reduce the quality and performance of the fine-tuned model.

In [None]:
import json
from sklearn.model_selection import train_test_split

# Path to the SQuAD data
file_path = '/content/drive/My Drive/train-v2.0.json'  # Adjust the path as needed

# Load the SQuAD dataset
with open(file_path, 'r') as f:
    squad_data = json.load(f)

# Process the dataset
def process_squad_data(data):
    contexts = []
    questions = []
    answers = []

    for group in data['data']:
        for passage in group['paragraphs']:
            context = passage['context']
            for qa in passage['qas']:
                question = qa['question']
                # In SQuAD v2.0 a question may have no answer
                if 'answers' in qa and qa['answers']:
                    answer = qa['answers'][0]['text']
                else:
                    answer = ''

                contexts.append(context)
                questions.append(question)
                answers.append(answer)

    return pd.DataFrame({'context': contexts, 'question': questions, 'answer': answers})

# Convert the data into a DataFrame
df = process_squad_data(squad_data)

# Downsample the dataset to 5% of its rows
df_downsampled = df.sample(frac=0.05, random_state=42)  # Adjust 'frac' as needed

# Splitting the downsampled DataFrame into training, validation, and test sets
train_df, temp_df = train_test_split(df_downsampled, test_size=0.2, random_state=42)
validation_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)

# Display the size of each set
print(f"Downsampled Training Set: {len(train_df)} rows")
print(f"Downsampled Validation Set: {len(validation_df)} rows")
print(f"Downsampled Test Set: {len(test_df)} rows")

Downsampled Training Set: 5212 rows
Downsampled Validation Set: 652 rows
Downsampled Test Set: 652 rows
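
The split sizes printed above can be checked by hand, and they also determine how many optimizer steps the fine-tuning performs. The sketch below assumes that the test portion of each split is rounded up, which reproduces the printed numbers. `split_sizes` is a hypothetical helper, and the batch size of 4 and the 2 epochs are the values used in the training cell of this notebook.

```python
import math
from typing import Tuple


def split_sizes(n_rows: int, test_fraction: float) -> Tuple[int, int]:
    """Return (n_train, n_test), with the test part rounded up (assumption)."""
    n_test = math.ceil(n_rows * test_fraction)
    return n_rows - n_test, n_test


n_sampled = 5212 + 652 + 652  # rows reported in the output above
n_train, n_temp = split_sizes(n_sampled, 0.2)   # 80% train, 20% held out
n_valid, n_test = split_sizes(n_temp, 0.5)      # held-out half each

batch_size = 4
epochs = 2
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs

print(n_train, n_valid, n_test)      # 5212 652 652
print(steps_per_epoch, total_steps)  # 1303 2606
```

With 5,212 training rows and a batch size of 4, each epoch consists of 1,303 steps, so the two epochs amount to 2,606 parameter updates in total.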


In [None]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Choose the T5 model variant
model_name = 't5-base'

# Initialize the tokenizer and the model
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Check if the model and tokenizer are loaded correctly
print("Model and tokenizer successfully loaded.")


For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Model and tokenizer successfully loaded.


In [None]:
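# torch.optim.AdamW replaces the deprecated transformers.AdamW
# (note: its default weight_decay is 0.01 instead of 0.0)
from torch.optim import AdamW
from torch.utils.data import DataLoader, Dataset
from transformers import get_linear_schedule_with_warmup
import torch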

# Convert the DataFrame into the required format for T5
class SQuADDataset(Dataset):
    def __init__(self, tokenizer, df, max_len=512):
        self.tokenizer = tokenizer
        self.data = df
        self.max_len = max_len

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        context = self.data.iloc[idx]['context']
        question = self.data.iloc[idx]['question']
        answer = self.data.iloc[idx]['answer']

        # Source text is the context
        source_text = f"context: {context} </s>"

        # Target text is the question and its answer
        target_text = f"question: {question} answer: {answer} </s>"


        source = self.tokenizer.encode_plus(source_text, max_length=self.max_len, padding='max_length', truncation=True, return_tensors='pt')
        target = self.tokenizer.encode_plus(target_text, max_length=self.max_len, padding='max_length', truncation=True, return_tensors='pt')

        return {
            'source_ids': source['input_ids'].flatten(),
            'source_mask': source['attention_mask'].flatten(),
            'target_ids': target['input_ids'].flatten(),
            'target_mask': target['attention_mask'].flatten()
        }

# Prepare DataLoader for training, validation, and test datasets
train_dataset = SQuADDataset(tokenizer, train_df)
valid_dataset = SQuADDataset(tokenizer, validation_df)
test_dataset = SQuADDataset(tokenizer, test_df)

train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True)
valid_dataloader = DataLoader(valid_dataset, batch_size=4)
test_dataloader = DataLoader(test_dataset, batch_size=4)

# Define optimizer and scheduler
epochs = 2  # Adjust the number of epochs as needed
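# weight_decay=0.0 keeps the optimizer free of weight decay regardless of which AdamW implementation is imported
optimizer = AdamW(model.parameters(), lr=3e-5, weight_decay=0.0)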
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=len(train_dataloader) * epochs)

# Define a function for evaluation
def evaluate(dataloader):
    model.eval()
    total_loss = 0

    with torch.no_grad():
        for batch in dataloader:
            input_ids = batch['source_ids'].to(device)
            attention_mask = batch['source_mask'].to(device)
            labels = batch['target_ids'].to(device)
            labels[labels == tokenizer.pad_token_id] = -100

            outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            total_loss += loss.item()

    return total_loss / len(dataloader)

# Training routine
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for epoch in range(epochs):
    model.train()
    total_train_loss = 0

    for batch in train_dataloader:
        optimizer.zero_grad()
        input_ids = batch['source_ids'].to(device)
        attention_mask = batch['source_mask'].to(device)
        labels = batch['target_ids'].to(device)
        labels[labels == tokenizer.pad_token_id] = -100

        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        total_train_loss += loss.item()
        loss.backward()
        optimizer.step()
        scheduler.step()

    avg_train_loss = total_train_loss / len(train_dataloader)
    avg_val_loss = evaluate(valid_dataloader)

    print(f"Epoch {epoch + 1}/{epochs}, Training Loss: {avg_train_loss}, Validation Loss: {avg_val_loss}")



Epoch 1/2, Training Loss: 1.6770141906035652, Validation Loss: 1.410478120566877
Epoch 2/2, Training Loss: 1.4531855336172437, Validation Loss: 1.3898474736813387


In [None]:
# evaluate() from the training cell above is reused here
# Evaluate the model on the test set
avg_test_loss = evaluate(test_dataloader)
print(f"Test Loss: {avg_test_loss}")


Test Loss: 1.387006347164786


The test loss of 1.387 is close to the final validation loss of 1.390, so the model behaves consistently on unseen SQuAD data. On its own, however, the value says little about the quality of the generated text. A meaningful interpretation is only possible later, after the T5 model has generated the questions and answers.
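
One way to make the loss slightly more tangible is to convert it to perplexity. The sketch below assumes that the reported loss is the mean token-level cross-entropy in natural-log units, which is what the Hugging Face T5 model returns when `labels` are passed. Because the notebook averages per-batch means, the result is an approximation. `perplexity` is a hypothetical helper name.

```python
import math

test_loss = 1.387006347164786        # test loss printed above
val_loss_epoch_2 = 1.3898474736813387  # validation loss after epoch 2


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy (natural log)."""
    return math.exp(mean_cross_entropy)


print(f"Test perplexity:       {perplexity(test_loss):.2f}")
print(f"Validation perplexity: {perplexity(val_loss_epoch_2):.2f}")
```

Both values come out close to 4, which can be read as the model being, on average, about as uncertain as if it chose among four equally likely tokens at each position of the target text.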

## Application of T5 model to the Cleantech dataset

Now the T5 model, which was fine-tuned on the SQuAD dataset, is applied to the Cleantech dataset to generate question-answer pairs.

In [None]:
file_path = '/content/drive/My Drive/summary_dataset_1.csv'  # Adjust to your own path
df_x = pd.read_csv(file_path, sep=',')
df_1 = df_x.copy()
df_1

Unnamed: 0.1,Unnamed: 0,title,date,author,content,domain,url,summary
0,1280,Qatar to Slash Emissions as LNG Expansion Adva...,2021-01-13,,"[""Qatar Petroleum ( QP) is targeting aggressiv...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Qatar Petroleum ( QP) is targeting aggressiv..."
1,1281,India Launches Its First 700 MW PHWR,2021-01-15,,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)..."
2,1283,New Chapter for US-China Energy Trade,2021-01-20,,"[""New US President Joe Biden took office this ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""New US President Joe Biden took office this ..."
3,1284,Japan: Slow Restarts Cast Doubt on 2030 Energy...,2021-01-22,,"[""The slow pace of Japanese reactor restarts c...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""The slow pace of Japanese reactor restarts c..."
4,1285,NYC Pension Funds to Divest Fossil Fuel Shares,2021-01-25,,"[""Two of New York City's largest pension funds...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Two of New York City's largest pension funds..."
...,...,...,...,...,...,...,...,...
9602,82339,Strata Clean Energy Nets $ 300 Million in Fund...,2023-11-06,,['Strata Clean Energy has closed a $ 300 milli...,solarindustrymag,https://solarindustrymag.com/strata-clean-ener...,['Strata Clean Energy has closed a $ 300 milli...
9603,82340,Orsted Deploying SparkCognition Renewable Suit...,2023-11-07,,['Global renewable energy developer Ørsted is ...,solarindustrymag,https://solarindustrymag.com/orsted-deploying-...,['Global renewable energy developer Ørsted is ...
9604,82341,Veolia Has Plans for 5 MW of Solar in Arkansas,2023-11-07,,"['Veolia North America, a provider of environm...",solarindustrymag,https://solarindustrymag.com/veolia-has-plans-...,"['Veolia North America, a provider of environm..."
9605,82342,"SunEdison: Too Big, Too Fast?",2023-11-08,,['Once the self-proclaimed “ leading renewable...,solarindustrymag,http://www.solarindustrymag.com/online/issues/...,['Once the self-proclaimed “ leading renewable...


In [None]:
def generate_qa_pair(context):
    model.eval()
    input_text = f"context: {context} </s>"
    input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)

    # Generate the output
    output_ids = model.generate(input_ids, max_length=512, num_beams=10)  # num_beams added (beam search)
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    # Split the output into question and answer
    parts = output_text.split('answer:')
    question = parts[0].strip() if len(parts) > 1 else "No question generated"
    answer = parts[1].strip() if len(parts) > 1 else "No answer generated"
    return question, answer

# Apply the function to each row in the DataFrame
df_1['generated_question'], df_1['generated_answer'] = zip(*df_1['summary'].apply(generate_qa_pair))

# Check the results
print(df_1[['summary', 'generated_question', 'generated_answer']].head())




                                             summary  \
0  ["Qatar Petroleum ( QP) is targeting aggressiv...   
1  ["• Nuclear Power Corp. of India Ltd. ( NPCIL)...   
2  ["New US President Joe Biden took office this ...   
3  ["The slow pace of Japanese reactor restarts c...   
4  ["Two of New York City's largest pension funds...   

                                  generated_question     generated_answer  
0                              No question generated  No answer generated  
1  question: What is the first of India's 700 meg...         700 megawatt  
2  question: How many tons of US LPG was discharg...       $ 1.74 billion  
3                              No question generated  No answer generated  
4                              No question generated  No answer generated  
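
The splitting step in `generate_qa_pair` can be tried out without running the model. The sketch below is an alternative under stated assumptions, not the notebook's implementation: `split_generated_text` is a hypothetical helper that strips the `question:` prefix directly and keeps a question even when no `answer:` marker is present, whereas the function above reports "No question generated" in that case.

```python
from typing import Optional, Tuple


def split_generated_text(output_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 'question: ... answer: ...' into (question, answer); None marks a missing part."""
    question_part, marker, answer_part = output_text.partition("answer:")
    question = question_part.strip()
    if question.startswith("question:"):
        question = question[len("question:"):].strip()
    answer = answer_part.strip() if marker else ""
    return (question or None, answer or None)


# Complete pair, as produced for one of the rows in this notebook
full = split_generated_text(
    "question: What did SunEdison launch between 2014 and 2015? "
    "answer: two publicly traded yieldcos"
)
# Question without an answer marker
no_answer = split_generated_text("question: What is a Green Financing Framework?")
# Empty model output
empty = split_generated_text("")

print(full)
print(no_answer)
print(empty)
```

Returning `None` instead of a placeholder string lets the later cleaning step rely on `dropna` alone, at the cost of deviating from the placeholder convention used in this notebook.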


In [None]:
# Define the path where the CSV file will be saved
csv_save_path = '/qa_data.csv'

# Save the DataFrame as CSV
df_1.to_csv(csv_save_path, index=False)


In [None]:
import pandas as pd
file_path = '/qa_data.csv'

In [None]:
df_x = pd.read_csv(file_path, sep=',')

In [None]:
df = df_x.copy()
df

Unnamed: 0.1,Unnamed: 0,title,date,author,content,domain,url,summary,generated_question,generated_answer
0,1280,Qatar to Slash Emissions as LNG Expansion Adva...,2021-01-13,,"[""Qatar Petroleum ( QP) is targeting aggressiv...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Qatar Petroleum ( QP) is targeting aggressiv...",No question generated,No answer generated
1,1281,India Launches Its First 700 MW PHWR,2021-01-15,,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",question: What is the first of India's 700 meg...,700 megawatt
2,1283,New Chapter for US-China Energy Trade,2021-01-20,,"[""New US President Joe Biden took office this ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""New US President Joe Biden took office this ...",question: How many tons of US LPG was discharg...,$ 1.74 billion
3,1284,Japan: Slow Restarts Cast Doubt on 2030 Energy...,2021-01-22,,"[""The slow pace of Japanese reactor restarts c...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""The slow pace of Japanese reactor restarts c...",No question generated,No answer generated
4,1285,NYC Pension Funds to Divest Fossil Fuel Shares,2021-01-25,,"[""Two of New York City's largest pension funds...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Two of New York City's largest pension funds...",No question generated,No answer generated
...,...,...,...,...,...,...,...,...,...,...
9602,82339,Strata Clean Energy Nets $ 300 Million in Fund...,2023-11-06,,['Strata Clean Energy has closed a $ 300 milli...,solarindustrymag,https://solarindustrymag.com/strata-clean-ener...,['Strata Clean Energy has closed a $ 300 milli...,question: What is a Green Financing Framework?,2023 Loan Syndications and Trading Association...
9603,82340,Orsted Deploying SparkCognition Renewable Suit...,2023-11-07,,['Global renewable energy developer Ørsted is ...,solarindustrymag,https://solarindustrymag.com/orsted-deploying-...,['Global renewable energy developer Ørsted is ...,question: rsted is deploying SparkCognition’ s...,
9604,82341,Veolia Has Plans for 5 MW of Solar in Arkansas,2023-11-07,,"['Veolia North America, a provider of environm...",solarindustrymag,https://solarindustrymag.com/veolia-has-plans-...,"['Veolia North America, a provider of environm...",question: What type of solar array is expected...,
9605,82342,"SunEdison: Too Big, Too Fast?",2023-11-08,,['Once the self-proclaimed “ leading renewable...,solarindustrymag,http://www.solarindustrymag.com/online/issues/...,['Once the self-proclaimed “ leading renewable...,question: What did SunEdison launch between 20...,two publicly traded yieldcos


In [None]:
# First, we remove all rows where 'generated_question' has missing values
df = df.dropna(subset=['generated_question'])

# Next, we remove all rows where 'generated_question' equals "No question generated";
# .copy() avoids pandas chained-assignment warnings when the column is modified later
df = df.query("generated_question != 'No question generated'").copy()

# Display the cleaned DataFrame
df


Unnamed: 0.1,Unnamed: 0,title,date,author,content,domain,url,summary,generated_question,generated_answer
1,1281,India Launches Its First 700 MW PHWR,2021-01-15,,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",question: What is the first of India's 700 meg...,700 megawatt
2,1283,New Chapter for US-China Energy Trade,2021-01-20,,"[""New US President Joe Biden took office this ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""New US President Joe Biden took office this ...",question: How many tons of US LPG was discharg...,$ 1.74 billion
5,1286,Japan: Supreme Court Will Likely Decide on Fuk...,2021-01-28,,"[""Japan's Supreme Court will likely become the...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Japan's Supreme Court will likely become the...",question: What decision set the tally of appel...,Sendai High Court decision
6,1287,Biden Appointees Signal Progressive Engagement,2021-01-28,,"[""Oil and natural gas industry officials have ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Oil and natural gas industry officials have ...",question: Who will serve as chief of staff in ...,Shuchi Talati
7,1289,The Big Picture: The New 'Great Game ',2021-02-02,,"[""• A new “ great game ” is emerging for the e...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• A new “ great game ” is emerging for the e...",question: What is at stake for the US?,The low-carbon transition will effectively res...
...,...,...,...,...,...,...,...,...,...,...
9602,82339,Strata Clean Energy Nets $ 300 Million in Fund...,2023-11-06,,['Strata Clean Energy has closed a $ 300 milli...,solarindustrymag,https://solarindustrymag.com/strata-clean-ener...,['Strata Clean Energy has closed a $ 300 milli...,question: What is a Green Financing Framework?,2023 Loan Syndications and Trading Association...
9603,82340,Orsted Deploying SparkCognition Renewable Suit...,2023-11-07,,['Global renewable energy developer Ørsted is ...,solarindustrymag,https://solarindustrymag.com/orsted-deploying-...,['Global renewable energy developer Ørsted is ...,question: rsted is deploying SparkCognition’ s...,
9604,82341,Veolia Has Plans for 5 MW of Solar in Arkansas,2023-11-07,,"['Veolia North America, a provider of environm...",solarindustrymag,https://solarindustrymag.com/veolia-has-plans-...,"['Veolia North America, a provider of environm...",question: What type of solar array is expected...,
9605,82342,"SunEdison: Too Big, Too Fast?",2023-11-08,,['Once the self-proclaimed “ leading renewable...,solarindustrymag,http://www.solarindustrymag.com/online/issues/...,['Once the self-proclaimed “ leading renewable...,question: What did SunEdison launch between 20...,two publicly traded yieldcos


In [None]:
# Remove "question: " from each entry in the 'generated_question' column
df['generated_question'] = df['generated_question'].str.replace("question: ", "", regex=False)

# Display the modified DataFrame
df

Unnamed: 0.1,Unnamed: 0,title,date,author,content,domain,url,summary,generated_question,generated_answer
1,1281,India Launches Its First 700 MW PHWR,2021-01-15,,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",What is the first of India's 700 megawatt indi...,700 megawatt
2,1283,New Chapter for US-China Energy Trade,2021-01-20,,"[""New US President Joe Biden took office this ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""New US President Joe Biden took office this ...",How many tons of US LPG was discharged in China?,$ 1.74 billion
5,1286,Japan: Supreme Court Will Likely Decide on Fuk...,2021-01-28,,"[""Japan's Supreme Court will likely become the...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Japan's Supreme Court will likely become the...",What decision set the tally of appellate court...,Sendai High Court decision
6,1287,Biden Appointees Signal Progressive Engagement,2021-01-28,,"[""Oil and natural gas industry officials have ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Oil and natural gas industry officials have ...",Who will serve as chief of staff in the Office...,Shuchi Talati
7,1289,The Big Picture: The New 'Great Game ',2021-02-02,,"[""• A new “ great game ” is emerging for the e...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• A new “ great game ” is emerging for the e...",What is at stake for the US?,The low-carbon transition will effectively res...
...,...,...,...,...,...,...,...,...,...,...
9602,82339,Strata Clean Energy Nets $ 300 Million in Fund...,2023-11-06,,['Strata Clean Energy has closed a $ 300 milli...,solarindustrymag,https://solarindustrymag.com/strata-clean-ener...,['Strata Clean Energy has closed a $ 300 milli...,What is a Green Financing Framework?,2023 Loan Syndications and Trading Association...
9603,82340,Orsted Deploying SparkCognition Renewable Suit...,2023-11-07,,['Global renewable energy developer Ørsted is ...,solarindustrymag,https://solarindustrymag.com/orsted-deploying-...,['Global renewable energy developer Ørsted is ...,rsted is deploying SparkCognition’ s Renewable...,
9604,82341,Veolia Has Plans for 5 MW of Solar in Arkansas,2023-11-07,,"['Veolia North America, a provider of environm...",solarindustrymag,https://solarindustrymag.com/veolia-has-plans-...,"['Veolia North America, a provider of environm...",What type of solar array is expected to produc...,
9605,82342,"SunEdison: Too Big, Too Fast?",2023-11-08,,['Once the self-proclaimed “ leading renewable...,solarindustrymag,http://www.solarindustrymag.com/online/issues/...,['Once the self-proclaimed “ leading renewable...,What did SunEdison launch between 2014 and 2015?,two publicly traded yieldcos


In [None]:
from transformers import pipeline

# Initialize the Question-Answering Pipeline
qa_pipeline = pipeline("question-answering")

# Function to perform Question-Answering for a row
def apply_qa(row):
    try:
        outputs = qa_pipeline(question=row['generated_question'], context=row['summary'])
        return outputs['answer']
    except Exception as e:
        return f"Error: {e}"

# Apply the function to each row of the DataFrame
df['answer'] = df.apply(apply_qa, axis=1)

# Check the results
df


No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Unnamed: 0.1,Unnamed: 0,title,date,author,content,domain,url,summary,generated_question,generated_answer,answer
1,1281,India Launches Its First 700 MW PHWR,2021-01-15,,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",What is the first of India's 700 megawatt indi...,700 megawatt,Kakrapar-3
2,1283,New Chapter for US-China Energy Trade,2021-01-20,,"[""New US President Joe Biden took office this ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""New US President Joe Biden took office this ...",How many tons of US LPG was discharged in China?,$ 1.74 billion,4.2 million
5,1286,Japan: Supreme Court Will Likely Decide on Fuk...,2021-01-28,,"[""Japan's Supreme Court will likely become the...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Japan's Supreme Court will likely become the...",What decision set the tally of appellate court...,Sendai High Court decision,Sendai High Court decision
6,1287,Biden Appointees Signal Progressive Engagement,2021-01-28,,"[""Oil and natural gas industry officials have ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Oil and natural gas industry officials have ...",Who will serve as chief of staff in the Office...,Shuchi Talati,Shuchi Talati
7,1289,The Big Picture: The New 'Great Game ',2021-02-02,,"[""• A new “ great game ” is emerging for the e...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• A new “ great game ” is emerging for the e...",What is at stake for the US?,The low-carbon transition will effectively res...,much more is at stake
...,...,...,...,...,...,...,...,...,...,...,...
9602,82339,Strata Clean Energy Nets $ 300 Million in Fund...,2023-11-06,,['Strata Clean Energy has closed a $ 300 milli...,solarindustrymag,https://solarindustrymag.com/strata-clean-ener...,['Strata Clean Energy has closed a $ 300 milli...,What is a Green Financing Framework?,2023 Loan Syndications and Trading Association...,2023 Loan Syndications and Trading Association
9603,82340,Orsted Deploying SparkCognition Renewable Suit...,2023-11-07,,['Global renewable energy developer Ørsted is ...,solarindustrymag,https://solarindustrymag.com/orsted-deploying-...,['Global renewable energy developer Ørsted is ...,rsted is deploying SparkCognition’ s Renewable...,,Ørsted
9604,82341,Veolia Has Plans for 5 MW of Solar in Arkansas,2023-11-07,,"['Veolia North America, a provider of environm...",solarindustrymag,https://solarindustrymag.com/veolia-has-plans-...,"['Veolia North America, a provider of environm...",What type of solar array is expected to produc...,,single-axis tracking solar energy system
9605,82342,"SunEdison: Too Big, Too Fast?",2023-11-08,,['Once the self-proclaimed “ leading renewable...,solarindustrymag,http://www.solarindustrymag.com/online/issues/...,['Once the self-proclaimed “ leading renewable...,What did SunEdison launch between 2014 and 2015?,two publicly traded yieldcos,two publicly traded yieldcos


In [None]:
# Define the path where the CSV file will be saved
csv_save_path = '/content/drive/My Drive/qa_data_all.csv'

# Save the DataFrame as CSV
df.to_csv(csv_save_path, index=False)


Because the question-generation step did not produce an answer for every generated question, the answers were instead generated with the Transformers `pipeline` API, whose default question-answering model (distilbert-base-cased-distilled-squad) was fine-tuned on the full SQuAD dataset. The `pipeline` cannot generate questions, however, and our T5 model failed to generate a question for roughly 2,000 of the approximately 9,600 records, so these rows were removed. The reason for the missing questions is that the T5 model was not trained on the entire SQuAD dataset and was trained for only 2 epochs. Training for more epochs should reduce the number of records without a question. That requires sufficient GPU capacity, which we did not have: our Google Colab Pro subscription came with a limited number of compute units, all of which were used up during this stage and while testing Stage 3.
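
The row removal described above is not shown in the notebook. A minimal sketch of how it could be done with pandas follows; the helper name `drop_rows_without_question` and the toy data are assumptions for illustration, only the column name `generated_question` comes from the dataset.

```python
import pandas as pd

# Toy frame standing in for the full dataset (values are illustrative only)
toy = pd.DataFrame({
    "summary": ["text a", "text b", "text c"],
    "generated_question": ["What is a?", None, "   "],
})

def drop_rows_without_question(frame, column="generated_question"):
    """Keep only rows whose question is a non-empty string (hypothetical helper)."""
    has_text = frame[column].fillna("").astype(str).str.strip().ne("")
    return frame[has_text].reset_index(drop=True)

filtered = drop_rows_without_question(toy)
print(len(toy), "->", len(filtered))
```

Resetting the index afterwards would explain why the reloaded table later in the notebook is numbered from 0 without gaps.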

# 3. Manually clean up the generated question-answer pairs to create a high-quality QA dataset.

In this phase, a random sample of 100 question-answer pairs was checked manually to see whether the questions and their answers made sense. Because it is not feasible to inspect every record in a dataset of this size, and the random sample looked good, we accept that the QA dataset may still contain some errors.
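
As a rough illustration of what a 100-pair check can and cannot tell us, the sketch below draws a reproducible sample of row positions and computes an approximate 95% Wilson interval for the share of acceptable pairs. The function names, the row count of 7500 and the review outcome of 90 out of 100 are all hypothetical placeholders, not results from this notebook.

```python
import math
import random

def sample_indices(n_rows, n_sample=100, seed=42):
    """Draw a reproducible random sample of row positions without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), n_sample))

def wilson_interval(n_ok, n_checked, z=1.96):
    """Approximate 95% Wilson score interval for the share of acceptable pairs."""
    p = n_ok / n_checked
    denom = 1 + z**2 / n_checked
    centre = (p + z**2 / (2 * n_checked)) / denom
    half = z * math.sqrt(p * (1 - p) / n_checked + z**2 / (4 * n_checked**2)) / denom
    return centre - half, centre + half

indices = sample_indices(n_rows=7500)   # placeholder row count; use len(df) in practice
# rows_to_review = df.iloc[indices]     # how the positions would be applied to the DataFrame

# Hypothetical review outcome, NOT a result from this notebook
low, high = wilson_interval(n_ok=90, n_checked=100)
print(f"{len(indices)} rows sampled; acceptable share between {low:.2f} and {high:.2f}")
```

Recording the seed and the number of accepted pairs would make the manual check repeatable and would quantify how much error is being accepted.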

# 4. Use the prepared QA dataset to fine-tune T5 and evaluate model performance on new input data in the cleantech field.

In this chapter, the aim is to fine-tune the T5 model on the cleantech QA dataset and then measure the model's performance with the BLEU score.

In [2]:
file_path = '/qa_data_all.csv'
df_x = pd.read_csv(file_path, sep=',')
df = df_x.copy()
df

Unnamed: 0.1,Unnamed: 0,title,date,author,content,domain,url,summary,generated_question,generated_answer,answer
0,1281,India Launches Its First 700 MW PHWR,2021-01-15,,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• Nuclear Power Corp. of India Ltd. ( NPCIL)...",What is the first of India's 700 megawatt indi...,700 megawatt,Kakrapar-3
1,1283,New Chapter for US-China Energy Trade,2021-01-20,,"[""New US President Joe Biden took office this ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""New US President Joe Biden took office this ...",How many tons of US LPG was discharged in China?,$ 1.74 billion,4.2 million
2,1286,Japan: Supreme Court Will Likely Decide on Fuk...,2021-01-28,,"[""Japan's Supreme Court will likely become the...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Japan's Supreme Court will likely become the...",What decision set the tally of appellate court...,Sendai High Court decision,Sendai High Court decision
3,1287,Biden Appointees Signal Progressive Engagement,2021-01-28,,"[""Oil and natural gas industry officials have ...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""Oil and natural gas industry officials have ...",Who will serve as chief of staff in the Office...,Shuchi Talati,Shuchi Talati
4,1289,The Big Picture: The New 'Great Game ',2021-02-02,,"[""• A new “ great game ” is emerging for the e...",energyintel,https://www.energyintel.com/0000017b-a7dc-de4c...,"[""• A new “ great game ” is emerging for the e...",What is at stake for the US?,The low-carbon transition will effectively res...,much more is at stake
...,...,...,...,...,...,...,...,...,...,...,...
7492,82339,Strata Clean Energy Nets $ 300 Million in Fund...,2023-11-06,,['Strata Clean Energy has closed a $ 300 milli...,solarindustrymag,https://solarindustrymag.com/strata-clean-ener...,['Strata Clean Energy has closed a $ 300 milli...,What is a Green Financing Framework?,2023 Loan Syndications and Trading Association...,2023 Loan Syndications and Trading Association
7493,82340,Orsted Deploying SparkCognition Renewable Suit...,2023-11-07,,['Global renewable energy developer Ørsted is ...,solarindustrymag,https://solarindustrymag.com/orsted-deploying-...,['Global renewable energy developer Ørsted is ...,rsted is deploying SparkCognition’ s Renewable...,,Ørsted
7494,82341,Veolia Has Plans for 5 MW of Solar in Arkansas,2023-11-07,,"['Veolia North America, a provider of environm...",solarindustrymag,https://solarindustrymag.com/veolia-has-plans-...,"['Veolia North America, a provider of environm...",What type of solar array is expected to produc...,,single-axis tracking solar energy system
7495,82342,"SunEdison: Too Big, Too Fast?",2023-11-08,,['Once the self-proclaimed “ leading renewable...,solarindustrymag,http://www.solarindustrymag.com/online/issues/...,['Once the self-proclaimed “ leading renewable...,What did SunEdison launch between 2014 and 2015?,two publicly traded yieldcos,two publicly traded yieldcos


In [None]:
!pip install transformers datasets sacrebleu


In [None]:
# Import necessary libraries
import pandas as pd
from datasets import Dataset, load_metric
from transformers import T5ForConditionalGeneration, T5Tokenizer, Trainer, TrainingArguments
from sklearn.model_selection import train_test_split


# Prepare the dataset for T5
df['input_text'] = "summarize: " + df['summary'] + " question: " + df['generated_question']
df['target_text'] = df['answer']

# Split the data into training and test sets (fixed seed so the split is reproducible)
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)  # 20% of the data for testing

# Convert to Huggingface Datasets
train_dataset = Dataset.from_pandas(train_df[['input_text', 'target_text']])
test_dataset = Dataset.from_pandas(test_df[['input_text', 'target_text']])

# Tokenization function
def tokenize_function(examples):
    # Encode the inputs, padded/truncated to 512 tokens
    model_inputs = tokenizer(examples['input_text'], padding='max_length', truncation=True, max_length=512)
    # Encode the targets via text_target (replaces the deprecated as_target_tokenizer context manager)
    labels = tokenizer(text_target=examples['target_text'], padding='max_length', truncation=True, max_length=512)
    model_inputs['labels'] = labels['input_ids']
    return model_inputs

# Load T5 model and tokenizer
model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

# Tokenize the datasets
train_tokenized_dataset = train_dataset.map(tokenize_function, batched=True)
test_tokenized_dataset = test_dataset.map(tokenize_function, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized_dataset,
    eval_dataset=test_tokenized_dataset
)

# Fine-tune the model
trainer.train()

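# Load the BLEU metric (sacreBLEU implementation) with trust_remote_code
# Note: load_metric is deprecated in newer `datasets` releases in favour of the `evaluate` package
bleu_metric = load_metric('sacrebleu', trust_remote_code=True)

# Evaluation function: average sentence-level BLEU over the evaluation set
def evaluate_model(eval_dataset):
    model.eval()
    bleu_scores = []
    for example in eval_dataset:
        # Truncate to the training length and move the tensors to the model's device
        inputs = tokenizer(example['input_text'], return_tensors='pt', truncation=True, max_length=512).to(model.device)
        outputs = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_new_tokens=50)  # Set max_new_tokens
        prediction = tokenizer.decode(outputs[0], skip_special_tokens=True)
        # Round-trip the target through the tokenizer so both sides are normalized the same way
        labels = tokenizer(example['target_text'], return_tensors='pt').input_ids
        reference = tokenizer.decode(labels[0], skip_special_tokens=True)
        # sacreBLEU expects one list of reference strings per prediction
        score = bleu_metric.compute(predictions=[prediction], references=[[reference]])
        bleu_scores.append(score['score'])
    return sum(bleu_scores) / len(bleu_scores)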

# Evaluate the model
average_bleu_score = evaluate_model(test_tokenized_dataset)
print("Average BLEU Score on Test Set:", average_bleu_score)




You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Map:   0%|          | 0/5997 [00:00<?, ? examples/s]



Map:   0%|          | 0/1500 [00:00<?, ? examples/s]

Step,Training Loss
10,16.4515
20,15.9242
30,15.597
40,14.6238
50,14.7675
60,13.9928
70,12.889
80,12.0054
90,11.1821
100,10.0969



Average BLEU Score on Test Set: 8.914748433766686


The BLEU score is a key metric for assessing the quality of machine-generated text, particularly in machine translation. It measures the n-gram overlap between the generated text and one or more human reference texts. As reported by sacreBLEU, the score ranges from 0 to 100, with higher values indicating a closer match with the reference. Our model's score of about 8.91 is generally considered low, indicating that the generated answers resemble the reference answers only to a limited extent. One way to raise the score would be to train on more data and for more epochs, since only a single epoch was used here.
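
Many reference answers in this dataset are only one to three words long, and BLEU counts n-grams up to length four, so it may understate answer quality here. As a complement, the sketch below shows SQuAD-style exact match and token-level F1. This is not part of the notebook's evaluation; the function names and the normalization rules are assumptions chosen for illustration.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Kakrapar-3", "kakrapar-3"))
print(token_f1("Sendai High Court", "Sendai High Court decision"))
```

Averaging both values over the test set would give a second view of model quality that does not depend on long n-gram matches.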

# 5. Comparing the above results with the zero-shot capability of some large language models (LLMs) such as ChatGPT.

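In this part, the first three questions are sent to an OpenAI model (`text-davinci-002`, called through the legacy Completion API), and the responses are compared with the results of our model. Only the question is sent, without the article summary, so the model has to answer from its own knowledge (zero-shot).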

In [None]:
!pip install openai==0.28

In [None]:
!pip install --upgrade typing_extensions

In [5]:
import pandas as pd
import openai

# Function to ask a question to ChatGPT
def ask_chatgpt(question, api_key):
    openai.api_key = api_key

    try:
        response = openai.Completion.create(
            model="text-davinci-002",  # Specify the model name here
            prompt=question,
            max_tokens=500,
            temperature=0
        )
        return response.choices[0].text.strip()
    except Exception as e:
        return str(e)


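# Set your API key (read from the environment so the key is not stored in the notebook)
import os
api_key = os.environ.get("OPENAI_API_KEY", "sk-my-key")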

# Extract questions from the DataFrame
df_gpt = df.head(3)
questions = df_gpt['generated_question'].tolist()

# Get answers for each question
for question in questions:
    answer = ask_chatgpt(question, api_key)
    print(f"Question: {question}\nAnswer: {answer}\n")


Question: What is the first of India's 700 megawatt indigenously developed pressurized heavy water reactors ( PHWRs) to reach this milestone?
Answer: The first of India's 700 megawatt indigenously developed pressurized heavy water reactors ( PHWRs) to reach this milestone was the Rajasthan Atomic Power Station Unit 3.

Question: How many tons of US LPG was discharged in China?
Answer: In 2019, approximately 1.3 million tons of US LPG was discharged in China.

Question: What decision set the tally of appellate court judgments on national government liability tied one-to-one?
Answer: The decision in United States v. Lee, 106 U.S. 196 (1882).
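
The questions above were sent without the article text, while our own model always sees the summary. A closer comparison would give the language model the same context. The sketch below shows one way such a prompt could be assembled; the function name `build_grounded_prompt`, the prompt wording and the character limit are assumptions, and the example context is made up.

```python
def build_grounded_prompt(context, question, max_context_chars=3000):
    """Combine article text and question into one prompt for a completion model."""
    context = context[:max_context_chars]  # crude length guard (assumed limit)
    return (
        "Answer the question using only the context below. "
        "Reply with a short phrase.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Illustrative context, not a row from the dataset
example_prompt = build_grounded_prompt(
    context="Unit A started up in 2020. Unit B followed in 2021.",
    question="Which unit started up first?",
)
print(example_prompt)
```

The resulting string could be passed as the `question` argument of `ask_chatgpt`, for example built from `row['summary']` and `row['generated_question']`.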



In [6]:
from textblob import TextBlob
import pandas as pd

# Responses from Own Model
own_model_answers = [
    "Kakrapar-3",
    "4.2 million",
    "Sendai High-Court"
]

# Responses from Chat-GPT
chat_gpt_answers = [
    "The first of India's 700 megawatt indigenously developed pressurized heavy water reactors (PHWRs) to reach this milestone was the Rajasthan Atomic Power Station Unit 3.",
    "In 2019, approximately 1.3 million tons of US LPG was discharged in China.",
    "The decision in United States v. Lee, 106 U.S. 196 (1882)."
]

# Function to analyze various text metrics
def analyze_text_metrics(answers):
    analysis_results = []
    for answer in answers:
        blob = TextBlob(answer)
        analysis_results.append({
            "length": len(answer.split()),  # Length of the answer in words
            "polarity": blob.sentiment.polarity,  # Sentiment polarity
            "subjectivity": blob.sentiment.subjectivity,  # Sentiment subjectivity
        })
    return analysis_results

# Analyze the responses from both models
own_model_analysis = analyze_text_metrics(own_model_answers)
chat_gpt_analysis = analyze_text_metrics(chat_gpt_answers)

# Create a DataFrame for the analysis results
own_model_df = pd.DataFrame(own_model_analysis, index=["Answer 1", "Answer 2", "Answer 3"])
chat_gpt_df = pd.DataFrame(chat_gpt_analysis, index=["Answer 1", "Answer 2", "Answer 3"])

# Output the results
own_model_df, chat_gpt_df


(          length  polarity  subjectivity
 Answer 1       1       0.0           0.0
 Answer 2       2       0.0           0.0
 Answer 3       2       0.0           0.0,
           length  polarity  subjectivity
 Answer 1      25      0.05      0.377778
 Answer 2      13     -0.40      0.600000
 Answer 3      11      0.00      0.000000)

The comparison of the text metrics for the responses from both models produces the following results:

Own model:

- Responses consist of 1, 2 and 2 words for the three responses, respectively.
- Polarity (a measure of sentiment orientation) is 0 for all responses, meaning that the responses are neutral.
- Subjectivity (a measure of how much personal opinion or emotional expression is included) is also 0 for all answers, meaning that the answers are objective or factual.
- Quality: The answers are all correct.

ChatGPT:
- The answers are much longer: 25, 13 and 11 words for the respective answers.
- The polarity of the first answer is slightly positive (0.05), the second answer is clearly negative (-0.4), and the third answer is neutral (0.0).
- The subjectivity of the first response is moderate (0.377778), the second response is relatively high (0.6), indicating subjective language, and the third response is again neutral (0.0).
- Quality: The answers are wrong. They are formulated plausibly but do not give the correct answer.

From this analysis, we can conclude that the answers from ChatGPT are longer and more elaborate, but they do not answer the questions correctly. They also show greater variation in sentiment and subjectivity. In contrast, our model's answers are very short and neutral in terms of both sentiment and subjectivity, and they are correct.
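
The correctness judgement above was made by reading the answers. A minimal sketch of an automated check follows: it tests whether each reference answer from the `answer` column appears in the corresponding ChatGPT response. The helper names `simplify` and `contains_reference` are assumptions, and substring matching is only a rough proxy for correctness.

```python
import re

def simplify(text):
    """Lowercase and replace every non-alphanumeric run with a single space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def contains_reference(answer, reference):
    """True if the simplified reference occurs as a whole-token run in the answer."""
    return f" {simplify(reference)} " in f" {simplify(answer)} "

# Reference answers from the `answer` column for the first three rows
references = ["Kakrapar-3", "4.2 million", "Sendai High Court decision"]

# ChatGPT responses as printed earlier in this notebook
chat_gpt_answers = [
    "The first of India's 700 megawatt indigenously developed pressurized heavy water reactors (PHWRs) to reach this milestone was the Rajasthan Atomic Power Station Unit 3.",
    "In 2019, approximately 1.3 million tons of US LPG was discharged in China.",
    "The decision in United States v. Lee, 106 U.S. 196 (1882).",
]

hits = [contains_reference(a, r) for a, r in zip(chat_gpt_answers, references)]
print(hits)
```

A check like this scales to more than three questions, which would make the comparison less dependent on a handful of examples.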

Thanks for reading!