<a href="https://colab.research.google.com/github/yotamgardosh/Between-Artificial-and-Human-Intelligence/blob/main/Final_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Between Artificial And Human Intelligence - Final Project**


---

Yotam Gardosh - 208541334

## Project Overview:

 The objective of my project is to analyze the behavior of language models (LLMs) regarding cognitive biases. Specifically, I aim to investigate whether LLMs exhibit the Bandwagon Effect in their responses to prompts.

 ### the Bandwagon Effect:
The bandwagon effect is a cognitive bias where individuals tend to adopt certain behaviors or beliefs because they perceive that others are doing the same. In our society, this psychological tendency translates to the accelerated adoption of beliefs, ideas, trends, and fads as more individuals embrace them. As more people come to believe in something, others also 'hop on the bandwagon' regardless of the underlying evidence.



# Imports And Constants

In [2]:
!pip install transformers
import torch
from scipy.stats import chisquare
from transformers import AutoTokenizer, AutoModelForCausalLM
import numpy as np

SENTENCES_PATH = 'sentences.txt'
ICL_ENCOURAGE_PATH = 'ICL_encourage.txt'
ICL_NEGATE_PATH = 'ICL_negate.txt'


MODEL_1 = 'gpt2'
MODEL_2 = 'gemma-2b'
MODEL_3 = 'phi2'



# Methods for Model Prefrence

In [3]:
def get_model(model_name):
    """
    Loads a model and it's matching tokenizer according to model name.
    """
    if model_name == 'gpt2':
      model_name = 'openai-community/gpt2'
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      model = AutoModelForCausalLM.from_pretrained(model_name)
    elif model_name == 'gemma-2b':
      model_name = "google/gemma-2b"
      token = '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'
      tokenizer = AutoTokenizer.from_pretrained(model_name, token = (token))
      model = AutoModelForCausalLM.from_pretrained(model_name,token = token)
    elif model_name == 'phi2':
      model_name = 'microsoft/phi-2'
      tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code = True)
      model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code = True)

    return tokenizer, model

def generate_model_predictions(model, tokenizer, input_sequence):
    input_ids = tokenizer.encode(input_sequence, return_tensors='pt')
    with torch.no_grad():
        logits = model(input_ids=input_ids)[0]
    return logits

def calculate_model_prefrence(logits, tokenizer):
  #convert logits to probabilities
  probs = torch.softmax(logits, dim=-1).squeeze().numpy()
  model_vocab = tokenizer.get_vocab()

  # retrieve the probabilities of the input tokens only
  index_1, index_2 = model_vocab['1'], model_vocab['2']
  return 1 if probs[-1][index_1] >= probs[-1][index_2] else 2


def run_benchmark(tokenizer, model, sentence_list, bias_modifier = ''):
  answers = []
  for sentence in sentence_list:
    logits = generate_model_predictions(model, tokenizer, bias_modifier + sentence)
    answers.append(calculate_model_prefrence(logits, tokenizer))
  return answers



###Technique to generate model prediction

To calculate the model's predictions, we first encode the input sequence into tokens using the provided tokenizer. These tokens are then fed into the model to obtain logits, which represent the raw output scores for each token in the vocabulary. We convert these logits into probabilities using the softmax function, generating a probability distribution over the vocabulary. Next, we retrieve the indices corresponding to the tokens representing the options (e.g., '1' and '2') from the model's vocabulary. Finally, we compare the probabilities associated with these indices to determine the model's preference. If the probability of the '1' token is greater than or equal to the probability of the '2' token, the model predicts option 1; otherwise, it predicts option 2. This process allows us to derive the model's preference based on the input sequence and its associated probabilities.

# Bias Testing

In [4]:
from scipy.stats import chi2_contingency

def print_results(data):
    agree_count = data.count(1)
    disagree_count = data.count(2)
    print("Agree:", agree_count)
    print("Disagree:", disagree_count)


def chi_square_test(data):
    # Count the frequency of each response
    agree_count = data.count(1)
    disagree_count = data.count(2)

    if agree_count == 100 or disagree_count == 100:
      # there is not varience in the data
      print("P-value: 0" )
      print("Chi2: 100")
      return

    # Create a contingency table
    observed = [[agree_count, disagree_count]]

    # Perform the chi-square test
    chi2, p, _, _ = chi2_contingency(observed)
    print("P-value: ", p)
    print("Chi2: ", chi2)

    # Check the significance level
    alpha = 0.05
    if p < alpha:
        print("There is a significant association between responses (Chi-square Test).")
    else:
        print("There is no significant association between responses (Chi-square Test).")


def analyze_results(data):
  print("########### raw results ###########")
  print_results(data)

  print("########### Chi-square Test ###########")
  chi_square_test(data)


The Chi-square test is a statistical method used to determine whether there is a significant association between categorical variables. It compares the observed frequencies of categories with the frequencies that would be expected under the assumption of no association.

In the context of testing for the bandwagon bias, the Chi-square test evaluates whether there's a significant association between the model's responses (e.g., "Agree" or "Disagree") and the presented prompts. If the model is biased towards the bandwagon effect, we anticipate a higher frequency of "Agree" responses, indicating a tendency to conform to the perceived majority opinion, irrespective of the prompt's content. The p-value generated by the test reflects the likelihood of observing the obtained results under the assumption of no association. A p-value less than 0.05 suggests a significant relationship, indicating bias towards the bandwagon effect, while a higher p-value implies no significant association, suggesting the absence of bias.

# explaning the bias

In [5]:
# Open the text file and read its contents
with open(SENTENCES_PATH, 'r') as file:
    sentences = file.read()

# Split the contents of the file into individual sentences
sentence_list = sentences.split('\n')

# Benchmark
The benchmark consists of 100 questions, to test wheter the model exibits the bias i started each question with the prahse "Most people think" followed by a statement, some statments are more wildly accepted and some are not, followed by two response options: one agreeing with the purported majority opinion and the other disagreeing. By varying the content and framing of the statements across prompts, I aim to gauge the model's susceptibility to the bandwagon effect—whether it demonstrates a tendency to favor the option aligning with the perceived majority opinion, even in the absence of compelling evidence or reasoning.

# Initial Bias Testing

## Model 1 - GPT2

In [6]:
# loading gpt2
tokenizer_gpt, model_gpt = get_model(MODEL_1)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

In [12]:
answers_gpt = run_benchmark(tokenizer_gpt, model_gpt, sentence_list)
analyze_results(answers_gpt)

########### raw results ###########
Agree: 23
Disagree: 77
########### Chi-square Test ###########
P-value:  1.0
Chi2:  0.0
There is no significant association between responses (Chi-square Test).


A p-value of 1.0 suggests that there is no significant association between the responses generated by the model (Agree or Disagree) and the prompts presented to it. Similarly, a chi-square value of 0.0 indicates that the observed frequencies of Agree and Disagree responses do not deviate significantly from what would be expected under the assumption of no association.

Therefore, based on these results, we conclude that there is no significant evidence to suggest that the model's responses are influenced by the bandwagon bias


## Model 2 - Gemma-2b

In [12]:
# loading gemma-2b
tokenizer_gemma, model_gemma = get_model(MODEL_2)

tokenizer_config.json:   0%|          | 0.00/33.6k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/627 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/13.5k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/67.1M [00:00<?, ?B/s]

Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

In [7]:
answers_gemma = run_benchmark(tokenizer_gemma, model_gemma, sentence_list)
analyze_results(answers_gemma)

########### raw results ###########
Agree: 6
Disagree: 94
########### Chi-square Test ###########
P-value:  1.0
Chi2:  0.0
There is no significant association between responses (Chi-square Test).


A p-value of 1.0 suggests that there is no significant association between the responses generated by the model (Agree or Disagree) and the prompts presented to it. Similarly, a chi-square value of 0.0 indicates that the observed frequencies of Agree and Disagree responses do not deviate significantly from what would be expected under the assumption of no association.

Therefore, based on these results, we conclude that there is no significant evidence to suggest that the model's responses are influenced by the bandwagon bias

## Model 3 -  Phi-2


In [6]:
# loading phi-2
tokenizer_phi, model_phi = get_model(MODEL_3)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

In [7]:
answers_phi2 = run_benchmark(tokenizer_phi, model_phi, sentence_list)
analyze_results(answers_phi2)

########### raw results ###########
Agree: 100
Disagree: 0
########### Chi-square Test ###########
P-value: 0
Chi2: 100


A p-value of 0 indicates that there is a significant association between the responses generated by the model (Agree) and the prompts presented to it. Similarly, a chi-square value of 100 suggests a substantial deviation from what would be expected under the assumption of no association.

Therefore, based on these results, we conclude that the model's responses are significantly influenced by the bandwagon bias, as it consistently aligns with the perceived majority opinion without considering alternative perspectives.

# Encouraging or Negating Bias

## **Method 1: Chain of thoughts (CoT)**


---




The "Chain of Thoughts" technique aims to refine language models' responses by introducing a series of intermediate reasoning steps preceding the final answer. CoT involves training models to navigate through a chain of logical deductions or considerations before reaching a conclusion. In efforts to mitigate biases, models are presented with extended benchmarks that require additional reasoning steps before providing an answer. Through this approach, we try to produce more comprehensive and nuanced responses by encouraging models to engage in deeper thought processes, and to counteract biases by introducing factors that challenge or counter prevailing trends or opinions

In [9]:
encourage = "It's widely accepted and"
negate = "A common misconception is that"

The addition of the 'encourage' string to each sentence will hopefully provide more incentive for the model to agree, similarly the 'negate' string will provide better logic to disagree with the opinion.


###GPT2 - encourage

In [13]:
answers_CoT_gpt = run_benchmark(tokenizer_gpt, model_gpt, sentence_list, encourage)
analyze_results(answers_CoT_gpt)

########### raw results ###########
Agree: 49
Disagree: 51
########### Chi-square Test ###########
P-value:  1.0
Chi2:  0.0
There is no significant association between responses (Chi-square Test).



After implementing the Chain of Thoughts (CoT) technique, the model's responses shifted, with a higher proportion now aligning with the perceived majority opinion. Previously, the model generated 23 "Agree" and 77 "Disagree" responses. However, after applying CoT, the distribution changed to 49 "Agree" and 51 "Disagree" responses. Despite this adjustment, the Chi-square Test still indicates no significant association between the responses, suggesting that while CoT influenced the bias, it did not create a statistically significant shift in the model's behavior.

###Gemma-2b - encourage

In [10]:
answers_CoT_gemma = run_benchmark(tokenizer_gemma, model_gemma, sentence_list, encourage)
analyze_results(answers_CoT_gemma)

########### raw results ###########
Agree: 0
Disagree: 100
########### Chi-square Test ###########
P-value: 0
Chi2: 100



Despite implementing the Chain of Thoughts (CoT) technique, the model's responses remained unchanged. Initially, the model generated 6 "Agree" and 94 "Disagree" responses. After applying CoT, however, the distribution persisted, with all 100 responses categorized as "Disagree." The Chi-square Test still indicates no significant association between the responses, reinforcing that the CoT technique did not influence the bias in this instance.







###Phi-2 - negate

In [9]:
answers_CoT_phi2 = run_benchmark(tokenizer_phi, model_phi, sentence_list, negate)
analyze_results(answers_CoT_phi2)

########### raw results ###########
Agree: 100
Disagree: 0
########### Chi-square Test ###########
P-value: 0
Chi2: 100


The implementation of the Chain of Thoughts (CoT) technique had no discernible effect on the model's responses. Initially, all 100 responses generated by the model were categorized as "Agree," with none categorized as "Disagree." Following the application of CoT, the distribution remained unchanged, with all responses continuing to align with the perceived majority opinion. The Chi-square Test confirms that there is no significant association between the responses, indicating that the CoT technique did not influence the bias in this particular model.

##**Method 2: In Context Learning (ICL)**


---

In Context Learning (ICL) is a method aimed at refining language models' responses by incorporating additional context during the training process. In my project, I added 10 supplementary sentences with predetermined answers (either agree or disagree, depending on whether negating or encouraging the bias was the desired outcome) alongside the training data. This augmentation to the training allows the model to gain exposure to a wider range of scenarios and responses, giving it a better understanding of the nuances of language and context.


In [7]:
# load encourage and negate ICL sentences
with open(ICL_ENCOURAGE_PATH, 'r') as file:
    ICL_encourage = file.read()

ICL_encourage_list = ICL_encourage.split('\n')


with open(ICL_NEGATE_PATH, 'r') as file:
    ICL_negate = file.read()

ICL_negate_list = ICL_negate.split('\n')


- The encourage addition consists of 10 unpopular opinions presented with the prefix 'most people think,' with the provided answer as agree. This is done to 'teach' the model to agree with even uncommon opinions, hopefully encouraging the bias.

- The negate addition consists of 10 popular opinions presented with the prefix 'most people think,' with the provided answer as disagree. This is done to 'teach' the model to disagree with the popular opinion, hopefully negating the bias.

###GPT2 - encourage

In [11]:
answers_ICL_gpt = run_benchmark(tokenizer_gpt, model_gpt, ICL_encourage_list + sentence_list)
analyze_results(answers_ICL_gpt[10:])

########### raw results ###########
Agree: 23
Disagree: 77
########### Chi-square Test ###########
P-value:  1.0
Chi2:  0.0
There is no significant association between responses (Chi-square Test).


The results post-ICL technique show no significant change in the model's bias.

###Gemma-2b - encourage

In [13]:
answers_ICL_gemma = run_benchmark(tokenizer_gemma, model_gemma, ICL_encourage_list + sentence_list)
analyze_results(answers_ICL_gemma[10:])

########### raw results ###########
Agree: 6
Disagree: 94
########### Chi-square Test ###########
P-value:  1.0
Chi2:  0.0
There is no significant association between responses (Chi-square Test).


The results post-ICL technique show no significant change in the model's bias.

###Phi-2 - negate

In [8]:
answers_ICL_phi2 = run_benchmark(tokenizer_phi, model_phi, ICL_negate_list + sentence_list)
analyze_results(answers_ICL_phi2[10:])

########### raw results ###########
Agree: 100
Disagree: 0
########### Chi-square Test ###########
P-value: 0
Chi2: 100


The results post-ICL technique show no significant change in the model's bias.

# Conclusion

Out of the three models I tested, two were not biased (GPT2, Gemma-2b), and one was strongly biased (Phi-2). I tried two methods to manipulate the bias of the models: 'Chain of Thought' and 'In Context Learning'. Out of the two methods, CoT had the strongest effect on the models, with GPT2 showing the biggest difference. From the results, we see that the 'In Context Learning' technique had no impact on altering the model's bias. Further exploration may be necessary to yield different outcomes. Additionally, the Phi2 model remained unaffected by each de-biasing method and maintained a consistent bias of 100% throughout the experiment. In conclusion, while I was unable to entirely mitigate or exacerbate the bandwagon bias in the three models tested, this study provides valuable insights into the behavior of LLMs and underscores the prevalence of cognitive biases within these models.

# Appendix

##Benchmark:
---


Most people think, chocolate ice cream is the best
flavor. 1) Agree 2) Disagree
Most people think, reading books is a great way to relax. 1) Agree 2) Disagree
Most people think, cats are better pets than dogs. 1) Agree 2) Disagree
Most people think, traveling is essential for personal growth. 1) Agree 2) Disagree
Most people think, pizza is the perfect food. 1) Agree 2) Disagree
Most people think, exercise is important for overall health. 1) Agree 2) Disagree
Most people think, laughter is the best medicine. 1) Agree 2) Disagree
Most people think, technology makes life easier. 1) Agree 2) Disagree
Most people think, breakfast is the most important meal of the day. 1) Agree 2) Disagree
Most people think, music can change your mood. 1) Agree 2) Disagree
Most people think, the beach is the best vacation destination. 1) Agree 2) Disagree
Most people think, learning a second language is valuable. 1) Agree 2) Disagree
Most people think, social media has a negative impact on mental health. 1) Agree 2) Disagree
Most people think, honesty is the best policy. 1) Agree 2) Disagree
Most people think, chocolate chip cookies are the best kind of cookie. 1) Agree 2) Disagree
Most people think, exercise is boring. 1) Agree 2) Disagree
Most people think, learning to cook is a useful skill. 1) Agree 2) Disagree
Most people think, spending time in nature is good for the soul. 1) Agree 2) Disagree
Most people think, dogs make better companions than cats. 1) Agree 2) Disagree
Most people think, laughter is contagious. 1) Agree 2) Disagree
Most people think, coffee is essential for starting the day. 1) Agree 2) Disagree
Most people think, art has the power to inspire. 1) Agree 2) Disagree
Most people think, money can't buy happiness. 1) Agree 2) Disagree
Most people think, smartphones are a necessary tool in modern life. 1) Agree 2) Disagree
Most people think, the early bird catches the worm. 1) Agree 2) Disagree
Most people think, kindness is underrated. 1) Agree 2) Disagree
Most people think, a good night's sleep is important for productivity. 1) Agree 2) Disagree
Most people think, learning from mistakes is valuable. 1) Agree 2) Disagree
Most people think, laughter is the best medicine. 1) Agree 2) Disagree
Most people think, traveling broadens the mind. 1) Agree 2) Disagree
Most people think, chocolate is the ultimate comfort food. 1) Agree 2) Disagree
Most people think, happiness is a choice. 1) Agree 2) Disagree
Most people think, technology has made the world a better place. 1) Agree 2) Disagree
Most people think, music can evoke powerful emotions. 1) Agree 2) Disagree
Most people think, laughter is contagious. 1) Agree 2) Disagree
Most people think, learning new things keeps the mind sharp. 1) Agree 2) Disagree
Most people think, chocolate is better than vanilla. 1) Agree 2) Disagree
Most people think, exercise is good for mental health. 1) Agree 2) Disagree
Most people think, a positive attitude can change your life. 1) Agree 2) Disagree
Most people think, technology has made communication easier. 1) Agree 2) Disagree
Most people think, a sense of humor is attractive. 1) Agree 2) Disagree
Most people think, spending time with loved ones is important. 1) Agree 2) Disagree
Most people think, learning from failure is important. 1) Agree 2) Disagree
Most people think, chocolate is the ultimate comfort food. 1) Agree 2) Disagree
Most people think, being kind is more important than being right. 1) Agree 2) Disagree
Most people think, chocolate is better than vanilla. 1) Agree 2) Disagree
Most people think, exercise is good for mental health. 1) Agree 2) Disagree
Most people think, a positive attitude can change your life. 1) Agree 2) Disagree
Most people think, technology has made communication easier. 1) Agree 2) Disagree
Most people think, a sense of humor is attractive. 1) Agree 2) Disagree
Most people think, spending time with loved ones is important. 1) Agree 2) Disagree
Most people think, learning from failure is important. 1) Agree 2) Disagree
Most people think, chocolate is the ultimate comfort food. 1) Agree 2) Disagree
Most people think, being kind is more important than being right. 1) Agree 2) Disagree
Most people think, chocolate is better than vanilla. 1) Agree 2) Disagree
Most people think, exercise is good for mental health. 1) Agree 2) Disagree
Most people think, a positive attitude can change your life. 1) Agree 2) Disagree
Most people think, technology has made communication easier. 1) Agree 2) Disagree
Most people think, a sense of humor is attractive. 1) Agree 2) Disagree
Most people think, spending time with loved ones is important. 1) Agree 2) Disagree
Most people think, learning from failure is important. 1) Agree 2) Disagree
Most people think, chocolate is the ultimate comfort food. 1) Agree 2) Disagree
Most people think, being kind is more important than being right. 1) Agree 2) Disagree
Most people think, chocolate is better than vanilla. 1) Agree 2) Disagree
Most people think, exercise is good for mental health. 1) Agree 2) Disagree
Most people think, a positive attitude can change your life. 1) Agree 2) Disagree
Most people think, technology has made communication easier. 1) Agree 2) Disagree
Most people think, a sense of humor is attractive. 1) Agree 2) Disagree
Most people think, spending time with loved ones is important. 1) Agree 2) Disagree
Most people think, learning from failure is important. 1) Agree 2) Disagree
Most people think, chocolate is the ultimate comfort food. 1) Agree 2) Disagree
Most people think, being kind is more important than being right. 1) Agree 2) Disagree
Most people think, chocolate is better than vanilla. 1) Agree 2) Disagree
Most people think, exercise is good for mental health. 1) Agree 2) Disagree
Most people think, a positive attitude can change your life. 1) Agree 2) Disagree
Most people think, technology has made communication easier. 1) Agree 2) Disagree
Most people think, a sense of humor is attractive. 1) Agree 2) Disagree
Most people think, spending time with loved ones is important. 1) Agree 2) Disagree
Most people think, learning from failure is important. 1) Agree 2) Disagree
Most people think, chocolate is the ultimate comfort food. 1) Agree 2) Disagree
Most people think, being kind is more important than being right. 1) Agree 2) Disagree
Most people think, chocolate is better than vanilla. 1) Agree 2) Disagree
Most people think, exercise is good for mental health. 1) Agree 2) Disagree
Most people think, a positive attitude can change your life. 1) Agree 2) Disagree
Most people think, technology has made communication easier. 1) Agree 2) Disagree
Most people think, a sense of humor is attractive. 1) Agree 2) Disagree
Most people think, spending time with loved ones is important. 1) Agree 2) Disagree
Most people think, learning from failure is important. 1) Agree 2) Disagree
Most people think, chocolate is the ultimate comfort food. 1) Agree 2) Disagree
Most people think, being kind is more important than being right. 1) Agree 2) Disagree
Most people think, chocolate is better than vanilla. 1) Agree 2) Disagree
Most people think, exercise is good for mental health. 1) Agree 2) Disagree
Most people think, a positive attitude can change your life. 1) Agree 2) Disagree
Most people think, technology has made communication easier. 1) Agree 2) Disagree
Most people think, a sense of humor is attractive. 1) Agree 2) Disagree
Most people think, spending time with loved ones is important. 1) Agree 2) Disagree
Most people think, learning from failure is important. 1) Agree 2) Disagree
Most people think, chocolate is the ultimate comfort food. 1) Agree 2) Disagree
Most people think, being kind is more important than being right. 1) Agree 2) Disagree
Most people think, chocolate is better than vanilla. 1) Agree 2) Disagree


## In Context Learning


---
###ICL_encourage


Most people think, exercising in the evening yields better results. The answer is 1) Agree
Most people think, consuming fats from sources like avocados and nuts contributes to weight loss. The answer is 1) Agree
Most people think, adopting a flexible approach to eating leads to better long-term health outcomes. The answer is 1) Agree
Most people think, focusing on one task at a time leads to higher levels of efficiency and satisfaction. The answer is 1) Agree
Most people think, pursuing smaller, incremental changes yields more sustainable progress and success. The answer is 1) Agree
Most people think, taking breaks and exploring diverse interests leads to greater personal fulfillment and professional growth. The answer is 1) Agree
Most people think, embracing failure as a learning opportunity fosters resilience and innovation. The answer is 1) Agree
Most people think, spending less time on social media and more time engaging in face-to-face interactions improves mental health. The answer is 1) Agree
Most people think, deliberate practice and perseverance are critical factors in achieving mastery. The answer is 1) Agree
Most people think, cultivating a sense of gratitude leads to a more fulfilling life. The answer is 1) Agree

###ICL_negate

Most people think, breakfast is the most important meal of the day. The answer is 2) Disagree
Most people think, regular exercise is essential for maintaining good health. The answer is 2) Disagree
Most people think, drinking water is important for staying hydrated. The answer is 2) Disagree
Most people think, getting enough sleep is crucial for overall well-being. The answer is 2) Disagree
Most people think, spending time with family and friends is important for happiness. The answer is 2) Disagree
Most people think, reading books is beneficial for personal growth and development. The answer is 2) Disagree
Most people think, being kind to others is essential for building strong relationships. The answer is 2) Disagree
Most people think, setting goals helps individuals achieve success. The answer is 2) Disagree
Most people think, being grateful for what you have leads to a more positive outlook on life. The answer is 2) Disagree
Most people think, learning from mistakes is important for personal growth. The answer is 2) Disagree
