# Experiments

## Overview
This notebook contains preliminary experiments comparing different AI agent configurations. Every scenario uses the same study material as input.

## Scenarios
There will be 4 scenarios:
1. **Single agent 0-shot**: One agent with no examples provided
2. **Single agent 1-shot**: One agent with one example provided
3. **Multi-agent 0-shot**: Two agents (question generator and evaluator) with no examples, using manual agent orchestration
4. **Multi-agent 1-shot**: Two agents with one example, using manual agent orchestration

## Methodology
- Each scenario uses the same study material as input
- Each scenario is run once, since this is a preliminary study
- The multi-agent scenarios in this notebook are orchestrated manually with the `google-genai` client; the crewAI framework is not used here
- Results will be compared qualitatively rather than statistically
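
The manual orchestration used by the multi-agent scenarios amounts to a generate, evaluate, regenerate pass. A minimal sketch of that control flow follows. The names `run_feedback_round`, `fake_generate`, and `fake_evaluate` are hypothetical (they are not part of this notebook or of any library), and the stand-in functions replace real model calls so the flow can be run offline.

```python
from typing import Callable, Optional


def run_feedback_round(
    generate: Callable[[Optional[str], Optional[str]], str],
    evaluate: Callable[[str], str],
) -> dict:
    """One generate -> evaluate -> regenerate pass (hypothetical helper)."""
    # First pass: the generator sees only the study material.
    initial = generate(None, None)
    # The evaluator scores the initial questions.
    feedback = evaluate(initial)
    # Second pass: the generator also sees its first attempt and the feedback.
    revised = generate(initial, feedback)
    return {"initial": initial, "feedback": feedback, "revised": revised}


# Stand-in agents (assumptions for illustration; real agents would call the model).
def fake_generate(previous: Optional[str], feedback: Optional[str]) -> str:
    return "draft MCQs" if previous is None else f"revised MCQs using: {feedback}"


def fake_evaluate(mcqs: str) -> str:
    return f"evaluation of {mcqs}"


round_result = run_feedback_round(fake_generate, fake_evaluate)
print(round_result["revised"])
```

In the notebook itself, each callable would correspond to a `client.models.generate_content` call with the generator or evaluator config.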


In [1]:
%pip install google-genai


Note: you may need to restart the kernel to use updated packages.


## Base Setup

In [2]:
import os
from google import genai
from google.genai import types
import json

In [29]:
base_model = "gemini-2.0-flash"
api_key = os.environ.get("GEMINI_API_KEY")
client = genai.Client(
  api_key=api_key,
)

# setup study material files
files = [
  client.files.upload(file='section-loop.pdf')
]

In [30]:
def setup_question_generator_config(system_prompt):
    return types.GenerateContentConfig(
        response_mime_type="application/json",
        system_instruction=[
            types.Part.from_text(text=system_prompt),
        ],
    )

## Scenario 1 & 2 Setup

In [31]:
question_generator_prompt = ""
with open('question-generator-prompt.txt', 'r') as file:
    question_generator_prompt = file.read()

question_generator_config_0_shot = setup_question_generator_config(question_generator_prompt)

## Scenario 1

In [32]:
contents_scenario_1 = [
    types.Content(
        role="user",
        parts=[
            types.Part.from_uri(
                file_uri=files[0].uri,
                mime_type=files[0].mime_type,
            ),
            types.Part.from_text(text="""Generate exactly 5 MCQs covering the different loop concepts and examples presented **in the provided file**, ensuring assessment across the areas defined in your instructions. Use the JSON format described in your instructions.
(File containing source text context is attached)"""),
        ]
    )
]

In [33]:
result = client.models.generate_content(
  model=base_model,
  contents=contents_scenario_1,
  config=question_generator_config_0_shot,
)
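
Because the config requests `application/json`, the response text should normally parse with `json.loads`, but the structure is still worth checking before it is passed on to an evaluator. The sketch below is one possible check, assuming the field names used in this notebook's prompts and outputs (`question_text`, `options`, `correct_option`). `validate_mcqs` is a hypothetical helper and the sample question is made up for illustration.

```python
import json

REQUIRED_KEYS = {"question_text", "options", "correct_option"}


def validate_mcqs(raw_text: str, expected_count: int = 5) -> list:
    """Parse model output and check each MCQ's structure (hypothetical helper)."""
    mcqs = json.loads(raw_text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(mcqs, list) or len(mcqs) != expected_count:
        raise ValueError(f"expected a list of {expected_count} MCQs")
    for index, mcq in enumerate(mcqs):
        if not isinstance(mcq, dict):
            raise ValueError(f"MCQ {index} is not a JSON object")
        missing = REQUIRED_KEYS - set(mcq)
        if missing:
            raise ValueError(f"MCQ {index} is missing keys: {sorted(missing)}")
        # The answer key must be one of the option labels (e.g. "A" to "D").
        if mcq["correct_option"] not in mcq["options"]:
            raise ValueError(f"MCQ {index} has an answer key not among its options")
    return mcqs


# Made-up sample standing in for result.text.
sample = json.dumps([
    {
        "question_text": "How many times does `for i in range(3)` iterate?",
        "options": {"A": "2", "B": "3", "C": "4", "D": "0"},
        "correct_option": "B",
    }
])
checked = validate_mcqs(sample, expected_count=1)
print(len(checked))
```

With a real response, the call would be `validate_mcqs(result.text)`.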

In [34]:
print(f"{result.text}")

[
  {
    "question_text": "What is the output of the following code?\n\nstr_var = \"A string\"\ncount = 0\nfor c in str_var:\n    count += 1\nprint(count)",
    "options": {
      "A": "7",
      "B": "8",
      "C": "9",
      "D": "A string"
    },
    "correct_option": "B"
  },
  {
    "question_text": "Given the following `while` loop, how many times will the loop body execute?\n\nn = 4\nwhile n > 0:\n    print(n)\n    n = n - 1",
    "options": {
      "A": "3",
      "B": "4",
      "C": "5",
      "D": "The loop will not execute."
    },
    "correct_option": "B"
  },
  {
    "question_text": "What is the output of the following code snippet? \n\nfor i in range(1, 7, 2):\n    print(i, end=' ')",
    "options": {
      "A": "1 2 3 4 5 6",
      "B": "1 3 5",
      "C": "1 3 5 7",
      "D": "1 3 5 6"
    },
    "correct_option": "B"
  },
  {
    "question_text": "Given the following code using a nested `while` loop, how many times does the `print` statement execute?\n\ni = 1\nwh

## Scenario 2

In [35]:
scenario_2_prompt = ""
with open('scenario-2-prompts.txt', 'r') as file:
    scenario_2_prompt = file.read()

question_generator_config_1_shot = setup_question_generator_config(scenario_2_prompt)

In [36]:
contents_scenario_2 = [
    types.Content(
        role="user",
        parts=[
            types.Part.from_uri(
                file_uri=files[0].uri,
                mime_type=files[0].mime_type,
            ),
            types.Part.from_text(text="""Here is an example of the desired output format:
Example Output:
[
  {
    "question_text": "What is the output of the code snippet?\n`count = 0\nwhile count < 3:\n  print(count)\n  count += 1`",
    "options": {
      "A": "0 1 2 3",
      "B": "1 2 3",
      "C": "0 1 2",
      "D": "0 1"
    },
    "correct_option": "C"
  }
]

Now, using that exact JSON format, generate exactly 5 MCQs covering the different loop concepts and examples presented **in the provided file**, ensuring assessment across the areas defined in your instructions.
(File containing source text context is attached)
"""),
        ]
    )
]

In [37]:
result_1_shot = client.models.generate_content(
  model=base_model,
  contents=contents_scenario_2,
  config=question_generator_config_1_shot,
)

In [38]:
print(f"{result_1_shot.text}")

[
  {
    "question_text": "What is the primary purpose of a `while` loop?",
    "options": {
      "A": "To execute a block of code repeatedly as long as a condition is true.",
      "B": "To iterate over a sequence of elements in a container.",
      "C": "To define a function.",
      "D": "To handle exceptions."
    },
    "correct_option": "A"
  },
  {
    "question_text": "Given the code: `i = 0; while i < 5: print(i); i += 1`. How many times will the `print(i)` statement be executed?",
    "options": {
      "A": "4",
      "B": "5",
      "C": "6",
      "D": "0"
    },
    "correct_option": "B"
  },
  {
    "question_text": "Which statement is used to exit a loop prematurely?",
    "options": {
      "A": "continue",
      "B": "exit",
      "C": "break",
      "D": "pass"
    },
    "correct_option": "C"
  },
  {
    "question_text": "What will be the output of `for i in range(2, 8, 2): print(i)`?",
    "options": {
      "A": "2 3 4 5 6 7",
      "B": "2 4 6",
      "C": "2 

## Scenario 3 & 4 Setup

In [42]:
content_with_initial_mcqs_1 = [
    types.Content(
        role="user",
        parts=[
            types.Part.from_text(text=f"""Evaluate the following 5 generated MCQs based on the criteria specified in your instructions (Topic Coverage/Relevance, Question Quality/Clarity, Answer Quality/Distractors, Correctness Verification - scale 1-5). Provide the scores, a brief comment for each question, and an overall topic coverage comment. Use ONLY the specified JSON output format described in your instructions.
Generated MCQs: {result.text}
"""),
        ]
    )
]

content_with_initial_mcqs_2 = [
    types.Content(
        role="user",
        parts=[
            types.Part.from_text(text=f"""Here is an example of the desired JSON evaluation output format:
Example Evaluation Output:
{{
  "evaluation_results": [
    {{
      "question_evaluated": "What does the 'break' statement do inside a loop?",
      "evaluation": {{
        "topic_coverage_relevance_score": 5,
        "question_quality_clarity_score": 5,
        "answer_quality_distractors_score": 4,
        "correctness_verification_score": 5,
        "brief_comment": "Tests fundamental loop control. Clear question. Distractors plausible."
      }}
    }}
    // Note: Example only shows one evaluation for brevity.
  ],
  "overall_topic_coverage_comment": "The single example question covers loop control well, but a full set would need to cover loop types too."
}}

Now, using that exact JSON format, evaluate the following 5 generated MCQs based on the criteria specified in your instructions (Topic Coverage/Relevance, Question Quality/Clarity, Answer Quality/Distractors, Correctness Verification - scale 1-5). Provide the scores, a brief comment for each question, and an overall topic coverage comment.

Generated MCQs: {result_1_shot.text}"""),
        ]
    )
]

## Scenario 3

In [43]:
evaluator_0_shot_prompt = ""
with open('evaluator-0-shot-prompt.txt', 'r') as file: 
    evaluator_0_shot_prompt = file.read()

evaluator_agent_config_1 = types.GenerateContentConfig(
    response_mime_type="application/json",
    system_instruction=[
        types.Part.from_text(text=evaluator_0_shot_prompt),
    ],
)

In [44]:
evaluator_result_1 = client.models.generate_content(
  model=base_model,
  contents=content_with_initial_mcqs_1,
  config=evaluator_agent_config_1,
)
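
As a quick aid to the qualitative comparison (not a statistical result), the evaluator's per-question scores can be averaged per criterion. The sketch below assumes the evaluation JSON has the shape shown in this notebook (`evaluation_results`, each with an `evaluation` object whose score fields end in `_score`). `mean_scores` is a hypothetical helper and the sample report contains placeholder values only.

```python
import json
from statistics import mean


def mean_scores(evaluation_text: str) -> dict:
    """Average each `*_score` field across all evaluated questions (hypothetical helper)."""
    report = json.loads(evaluation_text)
    collected = {}
    for item in report["evaluation_results"]:
        for key, value in item["evaluation"].items():
            # Skip non-numeric fields such as `brief_comment`.
            if key.endswith("_score"):
                collected.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in collected.items()}


# Placeholder report standing in for evaluator_result_1.text.
sample_report = json.dumps({
    "evaluation_results": [
        {"question_evaluated": "Q1", "evaluation": {
            "topic_coverage_relevance_score": 5,
            "question_quality_clarity_score": 4,
            "brief_comment": "placeholder"}},
        {"question_evaluated": "Q2", "evaluation": {
            "topic_coverage_relevance_score": 3,
            "question_quality_clarity_score": 4,
            "brief_comment": "placeholder"}},
    ],
    "overall_topic_coverage_comment": "placeholder",
})
averages = mean_scores(sample_report)
print(averages)
```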

In [45]:
print(f">> evaluator_result_1: {evaluator_result_1.text}")

>> evaluator_result_1: {
  "evaluation_results": [
    {
      "question_evaluated": "What is the output of the following code?\n\nstr_var = \"A string\"\ncount = 0\nfor c in str_var:\n    count += 1\nprint(count)",
      "evaluation": {
        "topic_coverage_relevance_score": 5,
        "question_quality_clarity_score": 5,
        "answer_quality_distractors_score": 5,
        "correctness_verification_score": 5,
        "brief_comment": "Clear question testing basic for loop string iteration. Good distractors."
      }
    },
    {
      "question_evaluated": "Given the following `while` loop, how many times will the loop body execute?\n\nn = 4\nwhile n > 0:\n    print(n)\n    n = n - 1",
      "evaluation": {
        "topic_coverage_relevance_score": 5,
        "question_quality_clarity_score": 5,
        "answer_quality_distractors_score": 5,
        "correctness_verification_score": 5,
        "brief_comment": "Clear while loop question. Good distractors."
      }
    },
    {
 

In [58]:
updated_content_scenario_3 = [
  types.Content(
    role="user",
    parts=[
      types.Part.from_uri(
        file_uri=files[0].uri,
        mime_type=files[0].mime_type,
      ),
      types.Part.from_text(text=f"""Here are 5 MCQs previously generated based on the context file:
{result.text}

Here is an evaluation of those questions:
{evaluator_result_1.text}

Please regenerate a new set of 5 MCQs based on the original context file, taking the evaluation feedback into account to improve upon the previous set where weaknesses were noted (e.g., low scores, negative comments). Ensure the new questions still cover the required assessment areas (Comprehension, Execution, Control Statements, Range) and follow the specified JSON format.
(File containing source text context is attached)""")
    ]
  )
]

In [59]:
# Send feedback to question generator
result_with_feedback_1 = client.models.generate_content(
  model=base_model,
  contents=updated_content_scenario_3,
  config=question_generator_config_0_shot,
)

In [60]:
print(f">> result_with_feedback_1: {result_with_feedback_1.text}")

>> result_with_feedback_1: [
  {
    "question_text": "What is the final value of `counter` after executing the following code?\n\ncounter = 1\nwhile counter <= 5:\n    counter += 1\nprint(counter)",
    "options": {
      "A": "5",
      "B": "6",
      "C": "4",
      "D": "0"
    },
    "correct_option": "B"
  },
  {
    "question_text": "What will be printed by the following code?\n\nstr_var = \"Python\"\nfor char in str_var:\n    print(char, end='')",
    "options": {
      "A": "P y t h o n",
      "B": "Python",
      "C": "n o h t y P",
      "D": "P"
    },
    "correct_option": "B"
  },
  {
    "question_text": "How many times will the inner loop execute in the following code?\n\nfor i in range(2):\n    for j in range(3):\n        print(i, j)",
    "options": {
      "A": "2",
      "B": "3",
      "C": "6",
      "D": "5"
    },
    "correct_option": "C"
  },
  {
    "question_text": "What is printed by the following code?\n\nfor i in range(5):\n    if i == 3:\n        break

## Scenario 4

In [61]:
evaluator_1_shot_prompt = ""
with open('evaluator-1-shot-prompt.txt', 'r') as file: 
    evaluator_1_shot_prompt = file.read()

evaluator_agent_config_2 = types.GenerateContentConfig(
    response_mime_type="application/json",
    system_instruction=[
        types.Part.from_text(text=evaluator_1_shot_prompt),
    ],
)

In [62]:
evaluator_result_2 = client.models.generate_content(
  model=base_model,
  contents=content_with_initial_mcqs_2,
  config=evaluator_agent_config_2,
)

In [63]:
print(f">> evaluator_result_2: {evaluator_result_2.text}")

>> evaluator_result_2: {
  "evaluation_results": [
    {
      "question_evaluated": "What is the primary purpose of a `while` loop?",
      "evaluation": {
        "topic_coverage_relevance_score": 5,
        "question_quality_clarity_score": 5,
        "answer_quality_distractors_score": 5,
        "correctness_verification_score": 5,
        "brief_comment": "Fundamental concept. Clear question and plausible distractors."
      }
    },
    {
      "question_evaluated": "Given the code: `i = 0; while i < 5: print(i); i += 1`. How many times will the `print(i)` statement be executed?",
      "evaluation": {
        "topic_coverage_relevance_score": 5,
        "question_quality_clarity_score": 4,
        "answer_quality_distractors_score": 5,
        "correctness_verification_score": 5,
        "brief_comment": "Good example, though the lack of indentation in the code snippet slightly hurts clarity, the code is on one line. Distractors are well-chosen."
      }
    },
    {
      "que

In [64]:
updated_content_scenario_4 = [
  types.Content(
    role="user",
    parts=[
      types.Part.from_uri(
        file_uri=files[0].uri,
        mime_type=files[0].mime_type,
      ),
      types.Part.from_text(text=f"""Here are 5 MCQs previously generated based on the context file:
{result_1_shot.text}

Here is an evaluation of those questions:
{evaluator_result_2.text}

Please regenerate a new set of 5 MCQs based on the original context file, taking the evaluation feedback into account to improve upon the previous set where weaknesses were noted (e.g., low scores, negative comments). Ensure the new questions still cover the required assessment areas (Comprehension, Execution, Control Statements, Range) and follow the specified JSON format.
(File containing source text context is attached)""")
    ]
  ),
]
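
The regeneration requests for scenarios 3 and 4 differ only in which earlier outputs they embed, so the prompt text could be built by one function. The following is a sketch under that assumption. `build_regeneration_prompt` is a hypothetical helper, not something defined elsewhere in this notebook, and the placeholder strings stand in for the real model outputs.

```python
def build_regeneration_prompt(previous_mcqs: str, evaluation: str) -> str:
    """Assemble the regeneration request text (hypothetical helper)."""
    return (
        "Here are 5 MCQs previously generated based on the context file:\n"
        f"{previous_mcqs}\n\n"
        "Here is an evaluation of those questions:\n"
        f"{evaluation}\n\n"
        "Please regenerate a new set of 5 MCQs based on the original context "
        "file, taking the evaluation feedback into account to improve upon the "
        "previous set where weaknesses were noted (e.g., low scores, negative "
        "comments). Ensure the new questions still cover the required "
        "assessment areas (Comprehension, Execution, Control Statements, Range) "
        "and follow the specified JSON format.\n"
        "(File containing source text context is attached)"
    )


# Placeholder strings stand in for the generator and evaluator outputs.
prompt_text = build_regeneration_prompt("[previous MCQs]", "[evaluation JSON]")
print(prompt_text.splitlines()[1])
```

The returned string would then be passed to `types.Part.from_text(text=...)` alongside the file part, as in the cells above.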

In [65]:
result_after_feedback_2 = client.models.generate_content(
  model=base_model,
  contents=updated_content_scenario_4,
  config=question_generator_config_1_shot,
)

In [66]:
print(f">> result_after_feedback_2: {result_after_feedback_2.text}")

>> result_after_feedback_2: [
  {
    "question_text": "What is the key difference between a `while` loop and a `for` loop as described in the text?",
    "options": {
      "A": "`while` loops iterate over a container, while `for` loops use a condition.",
      "B": "`while` loops use a condition to determine when to stop, while `for` loops iterate over elements in a container.",
      "C": "`while` loops can only count up, while `for` loops can count up or down.",
      "D": "`while` loops use `break` and `continue`, while `for` loops do not."
    },
    "correct_option": "B"
  },
  {
    "question_text": "Consider the code:\n```python\ncounter = 1\nwhile counter <= 5:\n    print(counter, end=' ')\n    counter += 1\n```\nWhat is the output of this code?",
    "options": {
      "A": "1 2 3 4",
      "B": "1 2 3 4 5",
      "C": "1 2 3 4 5 6",
      "D": "2 3 4 5"
    },
    "correct_option": "B"
  },
  {
    "question_text": "What is the purpose of the `continue` statement in a loop?