# News Article MCQ Generator
This notebook takes a URL of a news article, extracts its content using Tavily API, and generates multiple-choice questions based on the content using Together AI.

In [13]:
# Install required packages
!pip install tavily-python together python-dotenv requests


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [14]:
# Set API keys directly
import os

# Configure API keys - Together AI key should be without Bearer prefix
os.environ['TAVILY_API_KEY'] = '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'

#os.environ["TOGETHER_API_KEY"] = "tgp_v1_EAtRfkLDj8wivXN0p-VwMeRGJsNKI9VoYiGqnJIF1jc"


In [45]:
from pathlib import Path
from dotenv import load_dotenv
from tavily import TavilyClient
import requests
import json

# Get the absolute path to the .env file
current_dir = Path().absolute()
env_path = current_dir / '.env'
print(f"Looking for .env file at: {env_path}")

if not env_path.exists():
    print("Warning: .env file not found at current location")
    parent_dir = current_dir.parent
    env_path = parent_dir / '.env'
    print(f"Trying parent directory: {env_path}")

# Load environment variables
load_dotenv(env_path)

# Debug: Print environment variables (partially masked)
def mask_key(key):
    if not key:
        return "Not found"
    return f"{key[:8]}...{key[-4:]}" if len(key) > 12 else "[Too short]"

tavily_key = os.getenv("TAVILY_API_KEY")
# together_key = os.getenv("TOGETHER_API_KEY")
from together import Together
client = Together()

print("\nAPI Keys found:")
print(f"Tavily API key: {mask_key(tavily_key)}")


if not tavily_key:
    raise ValueError("TAVILY_API_KEY not found in environment variables")


# Initialize clients
tavily = TavilyClient(api_key=tavily_key)

print("\nAPI clients initialized successfully!")

Looking for .env file at: /Users/khawar/Climate Change Paper/genie-climaqa/climagen/.env

API Keys found:
Tavily API key: tvly-dev...WZ2W

API clients initialized successfully!


In [64]:
MCQ_SYSTEM_PROMPT = """
You are a question paper setter creating multiple choice questions (MCQs) from a graduate-level climate science textbook.

MCQ Components:
1. Stem: The main question, scenario, or statement requiring completion. It should clearly assess the intended knowledge.
2. Correct Answer: The indisputable correct response to the stem.
3. Distractors: Three incorrect but plausible answers. They should be:
    - Related to the stem and correct answer.
    - Positively phrased and true statements that don't answer the stem.
    - Plausible but incorrect, without giving clues to the correct answer.
    - Unique, each reflecting different misconceptions if possible.

MCQ Guidelines:
1. Questions should be clear, concise, and free from unnecessary complexity or ambiguity.
2. Avoid overly long sentences and use consistent phrasing for repeated items.
3. Ensure questions are self-contained and provide all necessary context.
4. Do not include phrases like "According to the provided context..."
5. Do not make any references to the given context in the question
6. Ensure that distractors do not overlap by reflecting different misconceptions on the topic.
7. Minimize clues that make the correct answer obvious.
8. Use "None of the Above" or "All of the Above" sparingly.
9. Each MCQ must have exactly four answer choices (one correct, three distractors).
10. Questions should not rely on external figures or tables.

TASK:
From the given context, create N distinct MCQs that:
- Cover different concepts or facts from the text
- Do not repeat stems or answers
- Have exactly one correct answer and three plausible distractors for each
- Are numbered sequentially (Q1, Q2, etc.)
- Use this format:

Q1. [Stem]
A) [Option]
B) [Option]
C) [Option]
D) [Option]
Answer: [Letter] - [Explanation]

Continue until you have produced N MCQs.
"""

In [65]:
import re
from urllib.parse import urlparse

def is_valid_url(url):
    """Validate URL format"""
    try:
        result = urlparse(url)
        return all([result.scheme in ['http', 'https'], result.netloc])
    except:
        return False

def extract_article_content(url):
    """Extract article content from URL using Tavily API"""
    if not is_valid_url(url):
        print("Error: Invalid URL format. Please provide a valid http(s) URL.")
        return None

    try:
        print("Calling Tavily API...")
        # Use Tavily's search API to get article content
        response = tavily.search(
            query=f"Get content from {url}",
            url=url,
            search_depth="basic",
            include_raw_content=True,
            include_answer=True
        )
        
        print("API Response received... ")
        
        # First try to get raw content
        if response and response.get('raw_content'):
            print("Content extracted successfully!")
            return response['raw_content']
        
        # If no raw content, try to get answer
        elif response and response.get('answer'):
            print("Got summarized content from API")
            return response['answer']
        
        # If neither exists, check the response structure
        print(f"Unexpected API response structure: {response.keys() if response else 'No response'}")
        return None
        
    except Exception as e:
        print(f"Error extracting content: {str(e)}")
        print("Full error details:", e)
        return None

In [69]:
def extract_multiple_mcqs(text):
    mcq_list = []

    splits = re.split(r'(?=Q\d+\.)', text)

    for chunk in splits:
        if not chunk.strip():
            continue

        mcq = {}

        question_match = re.search(r'Q\d+\.\s*(.+)', chunk)
        if question_match:
            mcq['question'] = question_match.group(1).strip()
        else:
            continue
        options = re.findall(r'^[A-D]\)\s*(.+)', chunk, re.MULTILINE)
        mcq['options'] = [opt.strip() for opt in options] if options else []

        answer_match = re.search(r'Answer:\s*([A-D])\)\s*([^\n-]+)(?:\s*-\s*(.+))?', chunk)
        if answer_match:
            mcq['correct_option_letter'] = answer_match.group(1)
            mcq['correct_answer'] = answer_match.group(2).strip()
            mcq['explanation'] = answer_match.group(3).strip() if answer_match.group(3) else ""
        else:
            mcq['correct_option_letter'] = None
            mcq['correct_answer'] = None
            mcq['explanation'] = ""

        mcq_list.append(mcq)

    return mcq_list


In [70]:
def generate_mcq(content, num_questions=10):
    try:
        prompt = f"""<s>[INST] {MCQ_SYSTEM_PROMPT}

Context:
{content}

TASK: Generate {num_questions} distinct MCQs.
"""

        client = Together()

        response = client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[
                {"role": "system", "content": MCQ_SYSTEM_PROMPT},
                {"role": "user", "content": f"Context:\n{content}\n\nTASK: Generate {num_questions} distinct MCQs."}
            ],
        )

        print("Together AI API response received")

        response_dict = response.model_dump()
        if 'choices' in response_dict and response_dict['choices']:
            response_text = response_dict['choices'][0]['message']['content']
            all_mcqs = extract_multiple_mcqs(response_text)
            print(f"Extracted {len(all_mcqs)} MCQs successfully.")
            return all_mcqs

        print("No choices found in response")
        return None

    except Exception as e:
        print(f"Error generating MCQ: {str(e)}")
        return None

In [71]:
print("Please enter the URL of a news article about climate change.")
url = "https://www.ehn.org/extreme-rains-in-oman-and-uae-linked-to-climate-change"

if not url:
    print("Error: URL cannot be empty")
else:
    # Extract content
    print("\nExtracting article content...")
    content = extract_article_content(url)

    if content:
        print("\nContent length:", len(content), "characters")
        print("\nGenerating MCQ...")
        mcqs = generate_mcq(content,num_questions=10)
        print("\nGenerated MCQ:")
        print("-" * 50)
        
        for i, mcq in enumerate(mcqs, 1):
            print(f"\nMCQ {i}: {mcq['question']}")
            for opt_letter, option in zip(['A','B','C','D'], mcq['options']):
                print(f"  {opt_letter}) {option}")
            print(f"Answer: {mcq['correct_option_letter']}) {mcq['correct_answer']}")
            if mcq['explanation']:
                print("Explanation:", mcq['explanation'])
        
    else:
        print("\nFailed to extract article content. Please try another URL or check your API key.")

Please enter the URL of a news article about climate change.

Extracting article content...
Calling Tavily API...
API Response received... 
Got summarized content from API

Content length: 260 characters

Generating MCQ...
Together AI API response received
Extracted 10 MCQs successfully.

Generated MCQ:
--------------------------------------------------

MCQ 1: What is primarily driving the extreme rains in Oman and the UAE, according to recent studies?
  A) Natural climate variability
  B) Fossil fuel emissions
  C) Deforestation and land-use changes
  D) Volcanic eruptions
Answer: B) Fossil fuel emissions
Explanation: Studies indicate that fossil fuel emissions are the primary driver of extreme rains in Oman and the UAE.

MCQ 2: What was the impact of the April 2024 storms in Oman and the UAE?
  A) Severe drought and heatwaves
  B) Mild rainfall and minimal disruption
  C) Severe flooding and disruption
  D) Earthquake and tsunami
Answer: C) Severe flooding and disruption
Explanation

In [78]:
print("total number of mcqs : ",len(mcqs))
print("single mcq example : ")
mcqs[0]

total number of mcqs :  10
single mcq example : 


{'question': 'What is primarily driving the extreme rains in Oman and the UAE, according to recent studies?',
 'options': ['Natural climate variability',
  'Fossil fuel emissions',
  'Deforestation and land-use changes',
  'Volcanic eruptions'],
 'correct_option_letter': 'B',
 'correct_answer': 'Fossil fuel emissions',
 'explanation': 'Studies indicate that fossil fuel emissions are the primary driver of extreme rains in Oman and the UAE.'}