# Quiz generation with LangGraph

One way to boost customer engagement in the news industry is through custom quizzes. A generated quiz asks questions about the articles the user has recently read.

In this hands-on exercise, we will implement a quiz generator. To achieve this, we will use generative AI, retrieval augmented generation (RAG), and the [LangGraph library](https://langchain-ai.github.io/langgraph/tutorials/introduction/).

Here is an overview of the AI pipeline:
- The user specifies a **topic** of interest
- Search for corresponding news articles in the **Reuters dataset**
- The user selects one of the top-3 news articles
- Generate a multiple-choice quiz
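Before any library is involved, the four steps can be sketched as plain functions. This is a minimal sketch under simplifying assumptions:

- `retrieve_top_k` ranks articles by word overlap instead of embeddings.
- The user's choice and the quiz generation are passed in as functions. They stand in for the interactive prompt and the LLM call.

All names here are hypothetical.

```python
from typing import Callable, List


def retrieve_top_k(topic: str, articles: List[str], k: int = 3) -> List[str]:
    """Toy retrieval: rank articles by how many topic words they contain."""
    topic_words = set(topic.lower().split())

    def overlap(article: str) -> int:
        return len(topic_words & set(article.lower().split()))

    # sorted() is stable, so ties keep their original order
    return sorted(articles, key=overlap, reverse=True)[:k]


def run_pipeline(
    topic: str,
    articles: List[str],
    choose: Callable[[List[str]], int],
    generate: Callable[[str], str],
) -> str:
    """Topic -> top-3 candidates -> user choice -> quiz."""
    candidates = retrieve_top_k(topic, articles, k=3)
    selected = candidates[choose(candidates)]
    return generate(selected)


demo_articles = [
    "coffee prices rose in brazil",
    "wheat harvest and farm subsidies",
    "central bank raises interest rates",
    "coffee export quotas debated",
]
demo_quiz = run_pipeline(
    "coffee prices",
    demo_articles,
    choose=lambda candidates: 0,                        # stands in for the user's choice
    generate=lambda article: f"Quiz about: {article}",  # stands in for the LLM
)
print(demo_quiz)
```

In the notebook below, the toy retrieval is replaced by a vector store, the `choose` step by a human-in-the-loop interrupt, and `generate` by a call to Gemini.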

![AI pipeline](assets/ai_pipeline.png "AI pipeline for quiz generation")

## Instructions for workshop participants

Ensure you understand the project description above. If you have any questions, reach out to a workshop host.

Next, we start with the implementation of the RAG system. Make sure you understand the content of each notebook cell, and execute the cells in order.

In some places, there are open tasks that you should work on. These tasks are marked as follows:

```
# >>>>>>>>>>>>>>>>>>
# TODO: <Some instructions ...>
# <your code should go here>
# <<<<<<<<<<<<<<<<<<
```

In [None]:
# Install necessary dependencies
# - Note: this can take a minute, so please be patient
# - Warnings or error messages may appear during installation; you can usually ignore them

!pip install langchain
!pip install langgraph
!pip install langchain_huggingface
!pip install google-genai
!pip install datasets==3.5.1

In [None]:
# Do all necessary imports

from typing import List
from typing_extensions import TypedDict
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langgraph.types import Command, interrupt
from langgraph.graph import START, StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_huggingface import HuggingFaceEmbeddings
from datasets import load_dataset
import pandas as pd
from google import genai
from google.genai import types

## Setup function to call Google AI Studio

Next, we prepare the Python code to call Google's LLM API. Make sure you have your API key ready, as explained in `sessions/1_prompt_engineering_and_rag/README.md`.

We use the [GenAI library from Google](https://pypi.org/project/google-genai/) for this purpose.
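Hard-coding the key in the notebook makes it easy to leak when the notebook is shared. A minimal alternative reads the key from an environment variable. The variable name `GOOGLE_API_KEY` and the helper `get_api_key()` are our own choices here, not part of the workshop setup.

```python
import os


def get_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Please set the {env_var} environment variable to your Google AI Studio API key.")
    return key
```

You would then create the client with `genai.Client(api_key=get_api_key())`.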

In [None]:
# Set up LLM API key
# >>>>>>>>>>>>>>>>>>
# TODO: add your LLM API key here. You can get your key from Google AI Studio.
# Note: More instructions for creating the API key can be found in `sessions/1_prompt_engineering_and_rag/README.md`
google_llm_api_key: str = "<key goes here>"
# <<<<<<<<<<<<<<<<<<

In [None]:
# Prepare client for Google LLM API
client = genai.Client(api_key=google_llm_api_key)


def llm_generate_response(system_message: str, user_message: str) -> str:
    """
    Use an LLM to answer a question.

    :param system_message: The general instructions for the LLM, which shapes the AI's general behavior.
    :param user_message: The question from the user.
    :return: The response from the LLM.
    """
    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",
        config=types.GenerateContentConfig(
            system_instruction=system_message),
        contents=user_message
    )
    return response.text


## Generate your first quiz

To start off, we'll create our first quiz. For demo purposes, we'll use a fixed news article about a dairy herd buyout program.

The most important part here is the LLM prompt. In the prompt, we give instructions for quiz generation. Moreover, we define the output format, which the LLM hopefully adheres to during quiz generation. 
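The system message and the user message are both plain strings. If they are passed to `llm_generate_response()` in the wrong order, no error is raised; the model simply receives the article as its instructions. One way to check the plumbing without any API calls is an offline stub with the same signature that records what it receives. `fake_llm_generate_response()` and `build_and_send()` are hypothetical helpers for this check.

```python
from typing import List, Tuple

# Records every (system_message, user_message) pair the stub receives.
calls: List[Tuple[str, str]] = []


def fake_llm_generate_response(system_message: str, user_message: str) -> str:
    """Offline stand-in with the same signature as llm_generate_response()."""
    calls.append((system_message, user_message))
    return "stub quiz"


def build_and_send(news_article: str, send=fake_llm_generate_response) -> str:
    """Build both messages and pass them by keyword, so their order cannot be mixed up."""
    system_message = "Please generate one multiple choice quiz for the provided news article."
    user_message = f"Here is the news article: {news_article}"
    return send(system_message=system_message, user_message=user_message)


build_and_send("Lyng said he would not agree to an extension of the dairy herd buyout program.")
received_system, received_user = calls[-1]
print("System message:", received_system)
print("User message:  ", received_user)
```

Passing arguments by keyword is a cheap safeguard: even if the parameter order of the callee changes, each message still lands in the right place.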

In [None]:
def generate_quiz(news_article: str) -> str:
    """
    Generate a quiz based on the provided news article.
    The news article is placed in the LLM context window.
    For LLM answer generation, use the Google AI Studio API.

    :param news_article: The news article 
    :return: A quiz about the news article
    """
    system_message = """Please generate one multiple choice quiz for the provided news article.

The quiz should have the following format:

[Question]

[Choice 1]
[Choice 2]
[Choice 3]

[Solution]
"""
    user_message = f"Here is the news article: {news_article}"

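    # Pass both messages by keyword; the signature is (system_message, user_message)
    return llm_generate_response(system_message=system_message, user_message=user_message)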

In [None]:
# Create a quiz for one news article
news_article = "U.S. Agriculture Secretary Richard Lyng said he would not agree to an extension of the 18-month whole dairy herd buyout program set to expire later this year. Speaking at the Agriculture Department to representatives of the U.S. National Cattlemen\'s Association, Lyng said some dairymen asked the program be extended. But he said the Reagan administration, which opposed the whole herd buyout program in the 1985 farm bill, would not agree to an extension. The program begun in early 1986, is to be completed this summer. U.S. cattlemen bitterly opposed the scheme, complaining that increased dairy cow slaughter drove cattle prices down last year. Reuter"

# >>>>>>>>>>>>>>>>>>
# TODO: generate quiz for the above news article
# Note: Use the above defined function "generate_quiz()".
quiz = generate_quiz(news_article)
# <<<<<<<<<<<<<<<<<<

# Display the generated quiz
print(quiz)


## TODO: Investigate the generated quiz.

Does it have the desired scope and format?

## Prompt adjustments

Next, we'll try to tweak the format of the generated quiz. Let's create 5 potential answers, rather than 3.
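Instead of copying the format block by hand for every variant, the choice lines can be generated for any number of options. The helper `build_quiz_system_message()` below is a hypothetical sketch. For `num_choices=5` it yields exactly the 5-option prompt used in the next cell: a `Question:` line, choices labelled `A:` to `E:`, and a `Solution:` line.

```python
import string


def build_quiz_system_message(num_choices: int) -> str:
    """Build the quiz system message with choices labelled A, B, C, ..."""
    if not 2 <= num_choices <= 26:
        raise ValueError("num_choices must be between 2 and 26")
    labels = string.ascii_uppercase[:num_choices]
    choice_lines = "\n".join(f"{label}: [Choice {i}]" for i, label in enumerate(labels, start=1))
    return (
        "Please generate one multiple choice quiz for the provided news article.\n\n"
        "The quiz should have the following format:\n\n"
        "Question: [Question]\n\n"
        f"{choice_lines}\n\n"
        "Solution: [Solution]\n"
    )


print(build_quiz_system_message(5))
```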

In [None]:
def generate_quiz_5_options(news_article: str) -> str:
    """
    Generate a quiz with 5 candidate answers.

    :param news_article: The news article 
    :return: A quiz about the news article
    """

    # >>>>>>>>>>>>>>>>>>
    # TODO: define a system message.
    # Make sure the system message contains the necessary instructions for the quiz format.
    # The generated quiz should have 5 candidate answers, not 3.
    # Use the previously defined system message as a reference.
    system_message = """Please generate one multiple choice quiz for the provided news article.

The quiz should have the following format:

Question: [Question]

A: [Choice 1]
B: [Choice 2]
C: [Choice 3]
D: [Choice 4]
E: [Choice 5]

Solution: [Solution]
"""
    # <<<<<<<<<<<<<<<<<<

    user_message = f"Here is the news article: {news_article}"

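    # Pass both messages by keyword; the signature is (system_message, user_message)
    return llm_generate_response(system_message=system_message, user_message=user_message)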


In [None]:
# Now, we're ready to create a quiz with 5 candidate answers.
quiz = generate_quiz_5_options(news_article)
print(quiz)

## TODO: Analyze generated quiz

Does the generated quiz have the desired format?
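Checking the format by eye works for one quiz; for many quizzes, a small parser helps. The sketch below assumes the 5-option `Label: text` format and returns `None` whenever a required line is missing, so format drift becomes visible. It is deliberately strict: it does not handle Markdown decorations such as `**Question:**`, which LLMs sometimes add. `parse_quiz()` is a hypothetical helper, and the sample quiz is hand-written.

```python
from typing import Dict, Optional


def parse_quiz(text: str, labels: str = "ABCDE") -> Optional[Dict[str, object]]:
    """Parse a 'Question: / A: ... E: / Solution:' quiz; return None if any part is missing."""
    allowed = {"Question", "Solution", *labels}
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Keep only lines of the form "<known label>: <non-empty text>"
        if sep and key in allowed and value:
            fields[key] = value
    required = ["Question", *labels, "Solution"]
    if any(name not in fields for name in required):
        return None
    return {
        "question": fields["Question"],
        "choices": {label: fields[label] for label in labels},
        "solution": fields["Solution"],
    }


sample_quiz = """Question: What did Lyng refuse to extend?

A: The whole dairy herd buyout program
B: The 1985 farm bill
C: Cattle import quotas
D: Crop insurance
E: Export subsidies

Solution: A"""
print(parse_quiz(sample_quiz))
```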

## TODO: More prompt engineering

Can you think of any other way to adjust quiz generation?
- Increase the quiz difficulty
- Anything else?
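Difficulty is one knob that can be added to any existing system message. The sketch below appends a level-specific instruction. The levels and their wording are hypothetical and only one possible phrasing; how strongly the model follows them has to be checked on real outputs.

```python
# Hypothetical difficulty levels and the extra instruction appended for each one.
DIFFICULTY_INSTRUCTIONS = {
    "easy": "Ask about a fact that is stated explicitly in the article, and make the wrong answers clearly wrong.",
    "medium": "Ask about a specific detail, such as a number, name, or date, from the article.",
    "hard": "Ask a question that requires combining several facts from the article, and make every wrong answer plausible.",
}


def with_difficulty(system_message: str, level: str) -> str:
    """Append a difficulty instruction to an existing quiz system message."""
    if level not in DIFFICULTY_INSTRUCTIONS:
        raise ValueError(f"Unknown difficulty {level!r}; choose one of {sorted(DIFFICULTY_INSTRUCTIONS)}")
    return f"{system_message.rstrip()}\n\nDifficulty: {level}. {DIFFICULTY_INSTRUCTIONS[level]}\n"


print(with_difficulty("Please generate one multiple choice quiz for the provided news article.", "hard"))
```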

## Load Reuters dataset

Throughout this notebook, we'll be using the [Reuters news dataset](https://huggingface.co/datasets/ucirvine/reuters21578) from Hugging Face.
We download it below. This dataset contains short articles from Reuters' financial newswire service from 1987. 

In [None]:
# Load dataset
# - if asked to run custom code, type "y" for YES.
reuters_ds = load_dataset('ucirvine/reuters21578','ModHayes')
news_raw = reuters_ds["train"].to_pandas()
print(f"Loaded {len(news_raw)} news articles.")
news_raw.head()

## Preprocess news articles

First, we preprocess the news data. We'll store all articles in a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

We do the following data processing:
- Concatenate the news text with the news title
- Replace escaped newlines and quotes, and collapse repeated whitespace
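The cleaning steps can be checked on a single string before they are run on the whole DataFrame. The hypothetical `clean_article()` helper below applies the same operations as the next cell:

1. Join the title and text with ` | `.
2. Replace literal `\n` sequences with a space.
3. Unescape escaped quotes.
4. Collapse all runs of whitespace.

The sample title and text are made up.

```python
def clean_article(title: str, text: str) -> str:
    """Merge title and text, then normalize escape sequences and whitespace."""
    merged = f"{title} | {text}"
    merged = merged.replace("\\n", " ")  # literal backslash-n sequences -> space
    merged = merged.replace('\\"', '"')  # escaped quotes -> plain quotes
    return " ".join(merged.split())      # collapse runs of whitespace, incl. real newlines


print(clean_article("DAIRY BUYOUT", "Lyng said\n  he would \\\"not\\\" agree."))
# prints: DAIRY BUYOUT | Lyng said he would "not" agree.
```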

In [None]:
# Merge title and text, keep only the columns we need
news_raw["title_and_text"] = news_raw["title"] + " | " + news_raw["text"]
# .copy() makes `news` an independent DataFrame, so the assignments below do not trigger SettingWithCopyWarning
news = news_raw[["title_and_text", "date", "places"]].copy()

# Clean up text: replace escaped newlines and quotes, then collapse repeated whitespace
news["title_and_text"] = (
    news["title_and_text"]
    .str.replace("\\n", " ", regex=False)
    .str.replace("\\\"", "\"", regex=False)
    .str.split()
    .str.join(" ")
)
news.head()

## Create RAG database

We want to support quiz generation on a user-specified topic. For this reason, we create a vector store in which news articles can be queried by topic.
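Conceptually, a vector store keeps one embedding vector per text. It answers a query by returning the texts whose vectors are most similar to the query's vector.

The toy version below is for showing the mechanics only:

- It swaps the `all-MiniLM-L6-v2` sentence embedding for a bag-of-words count vector.
- It uses cosine similarity as the similarity measure.
- `ToyVectorStore` is hypothetical; it only borrows the method name `similarity_search()`.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (a stand-in for a real sentence-embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, texts: List[str]):
        # Embed every text once, up front, and keep the vectors in memory.
        self.entries = [(text, embed(text)) for text in texts]

    def similarity_search(self, query: str, k: int = 3) -> List[str]:
        """Return the k texts most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(query_vector, entry[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore(["coffee prices rise in Brazil", "wheat harvest falls", "coffee exports grow"])
print(store.similarity_search("coffee", k=2))
```

A real sentence embedding also captures meaning beyond exact word overlap. That is why a query like "An article on agriculture" can find relevant articles that never use the word "agriculture".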

In [None]:
# Setup RAG vector store
# - If available, use a GPU to speed up the embedding computation.
# - This can take a minute, so please be patient.

texts_to_encode = news['title_and_text'].to_list()
# This is a small transformer for computing text embeddings.
embedder = HuggingFaceEmbeddings(
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
)
# This is the RAG database.
vectorstore = InMemoryVectorStore.from_texts(texts=texts_to_encode, embedding=embedder)

In [None]:
# Search articles about agriculture
query = "An article on agriculture"
k=3
result = vectorstore.similarity_search(query, k=k)
print(result)

## TODO: Verify the articles are really about "agriculture"

## Search for articles about "coffee"

In [None]:
# >>>>>>>>>>>>>>>>>>
# TODO: Check whether you can find articles about "coffee"
# Retrieve the top-10 articles from our RAG database.
# Use the code above as a reference.
query = "An article about the coffee and coffee bean industry"
k=10
result = vectorstore.similarity_search(query, k=k)
print(result)
# <<<<<<<<<<<<<<<<<<

## Create a quiz generation pipeline with LangGraph

We are now ready to create our AI pipeline with LangGraph. We'll create a simple graph that is traversed linearly and performs all sub-steps necessary for quiz generation.
Since time is short, we provide all code for creating the graph. Learn more about LangGraph in the [official documentation](https://langchain-ai.github.io/langgraph/tutorials/introduction/).
Our graph leverages the following features:
- Maintain a conversation state
- Human in the loop
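To make these two features concrete, here is a plain-Python analogue. It is not LangGraph's API, and it works as follows:

- Each node updates a shared state dict.
- A saved position acts as the checkpoint.
- An exception pauses execution until a human supplies a value.

As with LangGraph's `interrupt()`, the paused node runs again from its beginning when execution resumes, this time with the resume value. `LinearGraph`, `Interrupt`, and the `toy_*` nodes are hypothetical illustrations.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

NodeFn = Callable[[Dict[str, Any], Any], Dict[str, Any]]


class Interrupt(Exception):
    """Raised by a node that cannot continue without human input."""


class LinearGraph:
    """Plain-Python analogue of a linear graph with shared state and pause/resume."""

    def __init__(self, nodes: List[Tuple[str, NodeFn]]):
        self.nodes = nodes
        self.state: Dict[str, Any] = {}
        self.position = 0  # index of the next node to run; acts as the checkpoint
        self.resume_value: Optional[Any] = None

    def run(self, updates: Optional[Dict[str, Any]] = None) -> bool:
        """Run nodes in order; return True when finished, False when a node interrupts."""
        self.state.update(updates or {})
        while self.position < len(self.nodes):
            _, node = self.nodes[self.position]
            try:
                # Each node returns a partial update that is merged into the shared state.
                self.state.update(node(self.state, self.resume_value))
            except Interrupt:
                return False  # position is unchanged, so the same node runs again on resume
            self.resume_value = None
            self.position += 1
        return True

    def resume(self, value: Any) -> bool:
        """Continue after an interrupt, handing the human's answer to the paused node."""
        self.resume_value = value
        return self.run()


def toy_retrieve(state: Dict[str, Any], _: Any) -> Dict[str, Any]:
    return {"candidates": [f"{state['topic']} article {i}" for i in range(3)]}


def toy_human_feedback(state: Dict[str, Any], resume_value: Any) -> Dict[str, Any]:
    if resume_value is None:
        raise Interrupt("Let user choose article")
    return {"selected": state["candidates"][int(resume_value)]}


def toy_generate(state: Dict[str, Any], _: Any) -> Dict[str, Any]:
    return {"quiz": f"Quiz about: {state['selected']}"}


toy_graph = LinearGraph([("retrieve", toy_retrieve), ("human_feedback", toy_human_feedback), ("generate", toy_generate)])
paused = not toy_graph.run({"topic": "coffee"})  # stops at human_feedback
finished = toy_graph.resume("1")                 # the "user" picks candidate 1
print(toy_graph.state["quiz"])
```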

In [None]:
class State(TypedDict):
    """The state during graph traversal."""
    topic_from_user: str
    relevant_articles_from_reuters: List[Document]
    article_selected_by_user: Document
    quiz_result: str

def retrieve(state: State):
    """Search for news articles in the RAG database."""
    relevant_articles_from_reuters = vectorstore.similarity_search(state["topic_from_user"], k=3)
    return {"relevant_articles_from_reuters": relevant_articles_from_reuters}

def human_feedback(state: State):
    """Let the user choose one article. The generated quiz will be about this article."""
    article_selection = interrupt("Let user choose article")
    article_selected_by_user=state["relevant_articles_from_reuters"][int(article_selection)]
    return {"article_selected_by_user": article_selected_by_user}

def generate(state: State):
    """Generate a quiz for one news article."""
    news_article = state["article_selected_by_user"].page_content
    quiz_result = generate_quiz(news_article)
    return {"quiz_result": quiz_result}

# Build the graph with all nodes and edges
builder = StateGraph(State)
builder.add_node("retrieve", retrieve)
builder.add_node("human_feedback", human_feedback)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "human_feedback")
builder.add_edge("human_feedback", "generate")
builder.add_edge("generate", END)

memory = MemorySaver()
graph = builder.compile(checkpointer=memory)
thread = {"configurable": {"thread_id": "1"}}

## Run the graph

We now run the AI pipeline to generate the quiz. Note that this step is interactive: the user first enters a topic and then chooses an article from a list of proposals.

In [None]:
# Run the graph from the start.
# The execution will pause for human feedback. In this step, the user has to select from a list of article proposals.
topic_from_user = input("Please enter a topic for the quiz: ")
initial_state = {"topic_from_user": topic_from_user}
for event in graph.stream(initial_state, thread, stream_mode="updates"):
    pass
# Display article options on console
relevant_articles_from_reuters=graph.get_state(config=thread).values['relevant_articles_from_reuters']
print("Article candidates:")
for idx, doc in enumerate(relevant_articles_from_reuters):
    content = doc.page_content
    print(f"[{idx}] {content}")

## TODO: select article

Have a look at the retrieved news articles in the console. Which one should serve as the basis for the quiz generation?
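The `human_feedback` node converts the user's answer with `int(...)` and uses it as a list index. A typo such as `a` therefore raises an error inside the graph, and an out-of-range number such as `3` does too. A small validation helper, sketched below under the hypothetical name `parse_article_selection()`, catches both cases before the graph is resumed.

```python
from typing import Optional


def parse_article_selection(raw: str, num_candidates: int) -> Optional[int]:
    """Return the chosen 0-based article index, or None if the input is not a valid choice."""
    try:
        index = int(raw.strip())
    except ValueError:
        return None
    return index if 0 <= index < num_candidates else None


for answer in ["1", " 2 ", "3", "-1", "abc"]:
    print(repr(answer), "->", parse_article_selection(answer, num_candidates=3))
```

In the notebook, one option is to keep calling `input()` until the helper returns an index, then pass that index to `Command(resume=...)`. `int()` in `human_feedback` accepts an `int` just as well as a string.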

In [None]:
# Continue the execution of the graph
# As a next step, we ask for human feedback
# The user selects a news article by providing its ID (0, 1, or 2)
article_selection = input("Please select which article you'd like to use [0,1,2]: ")

# Finish graph execution
for event in graph.stream(
        Command(resume=article_selection), thread, stream_mode="updates"
):
    pass

# Show final quiz
quiz_result=graph.get_state(config=thread).values['quiz_result']
print(quiz_result)

## TODO: Review the quiz

Is the quiz about the chosen news article?
Does the quiz have the expected format?

## TODO: Extend and improve the AI pipeline

Make the quiz more entertaining. Here are some improvement ideas:
- Show the original news article above the quiz, when displaying the quiz to the user.
- Include the publication year of the article in the final quiz
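Both ideas come down to formatting the final output. The `Document` objects returned by the vector store contain only `page_content`, because only the text column was indexed. To show the year, the date would first have to be stored alongside each text, for example via the `metadatas` argument of `InMemoryVectorStore.from_texts`.

The sketch below assumes the date is available as a string in a format like `26-FEB-1987 15:01:01.79`; that format is an assumption, so check the actual `date` column. It extracts the year with a regular expression. `extract_year()` and `format_quiz_for_display()` are hypothetical helpers.

```python
import re
from typing import Optional


def extract_year(date_string: str) -> Optional[str]:
    """Return the first four-digit year (1900-2099) found in a date string, if any."""
    match = re.search(r"\b(?:19|20)\d{2}\b", date_string)
    return match.group(0) if match else None


def format_quiz_for_display(article: str, date_string: str, quiz: str) -> str:
    """Show the source article, with its year when available, above the quiz."""
    year = extract_year(date_string)
    header = f"News article from {year}" if year else "News article"
    return f"{header}:\n{article}\n\n{'-' * 40}\n\n{quiz}"


print(format_quiz_for_display(
    "U.S. Agriculture Secretary Richard Lyng said he would not agree to an extension "
    "of the 18-month whole dairy herd buyout program.",
    "26-FEB-1987 15:01:01.79",
    "Question: What program did Lyng refuse to extend?",
))
```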