# Create an AI pipeline for quiz generation, using LangGraph

One way to boost customer engagement in the news industry is with news quizzes. The quiz would ask questions about the articles the user recently read.

In this hands-on exercise, we will implement a quiz generator using generative AI, retrieval-augmented generation (RAG), and the LangGraph library.

Here is an overview of the AI pipeline:
- The user specifies a **topic** of interest
- Search for corresponding news articles in the **Reuters dataset**
- The user selects one of the top-3 news articles
- Generate a quiz in a well-defined format

![AI pipeline](assets/ai_pipeline.png "AI pipeline for quiz generation")

In [None]:
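# Install necessary dependencies

!pip install langchain
!pip install langgraph
!pip install langchain_huggingface
# The imports below also need these packages
!pip install datasets pandas google-genai python-dotenv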

In [None]:
# Do all necessary imports

from typing import List
from typing_extensions import TypedDict
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langgraph.types import Command, interrupt
from langgraph.graph import START, StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_huggingface import HuggingFaceEmbeddings
from datasets import load_dataset
import pandas as pd
from google import genai
from google.genai import types
import os
from dotenv import load_dotenv
from pathlib import Path

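## Set up LLM client for Google AI Studio

We call a Gemini model through the `google-genai` Python client, using an API key from Google AI Studio. The key is read from the `GOOGLE_LLM_API_KEY` variable, which is loaded from a `.env` file in the parent directory of this notebook.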

## Prepare LLM for answer generation

The cells below create the API client and wrap it in a helper function. Note that other parts of this workshop use [Unsloth](https://docs.unsloth.ai) for LLM inference; if you prefer that here as well, feel free to adjust the notebook accordingly.

In [None]:
# Prepare LLM client
env_file_path = Path('../.env')
load_dotenv(dotenv_path=env_file_path)
google_llm_api_key = os.environ.get('GOOGLE_LLM_API_KEY')
client = genai.Client(api_key=google_llm_api_key)
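
If the `.env` file is missing or the variable name is misspelled, `os.environ.get` returns `None` without complaint, which can make the resulting failure harder to trace. A minimal sketch of an early check follows; `require_env` is a hypothetical helper, not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Check that the .env file exists and defines it."
        )
    return value


# Possible usage in the cell above (hypothetical):
# google_llm_api_key = require_env("GOOGLE_LLM_API_KEY")
```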

In [None]:
# Helper function for running LLM in autoregressive mode
def llm_generate_response(user_message: str, system_message: str) -> str:

    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",
        config=types.GenerateContentConfig(
            system_instruction=system_message),
        contents=user_message
    )
    return response.text

## Generate your first quiz

To start off, we'll create our first quiz. For demo purposes, we'll use one hard-coded news article. The more important part is the LLM prompt. In the prompt, we give instructions for quiz generation. Moreover, we define the output format, which the LLM hopefully adheres to during quiz generation. 

In [None]:
# Define a helper function for quiz generation
def generate_quiz(news_article: str) -> str:
    """
    :param news_article: The news article 
    :return: A quiz about the news article
    """
    user_message = f"Here is the news article: {news_article}"
    system_message = """Please generate one multiple choice quiz for the provided news article.

The quiz should have the following format:

[Question]

[Choice 1]
[Choice 2]
[Choice 3]

[Solution]
"""

    return llm_generate_response(user_message, system_message)

In [None]:
# Create quiz
news_article = "U.S. Agriculture Secretary Richard Lyng said he would not agree to an extension of the 18-month whole dairy herd buyout program set to expire later this year. Speaking at the Agriculture Department to representatives of the U.S. National Cattlemen\'s Association, Lyng said some dairymen asked the program be extended. But he said the Reagan administration, which opposed the whole herd buyout program in the 1985 farm bill, would not agree to an extension. The program begun in early 1986, is to be completed this summer. U.S. cattlemen bitterly opposed the scheme, complaining that increased dairy cow slaughter drove cattle prices down last year. Reuter"
# TODO: generate messages for LLM
# Use the above defined function "generate_quiz()".
quiz = generate_quiz(news_article)
print(quiz)


In [None]:
# TODO: Investigate the generated quiz. Does it have the desired scope and format?
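
One way to check the format programmatically is to split the quiz into its parts and count them. The sketch below assumes the LLM puts the question, each choice, and the solution on its own line, which the prompt does not guarantee; `parse_quiz` is a hypothetical helper, and the sample quiz is hand-written, not LLM output.

```python
from typing import Dict, List, Union


def parse_quiz(quiz_text: str, num_choices: int = 3) -> Dict[str, Union[str, List[str]]]:
    """Split a quiz into question, choices and solution; raise if the shape is off."""
    # Keep only non-empty lines, stripped of surrounding whitespace
    lines = [line.strip() for line in quiz_text.splitlines() if line.strip()]
    expected = num_choices + 2  # question + choices + solution
    if len(lines) != expected:
        raise ValueError(f"Expected {expected} non-empty lines, got {len(lines)}")
    return {
        "question": lines[0],
        "choices": lines[1:-1],
        "solution": lines[-1],
    }


# Hand-written sample in the requested format
sample = """Who declined to extend the dairy herd buyout program?

A) Richard Lyng
B) The National Cattlemen's Association
C) A group of dairymen

Solution: A"""
parsed = parse_quiz(sample)
print(parsed["choices"])
```

If the LLM adds an introduction or an explanation, the line count no longer matches and the function raises a `ValueError`, which is a cheap signal that the prompt needs tightening.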

## Prompt adjustments

Next, we'll try to tweak the messages for the LLM.

In [None]:
# TODO: adjust prompt
# Adjust the messages for the LLM so that the generated quiz has 5 answer choices instead of 3.
def generate_quiz_5_options(news_article: str) -> str:
    """
    :param news_article: The news article 
    :return: A quiz about the news article
    """
    user_message = f"Here is the news article: {news_article}"
    system_message = """Please generate one multiple choice quiz for the provided news article.

The quiz should have the following format:

[Question]

[Choice 1]
[Choice 2]
[Choice 3]
[Choice 4]
[Choice 5]

[Solution]
"""

    return llm_generate_response(user_message, system_message)

# TODO: re-query the LLM for quiz generation
# Use the above defined function "generate_quiz_5_options()"
quiz = generate_quiz_5_options(news_article)
print(quiz)

In [None]:
# TODO: Can you think of any other way to adjust quiz generation?
# - Increase difficulty
# - ... any other ideas?
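
Instead of copying the prompt for every variation, the system message can be built from parameters. A minimal sketch, assuming a free-text difficulty hint is enough to steer the LLM; `build_system_message` and its parameters are hypothetical names introduced for illustration.

```python
def build_system_message(num_choices: int = 3, difficulty: str = "medium") -> str:
    """Build the quiz-generation system prompt for a given number of choices."""
    if num_choices < 2:
        raise ValueError("A multiple choice quiz needs at least 2 choices")
    # One placeholder line per answer choice, numbered from 1
    choice_lines = "\n".join(f"[Choice {i}]" for i in range(1, num_choices + 1))
    return (
        "Please generate one multiple choice quiz for the provided news article.\n"
        f"The difficulty of the question should be {difficulty}.\n"
        "\n"
        "The quiz should have the following format:\n"
        "\n"
        "[Question]\n"
        "\n"
        f"{choice_lines}\n"
        "\n"
        "[Solution]\n"
    )


print(build_system_message(num_choices=5, difficulty="hard"))
```

The returned string can be passed as `system_message` to `llm_generate_response()`.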

## Load Reuters dataset

Throughout this notebook, we'll be using the [Reuters news dataset](https://huggingface.co/datasets/ucirvine/reuters21578) from Hugging Face.
We download it below. This dataset contains short articles from Reuters' financial newswire service from 1987. 

In [None]:
# Load dataset
# - if asked to run custom code, type "y" for YES.
reuters_ds = load_dataset('ucirvine/reuters21578','ModHayes')
news_raw = reuters_ds["train"].to_pandas()
print(f"Loaded {len(news_raw)} news articles.")

## Preprocess news articles

First, we perform some preprocessing on the news data. We'll store all articles in a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html). For each article, we keep the news title and the news text, merged into one string. We then clean the resulting strings: escaped newlines become spaces, escaped quotes become plain quotes, and repeated whitespace is collapsed.

In [None]:
# Merge title and text, drop unnecessary columns
news_raw["title_and_text"] = news_raw['title'] + ' | ' + news_raw['text']
news = news_raw[["title_and_text", "date", "places"]]

In [None]:
# Clean up text, remove unnecessary characters
news = news.copy()  # work on a copy, so the assignment below does not trigger a SettingWithCopyWarning
cleaned = news["title_and_text"].str.replace("\\n", " ", regex=False)  # literal "\n" sequences -> space
cleaned = cleaned.str.replace("\\\"", "\"", regex=False)  # escaped quotes -> plain quotes
news["title_and_text"] = cleaned.str.split().str.join(" ")  # collapse repeated whitespace
news.head()
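
To see what the three cleanup steps do, it helps to apply them to a single string. The sketch below mirrors the steps in plain Python; `clean_text` is a hypothetical helper and the sample string is made up for illustration.

```python
def clean_text(raw: str) -> str:
    """Apply the three cleanup steps from the cell above to a single string."""
    text = raw.replace("\\n", " ")      # literal backslash + "n" -> space
    text = text.replace("\\\"", "\"")   # backslash + quote -> plain quote
    return " ".join(text.split())       # collapse runs of whitespace


# Made-up sample containing escaped quotes, an escaped newline and extra spaces
sample = 'LYNG SAYS \\"NO\\" | Program ends\\n   this summer.'
print(clean_text(sample))
# prints: LYNG SAYS "NO" | Program ends this summer.
```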

## Create RAG database

We want to support quiz generation for a user-specified topic. For this reason, we create a vector store in which news articles can be queried by topic.

In [None]:
# Setup RAG vector store
# - This can take ~ 1 minute, give it some patience
texts_to_encode = news['title_and_text'].to_list()
embedder = HuggingFaceEmbeddings(
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
)
vectorstore = InMemoryVectorStore.from_texts(texts=texts_to_encode, embedding=embedder)
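
Conceptually, a vector store maps each text to a vector and returns the texts whose vectors are closest to the query vector. The toy sketch below shows that idea with word counts in place of the `all-MiniLM-L6-v2` sentence embeddings and plain cosine similarity as the distance measure. It is an illustration only, not how `InMemoryVectorStore` is implemented, and all names in it are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: word counts. A real embedder returns a dense float vector."""
    return Counter(text.lower().split())


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_search(query: str, texts: List[str], k: int = 3) -> List[str]:
    """Return the k texts most similar to the query."""
    q = embed(query)
    return sorted(texts, key=lambda t: cosine(q, embed(t)), reverse=True)[:k]


# Made-up mini corpus
docs = [
    "wheat and corn harvest grows",
    "bank raises interest rates",
    "coffee exports fall as harvest shrinks",
]
print(toy_search("coffee harvest", docs, k=2))
```

Word counts only match exact words, whereas sentence embeddings are meant to place texts with similar meaning close together even when they share few words.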

In [None]:
# Search articles about a given topic
query = "An article on agriculture"
k=3
# TODO: use the vector store above to search for 3 news articles corresponding to "agriculture"
# Hint: use InMemoryVectorStore's "similarity_search()" function
result = vectorstore.similarity_search(query, k=k)
print(result)


In [None]:
# TODO: verify the articles are really about "agriculture"

In [None]:
# TODO: Check whether you can search for articles about "coffee"
k=10
query = "An article about coffee"
result = vectorstore.similarity_search(query, k=k)
print(result)
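
A quick sanity check on the retrieved articles is to test whether each one mentions the topic at all. This is a crude sketch: an article can be on topic without containing the keyword, so a `False` is a prompt to read the article, not a verdict. `keyword_hits` is a hypothetical helper and the example texts are made up.

```python
from typing import List


def keyword_hits(texts: List[str], keywords: List[str]) -> List[bool]:
    """For each text, report whether it mentions at least one keyword (case-insensitive)."""
    lowered = [kw.lower() for kw in keywords]
    return [any(kw in text.lower() for kw in lowered) for text in texts]


# With the search result from above, one could call:
# keyword_hits([doc.page_content for doc in result], ["coffee"])
print(keyword_hits(["Brazil coffee exports rise", "Gold prices steady"], ["coffee"]))
# prints: [True, False]
```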

## Create AI pipeline with LangGraph

We now create a graph for quiz generation. Note that the graph is deterministic and doesn't use tools.
Since time is short, we provide all the code for creating the graph. You can read more about LangGraph in the [official documentation](https://langchain-ai.github.io/langgraph/tutorials/introduction/).
Our graph leverages the following features:
- Maintain a conversation state
- Human in the loop

In [None]:
class State(TypedDict):
    """The state during graph traversal."""
    topic_from_user: str
    relevant_articles_from_reuters: List[Document]
    article_selected_by_user: Document
    quiz_result: str

def retrieve(state: State):
    """Search for news articles according to the topic selected by the user."""
    relevant_articles_from_reuters = vectorstore.similarity_search(state["topic_from_user"], k=3)
    return {"relevant_articles_from_reuters": relevant_articles_from_reuters}

def human_feedback(state: State):
    """Let the user choose one article, on which the quiz will be based."""
    article_selection = interrupt("Let user choose article")
    article_selected_by_user=state["relevant_articles_from_reuters"][int(article_selection)]
    return {"article_selected_by_user": article_selected_by_user}

def generate(state: State):
    """Generate a quiz."""
    news_article = state["article_selected_by_user"].page_content
    quiz_result = generate_quiz(news_article)
    return {"quiz_result": quiz_result}

# Build the graph with all nodes and edges
builder = StateGraph(State)
builder.add_node("retrieve", retrieve)
builder.add_node("human_feedback", human_feedback)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "human_feedback")
builder.add_edge("human_feedback", "generate")
builder.add_edge("generate", END)

memory = MemorySaver()
graph = builder.compile(checkpointer=memory)
thread = {"configurable": {"thread_id": "1"}}
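
Each node above receives the current state and returns only the keys it wants to change; the graph merges that partial update into the shared state before the next node runs. The framework-free sketch below imitates this for a linear chain. It is a simplified stand-in with hypothetical names: it has no checkpointing, and the user selection is a fixed choice instead of an `interrupt`.

```python
from typing import Any, Callable, Dict, List

ToyState = Dict[str, Any]
ToyNode = Callable[[ToyState], ToyState]


def run_linear(nodes: List[ToyNode], initial_state: ToyState) -> ToyState:
    """Run nodes in order, merging each node's partial update into the state."""
    state = dict(initial_state)
    for node in nodes:
        update = node(state)   # a node returns only the keys it changes
        state.update(update)   # the update is merged into the shared state
    return state


# Hypothetical stand-ins for retrieve / human_feedback / generate
def toy_retrieve(state: ToyState) -> ToyState:
    return {"articles": [f"{state['topic']} article {i}" for i in range(3)]}


def toy_select(state: ToyState) -> ToyState:
    return {"selected": state["articles"][0]}  # a fixed choice instead of an interrupt


def toy_generate(state: ToyState) -> ToyState:
    return {"quiz": f"Quiz about: {state['selected']}"}


final_state = run_linear([toy_retrieve, toy_select, toy_generate], {"topic": "coffee"})
print(final_state["quiz"])
# prints: Quiz about: coffee article 0
```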

## Run the graph

We now run the AI pipeline to generate the quiz. Note this is interactive. The user first needs to select a topic, and later needs to choose an article from a list of proposals.

In [None]:
# Run the graph from the start, until user selection step
topic_from_user = input("Please select your topic: ")
initial_state = {"topic_from_user": topic_from_user}
for event in graph.stream(initial_state, thread, stream_mode="updates"):
    pass
# Display article options
relevant_articles_from_reuters = graph.get_state(config=thread).values['relevant_articles_from_reuters']
print("Article candidates:")
for idx, doc in enumerate(relevant_articles_from_reuters):  # "idx" avoids shadowing the built-in id()
    content = doc.page_content
    print(f"[{idx}] {content}")

In [None]:
# TODO: have a look at the retrieved news articles, which one is most suitable for quiz generation?

In [None]:
# Get human feedback
article_selection = input("Please select which article you'd like to use [0,1,2]: ")

# Continue the graph execution
for event in graph.stream(
        Command(resume=article_selection), thread, stream_mode="updates"
):
    pass

# Show final quiz
quiz_result=graph.get_state(config=thread).values['quiz_result']
print(quiz_result)
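
The `human_feedback` node converts the user input with `int()` and uses it directly as a list index, so a typo raises an error inside the graph and a negative number silently picks an article from the end of the list. A minimal sketch of validating the input before resuming; `parse_selection` is a hypothetical helper.

```python
def parse_selection(raw: str, num_options: int) -> int:
    """Convert user input to a valid article index, or raise ValueError."""
    try:
        index = int(raw.strip())
    except ValueError:
        raise ValueError(f"{raw!r} is not a whole number") from None
    if not 0 <= index < num_options:
        raise ValueError(f"Please choose a number from 0 to {num_options - 1}")
    return index


print(parse_selection(" 2 ", num_options=3))
# prints: 2
```

One could call this on `article_selection` before passing it to `Command(resume=...)`, and ask the user again when it raises.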

In [None]:
# TODO: Review the quiz

## Adjust the AI pipeline

In [None]:
# TODO: Make the quiz more entertaining. Here are some improvement ideas:
# - Show the original news article above the quiz, when displaying the quiz to the user.
# - The final output should include the year of the news article.
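
As a starting point for both ideas, the sketch below combines article, year, and quiz into one display string. It assumes the article's `date` value is available as a string containing a four-digit year; the date and article text in the example are made up, and `extract_year` and `format_quiz_output` are hypothetical helpers. Note that the vector store above is built from texts only, so the date would first have to be carried along with each article.

```python
import re
from typing import Optional


def extract_year(date_text: str) -> Optional[str]:
    """Return the first four-digit year (19xx or 20xx) found in the string, if any."""
    match = re.search(r"(?<!\d)(?:19|20)\d{2}(?!\d)", date_text)
    return match.group(0) if match else None


def format_quiz_output(article_text: str, quiz: str, date_text: str = "") -> str:
    """Combine the article, its year (when found) and the quiz into one display string."""
    year = extract_year(date_text)
    header = f"News article ({year})" if year else "News article"
    return f"{header}\n{article_text}\n\nQuiz\n{quiz}"


# Made-up inputs for illustration
print(format_quiz_output("Lyng rejects buyout extension.", "[Question] ...", "19-OCT-1987 01:51:51"))
```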