# Concept Map + Quiz Creation
- Created: 11 Feb 2024
- Shows how to build concept maps and multiple-choice quizzes from a text using `strict_json` prompts

In [63]:
from agentjo import *

# Define Your LLM

In [2]:
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
def llm(system_prompt: str, user_prompt: str) -> str:
    ''' Here, we use OpenAI for illustration, you can change it to your own LLM '''
    # ensure your LLM imports are all within this function
    from openai import OpenAI
    
    # define your own LLM here
    client = OpenAI()
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        temperature = 0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    return response.choices[0].message.content
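
Because the `llm` function above is called with `temperature = 0`, re-running a cell with the same prompts repeats an API call whose result is unlikely to change much. A minimal sketch of an optional in-memory cache around any function with the same `(system_prompt, user_prompt) -> str` signature follows. The helper name `make_cached_llm` and the `fake_llm` stand-in are hypothetical and not part of agentjo, and whether `strict_json` accepts a wrapped callable is an assumption to verify.

```python
from functools import lru_cache
from typing import Callable, List

LLMFn = Callable[[str, str], str]

def make_cached_llm(llm_fn: LLMFn, maxsize: int = 128) -> LLMFn:
    """Wrap an llm-style function so identical prompt pairs are answered from memory."""
    @lru_cache(maxsize=maxsize)
    def cached(system_prompt: str, user_prompt: str) -> str:
        return llm_fn(system_prompt, user_prompt)
    return cached

# Demonstration with a stand-in function (hypothetical, makes no network calls).
call_log: List[str] = []

def fake_llm(system_prompt: str, user_prompt: str) -> str:
    call_log.append(user_prompt)  # record each real invocation
    return f"echo: {user_prompt}"

cached_fake = make_cached_llm(fake_llm)
first = cached_fake("You are helpful.", "hello")
second = cached_fake("You are helpful.", "hello")  # served from the cache
print(first == second, len(call_log))
```

With a real model, the wrapper would be applied as `llm = make_cached_llm(llm)` before passing `llm` on. Note that `lru_cache` treats positional and keyword calls as different cache keys, so a caller that mixes both styles may see fewer cache hits.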

# Generate Concept Maps

In [23]:
text = '''In recent years, Large Language Models (LLMs) have been undergoing rapid iteration and
evolution (Anthropic, 2024; Google, 2024; OpenAI, 2024a), progressively diminishing the gap
towards Artificial General Intelligence (AGI).
Recently, post-training has emerged as an important component of the full training pipeline.
It has been shown to enhance accuracy on reasoning tasks, align with social values, and adapt
to user preferences, all while requiring relatively minimal computational resources against
pre-training. In the context of reasoning capabilities, OpenAI’s o1 (OpenAI, 2024b) series models
were the first to introduce inference-time scaling by increasing the length of the Chain-of-Thought reasoning process. This approach has achieved significant improvements in various
reasoning tasks, such as mathematics, coding, and scientific reasoning. However, the challenge
of effective test-time scaling remains an open question for the research community. Several prior
works have explored various approaches, including process-based reward models (Lightman
et al., 2023; Uesato et al., 2022; Wang et al., 2023), reinforcement learning (Kumar et al., 2024),
and search algorithms such as Monte Carlo Tree Search and Beam Search (Feng et al., 2024; Trinh
et al., 2024; Xin et al., 2024). However, none of these methods has achieved general reasoning
performance comparable to OpenAI’s o1 series models.
In this paper, we take the first step toward improving language model reasoning capabilities
using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop
reasoning capabilities without any supervised data, focusing on their self-evolution through
a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ
GRPO (Shao et al., 2024) as the RL framework to improve model performance in reasoning.
During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting
reasoning behaviors. After thousands of RL steps, DeepSeek-R1-Zero exhibits super performance
on reasoning benchmarks. For instance, the pass@1 score on AIME 2024 increases from 15.6% to
71.0%, and with majority voting, the score further improves to 86.7%, matching the performance
of OpenAI-o1-0912.
However, DeepSeek-R1-Zero encounters challenges such as poor readability, and language
mixing. To address these issues and further enhance reasoning performance, we introduce
DeepSeek-R1, which incorporates a small amount of cold-start data and a multi-stage training
pipeline. Specifically, we begin by collecting thousands of cold-start data to fine-tune the
DeepSeek-V3-Base model. Following this, we perform reasoning-oriented RL like DeepSeek-R1-
Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection
sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains
such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model.
After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking
into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to
as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.
We further explore distillation from DeepSeek-R1 to smaller dense models. Using Qwen2.5-
32B (Qwen, 2024b) as the base model, direct distillation from DeepSeek-R1 outperforms applying
RL on it. This demonstrates that the reasoning patterns discovered by larger base models are crucial for improving reasoning capabilities. We open-source the distilled Qwen and Llama (Dubey
et al., 2024) series. Notably, our distilled 14B model outperforms state-of-the-art open-source
QwQ-32B-Preview (Qwen, 2024a) by a large margin, and the distilled 32B and 70B models set a
new record on the reasoning benchmarks among dense models.'''

In [30]:
res = strict_json('''Create a concept map of (Node 1, Node 2, relation), where relation is how Node 1 relates to Node 2.
Keep the relation short and concise''',
                  text,
                  output_format = {"Concept Map": "List of (Node 1, Node 2, relation), type: list"},
                  llm = llm)

In [31]:
res

{'Concept Map': [['Large Language Models (LLMs)',
   'Artificial General Intelligence (AGI)',
   'diminishing gap towards'],
  ['Post-training', 'Full training pipeline', 'important component of'],
  ['Post-training', 'Accuracy on reasoning tasks', 'enhances'],
  ['OpenAI’s o1 series models', 'Inference-time scaling', 'introduced by'],
  ['Inference-time scaling',
   'Chain-of-Thought reasoning process',
   'increases length of'],
  ['Chain-of-Thought reasoning process',
   'Reasoning tasks',
   'improves performance in'],
  ['DeepSeek-V3-Base', 'DeepSeek-R1-Zero', 'base model for'],
  ['DeepSeek-R1-Zero',
   'Reasoning benchmarks',
   'exhibits super performance on'],
  ['DeepSeek-R1', 'Cold-start data', 'incorporates small amount of'],
  ['DeepSeek-R1',
   'Multi-stage training pipeline',
   'enhances reasoning performance with'],
  ['DeepSeek-R1', 'OpenAI-o1-1217', 'achieves performance on par with'],
  ['Distillation', 'Smaller dense models', 'explores from DeepSeek-R1 to'],
  ['Di

In [32]:
edges = res['Concept Map']
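
The plotting code further down unpacks every edge as a `(source, target, relation)` triple, so a single malformed entry from the LLM would raise an error. A minimal sketch of a defensive clean-up step is shown here. The helper name `clean_edges` and the sample data are hypothetical, and the filtering rules (drop anything that is not a 3-item list, strip whitespace, remove exact duplicates) are one reasonable choice rather than something agentjo prescribes.

```python
from typing import Any, List, Tuple

def clean_edges(raw_edges: List[Any]) -> List[Tuple[str, str, str]]:
    """Keep only well-formed (source, target, relation) triples, without duplicates."""
    cleaned: List[Tuple[str, str, str]] = []
    seen = set()
    for item in raw_edges:
        # Skip anything that is not a 3-element list or tuple.
        if not isinstance(item, (list, tuple)) or len(item) != 3:
            continue
        source, target, relation = (str(part).strip() for part in item)
        # An edge needs both endpoints to be drawable.
        if not source or not target:
            continue
        triple = (source, target, relation)
        if triple not in seen:
            seen.add(triple)
            cleaned.append(triple)
    return cleaned

# Hypothetical sample mixing valid, duplicate, and malformed entries.
sample_edges = [
    ["A", "B", "relates to"],
    ["A", "B", "relates to"],   # exact duplicate
    ["C", "D"],                 # too short
    "oops",                     # not a list
    [" E ", "F", "uses"],       # extra whitespace
]
cleaned_sample = clean_edges(sample_edges)
print(cleaned_sample)
```

In the notebook this could be applied as `edges = clean_edges(res['Concept Map'])` before building the graph.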

In [33]:
# !pip install pyvis

In [38]:
from pyvis.network import Network

# Create the Pyvis Network instance.
# Set notebook=False for scripts (set to True if in a Jupyter Notebook)
net = Network(height="750px", width="100%", directed=True, notebook=True)

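# Set custom options to enable dragging and zooming of the view.
# (pyvis extracts the JSON object from this string and passes it to vis.js.)
# Note: "minZoom" and "maxZoom" are not among the interaction options documented
# for vis-network, so the zoom limits below may be ignored by the renderer.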
net.set_options("""
var options = {
  "interaction": {
    "dragView": true,
    "zoomView": true,
    "minZoom": 0.5,
    "maxZoom": 2
  },
  "physics": {
    "enabled": true
  },
  "layout": {
    "improvedLayout": true
  }
}
""")

# Add nodes to the network (avoid duplicates using a set)
nodes = set()
for source, target, _ in edges:
    nodes.add(source)
    nodes.add(target)

for node in nodes:
    net.add_node(node, label=node, title=node)

# Add edges with labels (the label appears on the edge and on hover)
for source, target, relation in edges:
    net.add_edge(source, target, title=relation, label=relation)

# === Inject custom JavaScript to prevent panning "out-of-bounds" ===
#
# This script listens for the "dragEnd" event on the network.
# It calculates the bounding box (min/max positions) of all nodes.
# If the current view center is too far (with a specified padding) from the bounding box,
# it automatically calls network.fit() to recenter the graph.
#
# Adjust the "padding" value as needed.

custom_script = """
<script type="text/javascript">
  network.on("dragEnd", function () {
    // Get positions of all nodes
    var positions = network.getPositions();
    var xs = [], ys = [];
    for (var key in positions) {
      xs.push(positions[key].x);
      ys.push(positions[key].y);
    }
    var minX = Math.min.apply(null, xs);
    var maxX = Math.max.apply(null, xs);
    var minY = Math.min.apply(null, ys);
    var maxY = Math.max.apply(null, ys);
    
    // Get the current view center of the network
    var viewPosition = network.getViewPosition();
    var centerX = viewPosition.x;
    var centerY = viewPosition.y;
    
    // Define a tolerance (padding) around the nodes' bounding box
    var padding = 50; // adjust this value as needed
    
    // If the center is out-of-bounds, recenter the network view.
    if (centerX < minX - padding || centerX > maxX + padding ||
        centerY < minY - padding || centerY > maxY + padding) {
      network.fit({animation: {duration: 500}});
    }
  });
</script>
"""

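# net.show() regenerates the page from scratch, so appending to net.html beforehand
# has no effect (behaviour assumed from pyvis 0.3.x). Instead, write the file first,
# then inject the custom JavaScript just before the closing </body> tag.
from pathlib import Path
from IPython.display import IFrame

output_path = Path("interactive_knowledge_graph.html")
net.show(str(output_path))  # writes the file and prints its name
html = output_path.read_text()
output_path.write_text(html.replace("</body>", custom_script + "\n</body>", 1))

# Display the patched graph in the notebook.
IFrame(str(output_path), width="100%", height="750px")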

interactive_knowledge_graph.html


# Now Make it into a Quiz Game

In [60]:
res = strict_json('''Generate questions and answers in multiple choice and the correct answer based on the text''',
                  text,
                  output_format = {"Question 1": "type: str",
                                  "Answers 1": "4 possible answers labelled A to D, only 1 correct, type: list",
                                  "Correct Answer 1": "type: Enum['A','B','C','D']",
                                  "Question 2": "type: str",
                                  "Answers 2": "4 possible answers labelled A to D, only 1 correct, type: list",
                                  "Correct Answer 2": "type: Enum['A','B','C','D']",
                                  "Question 3": "type: str",
                                  "Answers 3": "4 possible answers labelled A to D, only 1 correct, type: list",
                                  "Correct Answer 3": "type: Enum['A','B','C','D']",
                                  "Question 4": "type: str",
                                  "Answers 4": "4 possible answers labelled A to D, only 1 correct, type: list",
                                  "Correct Answer 4": "type: Enum['A','B','C','D']",
                                  "Question 5": "type: str",
                                  "Answers 5": "4 possible answers labelled A to D, only 1 correct, type: list",
                                  "Correct Answer 5": "type: Enum['A','B','C','D']"},
                  llm = llm)
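
The `output_format` above repeats the same three keys for each question, and the heading promises a quiz game. A minimal sketch of two helpers follows: one that builds the same format dictionary for any number of questions, and one that scores a player's answers against the `Correct Answer N` keys. Both function names (`build_quiz_format`, `grade_quiz`) and the sample quiz are hypothetical and not part of agentjo.

```python
from typing import Any, Dict, List

def build_quiz_format(num_questions: int) -> Dict[str, str]:
    """Build the repeated Question / Answers / Correct Answer keys for N questions."""
    fmt: Dict[str, str] = {}
    for i in range(1, num_questions + 1):
        fmt[f"Question {i}"] = "type: str"
        fmt[f"Answers {i}"] = "4 possible answers labelled A to D, only 1 correct, type: list"
        fmt[f"Correct Answer {i}"] = "type: Enum['A','B','C','D']"
    return fmt

def grade_quiz(quiz: Dict[str, Any], responses: List[str]) -> int:
    """Count how many responses match the quiz's correct answers (case-insensitive)."""
    score = 0
    for i, response in enumerate(responses, start=1):
        correct = quiz.get(f"Correct Answer {i}")
        if correct is not None and response.strip().upper() == str(correct).strip().upper():
            score += 1
    return score

# Hypothetical quiz result shaped like the strict_json output.
sample_quiz = {"Correct Answer 1": "B", "Correct Answer 2": "A", "Correct Answer 3": "B"}
sample_score = grade_quiz(sample_quiz, ["b", "A", "C"])
print(f"Score: {sample_score}/3")
```

Passing `output_format = build_quiz_format(5)` would produce the same keys and descriptions as the hand-written dictionary, and the responses list could be collected with `input()` in an interactive session.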

In [61]:
res

{'Question 1': 'What is the main focus of the paper regarding language models?',
 'Answers 1': ['A) Improving language model reasoning capabilities using supervised data',
  'B) Exploring reasoning capabilities through pure reinforcement learning',
  'C) Enhancing pre-training processes',
  'D) Developing new computational resources'],
 'Correct Answer 1': 'B',
 'Question 2': 'What significant improvement was observed in DeepSeek-R1-Zero on the AIME 2024 benchmark?',
 'Answers 2': ['A) Increase from 15.6% to 71.0% pass@1 score',
  'B) Increase from 50% to 90% pass@1 score',
  'C) Increase from 10% to 30% pass@1 score',
  'D) No significant improvement was observed'],
 'Correct Answer 2': 'A',
 'Question 3': 'What challenges does DeepSeek-R1-Zero encounter?',
 'Answers 3': ['A) High computational cost',
  'B) Poor readability and language mixing',
  'C) Lack of training data',
  'D) Slow training speed'],
 'Correct Answer 3': 'B',
 'Question 4': 'What method is used to improve the perfo