# ReasoningAgent (Tree of Thoughts with MCTS)


This notebook demonstrates how to use Monte Carlo Tree Search (MCTS) with ReasoningAgent for complex reasoning tasks. MCTS provides several advantages over beam search when:

1. Ground truth evaluation is available
2. LLM-based evaluation is expensive
3. You want to generate diverse, high-quality training data

In [1]:
import json
import os
import pickle
import random

api_key = os.environ.get("OPENAI_API_KEY")

config_list = [{"model": "gpt-4o-mini", "api_key": api_key}]
verbose = False

## Simple Example: Dice Roll Problem

Here we'll solve a probability problem using MCTS-based reasoning. This example demonstrates:
- How MCTS explores different reasoning paths
- How ground truth evaluation improves path selection
- How to visualize the reasoning tree
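
To make the exploration step concrete, here is a minimal sketch of UCT-style child selection, the rule commonly used in MCTS to balance exploitation and exploration. It is illustrative only and is not taken from `ReasoningAgent`'s internals. The helper names (`uct_score`, `select_child`), the exploration constant `c`, and the reading of `value` as a cumulative reward are all assumptions.

```python
import math


def uct_score(child_value, child_visits, parent_visits, c=1.41):
    """Average reward plus an exploration bonus (hypothetical helper)."""
    if child_visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child_value / child_visits  # assumes `value` is a cumulative reward
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore


def select_child(children, parent_visits):
    """Return the child dict with the highest UCT score."""
    return max(
        children,
        key=lambda ch: uct_score(ch["value"], ch["visits"], parent_visits),
    )


# Toy children, not real agent output.
children = [
    {"content": "A", "value": 0.75, "visits": 1},
    {"content": "B", "value": 0.0, "visits": 0},
]
chosen = select_child(children, parent_visits=5)
print(chosen["content"])
```

Because the unvisited child receives an infinite score, it is selected before any visited sibling is revisited. This is one reason a small `nsim` can leave many nodes with zero visits.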

In [2]:
from autogen import AssistantAgent, ReasoningAgent, ThinkNode, UserProxyAgent, visualize_tree

question = "What is the expected maximum dice value if you can roll a 6-sided dice three times?"
random.seed(1)  # set the seed for reproducibility

mcts_agent = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    verbose=True,
    # Use a small depth and few simulations to keep the output short.
    max_depth=4,
    reason_config={"method": "mcts", "nsim": 5},
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

In [3]:
question = "What is the expected maximum dice value if you can roll a 6-sided dice three times?"


def last_meaningful_msg(sender, recipient, summary_args):
    import warnings

    if sender == recipient:
        return "TERMINATE"

    summary = ""
    chat_messages = recipient.chat_messages[sender]

    for msg in reversed(chat_messages):
        try:
            content = msg["content"]
            if isinstance(content, str):
                summary = content.replace("TERMINATE", "")
            elif isinstance(content, list):
                # Remove the `TERMINATE` word in the content list.
                summary = "\n".join(
                    x["text"].replace("TERMINATE", "") for x in content if isinstance(x, dict) and "text" in x
                )
            if summary.strip():
                return summary
        except (KeyError, IndexError, AttributeError) as e:
            warnings.warn(f"Cannot extract summary using last_msg: {e}. Using an empty str as summary.", UserWarning)
    return summary

In [4]:
ans = user_proxy.initiate_chat(mcts_agent, message=question, summary_method=last_meaningful_msg)

user_proxy (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
mcts_agent (to tot_thinker):

# Question: What is the expected maximum dice value if you can roll a 6-sided dice three times?
---
What are the possible next steps?

--------------------------------------------------------------------------------
tot_thinker (to mcts_agent):

**Reflection**  
The previous steps do not indicate any attempts to solve the question at hand, which is focused on calculating the expected maximum value of multiple 6-sided dice rolls. There is a lack of a structured approach to derive this answer, which could entail using probability or statistics. Furthermore, no errors were present in the sequence itself, but it could benefit from a clearer method towards computing the expected maximum.

**Possible Options:**  
Option 1: Calculate the expecte

In [5]:
print(ans.summary)

To calculate the expected maximum value when rolling a 6-sided die three times, we can follow these steps:

### Step 1: Understand the setup
When rolling a single 6-sided die, the possible outcomes are 1, 2, 3, 4, 5, and 6. When rolling three dice, our goal is to find the expected maximum value from those three rolls.

### Step 2: Calculate the probability for each maximum outcome
The maximum value \( M \) can range from 1 to 6. We will calculate the probability of each possible maximum outcome:

1. **For \( M = 1 \)**: This occurs when all three dice show 1:
   - Probability: \( P(M = 1) = \left(\frac{1}{6}\right)^3 = \frac{1}{216} \)

2. **For \( M = 2 \)**: This occurs when at least one die shows a 2, and none show a 3, 4, 5, or 6:
   - Probability: \( P(M = 2) = \left(\frac{2}{6}\right)^3 - \left(\frac{1}{6}\right)^3 = \frac{8}{216} - \frac{1}{216} = \frac{7}{216} \)

3. **For \( M = 3 \)**: This occurs when at least one die shows a 3, and none show 4, 5, or 6:
   - Probability: \(

In [6]:
!pip install graphviz



In [7]:
### Run the following line to save the visualization to "tree_of_thoughts.png"
# visualize_tree(mcts_agent._root)
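
If Graphviz is not available, the tree can also be inspected as indented text. The sketch below assumes each node is a dictionary with the keys `content`, `value`, `visits`, and `children`, which is the layout that `to_dict()` prints later in this notebook. `render_tree` is a hypothetical helper, not part of autogen.

```python
def render_tree(node, indent=0, width=60):
    """Return one text line per node, indented by depth (hypothetical helper)."""
    label = node["content"][:width]  # truncate long thoughts for readability
    lines = [
        f"{'  ' * indent}- [visits={node['visits']}, value={node['value']}] {label}"
    ]
    for child in node.get("children", []):
        lines.extend(render_tree(child, indent + 1, width))
    return lines


# Toy tree with the same keys as the exported dictionary; not real agent output.
sample = {
    "content": "root question",
    "value": 4.0,
    "visits": 5,
    "children": [
        {"content": "first idea", "value": 0.75, "visits": 1, "children": []},
    ],
}
lines = render_tree(sample)
print("\n".join(lines))
```

With a real run, the same helper could be applied to `mcts_agent._root.to_dict()`.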

In [8]:
writer = AssistantAgent(
    name="Writer",
    llm_config={"config_list": config_list},
    system_message="""
    You are a professional writer, known for your insightful and engaging articles.
    You transform complex concepts into compelling narratives.
    You should improve the quality of the content based on the feedback from the user.
    """,
)
reason_agent_for_writer = ReasoningAgent(
    name="reason_agent",
    llm_config={"config_list": config_list},
    verbose=verbose,
    beam_size=1,
    max_depth=3,
)


def reflection_message(recipient, messages, sender, config):
    print("Reflecting...")
    return f"Reflect, Reason and provide critique on the following writing. \n\n {recipient.chat_messages_for_summary(sender)[-1]['content']}"

In [9]:
user_proxy.register_nested_chats(
    [
        {
            "recipient": reason_agent_for_writer,
            "message": reflection_message,
            "summary_method": "last_msg",
            "max_turns": 1,
        }
    ],
    trigger=writer,
)

In [10]:
task = """Write a concise but engaging blogpost about Nvidia."""
res = user_proxy.initiate_chat(recipient=writer, message=task, max_turns=2, summary_method="last_msg")

user_proxy (to Writer):

Write a concise but engaging blogpost about Nvidia.

--------------------------------------------------------------------------------
Writer (to user_proxy):

**Title: Nvidia: The Powerhouse of Visual Computing and AI Innovation**

In a world increasingly defined by digital experiences, Nvidia stands as a titan, driving the future of technology with its groundbreaking advancements in graphics processing. Established in 1993, Nvidia has evolved from a graphics card manufacturer into a leader in AI, gaming, and deep learning.

At the heart of Nvidia’s success is its Graphics Processing Unit (GPU), a marvel of engineering that has transformed not just gaming but industries ranging from film to healthcare. The iconic GeForce series has become synonymous with high-performance gaming, delivering stunning graphics that bring virtual worlds to life. However, Nvidia's impact extends far beyond the gaming realm; their GPUs power some of the most complex

In [11]:
# json.dump(mcts_agent._root.to_dict(), open("mcts.json", "w"), indent=2)
print(json.dumps(mcts_agent._root.to_dict(), indent=2))

{
  "content": "What is the expected maximum dice value if you can roll a 6-sided dice three times?",
  "value": 4.0,
  "depth": 0,
  "visits": 5,
  "children": [
    {
      "content": "Calculate the expected maximum value of one roll and extend it to three rolls based on probability theories.",
      "value": 0.75,
      "depth": 1,
      "visits": 1,
      "children": [
        {
          "content": "Derive the expected maximum from the probability distribution of the maximum of three dice rolls.",
          "value": 0,
          "depth": 2,
          "visits": 0,
          "children": []
        },
        {
          "content": "Calculate the expected value for each possible maximum outcome (1 through 6) based on the probabilities of rolling them.",
          "value": 0,
          "depth": 2,
          "visits": 0,
          "children": []
        },
        {
          "content": "Simulate rolling a 6-sided die three times multiple times to empirically find the expected maximum.

## Using Ground Truth to Generate Training Data

When we have access to ground truth answers, we can use them to improve the evaluation of reasoning paths. This section demonstrates:
- How to include ground truth in prompts
- How the agent uses ground truth for evaluation
- How this improves the quality of generated solutions

The MCTS approach can generate valuable training data for:
- Supervised Fine-Tuning (SFT)
- Reinforcement Learning from Human Feedback (RLHF)
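
Before a ground truth goes into a prompt, it is worth checking it independently. The sketch below recomputes the expected maximum of three six-sided dice in two ways: brute-force enumeration of all 216 outcomes, and the tail-sum formula E(X) = sum over m of P(X >= m). The function names are illustrative and are not part of autogen.

```python
from fractions import Fraction
from itertools import product


def expected_max(sides=6, rolls=3):
    """Exact expected maximum, enumerating every equally likely outcome."""
    outcomes = list(product(range(1, sides + 1), repeat=rolls))
    return Fraction(sum(max(o) for o in outcomes), len(outcomes))


def expected_max_tail_sum(sides=6, rolls=3):
    """Same quantity via E(X) = sum_{m=1}^{sides} [1 - ((m - 1) / sides) ** rolls]."""
    return sum(1 - Fraction(m - 1, sides) ** rolls for m in range(1, sides + 1))


exact = expected_max()
tail = expected_max_tail_sum()
print(exact, tail, round(float(exact), 4))
```

Both methods yield 119/24 (about 4.9583), which matches the ground truth used in the prompt for this section.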

In [12]:
prompt = r"""What is the expected maximum dice value if you can roll a 6-sided dice three times?

GROUND_TRUTH:
We define X as the highest outcome among the three rolls.
The probability that X is at least m is 1 - \left(\frac{m-1}{6}\right)^3 for each m from 1 to 6.
Summing these probabilities gives the expectation E(X) = \sum_{m=1}^{6} [1 - (\frac{m-1}{6})^3].
Calculating this sum results in E(X) = 6 - \frac{225}{216} = \frac{119}{24}, which approximates to 4.9583.
Therefore, the expected maximum value when rolling a six-sided die three times is \frac{119}{24} or approximately 4.9583.
"""
random.seed(1)  # set the seed for reproducibility

mcts_agent2 = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    verbose=True,
    # Use a small depth and few simulations to keep the output short.
    max_depth=4,
    reason_config={"method": "mcts", "nsim": 5},
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)


ans = user_proxy.initiate_chat(mcts_agent2, message=prompt, summary_method=last_meaningful_msg)

user_proxy (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

GROUND_TRUTH:
We define X as the highest outcome among the three rolls. 
The probability that X is at least m is 1 - \left(\frac{m-1}{6}\right)^3 for each m from 1 to 6.
Summing these probabilities gives the expectation E(X) = \sum_{m=1}^{6} [1 - (\frac{m-1}{6})^3].
Calculating this sum results in E(X) = 6 - \frac{225}{216} = \frac{119}{24}, which approximates to 4.9583.
Therefore, the expected maximum value when rolling a six-sided die three times is \frac{119}{24} or approximately 4.9583.


--------------------------------------------------------------------------------
mcts_agent (to tot_thinker):

# Question: What is the expected maximum dice value if you can roll a 6-sided dice three times?
---
What are the possible next steps?

--------------------------------------------------------------------------------
tot_thinker (to mcts_agent):

**Reflection**  
The previous steps do not

In [13]:
print(ans.summary)

To find the expected maximum value when rolling a 6-sided die three times, we can use probability theory. Here’s a concise approach to the solution:

### Step 1: Calculate the Expected Maximum Value

1. **Define the maximum value** \( M \) from three rolls of the die (possible values: 1 to 6).
2. **Find the probability** \( P(M = k) \) for each possible maximum \( k \) (where \( k \) ranges from 1 to 6).

- \( P(M = 1) = \left(\frac{1}{6}\right)^3 = \frac{1}{216} \)

- \( P(M = 2) = P(\text{at least one die is 2}) - P(\text{all dice are } 1) = \left(\frac{2}{6}\right)^3 - \left(\frac{1}{6}\right)^3 = \frac{8}{216} - \frac{1}{216} = \frac{7}{216} \)

- \( P(M = 3) = P(\text{at least one die is 3}) - P(\text{all dice are } 2 \text{ or less}) = \left(\frac{3}{6}\right)^3 - \left(\frac{2}{6}\right)^3 = \frac{27}{216} - \frac{8}{216} = \frac{19}{216} \)

- \( P(M = 4) = P(\text{at least one die is 4}) - P(\text{all dice are } 3 \text{ or less}) = \left(\frac{4}{6}\right)^3 - \left(\frac{3}{

In [14]:
print(json.dumps(mcts_agent2._root.to_dict(), indent=2))

{
  "content": "What is the expected maximum dice value if you can roll a 6-sided dice three times?",
  "value": 4.5,
  "depth": 0,
  "visits": 5,
  "children": [
    {
      "content": "Calculate the expected maximum value of one roll and extend it to three rolls based on probability theories.",
      "value": 2.0,
      "depth": 1,
      "visits": 2,
      "children": [
        {
          "content": "Derive the expected maximum from the probability distribution of the maximum of three dice rolls.",
          "value": 1.0,
          "depth": 2,
          "visits": 1,
          "children": [
            {
              "content": "Calculate the expected maximum for a single roll to establish a baseline value, ensuring clarity for the next step.",
              "value": 0,
              "depth": 3,
              "visits": 0,
              "children": []
            },
            {
              "content": "Compute the expected maximum from the cumulative distribution function for thre

In [15]:
from autogen.agentchat.contrib.reasoning_agent import extract_rlhf_preference_dataset, extract_sft_dataset

# Get SFT data from successful paths
sft_data = extract_sft_dataset(mcts_agent2._root)

# Get preference pairs for RLHF
rlhf_data = extract_rlhf_preference_dataset(mcts_agent2._root)
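
The extracted datasets can then be saved for a later fine-tuning job. The sketch below writes records to JSON Lines, one record per line. It assumes that `sft_data` and `rlhf_data` are lists of JSON-serializable dictionaries, which this notebook does not confirm. `save_jsonl`, `load_jsonl`, and the demo records are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_jsonl(records, path):
    """Write one JSON object per line and return the number of records written."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


def load_jsonl(path):
    """Read a JSON Lines file back into a list of objects."""
    with Path(path).open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Placeholder records standing in for the extracted data (assumed shape).
demo_records = [{"id": 1, "text": "example"}, {"id": 2, "text": "another"}]
demo_path = Path(tempfile.gettempdir()) / "sft_demo.jsonl"
written = save_jsonl(demo_records, demo_path)
restored = load_jsonl(demo_path)
```

If the assumption holds, `save_jsonl(sft_data, "sft.jsonl")` would store the supervised examples. Inspect one record first to confirm its fields.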