# Project 4: **Build a Deep Research System**
Welcome to project 4! For this project, we shift our focus from tool use and agents to *reasoning* models. You will practice state‑of‑the‑art inference‑time scaling methods such as *Chain‑of‑Thought* prompting and *Tree‑of‑Thoughts*, and briefly explore high-levels of training reasoning models using techniques like **STaR**.


Finally, you will put everything together to build a *deep research agent* that can browse the web, reason over what it finds, and give structured answers.

## Learning Objectives  
* Apply common inference‑time scaling methods: **zero‑shot / few‑shot CoT, self‑consistency, sequential decoding, tree‑of‑thoughts**  
* Gain intuition for **training** reasoning‑capable models following **STaR** approach 
* Build a minimal **deep‑research agent** that combines step‑by‑step reasoning with live web search   
* Practice extending deep-search to a multi-agent system 

## Roadmap  
1. Environment setup  
2. Inference‑time scaling  
   2.1 Few‑shot & zero‑shot CoT  
   2.2 Self‑consistency
   2.3 Sequential revisions  
   2.4 Tree‑of‑Thought
3. STaR for training models for reasoning  
4. Deep-research agent  
5. (Optional) Multi-agent deep-research

# 1‑ Environment setup

## 1.1- Conda environment

Before we start coding, you need a reproducible setup. Open a terminal in the same directory as this notebook and run:

```bash
# Create and activate the conda environment
conda env create -f environment.yaml && conda activate deep_research

# Register this environment as a Jupyter kernel
python -m ipykernel install --user --name=deep_research --display-name "deep_research"
```
Once this is done, you can select "deep_research" from the Kernel → Change Kernel menu in Jupyter or VS Code.

## 1.2 Ollama setup

In this project we use the `llama3.2:3b` and `deepseek-r1:8b` models. You can try other smaller or larger reasoning LLMs such as `qwen2.5:3b-instruct` or `phi4-mini` to compare performance. Explore available models here: https://ollama.com/library.

```bash
ollama pull llama3.2:3b
ollama pull deepseek-r1:8b
# Additional small reasoning models to compare
# ollama pull qwen2.5:3b-instruct
# ollama pull phi4-mini

```

`ollama pull` downloads the model so you can run it locally without API calls.

---  
# 2‑ Inference‑time scaling

Inference-time scaling refers to techniques that make an existing model reason better without retraining it. Instead of changing the model’s weights, we achieve reasoning capability by adjusting how we prompt, sample, or aggregate LLM's outputs.

In this section, we’ll explore several inference-time strategies that improve reasoning quality using a non-reasoning base model. You will experiment with and compare methods such as:

- Few-shot Chain-of-Thought (CoT)
- Zero-shot CoT
- Self-consistency
- Sequential revision
- Tree-of-Thoughts (ToT)

### 2.1: Few‑Shot CoT
Few-shot prompting helps a model reason by showing one or multiple examples before asking a new question. By observing the pattern of reasoning and final answers, the model learns how to structure its own reasoning process on the new input.

In this exercise, you will create a prompt that includes a few example Q&A pairs demonstrating step-by-step reasoning. Then, you will feed a new question and see the model’s output.

In [1]:
from openai import OpenAI

# Initialize client for Ollama
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")
MODEL = "llama3.2:3b"

# Create few-shot examples with step-by-step reasoning
few_shot_prompt = """
Question: If Mary has 3 apples and gives 1 to John, how many apples does she have left?
Let's solve this step by step:
1. Initially, Mary has 3 apples
2. She gives 1 apple to John
3. Therefore, 3 - 1 = 2 apples remain
Answer: 2 apples

Question: A train travels 120 km in 2 hours. What is its average speed?
Let's solve this step by step:
1. Distance traveled = 120 km
2. Time taken = 2 hours
3. Average speed = Distance ÷ Time
4. Therefore, 120 ÷ 2 = 60
Answer: 60 kilometers per hour

Question: If a square has a perimeter of 20 cm, what is its area?
Let's think step by step:"""

# Call the model with few-shot examples
response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0.7
)

print("Model's reasoning:")
print(response.choices[0].message.content)

Model's reasoning:
To find the area of a square given its perimeter, we need to follow these steps:

1. The formula for the perimeter of a square is P = 4s, where s is the length of one side.
2. Given that the perimeter (P) is 20 cm, we can set up an equation: 20 = 4s
3. To solve for s, divide both sides of the equation by 4: s = 20 ÷ 4
4. Therefore, s = 5 cm, which means each side of the square has a length of 5 cm.
5. The formula for the area (A) of a square is A = s^2, where s is the length of one side.
6. Substitute the value of s into the equation: A = 5^2
7. Calculate the square of 5: 5^2 = 25
8. Therefore, the area of the square is 25 square centimeters.

Answer: The area of the square is 25 square centimeters.


### (Optional) Few-shot CoT on GPT2
GPT-2 is a pre-trained language model without instruction tuning. It continues text rather than answering questions. In this section, you'll try the exact same CoT pattern on GPT-2 and observe what happens. The goal is to test whether few-shot CoT alone can elicit structured reasoning from a non-chat LLM.

In [5]:
import os
import torch
from transformers import pipeline

# Initialize GPT-2 pipeline with text-generation
generator = pipeline('text-generation', model='gpt2')

# Create a few-shot prompt with simpler examples (since GPT-2 has limited context)
few_shot_prompt = """
Q: If I have 2 cookies and eat 1, how many do I have?
Steps:
1. Start with 2 cookies
2. Eat 1 cookie
3. 2 - 1 = 1 cookie left
Answer: 1 cookie

Q: A rectangle is 4m wide and 3m long. What's the area?
Steps:
1. Area = width × length
2. Area = 4m × 3m
3. Area = 12 square meters
Answer: 12 square meters

Q: If 15 students share 45 pencils equally, how many pencils does each student get?
Steps:"""

# Generate completions with different decoding settings
outputs = []

# Greedy decoding (no sampling)
greedy = generator(few_shot_prompt, max_new_tokens=100, num_return_sequences=1, pad_token_id=50256, do_sample=False)
outputs.append(("Greedy (greedy decoding):", greedy[0]['generated_text']))

# Sampling with temperature
sampled = generator(few_shot_prompt, max_new_tokens=100, num_return_sequences=1,
                   temperature=0.7, pad_token_id=50256)
outputs.append(("Sampling (temperature=0.7):", sampled[0]['generated_text']))

# Top-k sampling
top_k = generator(few_shot_prompt, max_new_tokens=100, num_return_sequences=1,
                 temperature=0.7, top_k=50, pad_token_id=50256)
outputs.append(("Top-k sampling (k=50):", top_k[0]['generated_text']))

# Print and compare outputs
print("Testing GPT-2's reasoning ability with different decoding strategies:\n")
for method, output in outputs:
    print(f"\n{method}")
    print("-" * 50)
    # Print only the new content after our prompt
    new_content = output[len(few_shot_prompt):]
    print(new_content.strip())

Device set to use cpu
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Testing GPT-2's reasoning ability with different decoding strategies:


Greedy (greedy decoding):
--------------------------------------------------
1. Student = 1 pencil

2. Student = 2 pencils

3. Student = 3 pencils

Answer: 3 pencils

Q: If I have 2 cookies and eat 1, how many do I have?

Steps:

1. Start with 2 cookies

2. Eat 1 cookie

3. 2 - 1 = 1 cookie left

Answer: 1 cookie

Q: A rectangle is 4m wide and

Sampling (temperature=0.7):
--------------------------------------------------
15 students share 45 pencils equally

Q: If I have 5 pencils, how many do I get?

Steps:

5 pencils

Q: If I have 10 pencils, how many do I get?

Steps:

10 pencils

Q: If I have 2 pencils, how many do I get?

Steps:

2 pencils

Q: If I have 3 pencils

Top-k sampling (k=50):
--------------------------------------------------
1. 15 students = 1 pencil

2. 15 students = 2 pencils

3. 15 students = 3 pencils


Q: I didn't pay for the class! What should I do?

Steps:

1. Make sure you're not going to 

### 2.2: Zero‑Shot Chain‑of‑Thought
Zero-shot CoT encourages the model to reason without examples by adding a short cue such as “Let’s think step by step.” This simple phrase often activates the model’s latent reasoning ability even when no demonstrations are provided. It serves as a baseline to compare with few-shot and other inference-time scaling methods.

In [6]:
from openai import OpenAI

# Initialize client for Ollama
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")
MODEL = "llama3.2:3b"

def zero_shot_cot(question: str, temperature: float = 0.7) -> str:
    # Create a zero-shot prompt with reasoning cue
    prompt = f"""You are a helpful expert assistant. Please help solve this problem step by step:

{question}

Let's approach this step by step:"""
    
    # Call the model
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature
    )
    
    return response.choices[0].message.content

# Test with a challenging multi-step problem
question = "In a class of 24 students, 3/4 play sports and 2/3 of those who play sports also play music. How many students play both sports and music?"

print("Question:", question)
print("\nReasoning process:")
print(zero_shot_cot(question))

Question: In a class of 24 students, 3/4 play sports and 2/3 of those who play sports also play music. How many students play both sports and music?

Reasoning process:
We can break down the problem into smaller steps to find out how many students play both sports and music.

Step 1: Find the number of students who play sports.
Since 3/4 of the students play sports, we need to multiply the total number of students (24) by 3/4:

Number of students who play sports = 24 x 3/4
= 24 x 0.75
= 18

So, 18 students play sports.

Step 2: Find the number of students who play both sports and music.
Since 2/3 of those who play sports also play music, we need to multiply the number of students who play sports (18) by 2/3:

Number of students who play both sports and music = 18 x 2/3
= 18 x 0.67
= 12

Therefore, 12 students play both sports and music.

Let me know if you have any further questions or need help with anything else!
We can break down the problem into smaller steps to find out how many s

### 2.3 Self‑Consistency
Self-consistency enhances reasoning accuracy by sampling multiple independent reasoning paths for the same question instead of relying on a single deterministic answer. Each run may follow a slightly different logical chain, and the diversity helps correct individual mistakes. After generating several reasoning traces, you then aggregate the final answers using majority voting.

This approach is especially useful when tasks involve multi-step reasoning or arithmetic, where single-path outputs may be incorrect.

In [7]:
from openai import OpenAI
import re, collections

# Initialize Ollama/OpenAI-compatible client
client = OpenAI(api_key = "ollama", base_url = "http://localhost:11434/v1")
MODEL = "llama3.2:3b"


def cot_answer(question: str, temperature: float = 1.0) -> str:
    """Generate a chain-of-thought trace for `question` and return the final answer string.

    - Builds a short prompt that cues step-by-step reasoning.
    - Calls the chat completions API.
    - Extracts the final answer heuristically by looking for an `Answer:` line or the last non-empty line.

    Returns:
        final_ans (str): extracted final answer (trimmed). If parsing fails, returns the full model output.
    """
    prompt = f"""You are a helpful assistant. Please answer the question step-by-step and finish with a clear final line prefixed with 'Answer:'

Question: {question}

Let's think step by step:"""

    try:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        content = resp.choices[0].message.content.strip()
    except Exception as e:
        # Return a sentinel so the caller can handle failures
        return f"<ERROR: {e}>"

    # Try to extract text after 'Answer:' (case-insensitive)
    m = re.search(r"Answer\s*[:\-]?\s*(.+)$", content, flags=re.IGNORECASE | re.MULTILINE)
    if m:
        final = m.group(1).strip()
        # If answer spans multiple lines, take up to the first blank line
        final_lines = [l for l in final.splitlines() if l.strip()]
        return final_lines[0].strip() if final_lines else final.strip()

    # Fallback: take the last non-empty line of the model output
    lines = [l.strip() for l in content.splitlines() if l.strip()]
    if lines:
        return lines[-1]

    # If everything else fails return the raw content
    return content


def self_consistent(question: str, n: int = 10, base_temperature: float = 0.8) -> tuple:
    """Run `cot_answer` n times (with sampling) and return the most common final answer plus the Counter.

    Args:
        question: the question string
        n: number of sampled reasoning traces to generate
        base_temperature: sampling temperature for diversity (set near 0 for deterministic)

    Returns:
        (winner, counter): winner is the most-common answer string; counter is a collections.Counter of all answers
    """
    answers = []
    for i in range(n):
        # increase temperature slightly for more diversity across runs
        temp = base_temperature if i > 0 else max(0.2, base_temperature - 0.2)
        ans = cot_answer(question, temperature=temp)
        answers.append(ans)

    counter = collections.Counter(answers)
    if not counter:
        return None, counter

    winner = counter.most_common(1)[0][0]
    return winner, counter


# Small demonstration/test
if __name__ == "__main__":
    question = "What is the square root of 144?"
    winner, counter = self_consistent(question, n=6, base_temperature=0.8)
    print("Votes:", counter)
    print("Chosen answer:", winner)

Votes: Counter({'The square root of 144 is 12.': 2, 'with no fractional component when squared, look at the pair:': 1, 'must be somewhere between 12 and 14.': 1, '12': 1, 'The value of the square root of 144 is 12.': 1})
Chosen answer: The square root of 144 is 12.


### 2.4: Sequential Revision

Sequential revision iteratively improves an answer by generating a first draft, critiquing it, and producing revised drafts that condition on prior answers. Each round should be short and focused, so improvements accumulate without drifting from the question.

In [8]:
MODEL = "llama3.2:3b"

def sequential_revision(question: str, max_steps: int = 3) -> str:
    """Generate an initial draft answer then iteratively refine it.

    Workflow:
    - Produce a first draft that includes step-by-step reasoning and a short final answer.
    - For up to `max_steps-1` revision rounds, ask the model to (1) briefly critique the previous draft and (2) produce an improved revision.
    - Stop early if the model returns the same draft (no changes).

    Returns the final draft string. Prints each draft so you can observe evolution.
    """
    # Build initial prompt for a first draft
    init_prompt = f"""You are an experienced instructor. Produce a concise draft answer to the question below that includes step-by-step reasoning and a brief final answer.

Question: {question}

Draft:"""

    try:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": init_prompt}],
            temperature=0.5,
        )
        draft = resp.choices[0].message.content.strip()
    except Exception as e:
        return f"<ERROR generating initial draft: {e}>"

    print("Draft 1:\n", draft)

    # Iteratively refine
    for step in range(2, max_steps + 1):
        revise_prompt = f"""You are a concise editor and domain expert. Here is the previous draft answer:

{draft}

1) In 1-2 short sentences, list the main weaknesses or inaccuracies (if any).
2) Produce a revised improved draft that addresses those weaknesses. Keep the style clear and the final answer concise. Output only the revised draft.

Revised draft:"""
        try:
            r = client.chat.completions.create(
                model=MODEL,
                messages=[{"role": "user", "content": revise_prompt}],
                temperature=0.6,
            )
            new_draft = r.choices[0].message.content.strip()
        except Exception as e:
            print(f"<ERROR during revision step {step}: {e}>")
            break

        # If model returns same content, stop early
        if new_draft == draft:
            print(f"\nNo substantive change at step {step}; stopping early.")
            break

        draft = new_draft
        print(f"\nDraft {step}:\n{draft}")

    return draft


# Example usage (not wrapped in __main__ so it's visible in the notebook execution output)
question = (
    "Design a 1-hour hands-on activity to teach 8th graders the basics of probability. "
    "Include required materials, step-by-step procedures, and expected learning outcomes."
)
final = sequential_revision(question, max_steps=3)
print("\nFinal draft:\n", final)

Draft 1:
 **Activity Title:** "Rolling with Probability"

**Objective:** To introduce 8th graders to the concept of probability and its application in real-life situations through a hands-on activity.

**Materials:**

* A standard six-sided die (d6)
* Whiteboard or chalkboard
* Markers or chalk
* Printed copies of the probability formula (P = number of favorable outcomes / total number of possible outcomes)

**Step-by-Step Procedure:**

1. Introduction (5 minutes):
	* Introduce the concept of probability and ask students if they have ever heard of it.
	* Write the definition on the board: "Probability is a measure of how likely an event is to occur."
2. Direct Instruction (10 minutes):
	* Explain that probability can be calculated using the formula P = number of favorable outcomes / total number of possible outcomes.
	* Use simple examples, such as flipping a coin or rolling a die, to illustrate the concept.
	* Write the formula on the board and have students copy it onto their printed

### 2.5 Tree‑of‑Thoughts
Tree-of-Thoughts reframes reasoning as a search process rather than a single forward chain.
Instead of producing one linear sequence of thoughts, the model generates multiple candidate thoughts at each step, evaluates their promise, and then expands only the best few. This allows exploration of different reasoning paths before committing to a final answer, similar to how humans brainstorm, prune, and refine ideas.


In this section, you’ll experiment with two simplified versions of ToT:
1. Word Ladder puzzle solver: a small example where each “thought” is a candidate word transition.
2. Generic ToT search (depth 2, width 2): a minimal logic to expand, evaluate, and select reasoning branches

In [None]:
###### Word Ladder Puzzle ##########

def neighbors(word, vocabulary):
    # Generate all valid one-letter mutations of 'word' that exist in 'vocabulary' and return them.
    """
    YOUR CODE HERE (~6-8 lines)
    """
    pass


def tree_of_thought(start, goal, vocab, max_depth=5, beam_width=4):
    # Search over partial thoughts (paths) using a small beam.
    # Step 1: Initialize the frontier with a single path [start]
    # Step 2: For each depth, expand each path by one neighbor from 'neighbors'
    # Step 3: Score paths by edit distance between last word and 'goal' (smaller is better)
    # Step 4: Keep the top 'beam_width' paths and stop early if any reaches 'goal'
    # Step 5: Return the best goal-reaching path or None
    """
    YOUR CODE HERE (~14-18 lines)
    """
    pass


vocab = {"hit","dot","cog","log","dog","lot","lit","hot"}
print(tree_of_thought("hit", "cog", vocab)) # one candidate solution: ['hit', 'hot', 'dot', 'dog', 'cog']


In [None]:
###### Generic ToT Search ##########

import re

MODEL = "llama3.2:3b"

def propose_thoughts(question, state, k=2):
    # Propose up to k next “thoughts” that extend the current partial solution/state.
    # Steps: build a short prompt with problem + current state; call your client with n=k. Then return a list of stripped strings (≤ k).
    """
    YOUR CODE HERE (~8-10 lines)
    """
    pass


def score_state(question, state):
    # Score how promising a partial solution is on a 1–10 scale (higher is better).
    # Steps: build a rating prompt; call the model; parse the first integer 1–10;
    """
    YOUR CODE HERE (~8-10 lines)
    """
    pass


def tree_of_thoughts(question, depth=2, width=2):
    # Run a tiny ToT search: expand states with propose_thoughts, score with score_state, keep top-k at each depth.
    # Steps: initialize frontier=[("", 0)]; for each depth, expand each state with k=width thoughts; score each; sort by score desc; keep top 'width'; return best state and score.
    """
    YOUR CODE HERE (~12-16 lines)
    """
    pass


question = "Design a plan for a weekend science workshop for 12-year-olds."
solution, score = tree_of_thoughts(question)

print(f"Best solution (score {score}):\n{solution}")

---  
# 3‑ Training Models for Reasoning

### 3.1: CoT Training
Chain-of-Thought (CoT) training conditions the model on explicit rationales during fine-tuning. Instead of teaching the model to output only the final answer, we train on (question, rationale, answer) so the model learns to internalize multi-step reasoning patterns. A practical recipe is STaR (Self-Taught Reasoner), which uses a stronger teacher model to bootstrap rationales that a smaller student can learn from.

For tasks that require multi-hop reasoning, models fine-tuned on rationales often achieve higher accuracy and are more stable at inference time than models trained on direct answers only. 

Training a full language model is beyond the scope of this notebook, but here is the high-level workflow followed by a short pseudocode:
- Collect questions: Prepare a dataset of questions and correct answers.
- Generate rationales: Use a strong LLM to produce step-by-step reasoning ending with the correct answer.
- Filter and clean: Discard incorrect or low-quality rationales.
- Prepare training data: Format triples (question, rationale, answer) for supervised fine-tuning.
- Fine-tune: Fine-tune the LLM on rationales.
- Iterate: Refine prompts, improve data quality, and retrain for stronger reasoning.

In [None]:
# Pseudocode (STaR loop)
# for round in 1 ... iters:
    # STEP 1: self-generate reasoning (teacher creates rationale + answer)
    # STEP 2: keep only correct, high-quality traces
    # STEP 3: fine-tune student on (question, rationale, answer) data

### 3.2: ORM vs PRM + RL
Training a Reward Model (RM) allows large language models to be improved through reinforcement learning (RL). Instead of fine-tuning directly on examples, we train a separate model that can score or rank model outputs, and use those scores as feedback signals to refine the policy model.

Two main reward modeling approaches are ORM (predicts a scalar reward for the final answer) and PRM (evaluates the reasoning steps instead of just the outcome)



| Approach | Typical loss | When to use |
|-----------|-------------|-------------|
|*Outcome Reward Model* | Predict scalar reward | Easy to collect training data using verifiers |
|*Process Reward Model* | Predict rewards per step | Difficult to collect training data but more accurate |
| *RLHF* | Use RM as reward in **RL** fine‑tuning | Aligns policy with human signals | Aligns model policy with human or synthetic preferences




In [None]:
# for round = 1 ... iters:
    # STEP 1:  Generate reasoning
        # sample a minibatch of questions
        # policy roll‑out (actions + log‑probs)
    # STEP 2:  Score the trajectory
        # ORM: scalar reward for the final answer / PRM: scalar reward for the thought process
    # STEP 3:  Reinforce the policy (PPO)

---  
# 4‑ A Deep Research Agent

A deep-research agent pairs a reasoning model (e.g., deepseek-r1) with external tools for web search and retrieval. We will follow the ReAct pattern: the model writes short thoughts, decides when to call tools, reads observations, and continues reasoning until it can answer or reaches a step limit.

We now combine a **search tool** with a reasoning model (e.g., `deepseek-r1`) in a multi-step setup. We follow the *ReAct* pattern (reason → tool → observation):

1. The model reasoins and decides to use tools
2. The agent searches and feed condensed snippets back as context
3. Iterate until the model answers or hits a step limit

We use `AgentType.OPENAI_FUNCTIONS`, which hides the loop inside the LangChain agent.

In [None]:
from ddgs import DDGS
from langchain.tools import Tool

def ddg_search(query: str, k: int = 5) -> str:
    # Use DDGS to run a simple web search and return joined snippets.
    """
    YOUR CODE HERE (~6 lines of code)
    """

search_tool = Tool(
    name="DuckDuckGo Search",
    func=ddg_search,
    description="Search the public web. Input: a plain English query. Returns: concatenated snippets."
)


In [None]:
from langchain.agents import initialize_agent, AgentType
from langchain_community.chat_models import ChatOllama

MODEL = "deepseek-r1:8b"
question = "What are the best resources to learn machine learning in 2025?"

# Step 1: Initialize the reasoning model via ChatOllama
"""
YOUR CODE HERE (1 line of code)
"""

# Step 2: Build the agent with tool access (DuckDuckGo Search) and function-calling interface (initialize_agent)
"""
YOUR CODE HERE (1 line of code)
"""


# Step 3: Ask a query and let the agent search + reason to produce an answer
"""
YOUR CODE HERE (2 lines of code)
"""

# Optional (Multi-agent Deep Research)
Instead of a single multi-step agent, you can design multiple collaborating agents such as a Planner, Searcher, Summarizer, and Verifier that pass information and refine each other’s outputs. This setup improves robustness, diversity of reasoning, and division of labor.

Try building a simple setup with 2–3 agents that share goals and messages, for example Planner → Researcher → Writer.

In [None]:
def parallel_research(query, n=3):
    # Run n independent research runs in parallel and return their answers.
    # Steps: use ThreadPoolExecutor; submit n calls to your agent/search pipeline; gather results in order.
    """
    YOUR CODE HERE
    """

answers = parallel_research("What are the best resources to learn ML in 2025?")
for i,a in enumerate(answers,1):
    print(f"[Run {i}] {a[:200]}…")

## 🎉 Congratulations!

* Practised various inference‑time reasoning methods
* Gained intuition about training reasoning models
* You have built a **deep-research agent**: reasoning model like deep-seek r1 + ReAct-style agent + tool use (web search)
* Try adding more tools, and extending the deep-research to a multi-agent system: many agents researching web in parallel.


👏 **Great job!** Take a moment to celebrate. The techniques you implemented here power many production agents and chatbots.