# Unit 1

## Setting Up the Basic Structure for DeepResearcher

## Introduction: Laying the Foundation for DeepResearcher

Welcome to the first lesson of the "**Creating a Researcher in Python with OpenAI**" course! In this course, you'll learn how to build **DeepResearcher**, an AI-powered research tool that can search the web, gather information, and generate a final report, all using **Python**.

Before we dive into the details of web searching and AI, it’s important to set up a solid foundation. A clear project structure will help you keep your code organized, make it easier to add new features, and help you debug problems as you go. In this lesson, we’ll walk through the basic structure of the **DeepResearcher** project and explain how the main program is set up.

By the end of this lesson, you’ll understand how the main parts of the project fit together and be ready to start building out each piece in future lessons.

-----

## Recall: Project Flowchart

Let’s start with a quick reminder of how DeepResearcher works:

| Step | Action | Decision | Next Step (or "Yes" Branch) | "No" Branch |
| :--- | :--- | :--- | :--- | :--- |
| 1 | **User Input: Research Query** | | **LLM: Generate Search Queries** | |
| 2 | **LLM: Generate Search Queries** | | **Web: Get top Search Results** | |
| 3 | **Web: Get top Search Results** | | **Web: Download HTML** | |
| 4 | **Web: Download HTML** | | **Web: Convert to Markdown** | |
| 5 | **Web: Convert to Markdown** | **Is it Relevant?** | **LLM: Extract Relevant Context** | **Delete entry** |
| 6 | **LLM: Extract Relevant Context** | **More Research Needed?** | **LLM: Generate Search Queries** (Go to Step 2) | **LLM: Generate Final Report** |
| 7 | **LLM: Generate Final Report** | | **Output: Research Report** | |

This table traces the flowchart step by step, showing how each component fits into the overall workflow. The **LLM** (large language model) and **Web** components work together to automate the research process.
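
To see the loop in code form, here is a minimal Python sketch of the flowchart's control flow. It is not the course's implementation: each LLM or web step is passed in as a plain function, the parameter names (`generate_queries`, `search_and_fetch`, `is_relevant`, `extract_context`, `write_report`) are hypothetical, and it assumes the LLM signals "no more research needed" by returning an empty list of queries.

```python
from typing import Callable


def run_research(
    user_query: str,
    generate_queries: Callable[[str, list[str]], list[str]],
    search_and_fetch: Callable[[str], list[str]],
    is_relevant: Callable[[str, str], bool],
    extract_context: Callable[[str, str], str],
    write_report: Callable[[str, list[str]], str],
    iteration_limit: int = 10,
) -> str:
    """Follow the flowchart; every callable is a hypothetical stand-in for an LLM or web step."""
    contexts: list[str] = []
    for _ in range(iteration_limit):
        # Step 2, revisited whenever step 6 decides more research is needed.
        # Assumption of this sketch: an empty list means "no more research needed".
        queries = generate_queries(user_query, contexts)
        if not queries:
            break
        for query in queries:
            # Steps 3-4: search, download the HTML, convert it to Markdown.
            for page_markdown in search_and_fetch(query):
                # Step 5: irrelevant pages are simply dropped ("Delete entry").
                if is_relevant(user_query, page_markdown):
                    contexts.append(extract_context(user_query, page_markdown))
    # Step 7: turn everything gathered into the final report.
    return write_report(user_query, contexts)


# Usage with fake steps: one round of two queries, then no more queries.
rounds = iter([["q1", "q2"], []])
report = run_research(
    "solid-state batteries",
    generate_queries=lambda query, contexts: next(rounds),
    search_and_fetch=lambda query: [f"page about {query}", "advert"],
    is_relevant=lambda query, page: page != "advert",
    extract_context=lambda query, page: page.upper(),
    write_report=lambda query, contexts: f"{query}: {len(contexts)} contexts",
)
print(report)  # solid-state batteries: 2 contexts
```

Passing the steps in as functions keeps the sketch runnable offline with fake steps, as in the usage lines at the bottom, and makes the loop's exit conditions (no new queries, or the iteration limit) easy to see.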

-----

## Understanding the Main Program

Now, let’s look at the main file of our project: **main.py**. This file is the entry point for DeepResearcher. It’s where the program starts running.

Let’s break down the key parts of this file step by step.

### 1. Importing Functions

At the top of the file, we import a function from another part of our project:

```python
from deepresearcher.web.web_searcher import clear_visited_pages
```

This allows us to use the **`clear_visited_pages`** function in our main program.

### 2. Defining Function Stubs

Next, we see several function definitions. Right now, these functions are just "**stubs**" — they don't do anything yet, but they show what the main steps of our program will be.

```python
def generate_initial_search_queries(user_query: str):
    pass

def perform_iterative_research(user_query: str, new_search_queries: list, all_search_queries: list, iteration_limit: int):
    pass

def generate_final_report(user_query: str, contexts: list):
    pass
```

  * **`generate_initial_search_queries`**: This will take the user's research topic and create a list of search queries.
  * **`perform_iterative_research`**: This will handle the main research loop, searching the web and collecting information.
  * **`generate_final_report`**: This will take all the information we’ve gathered and create a final report.

The **`pass`** statement is a placeholder. It means "do nothing for now." We’ll fill in these functions in later lessons.
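
A function whose body is only `pass` still returns something: Python returns `None` from any function that ends without a `return` statement. That is why the stubs can be called safely, and why `research_main()` stops at its empty-query check while the stubs are still in place:

```python
def generate_initial_search_queries(user_query: str):
    pass  # placeholder body with no return statement


result = generate_initial_search_queries("history of the transistor")
print(result)      # None
print(not result)  # True: `if not new_search_queries:` treats this as "no queries"
```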

### 3. The Main Function

The main logic of the program is inside the **`research_main()`** function:

```python
def research_main():
    user_query = input("Enter your research query/topic: ").strip()
    iteration_limit = input("Max number of iterations (default 10): ").strip()
    iteration_limit = int(iteration_limit) if iteration_limit.isdigit() else 10

    clear_visited_pages()

    new_search_queries = generate_initial_search_queries(user_query)
    if not new_search_queries:
        return

    all_search_queries = new_search_queries.copy()

    aggregated_contexts = perform_iterative_research(user_query, new_search_queries, all_search_queries, iteration_limit)

    generate_final_report(user_query, aggregated_contexts)
```

Here's a breakdown of what happens:

1.  The program asks the user for a research topic and an iteration limit.
2.  It **clears any previously visited web pages**.
3.  It generates the **first set of search queries**.
4.  If no search queries are generated, it stops.
5.  It copies the search queries to keep track of **all queries** used.
6.  It performs the **main research loop**.
7.  Finally, it **generates a report**.
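
The iteration-limit parsing from step 1 can be pulled into a hypothetical helper, `parse_iteration_limit`, to check its behavior in isolation. This sketch swaps `str.isdigit()` for `str.isdecimal()`: `"²".isdigit()` is `True`, but `int("²")` raises `ValueError`, so the one-liner in `research_main()` can crash on that kind of input, while `isdecimal()` only accepts characters `int()` understands.

```python
def parse_iteration_limit(raw: str, default: int = 10) -> int:
    """Hypothetical helper: turn the user's answer into an iteration limit."""
    raw = raw.strip()
    # isdecimal() accepts only characters int() can parse; isdigit() also
    # accepts characters such as "²", which would make int() raise ValueError.
    return int(raw) if raw.isdecimal() else default


for answer in ["5", " 7 ", "", "abc", "-3", "²", "0"]:
    print(repr(answer), "->", parse_iteration_limit(answer))
```

With these inputs the helper returns 5 and 7, then the default 10 for the empty string, `"abc"`, `"-3"`, and `"²"`. Note that `"0"` is accepted as a limit of 0, just as the original one-liner accepts it.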

### 4. Running the Program

At the bottom, we see:

```python
if __name__ == "__main__":
    research_main()
```

This means: "If this file is run directly, start the program by calling **`research_main()`**."
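
Python sets the module-level variable `__name__` to `"__main__"` only in the file that was launched (for example with `python main.py`); when the same file is imported, `__name__` holds the module's name instead. The sketch below makes that visible by writing a tiny stand-in for `main.py` to a temporary directory, importing it with `importlib`, then running it as a script with `runpy.run_path(..., run_name="__main__")`. The file name `demo_main.py` and the `calls` list are illustrative only.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# A tiny stand-in for main.py: it records a call instead of prompting for input.
MODULE_SOURCE = '''
calls = []

def research_main():
    calls.append("research_main")

if __name__ == "__main__":
    research_main()
'''


def demo() -> tuple[list[str], list[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_main.py"
        path.write_text(MODULE_SOURCE)

        # Import it: __name__ is "demo_main", so the guard is False.
        spec = importlib.util.spec_from_file_location("demo_main", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        imported_calls = list(module.calls)

        # Run it as a script: __name__ is "__main__", so the guard is True.
        script_globals = runpy.run_path(str(path), run_name="__main__")
        script_calls = list(script_globals["calls"])
    return imported_calls, script_calls


print(demo())  # ([], ['research_main'])
```

Importing leaves `calls` empty, while running the file as a script records one call. The same mechanism lets test code import `main.py` without triggering the `input()` prompts.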

-----

## Summary and What's Next

In this lesson, you learned how to set up the basic structure for the **DeepResearcher** project. You saw how the main program is organized, what each function is responsible for, and how the program flows from user input to generating a final report.

This structure will make it much easier to build and test each part of the project as we move forward. In the next practice exercises, you'll get hands-on experience working with this structure and preparing your own project files. After that, we'll start filling in each function to bring DeepResearcher to life, step by step. 🚀

## Building the DeepResearcher Framework

Now that you understand how the DeepResearcher project is structured, let's put that knowledge into practice! In this exercise, you'll create the skeleton of our main program file with function stubs that will serve as the foundation for our research tool.

You'll need to set up:

  * Three function stubs for the core operations (generating queries, performing research, and creating reports)
  * The main research function that coordinates the workflow
  * Code to run the program when the file is executed directly

Don't worry about implementing the actual functionality yet; we're just creating placeholders with `pass` statements for now. This structure will serve as the blueprint for all the exciting features we'll add in the upcoming exercises.

Building this framework is an important first step: it will help you visualize how all the pieces fit together before we dive into the details of each component.

```python
from deepresearcher.web.web_searcher import clear_visited_pages


# TODO: Create a function stub for generate_initial_search_queries that takes a user_query parameter


# TODO: Create a function stub for perform_iterative_research that takes user_query, new_search_queries, all_search_queries, and iteration_limit parameters


# TODO: Create a function stub for generate_final_report that takes user_query and contexts parameters


# TODO: Create the research_main function


# TODO: Add code to run the research_main function if this file is executed directly
```

Here is the completed Python framework for the **DeepResearcher** project, including the required function stubs and the main execution logic.

```python
from deepresearcher.web.web_searcher import clear_visited_pages


# 1. Function stub for generate_initial_search_queries
def generate_initial_search_queries(user_query: str):
    """Generates the first set of search queries based on the user's topic."""
    pass


# 2. Function stub for perform_iterative_research
def perform_iterative_research(user_query: str, new_search_queries: list, all_search_queries: list, iteration_limit: int):
    """Handles the main iterative research loop."""
    pass


# 3. Function stub for generate_final_report
def generate_final_report(user_query: str, contexts: list):
    """Generates the final research report from the gathered contexts."""
    pass


# 4. The main research function
def research_main():
    # Get user input for query and iteration limit
    user_query = input("Enter your research query/topic: ").strip()
    iteration_limit = input("Max number of iterations (default 10): ").strip()
    iteration_limit = int(iteration_limit) if iteration_limit.isdigit() else 10

    # Clear previously visited pages
    clear_visited_pages()

    # Generate initial search queries
    new_search_queries = generate_initial_search_queries(user_query)
    if not new_search_queries:
        print("No initial queries generated. Exiting.")
        return

    # Keep track of all queries used
    all_search_queries = new_search_queries.copy()

    # Perform the main research loop
    aggregated_contexts = perform_iterative_research(
        user_query, 
        new_search_queries, 
        all_search_queries, 
        iteration_limit
    )

    # Generate the final report
    generate_final_report(user_query, aggregated_contexts)
    
    print("\nDeepResearcher process complete.")


# 5. Add code to run the research_main function if this file is executed directly
if __name__ == "__main__":
    research_main()
```

This file now provides the complete **blueprint** for the DeepResearcher, clearly defining the **flow** from user input to the final report, with placeholders ready to be filled in with the actual logic in later lessons.

## Implementing the Main Research Workflow

Now that you've created the basic framework for DeepResearcher, let's implement the main workflow function that coordinates the entire research process. With the function stubs already in place, your job is to build the `research_main()` function that ties everything together.

In this exercise, you'll need to:

  * Collect user input for the research topic and iteration limit
  * Process the iteration limit properly (convert it to an `int` or use the default)
  * Call each of the core functions in the correct sequence

This exercise helps you understand how the different components of DeepResearcher work together in a coordinated workflow. By implementing the main function, you'll get a clearer picture of how data flows through the system, from user input to the final report.

```python
from deepresearcher.web.web_searcher import clear_visited_pages


def generate_initial_search_queries(user_query: str):
    pass


def perform_iterative_research(user_query: str, new_search_queries: list, all_search_queries: list, iteration_limit: int):
    pass


def generate_final_report(user_query: str, contexts: list):
    pass


# TODO: Create the research_main function that handles user input and coordinates the research process
def research_main():
    # TODO: Get the user's research query
    
    # TODO: Get the maximum number of iterations from the user (with a default of 10)
    
    # TODO: Convert the iteration limit to an integer if it's a valid number, otherwise use 10
    
    # TODO: Call other functions in correct order
    pass  # Placeholder so the file parses; remove it once the TODOs are filled in
```

Below is the completed `research_main()` function, built on the stubs you already defined and following the project's required logic.

```python
from deepresearcher.web.web_searcher import clear_visited_pages


def generate_initial_search_queries(user_query: str):
    pass


def perform_iterative_research(user_query: str, new_search_queries: list, all_search_queries: list, iteration_limit: int):
    pass


def generate_final_report(user_query: str, contexts: list):
    pass


# Completed research_main function that handles user input and coordinates the research process
def research_main():
    # 1. Get the user's research query
    user_query = input("Enter your research query/topic: ").strip()

    # 2. Get the maximum number of iterations from the user (with a default of 10)
    iteration_limit_input = input("Max number of iterations (default 10): ").strip()

    # 3. Convert the iteration limit to an integer if it's a valid number, otherwise use 10
    if iteration_limit_input.isdigit():
        iteration_limit = int(iteration_limit_input)
    else:
        iteration_limit = 10
    
    # Optional: Print initial settings for confirmation
    print(f"\nStarting DeepResearcher for: '{user_query}' with max iterations: {iteration_limit}")

    # 4. Call other functions in correct order

    # Clear visited pages before starting a new run
    clear_visited_pages()

    # Step A: Generate initial search queries
    new_search_queries = generate_initial_search_queries(user_query)
    
    if not new_search_queries:
        print("No initial queries generated. Exiting.")
        return

    # Keep a list of ALL queries used throughout the process
    all_search_queries = new_search_queries.copy()

    # Step B: Perform the main iterative research loop
    aggregated_contexts = perform_iterative_research(
        user_query, 
        new_search_queries, 
        all_search_queries, 
        iteration_limit
    )

    # Step C: Generate the final report
    generate_final_report(user_query, aggregated_contexts)
    
    print("\nDeepResearcher process complete.")

# Run research_main() only when this file is executed directly
if __name__ == "__main__":
    research_main()
```
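
The line `all_search_queries = new_search_queries.copy()` creates an independent list rather than a second name for the same one. A plain assignment would only add an alias, so appending a query through one name would change the other too. A short demonstration with made-up query strings:

```python
new_search_queries = ["battery chemistry 2024", "solid-state electrolyte"]

aliased = new_search_queries                    # a second name for the same list
all_search_queries = new_search_queries.copy()  # a separate (shallow) copy

all_search_queries.append("sodium-ion cost")
print(len(new_search_queries))  # 2: growing the copy doesn't touch the original

aliased.append("oops")
print(len(new_search_queries))  # 3: appending through the alias changed the original
```

A shallow copy is enough here because the list holds strings, which are immutable.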

## Adding Error Handling to DeepResearcher

Excellent work on implementing the main workflow function! Now, let's make our DeepResearcher more robust by adding some essential error handling. With the basic structure in place, we need to ensure our program can handle unexpected situations gracefully.

In this exercise, you'll add two important safety features to the `research_main()` function:

  * Add a call to `clear_visited_pages()` to reset any previously visited web pages.
  * Implement a check that verifies whether any search queries were generated, and stop the program if none were found.

These small but critical additions will help prevent errors and make your research tool more reliable. Good error handling is what separates professional applications from basic scripts: it ensures your program can handle real-world scenarios without crashing.

```python
from deepresearcher.web.web_searcher import clear_visited_pages


def generate_initial_search_queries(user_query: str):
    pass


def perform_iterative_research(user_query: str, new_search_queries: list, all_search_queries: list, iteration_limit: int):
    pass


def generate_final_report(user_query: str, contexts: list):
    pass


def research_main():
    user_query = input("Enter your research query/topic: ").strip()
    iteration_limit = input("Max number of iterations (default 10): ").strip()
    iteration_limit = int(iteration_limit) if iteration_limit.isdigit() else 10

    # TODO: Add a call to clear_visited_pages() to reset previously visited web pages
    
    new_search_queries = generate_initial_search_queries(user_query)
    # TODO: Add a check to see if new_search_queries is empty, and if so, return from the function

    all_search_queries = new_search_queries.copy()
    aggregated_contexts = perform_iterative_research(user_query, new_search_queries, all_search_queries, iteration_limit)
    generate_final_report(user_query, aggregated_contexts)


if __name__ == "__main__":
    research_main()

```

Adding error handling makes the program much more reliable! Here is the updated `research_main()` function with the call to `clear_visited_pages()` and the check for empty search queries implemented.

```python
from deepresearcher.web.web_searcher import clear_visited_pages


def generate_initial_search_queries(user_query: str):
    # In a real scenario, this would return a list of strings
    # For testing the error check, we'll assume it returns an empty list if the query is blank
    if not user_query:
        return []
    return ["query 1", "query 2"] # Placeholder


def perform_iterative_research(user_query: str, new_search_queries: list, all_search_queries: list, iteration_limit: int):
    pass


def generate_final_report(user_query: str, contexts: list):
    pass


def research_main():
    user_query = input("Enter your research query/topic: ").strip()
    iteration_limit = input("Max number of iterations (default 10): ").strip()
    iteration_limit = int(iteration_limit) if iteration_limit.isdigit() else 10

    # 1. Add a call to clear_visited_pages() to reset previously visited web pages
    print("Clearing visited pages...")
    clear_visited_pages()
    
    new_search_queries = generate_initial_search_queries(user_query)
    
    # 2. Add a check to see if new_search_queries is empty, and if so, return from the function
    if not new_search_queries:
        print("Error: No initial search queries were generated. Aborting research process.")
        return # Exit the function gracefully

    # If queries were generated, proceed with the workflow
    all_search_queries = new_search_queries.copy()
    
    print(f"Generated {len(new_search_queries)} initial queries. Starting research loop...")
    
    aggregated_contexts = perform_iterative_research(user_query, new_search_queries, all_search_queries, iteration_limit)
    generate_final_report(user_query, aggregated_contexts)
    
    print("DeepResearcher workflow finished.")


if __name__ == "__main__":
    research_main()
```

### Explanation of Changes:

1.  **`clear_visited_pages()` Call:** It runs after the user input is collected and before any search queries are generated, so each new research run starts fresh instead of being affected by the pages visited in a previous run.
2.  **Empty Query Check:** `if not new_search_queries:` is the Pythonic way to test for an empty list, and it also catches `None`, which is what the `pass` stubs return. In either case, a message is printed and the `return` statement immediately **stops the execution** of `research_main()`. This matters most for `None`: without the check, the next line, `new_search_queries.copy()`, would raise an `AttributeError`. An empty list would not crash there, but the rest of the workflow would run with no queries to search for.
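
Both cases the guard handles can be checked directly. The hypothetical helper `has_queries` below applies the same truthiness test: `None` and `[]` are treated alike, but only `None` would break the `.copy()` line that follows the check.

```python
def has_queries(new_search_queries) -> bool:
    """Same truthiness test as `if not new_search_queries:` in research_main()."""
    return bool(new_search_queries)


print(has_queries(None))         # False: what a `pass` stub returns
print(has_queries([]))           # False: an empty list of queries
print(has_queries(["query 1"]))  # True

# Without the guard, a None result fails on the very next line.
new_search_queries = None
error_name = None
try:
    all_search_queries = new_search_queries.copy()
except AttributeError as exc:
    error_name = type(exc).__name__
print(error_name)  # AttributeError
```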