# Unit 2

## Creating the Web Searcher Module

# Creating a Web Searcher Module: Combining Searching, Fetching, and Markdown Conversion

Welcome back! In the previous lesson, you learned how to use the **DDGS** library to search the web, fetch a single web page, and convert its content from HTML to Markdown. These are essential skills for building an automated research tool.

In this lesson, we will take the next step by creating a **Web Searcher module**. This module will allow you to search for a topic, fetch the top web pages, and convert their content to Markdown, all in one go. This is a key part of building a tool that can gather and process information from the web automatically.

By the end of this lesson, you will know how to combine searching, fetching, and converting web content into a single, reusable function. This will make your code cleaner and more powerful, and prepare you for more advanced automation tasks later in the course.

-----

## Main Tools Used

Before we dive in, let's quickly remind ourselves of the main tools we have used so far:

  * **DDGS**: This library lets us perform web searches in Python and get results as structured data.
  * **httpx**: This is a library for making HTTP requests, which we use to fetch web pages.
  * **html_to_markdown**: This tool converts HTML content into Markdown, making it easier to read and process.

You have already seen how to use each of these tools separately. Now, we will see how to use them together to automate the process of searching for and retrieving useful web content.

-----

## Building the `search_and_fetch_markdown` Function

Now, let's put everything we learned together into a single function. We want a function that:

1.  Takes a search query.
2.  Searches the web for the top results.
3.  Fetches each result's web page.
4.  Converts the HTML to Markdown.
5.  Returns a list of dictionaries, each with the title, URL, and Markdown content.

Let's build this step by step.

### Step 1: Define the Function and Set Up the Search

```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown

def search_and_fetch_markdown(query, max_results=5, timeout=10):
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(query, max_results=max_results)
    markdown_pages = []
```

  * We import the needed libraries.
  * The function takes a `query`, and optional `max_results` and `timeout`.
  * We create a `DDGS` object and perform the search.
  * We prepare an empty list to store the results.

### Step 2: Loop Through Results and Fetch Content

```python
    for result in results:
        url = result.get("href")
        title = result.get("title", "")
        
        if not url:
            continue
            
        try:
            response = httpx.get(url, timeout=timeout, follow_redirects=True)
            response.raise_for_status()
            
            markdown = convert_to_markdown(response.text)
            
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })
            
        except Exception as e:
            # Optional: Print or log the error for debugging
            print(f"Error fetching page: {url}. Error: {e}")
            
    return markdown_pages
```

  * For each result, we get the **URL** (`href`) and **title**.
  * If the URL is missing, we skip it (`continue`).
  * We use a `try` block to handle errors. If fetching or converting fails, we continue to the next result.
  * If successful, we convert the HTML to Markdown and add the result to our list.
  * At the end, the function returns a list of dictionaries.
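The `.get()` calls matter because a result dictionary is not guaranteed to contain every key. The following is a small sketch with hand-written result dictionaries: the sample data is made up, and `usable_results` is a hypothetical helper used only to show the skip logic in isolation.

```python
def usable_results(results):
    """Yield (title, url) pairs, skipping results that have no usable URL."""
    for result in results:
        url = result.get("href")          # None when the key is absent
        title = result.get("title", "")   # empty string when the key is absent
        if not url:
            continue                      # nothing to fetch for this result
        yield title, url


# Made-up results covering the cases the loop has to survive.
sample = [
    {"title": "Has link", "href": "https://example.com/a"},
    {"title": "Missing link"},
    {"href": "https://example.com/b"},
    {"title": "Empty link", "href": ""},
]
print(list(usable_results(sample)))
```

Both a missing `href` and an empty one are skipped, because `not url` is true for `None` and for `""`.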

-----

## Example Output

If you call the function like this:

```python
pages = search_and_fetch_markdown("python web scraping", max_results=2)

for page in pages:
    print(f"Title: {page['title']}")
    print(f"URL: {page['url']}")
    print(f"Markdown (first 100 chars):\n{page['markdown'][:100]}")
    print("-" * 40)
```

You might see output like:

```
Title: Web Scraping with Python - Real Python
URL: https://realpython.com/python-web-scraping/
Markdown (first 100 chars):
# Web Scraping With Python: Collecting Data From the Modern WebWeb scraping is the process of program...
----------------------------------------
Title: Python Web Scraping Tutorial - GeeksforGeeks
URL: https://www.geeksforgeeks.org/python-web-scraping-tutorial/
Markdown (first 100 chars):
# Python Web Scraping TutorialWeb scraping is a technique to extract data from websites. In this tutoria...
----------------------------------------
```

This shows that your function is working: it searches, fetches, and converts web pages to Markdown.
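Because each entry is a plain dictionary, the returned list is easy to post-process. Here is a small sketch that uses hand-written sample data in place of a real search; `summarize_pages` is a hypothetical helper, not part of the lesson's module.

```python
def summarize_pages(pages, preview_chars=40):
    """Return one summary line per page: title, URL, and a short Markdown preview."""
    lines = []
    for page in pages:
        # Flatten newlines so each summary stays on a single line.
        preview = page["markdown"][:preview_chars].replace("\n", " ")
        lines.append(f"{page['title']} <{page['url']}>: {preview}")
    return lines


# Sample data shaped like the function's return value (contents are made up).
sample_pages = [
    {"title": "Page A", "url": "https://example.com/a", "markdown": "# Heading A\nBody text"},
    {"title": "Page B", "url": "https://example.com/b", "markdown": "# Heading B\nMore text"},
]

for line in summarize_pages(sample_pages):
    print(line)
```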

-----

## Summary and What's Next

In this lesson, you learned how to build a **Web Searcher module** that can:

  * Search the web for a topic.
  * Fetch the top web pages from the results.
  * Convert each page's HTML content to Markdown.
  * Return all this information in a structured way.

This function is a powerful building block for your automated research tool. In the next practice exercises, you will get hands-on experience using and modifying this function. You will practice searching for different topics, handling errors, and working with the Markdown output.

Great job making it this far. Let's move on to the practice and solidify your skills! 💪

## Building the Web Searcher Function

Now that you've learned about the individual components of web searching, it's time to bring everything together! In this exercise, you'll build a function called `search_and_fetch_markdown` that automates the process of searching the web and retrieving readable content.

Your task:

Write a function `search_and_fetch_markdown(query: str)` that:

1.  Uses the DDGS library to perform a web search for the given query.
2.  Gets the URL of the top search result.
3.  Fetches the HTML content of that URL using `fetch_web_page`.
4.  Converts the HTML content to Markdown using `html_to_markdown_converter`.
5.  Returns the resulting Markdown string.

If any step fails (e.g., no results, fetch error), return a string describing the error.

Follow the TODO comments in the starter code to implement each part of the function. When you're done, you'll have a tool that searches the web and returns readable content with just one function call, an essential building block for your automated research toolkit!

```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown


def fetch_web_page(url):
    """
    Fetch the content of a web page using httpx.
    
    Args:
        url (str): The URL to fetch
        
    Returns:
        str: The HTML content of the page or an error message
    """
    try:
        response = httpx.get(url)
        response.raise_for_status()  # Raise an exception for 4XX/5XX responses
        return response.text
    except Exception as e:
        return f"Error fetching URL {url}: {e}"


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    
    Args:
        html_content (str): The HTML content to convert
        max_length (int): Maximum length of the returned Markdown (default: 500)
        
    Returns:
        str: The converted and trimmed Markdown content
    """
    # Convert HTML to Markdown
    markdown_content = convert_to_markdown(html_content)
    
    # Trim the Markdown content to the specified length
    trimmed_markdown = markdown_content.strip()[:max_length]
    
    return trimmed_markdown


def search_and_fetch_markdown(query: str):
    """
    Search the web for the query, fetch the top result, and return its Markdown content.
    """
    # TODO: Create a DDGS object
    
    # TODO: Perform a search and get the top result
    
    # TODO: If there are no results, return an error message
    
    # TODO: Get the URL from the top result
    
    # TODO: Fetch the HTML content using fetch_web_page
    
    # TODO: Convert the HTML to Markdown using html_to_markdown_converter
    
    # TODO: Return the Markdown string
```

Here is an implementation of the `search_and_fetch_markdown` function that combines searching, fetching, and converting content to Markdown, based on the starter code above.

```python
import httpx
from ddgs import DDGS
# convert_to_markdown is the HTML-to-Markdown converter used in the previous lesson
from html_to_markdown import convert_to_markdown


def fetch_web_page(url):
    """
    Fetch the content of a web page using httpx.
    
    Args:
        url (str): The URL to fetch
        
    Returns:
        str: The HTML content of the page or an error message prefixed with "Error"
    """
    try:
        # A reasonable timeout is highly recommended in real-world use
        response = httpx.get(url, timeout=10, follow_redirects=True) 
        response.raise_for_status()  # Raise an exception for 4XX/5XX responses
        return response.text
    except Exception as e:
        return f"Error fetching URL {url}: {e}"


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    
    Args:
        html_content (str): The HTML content to convert
        max_length (int): Maximum length of the returned Markdown (default: 500)
        
    Returns:
        str: The converted and trimmed Markdown content
    """
    # Convert HTML to Markdown
    markdown_content = convert_to_markdown(html_content)
    
    # Trim the Markdown content to the specified length
    trimmed_markdown = markdown_content.strip()[:max_length]
    
    return trimmed_markdown


def search_and_fetch_markdown(query: str):
    """
    Search the web for the query, fetch the top result, and return its Markdown content.
    
    Args:
        query (str): The search query to use.
        
    Returns:
        str: The resulting Markdown string of the top page, or an error message.
    """
    try:
        # TODO: Create a DDGS object
        ddgs = DDGS()
        
        # TODO: Perform a search and get the top result (using max_results=1)
        # Wrap the call in list() so indexing works whether .text() returns a list or an iterator
        results = list(ddgs.text(query, max_results=1))
        
        # TODO: If there are no results, return an error message
        if not results:
            return f"Error: No search results found for query: '{query}'"
            
        # Get the first (and only) result dictionary
        top_result = results[0]
        
        # TODO: Get the URL from the top result
        url = top_result.get("href")
        
        if not url:
            return "Error: Top search result did not contain a valid URL."
            
        # TODO: Fetch the HTML content using fetch_web_page
        html_content = fetch_web_page(url)
        
        # Check for the exact prefix fetch_web_page uses, so a page that merely
        # starts with the word "Error" is not mistaken for a failure
        if html_content.startswith("Error fetching URL"):
            return html_content
            
        # TODO: Convert the HTML to Markdown using html_to_markdown_converter
        markdown_content = html_to_markdown_converter(html_content)
        
        # TODO: Return the Markdown string
        # Optionally, include the title and URL for context
        title = top_result.get("title", "No Title")
        
        return (f"# Top Result for '{query}': {title}\n"
                f"URL: {url}\n\n"
                f"---\n\n"
                f"{markdown_content}...") # Ellipsis to indicate content was trimmed
                
    except Exception as e:
        # Catch any unexpected errors during the overall process (e.g., DDGS initialization)
        return f"Error during search_and_fetch process: {e}"

# --- Example Usage (Optional) ---

# print(search_and_fetch_markdown("Linux compress pdf in terminal"))
# print("\n" + "="*50 + "\n")
# print(search_and_fetch_markdown("asdfghjkl random string that won't exist"))
```
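One caveat with the solution above, offered as a design note rather than a required change: signalling failure through a returned string means the caller has to inspect the text, and a fetched page that legitimately begins with the same words could be misread as an error. A possible alternative is to return the outcome as an explicit pair. In the sketch below, `fetch_result` and the stand-in fetchers are hypothetical names used for illustration only.

```python
def fetch_result(url, fetch):
    """Return (ok, payload): the page text on success, an error message otherwise."""
    try:
        return True, fetch(url)
    except Exception as e:
        return False, f"Error fetching URL {url}: {e}"


def page_starting_with_error(url):
    """Stand-in fetcher: a valid page whose text happens to start with 'Error'."""
    return "Error handling in Python: a tutorial"


def failing_fetch(url):
    """Stand-in fetcher that always fails."""
    raise TimeoutError("simulated timeout")


# The success flag, not the text, decides whether the fetch worked.
ok, payload = fetch_result("https://example.com/tutorial", page_starting_with_error)
print(ok, payload)

ok2, payload2 = fetch_result("https://example.com/slow", failing_fetch)
print(ok2, payload2)
```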

## Enhancing Web Searcher for Multiple Results

Excellent work on building your first web searcher function! Now it's time to take it to the next level by enhancing it to handle multiple search results instead of just one.

Your current function returns only the Markdown content from the top search result, but a truly useful research tool should gather information from multiple sources. In this exercise, you'll expand the function's capabilities.

Your task is to:

1.  Update the search to retrieve the top 3 results.
2.  Create a list to store information about each page.
3.  Loop through all search results instead of just processing the first one.
4.  For each result, fetch the web page and convert it to Markdown.
5.  Handle errors for individual pages without stopping the entire process.
6.  Return the list of Markdown strings.

Follow the TODO comments in the starter code to implement these changes step by step.

When you're done, you'll have a powerful tool that can gather information from multiple web sources with a single function call, a key component of any automated research system!

```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown


def fetch_web_page(url):
    """
    Fetch the content of a web page using httpx.
    
    Args:
        url (str): The URL to fetch
        
    Returns:
        str: The HTML content of the page or an error message
    """
    try:
        response = httpx.get(url)
        response.raise_for_status()
        return response.text
    except Exception as e:
        return f"Error fetching URL {url}: {e}"


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    
    Args:
        html_content (str): The HTML content to convert
        max_length (int): Maximum length of the returned Markdown (default: 500)
        
    Returns:
        str: The converted and trimmed Markdown content
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


def search_and_fetch_markdown(query: str):
    """
    Search the web for the query, fetch the top 3 results, and return their Markdown content as a list of strings.
    
    Args:
        query (str): The search query
        
    Returns:
        list: A list of Markdown strings, one for each result
    """
    try:
        # TODO: Search for the top 3 results
        ddgs = DDGS()
        results = ddgs.text(query, max_results=1)
        
        # TODO: Create an empty list to store the markdown pages
        
        # TODO: Replace this single-result processing with a loop through all results
        url = results[0].get("href")
        title = results[0].get("title", "")
        if not url:
            return ["Top search result does not have a URL."]
            
        html_content = fetch_web_page(url)
        markdown = html_to_markdown_converter(html_content)
        
        return [markdown]
    except Exception as e:
        return [f"Error: {e}"]

```
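One possible shape for the loop the TODOs ask for, shown in isolation. This is a sketch under assumptions: `collect_markdown` is a hypothetical helper, and the fetch and convert steps are passed in as parameters so the example runs with stand-in functions instead of the real `fetch_web_page` and `html_to_markdown_converter`.

```python
def collect_markdown(results, fetch, convert):
    """Process every result; a failure on one page becomes an error string, not a crash."""
    markdown_pages = []
    for result in results:
        url = result.get("href")
        if not url:
            continue
        try:
            markdown_pages.append(convert(fetch(url)))
        except Exception as e:
            # Record the problem and keep going with the remaining results.
            markdown_pages.append(f"Error fetching {url}: {e}")
    return markdown_pages


def stub_fetch(url):
    """Stand-in for a real HTTP request."""
    if url.endswith("/down"):
        raise ConnectionError("simulated outage")
    return f"<p>content of {url}</p>"


def stub_convert(html):
    """Stand-in for HTML-to-Markdown conversion: removes the <p> wrapper only."""
    return html.replace("<p>", "").replace("</p>", "")


results = [
    {"href": "https://example.com/one"},
    {"href": "https://example.com/down"},
    {"href": "https://example.com/three"},
]
for entry in collect_markdown(results, stub_fetch, stub_convert):
    print(entry)
```

The second page fails, yet the third is still processed, which is the behavior the exercise describes as handling errors for individual pages.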

## Adding a Parameter to Control Multiple Results

Excellent work on building the web searcher function! Now it's time to take it to the next level by enhancing it to handle multiple search results with a parameter.

Your current function returns only the Markdown content from the top 3 search results, but a truly useful research tool should be flexible and handle different parameters for multiple sources. In this exercise, you'll expand the function's capabilities.

Your task is to:

  * Add a new parameter to the function called `max_results` with a default of 5.
  * Use this new parameter in the function.
  * Update the docstring of the function to reflect the changes.

Follow the TODO comments in the starter code to implement these changes.

When you're done, you'll have a much more powerful research tool that can gather information from multiple sources with just one function call!

```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown


def fetch_web_page(url):
    """
    Fetch the content of a web page using httpx.
    
    Args:
        url (str): The URL to fetch
        
    Returns:
        str: The HTML content of the page or an error message
    """
    try:
        response = httpx.get(url)
        response.raise_for_status()
        return response.text
    except Exception as e:
        return f"Error fetching URL {url}: {e}"


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    
    Args:
        html_content (str): The HTML content to convert
        max_length (int): Maximum length of the returned Markdown (default: 500)
        
    Returns:
        str: The converted and trimmed Markdown content
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


# TODO: Add a max_results parameter to the declaration of the function
def search_and_fetch_markdown(query: str):
    """
    Search the web for the query, fetch the top 3 results, and return their Markdown content.
    
    Args:
        query (str): The search query
        
    Returns:
        list: A list of Markdown strings, one for each result
    """
    # TODO: Update the docstring of the function
    try:
        ddgs = DDGS()
        # TODO: Use the new max_results parameter in the DDGS search
        results = ddgs.text(query, max_results=3)
        
        markdown_pages = []
        
        for result in results:
            url = result.get("href")
            if not url:
                continue
            try:
                response = httpx.get(url, follow_redirects=True)
                response.raise_for_status()
                markdown = html_to_markdown_converter(response.text)
                markdown_pages.append(markdown)
            except Exception as e:
                markdown_pages.append(f"**Error fetching {url}:** {e}")
        return markdown_pages
    except Exception as e:
        return [f"Error: {e}"]
```

Here is the updated `search_and_fetch_markdown` function with the `max_results` parameter and an adjusted docstring.

```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown


def fetch_web_page(url):
    """
    Fetch the content of a web page using httpx.
    
    Args:
        url (str): The URL to fetch
        
    Returns:
        str: The HTML content of the page or an error message
    """
    try:
        # Added a timeout for better robustness
        response = httpx.get(url, timeout=10) 
        response.raise_for_status()
        return response.text
    except Exception as e:
        return f"Error fetching URL {url}: {e}"


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    
    Args:
        html_content (str): The HTML content to convert
        max_length (int): Maximum length of the returned Markdown (default: 500)
        
    Returns:
        str: The converted and trimmed Markdown content
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


# TODO: Add a max_results parameter to the declaration of the function
def search_and_fetch_markdown(query: str, max_results: int = 5):
    """
    Search the web for the query, fetch the top results (up to max_results), 
    and return their Markdown content.
    
    Args:
        query (str): The search query.
        max_results (int, optional): The maximum number of search results to fetch. 
                                     Defaults to 5.
        
    Returns:
        list: A list of Markdown strings (or error messages for failed fetches), 
              one for each result attempted.
    """
    # TODO: Update the docstring of the function (Done above)
    try:
        ddgs = DDGS()
        
        # TODO: Use the new max_results parameter in the DDGS search
        # Wrap in list() so the emptiness check below works even if .text() returns an iterator
        results = list(ddgs.text(query, max_results=max_results))
        
        markdown_pages = []
        
        # Check if any results were returned
        if not results:
            return [f"Notice: No search results found for query: '{query}'"]
            
        for result in results:
            url = result.get("href")
            title = result.get("title", "No Title")
            
            if not url or not url.startswith('http'):
                markdown_pages.append(f"Notice: Skipped result '{title}' due to invalid URL.")
                continue
                
            try:
                # Use a specific timeout for the request
                response = httpx.get(url, follow_redirects=True, timeout=10) 
                response.raise_for_status()
                
                markdown = html_to_markdown_converter(response.text)
                
                # Add context (Title and URL) to the Markdown output
                formatted_output = (
                    f"## Source: {title}\n"
                    f"**URL:** {url}\n\n"
                    f"---\n\n"
                    f"{markdown}..."
                )
                markdown_pages.append(formatted_output)
                
            except Exception as e:
                # Handle errors for individual page fetches gracefully
                markdown_pages.append(f"**Error fetching {url} ('{title}'):** {e}")
                
        return markdown_pages
        
    except Exception as e:
        # Catch critical errors during the DDGS search itself
        return [f"CRITICAL ERROR during DDGS search for '{query}': {e}"]
```
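A note on the `list(...)` call in the solution: converting an iterator to a list collects everything it yields and does not by itself limit the count. The limit comes from the `max_results` argument. If a search helper ever returned an unbounded iterator, `itertools.islice` is one way to cap it. This is an assumption for illustration: `endless_results` and `take` below are made-up names, not part of the DDGS API.

```python
from itertools import count, islice


def endless_results():
    """Made-up generator standing in for a search source that never stops on its own."""
    for n in count(1):
        yield {"title": f"Result {n}", "href": f"https://example.com/{n}"}


def take(results, max_results=5):
    """Return at most max_results items from any iterable of results."""
    return list(islice(results, max_results))


top = take(endless_results(), max_results=3)
print([r["title"] for r in top])
```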

## Adding Timeouts for Web Requests

Now that you've enhanced your web searcher to handle multiple results, let's make it even more robust by adding timeout control! When working with web requests, timeouts are crucial: they prevent your program from hanging indefinitely when a website is slow to respond.

In this exercise, you'll add timeout functionality to your web searcher module. This will give users control over how long the function waits for responses, making it more flexible for different network conditions.

Your task is to:

  * Update the function signature to include a `timeout` parameter with a default value of 10 seconds.
  * Pass this `timeout` parameter to the `DDGS` constructor when creating the search object.
  * Use the same `timeout` parameter in the `httpx.get()` call when fetching web pages.
  * Update the function's docstring to document the new parameter.
  * Extend the code in `app.py` to test this functionality.

Follow the TODO comments in the starter code to implement these changes. When you're done, you'll have a more reliable web searcher that can adapt to different network speeds and connection qualities, an important feature for any tool that interacts with the web!


```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown


def fetch_web_page(url):
    """
    Fetch the content of a web page using httpx.
    
    Args:
        url (str): The URL to fetch
        
    Returns:
        str: The HTML content of the page or an error message
    """
    try:
        response = httpx.get(url)
        response.raise_for_status()
        return response.text
    except Exception as e:
        return f"Error fetching URL {url}: {e}"


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    
    Args:
        html_content (str): The HTML content to convert
        max_length (int): Maximum length of the returned Markdown (default: 500)
        
    Returns:
        str: The converted and trimmed Markdown content
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


# TODO: Update the function signature to include a timeout parameter with default value of 10
def search_and_fetch_markdown(query: str,  max_results: int = 5):
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    
    Args:
        query (str): The search query
        max_results (int): Maximum number of results to fetch (default: 5)
        
    Returns:
        list: A list of Markdown strings, one for each result
    """
    # TODO: Update the docstring to reflect the changes
    
    try:
        # TODO: Pass the timeout parameter to the DDGS constructor
        ddgs = DDGS()
        results = ddgs.text(query, max_results=max_results)
        
        markdown_pages = []
        
        for result in results:
            url = result.get("href")
            if not url:
                continue
            try:
                # TODO: Pass the timeout parameter to the fetch_web_page function
                # fetch_web_page returns the page text (a str), not a response object
                html_content = fetch_web_page(url)
                markdown = html_to_markdown_converter(html_content)
                markdown_pages.append(markdown)
            except Exception as e:
                markdown_pages.append(f"**Error fetching {url}:** {e}")
        return markdown_pages
    except Exception as e:
        return [f"Error: {e}"]
```

That's an excellent step toward making your tool robust! Using timeouts is essential for reliable web scraping.

Here is the implementation that incorporates the **`timeout`** parameter into the function signature, the **`DDGS`** constructor, and the **`httpx.get()`** call within `fetch_web_page`.

-----

## Updated Web Searcher Module with Timeouts

### Hint

You need to modify **three** places:

1.  The function signature of **`search_and_fetch_markdown`** to accept `timeout: int = 10`.
2.  The function signature of **`fetch_web_page`** to accept the `timeout` and pass it to **`httpx.get(url, timeout=timeout)`**.
3.  The instantiation of **`DDGS()`** to include `DDGS(timeout=timeout)`.

Since your `search_and_fetch_markdown` function already calls `fetch_web_page`, you'll need to update the definition and call of `fetch_web_page` as well.

### Implementation

Here is the complete, updated code:

```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown


# Updated fetch_web_page to accept and use the timeout parameter
def fetch_web_page(url, timeout: int = 10):
    """
    Fetch the content of a web page using httpx.
    
    Args:
        url (str): The URL to fetch
        timeout (int): Maximum time to wait for the response in seconds (default: 10).
        
    Returns:
        str: The HTML content of the page or an error message
    """
    try:
        # Pass the timeout parameter to httpx.get()
        response = httpx.get(url, timeout=timeout, follow_redirects=True) 
        response.raise_for_status()
        return response.text
    except Exception as e:
        return f"Error fetching URL {url}: {e}"


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    
    Args:
        html_content (str): The HTML content to convert
        max_length (int): Maximum length of the returned Markdown (default: 500)
        
    Returns:
        str: The converted and trimmed Markdown content
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


# TODO: Update the function signature to include a timeout parameter with default value of 10
def search_and_fetch_markdown(query: str, max_results: int = 5, timeout: int = 10):
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    
    Args:
        query (str): The search query.
        max_results (int): Maximum number of results to fetch (default: 5).
        timeout (int): Maximum seconds to wait for connection/response (default: 10).
        
    Returns:
        list: A list of Markdown strings (or error messages for failed fetches), 
              one for each result attempted.
    """
    # TODO: Update the docstring to reflect the changes (Done above)
    
    try:
        # TODO: Pass the timeout parameter to the DDGS constructor
        ddgs = DDGS(timeout=timeout)
        results = ddgs.text(query, max_results=max_results)
        
        markdown_pages = []
        
        for result in results:
            url = result.get("href")
            title = result.get("title", "No Title")
            
            if not url or not url.startswith('http'):
                markdown_pages.append(f"Notice: Skipped result '{title}' due to invalid URL.")
                continue
            
            try:
                # TODO: Pass the timeout parameter to the fetch_web_page function
                # Note: We are using the updated fetch_web_page that now accepts 'timeout'
                html_content = fetch_web_page(url, timeout=timeout) 
                
                # Check if fetch_web_page returned an error string
                if html_content.startswith("Error fetching URL"):
                    # Keep the original message so the cause (timeout, 404, ...) is not lost
                    markdown_pages.append(f"**Error fetching {url} ('{title}'):** {html_content}")
                    continue
                    
                markdown = html_to_markdown_converter(html_content)
                
                formatted_output = (
                    f"## Source: {title}\n"
                    f"**URL:** {url}\n\n"
                    f"---\n\n"
                    f"{markdown}..."
                )
                markdown_pages.append(formatted_output)
                
            except Exception as e:
                # This catch block is mostly for errors inside the loop body, but fetch_web_page handles most of them now
                markdown_pages.append(f"**Error processing {url} ('{title}'):** {e}")
                
        return markdown_pages
        
    except Exception as e:
        # Catch critical errors during the DDGS search itself
        return [f"CRITICAL ERROR during DDGS search for '{query}': {e}"]
```

-----

## Example Usage

Here is an example of how you could test the new timeout functionality:

```python
# --- app.py (Testing/Example) ---

# Example 1: Standard usage (uses default timeout=10)
# results_standard = search_and_fetch_markdown("python asyncio tutorial", max_results=2)
# print("Standard Results:")
# for r in results_standard:
#     print(r + "\n" + "="*40)

# Example 2: Short timeout (more likely to fail if the site is slow)
# results_short_timeout = search_and_fetch_markdown("slow website example", max_results=1, timeout=1) 
# print("\nShort Timeout Results (Likely to show a timeout error):")
# for r in results_short_timeout:
#     print(r + "\n" + "="*40)

# Example 3: Long timeout (less likely to fail, good for poor connections)
# results_long_timeout = search_and_fetch_markdown("data science trends 2025", max_results=2, timeout=30)
# print("\nLong Timeout Results:")
# for r in results_long_timeout:
#     print(r + "\n" + "="*40)
```

The key changes were passing the `timeout` value down from the main function to both the `DDGS` initialization and the helper function `fetch_web_page`. This ensures consistent control over wait times throughout your web interactions.
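The propagation can be checked without touching the network. In this sketch, `FakeSearcher`, `fake_fetch`, and `search_and_fetch` are hypothetical stand-ins that only record the timeout they receive; they are not the real `DDGS` or `httpx` objects.

```python
received = []  # records (component, timeout) pairs for inspection


class FakeSearcher:
    """Stand-in for the search client: remembers the timeout it was built with."""

    def __init__(self, timeout):
        received.append(("search", timeout))

    def text(self, query, max_results):
        return [{"title": "Stub", "href": "https://example.com/stub"}][:max_results]


def fake_fetch(url, timeout):
    """Stand-in for the page fetcher: remembers the timeout it was called with."""
    received.append(("fetch", timeout))
    return "<p>stub page</p>"


def search_and_fetch(query, max_results=5, timeout=10):
    """Pass one timeout value to both the search client and every page fetch."""
    searcher = FakeSearcher(timeout=timeout)
    pages = []
    for result in searcher.text(query, max_results=max_results):
        pages.append(fake_fetch(result["href"], timeout=timeout))
    return pages


search_and_fetch("anything", max_results=1, timeout=3)
print(received)
```

Both components report the same value, which is the point of threading a single `timeout` parameter through the call chain instead of hard-coding it in each helper.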

## Structuring Search Results for Better Context

You've made excellent progress with your web searcher! Now that you can fetch multiple results, let's make your search results even more useful by improving how they're structured.

Currently, your function returns only a list of Markdown strings, which tells you nothing about where the information came from. In this exercise, you'll enhance the function to return more complete information about each search result.

Your task is to modify the `search_and_fetch_markdown` function to return a list of dictionaries instead of just Markdown strings. Each dictionary should contain:

  * The title of the webpage
  * The URL of the webpage
  * The Markdown content from that page

This structured format will make your search results much more valuable because you'll know the source of each piece of information. You'll also need to update the error handling to maintain this structure even when errors occur.
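
To make the target shape concrete, here is a small sketch of what such a list could look like and how the source information can be used. The titles, URLs, and Markdown text are made-up sample data, and `format_with_source` is a hypothetical helper that is not part of the exercise.

```python
# Hypothetical sample data in the target shape; no real search is performed.
results = [
    {
        "title": "Example Page A",
        "url": "https://example.com/a",
        "markdown": "# Page A\n\nSome content.",
    },
    {
        "title": "Example Page B",
        "url": "https://example.com/b",
        "markdown": "# Page B\n\nMore content.",
    },
]


def format_with_source(result: dict) -> str:
    """Render one result with its title and URL above the Markdown content."""
    return f"{result['title']} ({result['url']})\n{result['markdown']}"


for result in results:
    print(format_with_source(result))
    print("=" * 40)
```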

Follow the TODO comments in the starter code to implement these changes. When you're done, you'll have a more powerful research tool that provides well-organized data, ready for further processing or display!

```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown


def fetch_web_page(url, timeout: int = 10):
    """
    Fetch the content of a web page using httpx.
    
    Args:
        url (str): The URL to fetch
        timeout (int): Maximum time to wait for response in seconds (default: 10)
        
    Returns:
        str: The HTML content of the page or an error message
    """
    try:
        response = httpx.get(url, timeout=timeout)
        response.raise_for_status()
        return response.text
    except Exception as e:
        return f"Error fetching URL {url}: {e}"


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    
    Args:
        html_content (str): The HTML content to convert
        max_length (int): Maximum length of the returned Markdown (default: 500)
        
    Returns:
        str: The converted and trimmed Markdown content
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


def search_and_fetch_markdown(query: str, max_results: int = 5, timeout: int = 10):
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    
    Args:
        query (str): The search query
        max_results (int): Maximum number of results to fetch (default: 5)
        timeout (int): Maximum time to wait for response in seconds (default: 10)
        
    Returns:
        list: A list of Markdown strings, one for each result
    """
    # TODO: Update the docstring to indicate the function now returns a list of dictionaries
    
    try:
        ddgs = DDGS(timeout=timeout)
        results = ddgs.text(query, max_results=max_results)
        
        markdown_pages = []
        
        for result in results:
            url = result.get("href")
            if not url:
                continue
            try:
                # fetch_web_page returns the page HTML as a string (or an error message string)
                html_content = fetch_web_page(url, timeout=timeout)
                markdown = html_to_markdown_converter(html_content)
                markdown_pages.append(markdown)
                # TODO: Instead of appending the Markdown string directly, append a dictionary with title, url, and markdown to the list
            except Exception as e:
                markdown_pages.append(f"**Error fetching {url}:** {e}")
        return markdown_pages
    except Exception as e:
        return [f"Error: {e}"]

```

That's a smart move! Returning structured data (dictionaries) is far more useful for downstream processing than simple strings.

Here is the updated `search_and_fetch_markdown` function, where each result is returned as a dictionary containing the **title**, **URL**, and **Markdown content**.

```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown


def fetch_web_page(url, timeout: int = 10):
    """
    Fetch the content of a web page using httpx.
    
    Args:
        url (str): The URL to fetch
        timeout (int): Maximum time to wait for response in seconds (default: 10)
        
    Returns:
        str: The HTML content of the page or an error message (prefixed with 'Error')
    """
    try:
        response = httpx.get(url, timeout=timeout, follow_redirects=True)
        response.raise_for_status()
        return response.text
    except Exception as e:
        return f"Error fetching URL {url}: {e}"


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    
    Args:
        html_content (str): The HTML content to convert
        max_length (int): Maximum length of the returned Markdown (default: 500)
        
    Returns:
        str: The converted and trimmed Markdown content
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


def search_and_fetch_markdown(query: str, max_results: int = 5, timeout: int = 10):
    """
    Search the web for the query, fetch multiple results, and return their structured content.
    
    Args:
        query (str): The search query.
        max_results (int): Maximum number of results to fetch (default: 5).
        timeout (int): Maximum time to wait for response in seconds (default: 10).
        
    Returns:
        list: A list of dictionaries. Each dictionary contains 'title', 'url', and 
              either 'markdown' content or an 'error' message.
    """
    
    try:
        ddgs = DDGS(timeout=timeout)
        # Materialize the results as a list so we work with a fixed set (harmless if text() already returns a list)
        results = list(ddgs.text(query, max_results=max_results))
        
        # This list will store the dictionaries
        markdown_pages = []
        
        for result in results:
            url = result.get("href")
            title = result.get("title", "No Title Found")
            
            # 1. Handle missing URL in search result
            if not url or not url.startswith('http'):
                markdown_pages.append({
                    "title": title,
                    "url": url if url else "N/A",
                    "markdown": None,
                    "error": "Skipped: Invalid or missing URL in search result."
                })
                continue

            try:
                # Use the updated fetch_web_page, which returns HTML or an error string
                html_content = fetch_web_page(url, timeout=timeout)
                
                # Check if fetch_web_page returned an error
                if html_content.startswith("Error fetching URL"):
                    # 2. Handle HTTP/Fetch Errors
                    markdown_pages.append({
                        "title": title,
                        "url": url,
                        "markdown": None,
                        "error": html_content
                    })
                    continue
                    
                # Process successful fetch
                markdown = html_to_markdown_converter(html_content)
                
                # Append a dictionary with title, url, and markdown instead of the bare Markdown string
                markdown_pages.append({
                    "title": title,
                    "url": url,
                    "markdown": markdown,
                    "error": None # Indicate success
                })
                
            except Exception as e:
                # 3. Handle Unexpected Errors during conversion or processing
                markdown_pages.append({
                    "title": title,
                    "url": url,
                    "markdown": None,
                    "error": f"Unexpected processing error: {e}"
                })
                
        return markdown_pages
        
    except Exception as e:
        # 4. Handle Critical Errors during DDGS search initialization or execution
        return [{
            "title": "N/A",
            "url": "N/A",
            "markdown": None,
            "error": f"CRITICAL SEARCH ERROR: {e}"
        }]
```
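
Since every entry now carries either `markdown` or `error`, a caller can separate successful pages from failed ones. The sketch below shows one possible way to do that; `split_results` is a hypothetical helper, and `sample_results` is made-up data in the same shape as the function's return value, so the example runs without network access.

```python
from typing import Dict, List, Tuple


def split_results(results: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate entries whose 'error' is None from entries that carry an error."""
    successes = [r for r in results if r.get("error") is None]
    failures = [r for r in results if r.get("error") is not None]
    return successes, failures


# Hypothetical sample data mirroring the structure returned above.
sample_results = [
    {
        "title": "Example Page A",
        "url": "https://example.com/a",
        "markdown": "# Page A",
        "error": None,
    },
    {
        "title": "Example Page B",
        "url": "https://example.com/b",
        "markdown": None,
        "error": "Error fetching URL https://example.com/b: timed out",
    },
]

successes, failures = split_results(sample_results)

for r in successes:
    print(f"OK     {r['title']} ({r['url']})")
for r in failures:
    print(f"FAILED {r['url']}: {r['error']}")
```

Keeping the same four keys in every entry means downstream code never has to guess which fields exist; it only needs to check whether `error` is `None`.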

## Adding Robust Error Handling