# Unit 4

## Making the Web Search Reliable and Safe

## Introduction: The Need for Reliable and Safe Web Search

Welcome back! In the last lesson, you learned how to avoid common pitfalls in your web searcher, such as duplicate results and broken links. Now, we will take your skills a step further by making your web searcher more **reliable and safe**.

When you automate web content retrieval, you will often face problems like **network errors**, slow responses, or even accidentally processing the same page more than once. If you don’t handle these issues, your research tool might miss important information or waste time and resources.

In this lesson, you will learn how to:

  * Automatically **retry failed web requests**,
  * And use **logging** to monitor what happens during your web search.

By the end of this lesson, you will have a web searcher that is much more **robust** and ready for real-world use.

-----

## Retrying Failed Requests with Tenacity

When you fetch web pages, sometimes things go wrong. The website might be slow, your internet connection might drop, or the server might return an error. If you don’t handle these problems, your program could crash or miss important data.

To solve this, you can use the **`tenacity`** library. This library lets you automatically retry a function if it fails due to certain errors, such as timeouts or connection problems.

Let’s start by importing the necessary modules and setting up a simple retry mechanism.

```python
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from httpx import TimeoutException, RequestError, HTTPStatusError
import httpx
```

Here’s what each import does:

  * **`tenacity`** provides decorators and tools for retrying functions.
  * **`httpx`** is the library we use to make HTTP requests.
  * The exception types (`TimeoutException`, `RequestError`, `HTTPStatusError`) help us specify which errors should trigger a retry.

Now, let’s create a function that fetches a web page and automatically retries if it fails due to a network error or timeout.

```python
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry=(
        retry_if_exception_type(TimeoutException) |
        retry_if_exception_type(RequestError) |
        retry_if_exception_type(HTTPStatusError)
    ),
    reraise=True
)
def fetch_page(url: str, timeout: int = 10) -> str:
    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
        response = client.get(url)
        response.raise_for_status()
        return response.text
```

Let’s break down what’s happening here:

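  * The **`@retry`** decorator re-runs `fetch_page` when it raises a timeout, connection error, or HTTP status error, for up to **3 attempts in total** (the first call plus at most two retries).
  * **`wait_exponential`** makes the wait between attempts grow exponentially, clamped here to a minimum of 2 seconds and a maximum of 10 seconds. With three attempts there are at most two waits; check the documentation of your installed tenacity version for the exact sequence.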
  * Inside the function, we use `httpx.Client` to fetch the web page. If the request fails, the function will automatically retry.
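
To make the mechanism concrete, here is a minimal plain-Python sketch of roughly what a retry decorator does: call the function, catch a retryable error, wait, and try again until the attempt budget is spent. It is an illustration only, not tenacity's implementation. The names `retry_call` and `FlakyError`, the injectable `sleep` parameter, and the simple "double the delay" rule are all assumptions made for this example.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


class FlakyError(Exception):
    """Hypothetical stand-in for a transient network error."""


def retry_call(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    min_wait: float = 2.0,
    max_wait: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exception types with a growing delay."""
    delay = min_wait
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Budget exhausted: surface the last error to the caller.
            sleep(delay)
            delay = min(delay * 2, max_wait)  # Double the delay, capped at max_wait.
    raise ValueError("max_attempts must be at least 1")


# Usage: a function that fails twice, then succeeds on the third attempt.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise FlakyError("temporary failure")
    return "page content"


waits: List[float] = []  # Record the delays instead of actually sleeping.
result = retry_call(flaky, retry_on=(FlakyError,), sleep=waits.append)
print(result, calls["count"], waits)
```

Passing `waits.append` as the `sleep` function keeps the example instant while still showing which delays would have been used.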

### Example Output:

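Tenacity does not print anything on its own. If you attach a logging hook to the decorator, for example `before_sleep=before_sleep_log(logger, logging.WARNING)`, and the first two attempts time out, you might see messages roughly like these: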

```
Retrying fetch_page in 2 seconds as it raised TimeoutException...
Retrying fetch_page in 4 seconds as it raised TimeoutException...
```

If the third attempt succeeds, the function returns the page content. If all attempts fail, the error is raised.

-----

## Logging: Monitoring Successes and Failures

When your program runs, it’s helpful to know what’s happening — especially when things go wrong. **Logging** lets you record messages about what your program is doing, which can help you **debug problems** or understand how your code is working.

Python’s built-in **`logging`** module makes this easy.

Let’s set up basic logging:

```python
import logging
logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')
```

  * This sets the logging level to **`INFO`**, so you’ll see messages at `INFO` and above (informational messages, warnings, and errors), while `DEBUG` messages are hidden.
  * The format makes it clear what type of message is being logged.
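
The effect of the level and format can be checked with a small, self-contained sketch. It uses a dedicated logger that writes to an in-memory buffer rather than `basicConfig`, purely so the output can be inspected. The logger name `websearch_demo` and the messages are made up for the example.

```python
import io
import logging

# Write log records to an in-memory buffer so the output can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))

demo_logger = logging.getLogger("websearch_demo")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep demo records away from the root logger

demo_logger.debug("Parsed 12 links")  # below INFO, so it is dropped
demo_logger.info("Fetching URL: https://example.com")
demo_logger.warning("Timeout while fetching https://example.com")

print(buffer.getvalue(), end="")
# [INFO] Fetching URL: https://example.com
# [WARNING] Timeout while fetching https://example.com
```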

Now, let’s add logging to our web searcher. For example, you can log warnings when a request fails:

```python
try:
    # Try to fetch and process the page
    markdown = fetch_page(url)
except TimeoutException:
    logging.warning(f"Timeout while fetching {url}")
except HTTPStatusError as e:
    logging.warning(f"HTTP error {e.response.status_code} for URL {url}")
except RequestError as e:
    logging.warning(f"Connection error for URL {url}: {str(e)}")
except Exception as e:
    logging.warning(f"Unexpected error for URL {url}: {str(e)}")
```

  * If a timeout occurs, you log a warning with the URL.
  * If there’s an HTTP error, you log the status code and URL.
  * If there’s a connection error or any other unexpected error, you log those as well.

### Example Output:

```
[WARNING] Timeout while fetching https://example.com/page1
[WARNING] HTTP error 404 for URL https://example.com/page2
[WARNING] Connection error for URL https://example.com/page3: Network unreachable
```

Logging helps you see what went wrong and where, making it much easier to fix problems.

-----

## Summary and What’s Next

In this lesson, you learned how to make your web searcher more reliable and safe by:

  * Retrying failed requests automatically with the **`tenacity`** library,
  * And using **logging** to monitor successes and failures.

These improvements will help your automated research tool handle real-world problems and give you better control over what happens during web searches.

Next, you’ll get a chance to practice these skills with hands-on exercises. This will help you reinforce what you’ve learned and prepare you for building even more advanced features. Keep up the great work!

## Adding Logging to Your Web Searcher

Now that you've learned about the importance of logging in web search applications, let's put this knowledge into practice! In this exercise, you'll implement basic logging in your web searcher module to help track what happens during web searches.

Logging is essential for debugging and monitoring your program's behavior, especially when dealing with network operations that might fail.

Your tasks are:

  * Add a logging configuration at the top of the file that sets the log level to `INFO` and uses a format that clearly shows the log level and message.
  * Add a log statement at the beginning of the `fetch_and_convert_to_markdown` function to indicate when a URL fetch starts.
  * Add a warning when an `Exception` is caught inside `search_and_fetch_markdown`.

When you run your code after these changes, you'll see helpful log messages in your console that show when URL fetches begin and any warnings about errors. This visibility into your program's execution will make troubleshooting much easier as your web searcher becomes more complex.

```python
import httpx
import logging
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown
from httpx import TimeoutException, RequestError, HTTPStatusError

# TODO: Add logging configuration here to set the log level to INFO and specify a format
# that shows the log level and message

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def clear_visited_pages() -> None:
    """Clear the global set of visited pages."""
    _visited_pages.clear()


def fetch_and_convert_to_markdown(url: str, timeout: int = 10) -> str:
    """Fetch the HTML content of a URL and convert it to Markdown. Retries on errors."""
    # TODO: Add a log statement here to indicate the start of a URL fetch
    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
        response = client.get(url)
        response.raise_for_status()
        html = response.text
        return convert_to_markdown(html)


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    region: str = "wt-wt",
    safesearch: str = "moderate",
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Perform a DuckDuckGo search and return top results converted to Markdown,
    skipping already-visited pages.

    Returns a list of dicts: [{"title": ..., "url": ..., "markdown": ...}, ...]
    """
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(
        query,
        region=region,
        safesearch=safesearch,
        max_results=max_results
    )

    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue

        try:
            markdown = fetch_and_convert_to_markdown(url, timeout=timeout)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })

        except Exception as e:
            # TODO: Add a warning here
            pass

        _visited_pages.add(url)

    return markdown_pages

```

Hello! That's a great exercise. Implementing logging is a crucial step in building robust applications.

Here is the updated Python code with the logging configuration and statements added according to your tasks:

```python
import httpx
import logging
from typing import List, Dict, Set
from ddgs import DDGS
# html_to_markdown is assumed to be importable (an installed package or a local module)
from html_to_markdown import convert_to_markdown
from httpx import TimeoutException, RequestError, HTTPStatusError

# TODO: Add logging configuration here to set the log level to INFO and specify a format
# that shows the log level and message
logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def clear_visited_pages() -> None:
    """Clear the global set of visited pages."""
    _visited_pages.clear()


def fetch_and_convert_to_markdown(url: str, timeout: int = 10) -> str:
    """Fetch the HTML content of a URL and convert it to Markdown. Retries on errors."""
    # TODO: Add a log statement here to indicate the start of a URL fetch
    logging.info(f"Starting fetch for URL: {url}")
    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
        response = client.get(url)
        response.raise_for_status()
        html = response.text
        return convert_to_markdown(html)


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    region: str = "wt-wt",
    safesearch: str = "moderate",
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Perform a DuckDuckGo search and return top results converted to Markdown,
    skipping already-visited pages.

    Returns a list of dicts: [{"title": ..., "url": ..., "markdown": ...}, ...]
    """
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(
        query,
        region=region,
        safesearch=safesearch,
        max_results=max_results
    )

    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue

        try:
            markdown = fetch_and_convert_to_markdown(url, timeout=timeout)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })

        except Exception as e:
            # TODO: Add a warning here
            logging.warning(f"Failed to process URL: {url}. Error: {type(e).__name__} - {e}")

        _visited_pages.add(url)

    return markdown_pages
```

### Summary of Changes:

1.  **Logging Configuration (Top of file):**

    ```python
    logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')
    ```

    This sets the base logging level to **INFO** and ensures the log output includes the **level name** (e.g., `[INFO]`) and the **message**.

2.  **Start of URL Fetch (Inside `fetch_and_convert_to_markdown`):**

    ```python
    logging.info(f"Starting fetch for URL: {url}")
    ```

    This **`INFO`** level statement clearly indicates when the function begins processing a new URL.

3.  **Error Warning (Inside `search_and_fetch_markdown`'s `except` block):**

    ```python
    logging.warning(f"Failed to process URL: {url}. Error: {type(e).__name__} - {e}")
    ```

    This **`WARNING`** level statement reports any failure during the fetch/conversion process, providing the **URL** that failed and the **type and message of the exception** caught.

## Handling Web Errors Like a Pro

Now that you've set up basic logging in your web searcher, let's improve the error handling to make your tool even more reliable! In this exercise, you'll add specific exception handlers to gracefully manage different types of errors that can occur during web searches.

When fetching content from the web, many things can go wrong — servers might time out, return error codes, or network connections might fail. Proper error handling ensures that your program continues running despite these issues.

Your tasks are:

  * Add a specific exception handler for `TimeoutException` that logs a warning with the URL.
  * Add a specific exception handler for `HTTPStatusError` that logs the status code and URL.
  * Add a specific exception handler for `RequestError` that logs connection details and the URL.

By implementing these specific error handlers, your web searcher will be more robust and provide clearer feedback about what went wrong during searches. This makes debugging easier and helps you understand which websites are causing problems.


```python
import httpx
import logging
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown
from httpx import TimeoutException, RequestError, HTTPStatusError

# Setup logging
logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def clear_visited_pages() -> None:
    """Clear the global set of visited pages."""
    _visited_pages.clear()


def fetch_and_convert_to_markdown(url: str, timeout: int = 10) -> str:
    """Fetch the HTML content of a URL and convert it to Markdown. Retries on errors."""
    logging.info(f"Fetching URL: {url}")
    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
        response = client.get(url)
        response.raise_for_status()
        html = response.text
        return convert_to_markdown(html)


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    region: str = "wt-wt",
    safesearch: str = "moderate",
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Perform a DuckDuckGo search and return top results converted to Markdown,
    skipping already-visited pages.

    Returns a list of dicts: [{"title": ..., "url": ..., "markdown": ...}, ...]
    """
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(
        query,
        region=region,
        safesearch=safesearch,
        max_results=max_results
    )

    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue

        try:
            markdown = fetch_and_convert_to_markdown(url, timeout=timeout)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })

        # TODO: Add a specific except clause for TimeoutException that logs a warning with the URL
        
        # TODO: Add a specific except clause for HTTPStatusError that logs a warning with the status code and URL
        
        # TODO: Add a specific except clause for RequestError that logs a warning with the URL and error message
        
        except Exception as e:
            logging.warning(f"Unexpected error for URL {url}: {str(e)}")

        _visited_pages.add(url)

    return markdown_pages
```

That's a smart move! Specific exception handlers let you log precisely what went wrong and decide how to recover from each kind of failure.

Here is the updated Python code with the specific `httpx` exception handlers implemented in the `search_and_fetch_markdown` function.

```python
import httpx
import logging
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown
from httpx import TimeoutException, RequestError, HTTPStatusError

# Setup logging
logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def clear_visited_pages() -> None:
    """Clear the global set of visited pages."""
    _visited_pages.clear()


def fetch_and_convert_to_markdown(url: str, timeout: int = 10) -> str:
    """Fetch the HTML content of a URL and convert it to Markdown. Retries on errors."""
    logging.info(f"Fetching URL: {url}")
    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
        response = client.get(url)
        response.raise_for_status()
        html = response.text
        return convert_to_markdown(html)


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    region: str = "wt-wt",
    safesearch: str = "moderate",
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Perform a DuckDuckGo search and return top results converted to Markdown,
    skipping already-visited pages.

    Returns a list of dicts: [{"title": ..., "url": ..., "markdown": ...}, ...]
    """
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(
        query,
        region=region,
        safesearch=safesearch,
        max_results=max_results
    )

    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue

        try:
            markdown = fetch_and_convert_to_markdown(url, timeout=timeout)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })

        # ✅ TODO: Add a specific except clause for TimeoutException that logs a warning with the URL
        except TimeoutException:
            logging.warning(f"Timeout while fetching URL: {url}")

        # ✅ TODO: Add a specific except clause for HTTPStatusError that logs a warning with the status code and URL
        except HTTPStatusError as e:
            logging.warning(f"HTTP Error {e.response.status_code} for URL: {url}")
        
        # ✅ TODO: Add a specific except clause for RequestError that logs a warning with the URL and error message
        except RequestError as e:
            logging.warning(f"Connection error for URL {url}: {str(e)}")
        
        except Exception as e:
            logging.warning(f"Unexpected error for URL {url}: {str(e)}")

        _visited_pages.add(url)

    return markdown_pages
```

### Explanation of Changes

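The `try...except` block is now ordered from most specific to least specific. The order matters because `TimeoutException` is a subclass of `RequestError` in `httpx`, so a `RequestError` handler placed first would also catch timeouts: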

1.  **`except TimeoutException:`** Handles cases where the server takes too long to respond.
    ```python
    logging.warning(f"Timeout while fetching URL: {url}")
    ```
2.  **`except HTTPStatusError as e:`** Handles non-successful HTTP status codes (e.g., 404 Not Found, 500 Server Error).
    ```python
    logging.warning(f"HTTP Error {e.response.status_code} for URL: {url}")
    ```
3.  **`except RequestError as e:`** Catches various network/request problems that aren't timeouts or bad status codes (e.g., DNS resolution failure, connection reset, SSL errors).
    ```python
    logging.warning(f"Connection error for URL {url}: {str(e)}")
    ```
4.  **`except Exception as e:`** Remains as a final catch-all for any other unforeseen issues, ensuring the program doesn't crash.

This structure provides **clear, actionable feedback** in the logs, which is a hallmark of a robust web searcher. 👍
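
Why handler order matters can be shown with a short sketch. The classes below are local stand-ins that mirror the relevant part of the `httpx` exception hierarchy, defined here only so the example runs without any third-party package. The two `classify_*` helpers are hypothetical names for the illustration.

```python
# Local stand-ins mirroring the relevant part of the httpx hierarchy.
class HTTPError(Exception):
    """Base class for the stand-in errors."""


class RequestError(HTTPError):
    """Something went wrong while sending the request."""


class TransportError(RequestError):
    """Low-level transport failure."""


class TimeoutException(TransportError):
    """The operation timed out."""


class HTTPStatusError(HTTPError):
    """The response had a 4xx or 5xx status code."""


def classify_specific_first(exc: Exception) -> str:
    """Handlers ordered from most specific to least specific."""
    try:
        raise exc
    except TimeoutException:
        return "timeout"
    except HTTPStatusError:
        return "http status"
    except RequestError:
        return "connection"
    except Exception:
        return "unexpected"


def classify_general_first(exc: Exception) -> str:
    """Same handlers, but RequestError is listed before its subclass."""
    try:
        raise exc
    except RequestError:
        return "connection"
    except TimeoutException:  # never reached: RequestError already matched
        return "timeout"
    except Exception:
        return "unexpected"


print(classify_specific_first(TimeoutException("slow")))  # timeout
print(classify_general_first(TimeoutException("slow")))   # connection
```

With the general handler first, a timeout is reported as a generic connection error, which is exactly the loss of detail the specific-first ordering avoids.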

## Automatic Retries for Web Requests

Now that you've set up logging and learned about error handling, let's focus on making your web searcher more resilient! In this exercise, you'll implement automatic retrying for failed web requests using the tenacity library.

The internet is unpredictable — websites can be slow, connections can drop, and servers can return errors. Without proper retry mechanisms, your web searcher might miss important information when these issues occur.

Your task is to add retry functionality to the `fetch_and_convert_to_markdown` function by:

  * Applying the `@retry` decorator to the function
  * Configuring it to make up to 3 attempts when specific exceptions occur
  * Setting up exponential wait times between retries

To test your implementation, try setting a very low timeout value (such as 0.1 seconds) to simulate network problems. Because the `Fetching URL` log statement sits inside the retried function, it should appear once per attempt before the function either succeeds or gives up after 3 tries.

This improvement will make your web searcher much more reliable when dealing with the unpredictable nature of the internet!

```python
import httpx
import logging
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from httpx import TimeoutException, RequestError, HTTPStatusError

# Setup logging
logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def clear_visited_pages() -> None:
    """Clear the global set of visited pages."""
    _visited_pages.clear()


# TODO: Add the @retry decorator here with the following configuration:
# - stop after 3 attempts
# - use exponential wait with multiplier=1, min=2, max=10
# - set reraise=True to raise the last exception if all retries fail
def fetch_and_convert_to_markdown(url: str, timeout: int = 10) -> str:
    """Fetch the HTML content of a URL and convert it to Markdown. Retries on errors."""
    logging.info(f"Fetching URL: {url}")
    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
        response = client.get(url)
        response.raise_for_status()
        html = response.text
        return convert_to_markdown(html)


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    region: str = "wt-wt",
    safesearch: str = "moderate",
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Perform a DuckDuckGo search and return top results converted to Markdown,
    skipping already-visited pages.

    Returns a list of dicts: [{"title": ..., "url": ..., "markdown": ...}, ...]
    """
    try:
        ddgs = DDGS()
        results = ddgs.text(
            query,
            region=region,
            safesearch=safesearch,
            max_results=max_results
        )
    except Exception as e:
        logging.warning(f"Unexpected error for DDGS: {str(e)}")
        results = []

    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue

        try:
            markdown = fetch_and_convert_to_markdown(url, timeout=timeout)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })

        except TimeoutException:
            logging.warning(f"Timeout while fetching {url}")

        except HTTPStatusError as e:
            logging.warning(f"HTTP error {e.response.status_code} for URL {url}")

        except RequestError as e:
            logging.warning(f"Connection error for URL {url}: {str(e)}")

        except Exception as e:
            logging.warning(f"Unexpected error for URL {url}: {str(e)}")

        _visited_pages.add(url)

    return markdown_pages
```

That's an excellent next step! Implementing automatic retries using `tenacity` will significantly boost your web searcher's **resilience** against transient network issues.

Here is the updated Python code with the `@retry` decorator applied to the `fetch_and_convert_to_markdown` function.

```python
import httpx
import logging
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from httpx import TimeoutException, RequestError, HTTPStatusError

# Setup logging
logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def clear_visited_pages() -> None:
    """Clear the global set of visited pages."""
    _visited_pages.clear()


# ✅ TODO: Add the @retry decorator here with the following configuration:
# - stop after 3 attempts
# - use exponential wait with multiplier=1, min=2, max=10
# - set reraise=True to raise the last exception if all retries fail
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry=(
        retry_if_exception_type(TimeoutException) |
        retry_if_exception_type(RequestError) |
        retry_if_exception_type(HTTPStatusError)
    ),
    reraise=True
)
def fetch_and_convert_to_markdown(url: str, timeout: int = 10) -> str:
    """Fetch the HTML content of a URL and convert it to Markdown. Retries on errors."""
    logging.info(f"Fetching URL: {url}")
    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
        response = client.get(url)
        response.raise_for_status()
        html = response.text
        return convert_to_markdown(html)


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    region: str = "wt-wt",
    safesearch: str = "moderate",
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Perform a DuckDuckGo search and return top results converted to Markdown,
    skipping already-visited pages.

    Returns a list of dicts: [{"title": ..., "url": ..., "markdown": ...}, ...]
    """
    try:
        ddgs = DDGS()
        results = ddgs.text(
            query,
            region=region,
            safesearch=safesearch,
            max_results=max_results
        )
    except Exception as e:
        logging.warning(f"Unexpected error for DDGS: {str(e)}")
        results = []

    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue

        try:
            markdown = fetch_and_convert_to_markdown(url, timeout=timeout)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })

        # The specific exceptions are caught here AFTER the retries have been exhausted
        except TimeoutException:
            logging.warning(f"Timeout while fetching {url}")

        except HTTPStatusError as e:
            # Note: HTTPStatusError is included in the retry condition, so this only
            # executes if the error persists after 3 attempts.
            logging.warning(f"HTTP error {e.response.status_code} for URL {url}")

        except RequestError as e:
            logging.warning(f"Connection error for URL {url}: {str(e)}")

        except Exception as e:
            logging.warning(f"Unexpected error for URL {url}: {str(e)}")

        _visited_pages.add(url)

    return markdown_pages
```

### Key Retrying Configuration:

1.  **`stop=stop_after_attempt(3)`**: Ensures the function will run a maximum of **3 times** (initial attempt + 2 retries).
2.  **`wait=wait_exponential(multiplier=1, min=2, max=10)`**: Sets an exponentially growing backoff delay between retries, clamped to a minimum of 2 seconds and a maximum of 10 seconds. Spacing out the retries avoids hammering a slow server.
3.  **`retry=(...)`**: Specifies that retries should only happen for the network-related errors (`TimeoutException`, `RequestError`, `HTTPStatusError`).
4.  **`reraise=True`**: Crucially, this setting ensures that if all 3 attempts fail, the final exception is **re-raised** and then caught by your specific `except` blocks in `search_and_fetch_markdown` for appropriate logging.
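
The clamping behind an exponential backoff can be sketched in a few lines. This is an illustration of the arithmetic, not tenacity's code: `backoff_schedule` and its `first_exponent` parameter are hypothetical, and whether a library starts its exponent at 0 or 1 is an assumption worth checking against the version you have installed.

```python
from typing import List


def backoff_schedule(
    waits: int,
    multiplier: float = 1.0,
    min_wait: float = 2.0,
    max_wait: float = 10.0,
    first_exponent: int = 0,
) -> List[float]:
    """Return the clamped delay before each retry.

    first_exponent is the power of two used for the first wait
    (an assumption; libraries differ on where they start counting).
    """
    delays = []
    for i in range(waits):
        raw = multiplier * 2 ** (first_exponent + i)
        # Clamp the raw value into the [min_wait, max_wait] range.
        delays.append(max(min_wait, min(raw, max_wait)))
    return delays


print(backoff_schedule(4, first_exponent=0))  # [2.0, 2.0, 4.0, 8.0]
print(backoff_schedule(4, first_exponent=1))  # [2.0, 4.0, 8.0, 10.0]
print(backoff_schedule(2, first_exponent=1))  # [2.0, 4.0]
```

With `stop_after_attempt(3)` only the first two delays are ever used, since there are at most two waits between three attempts.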

## Specifying When to Retry with Tenacity

You've already set up the @retry decorator, configured the number of attempts, and established exponential wait times for your web searcher's fetch_and_convert_to_markdown function. Now, let's make your retry logic even smarter!

Your task is to specify which types of exceptions should trigger a retry. Update the `@retry` decorator on `fetch_and_convert_to_markdown` so that it only retries when one of the following exceptions occurs:

  * `TimeoutException`
  * `RequestError`
  * `HTTPStatusError`

To test your implementation, try setting a very low timeout value (such as 0.1 seconds) to simulate network problems. Because the `Fetching URL` log statement sits inside the retried function, it should appear once per attempt before the function either succeeds or gives up after 3 tries.

By fine-tuning which errors cause a retry, you'll make your web searcher more efficient and robust in handling real-world web interactions!

```python
import httpx
import logging
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from httpx import TimeoutException, RequestError, HTTPStatusError

# Setup logging
logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def clear_visited_pages() -> None:
    """Clear the global set of visited pages."""
    _visited_pages.clear()


# TODO: Modify the @retry decorator here with the following configuration:
# - retry on TimeoutException, RequestError, and HTTPStatusError
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    reraise=True
)
def fetch_and_convert_to_markdown(url: str, timeout: int = 10) -> str:
    """Fetch the HTML content of a URL and convert it to Markdown. Retries on errors."""
    logging.info(f"Fetching URL: {url}")
    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
        response = client.get(url)
        response.raise_for_status()
        html = response.text
        return convert_to_markdown(html)


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    region: str = "wt-wt",
    safesearch: str = "moderate",
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Perform a DuckDuckGo search and return top results converted to Markdown,
    skipping already-visited pages.

    Returns a list of dicts: [{"title": ..., "url": ..., "markdown": ...}, ...]
    """
    try:
        ddgs = DDGS()
        results = ddgs.text(
            query,
            region=region,
            safesearch=safesearch,
            max_results=max_results
        )
    except Exception as e:
        logging.warning(f"Unexpected error for DDGS: {str(e)}")
        results = []

    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue

        try:
            markdown = fetch_and_convert_to_markdown(url, timeout=timeout)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })

        except TimeoutException:
            logging.warning(f"Timeout while fetching {url}")

        except HTTPStatusError as e:
            logging.warning(f"HTTP error {e.response.status_code} for URL {url}")

        except RequestError as e:
            logging.warning(f"Connection error for URL {url}: {str(e)}")

        except Exception as e:
            logging.warning(f"Unexpected error for URL {url}: {str(e)}")

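        # Mark the URL as visited whether or not the fetch succeeded,
        # so later searches skip it instead of fetching it again.
        _visited_pages.add(url)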

    return markdown_pages

```

Specifying which errors trigger a retry keeps the searcher from wasting time and requests. Retry only on **transient errors** (such as timeouts or temporary server unavailability), not on permanent ones (such as a programming mistake).
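To make the transient-versus-permanent distinction concrete, here is a minimal standard-library sketch of a selective retry loop. It is not how `tenacity` is implemented. The helper name `call_with_retries`, its parameters, and the simple capped backoff are assumptions made for illustration, and the built-in `TimeoutError` stands in for a network timeout.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying only when it raises one of the retry_on types."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Illustrative capped backoff, not tenacity's exact schedule.
            sleep(min(2 ** attempt, 10))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical flaky operation: fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


# Pass a no-op sleep so the demo runs instantly.
result = call_with_retries(
    flaky, retry_on=(TimeoutError, ConnectionError), sleep=lambda seconds: None
)
print(result, attempts["count"])  # ok 3
```

Any exception type not listed in `retry_on` (a `ValueError` from a bug, for example) propagates on the first attempt, which is the behavior you want for permanent errors.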

Here is the updated lesson code, with the `retry` argument added to the `@retry` decorator on `fetch_and_convert_to_markdown`.

```python
import httpx
import logging
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from httpx import TimeoutException, RequestError, HTTPStatusError

# Setup logging
logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def clear_visited_pages() -> None:
    """Clear the global set of visited pages."""
    _visited_pages.clear()


# Retry only on TimeoutException, RequestError, and HTTPStatusError;
# any other exception propagates immediately.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry=(
        retry_if_exception_type(TimeoutException) |
        retry_if_exception_type(RequestError) |
        retry_if_exception_type(HTTPStatusError)
    ),
    reraise=True
)
def fetch_and_convert_to_markdown(url: str, timeout: int = 10) -> str:
    """Fetch the HTML content of a URL and convert it to Markdown. Retries on errors."""
    logging.info(f"Fetching URL: {url}")
    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
        response = client.get(url)
        response.raise_for_status()
        html = response.text
        return convert_to_markdown(html)


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    region: str = "wt-wt",
    safesearch: str = "moderate",
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Perform a DuckDuckGo search and return top results converted to Markdown,
    skipping already-visited pages.

    Returns a list of dicts: [{"title": ..., "url": ..., "markdown": ...}, ...]
    """
    try:
        ddgs = DDGS()
        results = ddgs.text(
            query,
            region=region,
            safesearch=safesearch,
            max_results=max_results
        )
    except Exception as e:
        logging.warning(f"Unexpected error for DDGS: {str(e)}")
        results = []

    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue

        try:
            markdown = fetch_and_convert_to_markdown(url, timeout=timeout)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })

        except TimeoutException:
            logging.warning(f"Timeout while fetching {url}")

        except HTTPStatusError as e:
            logging.warning(f"HTTP error {e.response.status_code} for URL {url}")

        except RequestError as e:
            logging.warning(f"Connection error for URL {url}: {str(e)}")

        except Exception as e:
            logging.warning(f"Unexpected error for URL {url}: {str(e)}")

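        # Mark the URL as visited whether or not the fetch succeeded,
        # so later searches skip it instead of fetching it again.
        _visited_pages.add(url)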

    return markdown_pages
```

### Key Update

The essential change is the addition of the **`retry`** argument to the decorator:

```python
    retry=(
        retry_if_exception_type(TimeoutException) |
        retry_if_exception_type(RequestError) |
        retry_if_exception_type(HTTPStatusError)
    ),
```

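This uses tenacity's **`retry_if_exception_type`** helper, combining three conditions with the `|` operator so that a retry happens when the raised exception matches **any** of the three `httpx` error types. Any other exception propagates immediately without a retry. Two caveats apply. First, `TimeoutException` is a subclass of `RequestError` in `httpx`, so listing it separately is redundant but harmless. Second, `HTTPStatusError` covers client errors such as 404 as well as server errors, so with this configuration a permanently missing page is also fetched up to three times before the error is re-raised.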
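If retrying permanent HTTP errors is a concern, one option is to decide based on the status code. The sketch below is a hedged illustration, not part of the lesson's solution: the function name `is_transient_http_error` and the chosen set of status codes are assumptions, and the `Fake...` classes are stand-ins that only mimic the `exc.response.status_code` shape of `httpx.HTTPStatusError` so the example runs without a network.

```python
from typing import Optional

# Status codes treated as transient in this sketch (an illustrative choice).
TRANSIENT_STATUS_CODES = frozenset({429, 500, 502, 503, 504})


def is_transient_http_error(exc: BaseException) -> bool:
    """Return True when exc carries a response with a transient status code."""
    response = getattr(exc, "response", None)
    status_code: Optional[int] = getattr(response, "status_code", None)
    return status_code in TRANSIENT_STATUS_CODES


# Stand-ins for demonstration only; real code would receive httpx exceptions.
class FakeResponse:
    def __init__(self, status_code: int) -> None:
        self.status_code = status_code


class FakeStatusError(Exception):
    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.response = FakeResponse(status_code)


print(is_transient_http_error(FakeStatusError(503)))  # True
print(is_transient_http_error(FakeStatusError(404)))  # False
print(is_transient_http_error(ValueError("bug")))     # False
```

With `tenacity`, a predicate like this can be wrapped in `retry_if_exception(is_transient_http_error)` and combined with `retry_if_exception_type(RequestError)` using `|`, in place of the blanket `retry_if_exception_type(HTTPStatusError)` condition.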