# Unit 3

## Avoiding Common Pitfalls in Our Web Searcher

# Making Your Web Searcher More Reliable

Welcome back! So far, you've learned how to search the web using **Python** and how to build a module that fetches and processes web content. In this lesson, we'll focus on making your **web searcher** more reliable by avoiding common mistakes that can cause problems in automated research.

When you build tools that interact with the web, you'll often run into issues like duplicate results, broken links, or slow responses. If you don't handle these problems, your tool might waste time, give you bad data, or even stop working. By learning how to avoid these pitfalls, you'll make your web searcher much more robust and useful.

-----

## Recall: Our Web Searcher Workflow

Let's quickly remind ourselves what our web searcher does. In the previous lessons, you learned how to:

1.  Use the **`DDGS`** class from the `ddgs` library to search the web for a query.
2.  Fetch the content of the top search results using the **`httpx`** library.
3.  Convert the HTML content of each page to Markdown using **`html_to_markdown`**.

All of these steps are combined in a function that takes a search query and returns a list of results, each with a title, URL, and Markdown content. Now, let's see how we can improve this process by handling some common issues.

-----

## Tracking and Skipping Visited URLs

One common problem is processing the same web page more than once. This can happen if the same URL appears in multiple searches or if your code is run multiple times. To avoid this, we need a way to **remember which pages we have already visited**.

1.  **Create a Set:**
    Let's start by creating a set to keep track of visited URLs. Sets are useful because they don't allow duplicate values, and checking if a value is in a set is very fast.

    ```python
    from typing import Set

    _visited_pages: Set[str] = set()
    ```

2.  **Check and Skip:**
    Now, before we fetch a page, we check if its URL is already in `_visited_pages`. If it is, we skip it:

    ```python
    if not url or url in _visited_pages:
        continue  # skip already-visited or invalid URLs
    ```

      * `not url` checks if the URL is missing or empty.
      * `url in _visited_pages` checks if we have already seen this URL.

3.  **Mark as Visited:**
    After we successfully fetch and process a page, we add its URL to the set:

    ```python
    _visited_pages.add(url)
    ```

This way, we make sure we don't process the same page twice.

-----

## Handling Errors When Fetching Pages

Another common issue is that some web pages might be **broken, slow, or unreachable**. If your code tries to fetch a page and something goes wrong, it could crash or get stuck.

To handle this, we use a `try` and `except` block when fetching each page:

```python
try:
    # 1. Attempt to fetch the page and handle redirects
    response = httpx.get(url, timeout=timeout, follow_redirects=True)
    # 2. Raise an exception for bad status codes (4xx, 5xx)
    response.raise_for_status()

    html = response.text
    markdown = convert_to_markdown(html)

    # 3. Mark the URL as visited once the fetch and conversion succeed
    _visited_pages.add(url)

    markdown_pages.append({
        "title": title,
        "url": url,
        "markdown": markdown
    })
except Exception as e:
    # 4. Handle the error
    _visited_pages.add(url)  # mark it to avoid retrying this bad page
    markdown_pages.append({
        "title": title or "Error",
        "url": url,
        "markdown": f"**Error fetching content from** `{url}`: {e}"
    })
```

**Key Takeaways:**

  * The **`try`** block attempts to fetch the page and convert it to Markdown.
  * If anything goes wrong (e.g., page doesn't load or server returns an error), the code jumps to the **`except`** block.
  * In the `except` block, we still add the URL to `_visited_pages` so we don't try it again.
  * We add a result to `markdown_pages` with a clear **error message**, so we know what went wrong.

**Example Output** (the wording after the colon comes from `httpx` and may differ between versions):

```markdown
**Error fetching content from** `http://example.com/badpage`: Client error '404 Not Found' for url 'http://example.com/badpage'
```

-----

## Using Timeouts and Safe Search Settings

To keep your tool fast and responsive, and to ensure you get appropriate results, you should use timeouts and safe search settings.

1.  **Setting a Timeout:**
    When fetching a page, we set a `timeout` parameter:

    ```python
    response = httpx.get(url, timeout=timeout, follow_redirects=True)
    ```

    The `timeout` makes sure that if a page takes too long to load, your code will stop waiting and move on.

2.  **Safe Search and Region:**
    When searching with `DDGS`, we can set the `safesearch` and `region` parameters:

    ```python
    results = ddgs.text(
        query,
        region=region,
        safesearch=safesearch,
        max_results=max_results
    )
    ```

      * `safesearch` controls how strictly adult or otherwise inappropriate content is filtered out of the results.
      * `region` can help you get results that are more relevant to your location or language.

**Example Timeout Error:**

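If a page takes too long to load, `httpx` raises a timeout exception, and the recorded error looks something like this (the text after the colon comes from the exception, so the exact wording can vary):

```markdown
**Error fetching content from** `http://slowwebsite.com`: The read operation timed out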
```
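
Some exceptions carry an empty message, in which case `{e}` alone leaves nothing after the colon. One option, sketched here with a hypothetical helper named `format_fetch_error` (not part of the lesson's module), is to always include the exception's class name. Python's built-in `TimeoutError` stands in for a library-specific timeout exception.

```python
def format_fetch_error(url: str, exc: Exception) -> str:
    """Build an error message that always names the exception type."""
    name = type(exc).__name__
    detail = str(exc)
    # Append the message only when the exception actually has one.
    suffix = f"{name}: {detail}" if detail else name
    return f"**Error fetching content from** `{url}`: {suffix}"


print(format_fetch_error("http://slowwebsite.com", TimeoutError()))
print(format_fetch_error("http://slowwebsite.com", ValueError("bad data")))
```

Whether the extra detail is worth it depends on how you read the results; for debugging many failed fetches, the type name is often the most useful part.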

-----

## Summary and What's Next

In this lesson, you learned how to make your web searcher more reliable by:

  * **Tracking and skipping** already-visited URLs to avoid duplicates and save time.
  * **Handling errors** when fetching pages, so your tool doesn't crash or get stuck.
  * Using **timeouts** and **safe search** settings to keep your searches fast and appropriate.

These improvements will help you build a more robust and efficient research tool. In the next practice exercises, you'll get a chance to apply these ideas and see how they make your web searcher stronger. Good luck!

## Skipping Duplicate URLs for Efficiency

Now that you've learned about the importance of tracking visited URLs, let's put this knowledge into practice! In this exercise, you'll implement the URL checking mechanism we discussed in the lesson.

Your task is to modify the `search_and_fetch_markdown` function to avoid processing duplicate URLs. You need to add a global set to keep track of all the visited URLs and a condition that checks whether a URL is empty or has already been visited before attempting to fetch it.

The provided code in the `app.py` file will run two similar searches that might return overlapping results. When your code is working correctly, you'll see that:

  * The first search processes all new URLs
  * The second search processes only URLs that weren't found in the first search
  * The `_visited_pages` set grows appropriately

This improvement will make your web searcher much more efficient by preventing it from doing the same work twice!

```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown
from typing import List, Dict, Set


# TODO: Create a global set to track visited URLs


def fetch_web_page(url, timeout: int = 10):
    """
    Fetch the content of a web page using httpx.
    """
    response = httpx.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text

def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown

def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    Skips already-visited URLs.
    """
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(query, max_results=max_results)
    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        # TODO: Add a condition to check if the URL is empty or already visited
        # If either condition is true, use 'continue' to skip to the next result

        try:
            html = fetch_web_page(url, timeout=timeout)
            markdown = html_to_markdown_converter(html)
            # TODO: Add url to global visited pages set
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })
        except Exception as e:
            print(f"Error: {str(e)}")

    return markdown_pages

```

Here is the corrected and completed Python code for `app.py`. I've added the global set, the check to skip visited or empty URLs, and the logic to mark URLs as visited.

```python
import httpx
from ddgs import DDGS
from html_to_markdown import convert_to_markdown
from typing import List, Dict, Set


# TODO: Create a global set to track visited URLs
_visited_pages: Set[str] = set()


def fetch_web_page(url, timeout: int = 10):
    """
    Fetch the content of a web page using httpx.
    """
    response = httpx.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text

def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown

def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    Skips already-visited URLs.
    """
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(query, max_results=max_results)
    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        # TODO: Add a condition to check if the URL is empty or already visited
        # If either condition is true, use 'continue' to skip to the next result
        if not url or url in _visited_pages:
            print(f"Skipping duplicate or invalid URL: {url}")
            continue

        try:
            html = fetch_web_page(url, timeout=timeout)
            markdown = html_to_markdown_converter(html)
            
            # TODO: Add url to global visited pages set
            _visited_pages.add(url)
            
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })
        except Exception as e:
            # We still add the URL to the set even if fetching failed to prevent retrying a 'bad' URL
            _visited_pages.add(url)
            print(f"Error fetching {url}: {str(e)}")

    return markdown_pages
```

### Key Changes Implemented:

1.  **Global Set Declaration:**

    ```python
    _visited_pages: Set[str] = set()
    ```

    A global `set` named `_visited_pages` was created outside the function to maintain state across multiple function calls.

2.  **Skipping Logic:**

    ```python
    if not url or url in _visited_pages:
        print(f"Skipping duplicate or invalid URL: {url}")
        continue
    ```

    This condition uses `continue` to efficiently skip the rest of the loop iteration if the URL is empty (`not url`) or has already been processed (`url in _visited_pages`).

3.  **Adding to Set:**

    ```python
    _visited_pages.add(url)
    ```

    On the success path, the URL is added to the set **after** the fetch and Markdown conversion succeed and **before** the page is appended to the results list. A second `_visited_pages.add(url)` inside the `except` block marks failing URLs as visited too, so they aren't retried unnecessarily.
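
One limitation worth knowing: the set compares URLs as exact strings, so `https://example.com/page` and `https://example.com/page/#intro` count as two different pages. If that matters for your searches, a small normalization step before the membership check can help. The sketch below is an optional extension, not part of the exercise; `normalize_url` is a hypothetical helper, and its rules (lowercase scheme and host, drop the fragment, strip trailing slashes) are assumptions you may want to adjust, since some servers do treat `/page` and `/page/` differently.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a canonical-ish form of the URL for duplicate detection."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"  # treat "/page/" and "/page" alike
    # Lowercase the scheme and host, keep the query string, drop the #fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


visited = set()
for raw in ["https://Example.com/page", "https://example.com/page/#intro"]:
    visited.add(normalize_url(raw))

print(visited)
```

Both inputs collapse to the same string, so the set ends up with a single entry.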

## Graceful Error Handling for Web Requests

Now that you've implemented URL tracking to avoid duplicates, let's focus on another critical aspect of web searching: error handling! In the real world, not all web requests succeed — sites go down, connections time out, and servers return errors.

Your task is to complete the error handling in the `search_and_fetch_markdown` function. When an exception occurs during fetching, you need to:

  * Add the URL to `_visited_pages` so you don't try to fetch it again
  * Create a properly formatted error message that shows which URL failed and what happened
  * Add this error information to the results list instead of letting the program crash

The provided test file will try to fetch content from both valid and invalid URLs. When your code works correctly, you'll see that your program continues running even when it encounters bad URLs and provides helpful error messages instead of crashing.

This improvement will make your web searcher much more robust when dealing with the unpredictable nature of the internet!

```python
import httpx
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def fetch_web_page(url, timeout: int = 10):
    """
    Fetch the content of a web page using httpx.
    """
    response = httpx.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    Skips already-visited URLs.
    """
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(query, max_results=max_results)
    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue  # skip already-visited or invalid URLs

        try:
            html = fetch_web_page(url, timeout=timeout)
            markdown = html_to_markdown_converter(html)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })
        except Exception as e:
            # TODO: Add the URL to _visited_pages to avoid retrying this bad page

            # TODO: Add an entry to markdown_pages with an error message that includes:
            # - The title (or "Error" if title is empty)
            # - The URL
            # - A formatted error message showing which URL failed and what the error was
            pass

    return markdown_pages
```

This exercise requires you to complete the `try...except` block to ensure your web searcher handles failed requests gracefully.

Here is the completed Python code for the `search_and_fetch_markdown` function, which now includes the logic to mark bad URLs as visited and append a descriptive error message to the results list.

```python
import httpx
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def fetch_web_page(url, timeout: int = 10):
    """
    Fetch the content of a web page using httpx.
    """
    response = httpx.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    Skips already-visited URLs.
    """
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(query, max_results=max_results)
    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue  # skip already-visited or invalid URLs

        try:
            html = fetch_web_page(url, timeout=timeout)
            markdown = html_to_markdown_converter(html)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })
        except Exception as e:
            # TODO: Add the URL to _visited_pages to avoid retrying this bad page
            _visited_pages.add(url)

            # TODO: Add an entry to markdown_pages with an error message that includes:
            # - The title (or "Error" if title is empty)
            # - The URL
            # - A formatted error message showing which URL failed and what the error was
            error_message = f"**Error fetching content from** `{url}`: {e}"
            markdown_pages.append({
                "title": title or "Error fetching page",
                "url": url,
                "markdown": error_message
            })

    return markdown_pages
```

## Resetting URL Tracking for Fresh Searches

You've done a fantastic job handling duplicate URLs and errors! Now, let's add another important feature — the ability to reset our URL tracking system.

Sometimes, you'll want to run the same search again or process the same URLs after making changes to your code. For this, you need a way to clear the list of visited pages.

Your task has two parts:

1.  Complete the `clear_visited_pages()` function in the `web_searcher.py` file to reset the global `_visited_pages` set.
2.  In the test file, add the missing function calls to demonstrate how clearing works:
      * Start with a clean slate
      * Run a search and see which URLs are processed
      * Clear the visited pages
      * Run the same search again to show that previously visited URLs are processed again

When your code works correctly, you'll see that after clearing the visited pages, the same search will process the same URLs again as if they were new. This feature makes your web searcher more flexible for repeated research tasks!

```python
import httpx
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def fetch_web_page(url, timeout: int = 10):
    """
    Fetch the content of a web page using httpx.
    """
    response = httpx.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


# TODO: Implement the clear_visited_pages() function


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    Skips already-visited URLs.
    """
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(query, max_results=max_results)
    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue  # skip already-visited or invalid URLs

        try:
            html = fetch_web_page(url, timeout=timeout)
            markdown = html_to_markdown_converter(html)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })
        except Exception as e:
            _visited_pages.add(url)  # mark it to avoid retrying this bad page
            markdown_pages.append({
                "title": title or "Error",
                "url": url,
                "markdown": f"**Error fetching content from** `{url}`: {e}"
            })

    return markdown_pages
```

Sure, I'll complete the `clear_visited_pages()` function and give an example of how it can be used in a testing context.

### 1. Implementing `clear_visited_pages()`

You can implement this function by calling the `.clear()` method on the global `_visited_pages` set.

Here is the updated Python code, including the requested function:

```python
import httpx
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def fetch_web_page(url, timeout: int = 10):
    """
    Fetch the content of a web page using httpx.
    """
    response = httpx.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


# TODO: Implement the clear_visited_pages() function
def clear_visited_pages() -> None:
    """
    Resets the global set of visited URLs, allowing previously processed URLs to be fetched again.
    """
    # No `global` statement is needed: .clear() mutates the existing set in place.
    _visited_pages.clear()


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    Skips already-visited URLs.
    """
    ddgs = DDGS(timeout=timeout)
    results = ddgs.text(query, max_results=max_results)
    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue  # skip already-visited or invalid URLs

        try:
            html = fetch_web_page(url, timeout=timeout)
            markdown = html_to_markdown_converter(html)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })
        except Exception as e:
            _visited_pages.add(url)  # mark it to avoid retrying this bad page
            markdown_pages.append({
                "title": title or "Error",
                "url": url,
                "markdown": f"**Error fetching content from** `{url}`: {e}"
            })

    return markdown_pages
```

-----

### 2. Example Usage for Re-running Searches

Here is an example script (for instance, in the test file) that shows how `clear_visited_pages()` lets the same URLs be processed again:

```python
# Assumes the code above is saved as web_searcher.py (adjust the import to your project layout).
# Importing _visited_pages by name works here because clear_visited_pages() empties the set in place.
from web_searcher import search_and_fetch_markdown, clear_visited_pages, _visited_pages

# The same query is used for every search
SEARCH_QUERY = "Python programming"

# --- Step 1: Reset the initial state (make sure the set is empty) ---
clear_visited_pages()
print(f"Initial state: _visited_pages has {len(_visited_pages)} URLs.")

# --- Step 2: Run the first search ---
print("\n--- Running Search 1 ---")
results1 = search_and_fetch_markdown(SEARCH_QUERY, max_results=3)
print(f"Search 1 processed {len(results1)} results.")
print(f"After Search 1: _visited_pages has {len(_visited_pages)} URLs.")
# At this point, every unique URL has been added to _visited_pages.

# --- Step 3: Run the second search (without clearing) ---
print("\n--- Running Search 2 (Without Clearing) ---")
results2 = search_and_fetch_markdown(SEARCH_QUERY, max_results=3)
print(f"Search 2 processed {len(results2)} results (only new/different ones).")
print(f"After Search 2: _visited_pages has {len(_visited_pages)} URLs.")
# If all results match Search 1, results2 will be empty or very short.

# --- Step 4: Clear the state ---
print("\n--- Clearing Visited Pages ---")
clear_visited_pages()
print(f"After clearing: _visited_pages has {len(_visited_pages)} URLs.")

# --- Step 5: Run the third search (after clearing) ---
print("\n--- Running Search 3 (After Clearing) ---")
results3 = search_and_fetch_markdown(SEARCH_QUERY, max_results=3)
print(f"Search 3 processed {len(results3)} results.")
print(f"After Search 3: _visited_pages has {len(_visited_pages)} URLs.")
# Because the set was reset, Search 3 processes the same URLs as Search 1.
```

## Customizing Search Results with Parameters

You've made excellent progress with URL tracking and error handling! Now, let's enhance our web searcher by customizing the search results themselves. In the lesson, we mentioned the `safesearch` and `region` parameters that can help filter and localize your search results.

Your task is to add these parameters to the `search_and_fetch_markdown` function and update the `DDGS` search call to properly use them.

The provided tests in `app.py` will help you experiment with:

  * Different `safesearch` levels ("off", "moderate", "strict") to see how content filtering works
  * Different `region` settings (like "us-en", "uk-en", "de-de") to see how results vary by location

By implementing these parameters, you'll make your web searcher more adaptable to different research needs and audiences. This is especially useful when you need to filter out inappropriate content or find region-specific information!

```python
import httpx
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def fetch_web_page(url, timeout: int = 10):
    """
    Fetch the content of a web page using httpx.
    """
    response = httpx.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def html_to_markdown_converter(html_content, max_length=500):
    """
    Convert HTML content to Markdown format and trim to specified length.
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


def clear_visited_pages() -> None:
    """Clear the global set of visited pages."""
    _visited_pages.clear()


# TODO: Add a region parameter with "wt-wt" as default and a safesearch parameter with "moderate" as default
def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    timeout: int = 10
) -> List[Dict[str, str]]:
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    Skips already-visited URLs.
    """
    ddgs = DDGS(timeout=timeout)
    # TODO: Update the search call to pass the region and safesearch parameters
    results = ddgs.text(
        query,
        max_results=max_results
    )
    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue  # skip already-visited or invalid URLs

        try:
            html = fetch_web_page(url, timeout=timeout)
            markdown = html_to_markdown_converter(html)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })
        except Exception as e:
            _visited_pages.add(url)  # mark it to avoid retrying this bad page
            markdown_pages.append({
                "title": title or "Error",
                "url": url,
                "markdown": f"**Error fetching content from** `{url}`: {e}"
            })

    return markdown_pages

# app.py
from deepresearcher.web.web_searcher import search_and_fetch_markdown, clear_visited_pages

def test_safesearch_settings():
    """Test how different safesearch settings affect search results."""
    
    # The query that might return different results based on safesearch
    query = "controversial topics 2023"
    max_results = 3
    
    print(f"Testing safesearch settings with query: '{query}'")
    
    # Test with safesearch off
    print("\n--- SafeSearch: OFF ---")
    clear_visited_pages()  # Start fresh for each test
    # TODO: Call search_and_fetch_markdown with safesearch="off"
    results_off = None
    
    print(f"Found {len(results_off) if results_off else 0} results:")
    for i, result in enumerate(results_off or [], 1):
        print(f"{i}. {result['title']} - {result['url']}")
    
    # Test with safesearch moderate
    print("\n--- SafeSearch: MODERATE ---")
    clear_visited_pages()
    # TODO: Call search_and_fetch_markdown with safesearch="moderate"
    results_moderate = None
    
    print(f"Found {len(results_moderate) if results_moderate else 0} results:")
    for i, result in enumerate(results_moderate or [], 1):
        print(f"{i}. {result['title']} - {result['url']}")
    
    # Test with safesearch strict
    print("\n--- SafeSearch: STRICT ---")
    clear_visited_pages()
    # TODO: Call search_and_fetch_markdown with safesearch="strict"
    results_strict = None
    
    print(f"Found {len(results_strict) if results_strict else 0} results:")
    for i, result in enumerate(results_strict or [], 1):
        print(f"{i}. {result['title']} - {result['url']}")
    
    # Compare results
    print("\n--- COMPARISON ---")
    
    # TODO: Add code to compare the results from different safesearch settings
    # Hint: Create sets of URLs from each result list and compare them


def test_region_settings():
    """Test how different region settings affect search results."""
    
    # A query that might return different results based on region
    query = "local news today"
    max_results = 3
    
    # Define regions to test
    regions = {
        "us-en": "United States (English)",
        "uk-en": "United Kingdom (English)",
        "de-de": "Germany (German)"
    }
    
    print(f"Testing region settings with query: '{query}'")
    
    # Store results for each region
    all_results = {}
    
    # Test each region
    for region_code, region_name in regions.items():
        print(f"\n--- Region: {region_name} ({region_code}) ---")
        clear_visited_pages()  # Start fresh for each test
        
        # TODO: Call search_and_fetch_markdown with the current region_code
        results = None
        
        all_results[region_code] = results or []
        
        print(f"Found {len(results) if results else 0} results:")
        for i, result in enumerate(results or [], 1):
            print(f"{i}. {result['title']} - {result['url']}")
    
    # Compare results between regions
    print("\n--- COMPARISON ---")
    
    # TODO: Add code to compare the results from different regions
    # Hint: Create sets of URLs for each region and compare them

if __name__ == "__main__":
    test_safesearch_settings()
    test_region_settings()
```
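
The comparison hinted at in the TODO comments comes down to a few set operations. Here is a minimal sketch using made-up result lists in place of real `search_and_fetch_markdown` output; `url_set` is a hypothetical helper, and the sample URLs are invented for illustration.

```python
from typing import Dict, List, Set


def url_set(results: List[Dict[str, str]]) -> Set[str]:
    """Collect the non-empty URLs from a list of result dictionaries."""
    return {r["url"] for r in results if r.get("url")}


# Made-up sample results standing in for two searches with different regions.
results_us = [
    {"title": "A", "url": "https://example.com/a"},
    {"title": "B", "url": "https://example.com/b"},
]
results_de = [
    {"title": "B", "url": "https://example.com/b"},
    {"title": "C", "url": "https://example.de/c"},
]

us_urls, de_urls = url_set(results_us), url_set(results_de)
shared = us_urls & de_urls    # URLs returned for both regions
only_us = us_urls - de_urls   # URLs unique to the first region
only_de = de_urls - us_urls   # URLs unique to the second region

print(f"Shared: {sorted(shared)}")
print(f"Only in us-en: {sorted(only_us)}")
print(f"Only in de-de: {sorted(only_de)}")
```

The same approach works for the three `safesearch` result lists: build one set per setting and compare them pairwise.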

Sure, I'll complete the `search_and_fetch_markdown` function in `web_searcher.py` with the `region` and `safesearch` parameters, and finish the test file (`app.py`) by calling the function with different parameter values.

## 1. Updated `web_searcher.py` Code

I've added the `region` and `safesearch` parameters to `search_and_fetch_markdown` and updated the `ddgs.text()` call:

```python
import httpx
from typing import List, Dict, Set
from ddgs import DDGS
from html_to_markdown import convert_to_markdown

# Global set to track visited URLs
_visited_pages: Set[str] = set()


def fetch_web_page(url: str, timeout: int = 10) -> str:
    """
    Fetch the content of a web page using httpx.
    """
    response = httpx.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def html_to_markdown_converter(html_content: str, max_length: int = 500) -> str:
    """
    Convert HTML content to Markdown format and trim to specified length.
    """
    markdown_content = convert_to_markdown(html_content)
    trimmed_markdown = markdown_content.strip()[:max_length]
    return trimmed_markdown


def clear_visited_pages() -> None:
    """Clear the global set of visited pages."""
    global _visited_pages
    _visited_pages.clear()


# region defaults to "wt-wt" and safesearch defaults to "moderate"
def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,
    timeout: int = 10,
    region: str = "wt-wt",  # Default: Worldwide/All regions
    safesearch: str = "moderate" # Default: Moderate filtering
) -> List[Dict[str, str]]:
    """
    Search the web for the query, fetch multiple results, and return their Markdown content.
    Skips already-visited URLs.
    """
    ddgs = DDGS(timeout=timeout)
    # Pass the region and safesearch settings through to the search call
    results = ddgs.text(
        query,
        max_results=max_results,
        region=region,
        safesearch=safesearch
    )
    markdown_pages = []

    for result in results:
        url = result.get("href")
        title = result.get("title", "")

        if not url or url in _visited_pages:
            continue  # skip already-visited or invalid URLs

        try:
            html = fetch_web_page(url, timeout=timeout)
            markdown = html_to_markdown_converter(html)
            _visited_pages.add(url)
            markdown_pages.append({
                "title": title,
                "url": url,
                "markdown": markdown
            })
        except Exception as e:
            _visited_pages.add(url)  # mark it to avoid retrying this bad page
            markdown_pages.append({
                "title": title or "Error",
                "url": url,
                "markdown": f"**Error fetching content from** `{url}`: {e}"
            })

    return markdown_pages
```
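
To see how the new keyword arguments travel from the caller to the search call without touching the network, here is a minimal sketch. Both `fake_text_search` and `search_titles` are hypothetical names introduced only for this illustration; `fake_text_search` is a stand-in that echoes the settings it receives and is not part of the `ddgs` library.

```python
from typing import Callable, Dict, List


def fake_text_search(
    query: str,
    max_results: int = 5,
    region: str = "wt-wt",
    safesearch: str = "moderate",
) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a search backend: it echoes its settings."""
    return [
        {
            "title": f"{query} [{region}/{safesearch}] #{i}",
            "href": f"https://example.com/{region}/{safesearch}/{i}",
        }
        for i in range(1, max_results + 1)
    ]


def search_titles(
    query: str,
    max_results: int = 5,
    region: str = "wt-wt",
    safesearch: str = "moderate",
    search_fn: Callable[..., List[Dict[str, str]]] = fake_text_search,
) -> List[str]:
    """Forward region and safesearch to the search function as keywords."""
    results = search_fn(
        query,
        max_results=max_results,
        region=region,
        safesearch=safesearch,
    )
    return [result["title"] for result in results]


# Defaults apply when the caller does not pass the new parameters.
print(search_titles("news", max_results=1))
# Explicit values override the defaults.
print(search_titles("news", max_results=1, region="de-de", safesearch="off"))
```

Because the new parameters have defaults, existing calls that pass only `query` and `max_results` keep working unchanged.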

-----

## 2\. Updated `app.py` Code

Here is the test code, completed with the correct function calls and the logic for comparing results:

```python
# app.py
from deepresearcher.web.web_searcher import search_and_fetch_markdown, clear_visited_pages

def test_safesearch_settings():
    """Test how different safesearch settings affect search results."""
    
    # A query that might return different results based on safesearch
    query = "controversial topics 2023"
    max_results = 3
    
    print(f"Testing safesearch settings with query: '{query}'")
    
    # Test with safesearch off
    print("\n--- SafeSearch: OFF ---")
    clear_visited_pages()  # Start fresh for each test
    # Call search_and_fetch_markdown with safesearch="off"
    results_off = search_and_fetch_markdown(query, max_results=max_results, safesearch="off")
    
    print(f"Found {len(results_off) if results_off else 0} results:")
    for i, result in enumerate(results_off or [], 1):
        print(f"{i}. {result['title']} - {result['url']}")
    
    # Test with safesearch moderate
    print("\n--- SafeSearch: MODERATE ---")
    clear_visited_pages()
    # Call search_and_fetch_markdown with safesearch="moderate"
    results_moderate = search_and_fetch_markdown(query, max_results=max_results, safesearch="moderate")
    
    print(f"Found {len(results_moderate) if results_moderate else 0} results:")
    for i, result in enumerate(results_moderate or [], 1):
        print(f"{i}. {result['title']} - {result['url']}")
    
    # Test with safesearch strict
    print("\n--- SafeSearch: STRICT ---")
    clear_visited_pages()
    # Call search_and_fetch_markdown with safesearch="strict"
    results_strict = search_and_fetch_markdown(query, max_results=max_results, safesearch="strict")
    
    print(f"Found {len(results_strict) if results_strict else 0} results:")
    for i, result in enumerate(results_strict or [], 1):
        print(f"{i}. {result['title']} - {result['url']}")
    
    # Compare results
    print("\n--- COMPARISON ---")
    
    # Build a set of URLs for each safesearch setting, then compare them
    urls_off = {r['url'] for r in results_off}
    urls_moderate = {r['url'] for r in results_moderate}
    urls_strict = {r['url'] for r in results_strict}
    
    print(f"URLs in OFF but NOT in MODERATE: {urls_off - urls_moderate}")
    print(f"URLs in MODERATE but NOT in STRICT: {urls_moderate - urls_strict}")


def test_region_settings():
    """Test how different region settings affect search results."""
    
    # A query that might return different results based on region
    query = "local news today"
    max_results = 3
    
    # Define regions to test
    regions = {
        "us-en": "United States (English)",
        "uk-en": "United Kingdom (English)",
        "de-de": "Germany (German)"
    }
    
    print(f"Testing region settings with query: '{query}'")
    
    # Store results for each region
    all_results = {}
    
    # Test each region
    for region_code, region_name in regions.items():
        print(f"\n--- Region: {region_name} ({region_code}) ---")
        clear_visited_pages()  # Start fresh for each test
        
        # Call search_and_fetch_markdown with the current region_code
        results = search_and_fetch_markdown(query, max_results=max_results, region=region_code)
        
        all_results[region_code] = results or []
        
        print(f"Found {len(results) if results else 0} results:")
        for i, result in enumerate(results or [], 1):
            print(f"{i}. {result['title']} - {result['url']}")
    
    # Compare results between regions
    print("\n--- COMPARISON ---")
    
    # Build a set of URLs for each region, then compare them
    urls_us = {r['url'] for r in all_results['us-en']}
    urls_uk = {r['url'] for r in all_results['uk-en']}
    urls_de = {r['url'] for r in all_results['de-de']}
    
    print(f"URLs found ONLY in US: {urls_us - urls_uk - urls_de}")
    print(f"URLs found ONLY in UK: {urls_uk - urls_us - urls_de}")
    print(f"URLs found ONLY in DE: {urls_de - urls_us - urls_uk}")

if __name__ == "__main__":
    test_safesearch_settings()
    test_region_settings()
```
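
The comparisons above hard-code one set difference per setting. As a rough sketch of how the same idea could be generalized, the helpers below take any mapping of label to result list and report which URLs are unique to each label and which are shared by all. The names `unique_urls_by_label` and `shared_urls` and the sample data are assumptions made for this illustration; they are not part of the lesson's modules.

```python
from typing import Dict, List, Set

ResultList = List[Dict[str, str]]


def unique_urls_by_label(results_by_label: Dict[str, ResultList]) -> Dict[str, Set[str]]:
    """For each label, return the URLs that appear under no other label."""
    url_sets = {
        label: {result["url"] for result in results}
        for label, results in results_by_label.items()
    }
    unique: Dict[str, Set[str]] = {}
    for label, urls in url_sets.items():
        # Union of the URL sets belonging to every other label
        others: Set[str] = set()
        for other_label, other_urls in url_sets.items():
            if other_label != label:
                others |= other_urls
        unique[label] = urls - others
    return unique


def shared_urls(results_by_label: Dict[str, ResultList]) -> Set[str]:
    """Return the URLs that appear under every label."""
    url_sets = [
        {result["url"] for result in results}
        for results in results_by_label.values()
    ]
    return set.intersection(*url_sets) if url_sets else set()


# Made-up sample data shaped like the dictionaries the tests collect
sample = {
    "us-en": [{"url": "https://example.com/a"}, {"url": "https://example.com/shared"}],
    "uk-en": [{"url": "https://example.com/b"}, {"url": "https://example.com/shared"}],
    "de-de": [{"url": "https://example.com/shared"}],
}

for label, urls in unique_urls_by_label(sample).items():
    print(f"URLs found ONLY in {label}: {urls}")
print(f"URLs shared by all: {shared_urls(sample)}")
```

Passing `all_results` from `test_region_settings` to these helpers would give the same "ONLY in" sets as the hand-written differences, and adding a fourth region would need no extra comparison code.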