# Tavily API: A Beginner's Guide with the Python Client

Welcome to this comprehensive guide to the Tavily API! This notebook will walk you through all of Tavily's features using the official `tavily-python` client. We'll cover `Search`, `Extract`, `Crawl`, and `Map`, with clear explanations, code examples, and sample responses.

Tavily is a search engine specifically designed for AI agents and LLMs (Large Language Models). It acts as a web access layer for AI systems, providing real-time, customizable, and RAG-ready search results.

<img src='tavily_overview.png' alt='Tavily Overview' width='400'/>

Agenda:
- [Setup](#setup)
- [Search](#search)
- [Extract](#extract)
- [Crawl](#crawl)
- [Map](#map)
- [Putting It All Together: An Autonomous Research Agent](#putting-it-all-together-an-autonomous-research-agent)


## Setup

First, make sure you have the `tavily-python` library installed. If not, you can install it via pip. We've already done this in our virtual environment.

```
pip install tavily-python
```

Next, you'll need to get your free API key from the [Tavily website](https://app.tavily.com/). Once you have your key, you can instantiate the client.

In [None]:
from tavily import TavilyClient
import json
import os

# It's recommended to set your API key as an environment variable:
# api_key = os.environ.get('TAVILY_API_KEY')
# For this example, we hardcode it. Replace 'tvly-YOUR_API_KEY' with your actual key.
api_key = 'tvly-YOUR_API_KEY'

tavily = TavilyClient(api_key=api_key)
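
If you prefer the environment-variable route mentioned in the comments above, a minimal sketch might look like the following. The helper name `load_api_key` is hypothetical (it is not part of `tavily-python`); it only wraps a standard `os.environ` lookup so that a missing key fails early with a clear message.

```python
import os


def load_api_key(env=os.environ, var_name="TAVILY_API_KEY"):
    """Return the API key stored in `var_name`, or raise a clear error.

    `load_api_key` is a hypothetical helper for this notebook, not a Tavily API.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before starting the notebook.")
    return key


# Usage (requires tavily-python and a real key in the environment):
# from tavily import TavilyClient
# tavily = TavilyClient(api_key=load_api_key())
```

Failing at startup is usually easier to debug than an authentication error on the first request.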

## 1. Search (`/search` endpoint)

The `search` method is Tavily's core feature, providing a powerful search engine optimized for AI agents and LLMs.

### Parameters
- `query` (required): The search query.
- `search_depth`: `basic` or `advanced`. Advanced is more comprehensive.
- `include_answer`: Whether to include a generated answer.
- `include_images`: Whether to include relevant image URLs.
- `include_raw_content`: Whether to include the cleaned, parsed page content of each search result.
- `max_results`: The number of search results to return.
- `include_domains`: A list of domains to search within.
- `exclude_domains`: A list of domains to exclude.

<details>
<summary><b>🔍 View Search Flowchart</b></summary>
<img src='search_flowchart.png' alt='Search Flowchart' width='1000'/>
</details>
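
The domain filters in the list above are easy to overlook, so here is a small sketch of how the keyword arguments could be assembled before calling `tavily.search`. The helper `build_search_kwargs` and its defaults are hypothetical and exist only for illustration; the parameter names themselves are the ones listed above.

```python
def build_search_kwargs(query, *, depth="basic", max_results=5,
                        include_domains=None, exclude_domains=None):
    """Assemble keyword arguments for tavily.search (hypothetical helper)."""
    if depth not in ("basic", "advanced"):
        raise ValueError("search_depth must be 'basic' or 'advanced'")
    kwargs = {"query": query, "search_depth": depth, "max_results": max_results}
    # Only send the domain filters when they are actually set.
    if include_domains:
        kwargs["include_domains"] = list(include_domains)
    if exclude_domains:
        kwargs["exclude_domains"] = list(exclude_domains)
    return kwargs


kwargs = build_search_kwargs(
    "Who won the last FIFA World Cup?",
    depth="advanced",
    max_results=3,
    include_domains=["en.wikipedia.org"],
)
print(kwargs)

# With a configured client, the call would then be:
# search_result = tavily.search(**kwargs)
```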

In [2]:
# Example: Advanced search with an answer and images
# api.tavily.com/search
search_result = tavily.search(
    query="Who won the last FIFA World Cup?",
    search_depth="advanced",
    include_answer=True,
    include_images=True,
    max_results=3
)

print(json.dumps(search_result, indent=2))

{
  "query": "Who won the last FIFA World Cup?",
  "follow_up_questions": null,
  "answer": "Argentina won the last FIFA World Cup in 2022. They defeated France in the final on penalties. This was Argentina's third title.",
  "images": [
    "https://i.ytimg.com/vi/fibjOtrH13U/maxresdefault.jpg",
    "https://i.ytimg.com/vi/XmEFiSUWcZE/maxresdefault.jpg",
    "https://i0.wp.com/www.neogol.com/wp-content/uploads/2022/12/LIST-FIFA-WORLD-CHAMPIONS-1.jpg",
    "https://media.cnn.com/api/v1/images/stellar/prod/221219105607-messi-crowd-world-cup-121822.jpg?c=original&q=w_1280,c_fill",
    "https://thesportshint.com/wp-content/uploads/2022/07/image-2.png"
  ],
  "results": [
    {
      "url": "https://en.wikipedia.org/wiki/FIFA_World_Cup",
      "title": "FIFA World Cup - Wikipedia",
      "content": "The FIFA World Cup, often called the World Cup, is an international association football competition among the senior men's national teams of the members of the F\u00e9d\u00e9ration Internation
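
In practice you will rarely print the whole JSON payload. The sketch below shows one way to pull out the answer and a short line per result, using only the fields visible in the sample response above (`answer`, `results`, `title`, `url`, `content`). The function name and the shortened `sample` dictionary are illustrative.

```python
def format_search_response(response, snippet_chars=80):
    """Render a search response as a short, readable text summary."""
    lines = []
    answer = response.get("answer")
    if answer:
        lines.append(f"Answer: {answer}")
    for i, result in enumerate(response.get("results", []), start=1):
        lines.append(f"{i}. {result.get('title', '(no title)')} <{result.get('url', '')}>")
        snippet = (result.get("content") or "")[:snippet_chars]
        if snippet:
            lines.append(f"   {snippet}")
    return "\n".join(lines)


# A shortened stand-in for the response shown above.
sample = {
    "query": "Who won the last FIFA World Cup?",
    "answer": "Argentina won the last FIFA World Cup in 2022.",
    "results": [
        {
            "url": "https://en.wikipedia.org/wiki/FIFA_World_Cup",
            "title": "FIFA World Cup - Wikipedia",
            "content": "The FIFA World Cup, often called the World Cup, is an international association football competition.",
        }
    ],
}
summary = format_search_response(sample)
print(summary)
```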

---

## 2. Extract (`/extract` endpoint)

The `extract` method allows you to get the main content from web pages.

### Parameters
- `urls` (required): A single URL or a list of up to 20 URLs.
- `extract_depth`: `basic` or `advanced`. Advanced includes tables and embedded content.
- `include_images`: Whether to include extracted images.

<details>
<summary><b>📄 View Extract Flowchart</b></summary>
<img src='extract_flowchart.png' alt='Extract Flowchart' width='800'/>
</details>
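
Because `urls` accepts at most 20 URLs per call, a longer list has to be split into batches first. The helper below is a hypothetical sketch of that batching step; it does not call the API itself.

```python
def batch_urls(urls, batch_size=20):
    """Split a list of URLs into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    urls = list(urls)
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


# Illustrative input: 45 placeholder URLs.
all_urls = [f"https://example.com/page/{n}" for n in range(45)]
batches = batch_urls(all_urls)
print([len(batch) for batch in batches])

# With a configured client, each batch could then be extracted in turn:
# for batch in batches:
#     extraction_result = tavily.extract(urls=batch)
```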

In [None]:
# Example: Extract content from a URL 
# Some websites may block content extraction. Here is an example URL that should work reliably.
# api.tavily.com/extract

urls_to_extract = [
    'https://en.wikipedia.org/wiki/Artificial_intelligence'
]
extraction_result = tavily.extract(urls=urls_to_extract, include_images=True)

print(json.dumps(extraction_result, indent=2))
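
Since some sites block extraction, it helps to separate the pages that succeeded from those that failed. The sketch below assumes the response contains a `results` list (items with `url` and `raw_content`) and a `failed_results` list (items with `url` and `error`). Treat those field names as an assumption and check them against your own output before relying on them.

```python
def split_extract_response(response):
    """Return (pages, failures) dictionaries keyed by URL.

    Assumes `results` and `failed_results` keys; verify against a real response.
    """
    pages = {
        item["url"]: item.get("raw_content") or ""
        for item in response.get("results", [])
    }
    failures = {
        item["url"]: item.get("error", "unknown error")
        for item in response.get("failed_results", [])
    }
    return pages, failures


# Hand-written stand-in for an extract response (not real API output).
sample_response = {
    "results": [
        {"url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
         "raw_content": "Artificial intelligence (AI) is ..."}
    ],
    "failed_results": [
        {"url": "https://example.com/blocked", "error": "blocked by site"}
    ],
}
pages, failures = split_extract_response(sample_response)
print(f"{len(pages)} extracted, {len(failures)} failed")
```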

---

## 3. Crawl (`/crawl` endpoint)

The `crawl` method is a powerful, graph-based tool for traversing websites. It can explore hundreds of paths in parallel with built-in extraction and intelligent discovery.

### Parameters
- `url` (required): The starting URL for the crawl.
- `instructions`: Natural language guidance for the crawl.
- `max_depth`: How many levels deep to crawl.
- `max_breadth`: The maximum number of links to follow from each page at a given level.
- `limit`: The total maximum number of pages to crawl.
- `select_paths`: Regex patterns for URLs to include.
- `exclude_paths`: Regex patterns for URLs to exclude.

<details>
<summary><b>🕷️ View Crawl Flowchart</b></summary>
<img src='crawl_flowchart.png' alt='Crawl Flowchart' width='1000'/>
</details>
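
Regex patterns for `select_paths` and `exclude_paths` are easy to get wrong, so it can be worth previewing them locally against a few known URLs first. The sketch below is an approximation: it applies `re.search` to the URL path, which is an assumption about matching semantics and may differ from how the service evaluates the patterns.

```python
import re
from urllib.parse import urlparse


def preview_path_filter(urls, select_paths=(), exclude_paths=()):
    """Locally approximate which URLs a select/exclude pattern pair would keep."""
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"
        # If select patterns are given, the path must match at least one.
        if select_paths and not any(re.search(p, path) for p in select_paths):
            continue
        # Any matching exclude pattern drops the URL.
        if any(re.search(p, path) for p in exclude_paths):
            continue
        kept.append(url)
    return kept


candidate_urls = [
    "https://docs.tavily.com/sdk/python/quick-start",
    "https://docs.tavily.com/documentation/about",
    "https://docs.tavily.com/documentation/api-reference/endpoint/search",
]
kept_urls = preview_path_filter(
    candidate_urls,
    select_paths=[r"^/sdk/.*", r"^/documentation/.*"],
    exclude_paths=[r"^/documentation/about$"],
)
print(kept_urls)
```

Once the patterns behave as expected, the same lists can be passed as `select_paths` and `exclude_paths` to `tavily.crawl`.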


In [None]:
# Example: Crawl a documentation site for specific information
# api.tavily.com/crawl
crawl_result = tavily.crawl(
    url="https://docs.tavily.com",
    instructions="Find all pages related to the Python SDK and API reference.",
    max_depth=2,       # Crawl 2 levels deep
    limit=10           # Return a maximum of 10 pages
)

print(json.dumps(crawl_result, indent=2))
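
A crawl response can be large, so a compact overview is often more useful than the full JSON dump. The sketch below lists each crawled URL with the length of its extracted text, relying on the `results`, `url`, and `raw_content` fields that the agent example later in this notebook also uses. The sample data is made up for illustration.

```python
def summarize_crawl(crawl_result):
    """Return (url, character_count) pairs for every crawled page."""
    rows = []
    for page in crawl_result.get("results", []):
        text = page.get("raw_content") or ""  # tolerate missing or empty content
        rows.append((page.get("url", ""), len(text)))
    return rows


# Hand-written stand-in for a crawl response (not real API output).
sample_crawl = {
    "base_url": "https://docs.tavily.com",
    "results": [
        {"url": "https://docs.tavily.com/sdk/python/quick-start", "raw_content": "Install the SDK..."},
        {"url": "https://docs.tavily.com/welcome", "raw_content": None},
    ],
}
for url, size in summarize_crawl(sample_crawl):
    print(f"{size:>6} chars  {url}")
```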

---

## 4. Map (`/map` endpoint)

The `map` method generates a comprehensive sitemap by traversing a website. It's great for discovering the structure of a site before performing a targeted crawl.

### Parameters
- `url` (required): The base URL to map.
- `max_depth`: How many levels deep to map.
- `max_breadth`: The maximum number of links to explore from each page at a given level.
- `limit`: The total maximum number of URLs to return.

<details>
<summary><b>🗺️ View Map Flowchart</b></summary>
<img src='map_flowchart.png' alt='Map Flowchart' width='800'/>
</details>

In [5]:
# Example: Generate a sitemap for a documentation website
# api.tavily.com/map
map_result = tavily.map(
    url="https://docs.tavily.com",
    max_depth=1, # Map only the top level and one level down
    limit=20     # Limit the output to 20 URLs
)

print(json.dumps(map_result, indent=2))

{
  "base_url": "https://docs.tavily.com",
  "results": [
    "https://docs.tavily.com/welcome",
    "https://docs.tavily.com/documentation/api-credits",
    "https://docs.tavily.com/documentation/about",
    "https://docs.tavily.com/sdk/python/quick-start",
    "https://docs.tavily.com/examples/use-cases/chat",
    "https://docs.tavily.com/documentation/api-reference/endpoint/search",
    "https://blog.tavily.com/",
    "https://tavily.com/",
    "https://app.tavily.com/playground",
    "https://app.tavily.com/",
    "https://community.tavily.com/"
  ],
  "response_time": 0.37,
  "request_id": "ef1d1195-d3f6-4e3e-a98e-cf63a65910b6"
}
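
Notice that the map output above mixes pages on `docs.tavily.com` with links to other hosts. Before a targeted crawl, you may want to keep only the URLs on the base host. The sketch below does that with the standard library; the helper names are hypothetical, and the sample is a subset of the output shown above.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_host(map_result):
    """Group the mapped URLs by their host name."""
    groups = defaultdict(list)
    for url in map_result.get("results", []):
        groups[urlparse(url).netloc].append(url)
    return dict(groups)


def same_site_urls(map_result):
    """Keep only URLs that share the host of `base_url`."""
    base_host = urlparse(map_result["base_url"]).netloc
    return group_by_host(map_result).get(base_host, [])


sample_map = {
    "base_url": "https://docs.tavily.com",
    "results": [
        "https://docs.tavily.com/welcome",
        "https://docs.tavily.com/sdk/python/quick-start",
        "https://blog.tavily.com/",
        "https://app.tavily.com/",
    ],
}
print(same_site_urls(sample_map))
```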


---

## Putting It All Together: An Autonomous Research Agent

Now, let's build a more advanced, autonomous research agent. The agent requires only a single input query. It uses Tavily to find the most relevant, up-to-date information online, then uses a local Ollama model to synthesize a report.

This demonstrates a powerful real-world use case: **enriching a local, private LLM with real-time, public information.**

<details>
<summary><b>🤖 View Agent Workflow</b></summary>
<img src='agent_flowchart.png' alt='Agent Flowchart' width='1000'/>
</details>

### Prerequisites
1. **Ollama Installed and Running**: Make sure you have [Ollama](https://ollama.com/) installed and a model pulled (e.g., `ollama run llama3`).
2. **Ollama Python Library**: Install the Ollama Python client:
   ```
   pip install ollama
   ```

In [6]:
import ollama
from urllib.parse import urlparse

def run_autonomous_research_agent(query):
    """
    Runs an autonomous research agent that takes a query, finds the best information source,
    and generates a report using a local LLM.
    """
    print(f"--- Starting Autonomous Research for: '{query}' ---\n")

    # Step 1: Use Tavily Search to get a broad overview and find the best source.
    # The 'search' endpoint is the entry point for our research.
    print("Step 1: Finding the best information source with Tavily Search...")
    search_results = tavily.search(query=query, search_depth="advanced")
    
    if not search_results['results']:
        print("No search results found. Aborting.")
        return

    # Automatically select the most relevant source to crawl.
    best_source_url = search_results['results'][0]['url']
    print(f"Best source found: {best_source_url}\n")

    # Step 2: Use Tavily Crawl for a targeted deep-dive into the best source.
    # The 'crawl' endpoint lets us extract specific, relevant content.
    print(f"Step 2: Performing a deep-dive crawl on {best_source_url}...")
    crawl_results = tavily.crawl(
        url=best_source_url,
        instructions=f"Focus on the main content that directly addresses the query: '{query}'.",
        limit=5  # Limit to 5 pages to keep it focused
    )
    
    # Combine all the gathered content, skipping pages that returned no text.
    all_content = "\n\n".join([res['raw_content'] for res in crawl_results['results'] if res.get('raw_content')])
    if not all_content:
        print("Could not extract content from the source. Using initial search results for summary.")
        all_content = "\n\n".join([res['content'] for res in search_results['results']])


    # Step 3: Use Ollama to synthesize the findings into a report.
    # This prompt highlights why Tavily is essential: the LLM's knowledge is not real-time.
    print("\nStep 3: Synthesizing findings with local Ollama model...")
    try:
        ollama_response = ollama.chat(
            model='llama3',  # Or any other model you have pulled
            messages=[
                {
                    'role': 'system',
                    'content': 'You are a research assistant. Your task is to write a concise report based *only* on the provided real-time web search results.'
                },
                {
                    'role': 'user',
                    'content': f"My internal knowledge is outdated. Please generate a report on the topic '{query}' using only the following up-to-date information:\n\n---\n{all_content[:8000]}"
                },
            ],
        )
        print("\n--- Research Report ---")
        print(ollama_response['message']['content'])
    except Exception as e:
        print(f"\nCould not connect to Ollama. Please ensure it's running. Error: {e}")

    print("\n--- Research Complete ---")

# --- Run the Autonomous Agent ---
# Let's use a query that requires very recent information that a local LLM wouldn't have.
research_query = "What were the key announcements from Apple's latest WWDC event?"
run_autonomous_research_agent(research_query)

--- Starting Autonomous Research for: 'What were the key announcements from Apple's latest WWDC event?' ---

Step 1: Finding the best information source with Tavily Search...
Best source found: https://www.macrumors.com/roundup/wwdc/

Step 2: Performing a deep-dive crawl on https://www.macrumors.com/roundup/wwdc/...

Step 3: Synthesizing findings with local Ollama model...

--- Research Report ---
**Key Announcements from Apple's Latest WWDC Event**

Based on the latest information available on MacRumors, here are the key announcements made during Apple's Worldwide Developers Conference (WWDC) event:

1. **iOS 26 and iPadOS 18**: New operating system updates for iOS devices, bringing new features and improvements.
2. **macOS 26 Tahoe and macOS Sequoia**: Upcoming updates to the Mac operating system, focusing on performance and security enhancements.
3. **watchOS 11**: The latest update for Apple Watch, featuring improved performance and new watch faces.
4. **visionOS 2**: An update to 
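
The agent above limits its prompt with a plain slice, `all_content[:8000]`, which can cut a page off mid-sentence. One possible variation, sketched below, adds whole pages until a character budget is reached. The helper `join_within_budget` is hypothetical and not part of the agent as written; the 8000-character default simply mirrors the slice used above.

```python
def join_within_budget(chunks, max_chars=8000, separator="\n\n"):
    """Join whole chunks in order without exceeding `max_chars` characters."""
    kept, used = [], 0
    for chunk in chunks:
        if not chunk:
            continue  # skip pages with no extracted text
        cost = len(chunk) + (len(separator) if kept else 0)
        if used + cost > max_chars:
            if not kept:
                # A single oversized page is still cut so the prompt is not empty.
                kept.append(chunk[:max_chars])
            break
        kept.append(chunk)
        used += cost
    return separator.join(kept)


# Illustrative page texts; in the agent these would be the crawl results.
pages = ["a" * 10, None, "b" * 10, "c" * 10]
prompt_body = join_within_budget(pages, max_chars=25)
print(len(prompt_body))
```

In the agent, this could replace the slice by passing `[res['raw_content'] for res in crawl_results['results']]` as `chunks`.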

---