## Title & Overview

- Project title: Satellite Metadata Search Agent (STAC Agent)
- Problem statement:
  - Provide natural-language search over satellite imagery metadata (Sentinel-2 STAC) using Google ADK.
  - Users can specify area, time range, and cloud-cover constraints in free-form text.
- High-level summary:
  - Resolve AOI (Area of Interest) from place names using a predefined AOI catalog.
  - Interpret time range and cloud-cover constraints from text.
  - Call a STAC search tool against Sentinel-2 metadata.
  - Return results in a human-readable table.


## Background & Objectives

- Competition: Agents Intensive Capstone Project (Kaggle).
- Goal of the capstone:
  - Build an agent that solves a real-world problem or improves everyday productivity.
- This agent:
  - Focuses on Earth observation (EO) data accessibility.
  - Targets analysts and developers who want to quickly explore Sentinel-2 imagery without writing STAC queries.
- Objectives of this notebook:
  - Describe the system design and agent architecture.
  - Demonstrate core behaviors (normal search and clarification).
  - Summarize evaluation and limitations.


## System Design & Architecture
- Modules: `agent/stac_agent_adk.py` (root agent), `agent/prompts.py` (instructions), `aoi/aoi_catalog.py` (AOI resolver), `tools/stac_search.py` (STAC API), `scripts/run_eval.py` (eval runner).
- Flow: user → resolve AOI → build search params → call Element84 Earth Search v1 → format table.

![Agent flow](fig/flow.png)
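
The "build search params" and "call Earth Search" steps of the flow can be sketched as follows. This is a minimal illustration, not the code in `tools/stac_search.py`: the helper names `build_search_payload` and `search_scenes`, the bounding box, and the result limit are assumptions. The endpoint is the public Earth Search v1 `/search` route, and the cloud-cover filter uses the STAC API query extension.

```python
import json
import urllib.request

# Public Element84 Earth Search v1 search endpoint.
EARTH_SEARCH_URL = "https://earth-search.aws.element84.com/v1/search"


def build_search_payload(bbox, start, end, max_cloud=None,
                         collection="sentinel-2-l2a", limit=10):
    """Build a STAC API search body (hypothetical helper)."""
    payload = {
        "collections": [collection],
        "bbox": list(bbox),  # [west, south, east, north]
        "datetime": f"{start}T00:00:00Z/{end}T23:59:59Z",
        "limit": limit,
    }
    if max_cloud is not None:
        # Query extension: keep scenes with cloud cover below the threshold.
        payload["query"] = {"eo:cloud_cover": {"lt": max_cloud}}
    return payload


def search_scenes(payload, timeout=30):
    """POST the payload and return the list of matching STAC items."""
    req = urllib.request.Request(
        EARTH_SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("features", [])


# Illustrative bounding box only; real AOIs come from the AOI catalog.
payload = build_search_payload(
    bbox=(143.0, 42.5, 145.9, 44.4),
    start="2023-06-15", end="2023-06-30", max_cloud=20,
)
print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the network call means the parameter logic can be unit-tested without Internet access; `search_scenes(payload)` is only needed for live runs.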


## Environment & Dependencies (Kaggle)
- **Target runtime**: This notebook is intended to be run on a Kaggle Notebook (Python 3.11). The examples and demo cells assume a Kaggle session; some paths and conveniences (e.g., `/kaggle/input`) are Kaggle-specific.
- **Enable Internet**: Live STAC and ADK calls require outbound HTTPS. In Kaggle, verify the notebook's *Internet* setting is enabled (Notebook Settings). If Internet is disabled, the notebook will skip live API requests.
- **Enable GPU (optional)**: You can enable a GPU via the Notebook Settings -> Accelerator, but GPUs are not required for the agent or STAC queries. Do not rely on GPU for core functionality or reproducibility.
- **API keys & secrets**: Set required keys (for example `GOOGLE_API_KEY`) in the session environment before running cells that contact external services. On Kaggle you can store secrets privately (recommended). For quick testing, you may set them at runtime (avoid committing keys):

```python
import os
os.environ['GOOGLE_API_KEY'] = 'YOUR_KEY_HERE'  # Do NOT commit this to a public notebook
```
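
For the recommended route, the key can be read from Kaggle Secrets with a fallback to the environment. This is a sketch under assumptions: the helper name `load_api_key` is hypothetical, `kaggle_secrets` is only importable inside a Kaggle session, and a secret named `GOOGLE_API_KEY` must already be attached to the notebook.

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the key from the environment or Kaggle Secrets, else None."""
    value = os.environ.get(name)
    if value:
        return value
    try:
        # Only available inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
    except ImportError:
        return None
    try:
        value = UserSecretsClient().get_secret(name)
    except Exception:
        # Secret not attached to this notebook, or the lookup failed.
        return None
    if value:
        os.environ[name] = value  # make it visible to later cells
    return value
```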

- **Dependencies (notebook)**: Install Python packages in the Kaggle session with `pip`. Example cell:

```python
!pip install google-adk python-dotenv pandas shapely requests tabulate rouge-score
```

- **Check API availability before live calls**: Demo cells guard live API usage; they check `os.getenv('GOOGLE_API_KEY')` and skip real calls if the key is missing. This avoids runtime errors on sessions without keys or Internet.

```python
import os
if not os.getenv('GOOGLE_API_KEY'):
    print('GOOGLE_API_KEY not set — live agent calls will be skipped.')
```

- **Working with Kaggle datasets**: If you provided a submission bundle as a Kaggle Dataset, files will appear under `/kaggle/input/<DATASET_NAME>`. Use the provided extraction cells to unpack and add the project to `sys.path`.
- **Local development**: For local runs use a virtual environment and install the same dependencies: `python -m venv .venv && source .venv/bin/activate` then `pip install -r requirements.txt` (or the single `pip install` line above). Run unit tests with `pytest tests/unit`.
- **Notes**: If Internet or API keys are not available, the notebook contains guarded cells that will skip demonstrations which require live STAC/ADK requests. This lets you still run and inspect non-network code paths and unit tests.
- **Reference**: Agents Intensive Capstone Project (Kaggle) — https://www.kaggle.com/competitions/agents-intensive-capstone-project/overview

In [None]:
# Code cell: install dependencies (adjust versions as needed)
!pip install google-adk python-dotenv pandas shapely requests tabulate rouge-score


In [None]:
# Code cell: unpack submission bundle from Kaggle Dataset
# Replace <DATASET_NAME> with the folder printed by the first command before running tar.
!ls /kaggle/input  # TODO: find dataset name
!tar -xzf "/kaggle/input/<DATASET_NAME>/submission.tar.gz" -C /kaggle/working/
!ls /kaggle/working


In [None]:
# Code cell: add project to Python path
import sys, pathlib
PROJECT_ROOT = pathlib.Path("/kaggle/working/capstone")  # TODO: confirm extracted path
sys.path.append(str(PROJECT_ROOT / "src"))


## Agent Components
- Root agent: `capstone.agent.stac_agent_adk.create_agent()` using `SYSTEM_PROMPT` + `ARGUMENT_PLANNING_INSTRUCTIONS`.
- Tools: `resolve_aoi` (uses `aoi_catalog.json` with ids like `japan`, `hokkaido_east`, `japan_cloud_free_focused`), `search_satellite_scenes` (Element84 Earth Search v1, default collection `sentinel-2-l2a`).
- TODO: Note MODEL_NAME default (gemini-2.0-flash) and how to swap.
- TODO: Add placeholders for component diagram / prompt snippets.
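
A minimal sketch of catalog-based AOI resolution, assuming exact matching on ids and aliases. It is not the project's `resolve_aoi` tool: the function name `resolve_aoi_sketch`, the alias lists, and the bounding boxes are illustrative assumptions. Only the ids `japan` and `hokkaido_east` come from the catalog mentioned above.

```python
# Illustrative catalog: the ids appear in aoi_catalog.json, but the aliases
# and bounding boxes below are placeholders, not the project's real values.
AOI_CATALOG = {
    "japan": {
        "aliases": ["japan", "nippon"],
        "bbox": [122.9, 24.0, 153.9, 45.6],
    },
    "hokkaido_east": {
        "aliases": ["eastern hokkaido", "east hokkaido"],
        "bbox": [143.0, 42.5, 145.9, 44.4],
    },
}


def resolve_aoi_sketch(place: str) -> dict:
    """Map free-form place text to a catalog entry, or report no match."""
    text = " ".join(place.lower().split())  # normalise case and whitespace
    for aoi_id, entry in AOI_CATALOG.items():
        if text == aoi_id or text in entry["aliases"]:
            return {"status": "ok", "aoi_id": aoi_id, "bbox": entry["bbox"]}
    # No match: the caller can ask the user to pick from the known ids.
    return {"status": "not_found", "known_ids": sorted(AOI_CATALOG)}


print(resolve_aoi_sketch("Eastern  Hokkaido"))
print(resolve_aoi_sketch("Mars"))
```

Returning a structured `not_found` result instead of raising gives the agent something concrete to turn into a clarification question.
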
## Demo: Normal Query Flow
- Example intent placeholder: “Find Sentinel-2 images over eastern Hokkaido between 2023-06-15 and 2023-06-30 with less than 20% cloud cover.”
- TODO: Replace with your own query if needed.

In [None]:
# Code cell: run a normal query through the agent
import os
from google.genai import types
from capstone.agent.stac_agent_adk import create_runner_async

async def run_query(query: str, user_id: str = "kaggle_user", session_id: str = "demo_session"):
    """Send one user message to the agent and return the text of its final response."""
    runner, _ = await create_runner_async(user_id=user_id, session_id=session_id)
    user_msg = types.Content(role="user", parts=[types.Part(text=query)])
    events = runner.run(user_id=user_id, session_id=session_id, new_message=user_msg)
    final_text = None
    for ev in events:
        # Keep the text of the last event flagged as a final response.
        if ev.is_final_response() and ev.content and ev.content.parts:
            final_text = ev.content.parts[0].text
    return final_text or "TODO: inspect intermediate events for details."

if os.getenv("GOOGLE_API_KEY"):
    # Example intent from the section text; replace with your own query if needed.
    normal_query = ("Find Sentinel-2 images over eastern Hokkaido between 2023-06-15 "
                    "and 2023-06-30 with less than 20% cloud cover.")
    # Notebook kernels already run an event loop, so asyncio.run() would raise
    # RuntimeError here; use top-level await instead.
    print(await run_query(normal_query))
else:
    print("Skipping agent call because GOOGLE_API_KEY is missing.")


- TODO: Add markdown commentary on expected table columns (id, datetime UTC, cloud_cover, preview_url).
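
A sketch of how STAC items could be rendered with these four columns. The helper `items_to_table` is hypothetical and is not the project's formatter. It assumes each item is a GeoJSON feature with `properties.datetime`, `properties["eo:cloud_cover"]`, and a preview under a `thumbnail` asset; the sample item is made up.

```python
def items_to_table(items):
    """Render STAC items as a Markdown table (hypothetical helper)."""
    header = ["id", "datetime (UTC)", "cloud_cover", "preview_url"]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for item in items:
        props = item.get("properties", {})
        cloud = props.get("eo:cloud_cover")
        # Assumes the preview image is exposed as a "thumbnail" asset.
        preview = item.get("assets", {}).get("thumbnail", {}).get("href", "")
        row = [
            item.get("id", ""),
            props.get("datetime", ""),
            "" if cloud is None else f"{cloud:.1f}",
            preview,
        ]
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


sample = [{
    "id": "S2_EXAMPLE_ITEM",  # made-up id for illustration
    "properties": {"datetime": "2023-06-20T01:27:01Z", "eo:cloud_cover": 12.345},
    "assets": {"thumbnail": {"href": "https://example.com/thumb.jpg"}},
}]
print(items_to_table(sample))
```
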
## Demo: Clarification Behavior
- Ambiguous query placeholder: “Show me cloud-free images around Sapporo in September.”
- Expected: agent asks for explicit date range before search.
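
The rule behind this behavior can be illustrated deterministically. In the agent the decision is made by the model following its prompt instructions, so the check below is only a sketch: the helper `needs_date_clarification` is hypothetical and assumes that an explicit range means two ISO-formatted dates in the query.

```python
import re

# Matches ISO calendar dates such as 2023-06-15.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def needs_date_clarification(query: str) -> bool:
    """Return True unless the query contains two explicit ISO dates."""
    return len(ISO_DATE.findall(query)) < 2


explicit = ("Find Sentinel-2 images over eastern Hokkaido between "
            "2023-06-15 and 2023-06-30 with less than 20% cloud cover.")
ambiguous = "Show me cloud-free images around Sapporo in September."

print(needs_date_clarification(explicit))
print(needs_date_clarification(ambiguous))
```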

In [None]:
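# Code cell: trigger clarification behavior
import os

if os.getenv("GOOGLE_API_KEY"):
    # Ambiguous example from the section text: no year and no explicit date range.
    unclear_query = "Show me cloud-free images around Sapporo in September."
    # Top-level await: asyncio.run() fails inside a running notebook event loop.
    print(await run_query(unclear_query, session_id="clarify_session"))
else:
    print("Skipping agent call because GOOGLE_API_KEY is missing.")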


## Demo: Clarification Query


In [None]:

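# root_agent is not defined in this notebook; reuse run_query() from the normal-query demo cell.
query = "Show me images from Hokkaido last summer"
response = await run_query(query, session_id="summer_session")
print(response)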


- TODO: Describe expected back-and-forth (resolve_aoi call; request for precise dates/clouds).
## Evaluation Notes
- Local tests: `.venv/bin/pytest tests/unit -q` (AOI catalog, wiring).
- ADK smoke: `.venv/bin/python -m capstone.scripts.run_eval` (loads normal/clarification/boundary evalsets).
- Full ADK CLI: `adk eval src/capstone/scripts/eval ... --print_detailed_results > result.txt`.
- TODO: Paste your actual results/metrics; note any failures or skipped cases.
- TODO: List known limitations / future fixes.
## Conclusion & Future Work
- TODO: Summarize agent value (NL → STAC search, deterministic tool use).
- TODO: Add roadmap items (more AOIs, additional sensors, caching, richer previews).
