# 🚀 Deploy ADK Agent to Vertex AI Agent Engine

> Previous: A2A protocol for agent interoperability, deployed to a local server

> Now: Production deployment with [Vertex AI Agent Engine](https://docs.cloud.google.com/agent-builder/agent-engine/overview)

- ✅ Build a production-ready ADK agent
- ✅ Deploy your agent to [**Vertex AI Agent Engine**](https://docs.cloud.google.com/agent-builder/agent-engine/overview) using the ADK CLI
- ✅ Test your deployed agent with the Python SDK
- ✅ Monitor and manage deployed agents in the Google Cloud Console
- ✅ Understand how to add memory to your agent using Vertex AI Memory Bank
- ✅ Understand cost management and cleanup best practices

---
# 1. Setup


In [1]:
!gcloud auth application-default login
import os
import random
import time
import vertexai
from vertexai import agent_engines

print("✅ Imports completed successfully")

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=764086051850-6qr4p6gpi6hn506pt8ejuq83di341hur.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.login&state=S862gZx9OdpvjvLSf0rDNZFfVzD7nb&access_type=offline&code_challenge=rlpvp52UQIptpD1ebiBptBxg_zN6volChi-Amk55300&code_challenge_method=S256


Credentials saved to file: [/Users/xing.zhang/.config/gcloud/application_default_credentials.json]

These credentials will be used by any library that requests Application Default Credentials (ADC).

Quota project "gen-lang-client-0004570932" was added to ADC which can be used by Google client libraries for billing and quota. Note that some services may still bill the project owning the resource.
✅ Imports completed successfully


In [2]:
## Set your PROJECT_ID in .env
import os
from dotenv import load_dotenv
load_dotenv()

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

print(f"✅ Project ID set to: {PROJECT_ID}")

✅ Project ID set to: gen-lang-client-0004570932
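
If `GOOGLE_CLOUD_PROJECT` is missing from `.env`, `os.getenv` returns `None` and the failure only surfaces later, at deploy time. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper name, and the example uses a made-up project ID in an explicit mapping so it runs without touching your real environment.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return a non-empty environment value or raise a clear error."""
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value


# Hypothetical project ID, used only to demonstrate the helper.
example_env = {"GOOGLE_CLOUD_PROJECT": "my-demo-project"}
print(require_env("GOOGLE_CLOUD_PROJECT", example_env))
```

In the notebook this could replace the plain `os.getenv` call, for example `PROJECT_ID = require_env("GOOGLE_CLOUD_PROJECT")` after `load_dotenv()`.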


---

# 2. Create Agent with ADK

Build a **Weather Assistant** that is optimized for production testing with the following configuration:

- **Model:** Uses `gemini-2.5-flash-lite` for low latency and cost efficiency.
- **Tools:** Includes a `get_weather` function to demonstrate tool execution.
- **Persona:** Responds conversationally to demonstrate instruction following.

This demonstrates the foundational ADK architecture we are about to package: **Agent + Tools + Instructions**.

We'll create the following files and directory structure:

```
sample_agent/
├── agent.py                  # The logic
├── requirements.txt          # The libraries
├── .env                      # The secrets/config
└── .agent_engine_config.json # The hardware specs
```

### Step 1: create the agent directory if it does not already exist

In [6]:
!mkdir -p sample_agent

### Step 2: write requirements.txt

In [7]:
%%writefile sample_agent/requirements.txt

google-adk
opentelemetry-instrumentation-google-genai

Overwriting sample_agent/requirements.txt


### Step 3: create .env

- Uses the `global` endpoint for Gemini API calls
- Configures ADK to use Vertex AI instead of Google AI Studio

In [14]:
%%writefile sample_agent/.env

# https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations#global-endpoint
GOOGLE_CLOUD_LOCATION="global"

# Set to 1 to use Vertex AI, or 0 to use Google AI Studio
GOOGLE_GENAI_USE_VERTEXAI=1 

Overwriting sample_agent/.env


### Step 4: agent code

In [15]:
%%writefile sample_agent/agent.py

from google.adk.agents import Agent
import vertexai
import os

vertexai.init(
    project=os.environ["GOOGLE_CLOUD_PROJECT"],
    location=os.environ["GOOGLE_CLOUD_LOCATION"],
)

def get_weather(city: str) -> dict:
    """
    Returns weather information for a given city.

    This is a TOOL that the agent can call when users ask about weather.
    In production, this would call a real weather API (e.g., OpenWeatherMap).
    For this demo, we use mock data.

    Args:
        city: Name of the city (e.g., "Tokyo", "New York")

    Returns:
        dict: Dictionary with status and weather report or error message
    """
    # Mock weather database with structured responses
    weather_data = {
        "san francisco": {"status": "success", "report": "The weather in San Francisco is sunny with a temperature of 72°F (22°C)."},
        "new york": {"status": "success", "report": "The weather in New York is cloudy with a temperature of 65°F (18°C)."},
        "london": {"status": "success", "report": "The weather in London is rainy with a temperature of 58°F (14°C)."},
        "tokyo": {"status": "success", "report": "The weather in Tokyo is clear with a temperature of 70°F (21°C)."},
        "paris": {"status": "success", "report": "The weather in Paris is partly cloudy with a temperature of 68°F (20°C)."}
    }

    city_lower = city.lower()
    if city_lower in weather_data:
        return weather_data[city_lower]
    else:
        available_cities = ", ".join([c.title() for c in weather_data.keys()])
        return {
            "status": "error",
            "error_message": f"Weather information for '{city}' is not available. Try: {available_cities}"
        }

root_agent = Agent(
    name="weather_assistant",
    model="gemini-2.5-flash-lite",  # Fast, cost-effective Gemini model
    description="A helpful weather assistant that provides weather information for cities.",
    instruction="""
    You are a friendly weather assistant. When users ask about the weather:

    1. Identify the city name from their question
    2. Use the get_weather tool to fetch current weather information
    3. Respond in a friendly, conversational tone
    4. If the city isn't available, suggest one of the available cities

    Be helpful and concise in your responses.
    """,
    tools=[get_weather]
)

Overwriting sample_agent/agent.py
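
Before deploying, it can help to exercise the tool's two branches (known city, unknown city) locally. The sketch below uses a trimmed copy of `get_weather` with only two cities and shortened reports, rather than importing `sample_agent/agent.py`, because importing that module runs `vertexai.init` and needs the environment variables to be set.

```python
def get_weather(city: str) -> dict:
    """Trimmed copy of the tool above: two cities cover both branches."""
    weather_data = {
        "tokyo": {"status": "success", "report": "The weather in Tokyo is clear."},
        "paris": {"status": "success", "report": "The weather in Paris is partly cloudy."},
    }
    city_lower = city.lower()
    if city_lower in weather_data:
        return weather_data[city_lower]
    available_cities = ", ".join(c.title() for c in weather_data)
    return {
        "status": "error",
        "error_message": f"Weather information for '{city}' is not available. Try: {available_cities}",
    }


known = get_weather("Tokyo")     # lookup is case-insensitive
unknown = get_weather("Berlin")  # not in the mock data
print(known["status"])
print(unknown["error_message"])
```

The error branch returns the list of supported cities, which is what lets the agent follow instruction 4 and suggest an alternative.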


---
# 3. Deploy to Agent Engine

ADK supports multiple deployment platforms ([docs](https://google.github.io/adk-docs/deploy/)). This notebook uses **Vertex AI Agent Engine**, which is fully managed and auto-scaling, with built-in session management ([guide](https://google.github.io/adk-docs/deploy/agent-engine/)).

**Cost note**: Agent Engine has a [monthly free tier](https://cloud.google.com/agent-builder/docs/agent-engine/overview#pricing). Clean up promptly to avoid charges.

**Other options**: [Cloud Run](https://google.github.io/adk-docs/deploy/cloud-run/) (serverless, quick start) | [GKE](https://google.github.io/adk-docs/deploy/gke/) (full control, complex systems)


### Step 1: create deployment configuration

The `.agent_engine_config.json` file controls the deployment settings.

- `min_instances: 0` lets the deployment scale down to zero when not in use (saves costs)
- `max_instances: 1` caps the deployment at one running instance (sufficient for this demo)
- `resource_limits` gives each instance 1 CPU core and 1 GiB of memory


In [16]:
%%writefile sample_agent/.agent_engine_config.json
{
    "min_instances": 0, 
    "max_instances": 1, 
    "resource_limits": {"cpu": "1", "memory": "1Gi"} 
}

Overwriting sample_agent/.agent_engine_config.json
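
A typo in this file is cheaper to catch locally than during a deployment. The sketch below parses the JSON and applies two sanity checks. Both checks are assumptions made for illustration, not the validation rules Agent Engine itself applies, and `validate_engine_config` is a hypothetical helper name.

```python
import json


def validate_engine_config(text: str) -> dict:
    """Parse the config JSON and apply basic (assumed) sanity checks."""
    config = json.loads(text)
    min_instances = config["min_instances"]
    max_instances = config["max_instances"]
    if not 0 <= min_instances <= max_instances:
        raise ValueError(
            f"expected 0 <= min_instances <= max_instances, "
            f"got {min_instances} and {max_instances}"
        )
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    return config


config_text = """
{
    "min_instances": 0,
    "max_instances": 1,
    "resource_limits": {"cpu": "1", "memory": "1Gi"}
}
"""
print(validate_engine_config(config_text)["resource_limits"])
```

To check the real file, pass its contents instead, for example `validate_engine_config(open("sample_agent/.agent_engine_config.json").read())`.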


### Step 2: select deployment region

- Choose a region close to your users for lower latency
- Consider data residency requirements
- Check the [Agent Engine locations documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview#locations)

In [17]:
regions_list = ["europe-west1", "europe-west4", "us-east4", "us-west1"]
#deployed_region = random.choice(regions_list)
deployed_region = "us-west1"

### Step 3: deploy the agent

`adk deploy agent_engine`

1. Packages your agent code (`sample_agent/` directory)
2. Uploads it to Agent Engine
3. Creates a containerized deployment
4. Outputs a resource name like: `projects/PROJECT_NUMBER/locations/REGION/reasoningEngines/ID`

In [18]:
# List all files in the sample_agent directory before deployment
!ls -la ./sample_agent

total 32
drwxr-xr-x@  6 xing.zhang  staff   192 Feb  1 17:47 .
drwxr-xr-x@ 27 xing.zhang  staff   864 Feb  1 18:01 ..
-rw-r--r--@  1 xing.zhang  staff   108 Feb  1 18:05 .agent_engine_config.json
-rw-r--r--@  1 xing.zhang  staff   208 Feb  1 18:04 .env
-rw-r--r--@  1 xing.zhang  staff  2337 Feb  1 18:04 agent.py
-rw-r--r--@  1 xing.zhang  staff    55 Feb  1 18:00 requirements.txt


In [19]:
!adk deploy agent_engine --project=$PROJECT_ID --region=$deployed_region sample_agent --agent_engine_config_file=sample_agent/.agent_engine_config.json

Staging all files in: /Users/xing.zhang/github/google_5day_ai_202511/sample_agent_tmp20260201_180551
Copying agent source code...
Copying agent source code complete.
Resolving files and dependencies...
Reading agent engine config from sample_agent/.agent_engine_config.json
Reading environment variables from /Users/xing.zhang/github/google_5day_ai_202511/sample_agent/.env
Ignoring GOOGLE_CLOUD_LOCATION in .env as `--region` was explicitly passed and takes precedence
Initializing Vertex AI...
Vertex AI initialized.
Created sample_agent_tmp20260201_180551/agent_engine_app.py
Files and dependencies resolved
Deploying to agent engine...
✅ Created agent engine: projects/118481962963/locations/us-west1/reasoningEngines/6303016376923586560
Cleaning up the temp folder: sample_agent_tmp20260201_180551
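
The resource name printed by the deploy command follows the pattern `projects/PROJECT_NUMBER/locations/REGION/reasoningEngines/ID`. A small sketch of splitting it into its parts, which is handy for logging or for checking that an agent landed in the intended region (`parse_resource_name` is a hypothetical helper, not part of the SDK):

```python
import re

RESOURCE_NAME_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine_id>[^/]+)$"
)


def parse_resource_name(resource_name: str) -> dict:
    """Split an Agent Engine resource name into project, location and ID."""
    match = RESOURCE_NAME_PATTERN.match(resource_name)
    if match is None:
        raise ValueError(f"Unexpected resource name format: {resource_name!r}")
    return match.groupdict()


parts = parse_resource_name(
    "projects/118481962963/locations/us-west1/reasoningEngines/6303016376923586560"
)
print(parts["location"], parts["engine_id"])
```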


---

# 4. Retrieve and Test Deployed Agent

### 4.1 Retrieve the deployed agent

The cell below does the following:

1. Initializes the Vertex AI SDK with your project and region
2. Lists all deployed agents in that region
3. Gets the first one (most recently deployed)
4. Stores it as `remote_agent` for testing

In [20]:
# Initialize Vertex AI
vertexai.init(project=PROJECT_ID, location=deployed_region)

# Get the most recently deployed agent
agents_list = list(agent_engines.list())  # agent_engines comes from "from vertexai import agent_engines"
if agents_list:
    remote_agent = agents_list[0]  # Get the first (most recent) agent
    print(f"✅ Connected to deployed agent: {remote_agent.resource_name}")
else:
    print("❌ No agents found. Please deploy first.")

✅ Connected to deployed agent: projects/118481962963/locations/us-west1/reasoningEngines/6303016376923586560
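
Taking the first listed agent works when only one agent exists in the region. With several deployments, matching on the resource name printed by `adk deploy` is less fragile. A minimal sketch follows; `find_agent` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the listed agents, of which only the `resource_name` attribute is used.

```python
from types import SimpleNamespace


def find_agent(agents, resource_name: str):
    """Return the agent whose resource_name matches exactly, or None."""
    for agent in agents:
        if agent.resource_name == resource_name:
            return agent
    return None


# Stand-ins for the objects returned by agent_engines.list().
fake_agents = [
    SimpleNamespace(resource_name="projects/1/locations/us-west1/reasoningEngines/111"),
    SimpleNamespace(resource_name="projects/1/locations/us-west1/reasoningEngines/222"),
]
chosen = find_agent(fake_agents, "projects/1/locations/us-west1/reasoningEngines/222")
print(chosen.resource_name)
```

In the notebook, the first argument would be `agent_engines.list()` and the second the resource name from the deploy output.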


### 4.2 Test the deployed agent

In [21]:
async for item in remote_agent.async_stream_query(
    message="What is the weather in Tokyo?",
    user_id="user_42",
):
    print(item)

{'model_version': 'gemini-2.5-flash-lite', 'content': {'parts': [{'function_call': {'id': 'adk-9de65626-8efd-4570-9ae2-829ed62a8eab', 'args': {'city': 'Tokyo'}, 'name': 'get_weather'}}], 'role': 'model'}, 'finish_reason': 'STOP', 'usage_metadata': {'candidates_token_count': 5, 'candidates_tokens_details': [{'modality': 'TEXT', 'token_count': 5}], 'prompt_token_count': 232, 'prompt_tokens_details': [{'modality': 'TEXT', 'token_count': 232}], 'thoughts_token_count': 61, 'total_token_count': 298, 'traffic_type': 'ON_DEMAND'}, 'avg_logprobs': -1.4990894317626953, 'invocation_id': 'e-dc63a129-5c9c-4588-a12f-c2655eec3a50', 'author': 'weather_assistant', 'actions': {'state_delta': {}, 'artifact_delta': {}, 'requested_auth_configs': {}, 'requested_tool_confirmations': {}}, 'long_running_tool_ids': [], 'id': '5979806e-eb71-474c-9815-3508478cde5d', 'timestamp': 1769998343.892178}
{'content': {'parts': [{'function_response': {'id': 'adk-9de65626-8efd-4570-9ae2-829ed62a8eab', 'name': 'get_weather'
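
Each streamed item is a dictionary describing one event: a tool call, a tool response, or model text. To show only what the user would read, the text parts can be filtered out of the stream. The sketch below assumes text arrives under a `text` key inside `content.parts`; that key does not appear in the output above, so treat it as an assumption. The second sample event is made up for illustration.

```python
def extract_text(event: dict) -> str:
    """Concatenate the text parts of one streamed event (empty if none)."""
    parts = (event.get("content") or {}).get("parts") or []
    return "".join(
        part["text"] for part in parts if isinstance(part.get("text"), str)
    )


sample_events = [
    # Shaped like the first event above: a tool call, no text.
    {"content": {"parts": [{"function_call": {"name": "get_weather", "args": {"city": "Tokyo"}}}], "role": "model"}},
    # Hypothetical final event carrying the model's answer.
    {"content": {"parts": [{"text": "It is clear in Tokyo today."}], "role": "model"}},
]

for event in sample_events:
    text = extract_text(event)
    if text:
        print(text)
```

Inside the `async for` loop, `print(item)` could then become `print(extract_text(item))` guarded by the same emptiness check.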

---

# 5. Long-Term Memory with Vertex AI Memory Bank

Session memory only lasts one conversation, so users must repeat their preferences every time.

**Memory Bank** provides persistent memory across sessions:

| Session Memory | Memory Bank |
|----------------|-------------|
| Single conversation | All conversations |
| Forgets when session ends | Remembers permanently |

**How it works:**
1. During conversations → Agent searches past facts via memory tools
2. After conversations → Key info extracted ("User prefers Celsius")
3. Next session → Agent recalls automatically

**To enable:** Add `PreloadMemoryTool` + save callback to your agent, then redeploy.
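
The save-then-recall flow can be pictured with a toy, in-process model. This is only a conceptual sketch: `ToyMemoryBank` is a hypothetical class, not the Vertex AI Memory Bank or ADK API, and the facts are passed in by hand where the real service extracts them from the conversation.

```python
from collections import defaultdict


class ToyMemoryBank:
    """In-process stand-in for a long-term memory store (illustration only)."""

    def __init__(self):
        self._facts = defaultdict(list)  # user_id -> remembered facts

    def save_session(self, user_id: str, facts: list) -> None:
        """After a conversation: keep new facts, skipping duplicates."""
        for fact in facts:
            if fact not in self._facts[user_id]:
                self._facts[user_id].append(fact)

    def search(self, user_id: str, keyword: str) -> list:
        """During a later conversation: look up facts matching a keyword."""
        return [
            fact
            for fact in self._facts.get(user_id, [])
            if keyword.lower() in fact.lower()
        ]


bank = ToyMemoryBank()
bank.save_session("user_42", ["User prefers Celsius"])  # end of session 1
print(bank.search("user_42", "celsius"))                # start of session 2
```

Facts are keyed by user, so one user's preferences never surface in another user's session.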

📚 [ADK Memory Guide](https://google.github.io/adk-docs/sessions/memory/) | [Memory Tools](https://google.github.io/adk-docs/tools/built-in-tools/) | [Sample Notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/agents/agent_engine/memory_bank/get_started_with_memory_bank_on_adk.ipynb)

---

# 6. Cleanup

**⚠️ Always delete resources when done testing!**

Leaving the agent deployed can incur costs, so delete your remote agent once you have finished testing and querying it. The delete call takes two arguments:

- `resource_name=remote_agent.resource_name` - Identifies which agent to delete
- `force=True` - Forces deletion even if the agent is running

The deletion process typically takes 1-2 minutes. You can verify deletion in the [Agent Engine Console](https://console.cloud.google.com/vertex-ai/agents/agent-engines).

In [22]:
agent_engines.delete(resource_name=remote_agent.resource_name, force=True)

print("✅ Agent successfully deleted")

Deleting AgentEngine resource: projects/118481962963/locations/us-west1/reasoningEngines/6303016376923586560
Delete AgentEngine backing LRO: projects/118481962963/locations/us-west1/operations/6817746973030875136
AgentEngine resource deleted: projects/118481962963/locations/us-west1/reasoningEngines/6303016376923586560
✅ Agent successfully deleted
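
If a test cell raises an exception, the delete cell may never run and the agent stays deployed. One way to guard against that is to wrap testing and deletion in `try`/`finally`. The sketch below shows the pattern with stand-in callables; `run_with_cleanup` is a hypothetical helper name.

```python
def run_with_cleanup(run_tests, delete_agent):
    """Run the tests and always call delete_agent afterwards, even on failure."""
    try:
        return run_tests()
    finally:
        delete_agent()


# Stand-ins: a passing "test" and a cleanup that records that it ran.
calls = []
result = run_with_cleanup(lambda: "ok", lambda: calls.append("deleted"))
print(result, calls)
```

In the notebook, `delete_agent` could be `lambda: agent_engines.delete(resource_name=remote_agent.resource_name, force=True)`, the same call as in the cell above.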
