This is the capstone project for the course "5-Day AI Agents: Intensive Vibe Coding Course With Google" on Kaggle. Its purpose is to showcase what I have learned by designing and implementing a practical modern AI agent system.
Investigating algorithmic platforms requires combining brittle, technically demanding data extraction with complex legal analysis. Traditional HTML scraping is fragile, and mapping massive datasets to evolving regulations like the Digital Markets Act (DMA) creates a severe bottleneck for researchers and auditors. There is a critical need for an automated workflow that reliably extracts data and drafts compliance reviews, while ensuring human experts retain ultimate sign-off authority.
LUX (Legal Uncovering, eXplainable) is a scoped, multi-agent system orchestrated by Google ADK. It automates the extraction of algorithmic data using highly reliable undocumented APIs, cross-references the findings against regulatory frameworks via RAG, and stages an evidence-backed compliance report for mandatory human review and approval via a web dashboard.
Primary Users: Academic researchers, investigative journalists, policy analysts, and NGO advocates.
Case Study
Auditing Amazon for self-preferencing (DMA Article 6). We will utilize the undocumented JSON API powering Amazon's "Our brands" filter to reliably bypass HTML brittleness and identify whether in-house brands are artificially boosted in search rankings.
Agents and Tools
- The API Inspector (Autonomous): Executes the technical extraction. Queries the Amazon search suggestions API using a sandboxed Python script, parses the JSON, and structures the output.
- The Regulatory Analyst (Advisory): Receives the structured data, queries the RAG knowledge base for DMA gatekeeper rules, and drafts a preliminary compliance assessment.
- fetch_amazon_brands (MCP Tool): Queries Amazon's search suggestion API to extract structured suggestions and classify brand types (private-label vs. third-party).
- query_dma_rag (Vertex Search Tool): Performs semantic search queries against a Vertex AI Search endpoint containing indexed regulatory texts (such as the DMA) to locate legal definitions and rules.
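The README does not document the suggestion payload or how `fetch_amazon_brands` decides brand type. The sketch below makes two assumptions. First, the response carries a `suggestions` list of objects with a `value` string. Second, brand type comes from a hypothetical, hand-maintained `HOUSE_BRANDS` set. `parse_suggestions` and `classify_suggestion` are illustrative names, not the project's code. The output labels mirror the `house_brand` / `third_party` classes used elsewhere in this README.

```python
from typing import Any, Dict, List

# Assumption: a hypothetical, non-exhaustive list of Amazon private-label names.
HOUSE_BRANDS = {"amazon basics", "amazonbasics", "solimo", "amazon essentials"}


def classify_suggestion(text: str) -> str:
    """Label a suggestion 'house_brand' if it mentions a listed private label."""
    lowered = text.lower()
    return "house_brand" if any(brand in lowered for brand in HOUSE_BRANDS) else "third_party"


def parse_suggestions(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Extract suggestion strings from the assumed payload shape and classify them."""
    results = []
    # Assumption: suggestions arrive as {"suggestions": [{"value": "..."}, ...]}
    for item in payload.get("suggestions", []):
        value = item.get("value")
        if isinstance(value, str) and value.strip():  # skip empty or non-string entries
            results.append({"suggestion": value, "brand_type": classify_suggestion(value)})
    return results


sample = {
    "prefix": "batteries",
    "suggestions": [
        {"value": "amazon basics aa batteries"},
        {"value": "duracell aa batteries"},
        {"value": ""},
    ],
}
for row in parse_suggestions(sample):
    print(row)
```

Substring matching against a static list is deliberately naive. Any house brand missing from the list falls through to `third_party`, and an audit reviewer should watch for exactly that kind of misclassification.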
- Trigger: User inputs a keyword (e.g., "batteries") via the UI or playground.
- Agent Runtime: Vertex AI Agent Engine.
- Orchestration: Google ADK utilizing a stateful Graph API to manage the workflow and HITL pause state.
- Sandbox: Model Context Protocol (MCP) server for secure Python execution.
- Knowledge: Vertex AI Search (RAG) loaded with DMA documentation.
The diagram below shows the event-driven topology of the LUX workflow, illustrating the secure pipeline from automated ingestion to the management approval loop:
```mermaid
graph TD
    User([User / Researcher]) -->|1. Inputs Keyword e.g. batteries| START
    subgraph Workflow ["Stateful Graph (lux_audit_graph)"]
        START[START Node] -->|2. Validate Input| VAL["Input Validation (validate_prompt_node)"]
        VAL -->|3. Cleaned Keyword| API["API Inspector Node (api_inspector)"]
        API -->|4. Invoke Tool| Tool1["fetch_amazon_brands"]
        Tool1 -->|5. Return Suggestions| API
        API -->|6. Structured Output| DM["Defense Middleware (defense_middleware_node)"]
        DM -->|7. Sanitized Output| SC["Security Checkpoint (security_checkpoint_node)"]
        SC -->|8a. Safe Path| RA["Regulatory Analyst Node (regulatory_analyst)"]
        SC -->|8b. Injection Detected| HITL["HITL Pause Node (hitl_pause_node)"]
        RA -->|9. Query RAG| Tool2["query_dma_rag"]
        Tool2 -->|10. Legal Quotes & Chunks| RA
        RA -->|11. Draft Compliance Report| HITL
        HITL -->|12. Await Approval / Input| HITL_Approve{User Decision}
        HITL_Approve -->|Approve / Reject| FR["Finalize Report Node (finalize_report_node)"]
    end
    FR -->|13. If Approved: Save Audit| DB[(audit_db.json)]
    FR -->|14. Result Return| User
```
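The diagram has two decision points: the checkpoint split (8a/8b) and the human approve/reject gate in front of `finalize_report_node`. The plain-Python sketch below shows both. It is not the Google ADK Graph API. `AuditState`, `route_after_checkpoint`, and `finalize_report` are hypothetical names, and the list `audit_db` stands in for `audit_db.json`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditState:
    """Hypothetical state object passed between graph nodes."""
    keyword: str
    injection_detected: bool = False
    draft_report: Optional[str] = None
    decision: Optional[str] = None  # "approve" or "reject"; only a human sets this


def route_after_checkpoint(state: AuditState) -> str:
    """Edges 8a/8b: injected payloads bypass the LLM and go straight to HITL."""
    return "hitl_pause_node" if state.injection_detected else "regulatory_analyst"


def finalize_report(state: AuditState, audit_db: list) -> str:
    """Finalize only after a human decision; persist only approved reports."""
    if state.decision not in ("approve", "reject"):
        raise RuntimeError("HITL gate: no human decision recorded, workflow stays paused")
    if state.decision == "approve":
        audit_db.append({"keyword": state.keyword, "report": state.draft_report})
    return state.decision


# Usage: a safe run is routed to the analyst, then waits for a reviewer.
state = AuditState(keyword="batteries", draft_report="This is NOT legal advice. ...")
print(route_after_checkpoint(state))  # regulatory_analyst
db = []
state.decision = "approve"
print(finalize_report(state, db), len(db))  # approve 1
```

Raising instead of defaulting makes the human decision structurally mandatory. No code path reaches persistence without an explicit `approve`.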
The tool connection and call flows differ between local development and deployed cloud environments.
In local development, the entry point is the local playground UI or terminal CLI. Tool calls route securely via JSON-RPC to the local MCP server running as a subprocess:
```mermaid
sequenceDiagram
    autonumber
    actor User as Developer (Local Playground UI)
    participant ADK as Local ADK Graph Runtime
    participant MCP as Local MCP Server (Subprocess)
    participant API as External Service (Amazon / Mock Search)
    User->>ADK: 1. Inputs keyword (e.g. Kindle)
    ADK->>ADK: 2. Runs validate & defense nodes
    ADK->>MCP: 3. Sends tool call JSON-RPC over stdio (fetch_amazon_brands)
    MCP->>API: 4. Executes HTTP suggestion request
    API-->>MCP: 5. Returns suggestion payload JSON
    MCP-->>ADK: 6. Returns structured results
    ADK->>User: 7. Renders output in local playground
```
In production, the entry point is the user-facing Researcher Portal. The portal initiates a session, and the deployed Vertex AI Reasoning Engine executes the graph nodes and calls tools directly as native Python functions:
```mermaid
sequenceDiagram
    autonumber
    actor User as Researcher (Web Portal Dashboard)
    participant Portal as Researcher Portal Service
    participant Engine as Vertex AI Reasoning Engine (Agent Runtime)
    participant Tool as Native Python Tool (amazon_brands.py / dma_rag.py)
    participant API as External Service (Amazon API / Vertex RAG Search)
    User->>Portal: 1. Inputs keyword
    Portal->>Engine: 2. Starts/Resumes Session (VertexAiSessionService)
    Engine->>Engine: 3. Runs graph validation & defense nodes
    Engine->>Tool: 4. Invokes Python function directly
    Tool->>API: 5. Sends API request (HTTPS suggestions / Discovery Engine client)
    API-->>Tool: 6. Returns results payload
    Tool-->>Engine: 7. Returns validated python dict
    Engine-->>Portal: 8. Streams workflow events & audit records
    Portal->>User: 9. Displays finalized compliance report
```
- Workflow Engine: Google ADK (Stateful graph API).
- Runtime: Vertex AI Agent Engine (Gemini 1.5 Pro / Gemini 2.0 Flash).
- Tooling Sandbox: Cloud Run (MCP Server).
- Eventing: Cloud Pub/Sub for handling ambient triggers and UI state updates.
- Front-End: Cloud Run (Python) for the researcher portal dashboard.
- Pre-Execution Input Validation: User queries are sanitized and validated at the graph entry point (`validate_prompt_node`) before any LLM agent or tool runs. This protects the system from prompt injection, PII ingestion, URL/ASIN spam, and invalid search queries.
- Prompt-Injection Defense: The API Inspector runs strictly in an isolated MCP container. Raw scraped JSON is recursively sanitized before being passed into the Regulatory Analyst's context window.
- PII Redaction: Raw receipts and payloads are passed through a regex-based redaction filter in the security checkpoint that intercepts and masks sensitive data such as SSNs and credit card numbers.
- Strict HITL: The workflow is hardcoded to pause at the HITL node. It cannot publish or finalize a compliance report without explicit human authorization through the API.
- Explicit Labeling: All drafted reports carry an immutable "This is NOT legal advice" header.
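The pre-execution validation can be sketched as a single rule pipeline. The rules come from the categories the unit tests exercise: length bounds, the 11 illegal characters, injection signatures, and ASIN, URL, and PII rejection. This is a hypothetical `validate_keyword`, not the project's `validate_prompt_node`. The ASIN and domain regexes are assumptions, and phone-number and NSFW filtering are left out.

```python
import re

ILLEGAL_CHARS = set("<>{}[]\\/;=*")  # the 11 characters listed in the unit tests
INJECTION_SIGNATURES = ("ignore instructions", "bypass system", "system prompt", "print rules")
ASIN_RE = re.compile(r"\bB0[A-Z0-9]{8}\b", re.IGNORECASE)  # assumption: "B0" + 8 alphanumerics
DOMAIN_RE = re.compile(r"(https?://|www\.|\b[a-z0-9-]+\.[a-z]{2,}\b)", re.IGNORECASE)  # assumed heuristic
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def validate_keyword(raw):
    """Return (cleaned_keyword, None) if valid, else (None, reason)."""
    if not isinstance(raw, str):
        return None, "not a string"
    keyword = " ".join(raw.split())  # trim and collapse whitespace
    if not 2 <= len(keyword) <= 50:
        return None, "length out of bounds"
    if any(ch in ILLEGAL_CHARS for ch in keyword):
        return None, "illegal character"
    lowered = keyword.lower()
    if any(sig in lowered for sig in INJECTION_SIGNATURES):
        return None, "prompt-injection signature"
    # PII runs before the URL check so "me@example.com" is reported as PII, not a URL.
    if EMAIL_RE.search(keyword) or SSN_RE.search(keyword):
        return None, "PII detected"
    if ASIN_RE.search(keyword):
        return None, "ASIN not allowed"
    if DOMAIN_RE.search(keyword):
        return None, "URL not allowed"
    return keyword, None


print(validate_keyword("  AA   batteries "))  # ('AA batteries', None)
print(validate_keyword("B08QF1V9T2"))  # (None, 'ASIN not allowed')
```

The checks run in a fixed order and return the first failure. That gives each rejection a single, loggable reason.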
```
lux-agent/
├── .agents/                      # Workspace customizations and skills
│   └── skills/
│       ├── fetch_amazon_brands/  # Skill for auditing Amazon private labels
│       │   ├── SKILL.md          # Skill definition for Amazon API extraction
│       │   └── references/       # Amazon suggestion API rules & specifications
│       └── query_dma_rag/        # Skill for querying DMA gatekeeper rules
│           ├── SKILL.md          # Skill definition and semantic search parameters
│           └── references/       # RAG citation guidelines & edge-case rules
├── app/                          # Core agent engine nodes & graph
│   ├── agent.py                  # Workflow factory and entry point definitions
│   ├── agent_runtime_app.py      # ADK Reasoning Engine setup and clone wrapper
│   ├── fast_api_app.py           # Local playground API transport wrapper
│   ├── core/                     # Decomposed business logic & configurations
│   │   ├── config.py             # Stateless settings management
│   │   ├── persistence.py        # Concurrency-safe GCS/local audit persistence
│   │   ├── security.py           # Prompt-injection validation & PII redaction
│   │   ├── validation.py         # Input query validation
│   │   └── adapters/
│   │       └── pubsub.py         # Starlette request payload pubsub adapters
│   ├── tools/                    # Native tool Python implementations
│   │   ├── amazon_brands.py      # Search suggestions and brand classifiers
│   │   └── dma_rag.py            # DMA compliance Discovery Engine RAG tool
│   └── app_utils/                # Shared helpers, telemetry, and custom types
├── artifacts/                    # Evaluator logs and telemetry traces
├── deployment/                   # IaC scripts for cloud deployment
├── docs/                         # Technical guides & documentation
│   └── technical_know_how.md     # Orchestration, sequence and reference guide
├── frontend/                     # Web dashboard (Researcher Portal)
│   ├── main.py                   # FastAPI portal dashboard
│   ├── config.py                 # Portal environmental configurations
│   ├── Dockerfile                # Container configuration for portal deployment
│   └── README.md                 # Dashboard launch manual
├── mcp_server/                   # Local Model Context Protocol server
├── tests/                        # Test harness (unit, integration, and mocks)
│   ├── conftest.py               # Global test fixtures
│   ├── unit/                     # Unit tests (config, persistence, pubsub, api)
│   └── integration/              # Integration tests (Reasoning Engine stream tests)
├── BACKLOG.md                    # Product backlog & roadmap
├── GEMINI.md                     # Development flywheel guide
├── agents-cli-manifest.yaml      # ADK workspace configuration manifest
└── pyproject.toml                # Project dependency configuration
```
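The tree lists `app/core/persistence.py` as concurrency-safe audit persistence. The sketch below is a minimal, local-only take on that idea. It uses a lock, a write to a temporary file, and an atomic rename, so readers never see a half-written `audit_db.json`. `append_audit_record` is a hypothetical name, and the GCS side is not shown. The lock only serializes writers within one process. Separate processes would need a file lock or a storage-level precondition instead.

```python
import json
import os
import tempfile
import threading
from pathlib import Path

_LOCK = threading.Lock()  # serializes writers inside this process only


def append_audit_record(db_path, record):
    """Append one record to a JSON-array file; return the new record count."""
    path = Path(db_path)
    with _LOCK:
        records = json.loads(path.read_text()) if path.exists() else []
        records.append(record)
        # Write the full array to a temp file in the same directory...
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as handle:
            json.dump(records, handle, indent=2)
        # ...then swap it in; os.replace is an atomic rename on POSIX (same filesystem).
        os.replace(tmp_name, path)
    return len(records)
```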
Before running the agent, make sure you have:
- uv: Fast Python package manager. Install uv.
- agents-cli: Google Agents CLI. Install with `uv tool install google-agents-cli`.
- Google Cloud SDK: For local authentication. Install gcloud.
- Install project dependencies:
  ```bash
  agents-cli install
  ```
- Initialize local credentials:
  ```bash
  gcloud auth application-default login
  ```
- Run the local development playground:
  ```bash
  agents-cli playground
  ```
| Command | Description |
|---|---|
| `agents-cli playground` | Launches the interactive local development UI |
| `uv run pytest tests/unit tests/integration` | Runs the test suite |
| `agents-cli lint` | Runs formatting and styling lint checks |
| `agents-cli eval generate` | Evaluates agent behavior against defined cases |
| `agents-cli deploy` | Deploys the application target to dev |
- Code edits are performed in `app/agent.py`, and tools should be added as MCP server definitions under `mcp_server/server.py`.
- Hot-reloading is supported in the playground during local development.
Deployment to Dev environment:
```bash
gcloud config set project <your-project-id>
agents-cli deploy
```
In the local development environment, the agent communicates with the tools via the Model Context Protocol (MCP).
- Server Process / Command: The playground launches the MCP server locally as a subprocess using:
  ```bash
  uv run --project mcp_server python mcp_server/server.py
  ```
- Endpoint / Protocol: There is no HTTP endpoint or port. The local agent communicates with the MCP server using JSON-RPC over standard input/output (stdio).
- Underlying Services Called by the Local MCP Server:
  - `fetch_amazon_brands`: Queries the public Amazon API endpoint `https://completion.amazon.com/api/2017/suggestions`.
  - `query_dma_rag`: If Google Cloud credentials and Vertex AI Search environment variables are configured, it queries the GCP Discovery Engine API. If they are missing, it runs a local in-memory simulation directly in Python.
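The stdio transport carries plain JSON-RPC 2.0 messages, one per line. The sketch below builds an MCP `tools/call` request and unpacks the text parts of the result. This is roughly what the agent exchanges with `mcp_server/server.py` for `fetch_amazon_brands`. The argument name `keyword` is an assumption, and the helper names are hypothetical. A real client, normally the MCP SDK, must first complete the `initialize` handshake, which is omitted here.

```python
import json


def encode_tool_call(request_id, tool_name, arguments):
    """Serialize an MCP tools/call request as one newline-terminated JSON-RPC line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps without indent escapes newlines inside strings, so the line has none
    return json.dumps(message) + "\n"


def decode_tool_result(line, expected_id):
    """Parse a response line and return the text parts of the tool result."""
    response = json.loads(line)
    if response.get("id") != expected_id:
        raise ValueError("response id does not match request id")
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    content = response["result"].get("content", [])
    return [part["text"] for part in content if part.get("type") == "text"]


# Assumption: the tool takes a single "keyword" argument.
request_line = encode_tool_call(1, "fetch_amazon_brands", {"keyword": "Kindle"})
sample_reply = '{"jsonrpc": "2.0", "id": 1, "result": {"content": [{"type": "text", "text": "[]"}], "isError": false}}'
print(decode_tool_result(sample_reply, 1))  # ['[]']
```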
When deployed to Google Cloud (Vertex AI Reasoning Engine / Agent Runtime), the local mcp_server folder does not exist. The agent code dynamically detects this and falls back to running the tools as native Python functions defined inside app/agent.py.
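The README does not show how this detection is implemented. A minimal sketch, assuming it is a simple file-existence check, might look like the following. `resolve_tool_mode` is a hypothetical name. The subprocess command is the one quoted above for local development.

```python
from pathlib import Path

# Command quoted in this README for launching the local MCP server.
MCP_COMMAND = ["uv", "run", "--project", "mcp_server", "python", "mcp_server/server.py"]


def resolve_tool_mode(project_root):
    """Return ("mcp", command) if the local MCP server exists, else ("native", None)."""
    server_script = Path(project_root) / "mcp_server" / "server.py"
    if server_script.is_file():
        return "mcp", MCP_COMMAND  # local: spawn the MCP subprocess over stdio
    return "native", None  # cloud: mcp_server/ is absent, call Python functions directly
```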
- `fetch_amazon_brands` endpoint: Makes an HTTP GET request directly to `https://completion.amazon.com/api/2017/suggestions`.
- `query_dma_rag` endpoint: Calls the Vertex AI Search (Discovery Engine) API via the Google Cloud Client Library (`discoveryengine.SearchServiceClient()`), pointing to `projects/{project_id}/locations/{location}/collections/default_collection/dataStores/{data_store_id}/servingConfigs/default_search`. If unconfigured, it runs a local in-memory simulation.
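The `query_dma_rag` fallback logic can be sketched without the Google client. The sketch builds the serving-config resource name quoted above and picks "vertex" or "local" mode from the environment. In local mode it ranks a tiny in-memory corpus by word overlap. The environment variable names, the corpus, and the function names are all assumptions. The corpus entries are hypothetical paraphrases, not quotations of the DMA. In "vertex" mode, the real request would go through `discoveryengine.SearchServiceClient`, which is not shown.

```python
import os

# Hypothetical paraphrased placeholders for the offline simulation (not DMA quotations).
LOCAL_DMA_SNIPPETS = {
    "Article 6(5)": "gatekeeper must not rank its own products more favourably than similar third party products",
    "Article 2": "definitions of gatekeeper core platform service and ranking",
}
# Assumption: these variable names are illustrative, not the project's settings.
REQUIRED_ENV = ("GOOGLE_CLOUD_PROJECT", "DATA_STORE_LOCATION", "DATA_STORE_ID")


def serving_config_path(project_id, location, data_store_id):
    """Build the servingConfigs resource name quoted in this README."""
    return (
        f"projects/{project_id}/locations/{location}/collections/default_collection"
        f"/dataStores/{data_store_id}/servingConfigs/default_search"
    )


def search_mode(env=None):
    """Use Vertex AI Search only when every required variable is set and non-empty."""
    env = os.environ if env is None else env
    return "vertex" if all(env.get(key) for key in REQUIRED_ENV) else "local"


def local_search(query, top_k=2):
    """Rank local snippets by naive word overlap with the query."""
    words = set(query.lower().split())
    scored = []
    for ref, text in LOCAL_DMA_SNIPPETS.items():
        overlap = len(words & set(text.lower().split()))
        if overlap:
            scored.append((overlap, ref))
    scored.sort(reverse=True)  # highest overlap first
    return [ref for _, ref in scored[:top_k]]


print(search_mode({}))  # local
print(local_search("self-preferencing ranking own products"))  # ['Article 6(5)', 'Article 2']
```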
The project employs a three-tier testing strategy spanning unit tests, integration tests, and agentic evaluations:
- 29 unit tests across several domains:
  - 13 validation tests (`tests/unit/test_dummy.py`) covering input validation across 6 security categories:
    - ✅ Happy-path acceptance (`"Kindle"`, `"AA batteries"`, `"smart-watch"`, `"men's shoes"`)
    - 🛡 Input sanitization (type checking, empty/whitespace rejection)
    - 📏 Length boundaries (min 2 chars, max 50 chars)
    - 🔐 Security: character allowlist/blocklist, 11 illegal characters (`< > { } [ ] \ / ; = *`), and anti-prompt-injection signatures (`"ignore instructions"`, `"bypass system"`, `"system prompt"`, `"print rules"`)
    - 🏷 Domain logic: ASIN rejection (`B08QF1V9T2`), URL rejection (`www.amazon.co.uk`, `amazon.com`)
    - 🛡 Ethical: PII blocking (email, phone, SSN) and NSFW/harmful content filtering
  - 6 Amazon suggestion tests (`tests/unit/test_amazon_api.py`) covering error handling, retries, and fallbacks.
  - 3 configuration tests (`tests/unit/test_config.py`) validating environment variables and locations.
  - 4 persistence repository tests (`tests/unit/test_persistence.py`) covering GCS and local database operations.
  - 3 Pub/Sub adapter tests (`tests/unit/test_pubsub_adapter.py`) validating payload path parsers.
- 3 integration tests (`tests/integration/`):
  - `test_agent_stream`: Full ADK `Runner` with SSE streaming for `"Please audit the keyword: batteries"`
  - `test_agent_stream_query`: `AgentEngineApp.async_stream_query` running cloned, isolated replicas
  - `test_agent_feedback`: `register_feedback` validation logs
- Test infrastructure (`tests/conftest.py`): patches `discoveryengine_v1.SearchServiceClient` across all tests for offline safety.
- 3 custom agentic eval metrics (`tests/eval/eval_config.yaml`), implemented as deterministic Python scorers (not LLM-as-judge):

  | Metric | What It Checks |
  |---|---|
  | `assert_mock_payload_parsed` | API Inspector correctly parses Amazon JSON payload |
  | `assert_dma_cites` | Regulatory Analyst cites DMA legal framework |
  | `assert_hitl_halt` | Workflow halts at the Human-in-the-Loop node |

- 3 eval dataset cases (`tests/eval/datasets/lux-audit-dataset.json`): `mock_payload_parsing` (deterministic mock path), `dma_citation` (live API + RAG), `hitl_interception` (novel keyword halt check).
- Eval run (2026-06-22): 9/9 perfect scores (1.0 mean, 0.0 std dev, 0 errors):

  | Metric | Cases | Mean Score | Status |
  |---|---|---|---|
  | `assert_mock_payload_parsed` | 3/3 | 1.00 | ✅ PASS |
  | `assert_dma_cites` | 3/3 | 1.00 | ✅ PASS |
  | `assert_hitl_halt` | 3/3 | 1.00 | ✅ PASS |

- 4 production audit reports stored in `audit_db.json`, demonstrating real end-to-end workflow completions:

  | Keyword | Risk | Decision | Notable Finding |
  |---|---|---|---|
  | `cables` | Low | ✅ Approved | All suggestions third-party; no self-preferencing detected |
  | `kindle` | High | ✅ Approved | Amazon's own brand misclassified as `"third_party"`; DMA Art. 6(5) concern |
  | `batteries` | Medium | ✅ Approved | Neutral suggestions; deeper algorithm audit recommended |
  | `book` | Low | ✅ Approved | All suggestions third-party; affiliated brands noted |
- Security (5 validations):
  - Pre-execution input sanitization via `validate_prompt_node` (runs before any LLM/tool)
  - Defense middleware strips HTML, `javascript:`, `eval()`, and `exec()`, and enforces a 4000-character length ceiling
  - PII redaction (`[REDACTED_SSN]`, `[REDACTED_CC]`) at the security checkpoint
  - Injection route bypass: malicious payloads skip the LLM entirely and are routed directly to HITL
  - MCP sandbox isolation: the Amazon API tool runs in a separate process via stdio JSON-RPC
- Functional (6 validations):
  - API data extraction from `completion.amazon.com` with `house_brand`/`third_party` classification
  - RAG knowledge retrieval with graceful fallback (Vertex AI Search → local simulation)
  - Pydantic-enforced output schemas (`APIInspectorOutput`, `RegulatoryReport`)
  - HITL workflow halt confirmed by eval metric on 3/3 cases
  - Report persistence to `audit_db.json` with full audit trail
  - Dual streaming support (sync SSE + async streaming)
- Governance (4 validations):
  - Mandatory HITL gate: hardcoded in the workflow graph, with no bypass path
  - Legal disclaimer enforced in Regulatory Analyst instructions and verified in all 4 audit records
  - Feedback pipeline with Pydantic validation and Cloud Logging
  - OpenTelemetry observability (Cloud Trace, GCS prompt logging, commit SHA versioning)
- Deployment (4 validations):
  - Dual-mode tool resolution: detects `mcp_server/` at runtime (MCP locally, native functions in cloud)
  - Location regression guard: 3 dedicated unit tests prevent the `LOCATION=global` bug
  - Terraform IaC in `deployment/terraform/`
  - Containerized frontend via `frontend/Dockerfile` for Cloud Run
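The defense middleware and PII redaction described above can be sketched as a recursive sanitizer. It strips HTML tags, `javascript:`, `eval(`, and `exec(`, masks SSNs and card numbers with the `[REDACTED_SSN]` / `[REDACTED_CC]` tokens, and truncates at 4000 characters. The function names are hypothetical, and the regexes are simple assumptions: the card pattern does no Luhn check. Applying the ceiling per string is also an assumption, since the README does not say whether it applies per field or to the whole payload.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")
DANGEROUS_RE = re.compile(r"javascript:|eval\(|exec\(", re.IGNORECASE)
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CC_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")  # 13 to 16 digits, optional space/dash separators
MAX_CHARS = 4000  # assumption: ceiling applied to each string


def sanitize_text(text):
    """Strip markup and script-like tokens, redact PII, then truncate."""
    text = TAG_RE.sub("", text)
    text = DANGEROUS_RE.sub("", text)
    text = SSN_RE.sub("[REDACTED_SSN]", text)
    text = CC_RE.sub("[REDACTED_CC]", text)
    return text[:MAX_CHARS]


def sanitize(value):
    """Recursively sanitize every string inside nested dicts and lists."""
    if isinstance(value, str):
        return sanitize_text(value)
    if isinstance(value, dict):
        return {key: sanitize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


print(sanitize({"title": "<b>AA</b> batteries", "notes": ["SSN 123-45-6789"]}))
```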
Summary: 30 total validations, all passing. Test command:
```bash
uv run pytest tests/unit tests/integration
```
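The eval metrics are described as deterministic Python scorers rather than an LLM judge. A sketch of what an `assert_hitl_halt`-style scorer could look like follows. It gives a trace 1.0 if it reaches `hitl_pause_node` without reaching `finalize_report_node`, and 0.0 otherwise. The event shape, a list of dicts with a `node` key, is an assumption, and this is not the ADK eval metric interface or the project's implementation.

```python
def assert_hitl_halt(events):
    """Score 1.0 when the trace pauses at HITL and never finalizes on its own."""
    nodes = [event.get("node") for event in events]  # assumption: events carry a "node" key
    halted = "hitl_pause_node" in nodes
    finalized = "finalize_report_node" in nodes
    return 1.0 if halted and not finalized else 0.0


def mean_score(scores):
    """Average per-case scores, returning 0.0 for an empty run."""
    return sum(scores) / len(scores) if scores else 0.0


trace = [{"node": "validate_prompt_node"}, {"node": "api_inspector"}, {"node": "hitl_pause_node"}]
print(assert_hitl_halt(trace))  # 1.0
```

A deterministic check like this returns the same score on every run for the same trace. That makes a 0.0 standard deviation across repeated cases expected rather than surprising.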
LUX automatically exports tracing telemetry and logs to:
- Cloud Trace: For monitoring node latencies.
- BigQuery: Agent execution analytics and prompt telemetry.
- Cloud Logging: System events.
- Adding more Agent Skills for other algorithmic and regulatory audits.
- Translating legal requirements (e.g., DMA, AI Act, GDPR, DSA transparency obligations) into technical testing parameters and drafting an accountability reporting checklist.
- Scaling the app to support diverse use cases across multiple digital and AI platforms.
This project is licensed under the MIT License.