🚀 Gemma 4 DevOps Agents

Welcome to the Gemma-4 DevOps Agents workspace. This repository contains three specialized, self-hosted AI-driven DevOps/SRE agents powered by Google's Gemma 4 model. These agents are packaged as Model Context Protocol (MCP) servers to analyze, monitor, and troubleshoot infrastructure components.

📂 Project Structure

This workspace is organized into five distinct sub-agents, each tailored to a specific environment and serving stack:

Sub-Agent	Purpose	Serving Engine	Target Infrastructure
Local DevOps Agent	CPU/GPU local analysis & prototyping	Ollama / vLLM	Local Docker / Workstations
GPU DevOps Agent (26B)	Serverless GPU-accelerated cloud analysis (26B config)	vLLM	Google Cloud Run (us-central1)
GPU DevOps Agent (6000)	Serverless GPU-accelerated cloud analysis (RTX 6000)	vLLM	Google Cloud Run (us-central1)
GPU DevOps Agent (vLLM)	Serverless GPU-accelerated cloud analysis (L4 GPU)	vLLM	Google Cloud Run (us-east4)
TPU DevOps Agent	Ultra-high performance enterprise log & infra analysis	vLLM	Google Cloud TPUs (v6e Trillium)

🛠 Features & Capabilities

Automated SRE Diagnostics: Fetches and reviews system, container, and Cloud Logging entries using Gemma 4 to identify root causes and generate 3-step remediation plans.
Serving Stack Control: Built-in tools to provision, start, stop, restart, and scale your vLLM and Ollama containers or Cloud TPU Queued Resources.
Observability Dashboards: Real-time dashboards monitoring HBM usage, Tensor Core pressure, Prometheus metrics, and service latencies.
Model Benchmarking: Tools to run load tests and vLLM's internal benchmark suites, returning performance metrics (TTFT, throughput, P95 latency).
Gemini CLI Integration: Custom setup instructions using a LiteLLM Proxy to route standard Gemini CLI commands directly to your private, self-hosted Gemma 4 instance.

🏗 Global Makefile Usage

A root Makefile is provided to manage the sub-agents collectively:

Help / Display commands:
```
make all
```
Install dependencies in all subdirectories:
```
make install
```
Run tests across all agents:
```
make test
```
Lint all Python directories:
```
make lint
```
Clean build/cache folders:
```
make clean
```

🚀 Sub-Agent Overviews

1. Local DevOps Agent

Role: Specialized SRE specialized in local containerized workloads.
Inference Stack: Runs gemma4:e2b or google/gemma-4-E2B-it via local Docker (ollama/ollama or CPU/GPU vLLM).
Key Tools:
- manage_docker: Manage the local container.
- analyze_local_logs: Automated log diagnostic reports.
- query_gemma4_with_stats: Measure local inference latency and throughput.
- get_help: Retrieve server configuration and tool details.
Documentation: See local-devops-agent/README.md and local-devops-agent/GEMINI.md.

2. GPU DevOps Agent (26B)

Role: Cloud-based SRE managing GPU-accelerated serverless endpoints (26B configuration).
Inference Stack: Runs google/gemma-4-26B-A4B-it via vLLM on GCP Cloud Run (RTX 6000 GPU in us-central1).
Key Tools:
- deploy_vllm: Automates serverless Cloud Run GPU vLLM deployments.
- analyze_cloud_logging: Summarizes Google Cloud Logging errors.
- get_vllm_deployment_config: Generates gcloud configuration options.
- get_help: Retrieve server configuration and tool details.
Documentation: See gpu-26B-devops-agent/README.md.

3. GPU DevOps Agent (6000)

Role: Cloud-based SRE managing GPU-accelerated serverless endpoints (RTX 6000 config).
Inference Stack: Runs google/gemma-4-26B-A4B-it via vLLM on GCP Cloud Run (RTX 6000 GPU in us-central1).
Key Tools:
- deploy_vllm: Automates serverless Cloud Run GPU vLLM deployments.
- analyze_cloud_logging: Summarizes Google Cloud Logging errors.
- get_vllm_deployment_config: Generates gcloud configuration options.
- get_help: Retrieve server configuration and tool details.
Documentation: See gpu-6000-devops-agent/README.md.

4. GPU DevOps Agent (vLLM)

Role: Cloud-based SRE managing GPU-accelerated serverless endpoints (L4 configuration).
Inference Stack: Runs google/gemma-4-E4B-it via vLLM on GCP Cloud Run (NVIDIA L4 GPU in us-east4).
Key Tools:
- deploy_vllm: Automates serverless Cloud Run GPU vLLM deployments.
- analyze_cloud_logging: Summarizes Google Cloud Logging errors.
- get_vllm_deployment_config: Generates gcloud configuration options.
- get_help: Retrieve server configuration and tool details.
Documentation: See gpu-vllm-devops-agent/README.md.

5. TPU DevOps Agent

Role: High-performance TPU SRE/DevOps managing large-scale private clusters.
Inference Stack: Runs google/gemma-4-31B-it via vLLM on Google Cloud TPUs (v6e Trillium / Flex-start VMs).
Key Tools:
- manage_queued_resource: Manage the TPU Queued Resource (create, check, etc.).
- run_vllm_benchmark: Run performance benchmark on TPU.
- query_queued_gemma4_with_stats: Query model on TPU and measure latency/throughput.
- get_help: Retrieve server configuration and tool details.
Documentation: See tpu-vllm-devops-agent/README.md and tpu-vllm-devops-agent/GEMINI.md.

🔒 Security & Credentials

When deploying to Google Cloud or Hugging Face, secure credentials using:

Hugging Face Access Token: Saved locally or to Google Secret Manager via save_hf_token tools.
Application Default Credentials (ADC): Set up using GCP credentials helper scripts (set_adc.sh inside individual sub-agent folders).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🚀 Gemma 4 DevOps Agents

📂 Project Structure

🛠 Features & Capabilities

🏗 Global Makefile Usage

🚀 Sub-Agent Overviews

1. Local DevOps Agent

2. GPU DevOps Agent (26B)

3. GPU DevOps Agent (6000)

4. GPU DevOps Agent (vLLM)

5. TPU DevOps Agent

🔒 Security & Credentials

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
gpu-6000-devops-agent		gpu-6000-devops-agent
gpu-vllm-devops-agent		gpu-vllm-devops-agent
local-devops-agent		local-devops-agent
tpu-vllm-devops-agent		tpu-vllm-devops-agent
GEMINI.md		GEMINI.md
Makefile		Makefile
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

🚀 Gemma 4 DevOps Agents

📂 Project Structure

🛠 Features & Capabilities

🏗 Global Makefile Usage

🚀 Sub-Agent Overviews

1. Local DevOps Agent

2. GPU DevOps Agent (26B)

3. GPU DevOps Agent (6000)

4. GPU DevOps Agent (vLLM)

5. TPU DevOps Agent

🔒 Security & Credentials

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages