Architecture
Klanker Maker follows AWS Organizations best practices, supporting either a three-account or two-account topology. In both models, sandboxes run in a dedicated application account, completely separated from the account that owns the domain and attaches SCPs.
| Account | Role | What Lives Here | Why Separate |
|---|---|---|---|
| Management | DNS, identity, org root | Route53 hosted zone, domain registration, AWS SSO, Organizations root, SCP attachments | Domain and identity are org-wide - they don't belong in a sandbox blast radius |
| Terraform | State and provisioning | S3 state buckets, DynamoDB lock tables, cross-account provisioning role | Terraform state contains every resource ARN and secret path - isolating it limits exposure if the application account is compromised |
| Application | Sandbox execution | Regional VPCs, EC2/ECS instances, IAM sandbox roles, SES, Lambda handlers, DynamoDB budget table, S3 artifacts, CloudWatch Logs | This is where agents run - if an agent escapes its sandbox, it can only reach resources in this account, not state or DNS |
In a two-account topology, the Terraform and Application accounts are the same - set both account IDs to the same value during km configure. Simpler for development; the management account stays separate for SCP and DNS.
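The topology rule above can be sketched in a few lines of Go. This is a minimal illustration, not the project's code: the `Accounts` type, its methods, and the placeholder account IDs are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Accounts is a hypothetical container for the account IDs km configure collects.
type Accounts struct {
	Management  string // DNS, identity, org root
	Terraform   string // state and provisioning
	Application string // sandbox execution
}

// Topology returns "two-account" when Terraform and Application share one ID,
// and "three-account" otherwise.
func (a Accounts) Topology() string {
	if a.Terraform == a.Application {
		return "two-account"
	}
	return "three-account"
}

// Validate enforces the rule both topologies share: sandboxes never run in
// the management account.
func (a Accounts) Validate() error {
	if a.Management == "" || a.Terraform == "" || a.Application == "" {
		return errors.New("all three account IDs must be set")
	}
	if a.Management == a.Application {
		return errors.New("application account must differ from the management account")
	}
	return nil
}

func main() {
	// Placeholder IDs; a development setup reuses one account for state and sandboxes.
	dev := Accounts{Management: "111111111111", Terraform: "222222222222", Application: "222222222222"}
	fmt.Println(dev.Topology(), dev.Validate())
}
```

The only hard constraint checked here is the one the text states for both models: the management account never doubles as the sandbox account.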
Authentication is via AWS SSO with named profiles:

    aws sso login --profile klanker-management   # DNS, domain, SCP
    aws sso login --profile klanker-terraform    # State, provisioning
    aws sso login --profile klanker-application  # VPC, sandbox runtime

The km CLI selects the right AWS profile per command automatically.
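How the CLI picks a profile is not shown here, so the following is only a sketch of one way such a lookup could work. The `Concern` type and `profileFor` function are hypothetical; the only facts taken from the text are the three profile names and the concerns listed in the comments next to each login command.

```go
package main

import "fmt"

// Concern names the kind of AWS resource a command touches. The set mirrors
// the comments on the three login commands and is illustrative only.
type Concern string

const (
	DNS          Concern = "dns"
	SCP          Concern = "scp"
	State        Concern = "state"
	Provisioning Concern = "provisioning"
	VPC          Concern = "vpc"
	Runtime      Concern = "sandbox-runtime"
)

// profileFor maps a concern to one of the three named SSO profiles.
func profileFor(c Concern) (string, error) {
	switch c {
	case DNS, SCP:
		return "klanker-management", nil
	case State, Provisioning:
		return "klanker-terraform", nil
	case VPC, Runtime:
		return "klanker-application", nil
	}
	return "", fmt.Errorf("no profile for concern %q", c)
}

func main() {
	for _, c := range []Concern{DNS, State, Runtime} {
		p, err := profileFor(c)
		if err != nil {
			panic(err)
		}
		// AWS_PROFILE is the standard variable the AWS SDKs and CLI read.
		fmt.Printf("%-16s AWS_PROFILE=%s\n", c, p)
	}
}
```

Which km command maps to which concern is an open question this sketch does not answer; it only shows that the selection can be a pure lookup with no account IDs involved.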
km configure prompts for resource_prefix (default km) and email_subdomain - propagated to Terragrunt via KM_RESOURCE_PREFIX and KM_EMAIL_SUBDOMAIN. This lets you run multiple isolated km installs in the same AWS account (e.g., prod- and staging- prefixes). See OPERATOR-GUIDE.md § Multi-instance support.
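As a rough illustration of how a prefix can keep two installs apart, the sketch below reads KM_RESOURCE_PREFIX, falls back to the documented default `km`, and derives resource names from it. The helper names and the hyphen-joining rule are assumptions made for the example; the actual naming logic lives in the Terragrunt configuration and is not reproduced here.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const defaultPrefix = "km"

// prefixFrom returns the KM_RESOURCE_PREFIX value supplied by getenv, or the
// default "km" when the variable is unset or empty. Passing the lookup
// function in keeps the helper easy to exercise without touching the process
// environment.
func prefixFrom(getenv func(string) string) string {
	if v := getenv("KM_RESOURCE_PREFIX"); v != "" {
		return v
	}
	return defaultPrefix
}

// resourceName joins prefix and base with a single hyphen. ASSUMPTION: this
// joining rule is illustrative; a trailing hyphen on the prefix is tolerated.
func resourceName(prefix, base string) string {
	return strings.TrimSuffix(prefix, "-") + "-" + base
}

func main() {
	prefix := prefixFrom(os.Getenv)
	for _, base := range []string{"sandboxes", "budget", "identities"} {
		fmt.Println(resourceName(prefix, base))
	}
}
```

With a `staging` prefix the same code yields `staging-sandboxes` instead of `km-sandboxes`, which is what lets two installs coexist in one account without name collisions.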
No hardcoded account IDs. No hardcoded domains. A fork with a different domain works end-to-end after km configure.

    km CLI
    ├── cmd/km/                       CLI entry point
    ├── cmd/ttl-handler/              Lambda: TTL expiry + artifact upload
    ├── cmd/budget-enforcer/          Lambda: budget ceiling enforcement
    ├── cmd/create-handler/           Lambda: remote sandbox creation via EventBridge
    ├── cmd/email-create-handler/     Lambda: email-driven sandbox creation
    ├── cmd/github-token-refresher/   Lambda: GitHub App installation token refresh
    ├── cmd/km-slack-bridge/          Lambda: Slack outbound + inbound bridge (Function URL)
    ├── internal/app/cmd/             Cobra commands (configure, bootstrap, init, validate,
    │                                 create, clone, destroy, pause/resume/lock, stop, extend,
    │                                 roll, at, list, status, logs, budget, shell, agent,
    │                                 doctor, otel, info, rsync, email, slack, ami)
    ├── pkg/
    │   ├── profile/                  SandboxProfile schema, validation, inheritance
    │   ├── compiler/                 Profile → Terragrunt artifacts (EC2 + ECS paths)
    │   ├── ebpf/                     eBPF enforcer (cgroup BPF programs, DNS resolver,
    │   │                             audit consumer, SSL uprobes)
    │   ├── aws/                      SDK helpers (S3, SES, CloudWatch, DynamoDB,
    │   │                             EventBridge Scheduler, identity/signing)
    │   ├── terragrunt/               Runner + per-sandbox state isolation
    │   ├── lifecycle/                TTL scheduling, idle detection, teardown
    │   ├── github/                   GitHub App token management (multi-account)
    │   ├── allowlistgen/             Allowlist generation from observed traffic
    │   ├── at/                       Deferred/recurring operation scheduling
    │   └── localnumber/              Persistent local sandbox numbering
    ├── sidecars/
    │   ├── dns-proxy/                DNS allowlist filter (UDP/TCP:53)
    │   ├── http-proxy/               HTTP allowlist + AI token metering (Bedrock, Anthropic, OpenAI)
    │   ├── audit-log/                Command + network log router with secret redaction
    │   └── tracing/                  OTel Collector sidecar (logs, metrics → S3)
    ├── km-slack/                     Sandbox-side Slack post binary (Ed25519-signed)
    ├── km-slack-bridge/              Bridge Lambda source
    ├── profiles/                     Built-in YAML profiles
    └── infra/
        ├── modules/                  Terraform modules (network, ec2spot, ecs-cluster,
        │                             ecs-task, ecs-service, efs, ses, scp,
        │                             dynamodb-budget, dynamodb-identities,
        │                             dynamodb-sandboxes, dynamodb-schedules,
        │                             budget-enforcer, create-handler, email-handler,
        │                             github-token, s3-replication, ttl-handler,
        │                             ecs-spot-handler, slack-bridge)
        └── live/                     Terragrunt hierarchy (site.hcl, per-sandbox isolation)

Editable architecture diagram: docs/sandbox-architecture.excalidraw - open in excalidraw.com or the VS Code Excalidraw extension.
Klanker Maker is itself an AWS application. The km CLI is the front door, but most of the platform runs as Lambdas, EventBridge schedules, DynamoDB tables, and SQS queues - so a sandbox can be created, modified, or destroyed from anywhere there's AWS API access.

| Service | Role |
|---|---|
| EventBridge Scheduler | Drives km at deferred and recurring operations (one-shot creates, nightly destroys, recurring agent runs). Per-sandbox TTL schedules trigger the TTL handler Lambda. |
| Lambda - km-create-handler | Remote sandbox creation (km create --remote). The CLI publishes the profile to EventBridge; the Lambda runs the compile + Terragrunt apply with a service role. |
| Lambda - km-ttl-handler | Fires on TTL expiry: artifact upload to S3, lifecycle email + Slack notification, Terragrunt destroy, EventBridge schedule cancel. |
| Lambda - km-budget-enforcer | Triggered when the proxy reports a budget breach: revokes Bedrock IAM permissions on the sandbox role; on km budget add, restores them. |
| Lambda - km-email-create-handler | SES inbound rule routes operator emails to this handler. Haiku interprets free-form English (km at create, please destroy worker-3), validates safe-phrase auth, dispatches the action. |
| Lambda - km-github-token-refresher | Refreshes GitHub App installation tokens before expiry, writes to per-sandbox SSM Parameter Store paths. |
| Lambda - km-slack-bridge | Function URL that receives signed payloads from sandboxes (outbound notifications) and signing-secret-verified Slack /events webhooks (inbound). Posts to Slack Web API; enqueues inbound to per-sandbox SQS FIFO queues. |
| DynamoDB - km-sandboxes | Source of truth for the fleet. km list, km status, alias lookups, GSIs for Slack channel ID → sandbox lookup. |
| DynamoDB - km-budget (Global Table) | Per-sandbox spend counters, replicated to every region where agents run. Single-digit-millisecond in-region reads from inside the sandbox. |
| DynamoDB - km-identities | Public Ed25519 keys for every sandbox; used by recipients to verify inbound email signatures. |
| DynamoDB - km-slack-threads | (channel_id, thread_ts) → claude_session_id mapping for resumable Slack-driven Claude sessions. TTL-expired after 30 days. |
| DynamoDB - km-slack-stream-messages | Per-turn message anchors for transcript streaming. Future: reaction-as-action triggers. |
| DynamoDB - km-schedules | Active km at schedules, surfaced by km at list. |
| SQS FIFO (per sandbox) | km-slack-inbound-{id}.fifo - bridge enqueues Slack messages here; sandbox-side poller dequeues and dispatches to Claude. ContentBasedDeduplication off; FIFO ordering preserved. |
| SES | Inbound: operator inbox, sandbox mailboxes ({id}@sandboxes.{domain}). Outbound: lifecycle notifications, inter-sandbox email, signed payloads. Domain DKIM + SPF auto-configured by km init. |
| S3 | Artifacts bucket (per region, replicated cross-region), OTEL telemetry, transcripts (gzipped JSONL), agent run output, sidecar binaries. |
| SSM Parameter Store + KMS | Per-sandbox signing keys, GitHub tokens, Slack secrets, sandbox config. KMS-encrypted, allowlisted refs only. |
| SSM Session Manager | The only way to reach a sandbox. km shell, km agent, command dispatch. No SSH, no bastion, no inbound ports. |
| Service Control Policy | Org-level deny on SG mutation, IAM escalation, instance creation, SSM pivot, org discovery, out-of-region resource creation. Enforced before IAM. |
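
Several rows above rely on Ed25519: sandboxes sign what they send, and recipients verify against the public key stored in km-identities. The sketch below shows that sign-and-verify round trip using only the Go standard library. The function names, the JSON body, and the base64 transport encoding are assumptions for illustration; the real wire format is not described here.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// newSandboxKey generates a fresh Ed25519 key pair. In the architecture above
// the private half would sit in SSM Parameter Store and the public half in
// km-identities; in this sketch both stay in memory.
func newSandboxKey() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(rand.Reader)
}

// sign returns a base64-encoded signature over body.
// ASSUMPTION: base64 is an illustrative transport encoding.
func sign(priv ed25519.PrivateKey, body []byte) string {
	return base64.StdEncoding.EncodeToString(ed25519.Sign(priv, body))
}

// verify reports whether sigB64 is a valid signature over body for pub.
// Malformed input yields false rather than a panic.
func verify(pub ed25519.PublicKey, body []byte, sigB64 string) bool {
	sig, err := base64.StdEncoding.DecodeString(sigB64)
	if err != nil || len(pub) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(pub, body, sig)
}

func main() {
	pub, priv, err := newSandboxKey()
	if err != nil {
		panic(err)
	}
	body := []byte(`{"sandbox":"worker-3","text":"build finished"}`)
	sig := sign(priv, body)
	fmt.Println("valid:", verify(pub, body, sig))
	fmt.Println("tampered:", verify(pub, append(body, '!'), sig))
}
```

Because verification needs only the public key, the receiving side (a Lambda or another sandbox) never has to hold a secret for the sender.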
The control plane scales to whatever the underlying AWS services scale to. There is no shared in-process state between operators, no central coordinator to run, no daemon to monitor. Two operators on opposite sides of the world can drive the same fleet via SSO; the DynamoDB tables are the rendezvous.
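
One consequence of the per-sandbox FIFO queues in the table above: with ContentBasedDeduplication off, SQS requires the producer to supply a MessageDeduplicationId on every send, alongside the MessageGroupId that every FIFO message needs. The sketch below shows one plausible way to derive both. The `InboundMessage` shape and the choice of fields that feed each ID are assumptions, not the bridge's actual scheme.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// InboundMessage is a hypothetical shape for a Slack event the bridge enqueues.
type InboundMessage struct {
	SandboxID string
	ChannelID string
	ThreadTS  string // timestamp of the thread root
	EventTS   string // timestamp of this specific message
	Text      string
}

// queueName follows the km-slack-inbound-{id}.fifo pattern from the table.
func queueName(sandboxID string) string {
	return "km-slack-inbound-" + sandboxID + ".fifo"
}

// dedupID hashes the fields that identify a Slack message, so a retried
// webhook delivery maps to the same ID. A hex SHA-256 digest is 64
// characters, within the 128-character limit SQS places on this field.
func dedupID(m InboundMessage) string {
	sum := sha256.Sum256([]byte(m.ChannelID + "|" + m.EventTS))
	return hex.EncodeToString(sum[:])
}

// groupID keeps messages of one thread in order relative to each other.
// ASSUMPTION: grouping by thread is illustrative.
func groupID(m InboundMessage) string {
	return m.ChannelID + ":" + m.ThreadTS
}

func main() {
	m := InboundMessage{
		SandboxID: "worker-3",
		ChannelID: "C0123456789",
		ThreadTS:  "1700000000.000100",
		EventTS:   "1700000042.000200",
		Text:      "please rerun the tests",
	}
	fmt.Println("queue:", queueName(m.SandboxID))
	fmt.Println("group:", groupID(m))
	fmt.Println("dedup:", dedupID(m))
}
```

Deriving the ID from message identity rather than message content means two different messages with identical text are both delivered, while a redelivered copy of the same event is dropped inside the SQS deduplication window.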