19 changes: 19 additions & 0 deletions .codecov.yml
@@ -0,0 +1,19 @@
coverage:
  status:
    project:
      default:
        target: auto
        threshold: 1%
    patch:
      default:
        target: 70%
        threshold: 5%

comment:
  layout: "condensed_header, condensed_files, condensed_footer"
  behavior: default

ignore:
  - "examples/**"
  - "tests/**"
  - "docs/**"
74 changes: 73 additions & 1 deletion docs/guides/registration.md
@@ -91,6 +91,9 @@ The `.with_registration()` method accepts these parameters:
    retry_delay=2.0,  # Seconds between retries
    fail_on_error=False,  # Abort startup on failure
    timeout=10.0,  # HTTP request timeout
    enable_keepalive=True,  # Enable periodic ping to keep service alive
    keepalive_interval=10.0,  # Seconds between keepalive pings
    auto_deregister=True,  # Automatically deregister on shutdown
)
```

@@ -106,6 +109,9 @@ The `.with_registration()` method accepts these parameters:
- **retry_delay** (`float`): Delay in seconds between retry attempts. Default: 2.0.
- **fail_on_error** (`bool`): If True, raise exception and abort startup on registration failure. If False, log warning and continue. Default: False.
- **timeout** (`float`): HTTP request timeout in seconds. Default: 10.0.
- **enable_keepalive** (`bool`): Enable periodic pings to keep service registered. Default: True.
- **keepalive_interval** (`float`): Seconds between keepalive pings. Default: 10.0.
- **auto_deregister** (`bool`): Automatically deregister service on shutdown. Default: True.

---

@@ -120,7 +126,51 @@ The `.with_registration()` method accepts these parameters:
5. **Payload Creation**: Serializes ServiceInfo to JSON (supports custom subclasses)
6. **Registration Request**: Sends POST to orchestrator endpoint
7. **Retry on Failure**: Retries with delay if request fails
8. **Logging**: Logs all attempts and final outcome
8. **Keepalive Started**: If enabled, background task starts pinging orchestrator
9. **Service Runs**: Service handles requests while staying alive via pings
10. **Shutdown**: On graceful shutdown, stops keepalive and optionally deregisters
11. **Logging**: Logs all registration, ping, and deregistration events
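
The retry step in the list above can be sketched roughly as follows. This is a minimal illustration, not servicekit's actual implementation: `register_with_retry`, the `send` callable, and the `max_retries` parameter name are hypothetical, and the sketch assumes "retries" means a fixed number of total attempts with a constant delay between them.

```python
import time
from typing import Callable, Optional


def register_with_retry(
    send: Callable[[], dict],
    max_retries: int = 5,  # assumed name and meaning: total attempts
    retry_delay: float = 2.0,  # seconds between attempts
    fail_on_error: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call send() until it succeeds or the attempts run out."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            return send()
        except Exception as exc:
            last_error = exc
            # Only wait if another attempt will follow.
            if attempt < max_retries:
                sleep(retry_delay)
    if fail_on_error:
        # Abort startup, mirroring fail_on_error=True.
        raise RuntimeError("registration failed") from last_error
    # Mirror fail_on_error=False: give up quietly and let the service start.
    return None
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a real client would send the HTTP POST inside `send`.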

### Keepalive and TTL

Services can be configured to send periodic "ping" requests to the orchestrator to indicate they're still alive. The orchestrator tracks a Time-To-Live (TTL) for each service and automatically removes services that haven't pinged within the TTL window.

**How it works:**

1. **Initial Registration**: Service registers and receives response with:
    - `id`: Unique ULID identifier for this service
    - `ttl_seconds`: How long until service expires (default: 30 seconds)
    - `ping_url`: Endpoint to send keepalive pings (automatically provided by orchestrator)

2. **Keepalive Loop**: Background task automatically sends PUT requests to `ping_url` every N seconds:
    - Default interval: 10 seconds (configurable via `keepalive_interval`)
    - Each ping resets the service's expiration time
    - Failures are logged but don't crash the service

3. **TTL Expiration**: Orchestrator runs cleanup every 5 seconds:
    - Removes services that haven't pinged within TTL window
    - Logs expired services for monitoring

4. **Graceful Shutdown**: On service shutdown:
    - Keepalive task stops (no more pings)
    - Service explicitly deregisters (if `auto_deregister=True`)
    - Immediate removal from registry
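
The keepalive loop and its shutdown behavior might look something like the sketch below. It is not the library's code: `keepalive_loop`, `ping`, and `stop` are hypothetical names, the ping is an injected coroutine rather than a real PUT request, and `print` stands in for logging.

```python
import asyncio
from typing import Awaitable, Callable


async def keepalive_loop(
    ping: Callable[[], Awaitable[None]],
    interval: float,
    stop: asyncio.Event,
) -> None:
    """Call ping() every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            # Sleep for one interval, but wake immediately on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            try:
                await ping()
            except Exception as exc:
                # A failed ping is reported, never raised to the service.
                print(f"keepalive ping failed: {exc}")


async def demo() -> int:
    sent = 0

    async def fake_ping() -> None:
        nonlocal sent
        sent += 1
        if sent == 2:
            raise ConnectionError("orchestrator unreachable")

    stop = asyncio.Event()
    task = asyncio.create_task(keepalive_loop(fake_ping, 0.01, stop))
    await asyncio.sleep(0.2)
    stop.set()  # graceful shutdown: no more pings after this
    await task
    return sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Waiting on the event with a timeout, instead of a plain `asyncio.sleep`, lets shutdown interrupt the wait rather than blocking for up to one full interval.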

**Configuration examples:**

```python
# Default: keepalive enabled, auto-deregister on shutdown
.with_registration()

# Disable keepalive (rely on manual health checks)
.with_registration(enable_keepalive=False)

# Custom keepalive interval (faster pings)
.with_registration(keepalive_interval=5.0)

# Don't deregister on shutdown (let TTL expire naturally)
.with_registration(auto_deregister=False)
```

### Registration Payload

@@ -154,6 +204,28 @@ For custom ServiceInfo subclasses:
}
```

### Registration Response

The orchestrator responds with registration details, including the ping endpoint:

```json
{
  "id": "01K83B5V85PQZ1HTH4DQ7NC9JM",
  "status": "registered",
  "service_url": "http://my-service:8000",
  "message": "Service registered successfully",
  "ttl_seconds": 30,
  "ping_url": "http://orchestrator:9000/services/01K83B5V85PQZ1HTH4DQ7NC9JM/$ping"
}
```

**Key fields:**
- `id`: Unique ULID identifier assigned by orchestrator
- `ttl_seconds`: Time-to-live in seconds (service must ping within this window)
- `ping_url`: Endpoint for keepalive pings (automatically used by the service)

**Important**: The `ping_url` is provided by the orchestrator - services don't need to configure it. The service automatically uses this URL for keepalive pings when `enable_keepalive=True`.
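
As an illustration of how a client could pick out the key fields, here is a small sketch that parses the response shown above. `RegistrationResult` and `parse_registration` are hypothetical helpers, not part of servicekit.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistrationResult:
    """The three fields a keepalive client needs (hypothetical type)."""

    id: str
    ttl_seconds: int
    ping_url: str


def parse_registration(body: str) -> RegistrationResult:
    """Extract the key fields from the orchestrator's JSON response."""
    data = json.loads(body)
    return RegistrationResult(
        id=data["id"],
        ttl_seconds=int(data["ttl_seconds"]),
        ping_url=data["ping_url"],
    )


body = """
{
  "id": "01K83B5V85PQZ1HTH4DQ7NC9JM",
  "status": "registered",
  "service_url": "http://my-service:8000",
  "message": "Service registered successfully",
  "ttl_seconds": 30,
  "ping_url": "http://orchestrator:9000/services/01K83B5V85PQZ1HTH4DQ7NC9JM/$ping"
}
"""
result = parse_registration(body)
```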

### Hostname Resolution

Priority order:
41 changes: 16 additions & 25 deletions examples/registration/Dockerfile
@@ -1,35 +1,28 @@
# Registration Demo Dockerfile
FROM ghcr.io/astral-sh/uv:0.9-python3.13-bookworm-slim AS builder

WORKDIR /app

ARG USER=servicekit UID=10001
RUN useradd -u ${UID} -m -s /bin/bash ${USER}

# UV configuration for better build performance
ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy

# Copy and build parent servicekit project
COPY pyproject.toml uv.lock README.md /servicekit/
COPY src /servicekit/src/
WORKDIR /servicekit
RUN uv build
# Copy parent servicekit project (needed as path dependency)
COPY pyproject.toml uv.lock README.md /app/
COPY src /app/src/

# Install servicekit wheel in app directory
WORKDIR /app
RUN --mount=type=cache,target=/root/.cache/uv \
uv venv && \
uv pip install /servicekit/dist/*.whl
# Copy registration example
COPY examples/registration /app/examples/registration

# Copy demo files
COPY examples/registration/main.py ./
COPY examples/registration/main_custom.py ./
COPY examples/registration/orchestrator.py ./
# Install dependencies from registration example directly in /app
WORKDIR /app/examples/registration
RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --frozen

# Cleanup Python cache files
RUN find /app/.venv -type d -name '__pycache__' -prune -exec rm -rf {} + && \
find /app/.venv -type f -name '*.py[co]' -delete || true
RUN find .venv -type d -name '__pycache__' -prune -exec rm -rf {} + && \
find .venv -type f -name '*.py[co]' -delete || true

# ---- runtime ----
FROM python:3.13-slim AS runtime
@@ -51,14 +44,12 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get install -y --no-install-recommends ca-certificates tini && \
apt-get clean && rm -rf /var/lib/apt/lists/*

# Copy venv and application from builder
COPY --from=builder --chown=${USER}:${USER} /app/.venv /app/.venv
COPY --from=builder --chown=${USER}:${USER} /app/main.py /app/main.py
COPY --from=builder --chown=${USER}:${USER} /app/main_custom.py /app/main_custom.py
COPY --from=builder --chown=${USER}:${USER} /app/orchestrator.py /app/orchestrator.py
# Copy entire app directory including venv and source from builder
COPY --from=builder --chown=${USER}:${USER} /app /app

ENV VIRTUAL_ENV=/app/.venv
ENV PATH=/app/.venv/bin:${PATH}
ENV VIRTUAL_ENV=/app/examples/registration/.venv
ENV PATH=/app/examples/registration/.venv/bin:${PATH}
WORKDIR /app/examples/registration
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV PYTHONFAULTHANDLER=1
85 changes: 77 additions & 8 deletions examples/registration/README.md
@@ -10,15 +10,24 @@ Demonstrates automatic service registration with an orchestrator for service dis
- **Custom Metadata**: Support for ServiceInfo subclasses with additional fields
- **Mock Orchestrator**: Simple orchestrator for testing and development
- **Multi-Service Setup**: Example with multiple services (svca, svcb)
- **Keepalive & TTL**: Services send periodic pings to stay registered (30s TTL, 10s interval)
- **Auto-Deregistration**: Services gracefully deregister on shutdown
- **Valkey-based Storage**: TTL and expiration handled by Valkey (no manual cleanup needed)

## Quick Start

### Local Development

**Prerequisites**: Valkey or Redis running on localhost:6379

#### Run Orchestrator

```bash
# Start Valkey (using Docker)
docker run -d -p 6379:6379 valkey/valkey:8

# Install dependencies
cd examples/registration
uv sync

# Run the mock orchestrator
@@ -63,10 +72,10 @@ docker compose down
## Architecture

```
┌─────────────┐
│ Orchestrator│ ← Registration endpoint at :9000/services/$register
│ (port 9000)│
└──────▲──────┘
┌─────────────┐ ┌────────┐
│ Orchestrator│────→│ Valkey │ TTL-based service expiration
│ (port 9000)│ │ :6379 │
└──────▲──────┘ └────────┘
│ HTTP POST on startup
@@ -90,7 +99,8 @@ docker compose down

### Orchestrator Endpoints

- `POST /services/$register` - Register a service (called by services on startup)
- `POST /services/$register` - Register a service (returns `id`, `ttl_seconds`, `ping_url`)
- `PUT /services/{id}/$ping` - Send keepalive ping to extend TTL
- `GET /services` - List all registered services
- `GET /services/{id}` - Get specific service details by ULID
- `DELETE /services/{id}` - Deregister a service by ULID
@@ -119,11 +129,70 @@ docker compose down
"id": "01K83B5V85PQZ1HTH4DQ7NC9JM",
"status": "registered",
"service_url": "http://svca:8000",
"message": "..."
"message": "...",
"ttl_seconds": 30,
"ping_url": "http://orchestrator:9000/services/01K83B5V85PQZ1HTH4DQ7NC9JM/$ping"
}
```
7. **Retry on Failure**: Retries up to 5 times with 2-second delay
8. **Success/Failure Logging**: Logs outcome with service ID via structured logging
7. **Keepalive Started**: Background task starts sending pings every 10 seconds to `ping_url`
8. **Service Runs**: Service handles requests while keepalive maintains registration
9. **Retry on Failure**: Initial registration retries up to 5 times with 2-second delay
10. **Graceful Shutdown**: On shutdown, service stops keepalive and deregisters explicitly
11. **Success/Failure Logging**: Logs all registration, ping, and deregistration events

## Keepalive and TTL

### How It Works

The orchestrator uses Valkey's built-in TTL mechanism for automatic service expiration:

- **TTL**: 30 seconds (configurable in `orchestrator.py`)
- **Ping Interval**: 10 seconds (services send keepalive every 10s)
- **Expiration**: Handled automatically by Valkey (no manual cleanup task needed)

**Timeline Example:**
- `T+0s`: Service registers, Valkey stores with `EX 30` (expires at T+30s)
- `T+10s`: Service pings, Valkey resets TTL to 30s (expires at T+40s)
- `T+20s`: Service pings, Valkey resets TTL to 30s (expires at T+50s)
- `T+30s`: Service pings, Valkey resets TTL to 30s (expires at T+60s)
- If the service crashes at `T+35s` and stops pinging:
  - `T+60s`: Valkey automatically removes the key (30s after the last ping at `T+30s`)
  - Service no longer appears in registry
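
The timeline can be checked with a small in-memory model of the expiration rule. This sketch only mimics the effect of storing a key with `EX 30` and refreshing it on each ping; it does not talk to Valkey, and `TtlRegistry` is a hypothetical class, not the code in `orchestrator.py`.

```python
class TtlRegistry:
    """Plain-Python stand-in for keys stored with a TTL."""

    def __init__(self, ttl: float = 30.0) -> None:
        self.ttl = ttl
        self._expires_at: dict[str, float] = {}

    def register(self, service_id: str, now: float) -> None:
        # Equivalent in spirit to SET <key> <value> EX <ttl>.
        self._expires_at[service_id] = now + self.ttl

    def is_alive(self, service_id: str, now: float) -> bool:
        return now < self._expires_at.get(service_id, float("-inf"))

    def ping(self, service_id: str, now: float) -> bool:
        # A ping only refreshes a key that has not expired yet.
        if not self.is_alive(service_id, now):
            return False
        self._expires_at[service_id] = now + self.ttl
        return True


registry = TtlRegistry(ttl=30.0)
registry.register("svca", now=0.0)
for t in (10.0, 20.0, 30.0):
    registry.ping("svca", now=t)

# The service crashes at T+35s, so the last refresh was at T+30s.
alive_at_59 = registry.is_alive("svca", now=59.0)
alive_at_60 = registry.is_alive("svca", now=60.0)
```

Time is passed in explicitly so the model is deterministic; with real Valkey the server clock decides when the key disappears.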

### Ping Endpoint

**Request:**
```bash
PUT /services/{service_id}/$ping
```

**Response:**
```json
{
  "id": "01K83B5V85PQZ1HTH4DQ7NC9JM",
  "status": "alive",
  "last_ping_at": "2025-10-27T12:00:30.000Z",
  "expires_at": "2025-10-27T12:01:00.000Z"
}
```
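
The relationship between `last_ping_at` and `expires_at` in this response is simply the TTL added to the ping time. The sketch below shows one way to build such a body; `ping_response` and `iso_z` are hypothetical helpers, and the actual orchestrator may format its timestamps differently.

```python
from datetime import datetime, timedelta, timezone


def iso_z(dt: datetime) -> str:
    """Format an aware datetime as UTC with millisecond precision and a Z suffix."""
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def ping_response(service_id: str, now: datetime, ttl_seconds: int = 30) -> dict:
    """Build a ping response where expires_at is now plus the TTL."""
    return {
        "id": service_id,
        "status": "alive",
        "last_ping_at": iso_z(now),
        "expires_at": iso_z(now + timedelta(seconds=ttl_seconds)),
    }


now = datetime(2025, 10, 27, 12, 0, 30, tzinfo=timezone.utc)
response = ping_response("01K83B5V85PQZ1HTH4DQ7NC9JM", now)
```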

### Configuration Options

Services can configure keepalive behavior:

```python
# Default: keepalive enabled, 10s interval, auto-deregister on shutdown
.with_registration()

# Disable keepalive (service expires after 30s if not manually pinged)
.with_registration(enable_keepalive=False)

# Custom ping interval (faster keepalive)
.with_registration(keepalive_interval=5.0)

# Don't deregister on shutdown (let TTL expire naturally)
.with_registration(auto_deregister=False)
```

## Configuration

16 changes: 16 additions & 0 deletions examples/registration/compose.yml
@@ -1,4 +1,16 @@
services:
  valkey:
    image: valkey/valkey:8
    ports:
      - "6379:6379"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "valkey-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
      start_period: 10s

  orchestrator:
    build:
      context: ../..
@@ -10,6 +22,10 @@ services:
      LOG_FORMAT: json
      LOG_LEVEL: INFO
      WORKERS: 1
      VALKEY_URL: redis://valkey:6379
    depends_on:
      valkey:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      test: