9 changes: 6 additions & 3 deletions CLAUDE.md
@@ -37,10 +37,11 @@ ExecutionKit is a minimal library for LLM reasoning patterns — it fills the ga

| Module | Role |
|--------|------|
| `errors.py` | 9-class exception hierarchy (`ExecutionKitError` → `LLMError`, `PatternError` subtrees); extracted from `provider.py` (F-06) |
| `provider.py` | `LLMProvider` protocol, `Provider` HTTP client, `LLMResponse`; re-exports error classes from `errors.py` for backwards compatibility; `_classify_http_error()` is the single HTTP status→exception mapping point shared by both backends (F-02) |
| `types.py` | Frozen value types: `TokenUsage`, `PatternResult[T]`, `Tool`, `VotingStrategy`, `Evaluator` |
| `cost.py` | `CostTracker` — mutable accumulator with two-phase accounting (`reserve_call` + `record_without_call`) |
| `patterns/base.py` | `checked_complete()` — shared budget guard + retry entry point; `_check_budget()` helper uses `getattr()` field loop replacing per-field if-chains (F-05/F-08); `_TrackedProvider.supports_tools` delegates to wrapped provider via `getattr` instead of hardcoding `Literal[True]` (F-04) |
| `patterns/consensus.py` | Parallel sampling, majority/unanimous voting, agreement metadata |
| `patterns/refine_loop.py` | Iterative improvement with `ConvergenceDetector`; default evaluator uses XML sandboxing |
| `patterns/react_loop.py` | Think-act-observe loop; validates tool args against JSON Schema; caps context via `max_history_messages` |
@@ -55,7 +56,9 @@ ExecutionKit is a minimal library for LLM reasoning patterns — it fills the ga

**Two-phase cost accounting** — `reserve_call()` pre-increments the call counter before `await` (TOCTOU-safe for concurrent patterns); `record_without_call(response)` adds token counts after success.
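
The reservation has to happen before the `await` because the budget check and the counter increment must not be separated by a suspension point. Below is a minimal sketch under simplified assumptions: `SketchTracker`, `FakeResponse`, `fake_complete()`, and the `max_calls` limit are hypothetical stand-ins, not ExecutionKit's real `CostTracker` API. Five concurrent calls against a limit of three show that only three get through, because each task reserves its call before it first yields to the event loop.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResponse:
    # Hypothetical stand-in for the token counts on an LLMResponse.
    input_tokens: int
    output_tokens: int


class SketchTracker:
    """Hypothetical two-phase accumulator, not the real CostTracker."""

    def __init__(self) -> None:
        self.llm_calls = 0
        self.input_tokens = 0
        self.output_tokens = 0

    def reserve_call(self) -> None:
        # Phase 1: count the call synchronously, before any await.
        self.llm_calls += 1

    def record_without_call(self, response: FakeResponse) -> None:
        # Phase 2: add tokens after success; the call was already counted.
        self.input_tokens += response.input_tokens
        self.output_tokens += response.output_tokens


async def fake_complete() -> FakeResponse:
    await asyncio.sleep(0)  # yield to the event loop, like a real HTTP call
    return FakeResponse(input_tokens=10, output_tokens=5)


async def guarded_call(tracker: SketchTracker, max_calls: int) -> bool:
    # Check and reserve with no await in between, so no other task can
    # run while the counter still holds the pre-increment value.
    if tracker.llm_calls >= max_calls:
        return False
    tracker.reserve_call()
    response = await fake_complete()
    tracker.record_without_call(response)
    return True


async def main() -> tuple[SketchTracker, list[bool]]:
    tracker = SketchTracker()
    results = await asyncio.gather(
        *(guarded_call(tracker, max_calls=3) for _ in range(5))
    )
    return tracker, list(results)


tracker, results = asyncio.run(main())
print(tracker.llm_calls, results)  # 3 [True, True, True, False, False]
```

If `reserve_call()` moved below the `await`, all five tasks would pass the check while the counter still read `0`, which is exactly the time-of-check/time-of-use gap the two-phase design closes.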

**Budget guards** — `checked_complete()` in `patterns/base.py` checks token/call budget before every LLM call and raises `BudgetExhaustedError` (with accumulated cost snapshot) if exceeded. The internal `_check_budget()` helper iterates over field names using `getattr()` rather than repeating an if-block per field (F-05/F-08).

**Centralised HTTP error mapping** — `_classify_http_error()` in `provider.py` is the single function that converts HTTP status codes to the appropriate error subclass. Both the `_post_httpx` and `_post_urllib` backends call it, eliminating the duplicated mapping logic that previously existed in each (F-02).

**Structural typing** — `LLMProvider` and `ToolCallingProvider` are `@runtime_checkable` protocols, not base classes. Any object matching the interface works.
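
A minimal sketch of what structural typing buys, assuming a simplified one-method protocol; `SketchProvider`, `EchoProvider`, and `NotAProvider` are hypothetical and much narrower than the real `LLMProvider`. The test double never imports or subclasses the protocol, yet `isinstance()` accepts it.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SketchProvider(Protocol):
    # Hypothetical, much-simplified stand-in for LLMProvider.
    async def complete(self, messages: list[dict[str, Any]], **kwargs: Any) -> str: ...


class EchoProvider:
    # Does not inherit from SketchProvider; having complete() is enough.
    async def complete(self, messages: list[dict[str, Any]], **kwargs: Any) -> str:
        return messages[-1]["content"]


class NotAProvider:
    # Has no complete() method, so it does not satisfy the protocol.
    def send(self, text: str) -> str:
        return text


print(isinstance(EchoProvider(), SketchProvider))  # True
print(isinstance(NotAProvider(), SketchProvider))  # False
```

Keep in mind that `isinstance()` against a `@runtime_checkable` protocol only checks that the named members exist; it does not verify parameter lists, return types, or whether `complete` is actually a coroutine function.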

43 changes: 43 additions & 0 deletions docs/api-reference.md
@@ -1089,10 +1089,53 @@ Validate that an evaluator score is in [0.0, 1.0] and not NaN.

---

### `_check_budget()` (internal)

```python
def _check_budget(
budget: TokenUsage,
current: TokenUsage,
fields: tuple[str, ...],
*,
sentinel_suffix: str,
exceeded_suffix: str,
) -> None
```

Internal helper used by `checked_complete()` (F-05/F-08). It iterates over the named `TokenUsage` fields with `getattr()` and raises `BudgetExhaustedError` on the first field that is either sentinel-exhausted (its budget value is `-1`, set by `pipe()` propagation) or over its limit. This replaces the previous per-field if-block repetition with the same `getattr()`-over-field-names idiom that CPython's `dataclasses.asdict()` uses.

**Location:** `executionkit/patterns/base.py`

**Raises:** `BudgetExhaustedError` on the first exhausted field.
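
The field loop can be sketched as follows. This is a minimal sketch, not the code in `executionkit/patterns/base.py`: `TokenUsage` and `BudgetExhaustedError` are simplified stand-ins, `check_budget_sketch` and `_LABELS` are hypothetical names, and two rules are assumptions of this sketch: a budget value of `0` means "no limit on this field", and a field counts as over budget once current usage has reached the limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    # Simplified stand-in; field names follow the examples in this document.
    input_tokens: int = 0
    output_tokens: int = 0
    llm_calls: int = 0


class BudgetExhaustedError(Exception):
    # Simplified stand-in that keeps only the cost snapshot.
    def __init__(self, message: str, cost: TokenUsage) -> None:
        super().__init__(message)
        self.cost = cost


# Hypothetical label table in the spirit of _BUDGET_FIELD_LABELS.
_LABELS = {"input_tokens": "input tokens", "output_tokens": "output tokens", "llm_calls": "LLM calls"}


def check_budget_sketch(
    budget: TokenUsage,
    current: TokenUsage,
    fields: tuple[str, ...],
    *,
    sentinel_suffix: str,
    exceeded_suffix: str,
) -> None:
    # One getattr() loop replaces a hand-written if-block per field.
    for name in fields:
        limit = getattr(budget, name)
        used = getattr(current, name)
        label = _LABELS.get(name, name)
        if limit == -1:
            # Sentinel: an earlier pipe() step already used up this dimension.
            raise BudgetExhaustedError(f"{label} {sentinel_suffix}", cost=current)
        if limit > 0 and used >= limit:
            # Assumption: 0 means unlimited; reaching the limit blocks the next call.
            raise BudgetExhaustedError(f"{label} {exceeded_suffix}", cost=current)


message = ""
try:
    check_budget_sketch(
        TokenUsage(input_tokens=1000, llm_calls=2),  # output_tokens left unlimited
        TokenUsage(input_tokens=120, llm_calls=2),
        ("input_tokens", "output_tokens", "llm_calls"),
        sentinel_suffix="budget exhausted by an earlier step",
        exceeded_suffix="budget exceeded",
    )
except BudgetExhaustedError as exc:
    message = str(exc)
print(message)  # LLM calls budget exceeded
```

Adding a new budget dimension then means adding a `TokenUsage` field and a label entry, with no new branch in the loop.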

---

### `_classify_http_error()` (internal)

```python
def _classify_http_error(
status: int,
raw: dict[str, Any],
retry_after: float,
*,
cause: BaseException,
) -> NoReturn
```

Internal helper in `provider.py` (F-02) that centralises the HTTP status code → exception mapping shared by the `_post_httpx` and `_post_urllib` backends, so neither backend duplicates the logic. It raises the matching typed exception (`RateLimitError` for HTTP 429, `PermanentError` for 401/403/404, `ProviderError` for any other non-2xx status) and chains `cause` as the original exception.

**Location:** `executionkit/provider.py`

**Raises:** `RateLimitError`, `PermanentError`, or `ProviderError` (always raises; return type is `NoReturn`).
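
A minimal sketch of such a single mapping point, under stated assumptions: the exception classes are simplified stand-ins (the real ones also carry `cost` and `metadata`), `classify_http_error_sketch` and `classify` are hypothetical names, and reading the detail text from `raw["error"]["message"]` assumes an OpenAI-style error body. Each branch uses `raise ... from cause` so the transport-level exception stays available as `__cause__`.

```python
from typing import Any, NoReturn


class LLMError(Exception):
    """Simplified stand-in; the real classes also carry cost and metadata."""


class RateLimitError(LLMError):
    def __init__(self, message: str, retry_after: float) -> None:
        super().__init__(message)
        self.retry_after = retry_after


class PermanentError(LLMError):
    """HTTP 401/403/404 in this mapping; not worth retrying."""


class ProviderError(LLMError):
    """Any other non-2xx response; treated as retryable."""


_PERMANENT_STATUSES = frozenset({401, 403, 404})


def classify_http_error_sketch(
    status: int,
    raw: dict[str, Any],
    retry_after: float,
    *,
    cause: BaseException,
) -> NoReturn:
    # Assumption: OpenAI-style error bodies, {"error": {"message": "..."}}.
    error = raw.get("error")
    detail = error.get("message", "") if isinstance(error, dict) else ""
    message = f"HTTP {status}: {detail}" if detail else f"HTTP {status}"
    if status == 429:
        raise RateLimitError(message, retry_after=retry_after) from cause
    if status in _PERMANENT_STATUSES:
        raise PermanentError(message) from cause
    raise ProviderError(message) from cause


def classify(status: int) -> LLMError:
    # Each HTTP backend would call the helper from its own except block.
    try:
        classify_http_error_sketch(
            status, {"error": {"message": "boom"}}, 1.5,
            cause=RuntimeError("transport-level failure"),
        )
    except LLMError as exc:
        return exc
    raise AssertionError("unreachable: the helper always raises")


for code in (429, 403, 500):
    print(code, type(classify(code)).__name__)
# 429 RateLimitError
# 403 PermanentError
# 500 ProviderError
```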

---

## Error Hierarchy

All exceptions carry `cost: TokenUsage` and `metadata: dict[str, Any]` attributes set at raise time.

> **Module location (F-06):** The full 9-class hierarchy is defined in `executionkit/errors.py`. `provider.py` re-exports every class under the same name so that `from executionkit.provider import XError` imports remain valid.

```
ExecutionKitError
├── LLMError — provider communication errors
28 changes: 23 additions & 5 deletions docs/architecture.md
@@ -37,15 +37,21 @@ shape every design decision:
executionkit/
├── __init__.py — public API surface; sync wrappers
├── types.py — frozen value types: PatternResult, TokenUsage, Tool, VotingStrategy, Evaluator
├── errors.py — 9-class exception hierarchy (F-06: extracted from provider.py)
├── provider.py — LLMProvider protocol, ToolCallingProvider protocol,
│ Provider concrete class, LLMResponse, ToolCall,
│ Provider concrete class, LLMResponse, ToolCall;
│ re-exports error classes from errors.py for backwards compatibility;
│ _classify_http_error() is the single HTTP status→exception mapping
│ point for both urllib and httpx backends (F-02)
├── cost.py — CostTracker mutable accumulator
├── compose.py — pipe() composition helper, PatternStep protocol
├── kit.py — Kit session facade (provider + cumulative usage)
├── _mock.py — MockProvider test double (satisfies both protocols)
├── patterns/
│ ├── base.py — checked_complete(), validate_score(), _TrackedProvider;
│ │ _check_budget() uses getattr() field loop replacing per-field
│ │ if-chains (F-05/F-08); _TrackedProvider.supports_tools delegates
│ │ to wrapped provider via getattr (F-04)
│ ├── consensus.py — parallel majority/unanimous voting
│ ├── refine_loop.py — iterative score-guided refinement
│ └── react_loop.py — tool-calling think-act-observe loop
@@ -66,7 +72,8 @@ patterns/base ──► cost, engine/retry, provider, types
patterns/consensus ──► cost, engine/parallel, engine/retry, patterns/base, provider, types
patterns/refine_loop ──► cost, engine/convergence, engine/retry, patterns/base, provider, types
patterns/react_loop ──► cost, engine/retry, patterns/base, provider, types
provider ──► types, errors (re-exports all 9 error classes from errors.py)
errors ──► types
cost ──► types
engine/* ──► provider (retry only)
```
@@ -172,8 +179,13 @@ directly. Its snapshot is emitted as an immutable `TokenUsage` via `to_usage()`.

## Error Handling Architecture

The full 9-class exception hierarchy lives in `executionkit/errors.py` (F-06).
`provider.py` re-exports all nine classes under the same names so that existing
`from executionkit.provider import XError` imports continue to work without
modification (PEP 387 backwards compatibility).

```
ExecutionKitError ← executionkit/errors.py
├── LLMError ← provider communication failures
│ ├── RateLimitError ← HTTP 429; carries retry_after float
│ ├── PermanentError ← HTTP 401/403/404; do not retry
@@ -188,6 +200,12 @@ All errors carry `cost: TokenUsage` so callers can see what was spent before
the failure. `pipe()` augments errors with the cumulative cross-step cost before
re-raising.
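
The cost augmentation can be sketched as below. This is a minimal sketch under stated assumptions, not the real `pipe()` in `compose.py`: `pipe_sketch`, the `(value, cost)` return convention for steps, and the `TokenUsage.__add__` helper are hypothetical simplifications. The point it illustrates is that the failing step's exception object is reused, with the earlier steps' spend folded into its `cost`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class TokenUsage:
    # Simplified stand-in with just enough fields for the example.
    input_tokens: int = 0
    output_tokens: int = 0
    llm_calls: int = 0

    def __add__(self, other: "TokenUsage") -> "TokenUsage":
        return TokenUsage(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
            self.llm_calls + other.llm_calls,
        )


class ExecutionKitError(Exception):
    def __init__(self, message: str, cost: Optional[TokenUsage] = None) -> None:
        super().__init__(message)
        self.cost = cost if cost is not None else TokenUsage()


# Hypothetical step shape: take the previous output, return (output, cost).
Step = Callable[[str], Awaitable[tuple[str, TokenUsage]]]


async def pipe_sketch(value: str, *steps: Step) -> tuple[str, TokenUsage]:
    total = TokenUsage()
    for step in steps:
        try:
            value, cost = await step(value)
        except ExecutionKitError as exc:
            # Fold in what the earlier steps spent, then re-raise the same
            # exception object so its type and traceback are preserved.
            exc.cost = total + exc.cost
            raise
        total = total + cost
    return value, total


async def shout(text: str) -> tuple[str, TokenUsage]:
    return text.upper(), TokenUsage(input_tokens=10, output_tokens=4, llm_calls=1)


async def explode(text: str) -> tuple[str, TokenUsage]:
    raise ExecutionKitError("step failed", cost=TokenUsage(input_tokens=7, llm_calls=1))


try:
    asyncio.run(pipe_sketch("hi", shout, explode))
except ExecutionKitError as exc:
    caught = exc
print(caught.cost)  # TokenUsage(input_tokens=17, output_tokens=4, llm_calls=2)
```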

**HTTP error classification:** `_classify_http_error()` in `provider.py` is the
single function responsible for mapping HTTP status codes to the correct error
subclass. Both the `_post_httpx` and `_post_urllib` backends call it, eliminating
duplicated mapping logic (F-02). This mirrors the pattern used by the Anthropic
SDK's `_make_status_error()`.

**Retry boundary:** `with_retry()` in `engine/retry.py` only retries
`RateLimitError` and `ProviderError`. `PermanentError` propagates immediately.
`asyncio.CancelledError` is always re-raised without retry.
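
The retry boundary can be sketched as below. This is a minimal sketch, not `engine/retry.py`: `with_retry_sketch`, `max_attempts`, `base_delay`, and the exponential backoff schedule are hypothetical (the real knobs live in `RetryConfig`, whose fields this document does not list), and the exception classes are simplified stand-ins. Only the two retryable classes appear in the `except` clause, so `PermanentError` escapes on the first attempt.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class LLMError(Exception):
    """Simplified stand-in for the real error hierarchy."""


class RateLimitError(LLMError):
    def __init__(self, message: str, retry_after: float = 0.0) -> None:
        super().__init__(message)
        self.retry_after = retry_after


class PermanentError(LLMError):
    pass


class ProviderError(LLMError):
    pass


async def with_retry_sketch(
    call: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        # asyncio.CancelledError subclasses BaseException (Python 3.8+), and
        # PermanentError is not listed below, so both propagate untouched.
        try:
            return await call()
        except (RateLimitError, ProviderError) as exc:
            if attempt >= max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # exponential backoff
            if isinstance(exc, RateLimitError):
                delay = max(delay, exc.retry_after)  # respect the server's hint
            await asyncio.sleep(delay)


calls = {"flaky": 0, "forbidden": 0}


async def flaky() -> str:
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ProviderError("HTTP 503")
    return "ok"


async def forbidden() -> str:
    calls["forbidden"] += 1
    raise PermanentError("HTTP 403")


result = asyncio.run(with_retry_sketch(flaky))
try:
    asyncio.run(with_retry_sketch(forbidden))
except PermanentError:
    pass
print(result, calls)  # ok {'flaky': 3, 'forbidden': 1}
```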
32 changes: 28 additions & 4 deletions docs/c4/c4-code-src-executionkit-patterns.md
@@ -22,9 +22,9 @@
- **Raises**: `ValueError` if score is NaN or outside [0.0, 1.0] range

#### `checked_complete(provider: LLMProvider, messages: Sequence[dict[str, Any]], tracker: CostTracker, budget: TokenUsage | None, retry: RetryConfig | None, **kwargs: Any) -> LLMResponse`
- **Description**: Makes a budget-aware LLM API call with retry logic. Checks token and LLM call budgets before dispatching (via `_check_budget`) and records usage in the cost tracker.
- **Location**: `base.py:24-55`
- **Dependencies**: `LLMProvider`, `CostTracker`, `BudgetExhaustedError`, `with_retry`, `DEFAULT_RETRY`, `TokenUsage`, `RetryConfig`, `LLMResponse`, `_check_budget`, `_BUDGET_FIELD_LABELS`
- **Parameters**:
- `provider: LLMProvider` - The LLM provider to use
- `messages: Sequence[dict[str, Any]]` - Messages to send to the LLM
@@ -35,6 +35,26 @@
- **Return Type**: `LLMResponse` - Response from the LLM provider
- **Raises**: `BudgetExhaustedError` if any budget constraint would be exceeded

#### `_check_budget(budget: TokenUsage, current: TokenUsage, fields: tuple[str, ...], *, sentinel_suffix: str, exceeded_suffix: str) -> None`
- **Description**: Validates selected `TokenUsage` fields by comparing the configured `budget` against the current accumulated `TokenUsage`. Iterates over the supplied `fields` and raises `BudgetExhaustedError` with a descriptive message if a field has reached a sentinel condition or would exceed its allowed limit.
- **Location**: `base.py`
- **Dependencies**: `TokenUsage`, `BudgetExhaustedError`, `_BUDGET_FIELD_LABELS`
- **Parameters**:
- `budget: TokenUsage` - Maximum allowed token/call counts
- `current: TokenUsage` - Current accumulated token/call usage to validate against the budget
- `fields: tuple[str, ...]` - Names of the `TokenUsage` fields to check
- `sentinel_suffix: str` - Message suffix used when a budget field is already at its sentinel/exhausted value
- `exceeded_suffix: str` - Message suffix used when the current usage would exceed the configured budget
- **Return Type**: `None`
- **Raises**: `BudgetExhaustedError` naming the field that hit a sentinel condition or exceeded its budget (e.g., "input_tokens", "llm_calls")

#### `_BUDGET_FIELD_LABELS`
- **Description**: Module-level dict mapping `TokenUsage` field names to human-readable label strings used in `BudgetExhaustedError` messages. Drives the field-loop in `_check_budget`, making it easy to add new budget dimensions without modifying control flow.
- **Location**: `base.py`
- **Type**: `dict[str, str]`
- **Example entries**: `{"input_tokens": "input tokens", "output_tokens": "output tokens", "llm_calls": "LLM calls"}`
- **Dependencies**: None

#### `_note_truncation(response: LLMResponse, metadata: dict[str, Any], context: str) -> None`
- **Description**: Logs a warning and increments truncation counter in metadata if the LLM response was truncated (finish_reason indicates truncation).
- **Location**: `base.py:58-66`
@@ -185,7 +205,8 @@
- `_budget: TokenUsage | None` - Optional token budget constraints
- `_retry: RetryConfig | None` - Retry configuration
- `_context: str` - Context string for error messages
- **Properties**:
- `supports_tools: bool` - Delegates to the wrapped provider's `supports_tools` attribute rather than hardcoding `Literal[True]`; this allows `_TrackedProvider` to accurately reflect the capabilities of the underlying provider at runtime
- **Methods**:
- `__init__(provider: LLMProvider, tracker: CostTracker, metadata: dict[str, Any], *, budget: TokenUsage | None, retry: RetryConfig | None, context: str) -> None` - Initializes the wrapper with dependencies
- `complete(messages: Sequence[dict[str, Any]], *, temperature: float | None = None, max_tokens: int | None = None, tools: Sequence[dict[str, Any]] | None = None, **kwargs: Any) -> LLMResponse` - Wraps provider.complete() with budget and truncation checks
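
A minimal sketch of the `supports_tools` delegation, assuming simplified provider classes: `TrackedProviderSketch`, `ToolProvider`, `TextOnlyProvider`, and `PlainProvider` are hypothetical, and falling back to `False` when the wrapped provider declares no `supports_tools` attribute is an assumption of this sketch rather than a documented ExecutionKit default.

```python
from typing import Any


class ToolProvider:
    supports_tools = True


class TextOnlyProvider:
    supports_tools = False


class PlainProvider:
    # Declares no supports_tools attribute at all.
    pass


class TrackedProviderSketch:
    """Hypothetical wrapper in the spirit of _TrackedProvider."""

    def __init__(self, provider: Any) -> None:
        self._provider = provider

    @property
    def supports_tools(self) -> bool:
        # Read the flag from the wrapped provider at access time instead of
        # hardcoding True; the False fallback is an assumption of this sketch.
        return bool(getattr(self._provider, "supports_tools", False))


for inner in (ToolProvider(), TextOnlyProvider(), PlainProvider()):
    print(type(inner).__name__, TrackedProviderSketch(inner).supports_tools)
# ToolProvider True
# TextOnlyProvider False
# PlainProvider False
```

With a hardcoded `True`, wrapping `TextOnlyProvider` would still advertise tool calling, so a pattern could send tool schemas to a backend that cannot handle them.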
@@ -228,6 +249,7 @@ None - executionkit has zero external runtime dependencies as specified in `pypr
### Standard Library Dependencies

- `asyncio` - For async/await support and task management (react_loop)
- `logging` - Module-level import in `react_loop.py` for structured diagnostic logging
- `collections.Counter` - For vote counting in consensus
- `json` - For serializing tool arguments (react_loop)
- `math` - For NaN checking in score validation
@@ -270,7 +292,7 @@ classDiagram

class TrackedProvider {
<<class>>
+supports_tools: bool (property, delegates to _provider)
-_provider: LLMProvider
-_tracker: CostTracker
-_metadata: dict
@@ -285,6 +307,8 @@ classDiagram
<<module>>
+validate_score(score) float
+checked_complete(provider, messages, ...) LLMResponse
-_check_budget(budget, current, fields, sentinel_suffix, exceeded_suffix) None
-_BUDGET_FIELD_LABELS dict
-_note_truncation(response, metadata, context) void
-_TrackedProvider TrackedProvider
}