Skip to content

codegen(python): drop ReflectapiOption, use model_fields_set for partial fields#151

Merged
hardbyte merged 17 commits into
mainfrom
drop-reflectapi-option-class
May 13, 2026
Merged

codegen(python): drop ReflectapiOption, use model_fields_set for partial fields#151
hardbyte merged 17 commits into
mainfrom
drop-reflectapi-option-class

Conversation

@hardbyte
Copy link
Copy Markdown
Contributor

@hardbyte hardbyte commented May 12, 2026

Summary

Drops the hand-rolled ReflectapiOption[T] wrapper class in favour of ReflectapiPartialModel (a BaseModel mixin that uses Pydantic's own model_fields_set), then hardens the Python codegen against a broad set of edge cases — validated end-to-end against a real-world schema (hundreds of types, dozens of namespaces).

Closes #150. Supersedes #149.

Why drop ReflectapiOption?

reflectapi::Option<T> is a three-state protocol type: Some(value) / explicit None / Undefined (key absent on the wire). TypeScript clients express it natively (field?: T | null); Rust clients use the reflectapi::Option<T> enum. The Python client used to ship a runtime class wrapper for the same job — but Pydantic already tracks "was this field on the wire?" via __pydantic_fields_set__. The wrapper duplicated Pydantic's bookkeeping, forced users to learn a parallel .unwrap / .is_undefined / .unwrap_or(None) API, and kept producing fragile-edge bugs.

After this PR, partial fields are plain T | None and presence is queried at the model level:

# Field omitted on the wire if you don't pass it
Item(name="rex").model_dump_json()             == '{"name":"rex"}'

# Null on the wire if you pass None
Item(name="rex", kind=None).model_dump_json()  == '{"name":"rex","kind":null}'

# "Was this field on the wire?" — exact analogue of TS `obj.kind !== undefined`
"kind" in item.model_fields_set

Runtime changes

  • New: reflectapi_runtime.partial.ReflectapiPartialModel@model_serializer(mode="wrap") filters keys by model_fields_set (alias-aware).
  • New: reflectapi_runtime.duration.ReflectapiDurationAnnotated[timedelta, BeforeValidator, PlainSerializer] adapter for serde's {secs, nanos} wire shape.
  • Deleted: reflectapi_runtime.option — entire ReflectapiOption / Undefined / some / none / undefined / serialize_option_dict surface.
  • client.py / streaming.py simplified: model_dump_json(by_alias=True) and trust the model's serializer (partial or plain).
  • New: partial.py SSE parser tests pin the existing feed_eof() behaviour for both parse_sse and aparse_sse, plus end-to-end streaming through MockTransport.

Codegen changes (Python)

  • reflectapi::Option<T>T | None (was ReflectapiOption[T]); default None.
  • Field.is_partial / DataClass.is_partial propagate through the renderer to pick ReflectapiPartialModel vs BaseModel and emit validate_assignment=True on the ConfigDict.
  • Strict _rebuild_models(): collects every model_rebuild() failure into one RuntimeError (replacing the try: … except Exception: pass that hid the original four bug reports for months). Non-Pydantic rendered types (enums) are skipped.

Codegen edges hardened in the same pass

Edge Fix
Dangling order.OrderInsertData annotations Namespaces expose both the alias-stripped and Rust-leaf forms
std.tuple.TupleN references with no class Map every std::tuple::TupleN<…> to Python tuple[…] (Pydantic v2 validates fixed-arity heterogeneous)
std::time::Duration rejected by timedelta Use ReflectapiDuration (round-trips {secs, nanos}timedelta)
std.marker.PhantomData[T] dangling New strip_phantom_data_fields() preprocessor drops every PhantomData field
Field names matching Pydantic v2 / v1 methods (model_dump, schema, json, dict, …) Rename with trailing underscore + serialization/validation alias
Fields starting with model_ protected_namespaces=() in every generated ConfigDict
Vec<u8>bytes (serde wire is [1,2,3], not base64) Render as list[int]
Generic self-referential types blow OpenAPI stack In-progress set keyed by full monomorphization
Dead ReflectapiOption ADT classes in generated output Add reflectapi::Option to non_rendered_types
Cross-namespace auth.AuthError undefined at rebuild Bind every top-level namespace into every other
uuid::UuidUuidUuid in monomorphized names Destutter arg names through the same pipeline as class names
Doc strings with literal " carried Rust-source \" escapes through to Python Read doc literals via LitStr::value() in reflectapi-derive
Quote-style mismatch produced '… \\"NZ\\" …' outputs New python_string_literal picks the wrapping quote that needs the fewest escapes
TS SSE parser dropped final event if server closed without \n\n Add flush() to __EventSourceParserStream

Validation

  • cargo fmt, cargo clippy --all-features --all-targets -D warnings, cargo test --workspace — green.
  • pytest in reflectapi-python-runtime355 pass, 0 fail (includes 4 new SSE eof-flush tests).
  • New CI job python-codegen-smoke regenerates the demo Python client and imports it; the strict _rebuild_models() raises on any dangling reference, so the class of bug behind the four original reports fails CI at PR time. Job also checks the demo snapshot is committed (no drift). Ruff pinned to 0.15.8 for the formatter call so the drift check is deterministic.
  • End-to-end validation against a real-world consumer schema (hundreds of types, dozens of namespaces): regenerates cleanly, imports without warnings, downstream client tests pass except for those requiring a live server. Zero hand-written consumer code referenced the removed API — the migration is "regenerate the client".

Coverage fixtures

The demo's coverage module exercises everything the codegen would otherwise mishandle silently — Python keyword field names, Pydantic-reserved names, self/mutual/generic recursion, enum variants named after keywords, non-string-keyed HashMap, #[serde(transparent)], three-state Option wrapping Option, fields colliding with imports, empty structs, docstrings with quotes/backslashes/triple-quotes, type names shadowing imports, generics with multiple instantiations, fields with non-None serde defaults. All round-trip through the CI smoke test.

Breaking change — migration

reflectapi-runtime no longer exports ReflectapiOption, Undefined, Option, some, none, undefined, serialize_option_dict. Generated client code regenerates clean; hand-written code uses these patterns:

Before After
ReflectapiOption(Undefined) don't set the field
ReflectapiOption(None) field=None
ReflectapiOption(value) field=value
opt.is_undefined "field" not in model.model_fields_set
opt.is_none model.field is None and "field" in model.model_fields_set
opt.unwrap_or(default) model.field if "field" in model.model_fields_set else default

Follow-ups (out of this PR)

…ial fields

`reflectapi::Option<T>` is a three-state protocol type:
`Some(value)` / explicit `None` / `Undefined` (key absent on the
wire). TypeScript clients express that natively
(`field?: T | null`); Rust clients use the `reflectapi::Option<T>`
enum. The Python client previously shipped a hand-rolled
`ReflectapiOption[T]` wrapper class to carry the same distinction —
but Pydantic already tracks "was this field on the wire?" via
`__pydantic_fields_set__`. The wrapper was duplicating Pydantic's
own bookkeeping and forcing users to learn a parallel
`.unwrap` / `.is_undefined` API.

This rip-out replaces the wrapper with a `ReflectapiPartialModel`
mixin (one `@model_serializer` that filters by `model_fields_set`).
Generated classes that contain at least one `reflectapi::Option<T>`
field inherit from it; field types collapse to plain `T | None`.

API for callers:

  # Field omitted on the wire if you don't pass it
  Item(name="rex").model_dump_json()            == '{"name":"rex"}'

  # Field is `null` on the wire if you pass `None` explicitly
  Item(name="rex", kind=None).model_dump_json() == '{"name":"rex","kind":null}'

  # Was this field on the wire?
  "kind" in item.model_fields_set               # exact analogue of TS
                                                # `obj.kind !== undefined`

Runtime changes
- New: `reflectapi_runtime.partial.ReflectapiPartialModel`. Exposes
  one `@model_serializer(mode="wrap")` that drops keys not in
  `model_fields_set`, honouring `by_alias=True`.
- Deleted: `reflectapi_runtime.option` (the entire
  `ReflectapiOption` / `Undefined` / `some` / `none` / `undefined` /
  `serialize_option_dict` surface).
- `client._serialize_request_body` and `streaming` no longer need
  the option-dict dance — they just call
  `model.model_dump_json(by_alias=True)` and trust the model.

Codegen changes
- `reflectapi::Option` now maps to `T | None` (was
  `ReflectapiOption[T]`); default is `None`.
- `Field.is_partial` and `DataClass.is_partial` track whether any
  field originated from `reflectapi::Option<T>`; the renderer picks
  `ReflectapiPartialModel` vs `BaseModel` and adds
  `validate_assignment=True` to the ConfigDict on partial classes
  so post-construction assignments land on the wire.
- New helper `schema_has_partial_field` triggers the
  `ReflectapiPartialModel` runtime import.

Tests
- New: `tests/test_partial.py` (21 tests pinning the wire-format
  contract — absent vs null vs value, nested partial, container of
  models, JSON round-trip, aliases, defaults).
- `tests/test_option.py` deleted.
- `tests/test_edge_cases.py` and `tests/test_enhanced_features.py`
  stripped of dead `ReflectapiOption` test classes; the integration
  test was rewritten against `ReflectapiPartialModel`.
- Total runtime suite: 347 pass, 0 fail.
- Demo client regenerated; workspace tests + clippy clean.

BREAKING CHANGE: This is a 0.18.x change. Code importing
`ReflectapiOption`, `Undefined`, `some`, `none`, `serialize_option_dict`
from `reflectapi_runtime` will need to migrate to plain `T | None`
fields and query `model_fields_set` for presence.
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Code review skipped — your organization's overage spend limit has been reached.

Code review is billed via overage credits. To resume reviews, an organization admin can raise the monthly limit at claude.ai/admin-settings/claude-code.

Once credits are available, reopen this pull request to trigger a review.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2d16c7ce7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread reflectapi-python-runtime/src/reflectapi_runtime/client.py Outdated
Comment thread reflectapi-python-runtime/src/reflectapi_runtime/streaming.py Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 12, 2026

📖 Documentation Preview: https://reflectapi-docs-preview-pr-151.partly.workers.dev

Updated automatically from commit 5767f17

hardbyte added 12 commits May 13, 2026 11:40
Four user-reported codegen bugs all share one root cause: the codegen
emitted Python type annotations referencing symbols it didn't actually
define. They went undetected because the generated `_rebuild_models()`
helper wrapped `model_rebuild()` in `try: ... except Exception: pass`,
so dangling references only surfaced when users instantiated affected
models. Fixes are below; the central protection is the strict-rebuild
contract — the entire class of "codegen emitted a dangling reference"
bugs now fails fast at import time and a new CI job pins it.

## The four bugs

1. Namespace alias mismatch (570 model_rebuild failures in core-server)

   `myapi/order/__init__.py` aliased classes with the namespace cap
   stripped (`InsertData = MyapiOrderInsertData`) while field
   annotations elsewhere referenced the un-stripped Rust leaf
   (`order.OrderInsertData`). The latter never resolved.

   Root cause: `module_aliases()` and `type_name_to_python_ref()`
   stripped names with two different rules. Fix: namespaces now expose
   *both* the stripped form (ergonomic public name) and the Rust leaf
   (matches field annotations). `ModuleType` carries the original Rust
   path so the alias generator can compute the leaf.

2. `std.tuple.TupleN[A, B]` with no `std.tuple` class (6 occurrences)

   Rust tuples `(A, B)` were rendered as `std.tuple.Tuple2[A, B]` but
   no `std.tuple` submodule was ever generated.

   Root cause: only `Tuple0` had a mapping; arities 1..16 fell through
   to the default dotted-path renderer. Fix: map every `std::tuple::TupleN`
   to Python's built-in `tuple` — Pydantic v2 validates
   `tuple[A, B, …]` as a fixed-arity heterogeneous tuple, exactly
   matching serde's JSON-array wire shape.

3. `std::time::Duration` rendered as `timedelta` (9 occurrences)

   serde emits `Duration` as `{"secs": <u64>, "nanos": <u32>}`.
   Pydantic's bare `timedelta` validator rejects the dict shape.

   Root cause: codegen mapped `std::time::Duration` to `timedelta`
   based on the Rust type name without considering serde's actual JSON
   shape. Fix: new `ReflectapiDuration` runtime type — an `Annotated`
   wrapper around `timedelta` with `BeforeValidator` (accepts `{secs,
   nanos}` and existing `timedelta` instances) and `PlainSerializer`
   (emits `{secs, nanos}` so the wire round-trips). Microsecond
   precision; documented caveat for sub-µs `nanos`.

4. `std.marker.PhantomData[T]` with no `std.marker` class (NEW)

   serde renders `PhantomData<T>` as JSON `null`, and reflectapi-derive
   surfaces the field in the schema. The codegen rendered an annotation
   referencing a non-existent `std.marker.PhantomData` class.

   Root cause: PhantomData is a type-system-only marker — it carries no
   wire data and shouldn't appear in the Python model at all. Fix: new
   `strip_phantom_data_fields()` schema preprocessor drops every
   `std::marker::PhantomData<_>` field before rendering. Adds `Fields::retain()`
   to `reflectapi-schema` for the surgery.

## Validation harness

The deeper problem was the silent error swallowing in `_rebuild.py`:

    try:
        _model.model_rebuild()
    except Exception:
        pass

This pattern made the bugs above survive every PR. The generated
`_rebuild_models()` now:

- Collects every `model_rebuild()` failure into one structured report.
- Raises `RuntimeError` with all failures listed at once.
- Skips non-Pydantic rendered types (enums) so they don't false-positive.

The namespace-level `_rebuild_models()` call's bare `except Exception:`
was narrowed to `except ImportError:` so partial-load races are still
tolerated but rebuild failures propagate.

End-to-end CI guard:

- New `python-codegen-smoke` job in `.github/workflows/ci.yaml`
  regenerates the demo Python client and runs `python -c "import
  api_client"`. The strict `_rebuild_models()` runs at import time, so
  any future dangling reference fails CI immediately. The job also
  checks the demo snapshots are committed (no codegen drift).

- New runtime unit tests in `tests/test_codegen_regressions.py` pin
  the `ReflectapiDuration` contract (8 tests).

- The demo Rust crate now includes a `codegen_regression` endpoint
  whose types exercise all four bug patterns; the schema and demo
  client snapshots are regenerated.

## Test plan

- `cargo fmt`, `cargo clippy -p reflectapi --all-features --all-targets
  -D warnings`, `cargo test --workspace` — all green.
- `pytest` in `reflectapi-python-runtime` — 355 pass (8 new
  regression tests, 0 fail).
- `import api_client` against the regenerated demo — no exceptions,
  `RateLimit.model_validate({'retry_after': {'secs': 30, 'nanos': 0}})`
  produces `timedelta(seconds=30)`, model_dump_json round-trips back to
  `{"secs": 30, "nanos": 0}`.
Adds coverage fixtures to the demo crate that exercise Python codegen
edge cases the smoke test would otherwise miss, plus the fixes those
fixtures uncovered.

Python codegen fixes
- Field names exact-matching Pydantic v2 methods (`model_dump`,
  `model_validate`, `model_config`, etc.) are renamed with a trailing
  underscore so the methods aren't shadowed at the instance level.
- `protected_namespaces=()` is set on every generated `ConfigDict`,
  letting any user field starting with `model_` exist without
  Pydantic raising at class construction.
- `Vec<u8>` renders as `list[int]` instead of `bytes`. serde's
  default JSON shape for `Vec<u8>` is a JSON array of integers, not a
  base64 string; the old mapping made Pydantic's `bytes` validator
  reject every valid wire payload.

OpenAPI codegen fix
- Add an in-progress set keyed by full monomorphization so generic
  self-referential types no longer recurse forever and blow the
  stack. Non-generic recursion was already handled by the
  stub-on-insert into `components.schemas`.

Coverage fixtures
The demo's `coverage` module exercises:
- Python keyword field names (`class`, `from`, `import`, `lambda`,
  `return`, `yield`, `None`, `True`).
- Pydantic-reserved field names (`model_config`, `model_fields_set`,
  `model_dump_json`).
- Self-referential and mutually recursive structs.
- Generic recursive types instantiated with concrete arguments.
- Enums with variants whose names are Python keywords.
- HashMap with non-string keys.
- `#[serde(transparent)]` newtypes.
- Three-state Option wrapping a regular Option.
- Field names colliding with imports brought in by the generated
  module (`BaseModel`, `Field`, `Annotated`, `Generic`, `TypeVar`).
- Empty structs.
- Docstrings containing quotes, apostrophes, backslashes, and triple
  quotes.
- Type names that shadow Pydantic-imported symbols (`BaseModel`).
- The same generic used with multiple concrete type arguments.
- Fields with non-`None` serde defaults.

The CI smoke test exercises every fixture by regenerating the demo
client and importing it; the strict `_rebuild_models()` raises on
any dangling type reference.

Demo client snapshots, OpenAPI spec, Rust/TS generated clients and
the affected Python demo tests are regenerated and updated for the
new module names.
The Python codegen shells out to `ruff format` for the generated
client, so the in-tree snapshot has line-wrapping that depends on
the ruff version. Without pinning, the smoke test's
'no codegen drift' diff check fails any time CI's ruff differs from
the developer's local install — which is just version flap, not a
real codegen change.
The action runs `ruff check` over the whole repo by default and
fails on pre-existing style issues in the runtime tests. We only
want ruff on PATH so the codegen can call `ruff format` on its
output; pass `args: --version` to short-circuit the lint phase
(matches the Rust job's usage).
`reflectapi::Option<T>` renders as `T | None` at every use site;
the three states are carried by `model_fields_set` on the containing
`ReflectapiPartialModel`. The schema still declares it as a tagged
enum, so without this exclusion the codegen emits a parallel
`ReflectapiOption` / `ReflectapiOptionSome` / `ReflectapiOptionNone` /
`ReflectapiOptionUndefined` set of classes that nothing references.
…rebuild

Field annotations across sibling namespaces (e.g. `auth.AuthError` from
inside `upload/__init__.py`) need the referenced namespace bound as a
module-level global so Pydantic's `model_rebuild()` can resolve the
name. The rebuild was previously only binding each namespace's own root,
which worked for schemas with a single shared package prefix (the demo's
`myapi.<sub>.<sub>` shape) but left sibling references undefined for
schemas where every namespace is its own top-level module (the common
case for non-prefixed schemas like core-server).
`mangle_monomorphized_name` concatenated arg names with `::` replaced
by `_`, so an arg of `uuid::Uuid` produced the suffix `uuid_Uuid` →
`UuidUuid` after PascalCase, leaking the namespace cap into the class
name (`IdentityDataUuidUuidX` instead of `IdentityDataUuidX`). Run
the arg name through the same destutter the rest of the class-name
pipeline uses.
Pydantic v2 emits a UserWarning when a field name shadows one of the
deprecated v1 BaseModel methods (`schema`, `json`, `dict`,
`parse_obj`, etc.). Extend the existing exact-match rename to cover
them so generated classes import without warnings.
… fewest escapes

Field descriptions used to be pre-escaped for double-quoted wrapping
(\" everywhere), then ruff would reflow them to single quotes —
leaving literal backslashes in the formatted output (`'... \"NZ\"
...'`). Pick the wrapping quote per literal: prefer double quotes,
fall back to single when the content has " but no ', and only escape
when both appear.
`parse_doc_attributes` was converting the doc literal via
`to_token_stream().to_string()`, which produced the Rust source
representation of the string — preserving every `\"` escape from
the original source. Doc comments containing literal double quotes
therefore landed in the schema as bytes `\"` instead of `"`,
and that quoted form propagated into Python field descriptions.
Use `Lit::Str::value()` to read the actual string content.
hardbyte added 4 commits May 13, 2026 15:29
The TypeScript `__EventSourceParserStream` had no `flush()` method,
so a server that closed the connection immediately after the final
`data:` line — without the spec's trailing blank-line terminator —
silently dropped the last event. Add `flush() { parser.feed('\n\n') }`
to the template so the buffered event dispatches on stream close.

The Python runtime already handled the symmetric case via
`feed_eof()` in both `parse_sse` and `aparse_sse`. Add explicit
tests for the async parser eof-flush and for the end-to-end
streaming client receiving a server response that closes
mid-event.
…ge fixtures out of the demo

- Restore the historical wire format for plain Pydantic request models:
  `model_dump_json(by_alias=True, exclude_none=True)` for non-partial
  models so unset optional fields stay absent on the wire.
  `ReflectapiPartialModel` still uses no exclude — explicit `None`
  encodes the protocol's "field present, value cleared" state and
  must reach the wire. Same fix applied in `client.py` and `streaming.py`.

- Move codegen coverage fixtures (PyKeywordFields, recursive types,
  Pydantic-reserved field names, etc.) out of the public demo schema
  into the integration test at `reflectapi-demo/tests/codegen_coverage.rs`.
  The demo crate's `builder()` now only registers pet-themed routes.
  The integration test builds a private schema, runs Python codegen to
  `target/codegen-coverage-client/`, and CI imports the result so
  strict `model_rebuild()` still catches dangling-reference bugs.

- Add four runtime tests pinning the partial-vs-plain wire-format
  contract.
- Gitignore reflectapi-demo/target/ so codegen coverage-client
  outputs stay out of the index.
- Migrate the demo client pyproject from the deprecated
  `[tool.uv] dev-dependencies = …` to the standard
  `[dependency-groups] dev = …` shape; clears the uv warning.
- Tighten coverage test + CI comments to describe the current
  behaviour rather than contrast with prior state.
@hardbyte hardbyte merged commit dbd07cb into main May 13, 2026
6 checks passed
@hardbyte hardbyte deleted the drop-reflectapi-option-class branch May 13, 2026 04:29
@hardbyte hardbyte mentioned this pull request May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ReflectapiOption: reconsider custom Pydantic type vs. model-level presence tracking

1 participant