
ReflectapiOption: reconsider custom Pydantic type vs. model-level presence tracking #150

@hardbyte

Background

ReflectapiOption[T] is a hand-rolled Pydantic custom type that exists to carry a third state beyond what T | None provides:

State      Wire              Carried as
Some       {"field": value}  ReflectapiOption(value)
None       {"field": null}   ReflectapiOption(None)
Undefined  key absent        ReflectapiOption(Undefined)
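
To make the three wire states concrete, here is a minimal standard-library sketch. The classify helper and the age field name are illustrative only; neither is part of the reflectapi runtime.

```python
import json

# Private sentinel: distinguishes "key absent" from an explicit JSON null.
_ABSENT = object()


def classify(wire: str, field: str) -> str:
    """Name the state of `field` in a JSON object: Some, None, or Undefined."""
    payload = json.loads(wire)
    value = payload.get(field, _ABSENT)
    if value is _ABSENT:
        return "Undefined"
    if value is None:
        return "None"
    return "Some"


# One wire form per row of the table above.
states = [classify(wire, "age") for wire in ('{"age": 3}', '{"age": null}', "{}")]
print(states)
```

A plain T | None field keeps the first state and merges the last two, which is the gap the wrapper was written to close.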

Everything else the class does — validate, coerce, serialize — re-implements behaviour Pydantic already provides for nullable fields. PR #149 patched the most visible symptom of that re-derivation (inner-model validation was missing, the validator/serializer were structurally asymmetric, .value contradicted its docstring). But the underlying question hasn't been answered: should "field absent on the wire" live on the field type at all?

How the other languages handle the same problem

Layer                      Three-state representation
Rust source (server)       reflectapi::Option<T>, a custom 3-variant enum (Undefined / None / Some(T)) defined in reflectapi/src/option.rs
Rust generated client      Same reflectapi::Option<T>, passed through 1:1 (rust.rs codegen mapping)
TypeScript generated client  field?: T | null, using the language's native absent state (the ?:) and null together (typescript.rs codegen mapping for reflectapi::Option)
Python generated client    ReflectapiOption[T], a runtime class wrapper (python.rs codegen mapping)

What this comparison says:

  1. TypeScript proves a wrapper isn't required. TS has two orthogonal absence states baked into the language: field?: (optional, so the key can be absent) and | null (explicit null). The codegen maps the three states directly onto them. Nothing to validate, nothing to round-trip, no class, no runtime cost: callers check obj.age === undefined, then obj.age === null, and otherwise have a value.

  2. Rust's enum is genuinely needed. Rust has no native "absent" state in struct fields; every field must have a value. The enum exists to encode something the type system can't otherwise express, and match exhaustiveness makes it ergonomic — callers can't accidentally skip a state.

  3. Python sits in an awkward middle. Python's data model does distinguish absent from null (dict.get(k, SENTINEL), __pydantic_fields_set__). It's only Python's type annotations that lack a syntactic equivalent to TS's ?:. So the custom wrapper exists to surface, at the field-value level, a distinction Pydantic already tracks at the model level. That's the structural smell.

  4. The Rust client uses pattern matching, not accessors. There's no .value / .is_some / .unwrap_or; you match and the compiler enforces exhaustiveness. The Python wrapper imitates that API surface but the imitation is what makes it brittle — Python users have to remember which accessor doesn't raise on which state, while Rust users get a compile error if they don't handle all three arms.
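
Point 3 can be checked directly. The sketch below assumes Pydantic v2 and uses a hypothetical Pet model (the name field is invented for illustration); it shows that a plain nullable field loses the distinction in its value while the model still records it.

```python
from typing import Optional

from pydantic import BaseModel


class Pet(BaseModel):
    # Hypothetical model: a plain nullable field, no wrapper type.
    name: str
    age: Optional[int] = None


absent = Pet.model_validate({"name": "Rex"})
null = Pet.model_validate({"name": "Rex", "age": None})

# The field values are indistinguishable...
same_value = absent.age is None and null.age is None
# ...but the model remembers which keys were actually supplied.
age_provided = ("age" in absent.model_fields_set, "age" in null.model_fields_set)
print(same_value, age_provided)
```

model_fields_set is the public accessor for the __pydantic_fields_set__ attribute mentioned above.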

Alternatives

A. Keep ReflectapiOption[T], derive both directions from a single inner schema. Lowest risk; current shape. After #149 this is at least internally consistent. Long-term maintenance cost is "every change to Pydantic semantics for Optional has to be re-validated against the custom validator," which is non-zero but bounded.

B. Track "explicitly-present keys" on the model, not the field type.

  • Field type collapses to T | None. Pydantic validates it normally.
  • A model-level model_validator(mode='before') records which keys were present in the input dict on a __reflectapi_present__: set[str] attribute.
  • A custom serializer omits keys not in that set when mode='json'.
  • Users querying "was this field provided?" call model.is_provided('snapshot') instead of model.snapshot.is_undefined.
  • Pros: zero custom Pydantic core schema, zero forward-ref pain, the field type is what users expect.
  • Cons: breaking API change (.is_undefined / .unwrap_or go away). Need to thread the presence set through nested models. Doesn't compose with Annotated tricks for individual fields.
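
A rough sketch of what B could look like, assuming Pydantic v2. PresenceModel, _present and Pet are hypothetical names. Two deliberate deviations from the bullets: the validator uses mode='wrap' rather than mode='before', because a before-validator runs before any instance exists to attach the set to, and the set is stored in a Pydantic private attribute instead of a dunder-named attribute.

```python
from typing import Any, Optional, Set

from pydantic import BaseModel, PrivateAttr, model_serializer, model_validator


class PresenceModel(BaseModel):
    """Hypothetical base class for generated models (option B sketch)."""

    # Stand-in for the __reflectapi_present__ set described above.
    _present: Set[str] = PrivateAttr(default_factory=set)

    @model_validator(mode="wrap")
    @classmethod
    def _record_present(cls, data: Any, handler: Any) -> Any:
        instance = handler(data)  # ordinary Pydantic validation of T | None
        if isinstance(data, dict):
            # Field names only; aliases are not handled in this sketch.
            instance._present = set(data) & set(cls.model_fields)
        return instance

    @model_serializer(mode="wrap")
    def _omit_absent(self, handler: Any, info: Any) -> Any:
        data = handler(self)
        if info.mode == "json":
            # Drop keys that were not present in the validated input.
            data = {k: v for k, v in data.items() if k in self._present}
        return data

    def is_provided(self, name: str) -> bool:
        return name in self._present


class Pet(PresenceModel):
    name: str
    age: Optional[int] = None


absent = Pet.model_validate({"name": "Rex"})
null = Pet.model_validate({"name": "Rex", "age": None})
print(absent.is_provided("age"), null.is_provided("age"))
print(absent.model_dump(mode="json"), null.model_dump(mode="json"))
```

Nested models built from this base each record their own set during validation, but instances constructed from another model instance rather than a dict keep an empty set, which is one of the threading problems the cons bullet points at.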

C. Annotated[T | None, MissingMarker()].

  • Marker is a Pydantic-aware metadata class implementing __get_pydantic_core_schema__ once (not per-T).
  • Field type stays nominally T | None; the marker tags it as "omit from output when absent."
  • Distinct-from-null state stored as a sentinel value on the field or as a sidecar set[str].
  • Pros: less custom surface than A, doesn't require model-level cooperation like B.
  • Cons: still relies on a sentinel value (or sidecar) to express "undefined," so it inherits most of A's edge cases. Slightly nicer ergonomics.
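
One possible shape for C, assuming Pydantic v2 and taking the sidecar route, with model_fields_set standing in for the sidecar set[str]. MissingMarker's body, the dump_json_omitting_absent helper and the Pet model are all hypothetical. The marker's schema hook simply delegates to Pydantic, so it is written once rather than per T.

```python
from typing import Annotated, Any, Dict, Optional

from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import CoreSchema


class MissingMarker:
    """Hypothetical tag: omit the field from output when it was absent."""

    def __get_pydantic_core_schema__(
        self, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        # Written once for every T: reuse Pydantic's own schema for T | None.
        return handler(source_type)


def dump_json_omitting_absent(model: BaseModel) -> Dict[str, Any]:
    """Dump in JSON mode, dropping marked fields that were never provided."""
    data = model.model_dump(mode="json")
    for name, field in type(model).model_fields.items():
        marked = any(isinstance(m, MissingMarker) for m in field.metadata)
        if marked and name not in model.model_fields_set:
            data.pop(name, None)
    return data


class Pet(BaseModel):
    name: str
    age: Annotated[Optional[int], MissingMarker()] = None  # three-state field
    nickname: Optional[str] = None  # ordinary nullable field, always emitted


absent = Pet.model_validate({"name": "Rex"})
null = Pet.model_validate({"name": "Rex", "age": None})
print(dump_json_omitting_absent(absent))
print(dump_json_omitting_absent(null))
```

Only the marked field is omitted when absent, which shows the per-field granularity C offers over B and D. The sentinel-value variant would need more machinery than this sketch covers.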

D. Stop carrying Undefined on the round-trip path entirely. (← matches the TypeScript pattern most closely.)

  • In-memory representation: T | None. Pydantic-native validation.
  • Serializer-side: a model_serializer consults model_fields_set (the public view of __pydantic_fields_set__) and omits fields not in it.
  • Custom logic exists only on the serialization path, not for validation or in-memory storage.
  • "Was this field provided?" is answered via 'snapshot' in item.model_fields_set — the exact analogue of TS's obj.snapshot !== undefined.
  • Pros: cleanest separation. Pydantic already tracks model_fields_set, so the sidecar exists for free. Field types match what users would write by hand.
  • Cons: the "was this provided?" query moves from the field's value to a model-level lookup. That's fine for codegen but worse for users hand-coding.
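
A minimal sketch of D, assuming Pydantic v2. OmitUnsetModel and Pet are hypothetical names; the only custom piece is one wrap-mode model_serializer that generated models would inherit.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, SerializerFunctionWrapHandler, model_serializer


class OmitUnsetModel(BaseModel):
    """Hypothetical shared base class for generated models (option D sketch)."""

    @model_serializer(mode="wrap")
    def _omit_unset(self, handler: SerializerFunctionWrapHandler) -> Dict[str, Any]:
        data = handler(self)  # Pydantic's normal serialization
        # Keep only fields that were explicitly provided.
        return {k: v for k, v in data.items() if k in self.model_fields_set}


class Pet(OmitUnsetModel):
    name: str
    age: Optional[int] = None  # plain nullable field, Pydantic-native validation


absent = Pet.model_validate({"name": "Rex"})
null = Pet.model_validate({"name": "Rex", "age": None})

print(absent.model_dump_json())
print(null.model_dump_json())
print("age" in absent.model_fields_set, "age" in null.model_fields_set)
```

Pydantic's built-in model_dump(exclude_unset=True) gives the same omission per call; the shared serializer just makes it the default so callers cannot forget the flag. This sketch filters by field name and does not account for aliases.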

Cost framing

The custom type touches 5–6 user-facing surfaces (.value, .unwrap, .unwrap_or, .map, .filter, is_undefined / is_none / is_some). Approach B or D removes most of them; A keeps them; C keeps most. Whether that ergonomic loss is worth the structural simplification depends on how many consumers (Partly, others) currently call those methods.

Recommendation

Audit consumer code (grep for .is_undefined / .unwrap_or against the generated client). If usage is concentrated in codegen-emitted helpers and a small set of utility wrappers, D is the cleanest endpoint — it's the Python analogue of what TypeScript already does, and it shrinks the custom-Pydantic surface to a single model_serializer shared across all generated models. If users are reaching into the wrapper API directly, A is the path of least surprise.
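
The audit could be scripted roughly as follows, using only the standard library. The function names are hypothetical, and the pattern covers only the distinctive accessors; .value, .map and .filter are left out because they would match a lot of unrelated code.

```python
import re
from collections import Counter
from pathlib import Path

# Distinctive wrapper accessors named in this issue.
ACCESSOR_RE = re.compile(r"\.(is_undefined|is_none|is_some|unwrap_or|unwrap)\b")


def count_accessors(source: str) -> Counter:
    """Count wrapper-accessor references in one source string."""
    return Counter(match.group(1) for match in ACCESSOR_RE.finditer(source))


def audit_tree(root: Path) -> Counter:
    """Sum accessor counts over every .py file under `root`."""
    totals: Counter = Counter()
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        totals += count_accessors(text)
    return totals


sample = "if pet.age.is_undefined:\n    n = pet.age.unwrap_or(0)\nx = opt.unwrap()\n"
print(count_accessors(sample))
```

Running audit_tree separately over the generated client and over consumer repositories would show whether usage is concentrated in codegen-emitted helpers or spread through hand-written code.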

Don't decide this on the way to 1.0 without an explicit audit — locking the current design into a stable contract makes B/C/D much more expensive later.

Related

  • fix(runtime): validate inner type in ReflectapiOption[T] schema #149 (immediate fix for the validation asymmetry)
  • The structural smell is general: any custom Pydantic type that re-implements Pydantic semantics should derive validator/serializer from a single source schema, not author them separately. The class predates Pydantic's no_info_wrap_validator_function pattern.
  • Aside (separate issue worth filing if confirmed): the demo Python client's Pet.age field renders as int | None, not ReflectapiOption[int], despite the Rust source declaring age: reflectapi::Option<u8>. Either a stale snapshot or a real schema-collapse bug — needs investigation independent of this redesign.
