feat(schema): add semantic IR and symbol ID infrastructure #124
Add SymbolId system, semantic IR types, ID assignment, and normalization pipeline to reflectapi-schema. This provides stable, unique identifiers for all schema symbols and a multi-stage pipeline for transforming raw schemas into validated semantic representations.

New modules:
- symbol.rs: SymbolId/SymbolKind types with stable identifiers
- ids.rs: ensure_symbol_ids() for post-deserialization ID assignment
- semantic.rs: Immutable semantic IR (SemanticSchema, SymbolTable, etc.)
- normalize.rs: TypeConsolidation, NamingResolution, CircularDependency detection stages, and Normalizer (Schema -> SemanticSchema)

Schema type changes:
- Added id: SymbolId field to Schema, Function, Primitive, Struct, Field, Enum, Variant (serde skip_serializing, backward compatible)
- Manual PartialEq/Hash impls exclude id from comparisons
- PartialEq + Eq added to SerializationMode, Copy to SymbolKind

Addresses #96, lays groundwork for #123.
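The "manual PartialEq/Hash impls exclude id" point can be illustrated with a minimal sketch. The `SymbolId` and `Field` types below are simplified, hypothetical stand-ins (the real reflectapi-schema types have more fields), and `hash_of` is a helper introduced only for the demonstration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified stand-in for the real SymbolId.
#[derive(Debug, Clone, Default)]
struct SymbolId {
    path: Vec<String>,
}

// Hypothetical, simplified stand-in for a schema Field.
#[derive(Debug, Clone)]
struct Field {
    id: SymbolId, // assigned after deserialization; not part of equality
    name: String,
    type_ref: String,
}

// Equality compares every field except `id`.
impl PartialEq for Field {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name && self.type_ref == other.type_ref
    }
}
impl Eq for Field {}

// Hash must agree with PartialEq, so it skips `id` as well.
impl Hash for Field {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
        self.type_ref.hash(state);
    }
}

// Demonstration helper: hash a value with the std default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let without_id = Field {
        id: SymbolId::default(),
        name: "id".to_string(),
        type_ref: "String".to_string(),
    };
    let with_id = Field {
        id: SymbolId { path: vec!["Offer".to_string(), "id".to_string()] },
        ..without_id.clone()
    };
    // The two values differ only in `id`, so they compare and hash equal.
    assert_ne!(without_id.id.path, with_id.id.path);
    assert_eq!(without_id, with_id);
    assert_eq!(hash_of(&without_id), hash_of(&with_id));
}
```

Keeping `Hash` consistent with `PartialEq` matters if these types are ever used as `HashMap` or `HashSet` keys.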
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 86c00edfb8
- TypeConsolidationStage now rewrites type references after renaming conflicted types (fixes dangling references to old names)
- ensure_symbol_ids uses separate seen maps per typespace and disambiguates output types that share an FQN with a different input type (prevents SymbolId collisions in the Normalizer)
- Field::new and Variant::new use Default::default() for id so ensure_symbol_ids can assign proper parent-contextualized paths
@claude review
- ids.rs: Use struct/enum's actual ID (not seen-map ID) as owner for
member ID assignment, fixing inconsistent parent-child paths when
types have pre-assigned IDs
- normalize.rs: Track all conflicting qualified names in name_usage
(Vec<String> per simple name), not just the first, so
update_type_references_in_schema builds mappings for all conflicting
types and avoids dangling references
- normalize.rs: Fix generate_unique_name fallback to join all module
parts instead of using module_parts[0], which would return an
excluded part ("model"/"proto") and cause name collisions
https://claude.ai/code/session_01UcJQe3CE12BFgqDiadkgii
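The generate_unique_name fallback described above can be sketched as follows. This is an assumption-laden illustration, not the code in normalize.rs: only the function name and the excluded parts "model"/"proto" come from the commit message; the `pascal` helper, the PascalCase joining, and the exact output names are hypothetical.

```rust
// Module segments skipped when building a prefix. The two values come from
// the commit message; everything else in this sketch is an assumption.
const EXCLUDED: [&str; 2] = ["model", "proto"];

/// Converts one snake_case segment to PascalCase ("user_api" -> "UserApi").
fn pascal(segment: &str) -> String {
    segment
        .split('_')
        .filter(|word| !word.is_empty())
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}

/// Hypothetical stand-in for generate_unique_name.
fn generate_unique_name(qualified: &str) -> String {
    let mut modules: Vec<&str> = qualified.split("::").collect();
    // `split` always yields at least one item, so `pop` cannot fail here.
    let simple = modules.pop().expect("split yields at least one segment");
    let kept: Vec<&str> = modules
        .iter()
        .copied()
        .filter(|m| !EXCLUDED.contains(m))
        .collect();
    // Fallback: when every module part is excluded, join all of them
    // instead of taking only modules[0].
    let prefix = if kept.is_empty() { &modules } else { &kept };
    let mut name: String = prefix.iter().map(|m| pascal(m)).collect();
    name.push_str(simple);
    name
}

fn main() {
    // Both paths consist only of excluded parts; joining all parts keeps
    // the resulting names distinct.
    assert_eq!(generate_unique_name("model::Foo"), "ModelFoo");
    assert_eq!(generate_unique_name("model::proto::Foo"), "ModelProtoFoo");
    // Normal case: excluded parts are skipped in the prefix.
    assert_eq!(generate_unique_name("api::proto::User"), "ApiUser");
}
```

Taking only the first module part would map both `model::Foo` and `model::proto::Foo` to the same name, which is the collision the commit describes.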
- test_pre_assigned_id_member_paths_consistent: verifies struct field ID paths use the struct's actual ID as parent prefix
- test_pre_assigned_id_enum_member_paths_consistent: same for enums
- test_naming_resolution_all_conflicting_types_have_references_rewritten: verifies function references to all conflicting types (not just the first) are rewritten to valid names after NamingResolutionStage
- test_generate_unique_name_excluded_modules_no_collision: verifies model::Foo and model::proto::Foo produce different names
- test_generate_unique_name_with_non_excluded_module: normal case
https://claude.ai/code/session_01UcJQe3CE12BFgqDiadkgii
For structs with `#[serde(flatten)]` on an internally-tagged enum,
generate per-variant models that merge parent struct fields + variant
fields + tag discriminator, then emit a discriminated union RootModel.
This matches the flat wire format serde produces.
Before: `Offer` had only `id: str` (enum fields silently dropped)
After: `Offer` is a RootModel union of `OfferSingle{id,type,business}`
and `OfferGroup{id,type,count}` — wire-compatible with serde
Also:
- Compose NormalizationPipeline into Normalizer (runs TypeConsolidation,
NamingResolution, CircularDependencyResolution before IR construction)
- Add snapshot tests for flattened externally-tagged, adjacently-tagged,
and untagged enums
- Document Boxing strategy as intentional no-op (Rust schemas already
encode Box<T>); add integration tests for self-referential and
multi-type circular dependency normalization
- Add docs/architecture.md covering semantic IR pipeline, codegen
backends, and flattened type handling
- Remove all point-in-time language ("currently", "not yet", "planned")
- Rename Section 7 from "Current Status and Roadmap" to "Limitations
and Design Gaps" — state facts, not progress
- Delete "Complete" and "In Progress" subsections
- Fix Schema/SemanticSchema code samples to include id fields
- Add reflectapi-python-runtime crate description
- Fix OpenAPI version (3.1, not 3.0)
- Replace vague language with specifics throughout
- Remove subjective tone and issue number references from prose
- Handler function signature convention (Input, Output, Headers, Error)
- Input/Output traits as the self-registration mechanism
- reflectapi::Option<T> three-state type (Undefined | None | Some)
- Primitive.fallback mechanism for codegen type resolution
- #[reflectapi(...)] derive macro attributes reference
- Snapshot test architecture (5 snapshots per test, trybuild)
- Add description/deprecation_note to Function struct sample
- Fix TypeConsolidation claim: both copies are renamed when name appears in both typespaces (not just when types differ)
- Fix NamingResolution example: proto is skipped in prefix generation, so use ApiUser/BillingUser not ProtoUser
Replace dead reflectapi.partly.workers.dev URLs (returning 404) with links to local docs. Add link to architecture doc.
ids.rs:
- Use struct's actual id (not seen-map id) as owner for member assignment, fixing inconsistent parent-child paths
- Zero-pad tuple field indices (arg00, arg01, ...) so BTreeMap ordering matches positional order for 10+ fields
- assign_disambiguated_id now clears and re-assigns all member IDs after disambiguation, maintaining hierarchical consistency
- Schema root uses sentinel path ["__schema__", name] to avoid collision with same-named user types

normalize.rs:
- TypeConsolidation uses full qualified name for conflict renaming (input.a.Foo vs input.b.Foo) preventing silent type drops
- resolve_types filters resolution_cache to type-level symbols only, preventing Field/Variant entries from shadowing type lookups
- discover_struct/enum_symbols derives SymbolInfo.path from field.id.path for consistency with split-path ID assignment
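The zero-padding point is easy to check in isolation. The sketch below uses only std; `ordered_keys` is a hypothetical helper, and the `arg` prefix follows the naming in the commit message.

```rust
use std::collections::BTreeMap;

/// Inserts `count` tuple-field keys into a BTreeMap and returns them in
/// iteration (lexicographic) order.
fn ordered_keys(count: usize, zero_pad: bool) -> Vec<String> {
    let mut map = BTreeMap::new();
    for index in 0..count {
        let key = if zero_pad {
            format!("arg{index:02}") // arg00, arg01, ...
        } else {
            format!("arg{index}") // arg0, arg1, ...
        };
        map.insert(key, index);
    }
    map.into_keys().collect()
}

fn main() {
    // Without padding, "arg10" and "arg11" sort before "arg2".
    let unpadded = ordered_keys(12, false);
    let first_four: Vec<&str> = unpadded.iter().take(4).map(String::as_str).collect();
    assert_eq!(first_four, ["arg0", "arg1", "arg10", "arg11"]);

    // With two-digit padding, key order matches positional order.
    let padded = ordered_keys(12, true);
    assert_eq!(padded[2], "arg02");
    assert_eq!(padded[10], "arg10");
}
```

Two-digit padding only keeps the order correct up to 100 fields; a wider tuple would need a wider pad.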
@claude review
...c/tests/snapshots/reflectapi_demo__tests__serde__flatten_adjacently_tagged_enum_field-5.snap
    let flattened_internal_enum =
        struct_def
            .fields
            .iter()
            .filter(|f| f.flattened())
            .find_map(|field| {
                let type_name = resolve_flattened_type_name(&field.type_ref);
                match schema.get_type(type_name) {
                    Some(reflectapi_schema::Type::Enum(enum_def)) => {
                        match &enum_def.representation {
                            reflectapi_schema::Representation::Internal { tag } => {
                                Some((field, enum_def.clone(), tag.clone()))
                            }
                            _ => None,
                        }
                    }
                    _ => None,
                }
            });

    if let Some((_enum_field, enum_def, tag)) = flattened_internal_enum {
        // Wire-compatible path: generate per-variant models with merged fields
        render_struct_with_flattened_internal_enum(
            struct_def,
🔴 The find_map at python.rs:462 returns only the FIRST flattened internally-tagged enum field; if a struct has two such fields (valid in Rust when they use different tag names), the second enum's variants are never generated and are silently dropped from the Python output. Any consumer deserializing such a struct will face a mismatch: the Rust type has two independent discriminated unions flattened in, but the Python model only reflects one of them.
Extended reasoning...
What the bug is and how it manifests
In render_struct_with_flatten (python.rs lines 457–475), the iterator chains .filter(|f| f.flattened()).find_map(...) to locate a flattened internally-tagged enum field. find_map short-circuits on the first match and returns Option<(field, enum_def, tag)>. Only that single enum_def is ever passed to render_struct_with_flattened_internal_enum. If a struct has a second flattened internally-tagged enum field — valid Rust with serde when the two enums use distinct tag field names (e.g. type and kind) — find_map never sees it.
The specific code path that triggers it
Inside render_struct_with_flattened_internal_enum, the loop at lines 561–578 iterates over all flattened fields:
    for field in struct_def.fields.iter().filter(|f| f.flattened()) {
        let type_name = resolve_flattened_type_name(&field.type_ref);
        if let Some(reflectapi_schema::Type::Struct(_)) = schema.get_type(type_name) {
            // expand struct fields into base_fields
        }
        // Enum fields are handled below as variants <-- misleading comment
    }

The comment says "Enum fields are handled below as variants", but "below" refers only to the for variant in &enum_def.variants loop, which iterates over the variants of the ONE enum that was found by find_map. A second flattened internally-tagged enum field is neither expanded into base_fields nor iterated as a variant block. It is completely skipped.
Why existing code does not prevent it
The function signature render_struct_with_flattened_internal_enum(... enum_def: &Enum ...) accepts a single enum. There is no mechanism to pass, receive, or render a second enum. The test suite (test_flatten_internally_tagged_enum_field) uses a struct with exactly one flattened enum, so the missing second-enum path is never exercised.
What the impact would be
Given a Rust struct:
    struct Combined {
        id: String,
        #[serde(flatten)] action: ActionKind, // internal tag "type"
        #[serde(flatten)] status: StatusKind, // internal tag "kind"
    }

The generated Python model would contain only the ActionKind discriminated union variants. Every variant that comes from StatusKind, including its tag field "kind", is absent from the Python output. Any Python code receiving a wire message with {"id":"1","type":"Create","kind":"Active",...} would fail to deserialize or would silently ignore the kind and all status-related fields.
Step-by-step proof
- struct_def has two flattened fields: action: ActionKind (internal tag type) and status: StatusKind (internal tag kind).
- .filter(|f| f.flattened()).find_map(...) evaluates action first. ActionKind matches Representation::Internal, so find_map returns Some((action_field, action_enum_def, "type")) immediately.
- status: StatusKind is never evaluated.
- render_struct_with_flattened_internal_enum receives enum_def = ActionKind and generates CombinedCreate, CombinedDelete, etc., but no CombinedActive or CombinedInactive variants.
- The inner loop at line 562 skips status because it is an Enum (not a Struct) and the comment defers to code that never runs for it.
- Result: StatusKind's variants are entirely absent from the Python output.
How to fix it
Collect ALL flattened internally-tagged enums (not just the first), then either: (a) generate a cross-product of variant combinations, which is complex but wire-accurate; or (b) for each additional internally-tagged enum beyond the first, fall back to the standard field emission path used for non-internal enums, with a documented limitation. At minimum, a warning or error should be surfaced when multiple flattened internally-tagged enums are detected, rather than silently generating incorrect output.
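The difference between stopping at the first match and collecting every match can be sketched with simplified types. `Repr`, `Field`, and the helper functions below are hypothetical stand-ins for the schema types named in the review, not the code in python.rs.

```rust
// Hypothetical, simplified stand-ins for the schema types in the review.
enum Repr {
    Internal { tag: &'static str },
    Other,
}

struct Field {
    name: &'static str,
    flattened: bool,
    enum_repr: Option<Repr>, // None when the field is not an enum
}

/// Returns (field name, tag) for a flattened internally-tagged enum field.
fn internal_tag(field: &Field) -> Option<(&'static str, &'static str)> {
    match &field.enum_repr {
        Some(Repr::Internal { tag }) if field.flattened => Some((field.name, *tag)),
        _ => None,
    }
}

/// Mirrors the reviewed behavior: find_map stops at the first match.
fn first_match(fields: &[Field]) -> Option<(&'static str, &'static str)> {
    fields.iter().find_map(internal_tag)
}

/// Collects every match, so a caller can handle or reject multiples.
fn all_matches(fields: &[Field]) -> Vec<(&'static str, &'static str)> {
    fields.iter().filter_map(internal_tag).collect()
}

/// The `Combined` struct from the review, plus one non-internal enum field.
fn combined() -> Vec<Field> {
    vec![
        Field { name: "id", flattened: false, enum_repr: None },
        Field { name: "action", flattened: true, enum_repr: Some(Repr::Internal { tag: "type" }) },
        Field { name: "extra", flattened: true, enum_repr: Some(Repr::Other) },
        Field { name: "status", flattened: true, enum_repr: Some(Repr::Internal { tag: "kind" }) },
    ]
}

fn main() {
    let fields = combined();
    // find_map never reaches `status`.
    assert_eq!(first_match(&fields), Some(("action", "type")));
    // filter_map sees both internally-tagged enums.
    assert_eq!(all_matches(&fields), vec![("action", "type"), ("status", "kind")]);
}
```

With the full list in hand, the generator could at least report an error when `all_matches` returns more than one entry, which is the minimum the review asks for.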
    StdNumNonZeroI32 = Annotated[int, "Rust NonZero i32 type"]
    StdNumNonZeroI64 = Annotated[int, "Rust NonZero i64 type"]

    # Rebuild models to resolve forward references
    try:
        ReflectapiDemoTestsSerdeCell.model_rebuild()
        ReflectapiDemoTestsSerdeValue.model_rebuild()
    except AttributeError:
        # Some types may not have model_rebuild method
        pass

    # Factory classes (generated after model rebuild to avoid forward references)
🔴 The Python codegen emits model_rebuild() calls for Union type aliases (e.g., ReflectapiDemoTestsSerdeValue = Union[...]) alongside real BaseModel subclasses inside a single try/except AttributeError block. Union aliases have no model_rebuild() method, so the call always raises AttributeError. Because all calls share one block, any Union alias that sorts alphabetically before a real BaseModel subclass will silently abort the entire try block, leaving the real model's forward references unresolved. The fix is to either wrap each model_rebuild() call in its own try/except block, or filter the list to exclude Union aliases.
Extended reasoning...
What the bug is and how it manifests
The Python codegen in reflectapi/src/codegen/python.rs (around line 1320-1332) collects all rendered type names, sorts them alphabetically, and emits them in a single try/except AttributeError block. In the snapshot flatten_untagged_enum_field-5.snap (lines 148-160), ReflectapiDemoTestsSerdeValue is a plain Python Union type alias, not a Pydantic BaseModel subclass:
ReflectapiDemoTestsSerdeValue = Union[
ReflectapiDemoTestsSerdeValueNum, ReflectapiDemoTestsSerdeValueText
]
Union type aliases in Python are typing special forms and have no model_rebuild() method. Calling .model_rebuild() on them always raises AttributeError.
The specific code path that triggers the latent bug
Types are sorted alphabetically before the block is emitted (sorted_type_names.sort() in python.rs). In the tested snapshot, Cell (C) sorts before Value (V), so ReflectapiDemoTestsSerdeCell.model_rebuild() runs first and succeeds, and then the AttributeError from ReflectapiDemoTestsSerdeValue.model_rebuild() is caught. This specific case is harmless.
However, the structural defect is that all calls share one try/except block. Consider any schema where a Union alias name sorts alphabetically before a real BaseModel/RootModel subclass — for example, an 'AValue = Union[...]' alias and a 'BModel(BaseModel)' class. The sequence would be: (1) AValue.model_rebuild() raises AttributeError, (2) the except block catches it and execution exits the entire try block, (3) BModel.model_rebuild() is never called.
Why existing code does not prevent it
The comment 'Some types may not have model_rebuild method' shows the author anticipated this case, but the single-block structure is the defect. The only reason the tested snapshots work is that all real models happen to sort before the Union aliases in the current test cases. With 'from __future__ import annotations' active (which this generated file uses), Pydantic defers annotation evaluation and depends on model_rebuild() being called to resolve forward references in complex schemas. Any schema where a Union alias sorts before a real model relying on forward reference resolution will silently produce broken Pydantic models.
Step-by-step proof for the latent ordering failure
Suppose a schema produces 'AValueUnion = Union[AVariant1, AVariant2]' and 'class BModel(BaseModel): field: SomeForwardRef'. In the single try/except block (alphabetical order): AValueUnion.model_rebuild() raises AttributeError, the except catches it and exits the block, BModel.model_rebuild() never runs, and SomeForwardRef remains an unresolved string annotation in BModel.
How to fix it
Option 1 (simplest): wrap each call in its own try/except so that a failure on a Union alias does not abort subsequent real model rebuilds. Option 2: filter the type name list at codegen time to exclude Union type aliases, only emitting model_rebuild() calls for actual BaseModel/RootModel subclasses.
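Option 1 could look roughly like the sketch below on the codegen side. `render_model_rebuild_section` is a hypothetical function written for illustration; the actual emission code in python.rs is not shown in this thread and may be structured differently.

```rust
/// Hypothetical sketch: emit one try/except block per type name, so an
/// AttributeError on a Union alias cannot skip later model_rebuild() calls.
fn render_model_rebuild_section(type_names: &[&str]) -> String {
    let mut sorted: Vec<&str> = type_names.to_vec();
    sorted.sort(); // same alphabetical ordering the review describes

    let mut out = String::from("# Rebuild models to resolve forward references\n");
    for name in sorted {
        out.push_str("try:\n");
        out.push_str(&format!("    {name}.model_rebuild()\n"));
        out.push_str("except AttributeError:\n");
        out.push_str("    pass  # e.g. a Union alias, which has no model_rebuild\n");
    }
    out
}

fn main() {
    // The alias sorts first, but BModel still gets its own block.
    let section = render_model_rebuild_section(&["BModel", "AValueUnion"]);
    print!("{section}");
}
```

Option 2 (filtering Union aliases out before emission) produces shorter output but needs the generator to know which rendered names are aliases.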
    pub struct SemanticEnum {
        pub id: SymbolId,
        pub name: String,
        pub serde_name: String,
        pub description: String,

        /// Resolved generic parameters
        pub parameters: Vec<SemanticTypeParameter>,

        /// Variants ordered deterministically
        pub variants: BTreeMap<SymbolId, SemanticVariant>,

        /// Serde representation strategy
        pub representation: crate::Representation,

        /// Language-specific configuration
        pub codegen_config: crate::LanguageSpecificTypeCodegenConfig,
    }
🔴 SemanticEnum.variants is BTreeMap<SymbolId, SemanticVariant> (semantic.rs:103), which sorts variants alphabetically by name rather than by declaration order. For #[serde(untagged)] enums, serde tries variants in declaration order and picks the first successful deserialization — any downstream codegen backend iterating SemanticEnum.variants will silently use the wrong order, causing incorrect deserialization when an alphabetically-earlier variant can absorb input intended for a later one. Fix by using IndexMap<SymbolId, SemanticVariant> or Vec to preserve insertion order.
Extended reasoning...
What the bug is and how it manifests
SemanticEnum.variants is typed as BTreeMap<SymbolId, SemanticVariant> (semantic.rs line 103). SymbolId derives Ord by field order: kind, then path (a Vec of path segments), then disambiguator. All variants within the same enum share kind=Variant, and their path ends with the variant name, so the BTreeMap sorts them alphabetically by variant name, not by the order they appear in the source.
The specific code path that triggers it
In normalize.rs, build_semantic_enum (around line 1134) iterates enm.variants() — which returns variants in their raw declaration order (preserved in Vec) — and inserts each into a BTreeMap<SymbolId, SemanticVariant> keyed by SymbolId. The BTreeMap then re-sorts by SymbolId::Ord, discarding the position metadata. The Normalizer::build_semantic_enum code is:
for variant in enm.variants() {
let semantic_variant = self.build_semantic_variant(variant)?;
variants.insert(variant.id.clone(), semantic_variant); // BTreeMap re-sorts
}
Why existing code does not prevent it
The raw Enum.variants field is Vec, which preserves declaration order. That order is available at the point build_semantic_enum iterates enm.variants(). However, the result is inserted into a BTreeMap which re-sorts by SymbolId. There is no assertion, test, or fallback that checks whether BTreeMap ordering matches declaration order.
What the impact would be
For #[serde(untagged)] enums, serde's contract is: try variants in declaration order, use the first that deserializes successfully. A codegen backend that iterates SemanticEnum.variants (the natural, intended API) will silently produce a client that applies variants in alphabetical order instead. This leads to incorrect deserialization for any untagged enum where two variants can both deserialize a given input — the wrong variant is selected with no error.
Example: an enum declared as [Integer(i64), Float(f64)] — both variants can deserialize the JSON value 42. Serde (declaration order) picks Integer. A backend using SemanticEnum.variants iteration (alphabetical) tries Float first and picks Float. The generated client silently deserializes a different type than Rust would.
The new test case added in this PR includes test_flatten_untagged_enum_field with enum Value { Num { value: f64 }, Text { text: String } }. Alphabetically Num < Text, which happens to match declaration order here. But for any enum where declaration order differs from alphabetical order, the bug manifests.
How to fix it
Replace BTreeMap<SymbolId, SemanticVariant> with an insertion-order-preserving collection:
- IndexMap<SymbolId, SemanticVariant> from the indexmap crate: preserves insertion order, provides O(1) keyed lookup
- Vec<SemanticVariant>: simplest, no key-based lookup without an auxiliary index
The same fix is needed for SemanticVariant.fields and SemanticStruct.fields for correctness with positional (unnamed) fields.
Step-by-step proof
- Define an untagged enum: variants declared as [Integer(i64), Float(f64)].
- Normalizer::build_semantic_enum inserts Integer (SymbolId path=["MyEnum","Integer"]) then Float (path=["MyEnum","Float"]) into BTreeMap.
- BTreeMap sorts by path lexicographically: "Float" < "Integer", so Float entry comes first in iteration.
- A codegen backend calls semantic_enum.variants.values() and emits: try Float, then try Integer.
- For JSON input 42: both would match — Float wins because it was tried first. Rust's serde would have picked Integer (declaration order). The generated client deserializes a different type silently.
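The reordering in the proof above can be reproduced with std alone. The `SymbolId` below is a simplified, hypothetical stand-in that keeps only the `path` field (kind and disambiguator are omitted because they are identical across one enum's variants), and the helper functions are written only for this demonstration.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified SymbolId: only the path participates in ordering.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct SymbolId {
    path: Vec<String>,
}

fn variant_id(enum_name: &str, variant: &str) -> SymbolId {
    SymbolId { path: vec![enum_name.to_string(), variant.to_string()] }
}

/// Variant names as a BTreeMap keyed by SymbolId would iterate them.
fn btree_order(enum_name: &str, declared: &[&str]) -> Vec<String> {
    let mut map = BTreeMap::new();
    for variant in declared {
        map.insert(variant_id(enum_name, variant), variant.to_string());
    }
    map.into_values().collect()
}

/// Variant names as a Vec of (id, variant) pairs would iterate them.
fn vec_order(enum_name: &str, declared: &[&str]) -> Vec<String> {
    let mut list: Vec<(SymbolId, String)> = Vec::new();
    for variant in declared {
        list.push((variant_id(enum_name, variant), variant.to_string()));
    }
    list.into_iter().map(|(_, name)| name).collect()
}

fn main() {
    let declared = ["Integer", "Float"];
    // The BTreeMap re-sorts by path: "Float" < "Integer".
    assert_eq!(btree_order("MyEnum", &declared), ["Float", "Integer"]);
    // The Vec keeps declaration order, which is what untagged enums need.
    assert_eq!(vec_order("MyEnum", &declared), ["Integer", "Float"]);
}
```

An insertion-ordered map would behave like the `Vec` here while still allowing lookup by `SymbolId`.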
- Python codegen: set has_externally_tagged_enums flag for Adjacent representation too, fixing missing RootModel/model_validator imports
- generate_unique_name: join ALL non-excluded module components to avoid collisions (ServicesUserProfile vs AuthUserProfile)
- discover_symbols: use function.id.path instead of splitting HTTP URL path, fixing SymbolTable get_by_path for endpoints
@claude review
- Sanitize tag discriminator field name for Python reserved words (e.g., "type" → "type_" with alias). Fixes SyntaxError when tag name is a Python keyword.
- Add model_rebuild() calls for per-variant classes generated by render_struct_with_flattened_internal_enum. Fixes forward reference resolution with `from __future__ import annotations`.
- Guard against empty enum variants producing invalid `Union[]` syntax.
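A minimal sketch of the tag sanitization, assuming a trailing-underscore rename plus a Pydantic `Field(alias=...)`. The function names, the short reserved-word list, and the exact rendered line are illustrative assumptions; only the "type" → "type_" with alias behavior comes from the commit message.

```rust
// Illustrative subset only; a real generator would check the full list of
// Python keywords and any other names it wants to avoid.
const PYTHON_RESERVED: [&str; 6] = ["class", "from", "global", "import", "lambda", "type"];

/// Returns the Python identifier and, if it was renamed, the wire-name alias.
fn sanitize_tag_field(wire_name: &str) -> (String, Option<String>) {
    if PYTHON_RESERVED.contains(&wire_name) {
        (format!("{wire_name}_"), Some(wire_name.to_string()))
    } else {
        (wire_name.to_string(), None)
    }
}

/// Renders one Pydantic field line (hypothetical output format).
fn render_field(wire_name: &str, annotation: &str) -> String {
    match sanitize_tag_field(wire_name) {
        (ident, Some(alias)) => format!("{ident}: {annotation} = Field(alias=\"{alias}\")"),
        (ident, None) => format!("{ident}: {annotation}"),
    }
}

fn main() {
    // A tag named "type" is renamed and aliased back to its wire name.
    println!("{}", render_field("type", "Literal[\"Single\"]"));
    // An ordinary name passes through unchanged.
    println!("{}", render_field("id", "str"));
}
```

The alias keeps the JSON key unchanged on the wire while the Python attribute gets a name that is safe to use.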
ids.rs (3 tests):
- Zero-padded tuple field ordering (arg00..arg11 sort correctly)
- Disambiguated ID propagates to member IDs
- Schema root ID does not collide with same-named type

normalize.rs (4 tests):
- TypeConsolidation preserves all types with qualified name uniqueness
- resolve_types does not confuse variant with type of same name
- generate_unique_name distinguishes same-inner-module paths
- Function symbol path matches ID for get_by_path lookups
Merges the askama dependency removal from PR #122. Template structs now use manual render() methods returning String instead of askama::Template derive + fallible render. Conflict resolution: kept both the TestingModule render() impl from #122 and the #[derive(Clone)] on Field from our branch. Fixed render()? -> render() in render_struct_with_flattened_internal_enum.
Run Normalizer::normalize() at the start of Python codegen's generate() function, making the SemanticSchema available alongside the raw Schema.

- Add convenience methods to SemanticSchema: get_type_by_name(), get_type(), types(), functions(), type_names()
- The SemanticSchema is constructed once and available for render functions that benefit from type-safe SymbolId lookups
- Raw Schema is still used for the main iteration loop since the Normalizer's NamingResolutionStage transforms type names, and the existing codegen relies on pre-normalization names
- Graceful fallback if normalization fails (best-effort)

This is the first consumer of SemanticSchema in the codegen path, validating the IR infrastructure from #96.
- Replace broken fallback (would panic on same error) with .ok() that makes normalization best-effort
- Use _semantic prefix for intentionally-unused binding
- get_type_by_name: use symbol table O(log n) lookup with linear scan fallback, instead of always O(n)
- type_names: return iterator instead of allocating Vec<String>
- Remove stale dead code reference to `semantic` variable
Python codegen fixes:
- Underscore-prefixed fields no longer treated as Pydantic private attributes. sanitize_field_name strips leading underscores and generates Field(alias="_original") for wire compatibility.
- exclude_none=True removed from enum serializers; it was dropping intentional None values. Plain model_dump() matches serde behavior.
- Factory method parameters now include type annotations (e.g., `def circle(radius: float)` instead of `def circle(radius)`).
- sanitize_field_name_with_alias now takes serde_name for proper alias generation on renamed fields.

Normalizer refactor:
- normalize() takes &Schema instead of Schema by value, eliminating the clone at the call site (clones internally for pipeline mutation)
- build_semantic_ir receives pre-pipeline original_names map
- SemanticPrimitive/Struct/Enum gain original_name field preserving pre-normalization qualified names
- SemanticSchema::get_type_by_name falls back to original_name search

~88 snapshots updated with type-annotated factory params, wire-name aliases on renamed fields, and model_dump() without exclude_none.
Port the TypeScript/Rust namespace algorithm to Python codegen.
Type definitions remain at module top-level with flat PascalCase names
for Pydantic forward-reference resolution. Namespace alias classes
provide dotted access paths mirroring the Rust module hierarchy:
    class reflectapi_demo:
        class tests:
            class serde:
                Offer = ReflectapiDemoTestsSerdeOffer
                OfferKind = ReflectapiDemoTestsSerdeOfferKind
Users access types as: reflectapi_demo.tests.serde.Offer
Type references in annotations, client methods, model_rebuild calls,
and factory classes all use dotted paths. This matches the approach
used by TypeScript (export namespace) and Rust (pub mod) backends.
Implementation:
- New Module struct + modules_from_rendered_types (ported from TS)
- type_name_to_python_ref converts :: paths to dotted notation
- Client signatures use dotted type references
- Factory/testing utilities use namespaced names
- Removed old generate_nested_class_structure dead code
125 snapshot files updated.
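The two name forms used by this scheme (a flat PascalCase class name and a dotted namespace path) can be sketched as follows. `type_name_to_python_ref` is named in the commit message, but its body here is an assumption that ignores generic arguments; `flat_class_name`, `pascal`, and `render_alias` are hypothetical helpers.

```rust
/// Assumed behavior: turn a Rust path into a dotted Python reference.
/// Generic arguments are not handled in this sketch.
fn type_name_to_python_ref(rust_path: &str) -> String {
    rust_path.replace("::", ".")
}

/// Converts one snake_case segment to PascalCase; other text is kept as-is.
fn pascal(segment: &str) -> String {
    segment
        .split('_')
        .filter(|word| !word.is_empty())
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}

/// Hypothetical helper: the flat top-level class name for a Rust path.
fn flat_class_name(rust_path: &str) -> String {
    rust_path.split("::").map(pascal).collect()
}

/// Hypothetical helper: the alias line placed in the innermost namespace class.
fn render_alias(rust_path: &str) -> String {
    let short = rust_path.rsplit("::").next().unwrap_or(rust_path);
    format!("{short} = {}", flat_class_name(rust_path))
}

fn main() {
    let path = "reflectapi_demo::tests::serde::Offer";
    assert_eq!(type_name_to_python_ref(path), "reflectapi_demo.tests.serde.Offer");
    assert_eq!(flat_class_name(path), "ReflectapiDemoTestsSerdeOffer");
    assert_eq!(render_alias(path), "Offer = ReflectapiDemoTestsSerdeOffer");
}
```

Keeping the real class at module top level and exposing it through an alias is what lets Pydantic resolve forward references by the flat name while users write the dotted one.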
- extract_defined_names now only matches top-level definitions (no leading whitespace), preventing enum member values like NOT_FOUND from leaking into namespace alias classes
- Filter out SCREAMING_SNAKE_CASE constants (enum members)
- Filter out *Variants internal union type aliases from namespace (implementation details, not part of the public API surface)
…lones
- Remove dead _semantic normalizer call (constructed but never used)
- Filter TypeVar declarations (T, U) from extract_defined_names
- Move instead of clone rendered_original_names_in_order
- Collect rendered_type_keys before moving rendered_types
- Delete dead Imports::render() method (~95 lines)
- Delete always-false has_flatten_support field
- Inline trivial to_valid_python_identifier wrapper
Coverage for previously untested code paths across 6 categories:

- Namespace edge cases (3): single-segment types, deeply nested modules, numeric/special character field names
- Flatten edge cases (5): nested flatten depth > 1, optional internally-tagged enum flatten, multiple flattened structs, combined struct + enum flatten, unit-variant-only enum flatten
- Enum representation edge cases (4): generic externally-tagged enum, generic adjacently-tagged enum, mixed variant types (unit + struct), serde rename on variants
- Type reference edge cases (4): Box<T> unwrapping, nested generic containers (Vec<Vec<u32>>), self-referential struct, Option<Option<T>>
- Field sanitization edge cases (3): all Python keywords as field names, special characters in serde renames, multiple underscore prefixes
- Factory/client edge cases (2): 12-variant enum at scale, empty enum

105 new snapshot files (21 tests x 5 snapshots each).
Real-world validation against Partly's core-server (284 endpoints, 78K-line schema) revealed two codegen bugs:

1. Descriptions containing backslashes (e.g., "object\'s") break Python docstrings because \ is an escape character inside a string literal (and a line continuation before a newline). Added sanitize_for_docstring() that escapes \ and """ in all 13 template render methods that emit docstrings.
2. Factory method names derived from enum variant names (e.g., "global", "from") can be Python keywords, producing SyntaxError. Applied safe_python_identifier() to all factory method name and parameter name generation sites.

The generated 57K-line Python client for core-server now parses as valid Python (verified with py_compile).
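The docstring escaping in item 1 might look like the sketch below. Only the function name and the two escaped sequences (backslash and triple quote) come from the commit message; the body and the replacement strings are assumptions.

```rust
/// Assumed behavior of sanitize_for_docstring: escape backslashes first,
/// then break up any triple quote so it cannot close the docstring early.
fn sanitize_for_docstring(text: &str) -> String {
    text.replace('\\', "\\\\") // \ becomes \\
        .replace("\"\"\"", "\\\"\\\"\\\"") // """ becomes \"\"\"
}

fn main() {
    let description = "the object\\'s \"\"\"raw\"\"\" value";
    let safe = sanitize_for_docstring(description);
    // Emit the text wrapped as a Python docstring.
    println!("\"\"\"{safe}\"\"\"");
}
```

Escaping backslashes before quotes matters: doing it the other way round would double the backslashes that the quote escaping just inserted.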
Fixes NameError when importing the generated client: model_rebuild() was called inline (in render_struct_with_flattened_internal_enum) before namespace alias classes were defined, so dotted type references like `business_rules.Response` could not be resolved. Moved all model_rebuild() calls to the global rebuild section which runs after namespace classes are defined.

Also sanitized all remaining docstring emission points (13 locations) to escape backslashes and triple-quotes in description text. Fixed factory method names and parameters using Python keywords (from, global) via safe_python_identifier().

Validated against Partly's core-server (284 endpoints, 78K-line schema):
- 57K-line Python client generates as valid Python
- Imports in 0.65s
- Successfully authenticates against live API (dev13)
mdbook 0.5.x changed the preprocessor JSON protocol, breaking mdbook-keeper compatibility. Pin both tools to compatible versions:
- mdbook ~0.4 (0.4.x series)
- mdbook-keeper ~0.5

This fixes doc builds that have been failing on main since Jan 2026. Applied to docs.yml, docs-preview.yml workflows.

Also: add __pycache__/*.pyc to .gitignore, remove accidentally committed pycache files.
📖 Documentation Preview: https://reflectapi-docs-preview-pr-124.partly.workers.dev
Updated automatically from commit 1dd3496
- TypeScript no longer uses askama (removed in #122), uses std::fmt::Write
- Python is no longer experimental; validated against production API
- Python section updated to document namespace classes, alias handling, docstring escaping, factory type annotations
- Python flatten example updated to show actual type_ alias pattern
- Limitations section references #127 for remaining DX improvements
Field descriptions:
- Schema field descriptions now emitted as Field(description="...") in generated Pydantic models. Descriptions appear in IDE hover, model_json_schema(), and help() output.
- Added sanitize_for_string_literal() to escape newlines, quotes, and backslashes in description strings.
- Flattened-field internal descriptions (prefixed "(flattened") are filtered out as they're implementation details.

Typed error returns:
- Client methods now return ApiResponse[OutputType, ErrorType] instead of ApiResponse[Any], making the error type visible in the signature and IDE autocompletion.
- ApiResponse runtime class updated to Generic[T, E] (backward compatible: ApiResponse[T] still works).
- Docstring return section shows both success and error types.

Validated against Partly's core-server (284 endpoints):
- All field descriptions preserved including multi-line ones
- All error types visible in method signatures
- Generated 57K-line client passes py_compile
Factory classes (371 in core-server output) consumed ~13K lines (19%
of file) and provided no value over direct type construction:
# Before (factory):
myapi_proto_PetsCreateErrorFactory.conflict()
# After (direct, already works):
myapi.proto.PetsCreateError("Conflict")
Removed:
- FactoryInfo struct and 5 factory generation functions
- generate_factory_method_params/args helpers
- render_*_without_factory naming (renamed to render_*/render_enum)
- sanitize_field_name (only used by factory code)
- HybridEnumClass and FactoryMethod template structs
- Default generate_testing changed to false
697 lines removed from python.rs (6595 -> 5898).
Core-server output: 58K lines (down from 68K, -13%).
Validated: imports, constructs types, authenticates against live API.
Runtime fixes:
- Use Pydantic TypeAdapter for all response validation, replacing the manual isinstance/model_validate chain. This correctly handles generic types like list[Model], dict[str, Model], Union types, and plain BaseModel subclasses.
- Use TypeAdapter.validate_json(bytes) for Pydantic's Rust-based fast JSON parser when raw bytes are available, falling back to validate_python(dict) otherwise.
- Add error_model parameter to _make_request and _handle_error_response. When an API returns an error, the runtime attempts to deserialize the error body into the typed error model. Accessible via ApplicationError.typed_error.

Codegen:
- Generated _make_request calls now pass error_model= with the typed error type from the schema.

Validated against Partly's core-server:
- list[BillingCurrencyListItem] returns typed Pydantic models (was returning raw dicts)
- CustomerGetError.typed_error = CustomerNotFoundVariant(customer_id=...) (was raw dict string)
- 170 currencies validated via fast validate_json path
The Python codegen now uses SemanticSchema as the primary driver for type iteration, import detection, and function ordering:
- Type iteration uses semantic.types() (deterministic BTreeMap order) instead of manual topological_sort_types (removed: 118 lines)
- Import detection (has_enums, has_literal, etc.) uses SemanticType pattern matching instead of raw schema.get_type() lookups
- Function iteration uses semantic.functions() for ordering
- Deprecation detection uses semantic function metadata
The raw Schema is kept for rendering (render functions need concrete Struct/Enum/Field types). Lookups use original_name (pre-normalization qualified name like "analytics::AnalyticsEventInsertData") to find types in the consolidated raw schema.
Fixed original_names capture in Normalizer: builds short→qualified name mapping from pre-normalization type names, keyed by the post-normalization short name that NamingResolutionStage produces.
Validated: 220 tests pass, core-server (284 endpoints) generates valid 47K-line Python client, live API authentication works.
The Python codegen now uses SemanticSchema as the single source of truth for type iteration, with the raw Schema providing concrete type data for rendering.
Architecture:
- NormalizationPipeline::for_codegen() runs only CircularDependency detection (no TypeConsolidation, no NamingResolution)
- schema.consolidate_types() runs first, then Normalizer builds SemanticSchema from the consolidated schema
- Since NamingResolution is skipped, SemanticType.name() matches the raw Schema's names exactly, so there is no name-domain mismatch
- Removed all original_name bridging logic
TypeVar collision fix:
- Detects when TypeVar names (e.g., Identity) collide with class names and renames them with _T_ prefix (_T_Identity)
- rename_type_params_in_schema() propagates renames through all type parameter declarations and type references
Validated: 220 tests pass, core-server (284 endpoints, 59K lines) generates valid Python, live API authentication works.
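The TypeVar collision fix can be illustrated with a short Python sketch. The real rename_type_params_in_schema() is Rust and operates on the schema types; the dict shapes and function names below are assumptions made for the example. The sketch also assumes that inside a generic type, a reference matching one of its parameter names refers to the parameter rather than to the class of the same name.

```python
def rename_ref(ref: dict, renames: dict[str, str]) -> dict:
    """Apply renames recursively to a reference shaped {"name", "arguments"}."""
    return {
        "name": renames.get(ref["name"], ref["name"]),
        "arguments": [rename_ref(arg, renames) for arg in ref["arguments"]],
    }


def rename_type_params(types: list[dict]) -> list[dict]:
    """Prefix any type parameter that collides with a class name with _T_."""
    class_names = {t["name"] for t in types}
    result = []
    for t in types:
        # Only parameters declared by this type are renamed within its scope.
        renames = {p: f"_T_{p}" for p in t["parameters"] if p in class_names}
        result.append({
            "name": t["name"],
            "parameters": [renames.get(p, p) for p in t["parameters"]],
            "fields": {k: rename_ref(v, renames) for k, v in t["fields"].items()},
        })
    return result


types = [
    # A concrete class named Identity...
    {"name": "Identity", "parameters": [],
     "fields": {"id": {"name": "str", "arguments": []}}},
    # ...and a generic type whose parameter is also called Identity.
    {"name": "Wrapper", "parameters": ["Identity"],
     "fields": {"value": {"name": "list",
                          "arguments": [{"name": "Identity", "arguments": []}]}}},
]
renamed = rename_type_params(types)
```

After the pass, Wrapper declares _T_Identity and its field refers to list[_T_Identity], while the Identity class itself is left untouched.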
Replace the fixed standard()/for_codegen() pipeline variants with a
declarative PipelineBuilder that lets backends configure each stage:
PipelineBuilder::new()
.consolidation(Consolidation::Skip) // or Standard (default)
.naming(Naming::Skip) // or Standard, or Custom(stage)
.circular_dependency_strategy(...) // default: Intelligent
.add_stage(custom_stage) // append backend-specific stages
.build()
Three configuration dimensions:
- Consolidation: Standard (run TypeConsolidationStage) | Skip
- Naming: Standard (NamingResolution) | Skip | Custom(Box<dyn Stage>)
- ResolutionStrategy: passed to CircularDependencyResolutionStage
Convenience methods standard() and for_codegen() delegate to the
builder internally and remain as shorthand. Python codegen uses
PipelineBuilder directly with Skip/Skip.
Architecture doc updated with PipelineBuilder diagram and config docs.
- Remove stale architecture doc claims (field descriptions and error types are now implemented, not "remaining gaps")
- Remove dead code in render_struct: unreachable flattened-fields collection loop (flattened structs take the early return path)
- Always import Field: it is used for descriptions, aliases, discriminators across many contexts. Fixes NameError for schemas with aliased fields but no discriminated unions.
- Remove dead try/except around response_model identity check in runtime client (both sync and async).
Summary
Adds semantic IR infrastructure (#96), fixes Python codegen for flattened tagged-union fields (#123), and makes Python a first-class codegen backend with namespace classes, typed errors, field descriptions, and SemanticSchema-driven code generation.
Semantic IR Infrastructure (reflectapi-schema)
- symbol.rs: SymbolId/SymbolKind, stable unique identifiers for all schema symbols
- ids.rs: ensure_symbol_ids(), canonical ID assignment with cross-typespace disambiguation
- semantic.rs: SemanticSchema, SemanticType, SymbolTable, ResolvedTypeReference
- normalize.rs: NormalizationPipeline + Normalizer (&Schema → SemanticSchema)
Schema type changes: id: SymbolId on all types, Normalizer::normalize(&Schema), original_name on SemanticType preserving pre-normalization qualified names.
Python Codegen: Now Driven by SemanticSchema
The Python codegen uses SemanticSchema as the primary driver:
- Type iteration: semantic.types() (deterministic BTreeMap order, replaces manual topological sort)
- Import detection: SemanticType pattern matching
- Function ordering: semantic.functions()
- Rendering: raw Schema kept for rendering (concrete field/variant data)
Python Codegen: Wire-Compatible Flatten (#123)
For #[serde(flatten)] on internally-tagged enums, generates per-variant models merging parent fields + tag + variant fields into a discriminated union RootModel.
Python Codegen: First-Class DX
- Namespace classes (auth.UsersSignInRequest)
- Field(description="...")
- ApiResponse[OutputType, ErrorType] in method signatures
- ApplicationError.typed_error as Pydantic model
- list[Model] via TypeAdapter (was returning raw dicts)
- validate_json(bytes)
Python Runtime Fixes
- TypeAdapter for all response validation (handles list[Model], generics, unions)
- error_model parameter on _make_request for typed error deserialization
- ApiResponse[T, E] generic with both success and error type parameters
- validate_json(bytes) fast path for Pydantic's Rust-based parser
Other Changes
- Architecture doc (docs/architecture.md)
Real-World Validation
Partly's core-server (284 endpoints, 78K-line schema):
- list[BillingCurrencyListItem] returns Pydantic models
- ApplicationError.typed_error = CustomerGetErrorCustomerNotFoundVariant
Test Coverage