Skip to content

feat: add model serialization with input types and constraint types#103

Merged
Kamilbenkirane merged 2 commits intomainfrom
feat/model-serialization-input-types
Dec 26, 2025
Merged

feat: add model serialization with input types and constraint types#103
Kamilbenkirane merged 2 commits intomainfrom
feat/model-serialization-input-types

Conversation

@Kamilbenkirane
Copy link
Copy Markdown
Member

Summary

  • Add type computed_field to Constraint for JSON serialization
  • Add InputType enum (text, image, video, audio) to core
  • Add supported_input_types computed field to Model (per-capability)
  • Add optional_input_types computed field to Model (from constraints)
  • Add get_required_input_types() and get_constraint_input_type() helpers
  • Use SerializeAsAny[Constraint] for proper polymorphic serialization

This enables the API to return complete model metadata including:

  • What input types each capability requires (e.g., text-generation: text)
  • What optional inputs are accepted via parameters (e.g., reference_images)
  • Constraint type identifiers for frontend rendering

Test plan

  • All existing tests pass
  • CI passes (lint, format, typecheck, security, coverage)

- Add `type` computed_field to Constraint for JSON serialization
- Add `InputType` enum (text, image, video, audio) to core
- Add `supported_input_types` computed field to Model (per-capability)
- Add `optional_input_types` computed field to Model (from constraints)
- Add `get_required_input_types()` and `get_constraint_input_type()` helpers
- Use `SerializeAsAny[Constraint]` for proper polymorphic serialization

This enables the API to return complete model metadata including:
- What input types each capability requires (e.g., text-generation: text)
- What optional inputs are accepted via parameters (e.g., reference_images)
- Constraint type identifiers for frontend rendering
@claude
Copy link
Copy Markdown

claude Bot commented Dec 26, 2025

Pull Request Review: Model Serialization with Input Types and Constraint Types

Summary

This PR adds computed fields to Model and Constraint classes to enable proper JSON serialization with metadata about input types and constraint types. The implementation is well-designed and follows good software engineering practices.


Code Quality and Best Practices

Strengths

  1. Well-designed API: The use of computed fields for supported_input_types and optional_input_types keeps the Model class clean and the data is derived automatically.

  2. Good separation of concerns: Helper functions (get_required_input_types, get_constraint_input_type) are properly isolated in io.py.

  3. Type safety: Excellent use of type hints throughout, including proper handling of generics and unions.

  4. Clean introspection: The get_constraint_input_type function properly uses inspect.get_annotations with eval_str=True to handle stringified annotations.

  5. Proper use of SerializeAsAny: Using SerializeAsAny[Constraint] for polymorphic serialization is the correct approach for Pydantic v2.

Areas for Improvement

1. Missing Tests (Priority: HIGH)

The PR adds significant new functionality but I don't see corresponding test coverage. Missing test cases include:

  • Test Model.supported_input_types computed field
  • Test Model.optional_input_types computed field
  • Test Constraint.type computed field for all constraint types
  • Test get_required_input_types() with different capabilities
  • Test get_constraint_input_type() with various constraint types
  • Test _extract_input_type() with unions, lists, and nested generics
  • Test serialization of Model to JSON includes new fields

Recommendation: Add a new test file tests/unit_tests/test_io.py or expand test_models.py with tests for the new computed fields and helper functions.

2. Potential Edge Cases in _extract_input_type (Priority: MEDIUM)

In io.py:95-116, the condition 'if origin is types.UnionType or origin is not None:' means ANY generic type (not just unions) will trigger recursion into args. While this works, it could lead to unexpected behavior with complex nested types.

Recommendation: Be more explicit about which types should be recursively examined.

3. Documentation (Priority: LOW)

The computed fields have docstrings, but they're quite brief. Consider adding examples showing what the return values look like.

4. Error Handling (Priority: LOW)

In get_required_input_types (io.py:75-92), if get_input_class raises a KeyError, the error will propagate. Consider whether this is the desired behavior or if it should return an empty set for unregistered capabilities.


Potential Bugs or Issues

Issue 1: KeyError for Unregistered Capabilities

Location: io.py:87, models.py:31

If a Model has a capability that hasn't had an Input class registered yet (e.g., during initialization), calling model.supported_input_types will raise a KeyError.

Scenario: A model with Capability.SEARCH (if no Input registered for SEARCH) will raise KeyError when accessing supported_input_types.

Recommendation: Add error handling in supported_input_types computed field to gracefully skip capabilities without registered Input classes, or document this requirement clearly.

Issue 2: Return Type Inconsistency

Location: io.py:119-136

The function get_constraint_input_type returns InputType | None. The function could return early on first match for better readability rather than continuing to iterate through all annotations.


Performance Considerations

Good Choices

  1. Computed fields are cached by Pydantic, so repeated access to supported_input_types won't trigger re-computation.

  2. Introspection happens at access time, not initialization, which is good for startup performance.

Minor Optimization

The optional_input_types property (models.py:35-42) iterates through all constraints. For models with many constraints, this is acceptable but worth noting.


Security Concerns

No Major Security Issues

  1. The use of inspect.get_annotations(eval_str=True) is safe here since it's only evaluating type annotations from code within the package, not user input.

  2. No external data is being processed without validation.

  3. All new code follows the existing security patterns in the codebase.


Test Coverage

Critical Gap: Missing Test Coverage

Based on my review of the test files, there are NO tests for the new functionality.

Required tests include:

  1. Constraint type serialization (test that each constraint type serializes with correct type field)
  2. Model input types (test supported_input_types and optional_input_types)
  3. Helper functions (test get_required_input_types, get_constraint_input_type, _extract_input_type)
  4. Edge cases (empty capabilities, empty parameter_constraints)
  5. Integration (test Model.model_dump() includes new computed fields)

Overall Assessment

This is a well-designed feature that follows good architectural principles. The code quality is high, and the implementation is clean. However, the PR should not be merged without addressing the test coverage gap.

Recommendations Before Merge:

  1. MUST: Add comprehensive unit tests for all new functionality
  2. SHOULD: Add defensive handling for unregistered capabilities in supported_input_types
  3. SHOULD: Refine the type checking logic in _extract_input_type to be more explicit
  4. NICE TO HAVE: Expand docstrings with examples

Score: 7.5/10

  • Code Quality: 9/10
  • Test Coverage: 2/10 (Critical gap)
  • Documentation: 7/10
  • Architecture: 9/10

Great work on the implementation! Once tests are added, this will be a solid contribution.

@Kamilbenkirane Kamilbenkirane merged commit 69ac0fe into main Dec 26, 2025
11 checks passed
@claude
Copy link
Copy Markdown

claude Bot commented Dec 26, 2025

Code Review - PR #103: Add Model Serialization with Input Types and Constraint Types

Summary

This PR adds computed fields to enable complete model metadata serialization, including input type requirements and constraint type identifiers. The implementation is well-structured and follows good architectural patterns.


Strengths

  1. Clean API Design: The computed fields (supported_input_types, optional_input_types) provide a clear, self-documenting API for model capabilities
  2. Proper Separation of Concerns: Input type introspection logic is centralized in io.py rather than scattered across multiple files
  3. Type Safety: Good use of type hints throughout, including proper handling of generic types and unions
  4. Centralized Mapping: INPUT_TYPE_MAPPING provides a single source of truth for type-to-InputType mappings
  5. Polymorphic Serialization: Correct use of SerializeAsAny[Constraint] to ensure constraint subclasses serialize properly

@claude
Copy link
Copy Markdown

claude Bot commented Dec 26, 2025

🔍 Code Quality Issues

1. Missing Test Coverage ⚠️ HIGH PRIORITY

The PR adds significant new functionality but includes no tests for the new features:

Missing test coverage for:

  • get_required_input_types() - needs tests with various Input class configurations
  • get_constraint_input_type() - needs tests for all constraint types (ImageConstraint, ImagesConstraint, etc.)
  • _extract_input_type() - needs tests for union types, generics, nested types
  • Model.supported_input_types computed field - needs serialization tests
  • Model.optional_input_types computed field - needs tests with various constraint combinations
  • Constraint.type computed field - needs tests to verify it returns correct class names

Recommended tests (examples):

# tests/unit_tests/test_io.py (new file)
def test_get_required_input_types_text_generation():
    result = get_required_input_types(Capability.TEXT_GENERATION)
    assert result == {InputType.TEXT}

def test_get_constraint_input_type_image_constraint():
    constraint = ImageConstraint()
    result = get_constraint_input_type(constraint)
    assert result == InputType.IMAGE

@claude
Copy link
Copy Markdown

claude Bot commented Dec 26, 2025

2. Potential Runtime Error in get_required_input_types() ⚠️ MEDIUM PRIORITY

Location: src/celeste/io.py:88-92

Issue: The function assumes all field annotations exist directly in INPUT_TYPE_MAPPING, but doesn't handle fields with Optional[...] or union types.

return {
    INPUT_TYPE_MAPPING[field.annotation]  # ❌ KeyError if annotation is Optional[str] or str | None
    for field in input_class.model_fields.values()
    if field.annotation in INPUT_TYPE_MAPPING
}

Example failure case:

class MyInput(Input):
    prompt: str | None  # This won't match INPUT_TYPE_MAPPING[str | None]

Recommended fix:

def get_required_input_types(capability: Capability) -> set[InputType]:
    input_class = get_input_class(capability)
    input_types: set[InputType] = set()
    
    for field in input_class.model_fields.values():
        input_type = _extract_input_type(field.annotation)
        if input_type is not None:
            input_types.add(input_type)
    
    return input_types

This leverages the existing _extract_input_type() helper which already handles unions and generics properly.

@claude
Copy link
Copy Markdown

claude Bot commented Dec 26, 2025

3. Incomplete Type Handling in _extract_input_type() ⚠️ LOW PRIORITY

Location: src/celeste/io.py:110

Issue: The condition origin is types.UnionType or origin is not None is overly broad and may process types unexpectedly.

if origin is types.UnionType or origin is not None:  # ❌ Catches ALL generics
    for arg in get_args(param_type):
        result = _extract_input_type(arg)
        if result is not None:
            return result

Edge cases:

  • dict[str, ImageArtifact] → Would incorrectly return InputType.IMAGE
  • tuple[str, ImageArtifact] → Would incorrectly return InputType.TEXT

Recommended fix:

from typing import Union
import typing

# Handle Union types (including Optional)
if origin is types.UnionType or origin is Union:
    for arg in get_args(param_type):
        result = _extract_input_type(arg)
        if result is not None:
            return result

# Handle list/sequence types
elif origin in (list, typing.Sequence):
    args = get_args(param_type)
    if args:
        return _extract_input_type(args[0])

@claude
Copy link
Copy Markdown

claude Bot commented Dec 26, 2025

🔒 Security Considerations

No security issues identified. The code:

  • Uses type introspection safely without eval() or dynamic code execution
  • Properly uses eval_str=True in inspect.get_annotations() for forward references (Python 3.10+)
  • Doesn't expose internal implementation details through serialization

Performance Considerations

Computed Fields May Be Expensive ℹ️ INFORMATIONAL

Location: src/celeste/models.py:27-42

The computed fields call get_required_input_types() for each capability on every serialization:

@computed_field
@property
def supported_input_types(self) -> dict[Capability, set[InputType]]:
    return {cap: get_required_input_types(cap) for cap in self.capabilities}  # Not cached

Impact:

  • For a model with 5 capabilities, this performs 5 Input class lookups and field introspections per serialization
  • If list_models() returns 50 models, that's 250+ introspection operations

Recommendation (future optimization):
Consider caching if performance becomes an issue using @lru_cache on get_required_input_types().

Note: This is not blocking for the current PR, but worth considering if serialization becomes a bottleneck.

@claude
Copy link
Copy Markdown

claude Bot commented Dec 26, 2025

📝 Best Practices

Good:

  • ✅ Version bump in pyproject.toml is appropriate for a feature addition
  • ✅ Type ignore comments are justified (# type: ignore[prop-decorator] for computed_field)
  • ✅ Proper use of __all__ exports
  • ✅ Consistent naming conventions

Could Improve:

  • Add INPUT_TYPE_MAPPING to module docstring explaining its purpose
  • Consider making _extract_input_type() public if it's useful for extensions

📋 Recommendations

Before Merging (Required):

  1. Add comprehensive test coverage for all new functions and computed fields
  2. Fix get_required_input_types() to handle union types using _extract_input_type()
  3. Add tests verifying JSON serialization includes the new computed fields

Future Improvements (Optional):

  1. Consider stricter type matching in _extract_input_type() to avoid dict/tuple edge cases
  2. Add performance benchmarks if model serialization becomes a hot path
  3. Document the relationship between supported_input_types (from Input classes) vs optional_input_types (from constraints)

🎯 Verdict

Conditional Approval - The architecture and implementation are solid, but test coverage is critical before merging. The potential KeyError in get_required_input_types() should also be addressed.

Once tests are added and the union type handling is fixed, this will be a valuable addition to the API.

Great work on the overall design! The computed field approach provides a clean, discoverable API for consumers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant