
[Feature]: Replace jsonschema with marshmallow for validation #5

@amannocci

🚀 Feature Summary

Replace the current jsonschema dependency with marshmallow for data validation across the Terranova project.

🤔 Problem Statement

The project currently uses jsonschema (v4.23.0+) for JSON schema validation, specifically in the manifest validation logic. While jsonschema is a solid choice for JSON Schema validation, marshmallow offers:

  1. Better integration with Python dataclasses: The project already uses dataclasses extensively with the @serde decorator for data serialization/deserialization
  2. Simpler validation API: Marshmallow provides a more Pythonic and intuitive interface compared to jsonschema
  3. Built-in field validation: Direct validation methods tied to specific fields rather than generic schema validation
  4. Schema composition: Better support for nested and composed schemas, reducing boilerplate
  5. Error messages: More structured and user-friendly validation error messages
  6. Dependency consolidation: Potential to leverage marshmallow alongside existing serialization patterns

💡 Proposed Solution

  1. Identify all jsonschema usages: Currently used in src/terranova/resources.py for manifest validation (imports on lines 28-29, usage on lines 186-188)

  2. Create marshmallow schemas: Convert existing JSON schemas to marshmallow schema definitions, maintaining the same validation rules

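  3. Update validation logic: Replace jsonschema.validators.validate() calls with marshmallow's Schema.load() (which raises marshmallow.ValidationError) or Schema.validate() (which returns a dict of errors instead of raising)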

  4. Update dependencies:

    • Remove jsonschema>=4.23.0 from pyproject.toml
    • Add marshmallow>=3.20.0 (or latest stable version)
  5. Update imports: Change imports from jsonschema to marshmallow in affected files

  6. Maintain backward compatibility: Ensure validation behavior remains identical, with the same validation rules and the same error handling

  7. Add tests: Extend the test suite so that every validation scenario is covered

🔄 Alternatives Considered

  • Keep jsonschema: Maintain status quo, but misses opportunity for better integration with existing code patterns
  • Use Pydantic: Another popular validation library, but would require more significant refactoring
  • Use attrs: Similar to dataclasses but adds significant complexity without clear benefit

📈 Impact

  • Code simplification: Align validation with dataclass usage patterns (replacing jsonschema with marshmallow swaps one dependency for another rather than reducing the count)
  • Improved maintainability: More consistent with project architecture (already using serde and dataclasses)
  • Better error handling: More detailed and structured validation errors
  • Future-proofing: Marshmallow integrates better with modern Python ecosystem tools

📝 Acceptance Criteria

  • All jsonschema imports removed from codebase
  • Marshmallow schemas created for all validation scenarios
  • Manifest validation works identically to current implementation
  • All existing tests pass without modification
  • New tests added for marshmallow validation
  • Dependencies updated in pyproject.toml
  • Documentation updated if needed
  • No breaking changes to public APIs

📝 Additional Context

Current Usage:

  • File: src/terranova/resources.py
  • Method: ResourcesManifest.load()
  • Validation: JSON schema validation of manifest data
  • Exception Handling: ValidationError from jsonschema.exceptions

Implementation Notes:

  • The project uses pyserde for serialization/deserialization
  • Dataclasses are decorated with @serde
  • Error handling should preserve exception types or map them appropriately
  • Consider potential performance implications of the migration

Related: This change should be coordinated with the existing serialization strategy using the @serde decorator and pyserde package.
