
Integrate Improved Structured Outputs in the Gemini API #48

Merged
Kamilbenkirane merged 1 commit into main from fix/gemini-list-schema-defs
Nov 19, 2025

@Kamilbenkirane
Member

This PR integrates the improved structured outputs features announced in the Gemini API blog post (Nov 2025).

Changes

  • Fix parameter name: Updated from responseSchema to responseJsonSchema (correct API parameter name)
  • Remove dead code: Removed _resolve_refs() method since the API now supports `$ref` natively
  • Fix list schemas: Preserve `$defs` at root level when creating array schemas from `list[BaseModel]` to support `$ref` references
  • Support expanded features: The implementation now supports:
    • anyOf for conditional structures (Unions)
    • `$ref` for recursive schemas (native support)
    • minimum and maximum for numeric constraints
    • additionalProperties and type: 'null'
    • prefixItems for tuple-like arrays
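As an illustration of the features listed above, here is a hand-written JSON Schema that uses each of them. It is only a sketch: the field names (`nickname`, `score`, `pair`, `weights`, `tree`, `Node`) are invented for this example, and whether the API accepts every combination is not verified here.

```python
import json

# Hand-written schema; every field name below is hypothetical.
schema = {
    "type": "object",
    "properties": {
        # anyOf: union of string or null
        "nickname": {"anyOf": [{"type": "string"}, {"type": "null"}]},
        # minimum / maximum: numeric bounds
        "score": {"type": "integer", "minimum": 0, "maximum": 100},
        # prefixItems: tuple-like array with positional types
        "pair": {
            "type": "array",
            "prefixItems": [{"type": "string"}, {"type": "number"}],
        },
        # additionalProperties: free-form map from string to number
        "weights": {"type": "object", "additionalProperties": {"type": "number"}},
        # $ref: points at a recursive definition under the root $defs
        "tree": {"$ref": "#/$defs/Node"},
    },
    "required": ["score"],
    "$defs": {
        "Node": {
            "type": "object",
            "properties": {
                "value": {"type": "string"},
                "children": {"type": "array", "items": {"$ref": "#/$defs/Node"}},
            },
        }
    },
}

# The schema is plain data, so it serializes to JSON unchanged.
schema_json = json.dumps(schema, indent=2)
```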

Related

  • [CEL-139] Improving Structured Outputs in the Gemini API

- Update to use responseJsonSchema parameter (correct API name)
- Remove $ref resolution code - API now supports $ref natively
- Preserve $defs at root level for list schemas to support $ref references
- Support expanded JSON Schema features: anyOf, minimum/maximum, additionalProperties, prefixItems
@claude

claude Bot commented Nov 19, 2025

PR Review: Integrate Improved Structured Outputs in the Gemini API

Summary

This PR makes critical updates to align with Google's improved structured outputs API. The changes are well-motivated and the code quality is generally good.

✅ Positive Changes

1. Correct API Parameter Name (parameters.py:117)

  • Fixed responseSchema → responseJsonSchema
  • Essential for compatibility with the updated Gemini API
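A minimal sketch of what the renamed parameter might look like in a request body. The helper name `build_generation_config` is hypothetical, and the surrounding `generationConfig` / `responseMimeType` layout is an assumption based on the Gemini REST API rather than something shown in this PR.

```python
import json


def build_generation_config(json_schema: dict) -> dict:
    """Return a generationConfig fragment that requests JSON output.

    Hypothetical helper; not the code in parameters.py.
    """
    return {
        # Assumed companion field that asks for a JSON response.
        "responseMimeType": "application/json",
        # The key this PR switches to (previously "responseSchema").
        "responseJsonSchema": json_schema,
    }


config = build_generation_config(
    {"type": "object", "properties": {"name": {"type": "string"}}}
)
body = json.dumps({"generationConfig": config})
```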

2. Simplified Implementation

  • Removed the entire _resolve_refs() method (54 lines!)
  • Google API now natively supports $ref
  • Significantly reduces complexity and improves maintainability

3. Proper $defs Handling for Arrays (parameters.py:143-148)

  • Correctly extracts $defs from nested schema and moves to root level
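  • References such as `#/$defs/Name` are JSON Pointers resolved against the document root, so once the item schema is wrapped in an array schema its $defs must move to the root for those references to resolve - this implementation is correct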
  • Handles list[BaseModel] case properly
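The hoisting step can be sketched roughly as follows. This is not the code from parameters.py; `wrap_as_array` is a hypothetical name, and `person_schema` is a hand-written dict shaped like the schema a model with one nested model would produce.

```python
import copy


def wrap_as_array(item_schema: dict) -> dict:
    """Wrap an item schema in an array schema, hoisting $defs to the root."""
    items = copy.deepcopy(item_schema)  # do not mutate the caller's dict
    defs = items.pop("$defs", None)     # take $defs out of the nested schema
    array_schema = {"type": "array", "items": items}
    if defs:                            # attach only when there are definitions
        array_schema["$defs"] = defs
    return array_schema


# Hand-written stand-in for a Person model that contains an Address.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {"$ref": "#/$defs/Address"},
    },
    "$defs": {
        "Address": {"type": "object", "properties": {"city": {"type": "string"}}}
    },
}

people_schema = wrap_as_array(person_schema)
```

After wrapping, `#/$defs/Address` still resolves because `$defs` now sits at the root of the array schema instead of under `items`.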

🔍 Code Quality Observations

Good Practices

  1. Clear documentation updates - Docstrings accurately reflect changes
  2. Type hints maintained throughout
  3. Recursive helpers preserved and working correctly

Areas for Consideration

1. Test Coverage 🔴 HIGH PRIORITY

Issue: No unit tests found for the OutputSchemaMapper class

  • Changes affect critical schema transformation logic
  • No tests verify $defs extraction works correctly
  • No tests validate responseJsonSchema parameter is set properly
  • No regression tests to ensure $ref is preserved

Recommendation: Add test cases for:

  • Basic BaseModel → schema conversion
  • list[BaseModel] with $defs extraction
  • $ref preservation (not resolved)
  • _uppercase_types() behavior
  • _remove_unsupported_fields() removes title
  • Integration test with actual Gemini API
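One of the recommended tests might look like the sketch below. Because the real `_remove_unsupported_fields()` is not shown in this conversation, `strip_titles` is a hypothetical stand-in written only so the test has something to run against. It also covers a case worth checking in the real helper: a model field that is itself named `title`.

```python
# Keys whose values map user-chosen names to subschemas. A name such as
# "title" inside them is a field name, not schema metadata.
NAME_MAPS = {"properties", "$defs"}


def strip_titles(node):
    """Recursively drop 'title' metadata while keeping fields named 'title'."""
    if isinstance(node, list):
        return [strip_titles(item) for item in node]
    if not isinstance(node, dict):
        return node
    cleaned = {}
    for key, value in node.items():
        if key == "title":
            continue  # schema metadata: drop it
        if key in NAME_MAPS and isinstance(value, dict):
            # Keep every name in the map, clean only the subschemas.
            cleaned[key] = {name: strip_titles(sub) for name, sub in value.items()}
        else:
            cleaned[key] = strip_titles(value)
    return cleaned


def test_strip_titles_keeps_refs_and_title_field():
    schema = {
        "title": "Book",
        "type": "object",
        "properties": {
            "title": {"title": "Title", "type": "string"},
            "author": {"$ref": "#/$defs/Author"},
        },
        "required": ["title"],
        "$defs": {"Author": {"title": "Author", "type": "object"}},
    }
    result = strip_titles(schema)
    assert "title" not in result                                  # metadata removed
    assert result["properties"]["title"] == {"type": "string"}    # field kept
    assert result["properties"]["author"] == {"$ref": "#/$defs/Author"}  # $ref untouched
    assert result["$defs"]["Author"] == {"type": "object"}
    assert result["required"] == ["title"]


test_strip_titles_keeps_refs_and_title_field()
```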

2. Potential Edge Cases 🟡 MEDIUM PRIORITY

Open questions:

  • What if items_schema has nested $defs inside properties?
  • What if there are $defs name collisions between different models?
  • Should there be validation that defs is a dict?

Example: list[Person], where Person contains a nested Address model, should work but needs a test.
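If definitions from several schemas ever need to be combined, the collision question could be handled along these lines. `merge_defs` is a hypothetical helper, not part of this PR, and the error-on-conflict policy is just one possible choice.

```python
def merge_defs(*schemas: dict) -> dict:
    """Merge the $defs of several schemas, raising on conflicting names."""
    merged: dict = {}
    for schema in schemas:
        defs = schema.get("$defs", {})
        if not isinstance(defs, dict):
            raise TypeError("$defs must be a dict")
        for name, definition in defs.items():
            # Identical duplicates are harmless; differing ones are a conflict.
            if name in merged and merged[name] != definition:
                raise ValueError(f"conflicting definitions for $defs/{name}")
            merged[name] = definition
    return merged


# Two hand-written schemas that share an identical Address definition.
address = {"type": "object", "properties": {"city": {"type": "string"}}}
person = {"type": "object", "$defs": {"Address": address}}
company = {"type": "object", "$defs": {"Address": address, "Tag": {"type": "string"}}}

combined = merge_defs(person, company)
```

Raising on a conflict keeps the failure visible at schema-build time instead of letting one definition silently overwrite another.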

3. Missing Validation 🟡 MEDIUM PRIORITY

No validation that:

  • output_schema is actually a BaseModel or valid type annotation
  • The generated schema is valid JSON Schema
  • $defs dictionary is not empty before adding to root schema
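One lightweight check in this direction is to confirm that every local `$ref` has a target in the root `$defs` before the schema is sent. The sketch below uses hypothetical helper names (`iter_refs`, `dangling_refs`) and only handles references of the form `#/$defs/Name`.

```python
def iter_refs(node):
    """Yield every string "$ref" value found anywhere in the schema."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def dangling_refs(schema: dict) -> list[str]:
    """Return local '#/$defs/...' refs whose target is missing at the root."""
    defs = schema.get("$defs", {})
    prefix = "#/$defs/"
    return sorted(
        {
            ref
            for ref in iter_refs(schema)
            if ref.startswith(prefix) and ref[len(prefix):] not in defs
        }
    )


item = {"type": "object", "properties": {"address": {"$ref": "#/$defs/Address"}}}
address_defs = {"Address": {"type": "object"}}

# $defs left under "items": the root-relative reference no longer resolves.
not_hoisted = {"type": "array", "items": {**item, "$defs": address_defs}}
# $defs moved to the root: the reference resolves.
hoisted = {"type": "array", "items": item, "$defs": address_defs}
```

Here `dangling_refs(not_hoisted)` reports the unresolved reference, while `dangling_refs(hoisted)` returns an empty list.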

4. Documentation Gap 🟡 MEDIUM PRIORITY

  • No link to the Google API blog post mentioned in PR description
  • No comment explaining WHY $defs must be at root level
  • No examples of new supported features (anyOf, minimum, maximum, etc.)

🔒 Security Analysis

✅ No security concerns identified

  • No user input directly interpolated
  • Schema generation uses Pydantic built-in methods
  • No SQL, command injection, or XSS vulnerabilities

⚡ Performance Considerations

✅ Performance improved

  • Removed expensive recursive _resolve_refs() method
  • Simpler dict comprehensions are more efficient
  • Overall: ~54 lines removed, logic simplified

📋 Recommendations Summary

| Priority | Item | Action |
|---|---|---|
| 🔴 HIGH | Add unit tests | Create tests for OutputSchemaMapper |
| 🟡 MEDIUM | Add edge case handling | Handle nested $defs, name collisions |
| 🟡 MEDIUM | Add input validation | Validate output_schema type |
| 🟡 MEDIUM | Improve documentation | Add API reference links |

✅ Approval Status

Code Quality: ✅ Good
Logic: ✅ Correct
Security: ✅ Safe
Performance: ✅ Improved
Tests: ❌ Missing

Recommendation: Approve with suggestions - The core changes are solid and necessary. However, adding tests would significantly increase confidence in the implementation, especially for the $defs extraction logic.

Great work on simplifying this implementation! The removal of _resolve_refs() is a significant improvement.

@Kamilbenkirane Kamilbenkirane merged commit 36cfff5 into main Nov 19, 2025
11 checks passed