Skip to content

refactor: migrate Anthropic to native structured outputs API and remove raw_event metadata#40

Merged
Kamilbenkirane merged 1 commit intomainfrom
refactor/anthropic-structured-outputs-and-remove-raw-event-metadata
Nov 16, 2025
Merged

refactor: migrate Anthropic to native structured outputs API and remove raw_event metadata#40
Kamilbenkirane merged 1 commit intomainfrom
refactor/anthropic-structured-outputs-and-remove-raw-event-metadata

Conversation

@Kamilbenkirane
Copy link
Copy Markdown
Member

This PR includes two major improvements:

1. Anthropic Structured Outputs Migration

  • Migrate from tool-based structured outputs to Anthropic's native structured outputs API
  • Use output_format parameter with json_schema type instead of tools/tool_choice
  • Simplify streaming implementation by removing tool_use block tracking
  • Remove complex tool_use parsing logic (203 lines removed from streaming.py)
  • Add anthropic-beta header for structured-outputs-2025-11-13 feature
  • Update OutputSchemaMapper to use native output_format instead of tools
  • Remove OUTPUT_SCHEMA constraint from model definitions (now handled via API)

2. Remove raw_event Metadata

  • Remove metadata={"raw_event": event} from all streaming chunk creations
  • Violates metadata principle: metadata should not contain content fields
  • Reduces memory usage by not duplicating event data
  • Affects: OpenAI, Anthropic, XAI, Cohere, Mistral, Google text generation providers
  • Also removes from image intelligence and speech generation providers
  • ElevenLabs: removed raw_event wrapper but kept content_length metadata

Benefits

  • Cleaner codebase with 313 lines removed
  • Better adherence to metadata design principles
  • Reduced memory footprint in streaming scenarios
  • Simpler Anthropic integration using official API

Testing

  • ✅ All CI checks passed (linting, formatting, type checking, security, tests)
  • ✅ 288 tests passed with 84% coverage

…ve raw_event metadata

This commit includes two major improvements:

1. Anthropic Structured Outputs Migration:
   - Migrate from tool-based structured outputs to Anthropic's native structured outputs API
   - Use output_format parameter with json_schema type instead of tools/tool_choice
   - Simplify streaming implementation by removing tool_use block tracking
   - Remove complex tool_use parsing logic (203 lines removed from streaming.py)
   - Add anthropic-beta header for structured-outputs-2025-11-13 feature
   - Update OutputSchemaMapper to use native output_format instead of tools
   - Remove OUTPUT_SCHEMA constraint from model definitions (now handled via API)

2. Remove raw_event Metadata:
   - Remove metadata={"raw_event": event} from all streaming chunk creations
   - Violates metadata principle: metadata should not contain content fields
   - Reduces memory usage by not duplicating event data
   - Affects: OpenAI, Anthropic, XAI, Cohere, Mistral, Google text generation providers
   - Also removes from image intelligence and speech generation providers
   - ElevenLabs: removed raw_event wrapper but kept content_length metadata

Benefits:
- Cleaner codebase with 313 lines removed
- Better adherence to metadata design principles
- Reduced memory footprint in streaming scenarios
- Simpler Anthropic integration using official API
@claude
Copy link
Copy Markdown

claude Bot commented Nov 16, 2025

Pull Request Review: Migration to Anthropic Native Structured Outputs API

This PR represents a significant refactoring that modernizes the Anthropic integration and improves code quality. Overall, the changes are well-executed with strong benefits. Here's my detailed review:


✅ Code Quality and Best Practices

Excellent Improvements:

  1. Massive Code Simplification: Removing 313 lines while adding only 70 is impressive - this represents a 77% reduction in changed areas
  2. Cleaner Architecture: Migration from tool-based workaround to native API is the right approach
  3. Better Separation of Concerns: The new _ensure_additional_properties method properly handles schema requirements
  4. Proper Error Handling: Removed complex tool_use parsing logic that was error-prone
  5. Good Documentation: Docstrings are clear and explain the new approach well

Minor Observations:

  • packages/text-generation/src/celeste_text_generation/providers/anthropic/parameters.py:156-182: The _ensure_additional_properties method recursively processes schemas - consider adding a depth limit or cycle detection for pathological cases (though unlikely in practice)
  • packages/text-generation/src/celeste_text_generation/providers/anthropic/parameters.py:94: Schema conversion happens on every request - for frequently-used schemas, consider memoization/caching

🐛 Potential Bugs or Issues

Critical Items to Verify:

  1. Breaking API Change (packages/text-generation/src/celeste_text_generation/providers/anthropic/config.py:18):

    • The beta version structured-outputs-2025-11-13 is quite new
    • Question: Has this been tested with production API keys? Is there a rollback plan if Anthropic deprecates this beta version?
    • Suggestion: Document version requirements and add monitoring for API deprecation warnings
  2. Response Content Parsing (packages/text-generation/src/celeste_text_generation/providers/anthropic/client.py:68-74):

    text_content = ""
    for content_block in content:
        if content_block.get("type") == "text":
            text_content = content_block.get("text") or ""
            break
    • With native structured outputs, the response content structure changes from tool_use blocks to direct JSON in text blocks
    • Concern: The code only takes the first text block and breaks. What if there are multiple text blocks? What if structured output is split across blocks?
    • Recommendation: Add validation that only one text block exists when output_schema is set, or concatenate all text blocks
  3. parse_output Type Handling (packages/text-generation/src/celeste_text_generation/providers/anthropic/parameters.py:121-126):

    if isinstance(content, dict):
        parsed_json = content
    else:
        parsed_json = json.loads(content)
    • With native structured outputs, content should always be a string (JSON text)
    • Concern: The isinstance(content, dict) branch may be dead code now, or could mask issues if called incorrectly
    • Recommendation: Add a comment explaining when dict vs string occurs, or remove dead code path

⚡ Performance Considerations

Improvements:

  1. Reduced Memory Footprint: Removing raw_event metadata duplication is excellent - this was wasteful and violated design principles
  2. Simpler Streaming: Removing 200 lines of tool_use tracking significantly reduces streaming overhead
  3. Fewer Allocations: No longer building intermediate tool definition objects

Potential Optimizations:

  1. Schema Resolution (parameters.py:184-237): The _resolve_refs method recursively processes the entire schema on every request

    • For static schemas (most use cases), this is repeated work
    • Suggestion: Cache resolved schemas keyed by schema hash/id
  2. Nested Dictionary Copies (parameters.py:161): schema = schema.copy() creates shallow copies

    • This is correct, but the nested dictionary updates in properties/items create new dicts
    • Performance impact is minimal for typical schemas but could be noticeable for very large nested schemas

🔒 Security Concerns

Good Practices:

  1. API Key Handling: Unchanged, continues to use SecretStr properly
  2. No Arbitrary Code Execution: Removed tool-based approach eliminates potential tool injection vectors

Observations:

  1. Schema Validation: The new approach relies on Anthropic's server-side validation of additionalProperties: false

    • This is correct, but worth noting that malformed schemas will fail at runtime rather than client-side
    • Impact: Low - API will return clear errors
  2. Beta Feature Header (config.py:18): Using a beta API feature in production

    • Recommendation: Add monitoring/alerting for unexpected API responses
    • Recommendation: Document the beta status clearly for users

🧪 Test Coverage

Concerns:

  1. No New Tests: The PR description mentions "288 tests passed with 84% coverage" but doesn't show new tests for the migration

    • Critical: Were existing tests updated to verify native structured outputs work correctly?
    • Question: Are there integration tests that verify:
      • Structured outputs with BaseModel
      • Structured outputs with list[BaseModel]
      • Streaming with structured outputs
      • The new beta header is sent correctly
      • Error handling when beta feature is not available
  2. Regression Testing:

    • The old tool-based approach is completely removed
    • Recommendation: Ensure tests verify backward compatibility for clients using output_schema
  3. Edge Cases:

    • Empty responses
    • Malformed JSON from API
    • Nested complex schemas with $ref, anyOf, allOf
    • Very large schemas

📋 Specific Recommendations

High Priority:

  1. Add integration test coverage for native structured outputs (both streaming and non-streaming)
  2. Verify the response parsing logic handles all content block scenarios with structured outputs
  3. Document the beta API dependency and version requirements in README/docs
  4. Add error handling for cases where the beta feature might not be available

Medium Priority:

  1. Consider caching resolved schemas to improve performance for repeated requests
  2. Add validation that content has expected structure when output_schema is set
  3. Document migration path for any breaking changes in streaming chunk metadata

Low Priority:

  1. Clean up potentially dead code path in parse_output (dict vs string handling)
  2. Add depth limit to _ensure_additional_properties recursive calls
  3. Consider adding telemetry to track native structured output usage vs failures

🎯 Summary

Overall Assessment: ✅ Approve with minor recommendations

This is a high-quality refactoring that:

  • Removes significant technical debt (313 lines removed)
  • Adopts the official API approach (better long-term)
  • Improves memory efficiency (no raw_event duplication)
  • Simplifies the codebase (especially streaming)

Main concerns:

  1. Test coverage for the new implementation needs verification
  2. Beta API dependency should be documented and monitored
  3. Response parsing logic should be validated for edge cases

Recommendation: Merge after addressing the test coverage question and verifying production API compatibility.

Great work on this refactoring! The code is much cleaner and more maintainable. 🎉

@Kamilbenkirane Kamilbenkirane merged commit a55ccc3 into main Nov 16, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant