Skip to content

feat: Add speech-generation package#56

Merged
Kamilbenkirane merged 4 commits intomainfrom
package/speech-generation
Dec 5, 2025
Merged

feat: Add speech-generation package#56
Kamilbenkirane merged 4 commits intomainfrom
package/speech-generation

Conversation

@Kamilbenkirane
Copy link
Copy Markdown
Member

🎯 Overview

Adds the speech-generation capability package with support for OpenAI, Google Cloud TTS, and ElevenLabs providers.

✨ Features

  • Unified speech generation interface - Consistent API across all providers
  • Multi-provider support:
    • OpenAI TTS-1 (tts-1, tts-1-hd)
    • Google Cloud TTS (multiple voices)
    • ElevenLabs (eleven_multilingual_v2, eleven_flash_v2_5, etc.)
  • Streaming support - ElevenLabs streaming implementation
  • Voice management - Voice constraints with name/ID mapping
  • Type-safe - Full type annotations and mypy compliance

🔧 Technical Changes

  • New package: celeste-speech-generation (v0.2.8)
  • Root package version bump: celeste-ai (v0.2.12 → v0.2.13)
  • Fixed input content extraction pattern (explicit kwargs) across all generation capabilities
  • Added ELEVENLABS_API_KEY to GitHub workflows
  • Updated CI/CD configuration

✅ Testing

  • All unit tests passing
  • Integration tests passing (requires API keys)
  • CI pipeline green (linting, type checking, security scan, tests)
  • Coverage: 84.55% (above 80% requirement)

📦 Installation

uv add "celeste-ai[speech-generation]"

🚀 Usage

from celeste import create_client, Capability, Provider

client = create_client(
    capability=Capability.SPEECH_GENERATION,
    provider=Provider.ELEVENLABS,
)

response = await client.generate(text="Hello, world!")
audio = response.content  # AudioArtifact

📋 Checklist

  • Package implementation complete
  • All providers implemented (OpenAI, Google, ElevenLabs)
  • Streaming support (ElevenLabs)
  • Tests written and passing
  • CI/CD updated
  • Documentation updated
  • Version bumped
  • Root package updated

…bs providers

- Add complete speech-generation package with unified interface
- Support for OpenAI TTS-1, Google Cloud TTS, and ElevenLabs models
- Implement streaming support for ElevenLabs
- Add voice constraints with name/ID mapping
- Fix input content extraction pattern across all generation capabilities
- Update root pyproject.toml: bump version to 0.2.13, add speech-generation to optional-dependencies
- Add ELEVENLABS_API_KEY to GitHub workflows
- Update CI/CD: add speech-generation to mypy checks, exclude conftest files
- All tests passing, CI checks green

Breaking changes: None (new package)
@claude
Copy link
Copy Markdown

claude Bot commented Dec 4, 2025

Pull Request Review: Speech Generation Package

Overview

This is a well-structured PR that adds speech generation capability to Celeste AI with support for OpenAI, Google Cloud TTS, and ElevenLabs providers. The implementation follows the existing patterns in the codebase and maintains consistency with other capability packages.

✅ Strengths

1. Architecture & Design

  • Consistent abstraction: The implementation follows the established pattern from text/image/video generation packages, making it easy to maintain
  • Provider-agnostic API: Unified interface across all providers demonstrates good abstraction
  • Voice constraint system: The VoiceConstraint class elegantly handles both voice IDs and names, automatically mapping names to IDs (constraints.py:18-39)

2. Code Quality

  • Type safety: Full type annotations with proper use of Unpack[SpeechGenerationParameters] for kwargs
  • Error handling: Appropriate validation and error messages throughout
  • Clean separation: Provider-specific code is properly isolated in provider modules
  • Documentation: Good docstrings explaining implementation details

3. Testing

  • Parameterized tests: Smart use of pytest.mark.parametrize to test all providers with the same code (test_generate.py:8-23)
  • Integration coverage: Both streaming and non-streaming generation tested
  • Practical assertions: Tests verify actual audio data presence, not just types
  • Coverage: 84.55% exceeds the 80% requirement

🔍 Issues & Concerns

1. Critical: Inconsistent Input Pattern

The _create_inputs method has been updated to use explicit keyword arguments across all packages. This is good! The PR correctly updates the pattern across all packages to avoid extracting input fields from **parameters. This prevents issues where parameters and inputs could conflict.

2. Potential Bug: ElevenLabs Voice ID Handling ⚠️

In elevenlabs/client.py:85-86 and 149-150:

  • If _voice_id is not present, it defaults to the first voice without any validation
  • The underscore prefix suggests this is a workaround for the request body structure
  • If the parameter mapper fails to set this key, it silently falls back to a default

Recommendation: Add explicit validation that voice was provided by the user

3. Security Concern: API Key Exposure in Error Messages 🔒

In elevenlabs/client.py:175-191, the error includes the full response.request object which may contain the API key in headers.

Recommendation: Be cautious about logging/exposing request objects in error messages and consider sanitizing headers before including in exceptions

4. Performance: Redundant Voice List Construction 💡

In constraints.py:25-30, every constraint validation rebuilds lists/dicts from scratch.

Recommendation: Consider caching as instance attributes or use @cached_property

5. Code Smell: NotImplementedError in Production Code 💭

Both OpenAI and ElevenLabs clients have _parse_content methods that raise NotImplementedError. This violates the Liskov Substitution Principle.

Recommendation: Consider refactoring the base Client class to make _parse_content optional for binary responses

6. Missing: Response Format Validation ⚠️

In elevenlabs/client.py:123, if a user provides an invalid format, it silently defaults to MP3 without warning.

Recommendation: Add validation that the format is supported before making the request

📊 Performance Considerations

Positive:

  • Streaming support: ElevenLabs streaming is well-implemented for low-latency audio delivery
  • Async/await: Proper async patterns throughout for non-blocking I/O
  • Binary handling: Direct binary response handling avoids unnecessary encoding/decoding

Could Improve:

  • Memory efficiency: Consider streaming large audio responses to disk rather than holding in memory
  • Connection pooling: Verify httpx client reuses connections across requests
  • Backpressure: Consider implementing backpressure in streaming to avoid buffering too much audio

🔒 Security Assessment

Positive:

  • API key handling: Uses SecretStr properly to avoid accidental logging
  • Input validation: Good validation of user inputs through constraints
  • No command injection: All inputs properly sanitized before API calls

Concerns:

  • Error message exposure (mentioned above): Request objects in exceptions may leak sensitive data
  • Base64 decoding: Google client base64 decode doesnt validate input size - could lead to memory exhaustion with malicious responses

📝 Minor Suggestions

  1. Consistency in voice defaults: Each provider has hardcoded voice defaults. Consider centralizing this configuration.

  2. Documentation: The README is excellent, but could benefit from voice parameter usage examples, streaming examples, and audio format comparison table

  3. Testing: Consider adding unit tests for the VoiceConstraint class, tests for error conditions, and tests for MIME type mapping functions

🎯 Summary

Overall Assessment: This is high-quality work that successfully implements speech generation following established patterns. The code is clean, well-tested, and production-ready with minor improvements needed.

Recommended Actions:

  1. Merge-blocking: None - the critical input pattern fix is already implemented correctly
  2. ⚠️ High Priority: Address voice ID defaulting behavior and add validation
  3. 💡 Nice-to-have: Address performance optimizations and minor code smells in follow-up PRs

Test Coverage: ✅ 84.55% exceeds requirements
CI/CD: ✅ All checks configured properly
Documentation: ✅ Comprehensive and clear
API Design: ✅ Consistent with existing packages

Great work! This is a solid addition to the Celeste AI framework. The unified API across providers is exactly what users need.

- Remove 8 legacy voices: Rachel, Antoni, Elli, Josh, Arnold, Serena, Sam, Domi
- Keep Adam (replacement for Antoni/Arnold) and Bella (not being replaced)
- Add Laura (replaces Domi) with confirmed ID: FGY2WhTYpPnrIDTdsKH5
- Add TODO comments for Janet, Peter, Craig, Riley (not yet available in API)
- Update tests to use Laura instead of Rachel

Breaking change: Users must migrate from legacy voices before ElevenLabs deprecation on Feb 28, 2026.
- Move conftest.py from tests/ to tests/integration_tests/
- Matches structure used in other packages (text-generation, image-generation, video-generation)
- Removes need for --exclude '.*conftest\.py' in mypy commands
- Add change detection to publish.yml to only run integration tests for changed packages
- Use matrix strategy for parallel integration test execution
- Add ELEVENLABS_API_KEY to integration test environment
- Remove --exclude '.*conftest\.py' from mypy commands (no longer needed)
- Update Makefile typecheck command to match CI workflow
@claude
Copy link
Copy Markdown

claude Bot commented Dec 4, 2025

Code Review: PR #56 - Speech Generation Package

Executive Summary

Overall Assessment: ✅ APPROVE with Minor Recommendations

This PR adds a well-architected speech-generation capability following established patterns. The implementation demonstrates solid engineering practices with type safety, proper abstractions, and consistent patterns across providers. No critical or high-priority issues found - the code is production-ready.

Key Strengths:

  • ✅ Consistent architecture with text/image generation packages
  • ✅ Proper handling of binary audio responses
  • ✅ Excellent type safety with comprehensive annotations
  • ✅ Clean streaming implementation (ElevenLabs)
  • ✅ Well-designed voice constraint system
  • ✅ Good test coverage (84.55%)

Issues Found

Medium Priority (Consider Fixing)

1. OpenAI Client - Response Format Default

  • Location: packages/speech-generation/src/celeste_speech_generation/providers/openai/client.py:109
  • Issue: If user explicitly passes response_format="", it silently becomes "mp3"
  • Recommendation: Use explicit None check instead of falsy check

2. ElevenLabs Client - Private Method Access

  • Location: packages/speech-generation/src/celeste_speech_generation/providers/elevenlabs/client.py:175
  • Issue: Accessing _get_client() private method breaks encapsulation
  • Recommendation: Add public streaming API to HTTPClient or document this pattern

3. Streaming Tests - Missing Finalization Test

  • Location: packages/speech-generation/tests/integration_tests/test_speech_generation/test_stream.py
  • Missing: Test for stream cleanup and early exit scenarios

Low Priority (Nice to Have)

  1. Voice Constraint - Case Sensitivity
  2. Google Client - Error Message Context
  3. Google Client - Base64 Decoding error handling
  4. Missing Docstrings on several client methods
  5. Test Coverage Gaps (negative tests, audio format validation)
  6. ElevenLabs Error Handling - use errors='replace'

Security Review

No security concerns found

  • API keys properly handled with SecretStr
  • No hardcoded credentials
  • No injection vulnerabilities
  • Base64 decoding safe
  • No blocking I/O in async code

Performance Review

No performance issues

  • HTTP client pooling used correctly
  • No unnecessary data copies
  • Stream concatenation acceptable for typical TTS responses
  • No blocking operations in async paths

Test Coverage: Good (84.55%)

What's Tested:

  • ✅ Basic generation for all 3 providers
  • ✅ Streaming for ElevenLabs
  • ✅ Response structure validation

What Could Be Added:

  • ⚠️ Negative test cases
  • ⚠️ Edge cases (empty text, long text)
  • ⚠️ Stream cleanup/early exit

Specific PR Items Review

✅ "Fixed input content extraction pattern (explicit kwargs)"

Status: VERIFIED - The explicit kwarg pattern is correctly applied and prevents text from being consumed by **parameters. Consistent with image/text generation packages.

✅ Binary Response Handling

Status: EXCELLENT - Both OpenAI and ElevenLabs cleanly override generate() to handle binary audio with proper MIME type mapping.

✅ Streaming Implementation

Status: GOOD - ElevenLabs streaming properly wraps binary chunks with clean async context management.

✅ Voice Constraints

Status: EXCELLENT - VoiceConstraint class accepts both IDs and names with clear error messages.


Highlights

1. Excellent Documentation in Code
The SpeechGenerationChunk docstring proactively explains why finish_reason is omitted - great design communication!

2. Clean Binary Response Pattern
Overriding generate() for binary responses while keeping _parse_content() for interface compliance is elegant.

3. Voice Constraint Design
Accepting both IDs and names with automatic ID resolution provides great UX.


Final Verdict

✅ APPROVE - Ready to Merge

This is high-quality, production-ready code. All identified issues are minor enhancements that can be addressed in follow-up PRs if desired.

Confidence Level: High | Risk Level: Low | Code Quality: Excellent

Great work on this implementation! 🚀


Review generated with comprehensive analysis of 2089 additions across 43 files

@Kamilbenkirane Kamilbenkirane merged commit 864c83f into main Dec 5, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant