feat: Add speech-generation package by Kamilbenkirane · Pull Request #56 · withceleste/celeste-python

Kamilbenkirane · 2025-12-04T15:04:24Z

🎯 Overview

Adds the speech-generation capability package with support for OpenAI, Google Cloud TTS, and ElevenLabs providers.

✨ Features

Unified speech generation interface - Consistent API across all providers
Multi-provider support:
- OpenAI TTS-1 (tts-1, tts-1-hd)
- Google Cloud TTS (multiple voices)
- ElevenLabs (eleven_multilingual_v2, eleven_flash_v2_5, etc.)
Streaming support - ElevenLabs streaming implementation
Voice management - Voice constraints with name/ID mapping
Type-safe - Full type annotations and mypy compliance

🔧 Technical Changes

New package: celeste-speech-generation (v0.2.8)
Root package version bump: celeste-ai (v0.2.12 → v0.2.13)
Fixed input content extraction pattern (explicit kwargs) across all generation capabilities
Added ELEVENLABS_API_KEY to GitHub workflows
Updated CI/CD configuration

✅ Testing

All unit tests passing
Integration tests passing (requires API keys)
CI pipeline green (linting, type checking, security scan, tests)
Coverage: 84.55% (above 80% requirement)

📦 Installation

uv add "celeste-ai[speech-generation]"

🚀 Usage

from celeste import create_client, Capability, Provider

client = create_client(
    capability=Capability.SPEECH_GENERATION,
    provider=Provider.ELEVENLABS,
)

response = await client.generate(text="Hello, world!")
audio = response.content  # AudioArtifact

📋 Checklist

…bs providers - Add complete speech-generation package with unified interface - Support for OpenAI TTS-1, Google Cloud TTS, and ElevenLabs models - Implement streaming support for ElevenLabs - Add voice constraints with name/ID mapping - Fix input content extraction pattern across all generation capabilities - Update root pyproject.toml: bump version to 0.2.13, add speech-generation to optional-dependencies - Add ELEVENLABS_API_KEY to GitHub workflows - Update CI/CD: add speech-generation to mypy checks, exclude conftest files - All tests passing, CI checks green Breaking changes: None (new package)

claude · 2025-12-04T15:07:44Z

Pull Request Review: Speech Generation Package

Overview

This is a well-structured PR that adds speech generation capability to Celeste AI with support for OpenAI, Google Cloud TTS, and ElevenLabs providers. The implementation follows the existing patterns in the codebase and maintains consistency with other capability packages.

✅ Strengths

1. Architecture & Design

Consistent abstraction: The implementation follows the established pattern from text/image/video generation packages, making it easy to maintain
Provider-agnostic API: Unified interface across all providers demonstrates good abstraction
Voice constraint system: The VoiceConstraint class elegantly handles both voice IDs and names, automatically mapping names to IDs (constraints.py:18-39)

2. Code Quality

Type safety: Full type annotations with proper use of Unpack[SpeechGenerationParameters] for kwargs
Error handling: Appropriate validation and error messages throughout
Clean separation: Provider-specific code is properly isolated in provider modules
Documentation: Good docstrings explaining implementation details

3. Testing

Parameterized tests: Smart use of pytest.mark.parametrize to test all providers with the same code (test_generate.py:8-23)
Integration coverage: Both streaming and non-streaming generation tested
Practical assertions: Tests verify actual audio data presence, not just types
Coverage: 84.55% exceeds the 80% requirement

🔍 Issues & Concerns

1. Critical: Inconsistent Input Pattern ✅

The _create_inputs method has been updated to use explicit keyword arguments across all packages. This is good! The PR correctly updates the pattern across all packages to avoid extracting input fields from **parameters. This prevents issues where parameters and inputs could conflict.

2. Potential Bug: ElevenLabs Voice ID Handling ⚠️

In elevenlabs/client.py:85-86 and 149-150:

If _voice_id is not present, it defaults to the first voice without any validation
The underscore prefix suggests this is a workaround for the request body structure
If the parameter mapper fails to set this key, it silently falls back to a default

Recommendation: Add explicit validation that voice was provided by the user

3. Security Concern: API Key Exposure in Error Messages 🔒

In elevenlabs/client.py:175-191, the error includes the full response.request object which may contain the API key in headers.

Recommendation: Be cautious about logging/exposing request objects in error messages and consider sanitizing headers before including in exceptions

4. Performance: Redundant Voice List Construction 💡

In constraints.py:25-30, every constraint validation rebuilds lists/dicts from scratch.

Recommendation: Consider caching as instance attributes or use @cached_property

5. Code Smell: NotImplementedError in Production Code 💭

Both OpenAI and ElevenLabs clients have _parse_content methods that raise NotImplementedError. This violates the Liskov Substitution Principle.

Recommendation: Consider refactoring the base Client class to make _parse_content optional for binary responses

6. Missing: Response Format Validation ⚠️

In elevenlabs/client.py:123, if a user provides an invalid format, it silently defaults to MP3 without warning.

Recommendation: Add validation that the format is supported before making the request

📊 Performance Considerations

Positive:

Streaming support: ElevenLabs streaming is well-implemented for low-latency audio delivery
Async/await: Proper async patterns throughout for non-blocking I/O
Binary handling: Direct binary response handling avoids unnecessary encoding/decoding

Could Improve:

Memory efficiency: Consider streaming large audio responses to disk rather than holding in memory
Connection pooling: Verify httpx client reuses connections across requests
Backpressure: Consider implementing backpressure in streaming to avoid buffering too much audio

🔒 Security Assessment

Positive:

API key handling: Uses SecretStr properly to avoid accidental logging
Input validation: Good validation of user inputs through constraints
No command injection: All inputs properly sanitized before API calls

Concerns:

Error message exposure (mentioned above): Request objects in exceptions may leak sensitive data
Base64 decoding: Google client base64 decode doesnt validate input size - could lead to memory exhaustion with malicious responses

📝 Minor Suggestions

Consistency in voice defaults: Each provider has hardcoded voice defaults. Consider centralizing this configuration.
Documentation: The README is excellent, but could benefit from voice parameter usage examples, streaming examples, and audio format comparison table
Testing: Consider adding unit tests for the VoiceConstraint class, tests for error conditions, and tests for MIME type mapping functions

🎯 Summary

Overall Assessment: This is high-quality work that successfully implements speech generation following established patterns. The code is clean, well-tested, and production-ready with minor improvements needed.

Recommended Actions:

✅ Merge-blocking: None - the critical input pattern fix is already implemented correctly
⚠️ High Priority: Address voice ID defaulting behavior and add validation
💡 Nice-to-have: Address performance optimizations and minor code smells in follow-up PRs

Test Coverage: ✅ 84.55% exceeds requirements
CI/CD: ✅ All checks configured properly
Documentation: ✅ Comprehensive and clear
API Design: ✅ Consistent with existing packages

Great work! This is a solid addition to the Celeste AI framework. The unified API across providers is exactly what users need.

- Remove 8 legacy voices: Rachel, Antoni, Elli, Josh, Arnold, Serena, Sam, Domi - Keep Adam (replacement for Antoni/Arnold) and Bella (not being replaced) - Add Laura (replaces Domi) with confirmed ID: FGY2WhTYpPnrIDTdsKH5 - Add TODO comments for Janet, Peter, Craig, Riley (not yet available in API) - Update tests to use Laura instead of Rachel Breaking change: Users must migrate from legacy voices before ElevenLabs deprecation on Feb 28, 2026.

- Move conftest.py from tests/ to tests/integration_tests/ - Matches structure used in other packages (text-generation, image-generation, video-generation) - Removes need for --exclude '.*conftest\.py' in mypy commands

- Add change detection to publish.yml to only run integration tests for changed packages - Use matrix strategy for parallel integration test execution - Add ELEVENLABS_API_KEY to integration test environment - Remove --exclude '.*conftest\.py' from mypy commands (no longer needed) - Update Makefile typecheck command to match CI workflow

claude · 2025-12-04T15:26:32Z

Code Review: PR #56 - Speech Generation Package

Executive Summary

Overall Assessment: ✅ APPROVE with Minor Recommendations

This PR adds a well-architected speech-generation capability following established patterns. The implementation demonstrates solid engineering practices with type safety, proper abstractions, and consistent patterns across providers. No critical or high-priority issues found - the code is production-ready.

Key Strengths:

✅ Consistent architecture with text/image generation packages
✅ Proper handling of binary audio responses
✅ Excellent type safety with comprehensive annotations
✅ Clean streaming implementation (ElevenLabs)
✅ Well-designed voice constraint system
✅ Good test coverage (84.55%)

Issues Found

Medium Priority (Consider Fixing)

1. OpenAI Client - Response Format Default

Location: packages/speech-generation/src/celeste_speech_generation/providers/openai/client.py:109
Issue: If user explicitly passes response_format="", it silently becomes "mp3"
Recommendation: Use explicit None check instead of falsy check

2. ElevenLabs Client - Private Method Access

Location: packages/speech-generation/src/celeste_speech_generation/providers/elevenlabs/client.py:175
Issue: Accessing _get_client() private method breaks encapsulation
Recommendation: Add public streaming API to HTTPClient or document this pattern

3. Streaming Tests - Missing Finalization Test

Location: packages/speech-generation/tests/integration_tests/test_speech_generation/test_stream.py
Missing: Test for stream cleanup and early exit scenarios

Low Priority (Nice to Have)

Voice Constraint - Case Sensitivity
Google Client - Error Message Context
Google Client - Base64 Decoding error handling
Missing Docstrings on several client methods
Test Coverage Gaps (negative tests, audio format validation)
ElevenLabs Error Handling - use errors='replace'

Security Review

✅ No security concerns found

API keys properly handled with SecretStr
No hardcoded credentials
No injection vulnerabilities
Base64 decoding safe
No blocking I/O in async code

Performance Review

✅ No performance issues

HTTP client pooling used correctly
No unnecessary data copies
Stream concatenation acceptable for typical TTS responses
No blocking operations in async paths

Test Coverage: Good (84.55%)

What's Tested:

✅ Basic generation for all 3 providers
✅ Streaming for ElevenLabs
✅ Response structure validation

What Could Be Added:

⚠️ Negative test cases
⚠️ Edge cases (empty text, long text)
⚠️ Stream cleanup/early exit

Specific PR Items Review

✅ "Fixed input content extraction pattern (explicit kwargs)"

Status: VERIFIED - The explicit kwarg pattern is correctly applied and prevents text from being consumed by **parameters. Consistent with image/text generation packages.

✅ Binary Response Handling

Status: EXCELLENT - Both OpenAI and ElevenLabs cleanly override generate() to handle binary audio with proper MIME type mapping.

✅ Streaming Implementation

Status: GOOD - ElevenLabs streaming properly wraps binary chunks with clean async context management.

✅ Voice Constraints

Status: EXCELLENT - VoiceConstraint class accepts both IDs and names with clear error messages.

Highlights

1. Excellent Documentation in Code
The SpeechGenerationChunk docstring proactively explains why finish_reason is omitted - great design communication!

2. Clean Binary Response Pattern
Overriding generate() for binary responses while keeping _parse_content() for interface compliance is elegant.

3. Voice Constraint Design
Accepting both IDs and names with automatic ID resolution provides great UX.

Final Verdict

✅ APPROVE - Ready to Merge

This is high-quality, production-ready code. All identified issues are minor enhancements that can be addressed in follow-up PRs if desired.

Confidence Level: High | Risk Level: Low | Code Quality: Excellent

Great work on this implementation! 🚀

Review generated with comprehensive analysis of 2089 additions across 43 files

Kamilbenkirane added 3 commits December 4, 2025 16:19

Move conftest.py to integration_tests directory for consistency

60437d3

- Move conftest.py from tests/ to tests/integration_tests/ - Matches structure used in other packages (text-generation, image-generation, video-generation) - Removes need for --exclude '.*conftest\.py' in mypy commands

Kamilbenkirane merged commit 864c83f into main Dec 5, 2025
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add speech-generation package#56

feat: Add speech-generation package#56
Kamilbenkirane merged 4 commits intomainfrom
package/speech-generation

Kamilbenkirane commented Dec 4, 2025

Uh oh!

claude Bot commented Dec 4, 2025

Uh oh!

claude Bot commented Dec 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Kamilbenkirane commented Dec 4, 2025

🎯 Overview

✨ Features

🔧 Technical Changes

✅ Testing

📦 Installation

🚀 Usage

📋 Checklist

Uh oh!

claude Bot commented Dec 4, 2025

Pull Request Review: Speech Generation Package

Overview

✅ Strengths

1. Architecture & Design

2. Code Quality

3. Testing

🔍 Issues & Concerns

1. Critical: Inconsistent Input Pattern ✅

2. Potential Bug: ElevenLabs Voice ID Handling ⚠️

3. Security Concern: API Key Exposure in Error Messages 🔒

4. Performance: Redundant Voice List Construction 💡

5. Code Smell: NotImplementedError in Production Code 💭

6. Missing: Response Format Validation ⚠️

📊 Performance Considerations

Positive:

Could Improve:

🔒 Security Assessment

Positive:

Concerns:

📝 Minor Suggestions

🎯 Summary

Uh oh!

claude Bot commented Dec 4, 2025

Code Review: PR #56 - Speech Generation Package

Executive Summary

Issues Found

Medium Priority (Consider Fixing)

Low Priority (Nice to Have)

Security Review

Performance Review

Test Coverage: Good (84.55%)

Specific PR Items Review

✅ "Fixed input content extraction pattern (explicit kwargs)"

✅ Binary Response Handling

✅ Streaming Implementation

✅ Voice Constraints

Highlights

Final Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant