feat(speech-generation): add Gradium TTS provider with WebSocket API #89

Merged
Kamilbenkirane merged 5 commits into main from api/gradium_tts
Dec 19, 2025
@Kamilbenkirane
Member

Summary

  • Add Gradium as a new speech generation provider using WebSocket-based TTS API
  • Implement 14 flagship voices across 5 languages (EN, FR, DE, ES, PT)
  • Add celeste-gradium provider package with WebSocket client implementation
  • Support for multiple audio output formats (wav, pcm, opus, ulaw, alaw)

Changes

New Provider Package (packages/providers/gradium/)

  • WebSocket-based TTS client (not HTTP REST like other providers)
  • Parameter mappers for voice, speed, output format, JSON config
  • Configuration with DEFAULT_VOICE_ID consistent with ElevenLabs pattern

Capability Integration (packages/capabilities/speech-generation/)

  • Gradium provider client extending both GradiumTextToSpeechClient and SpeechGenerationClient
  • 14 flagship voices: Emma, Lucas, Camille, Antoine, Mia, Lukas, Lucia, Carlos, Sofia, Miguel (+ 4 more)
  • Model definitions for gradium-tts-v1 and gradium-tts-v1-turbo

Core Updates

  • Add Provider.GRADIUM enum value
  • Add Capability.SPEECH_GENERATION to Gradium
  • Add GRADIUM_API_KEY credential support
  • Add WebSocket utility module (src/celeste/websocket.py)

Test plan

  • All existing unit tests pass (316 tests)
  • Coverage at 84.21% (above 80% threshold)
  • MyPy type checking passes
  • Ruff linting/formatting passes
  • Bandit security scan passes
  • Integration test with live Gradium API (requires API key)

didux123 and others added 5 commits December 19, 2025 16:30
  Add comprehensive Gradium text-to-speech integration with WebSocket streaming
  support.

  Features:
  - WebSocket-based TTS streaming with low-latency audio generation
  - 14 flagship voices across 5 languages (en, fr, de, es, pt)
  - Custom voice cloning with create/list/update/delete operations
  - Speed control via padding_bonus parameter (-4.0 to 4.0)
  - Multiple audio formats (wav, pcm, opus, ulaw_8000, alaw_8000, pcm_16000,
  pcm_24000)
  - EU/US regional endpoints for optimized latency
  - Credits monitoring and usage tracking

  Implementation details:
  - WebSocket protocol handling with proper message sequencing
  - Parameter mappers for voice, speed, and response_format
  - Pydantic models for API responses (VoiceInfo, CreditsSummary, TTSResult)
  - Full voice management REST API integration
  - Comprehensive test suite with 6 test functions

  Dependencies:
  - Added websockets>=13.0 to support WebSocket connections

  Documentation:
  - Complete README with usage examples and API reference
  - Test scripts for validation (test_gradium_tts.py, test_gradium_minimal.py)
Add Gradium as a new speech generation provider using their WebSocket-based
Text-to-Speech API. This provider enables high-quality multilingual speech
synthesis with 14 flagship voices across 5 languages.

Key changes:
- Add celeste-gradium provider package with WebSocket TTS client
- Add Gradium capability provider with parameter mappers
- Add WebSocket client module to celeste core
- Add 14 flagship voices (EN, FR, DE, ES, PT)
- Add Gradium integration test and CI secrets
- Bump celeste-ai and celeste-speech-generation to 0.3.1

Provider features:
- WebSocket streaming for low-latency audio generation
- Support for wav, pcm, opus output formats
- Voice selection by name (Emma, Kent, Sydney, etc.)
- Speed control via padding_bonus translation
- Export GRADIUM_VOICES in speech-generation __init__.py
- Add DEFAULT_VOICE_ID to config.py (consistent with ElevenLabs)
- Add _make_request stub to satisfy abstract interface
- Remove unused httpx import from capability client

claude Bot commented Dec 19, 2025

PR Review: Add Gradium TTS Provider with WebSocket API

Summary

This PR adds Gradium as a new speech generation provider with WebSocket-based TTS. The implementation is well-structured and follows existing patterns in the codebase. Overall, this is a solid addition with good architectural decisions.


Strengths

Code Quality & Architecture

  • Excellent consistency: The implementation follows the existing provider patterns established by ElevenLabs, Google, and OpenAI
  • Clean separation of concerns: Provider-specific logic in celeste-gradium package, capability integration in speech-generation
  • Good abstraction: The mixin pattern for GradiumTextToSpeechClient is well-designed
  • Proper typing: Type hints are used consistently throughout

WebSocket Implementation

  • Proper protocol handling: The WebSocket flow is implemented correctly with setup → ready → text → audio chunks → end_of_stream
  • Context manager usage: Uses async with for automatic cleanup of WebSocket connections
  • Error handling: Captures and raises errors from the WebSocket stream appropriately
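
To make the message sequence above concrete, here is a minimal sketch of the flow driven against an in-memory stand-in for the socket. The message field names (`type`, `voice_id`, `text`, `audio`, `message`), the base64 audio encoding, and the `FakeGradiumSocket` helper are assumptions for illustration only, not taken from the PR or from Gradium's documentation.

```python
import asyncio
import base64
import json


class FakeGradiumSocket:
    """In-memory stand-in for a WebSocket connection (hypothetical helper)."""

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self.sent = []

    async def send(self, message: str) -> None:
        self.sent.append(json.loads(message))

    async def recv(self) -> str:
        if not self._incoming:
            raise ConnectionError("socket closed before end_of_stream")
        return self._incoming.pop(0)


async def run_tts_flow(ws, text: str, voice_id: str) -> bytes:
    """Drive setup -> ready -> text -> audio chunks -> end_of_stream."""
    # 1. setup: announce the voice before sending any text.
    await ws.send(json.dumps({"type": "setup", "voice_id": voice_id}))

    # 2. ready: refuse to continue until the server acknowledges the setup.
    reply = json.loads(await ws.recv())
    if reply.get("type") != "ready":
        raise RuntimeError(f"expected 'ready', got {reply!r}")

    # 3. text: send the text to synthesize.
    await ws.send(json.dumps({"type": "text", "text": text}))

    # 4. collect audio chunks until 5. the server reports end_of_stream.
    chunks = []
    while True:
        msg = json.loads(await ws.recv())
        kind = msg.get("type")
        if kind == "audio":
            chunks.append(base64.b64decode(msg["audio"]))
        elif kind == "end_of_stream":
            return b"".join(chunks)
        elif kind == "error":
            raise RuntimeError(msg.get("message", "unknown error"))
```

Keeping the sequencing in one function that accepts any object with `send()` and `recv()` also makes it possible to exercise the flow without a live connection.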

Documentation

  • Clear docstrings: Methods are well-documented with clear explanations
  • Inline comments: Critical sections have helpful comments explaining the flow

⚠️ Issues & Concerns

1. WebSocket Connection Management (Medium Priority)

Location: packages/providers/gradium/src/celeste_gradium/text_to_speech/client.py:64

The implementation creates a new WebSocket connection for each TTS request:

async with ws_connect(url, additional_headers=headers) as ws:

Issues:

  • No connection pooling or reuse
  • Creates overhead for multiple sequential requests
  • Doesn't leverage the WebSocketClient class added in src/celeste/websocket.py

Recommendation: Consider using the WebSocketClient utility for consistency, though creating new connections per request may be acceptable depending on Gradium's API design.


2. Unused WebSocket Utility Module (Low Priority)

Location: src/celeste/websocket.py

This module defines WebSocketClient and WebSocketConnection classes but they're not used anywhere in the Gradium implementation. The Gradium client directly uses websockets.asyncio.client.connect instead.

Questions:

  • Was this intended for future use?
  • Should Gradium use this abstraction?
  • If unused, should it be removed to avoid confusion?

Recommendation: Either use the utility module or remove it if it's not needed yet. If it's for future providers, add a comment explaining that.


3. WebSocket Cleanup Does Not Close Connections (Low Priority)

Location: src/celeste/__init__.py:127

The module exports close_all_ws_clients() but the function in websocket.py:117-119 doesn't actually close any connections - it just clears the registry:

async def close_all_ws_clients() -> None:
    """Close all WebSocket clients and clear registry."""
    _ws_clients.clear()

Issue: This doesn't match the function's documentation. Active WebSocket connections won't be closed, potentially leading to resource leaks.

Recommendation: If WebSocketConnection objects are stored in the registry, iterate through them and call .close() before clearing.


4. Error Handling: JSON Parsing (Medium Priority)

Location: packages/providers/gradium/src/celeste_gradium/text_to_speech/client.py:94-96

if isinstance(message, bytes):
    data = json.loads(message.decode("utf-8"))
else:
    data = json.loads(message)

Issue: No exception handling for:

  • Invalid UTF-8 in bytes
  • Malformed JSON
  • Missing expected keys in the parsed JSON

Recommendation: Wrap in try-except and provide informative error messages:

try:
    if isinstance(message, bytes):
        data = json.loads(message.decode("utf-8"))
    else:
        data = json.loads(message)
except (json.JSONDecodeError, UnicodeDecodeError) as e:
    raise ValueError(f"Invalid message from Gradium API: {e}") from e
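
The snippet above covers the first two cases but not missing keys. A minimal sketch of a helper that handles all three follows; it assumes every message is a JSON object carrying a `type` key, and the name `parse_gradium_message` is hypothetical.

```python
import json
from typing import Any, Dict, Union


def parse_gradium_message(message: Union[str, bytes]) -> Dict[str, Any]:
    """Decode one WebSocket frame and check the key the caller relies on."""
    try:
        text = message.decode("utf-8") if isinstance(message, bytes) else message
        data = json.loads(text)
    except (UnicodeDecodeError, json.JSONDecodeError) as e:
        raise ValueError(f"Invalid message from Gradium API: {e}") from e

    # Valid JSON can still be a list, a number, or an object without "type".
    if not isinstance(data, dict) or "type" not in data:
        raise ValueError(f"Unexpected message shape from Gradium API: {data!r}")
    return data
```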

5. Test Coverage: No Unit Tests for Gradium (High Priority)

Observation: Only integration tests exist (in test_generate.py:22). No unit tests for:

  • WebSocket flow logic
  • Error handling paths
  • Parameter mapping
  • MIME type mapping

Recommendation: Add unit tests with mocked WebSocket connections to cover:

  • Successful TTS generation
  • Error responses from the API
  • Connection failures
  • Malformed responses
  • Parameter transformation edge cases
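
As a starting point, such tests might look like the sketch below, which uses `unittest.mock.AsyncMock` in place of a real connection. `collect_audio` is a hypothetical stand-in for the client's receive loop, and the message fields and base64 encoding are assumptions; the real tests would import the Gradium client instead.

```python
import asyncio
import base64
import json
import unittest
from unittest.mock import AsyncMock


async def collect_audio(ws) -> bytes:
    """Hypothetical stand-in for the client's receive loop, not the PR's code."""
    chunks = []
    while True:
        msg = json.loads(await ws.recv())
        if msg["type"] == "audio":
            chunks.append(base64.b64decode(msg["audio"]))
        elif msg["type"] == "error":
            raise RuntimeError(msg["message"])
        elif msg["type"] == "end_of_stream":
            return b"".join(chunks)


def mock_ws(*frames):
    """Build a mock whose recv() returns, or raises, each frame in order."""
    ws = AsyncMock()
    ws.recv.side_effect = [
        f if isinstance(f, Exception) else json.dumps(f) for f in frames
    ]
    return ws


class CollectAudioTests(unittest.TestCase):
    def test_success(self):
        audio = base64.b64encode(b"\x01\x02").decode()
        ws = mock_ws({"type": "audio", "audio": audio}, {"type": "end_of_stream"})
        self.assertEqual(asyncio.run(collect_audio(ws)), b"\x01\x02")

    def test_api_error(self):
        ws = mock_ws({"type": "error", "message": "bad voice"})
        with self.assertRaisesRegex(RuntimeError, "bad voice"):
            asyncio.run(collect_audio(ws))

    def test_connection_failure(self):
        ws = mock_ws(ConnectionError("dropped"))
        with self.assertRaises(ConnectionError):
            asyncio.run(collect_audio(ws))

    def test_malformed_response(self):
        ws = AsyncMock()
        ws.recv.return_value = "{not json"
        with self.assertRaises(json.JSONDecodeError):
            asyncio.run(collect_audio(ws))
```

Because the mock replaces only `recv()`, none of these cases needs network access or an API key.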

6. Hardcoded Region in BASE_URL (Low Priority)

Location: packages/providers/gradium/src/celeste_gradium/text_to_speech/config.py:12

BASE_URL = "wss://eu.api.gradium.ai/api"

Issue: The region is hardcoded to "eu". Users in other regions may experience higher latency.

Recommendation: Consider making the region configurable via environment variable or parameter:

import os

REGION = os.getenv("GRADIUM_REGION", "eu")
BASE_URL = f"wss://{REGION}.api.gradium.ai/api"
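
If the region does become configurable, it may be worth validating it before building the URL, since the value ends up in the hostname that receives the API key. A minimal sketch, assuming that only "eu" and "us" exist (the commit message mentions EU/US endpoints) and that the US hostname mirrors the EU one, which has not been verified; `resolve_base_url` is a hypothetical name:

```python
import os
from typing import Optional

# Assumption: "us" follows the same hostname pattern as "eu".
SUPPORTED_REGIONS = ("eu", "us")
DEFAULT_REGION = "eu"


def resolve_base_url(region: Optional[str] = None) -> str:
    """Pick the region from the argument, then GRADIUM_REGION, then the default."""
    chosen = region or os.getenv("GRADIUM_REGION") or DEFAULT_REGION
    chosen = chosen.strip().lower()
    # Reject unknown values so a typo cannot redirect traffic to another host.
    if chosen not in SUPPORTED_REGIONS:
        raise ValueError(
            f"Unsupported Gradium region {chosen!r}; expected one of {SUPPORTED_REGIONS}"
        )
    return f"wss://{chosen}.api.gradium.ai/api"
```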

7. Model ID "default" Too Generic (Low Priority)

Location: packages/capabilities/speech-generation/src/celeste_speech_generation/providers/gradium/models.py:12

Model(
    id="default",
    provider=Provider.GRADIUM,
    display_name="Gradium Default TTS",

Issue: Using "default" as the model ID is ambiguous. If Gradium adds more models later, this could cause confusion.

Recommendation: Use a more specific ID like "gradium-tts-v1" (which is mentioned in the PR description but not used in the code).


8. Empty Audio Response Handling (Low Priority)

Location: packages/capabilities/speech-generation/src/celeste_speech_generation/providers/gradium/client.py:65-67

if not audio_bytes:
    msg = "No audio data in response"
    raise ValueError(msg)

Question: What happens if audio_chunks is an empty list? The joined result would be an empty bytes object b"", which is falsy.

Recommendation: This is correct, but consider checking if not audio_chunks earlier in the flow for clearer error messaging.


9. Speed Mapper Formula Could Be Confusing (Low Priority)

Location: packages/capabilities/speech-generation/src/celeste_speech_generation/providers/gradium/parameters.py:50

padding_bonus = (1.0 - validated_value) * 4.0

Issue: The formula is documented, but the reasoning behind the * 4.0 multiplier is not.

Recommendation: Add a comment explaining why 4.0 is the multiplier:

# Map speed range [0.25, 4.0] to padding_bonus range [3.0, -12.0]
# At speed 0.25 (slowest): padding_bonus = (1.0 - 0.25) * 4.0 = 3.0
# At speed 1.0 (normal): padding_bonus = 0.0
# At speed 2.0 (2x faster): padding_bonus = -4.0
# At speed 4.0 (fastest): padding_bonus = (1.0 - 4.0) * 4.0 = -12.0
padding_bonus = (1.0 - validated_value) * 4.0
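
One thing the arithmetic makes visible: the commit message lists padding_bonus as -4.0 to 4.0, and this formula leaves that range for any speed above 2.0. A minimal sketch of the mapping with clamping follows; whether to clamp or to reject out-of-range speeds is an open design choice, and `speed_to_padding_bonus` is a hypothetical name rather than the PR's mapper.

```python
PADDING_BONUS_MIN = -4.0  # range taken from the commit message
PADDING_BONUS_MAX = 4.0


def speed_to_padding_bonus(speed: float) -> float:
    """Map a speed multiplier to padding_bonus, clamped to [-4.0, 4.0]."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # 1.0 is neutral; slower speeds give positive values, faster ones negative.
    raw = (1.0 - speed) * 4.0
    return max(PADDING_BONUS_MIN, min(PADDING_BONUS_MAX, raw))
```

With clamping, a requested speed of 4.0 yields the same padding_bonus as 2.0, so rejecting such values with a clear error may be preferable if silent saturation would surprise callers.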

🔒 Security Considerations

✅ Good Security Practices:

  1. API key handling: Properly uses SecretStr and environment variables
  2. No hardcoded credentials: API keys are properly externalized
  3. Header-based auth: Uses standard x-api-key header pattern

⚠️ Minor Concerns:

  1. WebSocket URL: Hardcoded to wss:// (correct for production)
  2. No timeout on WebSocket: Consider adding timeouts to prevent hanging connections
  3. No rate limiting: Consider adding client-side rate limiting if Gradium has rate limits
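
For the timeout point, one common approach is to wrap each receive in `asyncio.wait_for`. The sketch below assumes a connection object with an async `recv()` method; the helper name and the 30-second default are illustrative, not values from the PR.

```python
import asyncio


async def recv_with_timeout(ws, timeout_s: float = 30.0):
    """Wait for one frame, failing instead of hanging if the server goes quiet."""
    try:
        return await asyncio.wait_for(ws.recv(), timeout=timeout_s)
    except asyncio.TimeoutError as e:
        # Re-raise as the builtin so callers need not import asyncio to catch it.
        raise TimeoutError(f"no message from server within {timeout_s}s") from e
```

A per-message timeout suits streaming better than a single overall deadline, since long audio can legitimately take a while as long as chunks keep arriving.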

📊 Performance Considerations

✅ Good Practices:

  • Streams audio chunks incrementally
  • Uses async/await properly throughout
  • Minimal memory overhead with chunk collection

⚠️ Potential Improvements:

  1. Memory usage: All audio chunks are accumulated in memory before returning. For long audio, consider streaming to disk or returning chunks incrementally.
  2. Connection overhead: New WebSocket per request may add latency. Consider connection pooling if making many requests.
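
To illustrate the memory point, here is a minimal sketch of yielding chunks as they arrive rather than joining them at the end. It assumes the same hypothetical message shape as elsewhere in this review (`type`, base64 `audio`); `iter_audio_chunks` and `write_audio` are illustrative names.

```python
import asyncio
import base64
import json
from typing import AsyncIterator


async def iter_audio_chunks(ws) -> AsyncIterator[bytes]:
    """Yield each audio chunk as it arrives instead of accumulating a list."""
    while True:
        msg = json.loads(await ws.recv())
        kind = msg.get("type")
        if kind == "audio":
            yield base64.b64decode(msg["audio"])
        elif kind == "end_of_stream":
            return


async def write_audio(ws, path: str) -> int:
    """Stream chunks straight to disk and return the number of bytes written."""
    total = 0
    with open(path, "wb") as f:
        async for chunk in iter_audio_chunks(ws):
            # Plain file writes block briefly; fine for a sketch, though a
            # busy event loop might hand them to a thread instead.
            f.write(chunk)
            total += len(chunk)
    return total
```

Only one chunk is held in memory at a time, at the cost of the caller no longer getting a single bytes object back.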

🧪 Test Coverage

Current Status:

  • ✅ Integration test added
  • ✅ Overall coverage at 84.21%
  • ❌ No unit tests specifically for Gradium provider

Recommendation: Add unit tests for better coverage of edge cases and error paths.


🎯 Recommended Actions

Priority Order:

  1. High: Add unit tests for Gradium provider
  2. Medium: Improve error handling in JSON parsing
  3. Medium: Address WebSocket connection management
  4. Low: Clarify the purpose of websocket.py utility module
  5. Low: Consider making region configurable
  6. Low: Use more specific model ID instead of "default"

📝 Minor Issues

  1. File in .gitignore: mureka.md is added to .gitignore - appears unrelated to this PR
  2. Voice list comment: PR description mentions "14 flagship voices" and the actual count matches ✓
  3. pyproject.toml: Missing celeste-google dependency was added (good catch!)

Overall Assessment

Rating: 8/10

This is a well-implemented feature that follows the codebase conventions and patterns. The WebSocket implementation is correct, and the integration with the existing capability system is clean. The main areas for improvement are:

  • Adding comprehensive unit tests
  • Improving error handling robustness
  • Clarifying the WebSocket utility module usage

Recommendation: ✅ Approve with minor revisions

The identified issues are mostly minor and don't block merging. Consider addressing the high-priority items (unit tests) in a follow-up PR if time is constrained.

Great work on maintaining consistency with existing providers! 🎉
