
feat(xai): add Grok Imagine for image and video generation #128

Merged
Kamilbenkirane merged 3 commits into `main` from `xai/imagine` on Jan 29, 2026

@Kamilbenkirane (Member)

Summary

Add xAI's Grok Imagine models for image and video generation:

  • grok-imagine-image - AI image generation and editing
  • grok-imagine-video - AI video generation and editing

Features

Images (grok-imagine-image)

| Operation | Endpoint | Description |
|---|---|---|
| generate | `POST /v1/images/generations` | Generate images from text prompts |
| edit | `POST /v1/images/edits` | Edit existing images with text instructions |

Supported Parameters:

| Parameter | Type | Options | Description |
|---|---|---|---|
| aspect_ratio | Choice | 1:1, 3:4, 4:3, 9:16, 16:9, 2:3, 3:2, 9:19.5, 19.5:9, 9:20, 20:9, 1:2, 2:1, auto | Image aspect ratio |
| num_images | Range | 1-10 | Number of images to generate |
| output_format | Choice | url, b64_json | Response format |
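A minimal client-side check against the options in this table might look like the following sketch. It is not Celeste's parameter-mapper API: `validate_image_params` and the constant names are hypothetical, and the allowed values are copied from the table above.

```python
from typing import Any

# Allowed values copied from the grok-imagine-image parameter table.
ASPECT_RATIOS = {
    "1:1", "3:4", "4:3", "9:16", "16:9", "2:3", "3:2",
    "9:19.5", "19.5:9", "9:20", "20:9", "1:2", "2:1", "auto",
}
OUTPUT_FORMATS = {"url", "b64_json"}
NUM_IMAGES_RANGE = (1, 10)  # inclusive bounds


def validate_image_params(params: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: reject values outside the documented options."""
    aspect_ratio = params.get("aspect_ratio")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"Unsupported aspect_ratio: {aspect_ratio!r}")

    num_images = params.get("num_images")
    if num_images is not None:
        low, high = NUM_IMAGES_RANGE
        if not isinstance(num_images, int) or not low <= num_images <= high:
            raise ValueError(f"num_images must be an int in [{low}, {high}]")

    output_format = params.get("output_format")
    if output_format is not None and output_format not in OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output_format: {output_format!r}")

    # Unset parameters are left for the API to default.
    return params
```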

Videos (grok-imagine-video)

| Operation | Endpoint | Description |
|---|---|---|
| generate | `POST /v1/videos/generations` | Generate videos from text prompts |
| edit | `POST /v1/videos/edits` | Edit existing videos with text instructions |

Async Polling Pattern:

  • Initial request returns request_id
  • Poll GET /v1/videos/{request_id} until completion
  • HTTP 202 = processing, HTTP 200 = ready
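The polling loop described above could be sketched like this. `poll_video` and the injected `fetch_status` coroutine are hypothetical, not part of Celeste; the defaults (60 polls, 5-second interval) mirror the configuration reported in the review further down. Passing the HTTP call in as a function keeps the loop testable without network access, and checking the status before sleeping avoids a wasted interval when the video is already done.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical signature: takes the poll endpoint, returns (HTTP status, JSON body).
FetchStatus = Callable[[str], Awaitable[tuple[int, dict[str, Any]]]]


async def poll_video(
    fetch_status: FetchStatus,
    request_id: str,
    max_polls: int = 60,
    poll_interval: float = 5.0,
) -> dict[str, Any]:
    """Poll GET /v1/videos/{request_id} until the server answers HTTP 200."""
    endpoint = f"/v1/videos/{request_id}"
    for _ in range(max_polls):
        status_code, body = await fetch_status(endpoint)
        if status_code == 200:  # video is ready
            return body
        if status_code != 202:  # anything other than "still processing"
            raise RuntimeError(f"Unexpected status {status_code}: {body}")
        await asyncio.sleep(poll_interval)  # wait only while still processing
    raise TimeoutError(f"Video {request_id} not ready after {max_polls} polls")
```

In a real client, `fetch_status` would wrap the HTTP GET and return the status code together with the parsed JSON body.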

Supported Parameters:

| Parameter | Type | Options | Description |
|---|---|---|---|
| duration | Range | 1-15 (seconds) | Video duration |
| aspect_ratio | Choice | 16:9, 4:3, 1:1, 9:16, 3:4, 3:2, 2:3 | Video aspect ratio |
| resolution | Choice | 720p, 480p | Video resolution |

Usage

```python
import asyncio

import celeste


async def main() -> None:
    # Image generation
    image = await celeste.images.generate(
        prompt="A cat in a tree",
        model="grok-imagine-image",
        aspect_ratio="16:9",
        num_images=1,
    )
    print(image.content.url)

    # Image editing
    edited_image = await celeste.images.edit(
        image=image.content,
        prompt="Add a bird in the tree",
        model="grok-imagine-image",
    )
    print(edited_image.content.url)

    # Video generation (polls until the video is ready)
    video = await celeste.videos.generate(
        prompt="A ball bouncing",
        model="grok-imagine-video",
        duration=5,
        aspect_ratio="16:9",
    )
    print(video.content.url)

    # Video editing (the input video must be URL-based)
    edited_video = await celeste.videos.edit(
        video=video.content,
        prompt="Change the ball color to blue",
        model="grok-imagine-video",
    )
    print(edited_video.content.url)


asyncio.run(main())
```

Files Changed

Provider-level (HTTP/API handling)

  • `src/celeste/providers/xai/images/` - Images API client, config, parameters
  • `src/celeste/providers/xai/videos/` - Videos API client with async polling

Modality-level (Celeste interface)

  • `src/celeste/modalities/images/providers/xai/` - Images client, models, parameters
  • `src/celeste/modalities/videos/providers/xai/` - Videos client, models, parameters
  • `src/celeste/modalities/videos/io.py` - Added `video` field to `VideoInput` for edit ops

Tests

  • Added xAI to integration tests for images generate/edit and videos generate

Test plan

  • Unit tests pass (433 tests, 82% coverage)
  • Integration test: `pytest -m integration -k xai` for image generation
  • Integration test: `pytest -m integration -k xai` for image editing
  • Integration test: `pytest -m integration -k xai --slow` for video generation
  • Manual test: video editing with generated video URL

🤖 Generated with Claude Code

Add xAI Grok Imagine support for image and video generation:

Images (grok-imagine-image):
- Generate and edit operations
- Parameters: aspect_ratio, num_images, output_format
- Aspect ratios: 1:1, 3:4, 4:3, 9:16, 16:9, 2:3, 3:2, and more

Videos (grok-imagine-video):
- Generate and edit operations
- Async polling pattern (HTTP 200=ready, 202=processing)
- Parameters: duration (1-15s), aspect_ratio, resolution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
claude bot commented Jan 29, 2026

Pull Request Review: xAI Grok Imagine Integration

Overview

This PR adds xAI's Grok Imagine models for image and video generation/editing. The implementation follows established patterns in the codebase and demonstrates solid understanding of the architecture.

✅ Strengths

1. Excellent Architectural Consistency

  • Follows the two-layer mixin pattern consistently across all implementations
  • Provider-level mixins in src/celeste/providers/xai/ handle HTTP/API logic
  • Modality-level clients handle operations cleanly
  • Parameter mappers properly separated and reusable

2. Async Polling Implementation

The video polling implementation is well-designed with proper handling of immediate responses, async polling flow, configurable timeout, and clean error handling.

3. Good Error Handling

  • Raises appropriate exceptions (ValueError, RuntimeError, TimeoutError)
  • Uses descriptive error messages
  • Properly handles missing fields in responses

4. Clean Code Quality

  • Clear docstrings with operation descriptions
  • Type hints throughout
  • Consistent naming conventions
  • Well-organized file structure

🔍 Issues & Recommendations

CRITICAL: Potential Bug in Video Edit Operation

Location: src/celeste/modalities/videos/providers/xai/client.py:28

Issue: The edit request assumes the input `VideoArtifact` always has a `url`, but artifacts can instead carry `data` (bytes) or a `path` (local file). For such artifacts the request fails (e.g. with an `AttributeError`) instead of producing a clear error.

Recommendation: Add proper handling for non-URL artifacts or raise a clear error message.

MEDIUM: Image Edit Missing Proper Serialization

Location: src/celeste/modalities/images/providers/xai/client.py:32

Issue: The `ImageArtifact` object is passed directly into the request payload. Verify that this matches the xAI API specification.

MEDIUM: Missing Video Edit Tests

Issue: There is no automated integration test for video editing. The images tests include test_edit.py, but the videos tests do not.

LOW: Polling Configuration Could Be More Robust

Location: src/celeste/providers/xai/videos/config.py:16-17

5-minute timeout might be insufficient for longer videos. Consider increasing default timeout to 10-15 minutes.

LOW: Missing Type Validation

Add type validation in _parse_content methods to prevent silent failures.
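A small helper of this kind could back the `_parse_content` methods. `require_field` is hypothetical; it fails with a descriptive `ValueError` or `TypeError` instead of letting a missing or mistyped field surface later as a `None` URL.

```python
from typing import Any, TypeVar

T = TypeVar("T")


def require_field(data: Any, key: str, expected: type[T]) -> T:
    """Fetch data[key] and check its type instead of failing silently later."""
    if not isinstance(data, dict):
        raise TypeError(f"Expected a JSON object, got {type(data).__name__}")
    if key not in data:
        # List the keys that *are* present to make debugging easier.
        raise ValueError(f"Missing {key!r} in response; keys present: {sorted(data)}")
    value = data[key]
    if not isinstance(value, expected):
        raise TypeError(
            f"{key!r} should be {expected.__name__}, got {type(value).__name__}"
        )
    return value
```

For example, `_parse_content` might call `require_field(image_data, "url", str)` when `output_format` is `url`.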

🔒 Security: No Issues Found

  • API keys properly handled via AuthHeader with SecretStr
  • No hardcoded credentials
  • Uses standard auth registration pattern

⚡ Performance

The polling implementation could benefit from exponential backoff to reduce API calls.
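Exponential backoff replaces the fixed interval with a growing, capped delay. The sketch below is hypothetical: `backoff_delays` and its 1 s start, 2x factor, and 30 s cap are illustrative values, not xAI recommendations. Each delay would be passed to `asyncio.sleep` between polls.

```python
def backoff_delays(
    max_polls: int,
    initial: float = 1.0,
    factor: float = 2.0,
    cap: float = 30.0,
) -> list[float]:
    """Delay before each poll: initial * factor**n, never more than `cap` seconds."""
    return [min(cap, initial * factor**n) for n in range(max_polls)]
```

With these values, 12 polls span 241 seconds of waiting (1 + 2 + 4 + 8 + 16 + 7 × 30), compared with 60 requests for the current 300-second window.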

📋 Test Coverage

  • ✅ Image generation integration test
  • ✅ Image edit integration test
  • ✅ Video generation integration test
  • ❌ Video edit integration test (missing)
  • ❌ Unit tests for edge cases (missing)

🎯 Action Items

Before Merge (MUST FIX)

  1. Fix video edit to handle non-URL artifacts or raise clear error
  2. Verify image edit serialization matches xAI API spec
  3. Add video edit integration test

Recommended (SHOULD FIX)

  1. Increase polling timeout or make configurable
  2. Add type validation in _parse_content methods
  3. Consider exponential backoff for polling

Optional (NICE TO HAVE)

  1. Add unit tests for edge cases
  2. Strip notebook execution metadata
  3. Document timeout/retry behavior

🎉 Conclusion

This is a solid implementation that follows best practices and integrates cleanly with the existing codebase. The main concerns are around edge case handling for different artifact types in edit operations and missing test coverage for video editing.

Recommendation: Approve with changes requested ✅ (after fixing the critical issues)

Great work on maintaining consistency with the codebase patterns! 🚀

Kamilbenkirane and others added 2 commits January 29, 2026 22:48
- Image edit: serialize ImageArtifact to URL or base64 string instead
  of passing object directly (xAI API expects string)
- Video edit: validate video has URL before using, raise clear error
  if not (xAI only supports URL, not base64/path)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Let the xAI API handle validation for video edit requests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
claude bot commented Jan 29, 2026

PR Review: xAI Grok Imagine Integration

Summary

This PR adds comprehensive support for xAI's Grok Imagine models for image and video generation. The implementation follows the existing codebase patterns well and integrates cleanly with the Celeste architecture.

✅ Strengths

Architecture & Design

  • Excellent Pattern Consistency: The implementation closely follows existing provider patterns (OpenAI, Google, etc.), making it easy to understand and maintain
  • Proper Layering: Clean separation between provider-level HTTP/API handling and modality-level Celeste interfaces
  • Async Polling Pattern: The video client implements async polling correctly with configurable timeouts (5s intervals, max 60 polls = 5 minutes)
  • Mixin Architecture: Proper use of mixins for shared functionality across modalities

Code Quality

  • Type Safety: Good use of type hints throughout (Unpack[ImageParameters], dict[str, Any], etc.)
  • Error Handling: Proper validation and error messages for missing data
  • Documentation: Clear docstrings explaining the purpose and behavior of classes and methods
  • DRY Principle: Parameter mappers properly inherit from base classes and avoid duplication

🔍 Issues & Recommendations

🐛 Critical: Video Polling Logic Bug

Location: src/celeste/providers/xai/videos/client.py:108-113

```python
# Parse response for error handling
video_obj = status_response.json()
status = video_obj.get("status", "")
if status == config.STATUS_FAILED:
    error = video_obj.get("error", "Video generation failed")
    raise RuntimeError(error)
```

Issue: This code is unreachable because:

  1. Line 98: self._handle_error_response(status_response) is called first
  2. Line 101: If status is 200, we return immediately
  3. Line 105: If status is 202, we continue the loop
  4. Lines 108-113 can only be reached if status is neither 200 nor 202, but _handle_error_response would have already raised an exception for non-2xx status codes

Recommendation: Either remove the unreachable code or restructure the logic:

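```python
# Option 1: Remove the unreachable block
for _ in range(config.MAX_POLLS):
    await asyncio.sleep(config.POLL_INTERVAL)
    status_response = await self.http_client.get(...)
    if status_response.status_code == 200:
        return status_response.json()
    if status_response.status_code == 202:
        continue
    # _handle_error_response raises for any other (non-2xx) status code
    self._handle_error_response(status_response)
```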

⚠️ Potential Issues

1. Unnecessary Initial Delay in Video Polling

Location: src/celeste/providers/xai/videos/client.py:91-92

The loop sleeps for a full POLL_INTERVAL before its first status check, so a video that is already finished still waits at least 5 seconds:

```python
# Poll for completion
poll_endpoint = f"/v1/videos/{request_id}"
for _ in range(config.MAX_POLLS):
    await asyncio.sleep(config.POLL_INTERVAL)  # Sleeps before first check
```

Recommendation: Check status immediately before sleeping:

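```python
# Check status first, then sleep only while the job is still processing
for _ in range(config.MAX_POLLS):
    status_response = await self.http_client.get(...)
    self._handle_error_response(status_response)  # raises on non-2xx

    if status_response.status_code == 200:
        return status_response.json()
    if status_response.status_code != 202:
        raise RuntimeError(f"Unexpected status code: {status_response.status_code}")

    await asyncio.sleep(config.POLL_INTERVAL)

raise TimeoutError("Video generation did not complete within the polling window")
```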

2. Import Location Inconsistency

Location: src/celeste/modalities/images/providers/xai/client.py:339

Base64 is imported inside the method:

```python
def _parse_content(self, response_data: dict[str, Any], **parameters) -> ImageArtifact:
    b64_json = image_data.get("b64_json")
    if b64_json:
        import base64  # Imported here
```

Recommendation: Move to top-level imports for consistency with OpenAIImagesClient (line 3).

3. Missing MIME Type for Video Artifacts

Location: src/celeste/modalities/videos/providers/xai/client.py:584

Unlike OpenAI's video client which specifies VideoMimeType.MP4, xAI videos don't specify a MIME type:

```python
return VideoArtifact(url=url)  # Missing mime_type
```

Recommendation: Add MIME type if known, or document why it's not needed for URL-based artifacts.
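If the xAI response does not state a MIME type, one option is to infer it from the URL's file extension with the standard `mimetypes` module. `guess_video_mime_type` is hypothetical, and falling back to `video/mp4` is an assumption about xAI's output format rather than documented behavior; mapping the string onto `VideoMimeType` is left to the client.

```python
import mimetypes
from urllib.parse import urlparse


def guess_video_mime_type(url: str, default: str = "video/mp4") -> str:
    """Guess a MIME type from the URL path's extension, ignoring query strings."""
    path = urlparse(url).path  # drop "?signature=..." and similar suffixes
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type and mime_type.startswith("video/"):
        return mime_type
    # Assumption: fall back to MP4 when the URL carries no usable extension.
    return default
```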

4. Timeout Configuration Hardcoded

Location: src/celeste/providers/xai/videos/config.py:89-90

```python
MAX_POLLS = 60
POLL_INTERVAL = 5  # seconds
```

Recommendation: Consider making these configurable through environment variables or client initialization for users who need longer/shorter timeouts.
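A sketch of making the polling window configurable while keeping today's defaults. `PollingConfig` and the `CELESTE_XAI_VIDEO_*` environment variable names are hypothetical, not existing Celeste settings.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PollingConfig:
    max_polls: int = 60
    poll_interval: float = 5.0  # seconds

    @property
    def timeout(self) -> float:
        """Total polling window in seconds."""
        return self.max_polls * self.poll_interval

    @classmethod
    def from_env(cls) -> "PollingConfig":
        """Read overrides from hypothetical env vars, falling back to the defaults."""
        return cls(
            max_polls=int(os.environ.get("CELESTE_XAI_VIDEO_MAX_POLLS", cls.max_polls)),
            poll_interval=float(
                os.environ.get("CELESTE_XAI_VIDEO_POLL_INTERVAL", cls.poll_interval)
            ),
        )
```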

📝 Minor Improvements

1. Notebook Cell Formatting

The changes to notebooks/working-with-images.ipynb reorder JSON fields (moving execution_count before outputs). While this works, it's not directly related to the xAI feature.

Recommendation: Consider separating notebook formatting changes into a separate commit or reverting if not necessary.

2. Error Messages

Generic error messages could be more specific:

  • Line 349 (images): "No image URL or base64 data in response" - could indicate which format was expected based on output_format parameter
  • Line 1051 (videos): "No video URL in response" - could show the actual response structure for debugging

3. Video Edit Input Validation

Location: src/celeste/modalities/videos/providers/xai/client.py:540-542

```python
if inputs.video is not None:
    request["video"] = {"url": inputs.video.url}
```

Potential Issue: No validation that inputs.video.url is not None. If a VideoArtifact is created with data but no URL, this will fail.

Recommendation: Add validation:

```python
if inputs.video is not None:
    if not inputs.video.url:
        raise ValueError("Video editing requires a video URL")
    request["video"] = {"url": inputs.video.url}
```

🔒 Security Assessment

No significant security issues found

  • Credentials properly managed through the existing XAI_API_KEY environment variable
  • No sensitive data logged or exposed
  • Base64 decoding uses built-in library (safe)
  • URL handling doesn't introduce injection vulnerabilities
  • Async sleep durations are reasonable (5s intervals, not susceptible to timing attacks)

🧪 Test Coverage

Current Coverage

  • ✅ Integration tests added for images generate/edit
  • ✅ Integration tests added for videos generate
  • ❌ Video editing integration test marked as manual (unchecked in test plan)

Recommendations

  1. Add Video Edit Test: The video edit functionality is implemented but not tested in integration tests
  2. Add Unit Tests: Consider adding unit tests for:
    • Parameter mappers
    • Response parsing logic
    • Error handling paths
    • Polling timeout behavior
  3. Edge Cases: Test scenarios like:
    • Immediate video availability (no polling needed)
    • Polling timeout
    • Invalid response formats
    • Missing URLs/base64 data

Test Coverage Impact

The PR mentions 82% overall coverage with 433 tests passing, which is good. However, the new xAI code paths may not be fully covered by existing tests.

📊 Performance Considerations

Video Generation Timeout

  • Current: 5 minutes maximum (60 polls × 5 seconds)
  • Assessment: Reasonable for video generation, but could be too short for the longest videos (15 s, the maximum duration)
  • Recommendation: Consider increasing MAX_POLLS to 120 (10 minutes) or making it model-aware based on requested duration
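A duration-aware budget could look like the following hypothetical policy: a 300 s base plus 20 s per requested video second. The 20 s factor is arbitrary, chosen so that a 15 s clip gets the 10-minute (120-poll) window suggested above; it is not a measured generation time.

```python
import math


def max_polls_for(
    duration_s: int,
    poll_interval: float = 5.0,
    base_timeout: float = 300.0,
    extra_per_video_second: float = 20.0,
) -> int:
    """Scale the polling budget with the requested clip length (hypothetical policy)."""
    if not 1 <= duration_s <= 15:  # documented duration range
        raise ValueError("duration must be between 1 and 15 seconds")
    timeout = base_timeout + extra_per_video_second * duration_s
    return math.ceil(timeout / poll_interval)
```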

Polling Efficiency

  • Current: Fixed 5-second intervals
  • Consideration: Could implement exponential backoff to reduce API calls
  • Impact: Not critical given the 5-minute timeout window

✅ Final Verdict

Overall Assessment: APPROVE with minor changes recommended

This is a solid implementation that:

  • ✅ Follows existing patterns consistently
  • ✅ Has good code quality and documentation
  • ✅ Includes reasonable test coverage
  • ✅ Has no security concerns
  • ⚠️ Has one critical bug (unreachable error handling code)
  • ⚠️ Has a few minor issues that should be addressed

Required Changes

  1. Fix the unreachable error handling code in video polling logic

Recommended Changes

  1. Avoid the initial delay in video polling (check status before the first sleep)
  2. Move base64 import to top-level
  3. Add video edit integration test
  4. Add input validation for video edit operations
  5. Consider making timeout configurable

Optional Improvements

  1. Add MIME type to video artifacts
  2. Improve error messages with more context
  3. Add unit tests for edge cases
  4. Consider exponential backoff for polling

Great work on this integration! 🚀


Review performed by Claude Code PR Review

Kamilbenkirane merged commit 0ca70d0 into main on Jan 29, 2026
11 checks passed