Add Modal orchestrator with step operator and orchestrator flavors #3733

base: develop

Conversation
- Adds new Modal orchestrator flavor for serverless pipeline execution
- Implements optimized execution modes: `pipeline` (default) and `per_step`
- Supports GPU/CPU resource configuration with intelligent defaults
- Features persistent apps with warm containers for fast execution
- Includes comprehensive documentation and examples
- Simplifies execution model by removing redundant single_function mode

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Documentation Link Check Results: ✅ Absolute links check passed

✅ Branch tenant has been deployed! Access it at: https://staging.cloud.zenml.io/workspaces/feature-modal-orchestrator/projects
```python
elif log_age < 300:  # Only show logs from last 5 minutes
    # This log is recent enough to likely be ours
    logger.info(f"{log_msg}")
# Else: skip old logs
```
This bit seems like it could use more attention. Also, are you sure that we wouldn't have timezone mismatches here, etc.?
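One way to sidestep the timezone concern is to compare timezone-aware datetimes in UTC rather than doing naive clock arithmetic. A minimal sketch, not taken from the PR: the `log_age_seconds` helper and the choice to treat naive timestamps as UTC are both assumptions for illustration.

```python
from datetime import datetime, timezone


def log_age_seconds(log_timestamp: datetime) -> float:
    """Age of a log line in seconds, using timezone-aware UTC datetimes.

    Comparing aware datetimes in UTC avoids the timezone-mismatch issues
    that naive `time.time()` arithmetic against server timestamps can hit.
    """
    now = datetime.now(timezone.utc)
    # Assumption for this sketch: naive timestamps are treated as UTC,
    # not local time. The real log source should document which it emits.
    if log_timestamp.tzinfo is None:
        log_timestamp = log_timestamp.replace(tzinfo=timezone.utc)
    return (now - log_timestamp).total_seconds()


# A log stamped "now" is comfortably inside the 5-minute window:
assert log_age_seconds(datetime.now(timezone.utc)) < 300
```

Clock skew between the local machine and Modal's servers is a separate problem that no datetime handling fixes; the follow-up commit removed this time-based filtering entirely for that reason.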
```python
else:
    # Fallback to first step's resource settings if no pipeline-level resources
    if deployment.step_configurations:
        first_step = list(deployment.step_configurations.values())[0]
```
What if the first step is unrepresentative, though?
We should check, or at least make it really explicit in the docs that this is what we assume.
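The follow-up commit replaces the first-step fallback with the highest requirements across all steps. A minimal sketch of that idea, using plain dicts instead of ZenML's actual resource-settings objects (the shapes and defaults here are illustrative only):

```python
from typing import Dict, Optional


def max_resource_fallback(
    step_resources: Dict[str, Dict[str, Optional[float]]],
) -> Dict[str, float]:
    """Pick the highest CPU/memory requirement across all steps.

    `step_resources` maps step name -> {"cpu": ..., "memory_gb": ...};
    missing values fall back to modest defaults. The real ZenML objects
    differ -- this only illustrates the max-over-steps idea.
    """
    fallback = {"cpu": 1.0, "memory_gb": 1.0}
    for resources in step_resources.values():
        for key in fallback:
            value = resources.get(key)
            if value is not None and value > fallback[key]:
                fallback[key] = value
    return fallback


steps = {
    "load": {"cpu": 2.0, "memory_gb": 4.0},
    "train": {"cpu": 8.0, "memory_gb": 32.0},  # the demanding step
    "report": {"cpu": None, "memory_gb": None},
}
print(max_resource_fallback(steps))  # {'cpu': 8.0, 'memory_gb': 32.0}
```

The tradeoff: taking the max can over-provision cheap steps in `pipeline` mode, which is exactly why the docs should state the assumption explicitly.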
```shell
# Register the orchestrator with explicit credentials
zenml orchestrator register <ORCHESTRATOR_NAME> \
    --flavor=modal \
    --token=<MODAL_TOKEN> \
    --workspace=<MODAL_WORKSPACE> \
    --synchronous=true
```
I think this should use `--token-id` and `--token-secret` separately, as per the code?
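If the flavor does expose the credentials as separate flags, the registration would presumably look something like this (flag names follow the reviewer's suggestion, not verified against the final code):

```shell
zenml orchestrator register <ORCHESTRATOR_NAME> \
    --flavor=modal \
    --token-id=<MODAL_TOKEN_ID> \
    --token-secret=<MODAL_TOKEN_SECRET> \
    --workspace=<MODAL_WORKSPACE> \
    --synchronous=true
```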
### Authentication with different environments

For production deployments, you can specify different Modal environments:
Maybe add a little info box in this section (or even above, linking down here) saying that you might want two different stacks, each associated with a different Modal environment: one for production and one for development, etc.
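Such a setup might look like two registrations, one per Modal environment. This is a sketch: the `--environment` flag name is an assumption based on this section's wording, so check the final flavor for the actual option.

```shell
# Production stack pinned to the production Modal environment
zenml orchestrator register modal_prod --flavor=modal --environment=production
zenml stack register prod_stack -o modal_prod -a default

# Development stack pinned to a dev Modal environment
zenml orchestrator register modal_dev --flavor=modal --environment=dev
zenml stack register dev_stack -o modal_dev -a default

# Switch between them as needed
zenml stack set dev_stack
```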
```python
log_stream_active.set()
start_time = time.time()

def stream_logs() -> None:
```
Function-in-function smells a bit wrong, and I'm also wondering if we should instead use their Python SDK to stream the logs? https://github.com/modal-labs/modal-client/blob/4177d0b994ac69e01ada7d7a96655c9dcaae570e/modal/cli/utils.py#L24

Possibly something for down the line, though the func-in-func seems off.
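The follow-up commit extracts the nested function into a `ModalLogStreamer` class. A self-contained sketch of that shape, where the injected `fetch_lines` callable is a stand-in for Modal's actual log source (the real class in the PR will differ):

```python
import threading
from typing import Callable, Iterable


class ModalLogStreamer:
    """Log streaming as a small class instead of a nested function.

    The log source is injected so the lifecycle (start/stop via an Event,
    work on a daemon thread) is testable without Modal itself.
    """

    def __init__(
        self,
        fetch_lines: Callable[[], Iterable[str]],
        emit: Callable[[str], None] = print,
    ) -> None:
        self._fetch_lines = fetch_lines
        self._emit = emit
        self._active = threading.Event()

    def start(self) -> threading.Thread:
        self._active.set()
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._active.clear()

    def _run(self) -> None:
        for line in self._fetch_lines():
            if not self._active.is_set():
                break  # stop() was called; drop remaining lines
            self._emit(line)


collected = []
streamer = ModalLogStreamer(lambda: iter(["step 1 done", "step 2 done"]), collected.append)
streamer.start().join()
print(collected)
```

Compared with a closure, this makes the start/stop lifecycle explicit and lets tests substitute both the log source and the sink.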
```python
if TYPE_CHECKING:
    from zenml.integrations.modal.flavors.modal_orchestrator_flavor import (
        ModalOrchestratorConfig,
        ModalOrchestratorSettings,
    )

from zenml.integrations.modal.flavors.modal_orchestrator_flavor import (
    ModalExecutionMode,
)

if TYPE_CHECKING:
    from zenml.models import PipelineDeploymentResponse, PipelineRunResponse
    from zenml.models.v2.core.pipeline_deployment import PipelineDeploymentBase
```
combine the 'if TYPE_CHECKING' parts?
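Merged, the two `if TYPE_CHECKING:` blocks would collapse into one, with the single genuine runtime import kept separate. A sketch of the import layout only:

```python
from typing import TYPE_CHECKING

from zenml.integrations.modal.flavors.modal_orchestrator_flavor import (
    ModalExecutionMode,  # needed at runtime
)

if TYPE_CHECKING:  # type-only imports, combined into a single block
    from zenml.integrations.modal.flavors.modal_orchestrator_flavor import (
        ModalOrchestratorConfig,
        ModalOrchestratorSettings,
    )
    from zenml.models import PipelineDeploymentResponse, PipelineRunResponse
    from zenml.models.v2.core.pipeline_deployment import PipelineDeploymentBase
```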
The Modal orchestrator supports two execution modes:

1. **`pipeline` (default)**: Runs the entire pipeline in a single Modal function for maximum speed and cost efficiency
Not sure I understand why this `pipeline` option is max speed. Isn't it running everything sequentially in the same container? Wouldn't running things in parallel in separate Modal function calls run faster?
src/zenml/integrations/modal/flavors/modal_orchestrator_flavor.py (outdated review thread, resolved)
Using the ZenML `modal` integration, you can orchestrate and scale your ML pipelines on [Modal's](https://modal.com/) serverless cloud platform with minimal setup and maximum efficiency.

The Modal orchestrator is designed for speed and cost-effectiveness, running entire pipelines in single serverless functions to minimize cold starts and optimize resource utilization.
Maybe some representative screenshot of the Modal UI in here to make the docs a bit friendlier?
I think it's fine without.
- Extract nested log streaming function into ModalLogStreamer class for better code organization
- Remove unreliable timezone-based log filtering that could miss logs due to clock skew
- Implement smarter resource fallback: use highest requirements across all steps instead of potentially unrepresentative first step
- Add logging for resource selection decisions to improve debugging
- Fix function-in-function code smell identified in PR review

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Combine duplicate TYPE_CHECKING blocks into single import section
- Improve import organization and reduce redundancy
- Maintain all existing functionality while improving code structure

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Import MODAL_ORCHESTRATOR_FLAVOR constant from central location to avoid duplication
- Update requirements to modal>=1 after testing compatibility with both orchestrator and step operator
- Remove unnecessary utils import that was only for mypy discovery
- Maintain consistent import patterns across Modal integration files

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

Based on PR review feedback:
- Fix token authentication examples to use --token-id and --token-secret
- Add "When NOT to use it" section with clear tradeoffs and alternatives
- Add info boxes for environment separation best practices and cost implications
- Document Modal vs Step Operator differences with usage recommendations
- Add GPU base image requirements and CUDA compatibility warnings
- Clarify execution modes: "pipeline" mode reduces overhead vs enables parallelism
- Document resource fallback behavior and warming window defaults
- Add container warming cost implications with specific guidance
- Remove tracking pixel per review request
- Improve overall documentation clarity and completeness

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
```python
# Pass pipeline run ID for proper isolation (following other orchestrators' pattern)
if placeholder_run:
    environment["ZENML_PIPELINE_RUN_ID"] = str(placeholder_run.id)
```
@schustmi can i do this? Does this work?
src/zenml/integrations/modal/orchestrators/modal_orchestrator_entrypoint.py (outdated review thread, resolved)
ZenML CLI Performance Comparison (Threshold: 1.0s, Timeout: 60s, Slow: 5s)

❌ Failed Commands on Current Branch (feature/modal-orchestrator)

🚨 New Failures Introduced: the following commands fail on your branch but worked on the target branch.
```python
with modal.enable_output():
    # Create sandbox with the entrypoint command
    # Note: Modal sandboxes inherit environment from the image
    sb = await modal.Sandbox.create.aio(
```
Why is the async interface used to create sandboxes, instead of just `modal.Sandbox.create`?
- `await modal.Sandbox.create.aio()` properly integrates with the asyncio event loop
- `modal.Sandbox.create()` would make blocking system calls that don't yield control back to the event loop

In this specific case, since we're using `asyncio.run()` and only creating one sandbox at a time in pipeline mode, the practical difference might seem minimal. However:

- Per-step mode: when creating multiple sandboxes concurrently, async is crucial
- Future extensibility: allows for potential concurrent operations
- Modal's design: Modal's async API is optimized for better performance and proper resource management
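The concurrency argument can be demonstrated without Modal at all. A generic asyncio sketch, with `asyncio.sleep` standing in for the awaited sandbox creation: five concurrent awaits finish in roughly the time of one, where blocking calls would run sequentially.

```python
import asyncio
import time


async def create_sandbox(name: str) -> str:
    # Stand-in for `await modal.Sandbox.create.aio(...)`: a non-blocking
    # await lets other coroutines progress while the work happens elsewhere.
    await asyncio.sleep(0.1)
    return name


async def main() -> list:
    start = time.perf_counter()
    # In per-step mode, several sandboxes could be created concurrently:
    results = await asyncio.gather(
        *(create_sandbox(f"step-{i}") for i in range(5))
    )
    elapsed = time.perf_counter() - start
    # ~0.1s total rather than 5 * 0.1s sequential
    assert elapsed < 0.45
    return results


print(asyncio.run(main()))
```

A blocking `create()` inside the same event loop would serialize these calls and stall any concurrent log streaming while each one completes.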
I think you need maybe to add modal to the full list of orchestrators in the docs page?
Describe changes

Add Modal Orchestrator Integration

This PR adds a new Modal orchestrator to the ZenML integrations, enabling users to run complete ML pipelines on Modal's serverless cloud infrastructure with optimized performance and cost efficiency.

What does this PR do?

- Adds a new Modal orchestrator flavor for running ML pipelines on Modal's cloud platform
- Implements a default `pipeline` execution mode that runs the whole pipeline in a single Modal function for maximum speed
- Uses persistent apps with warm containers for faster execution
- Supports GPU/CPU resource configuration with intelligent defaults
Implementation Details

File Structure

```
src/zenml/integrations/modal/
├── orchestrators/                    # New directory
│   ├── __init__.py
│   └── modal_orchestrator.py         # Main orchestrator implementation
├── flavors/
│   ├── modal_orchestrator_flavor.py  # New flavor definition
│   └── __init__.py                   # Updated exports
└── __init__.py                       # Updated with orchestrator registration
```
Key Features

- Entire pipelines run in single serverless functions for maximum performance
- Resource configuration with intelligent defaults (CPU, 64GB RAM)
- Persistent apps with warm containers to minimize cold starts
- GPU base image and CUDA compatibility
Usage Example
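This section is empty in the extracted description; the following is a plausible sketch, assuming the `modal` flavor name and the `pipeline`/`per_step` execution modes described in this PR. The settings import path, class name, and field names are assumptions that may differ in the merged code, and running it requires ZenML with the Modal integration installed.

```python
from zenml import pipeline, step
from zenml.integrations.modal.flavors.modal_orchestrator_flavor import (
    ModalOrchestratorSettings,
)

@step
def train() -> None:
    ...

# Hypothetical settings; field names follow the flavor described in this PR.
modal_settings = ModalOrchestratorSettings(execution_mode="pipeline")

@pipeline(settings={"orchestrator": modal_settings})
def training_pipeline() -> None:
    train()

if __name__ == "__main__":
    training_pipeline()  # runs on the active stack's Modal orchestrator
```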
Why this approach?

- Running the whole pipeline in one Modal function minimizes inter-step overhead
- Follows the same patterns as other ZenML orchestrators for seamless stack integration
Breaking Changes
None - this is a new integration that doesn't affect existing functionality.
Dependencies

- `modal>=1` (per the requirements update in this PR)
Note: This orchestrator follows the same patterns as other ZenML orchestrators
(GCP Vertex, Kubernetes) and integrates seamlessly with the existing ZenML
stack architecture.
Note: I also updated the step operator logic to unify it
Pre-requisites

Please ensure you have done the following:

- [ ] My branch is based on `develop` and the open PR is targeting `develop`. If your branch wasn't based on develop, read the Contribution guide on rebasing branch to develop.

Types of changes