Add Modal orchestrator with step operator and orchestrator flavors #3733

base: develop

Conversation
- Adds new Modal orchestrator flavor for serverless pipeline execution
- Implements optimized execution modes: `pipeline` (default) and `per_step`
- Supports GPU/CPU resource configuration with intelligent defaults
- Features persistent apps with warm containers for fast execution
- Includes comprehensive documentation and examples
- Simplifies execution model by removing redundant single_function mode

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Documentation Link Check Results: ✅ Absolute links check passed

✅ Branch tenant has been deployed! Access it at: https://staging.cloud.zenml.io/workspaces/feature-modal-orchestrator/projects
```python
elif log_age < 300:  # Only show logs from last 5 minutes
    # This log is recent enough to likely be ours
    logger.info(f"{log_msg}")
# Else: skip old logs
```
This bit seems like it could use more attention. Also, are you sure that we wouldn't have timezone mismatches here, etc.?
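One way to sidestep the timezone concern is to compare timezone-aware datetimes in UTC rather than doing naive clock arithmetic. A minimal sketch, not taken from the PR: the `log_age_seconds` helper and the choice to treat naive timestamps as UTC are both assumptions for illustration.

```python
from datetime import datetime, timezone


def log_age_seconds(log_timestamp: datetime) -> float:
    """Age of a log line in seconds, using timezone-aware UTC datetimes.

    Comparing aware datetimes in UTC avoids the timezone-mismatch issues
    that naive `time.time()` arithmetic against server timestamps can hit.
    """
    now = datetime.now(timezone.utc)
    # Assumption for this sketch: naive timestamps are treated as UTC,
    # not local time. The real log source should document which it emits.
    if log_timestamp.tzinfo is None:
        log_timestamp = log_timestamp.replace(tzinfo=timezone.utc)
    return (now - log_timestamp).total_seconds()


# A log stamped "now" is comfortably inside the 5-minute window:
assert log_age_seconds(datetime.now(timezone.utc)) < 300
```

Clock skew between the local machine and Modal's servers is a separate problem that no datetime handling fixes; the follow-up commit removed this time-based filtering entirely for that reason.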
```python
else:
    # Fallback to first step's resource settings if no pipeline-level resources
    if deployment.step_configurations:
        first_step = list(deployment.step_configurations.values())[0]
```
What if the first step is unrepresentative, though?
We should check, or at least make it really explicit in the docs that this is what we assume.
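The follow-up commit replaces the first-step fallback with the highest requirements across all steps. A minimal sketch of that idea, using plain dicts instead of ZenML's actual resource-settings objects (the shapes and defaults here are illustrative only):

```python
from typing import Dict, Optional


def max_resource_fallback(
    step_resources: Dict[str, Dict[str, Optional[float]]],
) -> Dict[str, float]:
    """Pick the highest CPU/memory requirement across all steps.

    `step_resources` maps step name -> {"cpu": ..., "memory_gb": ...};
    missing values fall back to modest defaults. The real ZenML objects
    differ -- this only illustrates the max-over-steps idea.
    """
    fallback = {"cpu": 1.0, "memory_gb": 1.0}
    for resources in step_resources.values():
        for key in fallback:
            value = resources.get(key)
            if value is not None and value > fallback[key]:
                fallback[key] = value
    return fallback


steps = {
    "load": {"cpu": 2.0, "memory_gb": 4.0},
    "train": {"cpu": 8.0, "memory_gb": 32.0},  # the demanding step
    "report": {"cpu": None, "memory_gb": None},
}
print(max_resource_fallback(steps))  # {'cpu': 8.0, 'memory_gb': 32.0}
```

The tradeoff: taking the max can over-provision cheap steps in `pipeline` mode, which is exactly why the docs should state the assumption explicitly.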
```shell
# Register the orchestrator with explicit credentials
zenml orchestrator register <ORCHESTRATOR_NAME> \
    --flavor=modal \
    --token=<MODAL_TOKEN> \
    --workspace=<MODAL_WORKSPACE> \
    --synchronous=true
```
I think this should use `--token-id` and `--token-secret` separately, as per the code?
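If the flavor does expose the credentials as separate flags, the registration would presumably look something like this (flag names follow the reviewer's suggestion, not verified against the final code):

```shell
zenml orchestrator register <ORCHESTRATOR_NAME> \
    --flavor=modal \
    --token-id=<MODAL_TOKEN_ID> \
    --token-secret=<MODAL_TOKEN_SECRET> \
    --workspace=<MODAL_WORKSPACE> \
    --synchronous=true
```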
### Authentication with different environments

For production deployments, you can specify different Modal environments:
Maybe add a little info box in this section (or even above, linking down here) saying that you might want two different stacks, each associated with a different Modal environment: one for production and one for development, etc.
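Such a setup might look like two registrations, one per Modal environment. This is a sketch: the `--environment` flag name is an assumption based on this section's wording, so check the final flavor for the actual option.

```shell
# Production stack pinned to the production Modal environment
zenml orchestrator register modal_prod --flavor=modal --environment=production
zenml stack register prod_stack -o modal_prod -a default

# Development stack pinned to a dev Modal environment
zenml orchestrator register modal_dev --flavor=modal --environment=dev
zenml stack register dev_stack -o modal_dev -a default

# Switch between them as needed
zenml stack set dev_stack
```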
```python
log_stream_active.set()
start_time = time.time()

def stream_logs() -> None:
```
Function-in-function smells a bit wrong, and I'm also wondering if we should instead use their Python SDK to stream the logs? https://github.com/modal-labs/modal-client/blob/4177d0b994ac69e01ada7d7a96655c9dcaae570e/modal/cli/utils.py#L24

Possibly something for down the line, though the func-in-func seems off.
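The follow-up commit extracts the nested function into a `ModalLogStreamer` class. A self-contained sketch of that shape, where the injected `fetch_lines` callable is a stand-in for Modal's actual log source (the real class in the PR will differ):

```python
import threading
from typing import Callable, Iterable


class ModalLogStreamer:
    """Log streaming as a small class instead of a nested function.

    The log source is injected so the lifecycle (start/stop via an Event,
    work on a daemon thread) is testable without Modal itself.
    """

    def __init__(
        self,
        fetch_lines: Callable[[], Iterable[str]],
        emit: Callable[[str], None] = print,
    ) -> None:
        self._fetch_lines = fetch_lines
        self._emit = emit
        self._active = threading.Event()

    def start(self) -> threading.Thread:
        self._active.set()
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._active.clear()

    def _run(self) -> None:
        for line in self._fetch_lines():
            if not self._active.is_set():
                break  # stop() was called; drop remaining lines
            self._emit(line)


collected = []
streamer = ModalLogStreamer(lambda: iter(["step 1 done", "step 2 done"]), collected.append)
streamer.start().join()
print(collected)
```

Compared with a closure, this makes the start/stop lifecycle explicit and lets tests substitute both the log source and the sink.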
```python
if TYPE_CHECKING:
    from zenml.integrations.modal.flavors.modal_orchestrator_flavor import (
        ModalOrchestratorConfig,
        ModalOrchestratorSettings,
    )

from zenml.integrations.modal.flavors.modal_orchestrator_flavor import (
    ModalExecutionMode,
)

if TYPE_CHECKING:
    from zenml.models import PipelineDeploymentResponse, PipelineRunResponse
    from zenml.models.v2.core.pipeline_deployment import PipelineDeploymentBase
```
combine the 'if TYPE_CHECKING' parts?
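Merged, the two `if TYPE_CHECKING:` blocks would collapse into one, with the single genuine runtime import kept separate. A sketch of the import layout only:

```python
from typing import TYPE_CHECKING

from zenml.integrations.modal.flavors.modal_orchestrator_flavor import (
    ModalExecutionMode,  # needed at runtime
)

if TYPE_CHECKING:  # type-only imports, combined into a single block
    from zenml.integrations.modal.flavors.modal_orchestrator_flavor import (
        ModalOrchestratorConfig,
        ModalOrchestratorSettings,
    )
    from zenml.models import PipelineDeploymentResponse, PipelineRunResponse
    from zenml.models.v2.core.pipeline_deployment import PipelineDeploymentBase
```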
The Modal orchestrator supports two execution modes:

1. **`pipeline` (default)**: Runs the entire pipeline in a single Modal function for maximum speed and cost efficiency
Not sure I understand why this `pipeline` option is max speed. Isn't it running everything sequentially in the same container? Wouldn't running things in parallel in separate Modal function calls run faster?
src/zenml/integrations/modal/flavors/modal_orchestrator_flavor.py (outdated review thread, resolved)
Using the ZenML `modal` integration, you can orchestrate and scale your ML pipelines on [Modal's](https://modal.com/) serverless cloud platform with minimal setup and maximum efficiency.

The Modal orchestrator is designed for speed and cost-effectiveness, running entire pipelines in single serverless functions to minimize cold starts and optimize resource utilization.
Maybe some representative screenshot of the Modal UI in here to make the docs a bit friendlier?
I think it's fine without.
- Extract nested log streaming function into ModalLogStreamer class for better code organization
- Remove unreliable timezone-based log filtering that could miss logs due to clock skew
- Implement smarter resource fallback: use highest requirements across all steps instead of potentially unrepresentative first step
- Add logging for resource selection decisions to improve debugging
- Fix function-in-function code smell identified in PR review

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Combine duplicate TYPE_CHECKING blocks into single import section
- Improve import organization and reduce redundancy
- Maintain all existing functionality while improving code structure

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Import MODAL_ORCHESTRATOR_FLAVOR constant from central location to avoid duplication
- Update requirements to modal>=1 after testing compatibility with both orchestrator and step operator
- Remove unnecessary utils import that was only for mypy discovery
- Maintain consistent import patterns across Modal integration files

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

Based on PR review feedback:
- Fix token authentication examples to use --token-id and --token-secret
- Add "When NOT to use it" section with clear tradeoffs and alternatives
- Add info boxes for environment separation best practices and cost implications
- Document Modal vs Step Operator differences with usage recommendations
- Add GPU base image requirements and CUDA compatibility warnings
- Clarify execution modes: "pipeline" mode reduces overhead vs enables parallelism
- Document resource fallback behavior and warming window defaults
- Add container warming cost implications with specific guidance
- Remove tracking pixel per review request
- Improve overall documentation clarity and completeness

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
```python
# Pass pipeline run ID for proper isolation (following other orchestrators' pattern)
if placeholder_run:
    environment["ZENML_PIPELINE_RUN_ID"] = str(placeholder_run.id)
```
@schustmi can i do this? Does this work?
src/zenml/integrations/modal/orchestrators/modal_orchestrator_entrypoint.py (outdated review thread, resolved)
ZenML CLI Performance Comparison (Threshold: 1.0s, Timeout: 60s, Slow: 5s)

❌ Failed Commands on Current Branch (feature/modal-orchestrator)

🚨 New Failures Introduced: the following commands fail on your branch but worked on the target branch.
```python
with modal.enable_output():
    # Create sandbox with the entrypoint command
    # Note: Modal sandboxes inherit environment from the image
    sb = await modal.Sandbox.create.aio(
```
Why is the async interface used to create sandboxes, instead of just `modal.Sandbox.create`?
- `await modal.Sandbox.create.aio()` properly integrates with the asyncio event loop
- `modal.Sandbox.create()` would make blocking system calls that don't yield control back to the event loop

In this specific case, since we're using `asyncio.run()` and only creating one sandbox at a time in pipeline mode, the practical difference might seem minimal. However:

- Per-step mode: when creating multiple sandboxes concurrently, async is crucial
- Future extensibility: allows for potential concurrent operations
- Modal's design: Modal's async API is optimized for better performance and proper resource management
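The concurrency argument can be demonstrated without Modal at all. A generic asyncio sketch, with `asyncio.sleep` standing in for the awaited sandbox creation: five concurrent awaits finish in roughly the time of one, where blocking calls would run sequentially.

```python
import asyncio
import time


async def create_sandbox(name: str) -> str:
    # Stand-in for `await modal.Sandbox.create.aio(...)`: a non-blocking
    # await lets other coroutines progress while the work happens elsewhere.
    await asyncio.sleep(0.1)
    return name


async def main() -> list:
    start = time.perf_counter()
    # In per-step mode, several sandboxes could be created concurrently:
    results = await asyncio.gather(
        *(create_sandbox(f"step-{i}") for i in range(5))
    )
    elapsed = time.perf_counter() - start
    # ~0.1s total rather than 5 * 0.1s sequential
    assert elapsed < 0.45
    return results


print(asyncio.run(main()))
```

A blocking `create()` inside the same event loop would serialize these calls and stall any concurrent log streaming while each one completes.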
I think you need maybe to add modal to the full list of orchestrators in the docs page?
Describe changes

Add Modal Orchestrator Integration

This PR adds a new Modal orchestrator to the ZenML integrations, enabling users to run complete ML pipelines on Modal's serverless cloud infrastructure with optimized performance and cost efficiency.

What does this PR do?

- Adds a new Modal orchestrator flavor for running ML pipelines on Modal's cloud platform
- Implements a default `pipeline` execution mode that runs the whole pipeline in a single Modal function for maximum speed
- Uses persistent apps with warm containers for faster execution
- Supports GPU/CPU resource configuration with intelligent defaults
Implementation Details

File Structure

```
src/zenml/integrations/modal/
├── orchestrators/                    # New directory
│   ├── __init__.py
│   └── modal_orchestrator.py         # Main orchestrator implementation
├── flavors/
│   ├── modal_orchestrator_flavor.py  # New flavor definition
│   └── __init__.py                   # Updated exports
└── __init__.py                       # Updated with orchestrator registration
```
Key Features

- Entire pipelines run in single serverless functions for maximum performance
- Resource configuration with intelligent defaults (CPU, 64GB RAM)
- Persistent apps with warm containers to minimize cold starts
- GPU base image and CUDA compatibility
Usage Example
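This section is empty in the extracted description; the following is a plausible sketch, assuming the `modal` flavor name and the `pipeline`/`per_step` execution modes described in this PR. The settings import path, class name, and field names are assumptions that may differ in the merged code, and running it requires ZenML with the Modal integration installed.

```python
from zenml import pipeline, step
from zenml.integrations.modal.flavors.modal_orchestrator_flavor import (
    ModalOrchestratorSettings,
)

@step
def train() -> None:
    ...

# Hypothetical settings; field names follow the flavor described in this PR.
modal_settings = ModalOrchestratorSettings(execution_mode="pipeline")

@pipeline(settings={"orchestrator": modal_settings})
def training_pipeline() -> None:
    train()

if __name__ == "__main__":
    training_pipeline()  # runs on the active stack's Modal orchestrator
```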
Why this approach?

- Running the whole pipeline in one Modal function minimizes inter-step overhead
- Follows the same patterns as other ZenML orchestrators for seamless stack integration
Breaking Changes
None - this is a new integration that doesn't affect existing functionality.
Dependencies

- `modal>=1` (per the requirements update in this PR)
Note: This orchestrator follows the same patterns as other ZenML orchestrators
(GCP Vertex, Kubernetes) and integrates seamlessly with the existing ZenML
stack architecture.
Note: I also updated the step operator logic to unify it
Pre-requisites

Please ensure you have done the following:

- [ ] My branch is based on `develop` and the open PR is targeting `develop`. If your branch wasn't based on develop, read the Contribution guide on rebasing branch to develop.

Types of changes