diff --git a/README.md b/README.md
index a8b63338c9..3abf8f90a2 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,9 @@
+
+
+
+Create an internal MLOps platform for your entire machine learning team.
+
+
+
-
-🏁 Table of Contents
-
-1. Introduction
-2. Quickstart
-3. Learning
-4. Roadmap
-5. Contributing and Community
-6. Getting Help
-7. License
-
-
-
-
-# 🤖 Introduction
-
-🤹 ZenML is an extensible, open-source MLOps framework for creating portable,
-production-ready machine learning pipelines. By decoupling infrastructure from
-code, ZenML enables developers across your organization to collaborate more
-effectively as they develop to production.
-
-- 💼 ZenML gives data scientists the freedom to fully focus on modeling and
-experimentation while writing code that is production-ready from the get-go.
-
-- 👨‍💻 ZenML empowers ML engineers to take ownership of the entire ML lifecycle
-  end-to-end. Adopting ZenML means fewer handover points and more visibility on
-  what is happening in your organization.
-
-- 🛫 ZenML enables MLOps infrastructure experts to define, deploy, and manage
-sophisticated production environments that are easy to use for colleagues.
+## 🤸 Quickstart
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zenml-io/zenml/blob/main/examples/quickstart/quickstart.ipynb)
-
-ZenML Hero
-
+[Install ZenML](https://docs.zenml.io/getting-started/installation) via [PyPI](https://pypi.org/project/zenml/). Python 3.8 - 3.11 is required:
+
+```bash
+pip install "zenml[server]" notebook
+```
+
+Take a tour with the guided quickstart by running:
-# 🛠️ Why ZenML?
+```bash
+zenml go
+```
-![Walkthrough of ZenML Model Control Plane (Dashboard available only on ZenML Cloud)](/docs/book/.gitbook/assets/mcp_walkthrough.gif)
+## 🪄 Simple, integrated, end-to-end MLOps
-ZenML offers a systematic approach to structuring your machine learning codebase for a seamless transition to production. It's an ideal solution for teams grappling with establishing an internal standard for coordinating ML operations. ZenML provides not just a tool, but a workflow strategy that guides you in integrating all your tools and infrastructure effectively.
+### Create machine learning pipelines with minimal code changes
-Use ZenML if:
+ZenML is an MLOps framework intended for data scientists and ML engineers who want to standardize their machine learning practices. Just add `@step` and `@pipeline` to your existing Python functions to get going. Here is a toy example:
-- You need to easily automate ML workflows on services like an Airflow cluster or AWS Sagemaker Pipelines.
-- Your ML tasks require repeatability and reproducibility.
-- Automating and standardizing ML workflows across your team is a challenge.
-- Your team integrates multiple tools with no central platform.
-- You'd like a single place to track data, code, configuration, and models along with your cloud artifact storage.
-- Collaboration and hand-overs between multiple teams is difficult.
+```python
+from zenml import pipeline, step
-# ☄️ What makes ZenML different?
+@step  # Just add this decorator
+def load_data() -> dict:
+    training_data = [[1, 2], [3, 4], [5, 6]]
+    labels = [0, 1, 0]
+    return {'features': training_data, 'labels': labels}
-![Before and after ZenML](/docs/book/.gitbook/assets/zenml-why.png)
+@step
+def train_model(data: dict) -> None:
+    total_features = sum(map(sum, data['features']))
+    total_labels = sum(data['labels'])
+
+    print(f"Trained model using {len(data['features'])} data points. "
+          f"Feature sum is {total_features}, label sum is {total_labels}")
-ZenML marries the capabilities of a classic pipeline tool like [Airflow](https://airflow.apache.org/) and a metadata tracking service like [MLflow](https://mlflow.org/). Furthermore, both these types of tools can seamlessly co-exist with ZenML, providing a comprehensive, end-to-end ML experience.
+@pipeline  # This function combines steps together
+def simple_ml_pipeline():
+    dataset = load_data()
+    train_model(dataset)
-It excels at:
+if __name__ == "__main__":
+    run = simple_ml_pipeline()  # call this to run the pipeline
+```
-- Enabling creation of simple, pythonic [ML pipelines](https://docs.zenml.io/user-guide/starter-guide/create-an-ml-pipeline) that function locally and on any [orchestration backend](https://docs.zenml.io/user-guide/production-guide/cloud-orchestration).
-- Automating versioning of [data](https://docs.zenml.io/user-guide/starter-guide/manage-artifacts) and [models](https://docs.zenml.io/user-guide/starter-guide/track-ml-models) on [remote artifact storage like S3](https://docs.zenml.io/user-guide/production-guide/remote-storage).
-- Abstracting infrastructure and run configuration from code through a [simple YAML config](https://docs.zenml.io/user-guide/advanced-guide/pipelining-features/configure-steps-pipelines).
-- Logging complex [metadata](https://docs.zenml.io/user-guide/advanced-guide/data-management/logging-metadata) for models and artifacts.
-- Automatically containerizing and deploying your workflows to the cloud, connected to your [code repository](https://docs.zenml.io/user-guide/production-guide/connect-code-repository).
-- Connecting your [secret store](https://docs.zenml.io/user-guide/advanced-guide/secret-management) to your ML workflows.
+![Running a ZenML pipeline](/docs/book/.gitbook/assets/readme_basic_pipeline.gif)
-However, ZenML doesn't:
+### Deploy workloads easily on your production infrastructure
-- Automatically create visuals and track experiments: It [integrates with experiment trackers](https://docs.zenml.io/stacks-and-components/component-guide/experiment-trackers) that specialize in this task.
-- Package and deploy models: ZenML catalogs models and metadata, streamlining model deployment. Refer to [ZenML model deployers](https://docs.zenml.io/stacks-and-components/component-guide/model-deployers) for more information.
-- Handle distributed computation: While ZenML pipelines scale vertically with ease, it [works with tools like Spark](https://docs.zenml.io/stacks-and-components/component-guide/step-operators/spark-kubernetes) for intricate distributed workflows.
+The framework is a gentle entry point for practitioners to build complex ML pipelines with little knowledge of the underlying infrastructure. ZenML pipelines can run on AWS, GCP, Azure, Airflow, Kubeflow, and even plain Kubernetes without changing any code or needing to know the underlying internals.
-# 🤸 Quickstart
+```python
+from zenml.config import ResourceSettings, DockerSettings
-[Install ZenML](https://docs.zenml.io/getting-started/installation) via
-[PyPI](https://pypi.org/project/zenml/). Python 3.8 - 3.11 is required:
+@step(
+    settings={
+        "resources": ResourceSettings(cpu_count=8, gpu_count=1, memory="16GB"),
+        "docker": DockerSettings(parent_image="pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime")
+    }
+)
+def training(...):
+    ...
+```

 ```bash
-pip install "zenml[server]"
-# you'll also need the `notebook` package installed to run Jupyter notebooks:
-# OPTIONALLY: `pip install notebook`
+zenml stack set k8s  # Set a stack with the Kubernetes orchestrator
+python run.py
 ```

-Take a tour with the guided quickstart by running:
+**GIF of running a pipeline and running on k8s (potentially showing other DAG renders as well?)**

-```bash
-zenml go
+### Track models, pipelines, and artifacts
+
+Create a complete lineage of your entire process: find out who produced which model, at what time, with which data, and on which version of the code. This guarantees full reproducibility and auditability.
+
+```python
+from zenml import Model
+
+@step(model=Model(name="classification"))
+def trainer(training_df: pd.DataFrame) -> Annotated[torch.nn.Module, "model"]:
+    ...
 ```

-# 🔋 Deploy ZenML
+![Exploring ZenML Models](/docs/book/.gitbook/assets/readme_mcp.gif)

-For full functionality ZenML should be deployed on the cloud to
-enable collaborative features as the central MLOps interface for teams.
+### Purpose-built for machine learning, with integrations to your favorite tools
-
-ZenML Architecture Diagram.
-
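To make the `@step` / `@pipeline` idea above concrete without installing anything, here is a minimal, self-contained sketch of how decorator-based pipelines can work in plain Python. This is an illustration only — the `step` and `pipeline` decorators below are hypothetical stand-ins, not ZenML's actual implementation:

```python
from functools import wraps

def step(fn):
    """Hypothetical stand-in: tag a function as a pipeline step and log its execution."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        print(f"[step] {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

def pipeline(fn):
    """Hypothetical stand-in: calling the decorated function runs the wired-up steps."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        print(f"[pipeline] {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

@step
def load_data() -> dict:
    return {"features": [[1, 2], [3, 4], [5, 6]], "labels": [0, 1, 0]}

@step
def train_model(data: dict) -> int:
    # A toy "model": the sum of all feature values.
    return sum(map(sum, data["features"]))

@pipeline
def simple_ml_pipeline() -> int:
    return train_model(load_data())

print(simple_ml_pipeline())  # 1+2+3+4+5+6 = 21
```

A real orchestrator additionally records each step's inputs and outputs, which is what enables the lineage and artifact tracking described in this README.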
+While ZenML brings a lot of value out of the box, it also integrates into your existing tooling and infrastructure without locking you in.

-Currently, there are two main options to deploy ZenML:
+```python
+from bentoml import Bento

-- **ZenML Cloud**: With [ZenML Cloud](cloud.zenml.io/?utm_source=readme&utm_medium=referral_link&utm_campaign=cloud_promotion&utm_content=signup_link),
-you can utilize a control plane to create ZenML servers, also known as tenants.
-These tenants are managed and maintained by ZenML's dedicated team, alleviating
-the burden of server management from your end.
+@step(on_failure=alert_slack, experiment_tracker="mlflow")
+def train_and_deploy(training_df: pd.DataFrame) -> Bento:
+    mlflow.autolog()
+    ...
+    return bento
+```

-- **Self-hosted deployment**: Alternatively, you have the flexibility to [deploy
-ZenML on your own self-hosted environment](https://docs.zenml.io/deploying-zenml/zenml-self-hosted).
-This can be achieved through various methods, including using our CLI, Docker,
-Helm, or HuggingFace Spaces.
+![Exploring ZenML Integrations](/docs/book/.gitbook/assets/readme_integrations.gif)

-# 🖼️ Learning
+## 🖼️ Learning

-The best way to learn about ZenML is the [docs](https://docs.zenml.io). We recommend beginning with the [Starter Guide](https://docs.zenml.io/user-guide/starter-guide) to get up and running quickly.
+The best way to learn about ZenML is the [docs](https://docs.zenml.io/). We recommend beginning with the [Starter Guide](https://docs.zenml.io/user-guide/starter-guide) to get up and running quickly.

 For inspiration, here are some other examples and use cases:

 1. [E2E Batch Inference](examples/e2e/): Feature engineering, training, and inference pipelines for tabular machine learning.
 2. [Basic NLP with BERT](examples/e2e_nlp/): Feature engineering, training, and inference focused on NLP.
 3. [LLM RAG Pipeline with Langchain and OpenAI](https://github.com/zenml-io/zenml-projects/tree/main/llm-agents): Using Langchain to create a simple RAG pipeline.
-4. [Huggingface Model to Sagemaker Endpoint](https://github.com/zenml-io/zenml-projects/tree/main/huggingface-sagemaker): Automated MLOps on Amazon Sagemaker and HuggingFace.
+4. [Huggingface Model to Sagemaker Endpoint](https://github.com/zenml-io/zenml-projects/tree/main/huggingface-sagemaker): Automated MLOps on Amazon Sagemaker and HuggingFace.
+5. [LLMops](https://github.com/zenml-io/zenml-projects/tree/main/llm-complete-guide): A complete guide to working with LLMs in ZenML.
+
+## 🔋 Deploy ZenML
+
+For full functionality, ZenML should be deployed in the cloud to
+enable collaborative features as the central MLOps interface for teams.
+
+Currently, there are two main ways to deploy ZenML:
-# Use ZenML with VS Code
+- **ZenML Cloud**: With [ZenML Cloud](https://cloud.zenml.io/?utm_source=readme&utm_medium=referral_link&utm_campaign=cloud_promotion&utm_content=signup_link),
+you can make use of a control plane to create ZenML servers, also known as tenants.
+These tenants are managed and maintained by ZenML's dedicated team, alleviating
+the burden of server management from your end.
+- **Self-hosted deployment**: Alternatively, you have the flexibility to [deploy
+ZenML on your own self-hosted environment](https://docs.zenml.io/deploying-zenml/zenml-self-hosted).
+This can be achieved through various methods, including using our CLI, Docker,
+Helm, or HuggingFace Spaces.
-ZenML has a [VS Code
-extension](https://marketplace.visualstudio.com/items?itemName=ZenML.zenml-vscode)
-that allows you to inspect your stacks and pipeline runs directly from your
-editor. The extension also allows you to switch your stacks without needing to
-type any CLI commands.
+## Use ZenML with VS Code
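The stack switching shown earlier (`zenml stack set k8s`) — and what the VS Code extension does behind the scenes — can be pictured as selecting the active entry in a registry of named component bundles. A toy, self-contained illustration, not ZenML's real data model (`Stack` and `StackRegistry` here are hypothetical stand-ins):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stack:
    """Hypothetical stand-in: a named bundle of infrastructure components."""
    name: str
    orchestrator: str
    artifact_store: str

class StackRegistry:
    """Hypothetical stand-in: holds registered stacks and tracks the active one."""
    def __init__(self):
        self._stacks = {}
        self.active = None

    def register(self, stack: Stack) -> None:
        self._stacks[stack.name] = stack

    def set_active(self, name: str) -> None:
        # Analogous in spirit to `zenml stack set <name>`.
        self.active = self._stacks[name]

registry = StackRegistry()
registry.register(Stack("default", orchestrator="local", artifact_store="local"))
registry.register(Stack("k8s", orchestrator="kubernetes", artifact_store="s3"))
registry.set_active("k8s")
print(registry.active.orchestrator)  # kubernetes
```

In ZenML itself, stacks live on the server and are managed via the CLI or dashboard rather than an in-process registry like this one.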
-🖥️ VS Code Extension in Action!
-
-ZenML Extension
-
-
+ZenML has a [VS Code extension](https://marketplace.visualstudio.com/items?itemName=ZenML.zenml-vscode) that allows you to inspect your stacks and pipeline runs directly from your editor. The extension also allows you to switch your stacks without needing to type any CLI commands.
+
+- 🖥️ VS Code Extension in Action!
+
+  ![ZenML VS Code Extension](/docs/book/.gitbook/assets/zenml-extension-shortened.gif)
+

-# 🗺 Roadmap
+## 🗺 Roadmap

-ZenML is being built in public. The [roadmap](https://zenml.io/roadmap) is a
-regularly updated source of truth for the ZenML community to understand where
-the product is going in the short, medium, and long term.
+ZenML is being built in public. The [roadmap](https://zenml.io/roadmap) is a regularly updated source of truth for the ZenML community to understand where the product is going in the short, medium, and long term.

-ZenML is managed by a [core team](https://zenml.io/company) of
-developers that are responsible for making key decisions and incorporating
-feedback from the community. The team oversees feedback via various channels,
+ZenML is managed by a [core team](https://zenml.io/company) of developers who are responsible for making key decisions and incorporating feedback from the community. The team oversees feedback via various channels,
 and you can directly influence the roadmap as follows:

 - Vote on your most wanted feature on our [Discussion
-  board](https://zenml.io/discussion).
+board](https://zenml.io/discussion).
 - Start a thread in our [Slack channel](https://zenml.io/slack).
-- [Create an issue](https://github.com/zenml-io/zenml/issues/new/choose) on our
-  GitHub repo.
+- [Create an issue](https://github.com/zenml-io/zenml/issues/new/choose) on our GitHub repo.

-# 🙌 Contributing and Community
+## 🙌 Contributing and Community

 We would love to develop ZenML together with our community!
 The best way to get
-started is to select any issue from the [`good-first-issue`
+started is to select any issue from the [`good-first-issue`
 label](https://github.com/issues?q=is%3Aopen+is%3Aissue+archived%3Afalse+user%3Azenml-io+label%3A%22good+first+issue%22)
-and open up a Pull Request! If you
+and open up a Pull Request!
+
+If you
 would like to contribute, please review our [Contributing
 Guide](CONTRIBUTING.md) for all relevant details.

-# 🆘 Getting Help
+## 🆘 Getting Help

 The first point of call should
 be [our Slack group](https://zenml.io/slack-invite/).
@@ -277,17 +251,40 @@
 Or, if you
 prefer, [open an issue](https://github.com/zenml-io/zenml/issues/new/choose) on our
 GitHub repo.

-# Vulnerability affecting `zenml<0.46.7` (CVE-2024-25723)
-
-We have identified a critical security vulnerability in ZenML versions prior to
-0.46.7. This vulnerability potentially allows unauthorized users to take
-ownership of ZenML accounts through the user activation feature. Please [read our
-blog post](https://www.zenml.io/blog/critical-security-update-for-zenml-users)
-for more information on how we've addressed this.
-
-# 📜 License
+## 📜 License

 ZenML is distributed under the terms of the Apache License Version 2.0.
 A complete version of the license is available in the [LICENSE](LICENSE) file in
 this repository. Any contribution made to this project will be licensed under
 the Apache License Version 2.0.
+
+
+
+
+
+  Join our Slack community and be part of the ZenML family.
+
+  Features ·
+  Roadmap ·
+  Report Bug ·
+  Sign up for Cloud ·
+  Read Blog ·
+  Contribute to Open Source ·
+  Projects Showcase
+
+  🎉 Version 0.57.0 is out. Check out the release notes here.
+
+  🖥️ Download our VS Code Extension here.
+
+
\ No newline at end of file
diff --git a/docs/book/.gitbook/assets/readme_basic_pipeline.gif b/docs/book/.gitbook/assets/readme_basic_pipeline.gif
new file mode 100644
index 0000000000..fa11d4e691
Binary files /dev/null and b/docs/book/.gitbook/assets/readme_basic_pipeline.gif differ
diff --git a/docs/book/.gitbook/assets/readme_integrations.gif b/docs/book/.gitbook/assets/readme_integrations.gif
new file mode 100644
index 0000000000..9d0262304d
Binary files /dev/null and b/docs/book/.gitbook/assets/readme_integrations.gif differ
diff --git a/docs/book/.gitbook/assets/readme_mcp.gif b/docs/book/.gitbook/assets/readme_mcp.gif
new file mode 100644
index 0000000000..565667491b
Binary files /dev/null and b/docs/book/.gitbook/assets/readme_mcp.gif differ