Create tags table #2036

Merged · 37 commits · Nov 14, 2023

Changes from 26 commits

Commits (37)
c9729d9
working option 2
avishniakov Nov 9, 2023
caecabc
working option 2
avishniakov Nov 9, 2023
27624c6
working option 2
avishniakov Nov 9, 2023
15df83c
fix mysql
avishniakov Nov 9, 2023
a813455
Merge branch 'develop' into feature/OSS-2610-create-tags-table
avishniakov Nov 9, 2023
331e1a1
fix bug + add tests
avishniakov Nov 9, 2023
7ab1a22
rename
avishniakov Nov 9, 2023
200776e
Merge branch 'develop' into feature/OSS-2610-create-tags-table
avishniakov Nov 10, 2023
a265b64
add alembic branches output on divergence
avishniakov Nov 10, 2023
4ff8b26
add client functions
avishniakov Nov 10, 2023
4438a5f
strenums for types/colors
avishniakov Nov 10, 2023
9f7daf6
add tags cli
avishniakov Nov 10, 2023
e9b539a
add tags cli
avishniakov Nov 10, 2023
da14b2b
try bypass alembic branching
avishniakov Nov 10, 2023
fb26bd4
remove tag<>resource endpoints
avishniakov Nov 10, 2023
df82b6e
rely on sql for tag links
avishniakov Nov 10, 2023
1dd232b
Merge branch 'develop' into feature/OSS-2610-create-tags-table
avishniakov Nov 10, 2023
17bced5
fix migration bug with uuids
avishniakov Nov 10, 2023
3bb6f82
remove `tagged`
avishniakov Nov 10, 2023
a8f508e
calm down branching check on zenml import
avishniakov Nov 10, 2023
3206946
update signature in tests
avishniakov Nov 10, 2023
994b59a
Merge branch 'develop' into feature/OSS-2610-create-tags-table
avishniakov Nov 10, 2023
ee57e87
update signature in tests
avishniakov Nov 10, 2023
7033f23
Merge branch 'develop' into feature/OSS-2610-create-tags-table
avishniakov Nov 10, 2023
4b51589
resolve branching
avishniakov Nov 10, 2023
81a0390
Auto-update of E2E template
actions-user Nov 10, 2023
de13f1e
move tagging code to sql store
avishniakov Nov 13, 2023
4adcd0b
resolve branching
avishniakov Nov 13, 2023
dce6439
Merge branch 'develop' into feature/OSS-2610-create-tags-table
avishniakov Nov 13, 2023
3f79f50
resolve alembic
avishniakov Nov 13, 2023
08649c8
stabilize test case
avishniakov Nov 13, 2023
005adcb
better cleanups in tests
avishniakov Nov 13, 2023
d6e76aa
workaround fix for quickstart
avishniakov Nov 13, 2023
5324f2b
revert hard cleanup
avishniakov Nov 13, 2023
1355e7c
explicit asserts in cli
avishniakov Nov 13, 2023
f388ae0
revert workaround fix for quickstart
avishniakov Nov 13, 2023
ed33c9f
Temporarily fix quickstart until the certificate is renewed
stefannica Nov 13, 2023
2 changes: 2 additions & 0 deletions .github/workflows/setup-python-environment.yml
@@ -152,6 +152,8 @@ jobs:
        if: ${{ inputs.os == 'ubuntu-latest' && inputs.python-version == '3.8' }}

      - name: Check for alembic branch divergence
        env:
          ZENML_DEBUG: 0
        run: |
          bash scripts/check-alembic-branches.sh

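For context, the `check-alembic-branches.sh` script invoked above is not part of this diff. A rough Python equivalent of such a check (a hedged sketch, not the actual script) would ask Alembic for its current heads and fail when more than one exists:

```python
# Hypothetical sketch of an alembic branch-divergence check; the real check in
# this PR is a shell script (scripts/check-alembic-branches.sh) not shown here.
from alembic.config import Config
from alembic.script import ScriptDirectory


def check_alembic_branches(ini_path: str = "alembic.ini") -> None:
    """Exit with an error if the migration history has more than one head."""
    script = ScriptDirectory.from_config(Config(ini_path))
    heads = script.get_heads()  # one entry per independent branch head
    if len(heads) > 1:
        raise SystemExit(f"Alembic branch divergence detected, heads: {heads}")
    print(f"Single alembic head: {heads[0]}")


if __name__ == "__main__":
    check_alembic_branches()
```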
Binary file modified examples/e2e/.assets/00_pipelines_composition.png
48 changes: 27 additions & 21 deletions examples/e2e/README.md
@@ -81,19 +81,21 @@ This template uses
to demonstrate how to perform major critical steps for Continuous Training (CT)
and Continuous Delivery (CD).

It consists of two pipelines with the following high-level setup:
It consists of three pipelines with the following high-level setup:
<p align="center">
<img height=300 src=".assets/00_pipelines_composition.png">
<img height=800 src=".assets/00_pipelines_composition.png">
</p>

Both pipelines are inside a shared Model Control Plane model context - training pipeline creates and promotes new Model Control Plane version and inference pipeline is reading from inference Model Control Plane version. This makes those pipelines closely connected, while ensuring that only quality-assured Model Control Plane versions are used to produce predictions delivered to stakeholders.
All pipelines leverage the Model Control Plane to bring all parts together: the training pipeline creates and promotes a new Model Control Plane version with a trained model object in it, the deployment pipeline uses the inference Model Control Plane version (the one promoted during training) to create a deployment service, and the batch inference pipeline uses the deployment service from the inference Model Control Plane version and stores a new set of predictions back as a versioned data artifact for future use. This makes those pipelines closely connected while ensuring that only quality-assured Model Control Plane versions are used to produce predictions delivered to stakeholders.
* [CT] Training
  * Load, split, and preprocess the training dataset
  * Search for an optimal model object architecture and tune its hyperparameters
  * Train the model object and evaluate its performance on the holdout set
  * Compare a recently trained model object with one promoted earlier
  * If the recently trained model object performs better, stage it as the new inference model object in the model registry
  * On success of the current model object, stage the newly created Model Control Plane version as the one used for inference
* [CD] Deployment
  * Deploy a new prediction service based on the model object connected to the inference Model Control Plane version.
* [CD] Batch Inference
  * Load the inference dataset and preprocess it, reusing objects fitted during training
  * Perform data drift analysis, reusing the training dataset of the inference Model Control Plane version as a reference
@@ -119,23 +121,27 @@ The project loosely follows [the recommended ZenML project structure](https://do

```
.
├── pipelines # `zenml.pipeline` implementations
│ ├── batch_inference.py # [CD] Batch Inference pipeline
│ └── training.py # [CT] Training Pipeline
├── steps # logically grouped `zenml.steps` implementations
│ ├── alerts # alert developer on pipeline status
│ ├── data_quality # quality gates built on top of drift report
│ ├── etl # ETL logic for dataset
│ ├── hp_tuning # tune hyperparameters and model architectures
│ ├── inference # inference on top of the model from the registry
│ ├── promotion # find if a newly trained model will be new inference
│ └── training # train and evaluate model
├── utils # helper functions
├── configs # pipelines configuration files
│ ├── deployer_config.yaml # the configuration of the deployment pipeline
│ ├── inference_config.yaml # the configuration of the batch inference pipeline
│ └── train_config.yaml # the configuration of the training pipeline
├── pipelines # `zenml.pipeline` implementations
│ ├── batch_inference.py # [CD] Batch Inference pipeline
│ ├── deployment.py # [CD] Deployment pipeline
│ └── training.py # [CT] Training Pipeline
├── steps # logically grouped `zenml.steps` implementations
│ ├── alerts # alert developer on pipeline status
│ ├── deployment # deploy trained model objects
│ ├── data_quality # quality gates built on top of drift report
│ ├── etl # ETL logic for dataset
│ ├── hp_tuning # tune hyperparameters and model architectures
│ ├── inference # inference on top of the model from the registry
│ ├── promotion # find if a newly trained model will be new inference
│ └── training # train and evaluate model
├── utils # helper functions
├── .dockerignore
├── inference_config.yaml # the configuration of the batch inference pipeline
├── Makefile # helper scripts for quick start with integrations
├── README.md # this file
├── requirements.txt # extra Python dependencies
├── run.py # CLI tool to run pipelines on ZenML Stack
└── train_config.yaml # the configuration of the training pipeline
├── Makefile # helper scripts for quick start with integrations
├── README.md # this file
├── requirements.txt # extra Python dependencies
└── run.py # CLI tool to run pipelines on ZenML Stack
```
44 changes: 44 additions & 0 deletions examples/e2e/configs/deployer_config.yaml
@@ -0,0 +1,44 @@
# Apache Software License 2.0
#
# Copyright (c) ZenML GmbH 2023. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# environment configuration
settings:
  docker:
    required_integrations:
      - aws
      - evidently
      - kubeflow
      - kubernetes
      - mlflow
      - sklearn
      - slack

# configuration of steps
steps:
  notify_on_success:
    parameters:
      notify_on_success: False

# configuration of the Model Control Plane
model_config:
  name: e2e_use_case
  version: staging

# pipeline level extra configurations
extra:
  notify_on_failure: True
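
For context (not part of this diff): a config file like this is attached to a pipeline when it is launched. The example's own `run.py` CLI is the intended entry point, but under the current ZenML pipeline interface the equivalent could look roughly like this hedged sketch:

```python
# Hedged sketch: launching the deployment pipeline with this config attached.
# Assumes it is run from examples/e2e with a ZenML stack already provisioned.
from pipelines import e2e_use_case_deployment

if __name__ == "__main__":
    e2e_use_case_deployment.with_options(
        config_path="configs/deployer_config.yaml"  # settings, steps, model_config, extra
    )()
```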

44 changes: 44 additions & 0 deletions examples/e2e/configs/inference_config.yaml
@@ -0,0 +1,44 @@
# Apache Software License 2.0
#
# Copyright (c) ZenML GmbH 2023. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# environment configuration
settings:
  docker:
    required_integrations:
      - aws
      - evidently
      - kubeflow
      - kubernetes
      - mlflow
      - sklearn
      - slack

# configuration of steps
steps:
  notify_on_success:
    parameters:
      notify_on_success: False

# configuration of the Model Control Plane
model_config:
  name: e2e_use_case
  version: staging

# pipeline level extra configurations
extra:
  notify_on_failure: True

112 changes: 112 additions & 0 deletions examples/e2e/configs/train_config.yaml
@@ -0,0 +1,112 @@
# Apache Software License 2.0
#
# Copyright (c) ZenML GmbH 2023. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# environment configuration
settings:
  docker:
    required_integrations:
      - aws
      - evidently
      - kubeflow
      - kubernetes
      - mlflow
      - sklearn
      - slack

# configuration of steps
steps:
  model_trainer:
    parameters:
      name: e2e_use_case
  compute_performance_metrics_on_current_data:
    parameters:
      target_env: staging
  promote_with_metric_compare:
    parameters:
      mlflow_model_name: e2e_use_case
      target_env: staging
  notify_on_success:
    parameters:
      notify_on_success: False

# configuration of the Model Control Plane
model_config:
  name: e2e_use_case
  license: apache
  description: e2e_use_case E2E Batch Use Case
  audience: All ZenML users
  use_cases: |
    The ZenML E2E project demonstrates how the most important steps of
    the ML Production Lifecycle can be implemented in a reusable way remaining
    agnostic to the underlying infrastructure, and shows how to integrate them together
    into pipelines for Training and Batch Inference purposes.
  ethics: No impact.
  tags:
    - e2e
    - batch
    - sklearn
    - from template
    - ZenML delivered
  create_new_model_version: true

# pipeline level extra configurations
extra:
  notify_on_failure: True
  # This set contains all the model configurations that you want
  # to evaluate during the hyperparameter tuning stage.
  model_search_space:
    random_forest:
      model_package: sklearn.ensemble
      model_class: RandomForestClassifier
      search_grid:
        criterion:
          - gini
          - entropy
        max_depth:
          - 2
          - 4
          - 6
          - 8
          - 10
          - 12
        min_samples_leaf:
          range:
            start: 1
            end: 10
        n_estimators:
          range:
            start: 50
            end: 500
            step: 25
    decision_tree:
      model_package: sklearn.tree
      model_class: DecisionTreeClassifier
      search_grid:
        criterion:
          - gini
          - entropy
        max_depth:
          - 2
          - 4
          - 6
          - 8
          - 10
          - 12
        min_samples_leaf:
          range:
            start: 1
            end: 10
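
The `model_search_space` block above is consumed by the hyperparameter-tuning steps. Purely to illustrate the convention (list values are taken literally, while `range:` entries describe a numeric range), here is a hedged sketch of how such an entry could be expanded into scikit-learn parameter grids; the helper below is hypothetical, not the example's actual code:

```python
# Hypothetical helper that expands a `model_search_space` entry into sklearn param grids.
from importlib import import_module
from sklearn.model_selection import ParameterGrid


def expand_search_space(entry: dict):
    """Yield (estimator, params) pairs for one model_search_space entry."""
    model_class = getattr(import_module(entry["model_package"]), entry["model_class"])
    grid = {}
    for name, values in entry["search_grid"].items():
        if isinstance(values, dict) and "range" in values:
            r = values["range"]
            grid[name] = list(range(r["start"], r["end"], r.get("step", 1)))
        else:
            grid[name] = list(values)
    for params in ParameterGrid(grid):
        yield model_class(**params), params


# Example usage with a reduced random_forest entry from the config above:
rf_entry = {
    "model_package": "sklearn.ensemble",
    "model_class": "RandomForestClassifier",
    "search_grid": {
        "criterion": ["gini", "entropy"],
        "min_samples_leaf": {"range": {"start": 1, "end": 10}},
    },
}
for estimator, params in expand_search_space(rf_entry):
    print(type(estimator).__name__, params)
    break  # just show the first combination
```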
1 change: 1 addition & 0 deletions examples/e2e/pipelines/__init__.py
@@ -18,3 +18,4 @@

from .batch_inference import e2e_use_case_batch_inference
from .training import e2e_use_case_training
from .deployment import e2e_use_case_deployment
22 changes: 8 additions & 14 deletions examples/e2e/pipelines/batch_inference.py
@@ -15,7 +15,6 @@
# limitations under the License.
#


from steps import (
    data_loader,
    drift_quality_gate,
@@ -25,13 +24,10 @@
    notify_on_success,
)

from zenml import get_pipeline_context, pipeline
from zenml import pipeline
from zenml.artifacts.external_artifact import ExternalArtifact
from zenml.integrations.evidently.metrics import EvidentlyMetricConfig
from zenml.integrations.evidently.steps import evidently_report_step
from zenml.integrations.mlflow.steps.mlflow_deployer import (
    mlflow_model_registry_deployer_step,
)
from zenml.logger import get_logger

logger = get_logger(__name__)
@@ -49,7 +45,13 @@ def e2e_use_case_batch_inference():
    # Link all the steps together by calling them and passing the output
    # of one step as the input of the next step.
    ########## ETL stage ##########
    df_inference, target = data_loader(is_inference=True)
    df_inference, target, _ = data_loader(
        random_state=ExternalArtifact(
            model_artifact_pipeline_name="e2e_use_case_training",
            model_artifact_name="random_state",
        ),
        is_inference=True,
    )
    df_inference = inference_data_preprocessor(
        dataset_inf=df_inference,
        preprocess_pipeline=ExternalArtifact(
@@ -70,15 +72,7 @@
    )
    drift_quality_gate(report)
    ########## Inference stage ##########
    deployment_service = mlflow_model_registry_deployer_step(
        registry_model_name=get_pipeline_context().extra["mlflow_model_name"],
        registry_model_version=ExternalArtifact(
            model_artifact_name="promoted_version",
        ),
        replace_existing=True,
    )
    inference_predict(
        deployment_service=deployment_service,
        dataset_inf=df_inference,
        after=["drift_quality_gate"],
    )
37 changes: 37 additions & 0 deletions examples/e2e/pipelines/deployment.py
@@ -0,0 +1,37 @@
# Apache Software License 2.0
#
# Copyright (c) ZenML GmbH 2023. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from steps import deployment_deploy, notify_on_failure, notify_on_success

from zenml import pipeline


@pipeline(on_failure=notify_on_failure)
def e2e_use_case_deployment():
"""
Model deployment pipeline.

This is a pipeline deploys trained model for future inference.
"""
### ADD YOUR OWN CODE HERE - THIS IS JUST AN EXAMPLE ###
# Link all the steps together by calling them and passing the output
# of one step as the input of the next step.
########## Deployment stage ##########
deployment_deploy()

notify_on_success(after=["deployment_deploy"])
### YOUR CODE ENDS HERE ###
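
The `deployment_deploy` step imported above lives under `steps/deployment/` and is not shown in this hunk. As a hedged sketch only, the general shape of such a step might look like the following; the body is a placeholder outline, not the example's real implementation:

```python
# Hedged sketch only: outline of a deployment step; not the code added in this PR.
from zenml import step
from zenml.logger import get_logger

logger = get_logger(__name__)


@step
def deployment_deploy() -> None:
    """Deploy the model object attached to the inference Model Control Plane version."""
    # 1. Resolve the trained model object from the current Model Control Plane version.
    # 2. Hand it to the configured model deployer (e.g. MLflow) to start a prediction service.
    # 3. Link the resulting deployment back to the model version for later lookup.
    logger.info("Deploying the promoted model object as a prediction service...")
```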