Update starter template #12

avishniakov · 2023-12-18T16:45:08Z

Recreation of #11 due to a branch rename.

Summary by CodeRabbit

New Features
- Added a new "Compress Images" GitHub Actions workflow to optimize images on pull requests.
- Introduced a simplified starter template for ZenML projects with updated instructions.
- Provided a new quickstart Jupyter notebook guide for MLOps pipelines using ZenML.
- Implemented a command-line interface for running different pipelines in ZenML projects.
Improvements
- Updated the requirements.txt to specify newer versions of dependencies and added additional required packages.
- Enhanced the README.md to reflect changes in the repository's purpose and usage.
- Improved GitHub Actions workflows with additional input descriptions and a fail-fast strategy option.
Documentation
- Added comprehensive documentation for building MLOps pipelines with ZenML in template/README.md.
- Included descriptions and instructions for various steps and utilities in the ZenML pipeline process.
Bug Fixes
- Removed specific component configurations for MLflow and Evidently from tests/conftest.py to align with updated stack setup.
Refactor
- Streamlined the GitHub Actions workflow for exporting requirements by focusing on sklearn integration.
Chores
- Updated .gitignore to include additional directories and file types for better development experience.
Tests
- Revised tests to accommodate new pipeline parameters and logic, including a test for custom pipeline names.

coderabbitai · 2023-12-19T14:05:33Z

Walkthrough

The project was streamlined around a simpler ZenML starter template, with its GitHub workflows and actions refined to match. Extraneous integrations were removed to emphasize sklearn, CI gained additional input descriptions and a `fail-fast` strategy setting, an image optimization workflow was introduced, and the .gitignore was extended for better development environment support. The documentation, including READMEs and notebooks, now offers clearer guidance for MLOps with ZenML, and the codebase reflects a shift towards more precise dependency management and modular pipeline construction.

Changes

| File Path | Change Summary |
| --- | --- |
| `.github/actions/.../action.yml` | Updated `zenml integration export-requirements` to exclude multiple integrations, keeping only `sklearn`. |
| `.github/workflows/...` | Enhanced workflows with input descriptions and `fail-fast` strategy. Added "Compress Images" workflow. |
| `.gitignore` | Added entries for `.idea`, `*.zen`, and `.vscode`. |
| `README.md`, `template/README.md`, `template/quickstart.ipynb` | Updated to reflect a shift to a starter template and provided detailed MLOps guidance with ZenML. |
| `requirements.txt`, `template/requirements.txt` | Updated dependencies and added `zenml[server]>=0.52.0` and `notebook`. |
| `template/...` (multiple files in `template` directory) | Introduced a suite of new files defining a structured approach to MLOps pipelines, including data loading, preprocessing, training, and inference. |
| `tests/...` (files within `tests` directory) | Removed MLflow and Evidently configurations; updated test functions to align with new template logic. |

🐇✨
In a burrow, deep and snug,
CodeRabbit tweaked the project's rug.
Now with ZenML, clear and bright,
MLOps workflows take their flight. 🚀🌟

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

You can reply to a review comment made by CodeRabbit.
You can tag CodeRabbit on specific lines of code or files in the PR by tagging @coderabbitai in a comment.
You can tag @coderabbitai in a PR comment and ask one-off questions about the PR and the codebase. Use quoted replies to pass the context for follow-up questions.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai resolve to resolve all the CodeRabbit review comments.
@coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
The JSON schema for the configuration file is available here.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 59f1687 and 831997c.

Files ignored due to filter (5)

copier.yaml
template/configs/feature_engineering.yaml
template/configs/inference.yaml
template/configs/training_rf.yaml
template/configs/training_sgd.yaml

Files selected for processing (30)

.github/actions/starter_template_test/action.yml (1 hunks)
.github/workflows/ci.yml (3 hunks)
.github/workflows/image-optimizer.yml (1 hunks)
.gitignore (2 hunks)
README.md (4 hunks)
requirements.txt (1 hunks)
template/README.md (1 hunks)
template/license_header (1 hunks)
template/pipelines/__init__.py (1 hunks)
template/pipelines/feature_engineering.py (1 hunks)
template/pipelines/inference.py (1 hunks)
template/pipelines/training.py (1 hunks)
template/quickstart.ipynb (1 hunks)
template/requirements.txt (1 hunks)
template/run.py (1 hunks)
template/steps/__init__.py (1 hunks)
template/steps/data_loader.py (1 hunks)
template/steps/data_preprocessor.py (1 hunks)
template/steps/data_splitter.py (1 hunks)
template/steps/inference_predict.py (1 hunks)
template/steps/inference_preprocessor.py (1 hunks)
template/steps/model_evaluator.py (1 hunks)
template/steps/model_promoter.py (1 hunks)
template/steps/model_trainer.py (1 hunks)
template/utils/__init__.py (1 hunks)
template/utils/preprocess.py (1 hunks)
template/{% if open_source_license %}LICENSE{% endif %} (1 hunks)
template/{{ _copier_conf.answers_file }} (1 hunks)
tests/conftest.py (1 hunks)
tests/test_starter_template.py (3 hunks)

Files skipped from review due to trivial changes (6)

.gitignore
template/license_header
template/pipelines/__init__.py
template/requirements.txt
template/utils/__init__.py
template/{{ _copier_conf.answers_file }}

Additional comments: 26

.github/actions/starter_template_test/action.yml (1)

69-73: The Concatenate requirements step has been simplified to only include sklearn in the requirements. Ensure that this aligns with the project's dependency simplification strategy and that no other integrations are needed.

.github/workflows/ci.yml (3)

5-13: Input descriptions for ref-template and ref-zenml have been added to the workflow_dispatch section, improving the clarity of the workflow file.

40-40: The fail-fast attribute has been set to false under the strategy section, allowing all jobs to run even if one fails. This change could be beneficial for identifying multiple failures in a single workflow run.

58-59: The ref-zenml and ref-template inputs now have default values, which could be useful for running the workflow with default references without specifying them each time.

.github/workflows/image-optimizer.yml (1)

1-26: A new GitHub Actions workflow named "Compress Images" has been added, which is triggered on pull requests that include image files. This is a good practice for optimizing repository assets and keeping the repository size down.

README.md (1)

3-15: The README.md file has been updated to reflect the repository's shift from a collection of templates to a single starter template for ZenML projects. This change should be communicated clearly to users who may be familiar with the previous structure.

requirements.txt (1)

1-5: The requirements.txt file has been updated with a new version constraint for scikit-learn and additional dependencies zenml[server]>=0.52.0 and notebook. Ensure that these changes are compatible with the project's requirements and do not introduce any version conflicts.

template/README.md (1)

1-212: A comprehensive guide for building MLOps pipelines with ZenML has been added to the template/README.md file. This guide includes an overview, instructions, and detailed explanations, which can be very helpful for new users.

template/pipelines/feature_engineering.py (1)

1-59: The feature_engineering pipeline is well-defined with clear documentation and parameterization. It's structured to load data, process it, and split it into train and test sets, which is a common pattern in MLOps pipelines.

template/pipelines/inference.py (1)

1-46: The inference pipeline is well-defined with clear documentation and parameterization. It's structured to load inference data, process it with a preprocessing pipeline, and run inference with a trained model.

template/pipelines/training.py (1)

1-58: The training pipeline is well-defined with clear documentation and parameterization. It's structured to load data from a preprocessing pipeline, train a model on it, and evaluate the model.

template/quickstart.ipynb (1)

1-1117: The quickstart.ipynb Jupyter notebook has been added to provide a hands-on introduction to MLOps using ZenML. It demonstrates the setup and execution of ML workflows, which can be very beneficial for new users to get started with ZenML.

template/run.py (1)

1-221: The template/run.py file introduces a command-line interface for running different pipelines, enhancing the usability of the project. The CLI is well-structured with clear options and help messages.
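
As a rough illustration of what such a CLI can look like, here is a minimal sketch built on the standard library's `argparse`. The flag names (`--feature-pipeline`, `--training-pipeline`, `--inference-pipeline`, `--no-cache`) and the `select_pipelines` helper are assumptions made for this example; they are not taken from `template/run.py`, which may use a different CLI library and different options.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one hypothetical on/off flag per pipeline."""
    parser = argparse.ArgumentParser(
        description="Run project pipelines (illustrative sketch)."
    )
    parser.add_argument("--feature-pipeline", action="store_true",
                        help="Run the feature engineering pipeline.")
    parser.add_argument("--training-pipeline", action="store_true",
                        help="Run the training pipeline.")
    parser.add_argument("--inference-pipeline", action="store_true",
                        help="Run the inference pipeline.")
    parser.add_argument("--no-cache", action="store_true",
                        help="Disable caching for this run.")
    return parser


def select_pipelines(argv: List[str]) -> List[str]:
    """Return the requested pipeline names in a fixed execution order."""
    args = build_parser().parse_args(argv)
    selected = []
    # The order here is fixed: features first, then training, then inference,
    # regardless of the order in which flags were passed.
    if args.feature_pipeline:
        selected.append("feature_engineering")
    if args.training_pipeline:
        selected.append("training")
    if args.inference_pipeline:
        selected.append("inference")
    return selected
```

Calling `select_pipelines(["--training-pipeline"])` returns the list of pipeline names to dispatch; a real entry point would map each name to the corresponding pipeline function.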

template/steps/__init__.py (1)

1-26: The template/steps/__init__.py file is well-organized, importing various step modules that define the different stages of the MLOps pipelines. This centralizes the step definitions and makes them easily accessible.

template/steps/data_loader.py (1)

1-47: The data_loader step is well-defined with clear documentation and parameterization. It's structured to load the Breast Cancer dataset and prepare it for further processing, which is a common requirement in MLOps pipelines.

template/steps/data_preprocessor.py (1)

1-74: The data_preprocessor step is well-defined with clear documentation and parameterization. It's structured to prepare the data for model training, including options to drop NA values, normalize data, and drop specific columns.
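
The three options mentioned above (dropping NA values, normalizing, dropping columns) can be sketched in plain Python as follows. This is not the template's implementation: the `preprocess` function, its parameter names, the stage order, and the choice of min-max scaling are all assumptions for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def preprocess(
    rows: List[Row],
    drop_na: bool = False,
    normalize: bool = False,
    drop_columns: Optional[List[str]] = None,
) -> List[Row]:
    """Apply optional cleaning stages; assumes every row has the same keys."""
    # 1. Remove unwanted columns first so they cannot trigger NA drops.
    if drop_columns:
        rows = [{k: v for k, v in r.items() if k not in drop_columns} for r in rows]
    # 2. Drop any row that still contains a missing value.
    if drop_na:
        rows = [r for r in rows if all(v is not None for v in r.values())]
    # 3. Min-max scale each column to [0, 1], leaving missing values untouched.
    if normalize and rows:
        scaled = [dict(r) for r in rows]  # copy so the caller's rows are not mutated
        for col in list(rows[0].keys()):
            present = [r[col] for r in rows if r[col] is not None]
            if not present:
                continue
            lo, hi = min(present), max(present)
            span = (hi - lo) or 1.0  # constant columns would otherwise divide by zero
            for r in scaled:
                if r[col] is not None:
                    r[col] = (r[col] - lo) / span
        rows = scaled
    return rows
```

Each stage is controlled by its own flag, so a caller can enable any combination without changing the function body.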

template/steps/data_splitter.py (1)

1-45: The data_splitter step is well-defined with clear documentation and parameterization. It's structured to split the dataset into train and test sets, which is a standard procedure in preparing data for machine learning models.
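
For readers unfamiliar with the procedure, a reproducible train/test split can be sketched with the standard library alone. The function name `split_train_test` and the `test_size` and `seed` parameters are hypothetical; the template step itself most likely delegates to a library routine.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    rows: Sequence[T], test_size: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Split rows into (train, test), preserving the original row order."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    # A local RNG keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test
```

Fixing the seed means repeated runs produce the same partition, which keeps evaluation results comparable between runs.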

template/steps/inference_predict.py (1)

1-56: The inference_predict step is well-defined with clear documentation and parameterization. It's structured to take a trained model and inference dataset to produce predictions.

template/steps/inference_preprocessor.py (1)

1-49: The inference_preprocessor step is well-defined with clear documentation and parameterization. It's structured to prepare the inference dataset using a pretrained preprocessing pipeline.
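
The key idea of reusing a pretrained preprocessing pipeline is that statistics are learned once on training data and then applied unchanged at inference time. A minimal sketch, assuming a hypothetical `FittedScaler` class with min-max scaling (neither is taken from the template):

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


class FittedScaler:
    """Learns per-column min/max on training rows and reuses them later."""

    def __init__(self) -> None:
        self.bounds: Dict[str, Tuple[float, float]] = {}

    def fit(self, rows: List[Row]) -> "FittedScaler":
        # Statistics come from the training data only.
        for col in rows[0]:
            values = [r[col] for r in rows]
            self.bounds[col] = (min(values), max(values))
        return self

    def transform(self, rows: List[Row]) -> List[Row]:
        out = []
        for r in rows:
            scaled = {}
            for col, (lo, hi) in self.bounds.items():
                span = (hi - lo) or 1.0  # guard against constant columns
                scaled[col] = (r[col] - lo) / span
            out.append(scaled)
        return out
```

Because `transform` never recomputes the bounds, inference values outside the training range map outside [0, 1]; that is expected and keeps inference inputs on the same scale the model was trained on.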

template/steps/model_evaluator.py (1)

1-86: The model_evaluator step is well-defined with clear documentation and parameterization. It's structured to evaluate a trained model's performance on the train and test datasets and log the model's accuracy.
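
Evaluating on both splits can be sketched as below. The `accuracy` and `evaluate` helpers, the `predict` callable, and the metric names are assumptions for this example rather than the step's real signature.

```python
import logging
from typing import Callable, Dict, Sequence

logger = logging.getLogger(__name__)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


def evaluate(
    predict: Callable[[float], int],
    train_x: Sequence[float],
    train_y: Sequence[int],
    test_x: Sequence[float],
    test_y: Sequence[int],
) -> Dict[str, float]:
    """Score the model on both splits and log each metric."""
    scores = {
        "train_accuracy": accuracy(train_y, [predict(x) for x in train_x]),
        "test_accuracy": accuracy(test_y, [predict(x) for x in test_x]),
    }
    for name, value in scores.items():
        logger.info("%s: %.3f", name, value)
    return scores
```

Reporting the train and test scores side by side makes a large gap between them, a common sign of overfitting, easy to spot.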

template/steps/model_promoter.py (1)

1-61: The model_promoter step is well-defined with clear documentation and parameterization. It's structured to conditionally promote a model based on its accuracy, which is a critical step in the model deployment lifecycle.
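
One possible shape for accuracy-based promotion is a pair of gates: an absolute quality bar and a comparison against the currently promoted model. The `should_promote` function, the `min_accuracy` default of 0.8, and the strict comparison are assumptions chosen for illustration; the template's actual rule may differ.

```python
from typing import Optional


def should_promote(
    candidate_accuracy: float,
    current_accuracy: Optional[float],
    min_accuracy: float = 0.8,
) -> bool:
    """Decide whether a candidate model should replace the current one."""
    # Gate 1: the candidate must clear an absolute quality bar.
    if candidate_accuracy < min_accuracy:
        return False
    # Gate 2: with nothing promoted yet, any model clearing the bar wins.
    if current_accuracy is None:
        return True
    # Gate 3: otherwise the candidate must strictly beat the current model.
    return candidate_accuracy > current_accuracy
```

Requiring a strict improvement avoids replacing a promoted model with one that is merely equal; a team that prefers the newest model on ties would use `>=` instead.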

template/steps/model_trainer.py (1)

1-54: The model_trainer step is well-defined with clear documentation and parameterization. It's structured to configure and train a model on the training dataset, supporting different types of models.

template/utils/preprocess.py (1)

1-41: The template/utils/preprocess.py file adds support classes for data preprocessing, which are likely to be used in scikit-learn Pipelines. These utility classes are well-documented and provide functionality for dropping NA values, specific columns, and casting data types.

template/{% if open_source_license %}LICENSE{% endif %} (1)

1-1: The template for including a license file is a standard practice for open-source projects. Ensure that the correct license is included based on the project's licensing strategy.

tests/conftest.py (1)

31-36: The configure_stack function in tests/conftest.py has been updated to remove configurations for MLflow and Evidently components. Ensure that this change aligns with the updated testing strategy and that the necessary components are still being tested.

tests/test_starter_template.py (1)

55-104: Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [16-123].

The test_starter_template.py file has been updated with functions to generate and run a project with different options, including a custom product name. Ensure that these tests cover the new functionality introduced in the PR and that they are passing.

htahir1 and others added 30 commits November 29, 2023 18:13

Simplified starter

f4b9955

Template

2dd1ac4

Latest

ff02dbf

Steps

2d566dc

Works until inference pipeline

8ea0c85

Works until inference pipeline

16d0faa

Run py cleaned

5f781bf

Fixing

2f7ef97

Updated

fd1aad6

Updated

c8bf3e3

new notebook

7d3e4bc

new notebook

f3ef39b

new notebook

639ad9e

Images

cf08f35

Cleaned up and finalized for alexej

548293d

Initial changes

8557a3d

Removed some copier options

072d3ef

Compared with e2e template

4259fc8

Removed templateized ipynb

7b218b5

Many things in this commit - Formatted, darglinted, added complex promotion logic, added two training runs etc

f99caf2

Latest changes

d5bc5e6

README

88036a2

New readme

fdca688

LatesT

8ce022b

LatesT

dfcbe51

Woot

2af724f

took out final comment

0374dcd

Notebook cleaned

c04dfbc

Further cleanup, fix caching and artifact version fetching, move random state

4743775

readme update

bfd271a

AlexejPenner and others added 17 commits December 15, 2023 00:03

Fixed non-rendering visualizations

cddbed5

Updated

9fe56dd

Updated

a2ca4f8

new CTA

0238510

new CTA

4f44541

Updated

066517f

add image optimizer

2806440

Optimised images with calibre/image-actions

dd35b68

imporve ci

c91cff2

fix ci

05b7a2e

fix ci/tests

621e38c

vscode

b00197c

remove mlflow reqs

6f7b6cc

fix tests

012096e

add --no-cache

be772f9

use SafeLoader

b257658

fix img paths

831997c

coderabbitai bot reviewed Dec 19, 2023

YAML requirements fixed

652a614

avishniakov merged commit 928bc41 into main Dec 20, 2023
14 checks passed

avishniakov deleted the 2023.12.18 branch December 20, 2023 08:37
