
Update starter template #12

Merged: 48 commits, merged on Dec 20, 2023

Conversation

avishniakov (Contributor) commented Dec 18, 2023

Recreated from #11 because the branch was renamed.

Summary by CodeRabbit

  • New Features

    • Added a new "Compress Images" GitHub Actions workflow to optimize images on pull requests.
    • Introduced a simplified starter template for ZenML projects with updated instructions.
    • Provided a new quickstart Jupyter notebook guide for MLOps pipelines using ZenML.
    • Implemented a command-line interface for running different pipelines in ZenML projects.
  • Improvements

    • Updated requirements.txt with newer dependency versions and additional required packages.
    • Enhanced the README.md to reflect changes in the repository's purpose and usage.
    • Improved GitHub Actions workflows with additional input descriptions and a fail-fast strategy option.
  • Documentation

    • Added comprehensive documentation for building MLOps pipelines with ZenML in template/README.md.
    • Included descriptions and instructions for various steps and utilities in the ZenML pipeline process.
  • Bug Fixes

    • Removed specific component configurations for MLflow and Evidently from tests/conftest.py to align with the updated stack setup.
  • Refactor

    • Streamlined the GitHub Actions workflow for exporting requirements by focusing on sklearn integration.
  • Chores

    • Updated .gitignore to include additional directories and file types for better development experience.
  • Tests

    • Revised tests to accommodate new pipeline parameters and logic, including a test for custom pipeline names.


coderabbitai bot commented Dec 19, 2023

Walkthrough

The project underwent significant streamlining, focusing on a simpler ZenML starter template and refining its associated GitHub workflows and actions. It removed extraneous integrations to focus on sklearn, enhanced CI with additional inputs and a configurable fail-fast strategy, introduced image optimization, and updated .gitignore for better development environment support. The documentation, including the READMEs and the quickstart notebook, now offers clearer guidance for MLOps with ZenML, and the codebase moves toward more precise dependency management and modular pipeline construction.

Changes

  • .github/actions/.../action.yml: Updated zenml integration export-requirements to exclude multiple integrations, keeping only sklearn.
  • .github/workflows/...: Enhanced workflows with input descriptions and a fail-fast strategy. Added a "Compress Images" workflow.
  • .gitignore: Added entries for .idea, *.zen, and .vscode.
  • README.md, template/README.md, template/quickstart.ipynb: Updated to reflect the shift to a starter template and to provide detailed MLOps guidance with ZenML.
  • requirements.txt, template/requirements.txt: Updated dependencies and added zenml[server]>=0.52.0 and notebook.
  • template/... (multiple files in the template directory): Introduced a suite of new files defining a structured approach to MLOps pipelines, including data loading, preprocessing, training, and inference.
  • tests/... (files within the tests directory): Removed MLflow and Evidently configurations; updated test functions to align with the new template logic.

🐇✨
In a burrow, deep and snug,
CodeRabbit tweaked the project's rug.
Now with ZenML, clear and bright,
MLOps workflows take their flight. 🚀🌟



coderabbitai bot left a comment


Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 59f1687 and 831997c.
Files ignored due to filter (5)
  • copier.yaml
  • template/configs/feature_engineering.yaml
  • template/configs/inference.yaml
  • template/configs/training_rf.yaml
  • template/configs/training_sgd.yaml
Files selected for processing (30)
  • .github/actions/starter_template_test/action.yml (1 hunks)
  • .github/workflows/ci.yml (3 hunks)
  • .github/workflows/image-optimizer.yml (1 hunks)
  • .gitignore (2 hunks)
  • README.md (4 hunks)
  • requirements.txt (1 hunks)
  • template/README.md (1 hunks)
  • template/license_header (1 hunks)
  • template/pipelines/__init__.py (1 hunks)
  • template/pipelines/feature_engineering.py (1 hunks)
  • template/pipelines/inference.py (1 hunks)
  • template/pipelines/training.py (1 hunks)
  • template/quickstart.ipynb (1 hunks)
  • template/requirements.txt (1 hunks)
  • template/run.py (1 hunks)
  • template/steps/__init__.py (1 hunks)
  • template/steps/data_loader.py (1 hunks)
  • template/steps/data_preprocessor.py (1 hunks)
  • template/steps/data_splitter.py (1 hunks)
  • template/steps/inference_predict.py (1 hunks)
  • template/steps/inference_preprocessor.py (1 hunks)
  • template/steps/model_evaluator.py (1 hunks)
  • template/steps/model_promoter.py (1 hunks)
  • template/steps/model_trainer.py (1 hunks)
  • template/utils/__init__.py (1 hunks)
  • template/utils/preprocess.py (1 hunks)
  • template/{% if open_source_license %}LICENSE{% endif %} (1 hunks)
  • template/{{ _copier_conf.answers_file }} (1 hunks)
  • tests/conftest.py (1 hunks)
  • tests/test_starter_template.py (3 hunks)
Files skipped from review due to trivial changes (6)
  • .gitignore
  • template/license_header
  • template/pipelines/__init__.py
  • template/requirements.txt
  • template/utils/__init__.py
  • template/{{ _copier_conf.answers_file }}
Additional comments: 26
.github/actions/starter_template_test/action.yml (1)
  • 69-73: The Concatenate requirements step has been simplified to only include sklearn in the requirements. Ensure that this aligns with the project's dependency simplification strategy and that no other integrations are needed.
.github/workflows/ci.yml (3)
  • 5-13: Input descriptions for ref-template and ref-zenml have been added to the workflow_dispatch section, improving the clarity of the workflow file.

  • 40-40: The fail-fast attribute has been set to false under the strategy section, so GitHub Actions no longer cancels the remaining matrix jobs when one of them fails. This makes it possible to identify multiple failures in a single workflow run.

  • 58-59: The ref-zenml and ref-template inputs now have default values, which could be useful for running the workflow with default references without specifying them each time.

.github/workflows/image-optimizer.yml (1)
  • 1-26: A new GitHub Actions workflow named "Compress Images" has been added, triggered on pull requests that include image files. Compressing images this way is good practice for optimizing repository assets and keeping the repository small.
README.md (1)
  • 3-15: The README.md file has been updated to reflect the repository's shift from a collection of templates to a single starter template for ZenML projects. This change should be communicated clearly to users who may be familiar with the previous structure.
requirements.txt (1)
  • 1-5: The requirements.txt file has been updated with a new version constraint for scikit-learn and additional dependencies zenml[server]>=0.52.0 and notebook. Ensure that these changes are compatible with the project's requirements and do not introduce any version conflicts.
template/README.md (1)
  • 1-212: A comprehensive guide for building MLOps pipelines with ZenML has been added to the template/README.md file. This guide includes an overview, instructions, and detailed explanations, which can be very helpful for new users.
template/pipelines/feature_engineering.py (1)
  • 1-59: The feature_engineering pipeline is well-defined with clear documentation and parameterization. It's structured to load data, process it, and split it into train and test sets, which is a common pattern in MLOps pipelines.
template/pipelines/inference.py (1)
  • 1-46: The inference pipeline is well-defined with clear documentation and parameterization. It's structured to load inference data, process it with a preprocessing pipeline, and run inference with a trained model.
template/pipelines/training.py (1)
  • 1-58: The training pipeline is well-defined with clear documentation and parameterization. It's structured to load data from a preprocessing pipeline, train a model on it, and evaluate the model.
template/quickstart.ipynb (1)
  • 1-1117: The quickstart.ipynb Jupyter notebook has been added to provide a hands-on introduction to MLOps using ZenML. It demonstrates the setup and execution of ML workflows, which can be very beneficial for new users to get started with ZenML.
template/run.py (1)
  • 1-221: The template/run.py file introduces a command-line interface for running different pipelines, enhancing the usability of the project. The CLI is well-structured with clear options and help messages.
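The CLI described above can be sketched with the standard library's argparse. This is a hypothetical illustration: the flag names (--feature-pipeline, --training-pipeline, --inference-pipeline, --no-cache) and the helper selected_pipelines are assumptions, not taken from the actual run.py, which may use a different CLI library and different options.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser; every flag name here is hypothetical."""
    parser = argparse.ArgumentParser(
        description="Run one or more pipelines of the starter project."
    )
    parser.add_argument(
        "--feature-pipeline",
        action="store_true",
        help="Run the feature engineering pipeline.",
    )
    parser.add_argument(
        "--training-pipeline",
        action="store_true",
        help="Run the training pipeline.",
    )
    parser.add_argument(
        "--inference-pipeline",
        action="store_true",
        help="Run the inference pipeline.",
    )
    parser.add_argument(
        "--no-cache",
        action="store_true",
        help="Disable step caching for this run.",
    )
    return parser


def selected_pipelines(argv: Optional[List[str]] = None) -> List[str]:
    """Return the requested pipeline names in a fixed run order."""
    args = build_parser().parse_args(argv)
    # Feature engineering must run before training, and training before inference.
    order = [
        ("feature_engineering", args.feature_pipeline),
        ("training", args.training_pipeline),
        ("inference", args.inference_pipeline),
    ]
    return [name for name, wanted in order if wanted]
```

For example, selected_pipelines(["--training-pipeline", "--feature-pipeline"]) returns ["feature_engineering", "training"]: the run order is fixed in code rather than taken from the order of the flags, so downstream pipelines always see fresh upstream artifacts.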
template/steps/__init__.py (1)
  • 1-26: The template/steps/__init__.py file is well-organized, importing various step modules that define the different stages of the MLOps pipelines. This centralizes the step definitions and makes them easily accessible.
template/steps/data_loader.py (1)
  • 1-47: The data_loader step is well-defined with clear documentation and parameterization. It's structured to load the Breast Cancer dataset and prepare it for further processing, which is a common requirement in MLOps pipelines.
template/steps/data_preprocessor.py (1)
  • 1-74: The data_preprocessor step is well-defined with clear documentation and parameterization. It's structured to prepare the data for model training, including options to drop NA values, normalize data, and drop specific columns.
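A minimal, dependency-free sketch of the three preprocessing options, assuming the data arrives as a list of dict rows that all share the same columns (the real step most likely works on pandas DataFrames with scikit-learn transformers). The function name preprocess and its parameter names are hypothetical, and min-max scaling is an assumed stand-in for whatever normalization the step actually applies.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]
Bounds = Dict[str, Tuple[float, float]]


def _scale(value: Optional[float], bounds: Optional[Tuple[float, float]]) -> Optional[float]:
    # Leave missing values and columns without fitted bounds untouched.
    if value is None or bounds is None:
        return value
    lo, hi = bounds
    # A constant column would divide by zero, so map it to 0.0.
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def preprocess(
    rows: List[Row],
    drop_na: bool = True,
    normalize: bool = True,
    drop_columns: Sequence[str] = (),
) -> Tuple[List[Row], Bounds]:
    """Return the processed rows and the (min, max) bounds fitted on them."""
    if drop_na:
        # Keep only rows in which every value is present.
        rows = [r for r in rows if all(v is not None for v in r.values())]
    # Remove the requested columns from every row (builds new dicts, input is not mutated).
    rows = [{k: v for k, v in r.items() if k not in drop_columns} for r in rows]
    bounds: Bounds = {}
    if normalize and rows:
        # Fit per-column bounds on the observed (non-missing) values.
        for col in rows[0]:
            values = [r[col] for r in rows if r.get(col) is not None]
            if values:
                bounds[col] = (min(values), max(values))
        rows = [{c: _scale(v, bounds.get(c)) for c, v in r.items()} for r in rows]
    return rows, bounds
```

Returning the fitted bounds alongside the rows matters: inference data has to be scaled with these same training-time bounds rather than with bounds refitted on the inference set.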
template/steps/data_splitter.py (1)
  • 1-45: The data_splitter step is well-defined with clear documentation and parameterization. It's structured to split the dataset into train and test sets, which is a standard procedure in preparing data for machine learning models.
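The split itself can be sketched as a seeded shuffle followed by a cut. This stdlib stand-in loosely mirrors the test_size and random_state parameters of scikit-learn's train_test_split, but it is not that function; the name split_dataset and its defaults are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    rows: Sequence[T], test_size: float = 0.2, random_state: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then hold out test_size of the rows as the test set."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must lie strictly between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible without touching global state.
    random.Random(random_state).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    # Return (train, test); every input row lands in exactly one of the two.
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed means re-running the feature engineering pipeline on the same data yields the same train/test partition, which keeps evaluation results comparable across runs.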
template/steps/inference_predict.py (1)
  • 1-56: The inference_predict step is well-defined with clear documentation and parameterization. It's structured to take a trained model and inference dataset to produce predictions.
template/steps/inference_preprocessor.py (1)
  • 1-49: The inference_preprocessor step is well-defined with clear documentation and parameterization. It's structured to prepare the inference dataset using a pretrained preprocessing pipeline.
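Reusing a pretrained preprocessing pipeline means applying parameters learned on training data instead of refitting them on the inference set. A minimal sketch for min-max scaling, assuming per-column (min, max) bounds were stored at training time; the helper apply_min_max and the dict-row format are hypothetical.

```python
from typing import Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]


def apply_min_max(rows: List[Dict[str, float]], bounds: Bounds) -> List[Dict[str, float]]:
    """Rescale inference rows with (min, max) bounds learned from training data."""
    out = []
    for row in rows:
        scaled = {}
        for col, value in row.items():
            if col not in bounds:
                # This column was not normalized at training time, so pass it through.
                scaled[col] = value
                continue
            lo, hi = bounds[col]
            # Mirror the training-time rule: constant columns map to 0.0.
            scaled[col] = 0.0 if hi == lo else (value - lo) / (hi - lo)
        out.append(scaled)
    return out
```

Values outside the training range map outside [0, 1]; that is expected when inference data drifts from the training distribution, and it is exactly what refitting on the inference set would hide.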
template/steps/model_evaluator.py (1)
  • 1-86: The model_evaluator step is well-defined with clear documentation and parameterization. It's structured to evaluate a trained model's performance on the train and test datasets and log the model's accuracy.
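Accuracy is the fraction of predictions that match the true labels. A minimal sketch of the evaluation logic, assuming integer class labels; the min_test_accuracy threshold and the choice to log a warning instead of failing are hypothetical and not confirmed by the review.

```python
import logging
from typing import Sequence

logger = logging.getLogger(__name__)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty sequences of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def evaluate(
    train_true: Sequence[int],
    train_pred: Sequence[int],
    test_true: Sequence[int],
    test_pred: Sequence[int],
    min_test_accuracy: float = 0.0,
) -> float:
    """Log train/test accuracy, warn below a threshold, and return test accuracy."""
    train_acc = accuracy(train_true, train_pred)
    test_acc = accuracy(test_true, test_pred)
    logger.info("train accuracy: %.3f, test accuracy: %.3f", train_acc, test_acc)
    if test_acc < min_test_accuracy:
        # Warn rather than raise, so the pipeline run still completes.
        logger.warning(
            "test accuracy %.3f is below the threshold %.3f", test_acc, min_test_accuracy
        )
    return test_acc
```

Logging both train and test accuracy makes overfitting visible: a large gap between the two is a signal worth checking before any promotion decision.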
template/steps/model_promoter.py (1)
  • 1-61: The model_promoter step is well-defined with clear documentation and parameterization. It's structured to conditionally promote a model based on its accuracy, which is a critical step in the model deployment lifecycle.
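The promotion decision reduces to a small predicate. In this sketch the rule (an absolute min_accuracy bar plus a strict improvement over the current production model, if one exists) and the 0.8 default are assumptions; the review only states that promotion depends on the model's accuracy.

```python
from typing import Optional


def should_promote(
    candidate_accuracy: float,
    production_accuracy: Optional[float] = None,
    min_accuracy: float = 0.8,
) -> bool:
    """Decide whether a newly trained model should replace the current one."""
    # Never promote a model that misses the absolute quality bar.
    if candidate_accuracy < min_accuracy:
        return False
    # With no model in production yet, clearing the bar is enough.
    if production_accuracy is None:
        return True
    # Otherwise require a strict improvement over the production model.
    return candidate_accuracy > production_accuracy
```

Requiring a strict improvement avoids churning the production model when a retrained candidate merely ties it.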
template/steps/model_trainer.py (1)
  • 1-54: The model_trainer step is well-defined with clear documentation and parameterization. It's structured to configure and train a model on the training dataset, supporting different types of models.
template/utils/preprocess.py (1)
  • 1-41: The template/utils/preprocess.py file adds support classes for data preprocessing, which are likely to be used in scikit-learn Pipelines. These utility classes are well-documented and provide functionality for dropping NA values, specific columns, and casting data types.
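scikit-learn's Pipeline chains objects that expose fit and transform, and stateless helpers conventionally return self from fit. The sketch below shows that shape with stdlib-only stand-ins operating on lists of dict rows; the class names NADropper, ColumnsDropper and Caster and the fit_transform/transform helpers are hypothetical and not copied from template/utils/preprocess.py, which presumably works on DataFrames.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

Rows = List[Dict[str, Any]]


class NADropper:
    """Drop every row that contains a missing (None) value."""

    def fit(self, rows: Rows, y: Optional[Any] = None) -> "NADropper":
        return self  # stateless: nothing to learn

    def transform(self, rows: Rows) -> Rows:
        return [r for r in rows if all(v is not None for v in r.values())]


class ColumnsDropper:
    """Remove a fixed set of columns from every row."""

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns = set(columns)

    def fit(self, rows: Rows, y: Optional[Any] = None) -> "ColumnsDropper":
        return self

    def transform(self, rows: Rows) -> Rows:
        return [{k: v for k, v in r.items() if k not in self.columns} for r in rows]


class Caster:
    """Cast every value with the given callable (float by default)."""

    def __init__(self, cast: Callable[[Any], Any] = float) -> None:
        self.cast = cast

    def fit(self, rows: Rows, y: Optional[Any] = None) -> "Caster":
        return self

    def transform(self, rows: Rows) -> Rows:
        return [{k: self.cast(v) for k, v in r.items()} for r in rows]


def fit_transform(steps: Sequence[Any], rows: Rows) -> Rows:
    """Fit each step on the output of the previous one (training time)."""
    for step in steps:
        rows = step.fit(rows).transform(rows)
    return rows


def transform(steps: Sequence[Any], rows: Rows) -> Rows:
    """Apply already-fitted steps in order (inference time)."""
    for step in steps:
        rows = step.transform(rows)
    return rows
```

At training time call fit_transform(steps, rows); at inference time call transform(steps, rows) with the same step objects, so inference data goes through exactly the preprocessing fitted during training. Placing NADropper before Caster also keeps Caster from ever seeing None.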
template/{% if open_source_license %}LICENSE{% endif %} (1)
  • 1-1: Including a license file template is standard practice for open-source projects. Ensure that the correct license is included based on the project's licensing strategy.
tests/conftest.py (1)
  • 31-36: The configure_stack function in tests/conftest.py has been updated to remove configurations for MLflow and Evidently components. Ensure that this change aligns with the updated testing strategy and that the necessary components are still being tested.
tests/test_starter_template.py (1)
  • 55-104: Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [16-123].

The test_starter_template.py file has been updated with functions to generate and run a project with different options, including a custom product name. Ensure that these tests cover the new functionality introduced in the PR and that they are passing.

avishniakov merged commit 928bc41 into main on Dec 20, 2023
14 checks passed
avishniakov deleted the 2023.12.18 branch on December 20, 2023 at 08:37
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: None yet
3 participants