@wjayesh wjayesh commented Dec 15, 2023

Describe changes

To help with failed DB migrations, this PR implements database backup and restoration features directly in the SQL ZenML store and uses them to automatically recover from DB migration failures.

Four DB backup strategies are implemented. They differ in whether and where the backup is stored:

  1. disabled: no DB backup is taken.
  2. in-memory: the database schema and its entire contents are stored in memory.
  3. dump file: the database schema and its entire contents are stored in a JSON file on disk (e.g. in a persistent volume, for Helm chart deployments).
  4. database: the database is cloned into a separate database on the same database server.
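
As a rough illustration of the four strategies listed above, the sketch below models them as a string enum. The PR does add a DatabaseBackupStrategy enum, but the member names, the string values, and the needs_backup_location helper shown here are assumptions made for the example, not a copy of the merged code.

```python
from enum import Enum


class DatabaseBackupStrategy(str, Enum):
    """Where a pre-migration backup is kept (member values are assumed)."""

    DISABLED = "disabled"    # no backup is taken
    IN_MEMORY = "in-memory"  # schema and data held in process memory
    DUMP_FILE = "dump-file"  # schema and data written to a JSON file on disk
    DATABASE = "database"    # cloned into another database on the same server


def needs_backup_location(strategy: DatabaseBackupStrategy) -> bool:
    """Hypothetical helper: does the strategy need external storage?"""
    return strategy in (
        DatabaseBackupStrategy.DUMP_FILE,
        DatabaseBackupStrategy.DATABASE,
    )
```

Deriving from str means a value read from configuration, such as "dump-file", can be turned into a member with DatabaseBackupStrategy("dump-file").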

For SQLite databases, the backup/restore operations are performed differently: the SQLite database file itself is copied/restored.
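
A minimal sketch of the SQLite case, assuming the backup is a plain file copy made with the standard library. The function names and paths are hypothetical and only illustrate the idea; they are not the functions added by this PR.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Tuple


def backup_sqlite_file(db_path: Path, backup_path: Path) -> None:
    """Copy the SQLite database file to the backup location."""
    backup_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(db_path, backup_path)


def restore_sqlite_file(db_path: Path, backup_path: Path) -> None:
    """Overwrite the SQLite database file with the backup copy."""
    shutil.copyfile(backup_path, db_path)


def demo_file_copy() -> List[Tuple[str]]:
    """Back up, simulate a destructive migration, restore, return the rows."""
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "zenml.db"
        backup = Path(tmp) / "backup" / "zenml.db"

        conn = sqlite3.connect(db)
        conn.execute("CREATE TABLE pipeline (name TEXT)")
        conn.execute("INSERT INTO pipeline VALUES ('training')")
        conn.commit()
        conn.close()

        backup_sqlite_file(db, backup)

        # Simulate a migration that destroys data.
        conn = sqlite3.connect(db)
        conn.execute("DROP TABLE pipeline")
        conn.commit()
        conn.close()

        restore_sqlite_file(db, backup)

        conn = sqlite3.connect(db)
        rows = conn.execute("SELECT name FROM pipeline").fetchall()
        conn.close()
        return rows
```

The copy is only taken while no connection is open in this sketch; copying a file that is being written to at the same time is a separate concern.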

Backup and restore are performed automatically for all types of deployments when DB migrations are executed. They can also be invoked manually via two newly implemented CLI commands.

Implementation Details

  • The DB connection values are derived from the Zen store configuration
  • The file dump is stored in one of two locations:
    • a persistent volume, if a zenml.database.storageClass value was provided to the Helm chart (Kubernetes providers have different storage classes, so the chart cannot set a default),
      OR
    • an empty directory volume, otherwise.
  • The database backup/recovery is performed during DB migrations and only if a schema migration will actually happen (i.e. the alembic head and current revisions don't match).
  • A custom JSON file format is used for the DB dump instead of a traditional SQL dump for the following reasons:
    • avoids the pains of having to install the mysqldump binary version aligned with the target DB server type and version
    • not vulnerable to SQL injection attacks
    • somewhat easier to read, understand and fix manually in case of problems
  • The data being transferred between the source database and the target backup is processed in a manner that minimizes memory consumption, which means this scales well with larger databases.
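
The last two points can be sketched together: rows are read in fixed-size batches and written as JSON, and on restore the values are passed as bound parameters instead of being spliced into SQL text. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module, and the one-JSON-object-per-line layout and all function names are hypothetical, not the dump format used by the PR.

```python
import io
import json
import sqlite3
from typing import IO, Dict, Iterator, List


def iter_batches(
    conn: sqlite3.Connection, table: str, batch_size: int
) -> Iterator[List[Dict[str, object]]]:
    """Yield the rows of a table in batches of at most batch_size rows."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        yield [dict(zip(columns, row)) for row in rows]


def dump_table(
    conn: sqlite3.Connection, table: str, out: IO[str], batch_size: int = 2
) -> int:
    """Write each batch as one JSON line; return the number of rows written."""
    written = 0
    for batch in iter_batches(conn, table, batch_size):
        out.write(json.dumps({"table": table, "rows": batch}) + "\n")
        written += len(batch)
    return written


def restore_dump(conn: sqlite3.Connection, dump: IO[str]) -> int:
    """Re-insert dumped rows; values travel as bound parameters.

    Table and column names are still interpolated, so the dump file itself
    is assumed to come from a trusted location.
    """
    restored = 0
    for line in dump:
        record = json.loads(line)
        table = record["table"]
        for row in record["rows"]:
            columns = ", ".join(f'"{name}"' for name in row)
            marks = ", ".join("?" for _ in row)
            conn.execute(
                f'INSERT INTO "{table}" ({columns}) VALUES ({marks})',
                list(row.values()),
            )
            restored += 1
    return restored


def demo_round_trip() -> bool:
    """Dump a small table, restore it elsewhere, and compare the contents."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE step (id INTEGER, name TEXT)")
    source.executemany(
        "INSERT INTO step VALUES (?, ?)", [(i, f"step-{i}") for i in range(5)]
    )
    buffer = io.StringIO()
    dump_table(source, "step", buffer)

    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE step (id INTEGER, name TEXT)")
    buffer.seek(0)
    restore_dump(target, buffer)

    query = "SELECT id, name FROM step ORDER BY id"
    return source.execute(query).fetchall() == target.execute(query).fetchall()
```

Because only one batch is held in memory at a time, peak memory depends on the batch size rather than on the table size.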

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • If I have added an integration, I have updated the integrations table and the corresponding website section.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

Summary by CodeRabbit

  • New Features

    • Implemented new commands for database backup and restoration.
    • Introduced database backup strategies and related configurations.
  • Refactor

    • Enhanced database migration utilities and methods for improved reliability.
  • Style

    • Updated script for better compatibility across environments.

@github-actions github-actions bot added the internal and enhancement labels Dec 15, 2023
@github-actions

E2E template updates in examples/e2e have been pushed.

@stefannica stefannica left a comment

Looks good so far.

@github-actions

NLP template updates in examples/nlp-case have been pushed.

@coderabbitai

coderabbitai bot commented Dec 21, 2023

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The recent updates to ZenML introduce enhancements to database management. The script format.sh has been made more portable. New functionality for backing up and restoring the ZenML database has been added, including the DatabaseBackupStrategy enum to manage different backup methods. A constant for the backup directory name has been introduced, and the migration system now includes utilities for handling database backups and recoveries. Additionally, the SqlZenStore class has been updated to support these backup and restoration features.

Changes

File Path | Change Summary
scripts/format.sh | Updated shebang for better portability and flexibility.
src/zenml/cli/.../base.py | Added commands for database backup and restoration.
src/zenml/constants.py | Added SQL_STORE_BACKUP_DIRECTORY_NAME constant.
src/zenml/enums.py | Introduced DatabaseBackupStrategy enumeration.
src/zenml/zen_stores/migrations/alembic.py | Updated to retrieve head revisions using a nested function.
src/zenml/zen_stores/migrations/utils.py | Created MigrationUtils class for database migration, backup, and recovery.
src/zenml/zen_stores/sql_zen_store.py | Added backup-related attributes and methods, updated database migration handling.

@github-actions

Quickstart template updates in examples/quickstart have been pushed.

@github-actions

E2E template updates in examples/e2e have been pushed.

@stefannica stefannica changed the title from "Add DB backup and recovery to Kubernetes Job" to "Add DB backup and recovery during DB schema migrations" Jan 23, 2024

@schustmi schustmi left a comment

Haven't looked at anything in-depth so don't count this as a review, just wanted to say great job @wjayesh @stefannica , I wish we would have had this a long time ago, would have saved us many headaches from failed migrations 🎉

@strickvl

@coderabbitai review

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: .coderabbit.yaml

Commits: Files that changed from the base of the PR and between 1b44013 and 46911f5.
Files ignored due to path filters (3)
  • src/zenml/zen_server/deploy/helm/templates/server-db-job.yaml is excluded by: !**/*.yaml
  • src/zenml/zen_server/deploy/helm/templates/server-db-pvc.yaml is excluded by: !**/*.yaml
  • src/zenml/zen_server/deploy/helm/values.yaml is excluded by: !**/*.yaml
Files selected for processing (7)
  • scripts/format.sh (1 hunks)
  • src/zenml/cli/base.py (2 hunks)
  • src/zenml/constants.py (1 hunks)
  • src/zenml/enums.py (1 hunks)
  • src/zenml/zen_stores/migrations/alembic.py (1 hunks)
  • src/zenml/zen_stores/migrations/utils.py (1 hunks)
  • src/zenml/zen_stores/sql_zen_store.py (16 hunks)
Files skipped from review due to trivial changes (1)
  • scripts/format.sh
Additional comments: 11
src/zenml/zen_stores/migrations/alembic.py (1)
  • 159-178: The method head_revisions is added to retrieve the list of head revisions after migrations. The use of a nested function do_get_head_rev within run_migrations is a good approach to encapsulate the logic for getting head revisions. However, ensure that the Alembic class's run_migrations method is designed to accept a function like do_get_head_rev and that it properly handles the head_revisions list without side effects.
src/zenml/enums.py (1)
  • 351-361: The addition of the DatabaseBackupStrategy enum is consistent with the PR's objectives to introduce various database backup strategies. The enum values are clear and descriptive, which is good for maintainability.
src/zenml/constants.py (1)
  • 142-143: The new constant SQL_STORE_BACKUP_DIRECTORY_NAME is added with the value "sql_store_backup". This is a straightforward addition and follows the naming conventions of other constants in the file.
src/zenml/zen_stores/migrations/utils.py (1)
  • 43-576: The MigrationUtils class provides a comprehensive set of methods for database migration, backup, and recovery. It includes methods for backing up to a file, memory, and another database, as well as corresponding restoration methods. The class is well-structured, and methods are clearly named, which is good for readability and maintainability.

However, there are a few points to consider:

  • Ensure that the database operations are properly transactional where necessary to prevent partial updates in case of failures.
  • For methods like backup_database_to_file and restore_database_from_file, confirm that the JSON format is secure and cannot be exploited for injection attacks since it's mentioned as a reason for choosing JSON over SQL dumps.
  • The method _copy_database is private and used for backing up to another database. Verify that it handles large databases efficiently and doesn't lead to performance issues.
  • For SQLite databases, the backup and restore methods simply copy the file. Ensure that there are no concurrency issues with this approach, especially if the database could be in use during the backup or restore operations.
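
On the SQLite concurrency point above: one alternative to a raw file copy is the online backup API of Python's sqlite3 module, which produces a consistent copy even when the source database is being accessed by other connections. The sketch below only illustrates that alternative; it is not what this PR implements (the PR copies the database file), and the function names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Tuple


def online_backup(source: sqlite3.Connection, backup_path: Path) -> None:
    """Copy a live SQLite database into a new file via Connection.backup."""
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)
    finally:
        target.close()


def demo_online_backup() -> List[Tuple[str]]:
    """Back up an open database and read the rows back from the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        source = sqlite3.connect(":memory:")
        source.execute("CREATE TABLE stack (name TEXT)")
        source.execute("INSERT INTO stack VALUES ('default')")
        source.commit()

        backup_path = Path(tmp) / "backup.db"
        online_backup(source, backup_path)
        source.close()

        copy = sqlite3.connect(backup_path)
        rows = copy.execute("SELECT name FROM stack").fetchall()
        copy.close()
        return rows
```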
src/zenml/cli/base.py (1)
  • 39-39: The import statement has been updated to include DatabaseBackupStrategy. Ensure that this new enum is used appropriately within the file.
src/zenml/zen_stores/sql_zen_store.py (6)
  • 377-380: The backup_strategy, backup_directory, and backup_database attributes have been added to the SqlZenStoreConfiguration class. Ensure that all references and usages of these new attributes are correctly implemented throughout the codebase.
  • 670-687: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [673-744].

    The get_sqlalchemy_config method has been updated. Verify that the changes to the SQLAlchemy engine configuration are correct and do not introduce any security vulnerabilities, especially with the handling of SSL parameters.

  • 1108-1159: The backup_database method has been added to handle different backup strategies. Ensure that the backup process is secure, especially when dealing with file paths and database names. Also, verify that error handling is robust and that the backup process does not negatively impact performance.
  • 1160-1209: The restore_database method has been added. Verify that the restoration process is secure and that it correctly handles different backup strategies. Ensure that error handling is robust and that the restoration process does not introduce inconsistencies in the database state.
  • 1211-1228: The cleanup_database_backup method has been added. Verify that the cleanup process securely deletes backup files and databases, and that it does not accidentally delete non-backup data. Ensure that error handling is robust.
  • 1250-1347: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1231-1385].

    The migrate_database method has been updated to include database backup and restoration logic. Verify that the migration process is secure, especially with the new backup and restoration functionality. Ensure that error handling is robust and that the migration process does not introduce inconsistencies in the database schema or data.
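
The overall flow described in this PR (back up only when the alembic current and head revisions differ, restore if the upgrade fails) can be sketched roughly as follows. The callables, the MigrationFailedError class, and the choice to keep the backup after a failed upgrade are assumptions for illustration; this is not the body of migrate_database.

```python
from typing import Callable, List


class MigrationFailedError(RuntimeError):
    """Hypothetical error raised after a failed upgrade was rolled back."""


def migrate_with_backup(
    current_revisions: List[str],
    head_revisions: List[str],
    backup: Callable[[], None],
    upgrade: Callable[[], None],
    restore: Callable[[], None],
    cleanup: Callable[[], None],
) -> bool:
    """Run upgrade() guarded by a backup; return False if nothing was done."""
    if sorted(current_revisions) == sorted(head_revisions):
        return False  # schema already at head: skip backup and migration

    backup()
    try:
        upgrade()
    except Exception as exc:
        restore()  # put the pre-migration state back
        raise MigrationFailedError(
            "Migration failed; the database was restored from backup."
        ) from exc

    cleanup()  # the backup is no longer needed after a successful upgrade
    return True
```

Passing the four steps in as callables keeps the control flow independent of the backup strategy in use.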

stefannica and others added 2 commits January 23, 2024 23:55
Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com>
@stefannica stefannica self-requested a review January 23, 2024 23:00

@AlexejPenner AlexejPenner left a comment

Did not expect this level of complexity with 3 different backup strategies. I left some questions/concerns. @stefannica

@AlexejPenner AlexejPenner left a comment

🦭

@stefannica stefannica requested a review from schustmi January 25, 2024 12:25

@stefannica stefannica left a comment

Looks good to me, for what that's worth

@stefannica stefannica merged commit 1357883 into develop Jan 25, 2024
@stefannica stefannica deleted the feature/db-backup-and-restore branch January 25, 2024 16:42
kabinja pushed a commit to kabinja/zenml that referenced this pull request Jan 29, 2024
* add fns to take url and create/restore dump

* mount a volume to the job for backups

* create a pvc if storageclass provided

* add storageclass and backup storage size

* compare alembic current and head to decide on dump

* use pymysql where possible

* install mysql client in image

* log head and current

* Auto-update of E2E template

* move database dump and restore into sqlzenstore

* rename storage class to backup storage class

* make pvc a helm hook resource to be created before job

* Auto-update of NLP template

* pass backup directory as store config

* add check for current and head

* add function to check head revisions

* Auto-update of Starter template

* Auto-update of E2E template

* catch exceptions and add connection options

* Auto-update of Starter template

* implement sqldump in python

* more fixes but still doesn't work :((

* fix a host of syntax problems with SQL dumping

* add backticks to all column names

* remove mysqldump installation

* Fix secrets store initialization after merge

* move attribute inside the database dict in helm values

* Auto-update of Starter template

* Auto-update of E2E template

* Auto-update of NLP template

* Fixes and improvements

* removed old DB upgrade code from pre-alembic days
* moved DB backup/restore try-catch block closer to where the alembic upgrade happens
* better logs in case of failures
* use OS file copy instead of sqlite utility to backup SQLite database
* avoid unnecessary DB backup and restore operations

* Friendlier error message in case of successful DB restore after failed upgrade

* Remove test upgrade failure accidentally inserted in previous commit

* Fix helm chart

* Take db name from config during restore

* Create tables at the top of the backup db script and print the first 500 lines in the log in case of errors

* Implemented working JSON and DB backup strategies

* Restore chart version

* Fix helm chart backup volume condition

* add pvc deletion policy

* add note for setting fsGroup when using PVC

* Moved DB backup/restore code to a separate file and add in-memory backup strategy

* Fix the format script to use bash instead of sh

* Fix docstrings

* Update src/zenml/zen_stores/sql_zen_store.py

Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com>

* Reuse existing backup and doc updates

* Apply code review suggestions

* Add overwrite and cleanup options to backup/restore CLI commands

* Fix docstrings

* Fixed removal and reuse of DB backups after failed migration attempts and added more docs

* Remove unused attribute

---------

Co-authored-by: GitHub Actions <actions@github.com>
Co-authored-by: Stefan Nica <stefan@zenml.io>
Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com>
adtygan pushed a commit to adtygan/zenml that referenced this pull request Mar 21, 2024