-
Notifications
You must be signed in to change notification settings - Fork 557
Add DB backup and recovery during DB schema migrations #2158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
E2E template updates in |
stefannica
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far.
…nml-io/zenml into feature/db-backup-and-restore
|
NLP template updates in |
…nml-io/zenml into feature/db-backup-and-restore
|
Important Auto Review SkippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the To trigger a single review, invoke the WalkthroughThe recent updates to ZenML introduce enhancements to database management. The script Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
|
Quickstart template updates in |
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
E2E template updates in |
…nml-io/zenml into feature/db-backup-and-restore
…enml into feature/db-backup-and-restore
schustmi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked at anything in-depth so don't count this as a review, just wanted to say great job @wjayesh @stefannica , I wish we would have had this a long time ago, would have saved us many headaches from failed migrations 🎉
|
@coderabbitai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: .coderabbit.yaml
Files ignored due to path filters (3)
src/zenml/zen_server/deploy/helm/templates/server-db-job.yamlis excluded by:!**/*.yamlsrc/zenml/zen_server/deploy/helm/templates/server-db-pvc.yamlis excluded by:!**/*.yamlsrc/zenml/zen_server/deploy/helm/values.yamlis excluded by:!**/*.yaml
Files selected for processing (7)
- scripts/format.sh (1 hunks)
- src/zenml/cli/base.py (2 hunks)
- src/zenml/constants.py (1 hunks)
- src/zenml/enums.py (1 hunks)
- src/zenml/zen_stores/migrations/alembic.py (1 hunks)
- src/zenml/zen_stores/migrations/utils.py (1 hunks)
- src/zenml/zen_stores/sql_zen_store.py (16 hunks)
Files skipped from review due to trivial changes (1)
- scripts/format.sh
Additional comments: 11
src/zenml/zen_stores/migrations/alembic.py (1)
- 159-178: The method
head_revisionsis added to retrieve the list of head revisions after migrations. The use of a nested functiondo_get_head_revwithinrun_migrationsis a good approach to encapsulate the logic for getting head revisions. However, ensure that theAlembicclass'srun_migrationsmethod is designed to accept a function likedo_get_head_revand that it properly handles thehead_revisionslist without side effects.src/zenml/enums.py (1)
- 351-361: The addition of the
DatabaseBackupStrategyenum is consistent with the PR's objectives to introduce various database backup strategies. The enum values are clear and descriptive, which is good for maintainability.src/zenml/constants.py (1)
- 142-143: The new constant
SQL_STORE_BACKUP_DIRECTORY_NAMEis added with the value "sql_store_backup". This is a straightforward addition and follows the naming conventions of other constants in the file.src/zenml/zen_stores/migrations/utils.py (1)
- 43-576: The
MigrationUtilsclass provides a comprehensive set of methods for database migration, backup, and recovery. It includes methods for backing up to a file, memory, and another database, as well as corresponding restoration methods. The class is well-structured, and methods are clearly named, which is good for readability and maintainability.However, there are a few points to consider:
- Ensure that the database operations are properly transactional where necessary to prevent partial updates in case of failures.
- For methods like
backup_database_to_fileandrestore_database_from_file, confirm that the JSON format is secure and cannot be exploited for injection attacks since it's mentioned as a reason for choosing JSON over SQL dumps.- The method
_copy_databaseis private and used for backing up to another database. Verify that it handles large databases efficiently and doesn't lead to performance issues.- For SQLite databases, the backup and restore methods simply copy the file. Ensure that there are no concurrency issues with this approach, especially if the database could be in use during the backup or restore operations.
src/zenml/cli/base.py (1)
- 39-39: The import statement has been updated to include
DatabaseBackupStrategy. Ensure that this new enum is used appropriately within the file.src/zenml/zen_stores/sql_zen_store.py (6)
- 377-380: The
backup_strategy,backup_directory, andbackup_databaseattributes have been added to theSqlZenStoreConfigurationclass. Ensure that all references and usages of these new attributes are correctly implemented throughout the codebase.- 670-687: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [673-744]
The
get_sqlalchemy_configmethod has been updated. Verify that the changes to the SQLAlchemy engine configuration are correct and do not introduce any security vulnerabilities, especially with the handling of SSL parameters.
- 1108-1159: The
backup_databasemethod has been added to handle different backup strategies. Ensure that the backup process is secure, especially when dealing with file paths and database names. Also, verify that error handling is robust and that the backup process does not negatively impact performance.- 1160-1209: The
restore_databasemethod has been added. Verify that the restoration process is secure and that it correctly handles different backup strategies. Ensure that error handling is robust and that the restoration process does not introduce inconsistencies in the database state.- 1211-1228: The
cleanup_database_backupmethod has been added. Verify that the cleanup process securely deletes backup files and databases, and that it does not accidentally delete non-backup data. Ensure that error handling is robust.- 1250-1347: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1231-1385]
The
migrate_databasemethod has been updated to include database backup and restoration logic. Verify that the migration process is secure, especially with the new backup and restoration functionality. Ensure that error handling is robust and that the migration process does not introduce inconsistencies in the database schema or data.
Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com>
AlexejPenner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not expect this level of complexity with 3 different backup strategies. I left some questions/concerns. @stefannica
AlexejPenner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🦭
… and added more docs
stefannica
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, for what that's worth
* add fns to take url and create/restore dump * mount a volume to the job for backups * create a pvc if storageclass provided * add storageclass and backup storage size * compare alembic current and head to decide on dump * use pymysql where possible * install mysql client in image * log head and current * Auto-update of E2E template * move database dump and restore into sqlzenstore * rename storage class to backup storage class * make pvc a helm hook resource to be created before job * Auto-update of NLP template * pass backup directory as store config * add check for current and head * add function to check head revisions * Auto-update of Starter template * Auto-update of E2E template * catch exceptions and add connection options * Auto-update of Starter template * implement sqldump in python * more fixes but still doesn't work :(( * fix a host of syntax problems with SQL dumping * add backticks to all column names * remove mysqldump installation * Fix secrets store initialization after merge * move attribute inside the database dict in helm values * Auto-update of Starter template * Auto-update of E2E template * Auto-update of NLP template * Fixes and improvements * removed old DB upgrade code from pre-alembic days * moved DB backup/restore try-catch block closer to where the alembic upgrade happens * better logs in case of failures * use OS file copy instead of sqlite utility to backup SQLite database * avoid unnecessary DB backup and restore operations * Friendlier error message in case of successful DB restore after failed upgrade * Remove test upgrade failure accidentally inserted in previous commit * Fix helm chart * Take db name from config during restore * Create tables at the top of the backup db script and print the first 500 lines in the log in case of errors * Implemented working JSON and DB backup strategies * Restore chart version * Fix helm chart backup volume condition * add pvc deletion policy * add note for setting fsGroup when using PVC * Moved DB backup/restore code to a separate file and add in-memory backup strategy * Fix the format script to use bash instead of sh * Fix docstrings * Update src/zenml/zen_stores/sql_zen_store.py Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com> * Reuse existing backup and doc updates * Apply code review suggestions * Add overwrite and cleanup options to backup/restore CLI commands * Fix docstrings * Fixed removal and reuse of DB backups after failed migration attempts and added more docs * Remove unused attribute --------- Co-authored-by: GitHub Actions <actions@github.com> Co-authored-by: Stefan Nica <stefan@zenml.io> Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com>
* add fns to take url and create/restore dump * mount a volume to the job for backups * create a pvc if storageclass provided * add storageclass and backup storage size * compare alembic current and head to decide on dump * use pymysql where possible * install mysql client in image * log head and current * Auto-update of E2E template * move database dump and restore into sqlzenstore * rename storage class to backup storage class * make pvc a helm hook resource to be created before job * Auto-update of NLP template * pass backup directory as store config * add check for current and head * add function to check head revisions * Auto-update of Starter template * Auto-update of E2E template * catch exceptions and add connection options * Auto-update of Starter template * implement sqldump in python * more fixes but still doesn't work :(( * fix a host of syntax problems with SQL dumping * add backticks to all column names * remove mysqldump installation * Fix secrets store initialization after merge * move attribute inside the database dict in helm values * Auto-update of Starter template * Auto-update of E2E template * Auto-update of NLP template * Fixes and improvements * removed old DB upgrade code from pre-alembic days * moved DB backup/restore try-catch block closer to where the alembic upgrade happens * better logs in case of failures * use OS file copy instead of sqlite utility to backup SQLite database * avoid unnecessary DB backup and restore operations * Friendlier error message in case of successful DB restore after failed upgrade * Remove test upgrade failure accidentally inserted in previous commit * Fix helm chart * Take db name from config during restore * Create tables at the top of the backup db script and print the first 500 lines in the log in case of errors * Implemented working JSON and DB backup strategies * Restore chart version * Fix helm chart backup volume condition * add pvc deletion policy * add note for setting fsGroup when using PVC * Moved DB backup/restore code to a separate file and add in-memory backup strategy * Fix the format script to use bash instead of sh * Fix docstrings * Update src/zenml/zen_stores/sql_zen_store.py Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com> * Reuse existing backup and doc updates * Apply code review suggestions * Add overwrite and cleanup options to backup/restore CLI commands * Fix docstrings * Fixed removal and reuse of DB backups after failed migration attempts and added more docs * Remove unused attribute --------- Co-authored-by: GitHub Actions <actions@github.com> Co-authored-by: Stefan Nica <stefan@zenml.io> Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com>
Describe changes
To help with failed DB migrations, this PR implements database backup and restoration features directly in the SQL ZenML store and uses them to automatically recover from DB migration failures.
Four different DB backup strategies are implemented, depending on where and how the backup is stored:
For SQLite databases, the backup/restore operations are performed differently: the SQLite database file itself is copied/restored.
The backup/restore are performed automatically for all types of deployments when DB migrations are executed. They can also be invoked manually via two newly implemented CLI commands.
Implementation Details
zenml.database.storageClassvalue was provided to the Helm chart. Different Kubernetes providers have different storage classes and we can't set a default for the chart.OR
mysqldumpbinary version aligned with the target DB server type and versionPre-requisites
Please ensure you have done the following:
developand the open PR is targetingdevelop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.Types of changes
Summary by CodeRabbit
New Features
Refactor
Style