61 commits
a72b21a
add fns to take url and create/restore dump
wjayesh Dec 14, 2023
b00e16d
mount a volume to the job for backups
wjayesh Dec 14, 2023
f7cc540
create a pvc if storageclass provided
wjayesh Dec 14, 2023
63b63ee
add storageclass and backup storage size
wjayesh Dec 14, 2023
488ad61
compare alembic current and head to decide on dump
wjayesh Dec 14, 2023
4b1f28c
use pymysql where possible
wjayesh Dec 15, 2023
fec3a73
install mysql client in image
wjayesh Dec 15, 2023
bbe8dd5
log head and current
wjayesh Dec 15, 2023
c66d9be
Auto-update of E2E template
actions-user Dec 15, 2023
8616cb6
move database dump and restore into sqlzenstore
wjayesh Dec 19, 2023
ecab602
Merge branch 'feature/db-backup-and-restore' of https://github.com/ze…
wjayesh Dec 19, 2023
96a9413
rename storage class to backup storage class
wjayesh Dec 19, 2023
e13303a
make pvc a helm hook resource to be created before job
wjayesh Dec 19, 2023
bf95b11
Auto-update of NLP template
actions-user Dec 19, 2023
20353a8
pass backup directory as store config
wjayesh Dec 19, 2023
6df0107
Merge branch 'feature/db-backup-and-restore' of https://github.com/ze…
wjayesh Dec 19, 2023
771e0c3
add check for current and head
wjayesh Dec 21, 2023
7b98c64
add function to check head revisions
wjayesh Dec 21, 2023
87635cf
Auto-update of Starter template
actions-user Dec 21, 2023
555ecba
Auto-update of E2E template
actions-user Dec 21, 2023
2a32fc8
catch exceptions and add connection options
wjayesh Dec 21, 2023
39b708e
Auto-update of Starter template
actions-user Dec 21, 2023
fee8be0
implement sqldump in python
wjayesh Jan 5, 2024
174467c
more fixes but still doesn't work :((
wjayesh Jan 5, 2024
3130028
fix a host of syntax problems with SQL dumping
wjayesh Jan 15, 2024
d100fdd
add backticks to all column names
wjayesh Jan 17, 2024
84fa59b
remove mysqldump installation
wjayesh Jan 17, 2024
b3de6b0
Merge branch 'develop' into feature/db-backup-and-restore
wjayesh Jan 17, 2024
5a94f11
Merge remote-tracking branch 'origin/develop' into feature/db-backup-…
stefannica Jan 18, 2024
cf6f2e8
Fix secrets store initialization after merge
stefannica Jan 18, 2024
5a75067
move attribute inside the database dict in helm values
wjayesh Jan 19, 2024
cadfd4b
Merge branch 'feature/db-backup-and-restore' of https://github.com/ze…
wjayesh Jan 19, 2024
8f239de
Auto-update of Starter template
actions-user Jan 19, 2024
fdd60fc
Auto-update of E2E template
actions-user Jan 19, 2024
728346e
Auto-update of NLP template
actions-user Jan 19, 2024
4307f9e
Fixes and improvements
stefannica Jan 19, 2024
e01267e
Merge branch 'feature/db-backup-and-restore' of github.com:zenml-io/z…
stefannica Jan 19, 2024
f82b0a2
Friendlier error message in case of successful DB restore after faile…
stefannica Jan 19, 2024
9d82d6c
Remove test upgrade failure accidentally inserted in previous commit
stefannica Jan 19, 2024
a70a9f8
Fix helm chart
stefannica Jan 19, 2024
11d6ffb
Take db name from config during restore
stefannica Jan 19, 2024
5a64deb
Create tables at the top of the backup db script and print the first …
stefannica Jan 19, 2024
c5e3fec
Implemented working JSON and DB backup strategies
stefannica Jan 22, 2024
4a91e70
Merge remote-tracking branch 'origin/develop' into feature/db-backup-…
stefannica Jan 22, 2024
2b75288
Restore chart version
stefannica Jan 22, 2024
b63fb76
Fix helm chart backup volume condition
stefannica Jan 22, 2024
e862e63
add pvc deletion policy
wjayesh Jan 23, 2024
2643924
Merge branch 'feature/db-backup-and-restore' of https://github.com/ze…
wjayesh Jan 23, 2024
1ffebbc
add note for setting fsGroup when using PVC
wjayesh Jan 23, 2024
bc3b23a
Moved DB backup/restore code to a separate file and add in-memory bac…
stefannica Jan 23, 2024
66bbc0b
Merge branch 'feature/db-backup-and-restore' of github.com:zenml-io/z…
stefannica Jan 23, 2024
46911f5
Fix the format script to use bash instead of sh
stefannica Jan 23, 2024
71fab80
Fix docstrings
stefannica Jan 23, 2024
7d95f98
Update src/zenml/zen_stores/sql_zen_store.py
stefannica Jan 23, 2024
7cbbf30
Reuse existing backup and doc updates
stefannica Jan 24, 2024
e038387
Apply code review suggestions
stefannica Jan 24, 2024
943b5f9
Add overwrite and cleanup options to backup/restore CLI commands
stefannica Jan 24, 2024
59984ef
Merge branch 'feature/db-backup-and-restore' of github.com:zenml-io/z…
stefannica Jan 24, 2024
1b09a1f
Fix docstrings
stefannica Jan 24, 2024
8762987
Fixed removal and reuse of DB backups after failed migration attempts…
stefannica Jan 25, 2024
f199617
Remove unused attribute
stefannica Jan 25, 2024
@@ -27,7 +27,7 @@ The following environment variables can be passed to the container:
* **ZENML\_DEFAULT\_PROJECT\_NAME**: The name of the default project created by the server on the first deployment, during database initialization. Defaults to `default`.
* **ZENML\_DEFAULT\_USER\_NAME**: The name of the default admin user account created by the server on the first deployment, during database initialization. Defaults to `default`.
* **ZENML\_DEFAULT\_USER\_PASSWORD**: The password to use for the default admin user account. Defaults to an empty password value, if not set.
* **ZENML\_STORE\_URL**: This URL should point to an SQLite database file _mounted in the container_, or to a MySQL-compatible database service _reachable from the container_. It takes one of these forms:

```
sqlite:////path/to/zenml.db
@@ -43,6 +43,7 @@ The following environment variables can be passed to the container:
* **ZENML\_STORE\_SSL\_KEY**: This can be set to a client SSL private key required to connect to the MySQL database service. Only valid when `ZENML_STORE_URL` points to a MySQL database that uses SSL-secured connections and requires client SSL certificates. The variable can be set either to the path where the certificate file is mounted inside the container or to the certificate contents themselves. This variable also requires `ZENML_STORE_SSL_CERT` to be set.
* **ZENML\_STORE\_SSL\_VERIFY\_SERVER\_CERT**: This boolean variable controls whether the SSL certificate in use by the MySQL server is verified. Only valid when `ZENML_STORE_URL` points to a MySQL database that uses SSL-secured connections. Defaults to `False`.
* **ZENML\_LOGGING\_VERBOSITY**: Use this variable to control the verbosity of logs inside the container. It can be set to one of the following values: `NOTSET`, `ERROR`, `WARN`, `INFO` (default), `DEBUG` or `CRITICAL`.
* **ZENML\_STORE\_BACKUP\_STRATEGY**: This variable controls the database backup strategy used by the ZenML server. See the [Database backup and recovery](#database-backup-and-recovery) section for more details about this feature and other related environment variables. Defaults to `in-memory`.

If none of the `ZENML_STORE_*` variables are set, the container will default to creating and using an SQLite database file stored at `/zenml/.zenconfig/local_stores/default_zen_store/zenml.db` inside the container. The `/zenml/.zenconfig/local_stores` base path where the default SQLite database is located can optionally be overridden by setting the `ZENML_LOCAL_STORES_PATH` environment variable to point to a different path (e.g. a persistent volume or directory that is mounted from the host).

@@ -425,6 +426,47 @@ Tearing down the installation is as simple as running:
docker-compose -p zenml down
```


## Database backup and recovery

An automated database backup and recovery feature is enabled by default for all Docker deployments. The ZenML server automatically backs up the database in memory before every database schema migration and restores it if the migration fails.

{% hint style="info" %}
The database backup automatically created by the ZenML server is temporary and is used only for immediate recovery from database migration failures. It is not meant to be used as a long-term backup solution. If you need to back up your database for long-term storage, you should use a dedicated backup solution.
{% endhint %}
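
The back-up-then-restore-on-failure idea can be illustrated with a minimal sketch. It assumes an SQLite database and uses the standard library's `sqlite3.Connection.backup`. The function name `migrate_with_in_memory_backup` and the `migration` callable are hypothetical, and this is not the ZenML server's implementation.

```python
import sqlite3
from typing import Callable


def migrate_with_in_memory_backup(
    conn: sqlite3.Connection,
    migration: Callable[[sqlite3.Connection], None],
) -> bool:
    """Run `migration(conn)`; on failure, restore `conn` from an in-memory copy.

    Hypothetical helper for illustration only.
    """
    backup = sqlite3.connect(":memory:")
    conn.backup(backup)  # copy schema and data into memory
    try:
        migration(conn)
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # close any open transaction before restoring
        backup.backup(conn)  # overwrite the live database with the copy
        return False
    finally:
        backup.close()  # the copy is discarded either way
```

Because the copy lives only in process memory, it disappears when the process exits, which is why this kind of backup cannot be used for manual recovery later.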

Several database backup strategies are supported, depending on where and how the backup is stored. The strategy can be configured by means of the `ZENML_STORE_BACKUP_STRATEGY` environment variable:

* `disabled` - no backup is performed
* `in-memory` - the database schema and data are stored in memory. This is the fastest backup strategy, but the backup is not persisted across container restarts, so no manual intervention is possible in case the automatic DB recovery fails after a failed DB migration. Adequate memory resources should be allocated to the ZenML server container when using this backup strategy with larger databases. This is the default backup strategy.
* `database` - the database is copied to a backup database in the same database server. This requires the `ZENML_STORE_BACKUP_DATABASE` environment variable to be set to the name of the backup database. This backup strategy is only supported for MySQL-compatible databases. The user specified in the database URL must have permissions to manage (create, drop, and modify) the backup database in addition to the main database.
* `dump-file` - the database schema and data are dumped to a filesystem location inside the ZenML server container. This location can be customized by means of the `ZENML_STORE_BACKUP_DIRECTORY` environment variable. When this strategy is configured, users should mount a host directory in the container and point the `ZENML_STORE_BACKUP_DIRECTORY` variable to where it's mounted inside the container. If a host directory is not mounted, the dump file will be stored in the container's filesystem and will be lost when the container is removed.
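
The requirements listed above can be summarized as a small validation sketch. The environment variable names and the `in-memory` default come from this page. The function `resolve_backup_settings` and the shape of its return value are assumptions made for illustration and do not reflect how the server actually parses its configuration.

```python
from typing import Mapping

STRATEGIES = ("disabled", "in-memory", "database", "dump-file")


def resolve_backup_settings(env: Mapping[str, str]) -> dict:
    """Illustrative check of the backup-related environment variables."""
    strategy = env.get("ZENML_STORE_BACKUP_STRATEGY", "in-memory")
    if strategy not in STRATEGIES:
        raise ValueError(f"Unknown backup strategy: {strategy!r}")

    settings = {"strategy": strategy}
    if strategy == "database":
        # Documented as MySQL-only and as requiring a backup database name.
        if not env.get("ZENML_STORE_URL", "").startswith("mysql://"):
            raise ValueError("The 'database' strategy needs a MySQL store URL")
        name = env.get("ZENML_STORE_BACKUP_DATABASE")
        if not name:
            raise ValueError("ZENML_STORE_BACKUP_DATABASE must be set")
        settings["backup_database"] = name
    elif strategy == "dump-file":
        # None stands for the server's default location inside the container.
        settings["backup_directory"] = env.get("ZENML_STORE_BACKUP_DIRECTORY")
    return settings
```

Passing `os.environ` to this function inside a container would show which settings a given `docker run` invocation implies.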

The following additional rules are applied concerning the creation and lifetime of the backup:

* a backup is not attempted if the database doesn't need to undergo a migration (e.g. when the ZenML server is upgraded to a new version that doesn't require a database schema change or if the ZenML version doesn't change at all).
* a backup file or database is created before every database migration attempt (i.e. when the container starts). If a backup already exists (i.e. persisted in a mounted host directory or backup database), it is overwritten.
* the persistent backup file or database is cleaned up after the migration is completed successfully or if the database doesn't need to undergo a migration. This includes backups created by previous failed migration attempts.
* the persistent backup file or database is NOT cleaned up after a failed migration. This allows the user to manually inspect and/or apply the backup if the automatic recovery fails.
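
The lifetime rules above can be read as a small decision procedure. The sketch below encodes them as documented. The function name `plan_backup_actions` and the action labels are hypothetical and only serve to make the ordering explicit.

```python
from typing import List


def plan_backup_actions(
    needs_migration: bool,
    migration_succeeds: bool,
    backup_exists: bool,
) -> List[str]:
    """Return the ordered backup-related steps implied by the rules above."""
    actions: List[str] = []
    if not needs_migration:
        # No backup is attempted; leftovers from earlier failures are removed.
        if backup_exists:
            actions.append("cleanup_backup")
        return actions

    # An existing backup is overwritten before the migration attempt.
    actions.append("overwrite_backup" if backup_exists else "create_backup")
    actions.append("migrate")
    if migration_succeeds:
        actions.append("cleanup_backup")
    else:
        # The backup is kept so that it can be inspected or applied manually.
        actions += ["restore_from_backup", "keep_backup"]
    return actions
```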

The following example shows how to deploy the ZenML server to use a mounted host directory to persist the database backup file during a database migration:

```shell
mkdir mysql-data

docker run --name mysql -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=password \
--mount type=bind,source=$PWD/mysql-data,target=/var/lib/mysql \
mysql:8.0

docker run -it -d -p 8080:8080 --name zenml \
--add-host host.docker.internal:host-gateway \
--mount type=bind,source=$PWD/mysql-data,target=/db-dump \
--env ZENML_STORE_URL=mysql://root:password@host.docker.internal/zenml \
--env ZENML_STORE_BACKUP_STRATEGY=dump-file \
--env ZENML_STORE_BACKUP_DIRECTORY=/db-dump \
zenmldocker/zenml-server
```

## Troubleshooting

You can check the logs of the container to verify if the server is up and, depending on where you have deployed it, you can also access the dashboard at a `localhost` port (if running locally) or through some other service that exposes your container to the internet.
42 changes: 42 additions & 0 deletions docs/book/deploying-zenml/zenml-self-hosted/deploy-with-helm.md
@@ -669,5 +669,47 @@ To configure a backup secrets store in the Helm chart, use the same approach and
aws_secret_access_key: <your AWS secret access key>
```

### Database backup and recovery

An automated database backup and recovery feature is enabled by default for all Helm deployments. The ZenML server will automatically back up the database before every upgrade and restore it if the upgrade fails in a way that affects the database.

{% hint style="info" %}
The database backup automatically created by the ZenML server is temporary and is used only for immediate recovery from database migration failures. It is not meant to be used as a long-term backup solution. If you need to back up your database for long-term storage, you should use a dedicated backup solution.
{% endhint %}

Several database backup strategies are supported, depending on where and how the backup is stored. The strategy can be configured by means of the `zenml.database.backupStrategy` Helm value:

* `disabled` - no backup is performed
* `in-memory` - the database schema and data are stored in memory. This is the fastest backup strategy, but the backup is not persisted across pod restarts, so no manual intervention is possible in case the automatic DB recovery fails after a failed DB migration. Adequate memory resources should be allocated to the ZenML server pod when using this backup strategy with larger databases. This is the default backup strategy.
* `database` - the database is copied to a backup database in the same database server. This requires the `backupDatabase` option to be set to the name of the backup database. This backup strategy is only supported for MySQL-compatible databases. The user specified in the database URL must have permissions to manage (create, drop, and modify) the backup database in addition to the main database.
* `dump-file` - the database schema and data are dumped to a file local to the database initialization and upgrade job. Users may optionally configure a persistent volume where the dump file will be stored by setting the `backupPVStorageSize` and optionally the `backupPVStorageClass` options. If a persistent volume is not configured, the dump file will be stored in an emptyDir volume, which is not persisted. If a persistent volume is configured, the user is responsible for deleting the resulting PVC when uninstalling the Helm release.

> **NOTE:** You should also set the `podSecurityContext.fsGroup` option if you are using a persistent volume to store the dump file.
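
To make the mapping from Helm values to the migration job concrete, the following sketch mirrors the conditionals in the `server-db-job.yaml` template changed in this PR. The two Python functions are hypothetical helpers written for illustration. Only the value names, the environment variable names, and the `/backups` mount path are taken from the template.

```python
from typing import Dict, Optional


def backup_env_from_values(database: Dict[str, str]) -> Dict[str, str]:
    """Environment variables the job would receive for `zenml.database` values."""
    env: Dict[str, str] = {}
    strategy = database.get("backupStrategy")
    if not strategy:
        return env
    env["ZENML_STORE_BACKUP_STRATEGY"] = strategy
    if strategy == "database":
        env["ZENML_STORE_BACKUP_DATABASE"] = database.get("backupDatabase", "")
    elif strategy == "dump-file":
        # The template mounts the backup volume at a fixed path.
        env["ZENML_STORE_BACKUP_DIRECTORY"] = "/backups"
    return env


def backup_volume_kind(database: Dict[str, str]) -> Optional[str]:
    """Volume type backing the dump file, or None if no volume is mounted."""
    if database.get("backupStrategy") != "dump-file":
        return None
    if database.get("backupPVStorageSize"):
        return "persistentVolumeClaim"
    return "emptyDir"
```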

The following additional rules are applied concerning the creation and lifetime of the backup:

* a backup is not attempted if the database doesn't need to undergo a migration (e.g. when the ZenML server is upgraded to a new version that doesn't require a database schema change or if the ZenML version doesn't change at all).
* a backup file or database is created before every database migration attempt (i.e. during every Helm upgrade). If a backup already exists (i.e. persisted in a persistent volume or backup database), it is overwritten.
* the persistent backup file or database is cleaned up after the migration is completed successfully or if the database doesn't need to undergo a migration. This includes backups created by previous failed migration attempts.
* the persistent backup file or database is NOT cleaned up after a failed migration. This allows the user to manually inspect and/or apply the backup if the automatic recovery fails.

The following example shows how to configure the ZenML server to use a persistent volume to store the database dump file:

```yaml
zenml:

  # ...

  database:
    url: "mysql://admin:password@my.database.org:3306/zenml"

    # Configure the database backup strategy
    backupStrategy: dump-file
    backupPVStorageSize: 1Gi

podSecurityContext:
  fsGroup: 1000 # if you're using a PVC for backup, this must be set.
```

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
2 changes: 1 addition & 1 deletion scripts/format.sh
@@ -1,4 +1,4 @@
-#!/bin/sh -e
+#!/usr/bin/env bash
set -x

# Initialize default source directories
134 changes: 133 additions & 1 deletion src/zenml/cli/base.py
@@ -36,7 +36,7 @@
    ENV_ZENML_ENABLE_REPO_INIT_WARNINGS,
    REPOSITORY_DIRECTORY_NAME,
)
-from zenml.enums import AnalyticsEventSource, StoreType
+from zenml.enums import AnalyticsEventSource, DatabaseBackupStrategy, StoreType
from zenml.environment import Environment, get_environment
from zenml.exceptions import GitNotFoundError, InitializationException
from zenml.integrations.registry import integration_registry
@@ -659,3 +659,135 @@ def migrate_database(skip_default_registrations: bool = False) -> None:
        cli_utils.warning(
            "Unable to migrate database while connected to a ZenML server."
        )


@cli.command("backup-database", help="Create a database backup.", hidden=True)
@click.option(
    "--strategy",
    "-s",
    help="Custom backup strategy to use. Defaults to whatever is configured "
    "in the store config.",
    type=click.Choice(choices=DatabaseBackupStrategy.values()),
    required=False,
    default=None,
)
@click.option(
    "--location",
    default=None,
    help="Custom location to store the backup. Defaults to whatever is "
    "configured in the store config. Depending on the strategy, this can be "
    "a local path or a database name.",
    type=str,
)
@click.option(
    "--overwrite",
    "-o",
    is_flag=True,
    default=False,
    help="Overwrite the existing backup.",
    type=bool,
)
def backup_database(
    strategy: Optional[str] = None,
    location: Optional[str] = None,
    overwrite: bool = False,
) -> None:
    """Backup the ZenML database.

    Args:
        strategy: Custom backup strategy to use. Defaults to whatever is
            configured in the store config.
        location: Custom location to store the backup. Defaults to whatever is
            configured in the store config. Depending on the strategy, this can
            be a local path or a database name.
        overwrite: Whether to overwrite the existing backup.
    """
    from zenml.zen_stores.base_zen_store import BaseZenStore
    from zenml.zen_stores.sql_zen_store import SqlZenStore

    store_config = (
        GlobalConfiguration().store
        or GlobalConfiguration().get_default_store()
    )
    if store_config.type == StoreType.SQL:
        store = BaseZenStore.create_store(
            store_config, skip_default_registrations=True, skip_migrations=True
        )
        assert isinstance(store, SqlZenStore)
        msg, location = store.backup_database(
            strategy=DatabaseBackupStrategy(strategy) if strategy else None,
            location=location,
            overwrite=overwrite,
        )
        cli_utils.declare(f"Database was backed up to {msg}.")
    else:
        cli_utils.warning(
            "Cannot backup database while connected to a ZenML server."
        )


@cli.command(
    "restore-database", help="Restore the database from a backup.", hidden=True
)
@click.option(
    "--strategy",
    "-s",
    help="Custom backup strategy to use. Defaults to whatever is configured "
    "in the store config.",
    type=click.Choice(choices=DatabaseBackupStrategy.values()),
    required=False,
    default=None,
)
@click.option(
    "--location",
    default=None,
    help="Custom location where the backup is stored. Defaults to whatever is "
    "configured in the store config. Depending on the strategy, this can be "
    "a local path or a database name.",
    type=str,
)
@click.option(
    "--cleanup",
    "-c",
    is_flag=True,
    default=False,
    help="Cleanup the backup after restoring.",
    type=bool,
)
def restore_database(
    strategy: Optional[str] = None,
    location: Optional[str] = None,
    cleanup: bool = False,
) -> None:
    """Restore the ZenML database.

    Args:
        strategy: Custom backup strategy to use. Defaults to whatever is
            configured in the store config.
        location: Custom location where the backup is stored. Defaults to
            whatever is configured in the store config. Depending on the
            strategy, this can be a local path or a database name.
        cleanup: Whether to cleanup the backup after restoring.
    """
    from zenml.zen_stores.base_zen_store import BaseZenStore
    from zenml.zen_stores.sql_zen_store import SqlZenStore

    store_config = (
        GlobalConfiguration().store
        or GlobalConfiguration().get_default_store()
    )
    if store_config.type == StoreType.SQL:
        store = BaseZenStore.create_store(
            store_config, skip_default_registrations=True, skip_migrations=True
        )
        assert isinstance(store, SqlZenStore)
        store.restore_database(
            strategy=DatabaseBackupStrategy(strategy) if strategy else None,
            location=location,
            cleanup=cleanup,
        )
        cli_utils.declare("Database restore finished.")
    else:
        cli_utils.warning(
            "Cannot restore database while connected to a ZenML server."
        )
3 changes: 3 additions & 0 deletions src/zenml/constants.py
@@ -139,6 +139,9 @@ def handle_int_env_var(var: str, default: int = 0) -> int:
# Default store directory subpath:
DEFAULT_STORE_DIRECTORY_NAME = "default_zen_store"

# SQL Store backup directory subpath:
SQL_STORE_BACKUP_DIRECTORY_NAME = "database_backup"

DEFAULT_USERNAME = "default"
DEFAULT_PASSWORD = ""
DEFAULT_WORKSPACE_NAME = "default"
13 changes: 13 additions & 0 deletions src/zenml/enums.py
@@ -346,3 +346,16 @@ class MetadataResourceTypes(StrEnum):
    STEP_RUN = "step_run"
    ARTIFACT_VERSION = "artifact_version"
    MODEL_VERSION = "model_version"


class DatabaseBackupStrategy(StrEnum):
    """All available database backup strategies."""

    # Backup disabled
    DISABLED = "disabled"
    # In-memory backup
    IN_MEMORY = "in-memory"
    # Dump the database to a file
    DUMP_FILE = "dump-file"
    # Create a backup of the database in the remote database service
    DATABASE = "database"
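
The CLI commands in this PR turn an optional string into an enum member with `DatabaseBackupStrategy(strategy) if strategy else None` and feed `DatabaseBackupStrategy.values()` to `click.Choice`. A self-contained sketch of that pattern follows, using only the standard library. The class `BackupStrategy`, its `values()` helper, and `parse_strategy` are stand-ins written for illustration and are not ZenML's `StrEnum` implementation.

```python
from enum import Enum
from typing import List, Optional


class BackupStrategy(str, Enum):
    """Stand-in for the enum added in this PR (same member values)."""

    DISABLED = "disabled"
    IN_MEMORY = "in-memory"
    DUMP_FILE = "dump-file"
    DATABASE = "database"

    @classmethod
    def values(cls) -> List[str]:
        """List the raw string values, e.g. for a CLI choice option."""
        return [member.value for member in cls]


def parse_strategy(raw: Optional[str]) -> Optional[BackupStrategy]:
    """Convert a CLI string to an enum member; None means 'use the config'."""
    return BackupStrategy(raw) if raw else None
```

Looking a member up by value raises `ValueError` for unknown strings, so restricting the CLI option to the enum's values keeps invalid input from reaching the store.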
32 changes: 31 additions & 1 deletion src/zenml/zen_server/deploy/helm/templates/server-db-job.yaml
@@ -10,7 +10,7 @@ metadata:
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": before-hook-creation{{ if not .Values.zenml.debug }},hook-succeeded{{ end }}
spec:
-  backoffLimit: 2
+  backoffLimit: 0
  template:
    metadata:
      annotations:
@@ -32,6 +32,20 @@ spec:
      {{- end }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}

      {{- if eq .Values.zenml.database.backupStrategy "dump-file" }}
      volumes:
        # define a volume that will hold a backup of the database
        - name: db-backup
          # if a storage PVC is configured, then use it
          {{- if .Values.zenml.database.backupPVStorageSize }}
          persistentVolumeClaim:
            claimName: {{ include "zenml.fullname" . }}-db-backup
          {{- else }}
          # otherwise, use an emptyDir
          emptyDir: {}
          {{- end }}
      {{- end }}
      restartPolicy: Never
      containers:
        - name: {{ .Chart.Name }}-db-migration
@@ -41,6 +55,11 @@ spec:
          imagePullPolicy: {{ .Values.zenml.image.pullPolicy }}
          args: ["migrate-database"]
          command: ['zenml']
          {{- if eq .Values.zenml.database.backupStrategy "dump-file" }}
          volumeMounts:
            - name: db-backup
              mountPath: /backups
          {{- end }}
          env:
            {{- if .Values.zenml.debug }}
            - name: ZENML_LOGGING_VERBOSITY
@@ -56,6 +75,17 @@ spec:
              value: sql
            - name: ZENML_STORE_SSL_VERIFY_SERVER_CERT
              value: {{ .Values.zenml.database.sslVerifyServerCert | default "false" | quote }}
            {{- if .Values.zenml.database.backupStrategy }}
            - name: ZENML_STORE_BACKUP_STRATEGY
              value: {{ .Values.zenml.database.backupStrategy | quote }}
            {{- if eq .Values.zenml.database.backupStrategy "database" }}
            - name: ZENML_STORE_BACKUP_DATABASE
              value: {{ .Values.zenml.database.backupDatabase | quote }}
            {{- else if eq .Values.zenml.database.backupStrategy "dump-file" }}
            - name: ZENML_STORE_BACKUP_DIRECTORY
              value: /backups
            {{- end }}
            {{- end }}
            {{- range $k, $v := include "zenml.serverEnvVariables" . | fromYaml }}
            - name: {{ $k }}
              value: {{ $v | quote }}