Merged
32 commits
6633a17
Add Pulumi deploy script
scottyeager Nov 5, 2024
becfd88
Fix post_deploy, naming
scottyeager Nov 6, 2024
3d8017d
Add rebuild test
scottyeager Nov 8, 2024
260f511
Use env vars, better SSH key discovery
scottyeager Nov 8, 2024
bfe95e0
Fix typo
scottyeager Nov 8, 2024
8178063
Add SSH key to vars example
scottyeager Nov 15, 2024
6b05752
Don't proceed until VM responds to ping
scottyeager Nov 15, 2024
6583cef
Add prometheus
scottyeager Nov 15, 2024
6c20149
Support using custom zstor binary
scottyeager Nov 15, 2024
c869fa4
Use pulumi_command
scottyeager Nov 19, 2024
4940c4e
Improve deployment flow
scottyeager Nov 20, 2024
cf78f9d
Add recover script
scottyeager Nov 20, 2024
f3a7824
Fix recover script
scottyeager Nov 21, 2024
848e839
Improving test
scottyeager Nov 21, 2024
f518ef1
Update README
scottyeager Nov 21, 2024
402ab89
Change config numbering and clean unused code
scottyeager Nov 21, 2024
3572baf
Remove dead link
scottyeager Nov 21, 2024
6c0743b
Reorg test scripts
scottyeager Nov 22, 2024
2d1cddf
Update vars.example.py
scottyeager Nov 27, 2024
dfc2679
Update triggers for config upload
scottyeager Dec 13, 2024
4517cc1
Improve tests
scottyeager Dec 13, 2024
bf6de3c
Use separate deployment for vm
scottyeager Dec 16, 2024
43a5aca
Remove unused original version secrets gen
scottyeager Dec 16, 2024
ee090ff
Update test
scottyeager Dec 17, 2024
f0fc3fc
Update README with backend replacement details
scottyeager Dec 17, 2024
31dfd37
Support deploying with no frontend VM
scottyeager Dec 18, 2024
4658465
Only deploy network with vm, fix zstor config upload
scottyeager Dec 19, 2024
2783f53
Wait for hash to print on file uploads
scottyeager Dec 19, 2024
c231e5f
Don't delete stack unless `down` succeeds
scottyeager Dec 19, 2024
724e231
Add retry uploads
scottyeager Dec 21, 2024
5696a29
Dedicated script to wait for uploads
scottyeager Dec 21, 2024
5d3d07a
Add Prometheus push gateway
scottyeager Dec 21, 2024
2 changes: 2 additions & 0 deletions pulumi/Pulumi.yaml
@@ -0,0 +1,2 @@
name: qsfs
runtime: python
89 changes: 89 additions & 0 deletions pulumi/README.md
@@ -0,0 +1,89 @@
# Deploy QSFS with Pulumi

This is a Pulumi deployment script in Python that fully automates the setup of a QSFS instance. The following steps are required to use this script:

1. Install Pulumi and Python on your system
2. Use Pip to install the Python dependencies
3. Copy the example config files to `vars.py` and `zstor_config.base.toml`, then edit them

Only Linux and macOS are supported. If you run Windows, I'd recommend setting up a WSL environment.

## Install Pulumi and Python

We won't cover the details here. Your system probably already has `python3`.

For Pulumi, check here: https://www.pulumi.com/docs/iac/download-install/

## Install Python dependencies

We need some Python packages to make this work. Using a venv is recommended.

```
python3 -m venv .venv
source .venv/bin/activate
pip install pulumi pulumi_random pulumi_command pulumi_threefold
```
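
To confirm the install worked before going further, a small check like the following can help. This is a minimal sketch, assuming each package's import name is the same as its pip name; the function name `missing_packages` is made up for illustration.

```python
import importlib.util

# Assumption: each package's import name matches the pip name used above.
REQUIRED = ["pulumi", "pulumi_random", "pulumi_command", "pulumi_threefold"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All Pulumi dependencies are importable.")
```

Run it with the venv activated so that it inspects the same environment Pulumi will use.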

## Prep config

Two config files are needed. Examples are included here. Copy the examples to the expected paths, then edit the files according to your needs.

```
cp vars.example.py vars.py
cp zstor_config.base.example.toml zstor_config.base.toml

$EDITOR vars.py
$EDITOR zstor_config.base.toml
```
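
Since the deployment expects both files at these exact paths, a quick pre-flight check can catch a forgotten copy step. The sketch below only tests that the files exist; it does not validate their contents. The helper name `missing_config_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the copy commands above.
CONFIG_FILES = ["vars.py", "zstor_config.base.toml"]


def missing_config_files(directory="."):
    """Return the config files that do not yet exist in `directory`."""
    base = Path(directory)
    return [name for name in CONFIG_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files()
    if missing:
        print("Copy the example files first. Missing:", ", ".join(missing))
    else:
        print("Both config files are present.")
```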

## Deploy

Prior to using Pulumi, you need to log in. There are some options here, which you can read about, but the simplest thing is to just use `--local`:

```
pulumi login --local
```

Now we can bring up the deployment. When prompted, create a stack with a name of your choice.

```
pulumi up
```

If you want to destroy the deployment, bring it down like this:

```
pulumi down
```

## Replacing backends

If you want to replace any data or metadata backends, just edit `vars.py` and run `pulumi up` again. Note that this is a destructive operation: any backends not present in the new config will be decommissioned. Data loss is possible if too many backends are decommissioned at one time without rebuilding the data. At least the minimal shard count must remain available for the data to be reconstructed.
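
As a rough worked example of the shard-count constraint, the sketch below assumes each stored object is spread over `desired_shards` backends, one shard per backend, and that any `minimal_shards` of those shards are enough to rebuild it. The parameter names and the numbers are illustrative assumptions, not values read from this repository's config.

```python
def max_removable_backends(desired_shards, minimal_shards):
    """How many backends can go away at once while data stays recoverable.

    Assumes one shard per backend and that any `minimal_shards` shards
    are sufficient to reconstruct an object.
    """
    if minimal_shards < 1 or desired_shards < minimal_shards:
        raise ValueError("need 1 <= minimal_shards <= desired_shards")
    return desired_shards - minimal_shards


def can_reconstruct(desired_shards, minimal_shards, removed):
    """True if enough shards remain after removing `removed` backends."""
    return desired_shards - removed >= minimal_shards


if __name__ == "__main__":
    # Hypothetical layout: 16 shards written, any 10 can rebuild the data.
    print(max_removable_backends(16, 10))
    print(can_reconstruct(16, 10, removed=7))
```

Under these assumptions, replacing more backends than the difference between the two counts in a single step risks data loss, so larger changes are safer when split into rounds with a rebuild in between.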

After running `pulumi up` with the new config, the Pulumi script will automatically upload an updated Zstor config file to the VM. However, Zstor will not start using the new config automatically. You either need to restart Zstor or perform a hot reload of the config by sending the SIGUSR1 signal to Zstor:

```
pkill -SIGUSR1 zstor
```
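
If you would rather trigger the hot reload from a script, the same signal can be sent from Python. This is a minimal sketch, assuming `pgrep` is available on the VM and that the process name contains `zstor`; the helper names are made up for illustration.

```python
import os
import signal
import subprocess


def find_pids(pattern):
    """Return PIDs of processes whose name matches `pattern`, via pgrep."""
    result = subprocess.run(["pgrep", pattern], capture_output=True, text=True)
    return [int(pid) for pid in result.stdout.split()]


def send_reload(pids, send=os.kill):
    """Send SIGUSR1 to each PID and return the PIDs that were signalled."""
    signalled = []
    for pid in pids:
        send(pid, signal.SIGUSR1)
        signalled.append(pid)
    return signalled


if __name__ == "__main__":
    pids = send_reload(find_pids("zstor"))
    print("Sent SIGUSR1 to:", pids if pids else "no matching process")
```

The `send` parameter exists only so the function can be exercised without signalling a real process.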

Once the new config is loaded, Zstor will automatically start writing data or metadata to the new backends to restore the desired shard count for each stored file. This can take up to ten minutes to be triggered.

You can check the progress of rebuilding using the Zstor `status` command:

```
zstor -c /etc/zstor-default.toml status
```

## Recover to new VM

If you need to replace the frontend VM for any reason, such as a node outage, follow these steps. Any data that has been uploaded to the backends can be recovered into the new VM. Any data that was not yet uploaded to the backends will be lost.

1. Update the `vars.py` file and set `VM_NODE` to the new node id
2. Destroy the old VM and deploy the new VM by running `pulumi up`
3. SSH to the new VM and run the recovery script:

```
bash /root/scripts/recover.sh
```

If all went well, your files should appear under the mount point, `/mnt/qsfs`.
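
To check this from a script rather than by eye, something like the following can be run on the VM. It is a minimal sketch that only tests whether the path is a mount point; it says nothing about whether the recovered files are complete.

```python
import os

# Mount point named in the text above.
MOUNT_POINT = "/mnt/qsfs"


def is_mounted(path=MOUNT_POINT):
    """Return True if `path` is currently a mount point."""
    return os.path.ismount(path)


if __name__ == "__main__":
    if is_mounted():
        print(f"{MOUNT_POINT} is mounted; entries:", len(os.listdir(MOUNT_POINT)))
    else:
        print(f"{MOUNT_POINT} is not mounted; check the recovery script output.")
```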