This repo is a fork of the mlos-autotuning-template repo.

It is meant as a basic class demo and example of tuning a local `sqlite` instance running via `benchbase` and analyzing the results using MLOS, a framework to help benchmark and automate systems tuning.
Systems tuning is a difficult and time-consuming task, yet it becomes ever more necessary as the complexity of cloud systems grows, along with their varied choices, different deployment environments, and myriad workloads. With it, we can reduce cost, improve performance, lower carbon footprint, improve reliability, etc., by fitting the system to the workload and its execution environment, tuning everything from VM size, to OS parameters, to application configurations.

To mitigate risk to production systems, we often employ offline exploration of this large parameter space to find a better config. Benchmarks are needed to exercise the system, many parameter combinations must be tried, and the resulting data collected and compared. This process has traditionally been done manually, which is both time consuming and error prone, and noise in the cloud makes it even less reproducible.

The goal of autotuning systems like MLOS is to use automation and data-driven techniques to reduce the burden of this task and make it more approachable for many different systems and workloads, helping bring the vision of an autonomous, instance-optimized cloud closer to reality.
There are several items in this example:

- Some configs and example commands to use `mlos_bench` to autotune a `sqlite` workload (see below).
  These can be run in the background while you explore the data in some of the other notebooks.
- `mlos_demo_sqlite.ipynb`
  This is your workbook for this demo. Use it to analyze the data from running `mlos_bench` to find a better SQLite configuration and help understand what the optimizer found about the performance of that config.
  Initially, there won't be much data here to work with, until the commands from the loop in the previous step have run for a short while.
- `mlos_demo_sqlite_teachers.ipynb`
  Here we analyze the data from running 100 trials of `mlos_bench` for SQLite optimization, as detailed in the instructions below. The results you obtain during this workshop should look similar to what we have in this notebook.
- `mlos_demo_mysql.ipynb`
  This notebook explores some existing data that we've collected with the `mlos_bench` tool while optimizing a MySQL Server on Azure.
  It is meant to familiarize you with the data access and visualization APIs while the commands from the first step gather new data for the `sqlite` demo in the background.
You can see a brief example demo of this tool in the following video:
For this demo, we will be using GitHub's Codespaces feature to provide a pre-configured environment for you to use.

- Just a GitHub account :-)
- For a more pleasant experience, we recommend connecting to the remote codespace using a local instance of VSCode, but it's not required. You can also just use the web interface.
- It is also possible to use a local checkout of the code using `git`, `docker`, and a devcontainer, but we omit these instructions for now.
- Create a GitHub account if you do not already have one.
- Open the project in your browser.
  Navigate to the green `<> Code` dropdown at the top of the page and select the green `Create codespace on main` button.
- Reopen the workspace (if prompted).
  Note: you can trigger the prompt by browsing to the `mlos-autotuning.code-workspace` file and following the prompt in the lower right to reopen.
- Run the following code in the terminal at the bottom of the page and confirm you get some help text back:

  ```sh
  conda activate mlos
  mlos_bench --help
  ```

  You should see some help output that looks like the following:

  ```txt
  usage: mlos_bench [-h] [--config CONFIG] [--log_file LOG_FILE] [--log_level LOG_LEVEL]
                    [--config_path CONFIG_PATH [CONFIG_PATH ...]] [--environment ENVIRONMENT]
                    [--optimizer OPTIMIZER] [--storage STORAGE] [--random_init]
                    [--random_seed RANDOM_SEED] [--tunable_values TUNABLE_VALUES [TUNABLE_VALUES ...]]
                    [--globals GLOBALS [GLOBALS ...]] [--no_teardown]

  mlos_bench : Systems autotuning and benchmarking tool

  options:
    -h, --help            show this help message and exit
    ...
  ```
- That's it! If you run into any issues, please reach out to the teaching team and we can assist prior to class starting.
These instructions use the GitHub Codespaces approach described above.
- Open the codespace previously created above by browsing to the green `<> Code` button on the project repo site as before.
- Use the "Open in VSCode Desktop" option from the triple-bar menu on the left-hand side to reopen the codespace in a local VSCode instance.
  Note: this step is optional, but recommended for a better experience. You can alternatively stay in the browser interface for the entire demo.
- Make sure the local repo is up to date.
  To be executed in the integrated terminal at the bottom of the VSCode window:

  ```sh
  # Pull the latest sqlite-autotuning demo code.
  git pull
  ```
- Make sure the MLOS dependencies are up to date.
  To be executed in the integrated terminal at the bottom of the VSCode window:

  ```sh
  # Pull the latest MLOS code.
  git -C MLOS pull
  ```
- Make sure the `mlos_bench.sqlite` data is available.
  To be executed in the integrated terminal at the bottom of the VSCode window:

  ```sh
  # Download the previously generated results database.
  test -f mlos_bench.sqlite || wget -Nc https://adumlosdemostorage.blob.core.windows.net/adu-mlos-db-example/adu_notebook_db/mlos_bench.sqlite
  ```
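  Optionally, you can sanity-check the download. This is a small sketch, not part of the original instructions, and it assumes the `sqlite3` CLI is available in the codespace (install it first if not):

  ```sh
  # Confirm the results database exists and peek at the tables it contains.
  ls -lh mlos_bench.sqlite
  sqlite3 mlos_bench.sqlite '.tables'
  ```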
- Activate the conda environment in the integrated terminal (lower panel):

  ```sh
  conda activate mlos
  ```
- Make sure the TPC-C database is preloaded.
  Note: this is an optimization. If not present, the scripts below will generate it the first time it's needed.

  ```sh
  mkdir -p workdir/benchbase/db.bak
  wget -Nc -O workdir/benchbase/db.bak/tpcc.db https://adumlosdemostorage.blob.core.windows.net/adu-mlos-db-example/adu_notebook_db/tpcc.db
  ```
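  As a quick check (an optional aside, using only the paths from the step above), you can verify the preloaded database is in place:

  ```sh
  # The benchmark scripts restore from this backup copy when it exists.
  ls -lh workdir/benchbase/db.bak/tpcc.db
  ```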
- Run the `mlos_bench` tool as a one-shot benchmark.
  For instance, to run the `sqlite` example from the upstream MLOS repo (pulled locally):
  To be executed in the integrated terminal at the bottom of the VSCode window:

  ```sh
  # Run the one-shot benchmark.
  # This will run a single experiment trial and output the results to the local results database.
  mlos_bench --config "./config/cli/local-sqlite-bench.jsonc" --globals "./config/experiments/sqlite-sync-journal-pagesize-caching-experiment.jsonc"
  ```
This should take a few minutes to run and does the following:
- Loads the CLI config `./config/cli/local-sqlite-bench.jsonc`.
  - The `config/experiments/sqlite-sync-journal-pagesize-caching-experiment.jsonc` file further customizes that config with the experiment-specific parameters (e.g., telling it which tunable parameters to use for the experiment, the experiment name, etc.).
    Alternatively, other config files from the `config/experiments/` directory can be referenced with the `--globals` argument as well in order to customize the experiment.
- The CLI config also references and loads the root environment config `./config/environments/apps/sqlite/sqlite-local-benchbase.jsonc`.
  - In that config, the `setup` section lists commands used to
    - prepare a config for the `sqlite` instance based on the tunable parameters specified in the experiment config,
    - load, or restore a previously loaded copy of, a `tpcc.db` `sqlite` instance using a `benchbase` `docker` image.
  - Next, the `run` section lists commands used to
    - execute a TPC-C workload against that `sqlite` instance,
    - assemble the results into a file that is read via the `read_results_file` config section in order to store them into the `mlos_bench` results database.

  You can browse these configs directly; see the sketch after this list.
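For reference, the following commands (an optional aside, using only the config paths mentioned above) let you inspect the configs involved:

```sh
# View the experiment (globals) config and the root environment config.
cat ./config/experiments/sqlite-sync-journal-pagesize-caching-experiment.jsonc
cat ./config/environments/apps/sqlite/sqlite-local-benchbase.jsonc

# List other experiment configs that can be passed via --globals.
ls ./config/experiments/
```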
- Run the `mlos_bench` tool as an optimization loop.

  ```sh
  # Run the optimization loop by referencing a different config file
  # that specifies an optimizer and objective target.
  mlos_bench --config "./config/cli/local-sqlite-opt.jsonc" --globals "./config/experiments/sqlite-sync-journal-pagesize-caching-experiment.jsonc" --trial-config-repeat-count 3 --max-iterations 100
  ```
  The command above will run the optimization loop for 100 iterations, which should take about 30 minutes since each trial takes about 12 seconds to run.

  Note: a 10-second benchmark run is not a very long evaluation period. It's used here to keep the demo short, but in practice you would want to run longer to get more accurate results.

  To do this, it follows the procedure outlined above, but instead of running a single trial, it runs an optimization loop of many trials, each time updating the tunable parameters based on the results of the previous trials, balancing exploration and exploitation to find the optimal set of parameters.
  The overall process looks like this:

  (Figure: the mlos_bench optimization loop. Source: LlamaTune, VLDB 2022)
  While that's executing, you can try exploring other previously collected data using the `mlos_demo_mysql.ipynb` notebook.
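  You can also peek at the trials accumulating in the local results database while the loop runs. This is a hedged sketch, not part of the original demo steps; it assumes the `sqlite3` CLI is available and that the storage schema includes a `trial` table (table names may differ across `mlos_bench` versions):

  ```sh
  # Count the trials recorded so far in the local results database.
  sqlite3 mlos_bench.sqlite 'SELECT COUNT(*) FROM trial;'
  ```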
- Use the `mlos_demo_sqlite.ipynb` notebook to analyze the results.
  To do this, you may need to activate the appropriate Python kernel in the Jupyter notebook environment.
Here's a short list of tips and tricks to try in case you encounter issues during the demo:
- If the "Select Kernels" menu is hanging during the notebook steps, or if the `mlos_bench --help` step returns a `command not found` error, then try to:
  - Update and/or restart VSCode
  - Restart your codespace
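For the `command not found` case in particular, it's often enough to re-activate the conda environment (this simply repeats the activation step from the setup above):

```sh
# List the available conda environments, then re-activate the demo one.
conda env list
conda activate mlos
```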
Here are some additional sources of information:

- MLOS - the main repo for the `mlos_bench` tool.