Exploring MLflow in depth for Python and Scala.
- Hello World and Hello World Nested Runs.
- Python Scikit-learn - most advanced example.
- PySpark ML.
- PyTorch ML.
- Tools - dump run, dump experiment, save runs to CSV files, export run/experiment, copy run/experiment to another tracking server, etc.
- Scala Spark ML examples - uses the MLflow Java client.
- Tools - Useful MLflow tools: dump run, dump experiment, dump runs to CSV files, etc.
- Note: You must install Python MLflow for the Java client to work:
pip install mlflow
- mlflow-java - MLflow Java and Scala extras such as proposed RunContext.
Before running the examples, you need to install the MLflow Python environment and launch an MLflow server.
Install with either pip (from PyPI) or Miniconda (using conda.yaml).
pip install mlflow
- Install miniconda3:
https://conda.io/miniconda.html
- Create the environment:
conda env create --file conda.yaml
- Source the environment:
source activate mlflow-fun
Then launch the MLflow server:
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri $PWD/mlruns --default-artifact-root $PWD/mlruns
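To confirm that the server accepts runs, a small Python client can log a parameter and a metric. The following is a minimal sketch, not one of the repository's examples. It assumes the server is reachable at http://localhost:5000 (the port used in the command above); the experiment name `hello_world`, the helper names, and the logged values are hypothetical.

```python
import os

# Assumption: the MLflow server is reachable on localhost, port 5000.
DEFAULT_TRACKING_URI = "http://localhost:5000"


def resolve_tracking_uri(env=None):
    """Return MLFLOW_TRACKING_URI if it is set, else the local default."""
    env = os.environ if env is None else env
    return env.get("MLFLOW_TRACKING_URI") or DEFAULT_TRACKING_URI


def log_hello_world(tracking_uri):
    """Log one parameter and one metric in a single run; return the run ID."""
    import mlflow  # imported lazily; requires `pip install mlflow`

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment("hello_world")  # hypothetical experiment name
    with mlflow.start_run() as run:
        mlflow.log_param("alpha", 0.1)    # illustrative values only
        mlflow.log_metric("rmse", 0.75)
        return run.info.run_id
```

With the server running, calling `log_hello_world(resolve_tracking_uri())` should create a run that is visible in the MLflow UI at the same address.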
For those examples that use Spark, download the latest Spark version to your local machine. See Download Apache Spark.
To run the examples against a Databricks cluster see the following documentation:
For examples see Hello World and Scikit-learn Wine Quality.
Setup
export MLFLOW_TRACKING_URI=databricks
The token and tracking server URL will be picked up from your Databricks CLI default profile in ~/.databrickscfg.
You can also override these values with the following environment variables:
export DATABRICKS_TOKEN=MY_TOKEN
export DATABRICKS_HOST=https://myshard.cloud.databricks.com
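The precedence described above (environment variables first, then the CLI profile) can be illustrated in Python. This is a sketch of the lookup order only, not MLflow's or the Databricks CLI's actual implementation. It assumes the CLI config file is `~/.databrickscfg`, an INI file whose `[DEFAULT]` section holds `host` and `token` keys; the function name is hypothetical.

```python
import configparser
import os
from pathlib import Path


def resolve_databricks_credentials(cfg_path=None, env=None):
    """Return (host, token), preferring environment variables over the profile."""
    env = os.environ if env is None else env
    # Assumption: the Databricks CLI stores its profiles in ~/.databrickscfg.
    cfg_path = Path(cfg_path) if cfg_path else Path.home() / ".databrickscfg"

    parser = configparser.ConfigParser()
    parser.read(cfg_path)            # a missing file is silently ignored
    profile = parser.defaults()      # raw key/value pairs of [DEFAULT]

    host = env.get("DATABRICKS_HOST") or profile.get("host")
    token = env.get("DATABRICKS_TOKEN") or profile.get("token")
    return host, token
```

Passing `env` and `cfg_path` explicitly keeps the function easy to check without touching the real environment or home directory.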