# Meridian MLflow Demo

This Colab demonstrates how to integrate Meridian model runs with [MLflow tracking](https://mlflow.org/docs/latest/ml/tracking/). We'll cover the necessary installation and setup steps, followed by a simple example that showcases Meridian's MLflow configuration and data retrieval.

<a name="install"></a>
## Step 0: Install

1\. Make sure you are using one of the available GPU Colab runtimes, which is **required** to run Meridian. You can change your notebook's runtime under `Runtime > Change runtime type` in the menu. All users can use the T4 GPU runtime, which is free of charge and sufficient to run this demo. Users who have purchased one of Colab's paid plans have access to premium GPUs (such as the NVIDIA V100, A100, or L4).

2\. Install the latest version of Meridian with its MLflow dependencies (MLflow support is available as of version 1.1.3), and verify that a GPU is available.

In [None]:
# Install meridian with mlflow: from PyPI @ latest release
!pip install --upgrade google-meridian[colab,and-cuda,mlflow]

# Install meridian with mlflow: from PyPI @ specific version
# !pip install google-meridian[colab,and-cuda,mlflow]==1.1.3

# Install meridian: from GitHub @HEAD
# !pip install --upgrade "google-meridian[colab,and-cuda,mlflow] @ git+https://github.com/google/meridian.git"



In [None]:
# Import NumPy, TensorFlow, and the Meridian modules
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_probability as tfp
import arviz as az
import meridian
from meridian import constants
from meridian.data import data_frame_input_data_builder
from meridian.data import load
from meridian.data import test_utils
from meridian.model import model
from meridian.model import spec
from meridian.model import prior_distribution
from meridian.analysis import optimizer
from meridian.analysis import analyzer
from meridian.analysis import visualizer
from meridian.analysis import summarizer
from meridian.analysis import formatter

# Report total RAM and check whether a GPU is available
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
print("Num CPUs Available: ", len(tf.config.list_physical_devices('CPU')))

Your runtime has 13.6 gigabytes of available RAM

Num GPUs Available:  1
Num CPUs Available:  1
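
Because MLflow support is only available as of Meridian 1.1.3, it can be worth confirming the installed version before going further. The following is a minimal sketch: `parse_version` and `supports_mlflow` are hypothetical helper names introduced for illustration, and the distribution name `google-meridian` is taken from the `pip install` command above.

```python
import itertools
from importlib import metadata

# Minimum version with MLflow support, as stated in the install step.
MIN_MLFLOW_VERSION = (1, 1, 3)


def parse_version(version):
  """Returns the leading numeric (major, minor, patch) of a version string."""
  parts = []
  for piece in version.split(".")[:3]:
    digits = "".join(itertools.takewhile(str.isdigit, piece))
    parts.append(int(digits) if digits else 0)
  while len(parts) < 3:  # Treat "1.2" as "1.2.0".
    parts.append(0)
  return tuple(parts)


def supports_mlflow(version):
  """Hypothetical helper: True if `version` is at least 1.1.3."""
  return parse_version(version) >= MIN_MLFLOW_VERSION


try:
  installed = metadata.version("google-meridian")
  print(f"google-meridian {installed}: MLflow support = {supports_mlflow(installed)}")
except metadata.PackageNotFoundError:
  print("google-meridian is not installed in this environment.")
```

The comparison only looks at the numeric `major.minor.patch` prefix, so pre-release suffixes are ignored in this sketch.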


<a name="load-data"></a>
## Step 1: Load the data

Load the [simulated dataset in CSV format](https://github.com/google/meridian/blob/main/meridian/data/simulated_data/csv/geo_all_channels.csv) as follows.

1\. Read the data into a Pandas DataFrame.

In [None]:
df = pd.read_csv(
    "https://raw.githubusercontent.com/google/meridian/refs/heads/main/meridian/data/simulated_data/csv/geo_all_channels.csv"
)

2\. Create a `DataFrameInputDataBuilder` instance.



In [None]:
builder = data_frame_input_data_builder.DataFrameInputDataBuilder(
    kpi_type='non_revenue'
)

3\. Offer the components to the builder. Note that the components may be offered all at once or piecewise.



In [None]:
builder = (
    builder.with_kpi(df, kpi_col="conversions")
    .with_revenue_per_kpi(df, revenue_per_kpi_col="revenue_per_conversion")
    .with_population(df)
    .with_controls(
        df, control_cols=["sentiment_score_control", "competitor_sales_control"]
    )
)

channels = ["Channel0", "Channel1", "Channel2", "Channel3", "Channel4"]
builder = builder.with_media(
    df,
    media_cols=[f"{channel}_impression" for channel in channels],
    media_spend_cols=[f"{channel}_spend" for channel in channels],
    media_channels=channels,
)

data = builder.build()
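
The builder calls above rely on specific column names being present in the DataFrame. As a hedged sketch, the check below lists any of those explicitly named columns that are missing. `expected_columns` and `missing_columns` are hypothetical helpers, not part of Meridian, and they only cover the columns named in the cell above (not columns the builder resolves by default, such as the one used by `with_population`).

```python
def expected_columns(channels):
  """Column names that the builder calls above reference explicitly."""
  columns = [
      "conversions",
      "revenue_per_conversion",
      "sentiment_score_control",
      "competitor_sales_control",
  ]
  for channel in channels:
    columns.append(f"{channel}_impression")
    columns.append(f"{channel}_spend")
  return columns


def missing_columns(available, channels):
  """Returns the expected columns that are absent from `available`."""
  available = set(available)
  return [col for col in expected_columns(channels) if col not in available]


# Illustration with a plain list of names; in the notebook, pass `df.columns`.
demo_channels = ["Channel0", "Channel1"]
demo_available = ["conversions", "Channel0_impression", "Channel0_spend"]
print(missing_columns(demo_available, demo_channels))
```

In the notebook, `missing_columns(df.columns, channels)` should return an empty list if every referenced column is present.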

Note that the simulated data here does not contain reach and frequency. We recommend including reach and frequency data whenever they are available. For information about the advantages of using reach and frequency, see [Bayesian Hierarchical Media Mix Model Incorporating Reach and Frequency Data](https://research.google/pubs/bayesian-hierarchical-media-mix-model-incorporating-reach-and-frequency-data/#:~:text=By%20incorporating%20R%26F%20into%20MMM,based%20on%20optimal%20frequency%20recommendations.). For a code snippet that loads reach and frequency data, see [Load geo-level data with reach and frequency](https://developers.google.com/meridian/docs/user-guide/load-geo-data-with-rf).

The documentation also covers cases where reach and frequency data is available only for specific channels. For information about how to load other data types and formats, see [Supported data types and formats](https://developers.google.com/meridian/docs/user-guide/supported-data-types-formats).

<a name="configure-model"></a>
## Step 2: Configure the model -- with MLflow Autolog!

Note, this example focuses on MLflow autologging.

For more information about configuring the parameters and using a customized model specification, such as setting different ROI priors for each media channel, see [Configure the model](https://developers.google.com/meridian/docs/user-guide/configure-model).

1. Prepare the model specification.

In [None]:
roi_mu = 0.2     # Mu for ROI prior for each media channel.
roi_sigma = 0.9  # Sigma for ROI prior for each media channel.
prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_M)
)
model_spec = spec.ModelSpec(prior=prior)
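
To get a feel for what `roi_mu = 0.2` and `roi_sigma = 0.9` imply, recall that the two positional arguments of a log-normal distribution are the location and scale of the underlying normal on the log scale. The sketch below uses only the Python standard library (not TFP) to compute a few summary statistics from the closed-form log-normal formulas; `lognormal_summary` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

roi_mu = 0.2     # Location of log(ROI), as in the cell above.
roi_sigma = 0.9  # Scale of log(ROI), as in the cell above.


def lognormal_summary(mu, sigma):
  """Closed-form median, mean, and 5%/95% quantiles of LogNormal(mu, sigma)."""
  z = NormalDist()  # Standard normal, used for quantiles on the log scale.
  return {
      "median": math.exp(mu),
      "mean": math.exp(mu + sigma**2 / 2),
      "p05": math.exp(mu + sigma * z.inv_cdf(0.05)),
      "p95": math.exp(mu + sigma * z.inv_cdf(0.95)),
  }


summary = lognormal_summary(roi_mu, roi_sigma)
for name, value in summary.items():
  print(f"{name}: {value:.2f}")
```

For these values, the prior median ROI is about 1.22 and the prior mean about 1.83, with roughly 90% of the prior mass between 0.28 and 5.37.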

2. Enable MLflow autologging by calling `autolog()`. This enables automatic logging of parameters, metrics, and models by patching relevant functions. This only needs to be called once prior to any model runs.

In [None]:
from meridian.mlflow import autolog

autolog.autolog(log_metrics=True)  # Metric logging is not enabled by default.
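
The "patching" mentioned above refers to the general technique of wrapping existing functions so that a side effect (here, logging) runs on every call. The toy sketch below illustrates that idea only. `ToyModel`, `toy_autolog`, and the in-memory `LOGGED` dict are hypothetical stand-ins; this is not Meridian's or MLflow's actual implementation.

```python
import functools

LOGGED = {}  # Hypothetical in-memory stand-in for a tracking backend.


class ToyModel:
  """Hypothetical model class; not part of Meridian."""

  def sample_posterior(self, n_chains, n_keep):
    return n_chains * n_keep


def toy_autolog(cls, method_name):
  """Replaces `cls.method_name` with a wrapper that records keyword args."""
  original = getattr(cls, method_name)

  @functools.wraps(original)
  def wrapper(self, *args, **kwargs):
    for key, value in kwargs.items():
      LOGGED[f"{method_name}.{key}"] = value
    return original(self, *args, **kwargs)

  setattr(cls, method_name, wrapper)


# Patch once, before any calls; later calls are recorded automatically.
toy_autolog(ToyModel, "sample_posterior")
result = ToyModel().sample_posterior(n_chains=7, n_keep=1000)
print(result, LOGGED)
```

This also shows why the patch has to be applied before the model runs: calls made before `toy_autolog` executes go through the unwrapped method and are never recorded.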

3. Within a new MLflow run started with `mlflow.start_run()`, initialize the `Meridian` class and invoke `sample_prior()` and `sample_posterior()` to obtain samples from the prior and posterior distributions of the model parameters. If you are using the T4 GPU runtime, this step may take about 10 minutes for the provided dataset.

In [None]:
import mlflow

with mlflow.start_run(run_name="my-run"):
  mmm = model.Meridian(input_data=data, model_spec=model_spec)
  mmm.sample_prior(500)
  mmm.sample_posterior(n_chains=7, n_adapt=500, n_burnin=500, n_keep=1000, seed=1)
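
As a quick sanity check on the sampling arguments, the sketch below tallies how many draws the call above produces. It assumes that adaptation and burn-in draws are discarded and that only `n_keep` draws per chain are retained. `SamplingBudget` is a hypothetical helper, not a Meridian class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingBudget:
  """Hypothetical helper that summarizes the MCMC sampling arguments."""
  n_chains: int
  n_adapt: int
  n_burnin: int
  n_keep: int

  @property
  def iterations_per_chain(self):
    # Adaptation, then burn-in, then the kept draws.
    return self.n_adapt + self.n_burnin + self.n_keep

  @property
  def kept_draws(self):
    # Total posterior draws retained across all chains.
    return self.n_chains * self.n_keep


budget = SamplingBudget(n_chains=7, n_adapt=500, n_burnin=500, n_keep=1000)
print(f"Iterations per chain: {budget.iterations_per_chain}")
print(f"Posterior draws kept: {budget.kept_draws}")
```

Under that assumption, each of the 7 chains runs 2000 iterations, and 7000 posterior draws are kept in total.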

4. After the run completes, you can retrieve run results using the MLflow client.

In [None]:
client = mlflow.tracking.MlflowClient()
experiment_id = "0"
runs = client.search_runs(
    experiment_id,
    max_results=1000,
    filter_string="attributes.run_name = 'my-run'",
)
if runs:
  print(runs[0])
else:
  print("No runs found.")


<Run: data=<RunData: metrics={'MAPE': 0.2559046149253845,
 'R_Squared': 0.7730023860931396,
 'wMAPE': 0.20020413398742676}, params={'arviz_version': '0.21.0',
 'meridian_version': '1.1.3',
 'prior.alpha_m': 'tfp.distributions.Uniform("alpha_m", batch_shape=[], '
                  'event_shape=[], dtype=float32)',
 'prior.alpha_om': 'tfp.distributions.Uniform("alpha_om", batch_shape=[], '
                   'event_shape=[], dtype=float32)',
 'prior.alpha_orf': 'tfp.distributions.Uniform("alpha_orf", batch_shape=[], '
                    'event_shape=[], dtype=float32)',
 'prior.alpha_rf': 'tfp.distributions.Uniform("alpha_rf", batch_shape=[], '
                   'event_shape=[], dtype=float32)',
 'prior.beta_m': 'tfp.distributions.HalfNormal("beta_m", batch_shape=[], '
                 'event_shape=[], dtype=float32)',
 'prior.beta_om': 'tfp.distributions.HalfNormal("beta_om", batch_shape=[], '
                  'event_shape=[], dtype=float32)',
 'prior.beta_orf': 'tfp.distributions.Ha