In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [1]:
!pwd

/home/jupyter/jt-forecast-repo/vertex-forecas-repo


# Forecasting on Vertex pipelines for private preview

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl_tabular_on_vertex_pipelines.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl_tabular_on_vertex_pipelines.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/automl/automl_tabular_on_vertex_pipelines.ipynb">
        <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br/><br/><br/>

## Overview

In this tutorial, you will use several Vertex AI Tabular Workflows pipelines to train AutoML forecasting models with different configurations. You will see:
- how `get_learn_to_learn_forecasting_pipeline_and_parameters` lets you customize the default AutoML forecasting pipeline
- how `get_learn_to_learn_forecasting_pipeline_and_parameters` lets you reduce the training time and cost of an AutoML model by reusing the tuning results from a previous pipeline run
- how `get_time_series_dense_encoder_forecasting_pipeline_and_parameters` lets you train a TiDE (also known as FastNN) model
- how to enable probabilistic inference for forecasting training pipelines
- how to run batch prediction with a forecasting model trained with Tabular Workflows

Learn more about [Tabular Workflow for E2E AutoML](https://cloud.google.com/vertex-ai/docs/tabular-data/tabular-workflows/e2e-automl).

**Objective**

In this tutorial, you learn how to create AutoML Forecasting models using [Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) downloaded from [Google Cloud Pipeline Components](https://cloud.google.com/vertex-ai/docs/pipelines/components-introduction) (GCPC). These are Vertex AI Tabular Workflow pipelines, maintained by Google, and they showcase different ways to customize the Vertex Tabular training process.

This tutorial uses the following Google Cloud ML services:

- `AutoML Training`
- `Vertex AI Pipelines`

The steps performed are:

- Create a training pipeline with the learn-to-learn (L2L) algorithm, using a specified machine type for training.
- Create a training pipeline that reuses the architecture search results from the previous pipeline to save time.
- Create a training pipeline with the TiDE (Time series Dense Encoder) algorithm.
- Create a training pipeline with probabilistic inference enabled.
- Run batch prediction using the model trained in the steps above.

**Dataset**

This tutorial uses the [Iowa Liquor Sales](https://www.kaggle.com/datasets/residentmario/iowa-liquor-sales) dataset.

**Costs**

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage
* BigQuery

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and [BigQuery pricing](https://cloud.google.com/bigquery/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

**VPC related config**

If you need to use a custom Dataflow subnetwork, you can set it through the `dataflow_subnetwork` parameter. The requirements are:
1. `dataflow_subnetwork` must be a fully qualified subnetwork name.
   [[reference](https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications)]
1. The following service accounts must have [Compute Network User role](https://cloud.google.com/compute/docs/access/iam#compute.networkUser) assigned on the specified dataflow subnetwork [[reference](https://cloud.google.com/dataflow/docs/guides/specifying-networks#shared)]:
    1. Compute Engine default service account: PROJECT_NUMBER-compute@developer.gserviceaccount.com
    1. Dataflow service account: service-PROJECT_NUMBER@dataflow-service-producer-prod.iam.gserviceaccount.com

If your project has VPC-SC enabled, please make sure:

1. The dataflow subnetwork used in VPC-SC is configured properly for Dataflow.
   [[reference](https://cloud.google.com/dataflow/docs/guides/routes-firewall)]
1. `dataflow_use_public_ips` is set to False.


A fully qualified subnetwork name has the form:
>`https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION_NAME/subnetworks/SUBNETWORK_NAME`

Reference: https://cloud.google.com/dataflow/docs/guides/specifying-networks#example_network_and_subnetwork_specifications
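
As a minimal sketch, the fully qualified form described above can be assembled from its three parts. The helper name `build_subnetwork_uri` and the sample values are hypothetical; only the URL pattern comes from this section.

```python
def build_subnetwork_uri(host_project_id: str, region: str, subnetwork: str) -> str:
    """Return a fully qualified subnetwork name for the `dataflow_subnetwork` parameter."""
    return (
        "https://www.googleapis.com/compute/v1/"
        f"projects/{host_project_id}/regions/{region}/subnetworks/{subnetwork}"
    )


# Hypothetical values, for illustration only.
print(build_subnetwork_uri("my-host-project", "us-central1", "my-subnet"))
```

Building the string in one place makes it harder to pass a short subnetwork name by mistake where the fully qualified form is required.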

**Install additional packages**

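Install the Google Cloud Pipeline Components (GCPC) SDK. The cell below is commented out; it installs a specific preview wheel (`2.0.0b1.dev0`) and only needs to run once per environment.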

In [1]:
# !rm -rf ./google_cloud_pipeline_components*.whl
# !gsutil cp gs://automl-tables-build-oss-dependencies/gcpc/2023_03_27_05_17_32/google_cloud_pipeline_components-2.0.0b1.dev0-py2.py3-none-any.whl .
# !pip install --upgrade --force-reinstall --user ./google_cloud_pipeline_components*.whl -q

Copying gs://automl-tables-build-oss-dependencies/gcpc/2023_03_27_05_17_32/google_cloud_pipeline_components-2.0.0b1.dev0-py2.py3-none-any.whl...
/ [1 files][  1.2 MiB/  1.2 MiB]                                                
Operation completed over 1 objects/1.2 MiB.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-api-python-client 1.8.0 requires google-api-core<2dev,>=1.13.0, but you have google-api-core 2.11.0 which is incompatible.
ray 2.4.0 requires grpcio<=1.51.3,>=1.42.0; python_version >= "3.10" and sys_platform != "darwin", but you have grpcio 1.54.2 which is incompatible.
ydata-profiling 4.1.2 requires requests<2.29,>=2.24.0, but you have requests 2.31.0 which is incompatible.

#### Restart the kernel
Once you've installed the additional packages, you need to restart the notebook kernel so it can find the packages.


**Note: Once this cell has finished running, continue on. You do not need to re-run any of the cells above.**


## Load notebook config

> use the prefix defined in 00-env-setup

In [1]:
# naming convention for all cloud resources
VERSION        = "v1"              # TODO
PREFIX         = f'forecast-refresh-{VERSION}'   # TODO

print(f"PREFIX = {PREFIX}")

PREFIX = forecast-refresh-v1


In [2]:
# staging GCS
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

# ! gcloud config set project $PROJECT_ID

# GCS bucket and paths
BUCKET_NAME              = f'{PREFIX}-{PROJECT_ID}-gcs'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

config = !gsutil cat {BUCKET_URI}/config/notebook_env.py
print(config.n)
exec(config.n)


PROJECT_ID               = "hybrid-vertex"
PROJECT_NUM              = "934903580331"
LOCATION                 = "us-central1"

REGION                   = "us-central1"
BQ_LOCATION              = "US"
VPC_NETWORK_NAME         = "ucaip-haystack-vpc-network"

VERTEX_SA                = "934903580331-compute@developer.gserviceaccount.com"

PREFIX                   = "forecast-refresh-v1"
VERSION                  = "v1"

BUCKET_NAME              = "forecast-refresh-v1-hybrid-vertex-gcs"
BUCKET_URI               = "gs://forecast-refresh-v1-hybrid-vertex-gcs"

DATA_GCS_PREFIX          = "data"
DATA_PATH                = "gs://forecast-refresh-v1-hybrid-vertex-gcs/data"


VPC_NETWORK_FULL         = "projects/934903580331/global/networks/ucaip-haystack-vpc-network"



In [None]:
# List the top-level contents of the staging bucket.
!gsutil ls $BUCKET_URI

gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/
gs://forecast-refresh-v1-hybrid-vertex-gcs/config/


## Imports

In [34]:
# Import required modules
import json
from typing import Any, Dict, List, Optional

from google.cloud import aiplatform, storage

# from google_cloud_pipeline_components.types.artifact_types import VertexDataset
from google_cloud_pipeline_components.preview.automl.forecasting import \
    utils as automl_forecasting_utils

from google.cloud import bigquery

# bigquery client
bqclient = bigquery.Client(
    project=PROJECT_ID,
    # location=LOCATION
)

import sys
sys.path.append("..")
from src import helpers

### Initialize Vertex SDK

Initialize the Vertex SDK for Python for your project.

In [5]:
EXPERIMENT_NAME = f"{PREFIX}-v1"

print(EXPERIMENT_NAME)

aiplatform.init(experiment=EXPERIMENT_NAME, project=PROJECT_ID, location=REGION)

forecast-refresh-v1-v1


### Define helper functions

In [None]:
# Split a GCS URI into a (bucket name, object path) tuple.
def get_bucket_name_and_path(uri):
    no_prefix_uri = uri[len("gs://") :]
    splits = no_prefix_uri.split("/")
    return splits[0], "/".join(splits[1:])


# Fetch the content of a GCS object URI as bytes.
def download_from_gcs(uri):
    bucket_name, path = get_bucket_name_and_path(uri)
    storage_client = storage.Client(project=PROJECT_ID)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(path)
    # download_as_string() is a deprecated alias of download_as_bytes().
    return blob.download_as_bytes()


# Upload the string content as a GCS object.
def write_to_gcs(uri: str, content: str):
    bucket_name, path = get_bucket_name_and_path(uri)
    storage_client = storage.Client(project=PROJECT_ID)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(path)
    blob.upload_from_string(content)


def generate_auto_transformation(column_names: List[str]) -> List[Dict[str, Any]]:
    transformations = []
    for column_name in column_names:
        transformations.append({"auto": {"column_name": column_name}})
    return transformations

# Example of how to set non-auto transformations.
# For more details about the transformations, see:
# https://cloud.google.com/vertex-ai/docs/datasets/data-types-tabular#transformations
# Returns a dict that maps each transformation type to a list of column names.
def generate_transformation(
    auto_column_names: Optional[List[str]] = None,
    numeric_column_names: Optional[List[str]] = None,
    categorical_column_names: Optional[List[str]] = None,
    text_column_names: Optional[List[str]] = None,
    timestamp_column_names: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    if auto_column_names is None:
        auto_column_names = []
    if numeric_column_names is None:
        numeric_column_names = []
    if categorical_column_names is None:
        categorical_column_names = []
    if text_column_names is None:
        text_column_names = []
    if timestamp_column_names is None:
        timestamp_column_names = []
    return {
        "auto": auto_column_names,
        "numeric": numeric_column_names,
        "categorical": categorical_column_names,
        "text": text_column_names,
        "timestamp": timestamp_column_names,
    }


def write_auto_transformations(uri: str, column_names: List[str]):
    transformations = generate_auto_transformation(column_names)
    write_to_gcs(uri, json.dumps(transformations))

# Retrieve the task detail that matches a task name (None if no task matches).
def get_task_detail(task_details: List[Any], task_name: str) -> Optional[Any]:
    for task_detail in task_details:
        if task_detail.task_name == task_name:
            return task_detail
    return None

# Retrieve the URI of the model.
def get_deployed_model_uri(
    task_details,
):
    ensemble_task = get_task_detail(task_details, "model-upload")
    return ensemble_task.outputs["model"].artifacts[0].uri


def get_no_custom_ops_model_uri(task_details):
    ensemble_task = get_task_detail(task_details, "automl-tabular-ensemble")
    return download_from_gcs(
        ensemble_task.outputs["model_without_custom_ops"].artifacts[0].uri
    )

# Retrieve the feature importance details from GCS.
def get_feature_attributions(
    task_details,
):
    ensemble_task = get_task_detail(task_details, "model-evaluation-2")
    return download_from_gcs(
        ensemble_task.outputs["evaluation_metrics"]
        .artifacts[0]
        .metadata["explanation_gcs_path"]
    )

# Retrieve the evaluation metrics from GCS.
def get_evaluation_metrics(
    task_details,
):
    ensemble_task = get_task_detail(task_details, "model-evaluation")
    return download_from_gcs(
        ensemble_task.outputs["evaluation_metrics"].artifacts[0].uri
    )

# Pretty print the JSON string.
def load_and_print_json(s):
    parsed = json.loads(s)
    print(json.dumps(parsed, indent=2, sort_keys=True))
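
The URI-splitting step that the GCS helpers rely on can be checked without any cloud access. The following is a minimal sketch, not the notebook's own helper: the name `split_gcs_uri` is hypothetical, and the `gs://` prefix check is an added assumption that the original `get_bucket_name_and_path` does not perform.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a GCS URI: {uri!r}")
    # partition() splits on the first "/" only, so the object path keeps its slashes.
    bucket, _, path = uri[len(prefix):].partition("/")
    return bucket, path


bucket, path = split_gcs_uri(
    "gs://forecast-refresh-v1-hybrid-vertex-gcs/config/notebook_env.py"
)
print(bucket)  # forecast-refresh-v1-hybrid-vertex-gcs
print(path)    # config/notebook_env.py
```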

## Prepare for training

In [6]:
# Dataflow's fully qualified subnetwork name. When set to None, the default subnetwork is used.
dataflow_subnetwork = None 

# Specifies whether Dataflow workers use public IP addresses.
dataflow_use_public_ips = True

import datetime

# NOW = datetime.datetime.now().strftime("%d %H:%M:%S.%f").replace(" ","").replace(":","_").replace(".","_")
NOW = '2106_14_34_399161' # tmp

print(NOW)
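
To make the hard-coded `NOW` value easier to read, here is a small sketch of the commented-out `strftime` expression applied to a fixed timestamp. The function name `make_run_suffix` is hypothetical, and the year and month in the example are arbitrary because the format string drops them.

```python
import datetime


def make_run_suffix(ts: datetime.datetime) -> str:
    """Format a timestamp as day, hour, minute, second, microsecond joined by underscores."""
    return (
        ts.strftime("%d %H:%M:%S.%f")
        .replace(" ", "")
        .replace(":", "_")
        .replace(".", "_")
    )


# Day 21 at 06:14:34.399161 reproduces the value assigned to NOW above.
print(make_run_suffix(datetime.datetime(2023, 6, 21, 6, 14, 34, 399161)))
# 2106_14_34_399161
```

Because the suffix omits the year and month, two runs started at the same day-of-month and time in different months would share a suffix; including `%Y%m` would avoid that if it matters for your run directories.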

### Define training specification

In [10]:
# root_dir = os.path.join(BUCKET_URI, f"automl_forecasting_pipeline/run-{uuid.uuid4()}")

ROOT_DIR                      = f"{BUCKET_URI}/automl_forecasting_pipeline/{EXPERIMENT_NAME}/run-{NOW}"
optimization_objective        = "minimize-mae"
time_column                   = "date"
time_series_identifier_column = "store_name"
target_column                 = "sale_dollars"
data_source_csv_filenames     = None

print(f"ROOT_DIR              = {ROOT_DIR}")

ROOT_DIR              = gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/forecast-refresh-v1-v1/run-2106_14_34_399161


In [11]:
# training data specs

data_source_bigquery_table_path = (
    "bq://bigquery-public-data.iowa_liquor_sales_forecasting.2020_sales_train"
)

training_fraction = 0.8
validation_fraction = 0.1
test_fraction = 0.1

predefined_split_key = None
if predefined_split_key:
    training_fraction = None
    validation_fraction = None
    test_fraction = None

weight_column = None

features = [
    time_column,
    target_column,
    "city",
    "zip_code",
    "county",
]

# available_at_forecast_columns = ",".join([time_column])
# unavailable_at_forecast_columns = ",".join([target_column])
# time_series_attribute_columns = ",".join(["city", "zip_code", "county"])

# forecast_horizon = 30
# context_window = 30

available_at_forecast_columns = [time_column]
unavailable_at_forecast_columns = [target_column]
time_series_attribute_columns = ["city", "zip_code", "county"]

forecast_horizon = 150
context_window = 150

print(f"available_at_forecast_columns    = {available_at_forecast_columns}")
print(f"unavailable_at_forecast_columns  = {unavailable_at_forecast_columns}")
print(f"time_series_attribute_columns    = {time_series_attribute_columns}")

available_at_forecast_columns    = ['date']
unavailable_at_forecast_columns  = ['sale_dollars']
time_series_attribute_columns    = ['city', 'zip_code', 'county']
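
The split settings in the cell above follow a simple rule: either the three fractions are set, or a predefined split key is used and the fractions are `None`. The sketch below checks that rule before a pipeline is submitted. The function name `validate_split` is hypothetical, and the requirement that the fractions sum to 1 is stated here as an assumption to verify against the pipeline documentation.

```python
import math
from typing import Optional


def validate_split(
    training_fraction: Optional[float],
    validation_fraction: Optional[float],
    test_fraction: Optional[float],
    predefined_split_key: Optional[str] = None,
) -> None:
    """Raise ValueError if the split configuration is inconsistent."""
    fractions = (training_fraction, validation_fraction, test_fraction)
    if predefined_split_key:
        # With a predefined split column, the notebook sets all fractions to None.
        if any(f is not None for f in fractions):
            raise ValueError("Set fractions to None when predefined_split_key is used.")
        return
    if any(f is None for f in fractions):
        raise ValueError("All three fractions are required without a predefined split key.")
    # Assumption: the fractions must add up to 1 (compared with a float tolerance).
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError(f"Fractions sum to {sum(fractions)}, expected 1.0.")


validate_split(0.8, 0.1, 0.1)                       # passes silently
validate_split(None, None, None, "split_column")    # passes silently
```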


In [12]:
# transformations = generate_auto_transformation(features)
transformations = generate_transformation(auto_column_names=features)

# TRANSFORM_CONFIG_PATH = f"{ROOT_DIR}/transform_config_{NOW}.json"
TRANSFORM_CONFIG_PATH = "gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/run-28ec73a7-646e-420b-b883-2aa16ea2e518/transform_config_40ac07bd-c92b-4914-beda-18a382062acd.json"

print(f"transformations       = {transformations}\n")
print(f"TRANSFORM_CONFIG_PATH = {TRANSFORM_CONFIG_PATH}")

write_to_gcs(TRANSFORM_CONFIG_PATH, json.dumps(transformations))

transformations       = {'auto': ['date', 'sale_dollars', 'city', 'zip_code', 'county'], 'numeric': [], 'categorical': [], 'text': [], 'timestamp': []}

TRANSFORM_CONFIG_PATH = gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/run-28ec73a7-646e-420b-b883-2aa16ea2e518/transform_config_40ac07bd-c92b-4914-beda-18a382062acd.json
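
The cell above passes every feature as `auto`. As a hedged sketch of the non-auto path that `generate_transformation` supports, the example below assigns explicit types to the same five columns. The type choices (for example, treating `zip_code` as categorical) are illustrative assumptions, not a recommendation from the source, and the function is re-declared in compact form only so the block runs on its own.

```python
import json
from typing import Dict, List, Optional


def generate_transformation(
    auto_column_names: Optional[List[str]] = None,
    numeric_column_names: Optional[List[str]] = None,
    categorical_column_names: Optional[List[str]] = None,
    text_column_names: Optional[List[str]] = None,
    timestamp_column_names: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Map each transformation type to its list of column names."""
    return {
        "auto": auto_column_names or [],
        "numeric": numeric_column_names or [],
        "categorical": categorical_column_names or [],
        "text": text_column_names or [],
        "timestamp": timestamp_column_names or [],
    }


# Illustrative typing of the notebook's feature columns (an assumption).
explicit_transformations = generate_transformation(
    timestamp_column_names=["date"],
    numeric_column_names=["sale_dollars"],
    categorical_column_names=["city", "zip_code", "county"],
)
print(json.dumps(explicit_transformations, indent=2))
```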


In [13]:
# List the top-level contents of the staging bucket.
!gsutil ls $BUCKET_URI/

gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/
gs://forecast-refresh-v1-hybrid-vertex-gcs/config/


### Create Vertex Managed Dataset

In [14]:
# Create a Vertex managed dataset artifact.
vertex_dataset = aiplatform.TimeSeriesDataset.create(
    bq_source=data_source_bigquery_table_path
)
vertex_dataset_artifact_id = vertex_dataset.gca_resource.metadata_artifact.split("/")[-1]
vertex_dataset_artifact_id

Creating TimeSeriesDataset
Create TimeSeriesDataset backing LRO: projects/934903580331/locations/us-central1/datasets/2490918303959089152/operations/1549626330700578816
TimeSeriesDataset created. Resource name: projects/934903580331/locations/us-central1/datasets/2490918303959089152
To use this TimeSeriesDataset in another session:
ds = aiplatform.TimeSeriesDataset('projects/934903580331/locations/us-central1/datasets/2490918303959089152')


'f8e2ba9b-7bc9-4961-9d57-037f83594781'

# AutoML Forecast with Tabular Workflows

> training & customize search space and change training configuration

We will create an AutoML Forecasting pipeline that skips evaluation, with the following customizations:
- Limit the hyperparameter search space
- Change machine type and tuning / training parallelism

### Supported forecast algorithms

**Currently, four model types are supported in the APIs/SDK, each with its own utility function:**

* `time_series_dense_encoder` (TiDE): `get_time_series_dense_encoder_forecasting_pipeline_and_parameters`
* `learn_to_learn` (L2L): `get_learn_to_learn_forecasting_pipeline_and_parameters`
* `sequence_to_sequence` (seq2seq): `get_sequence_to_sequence_forecasting_pipeline_and_parameters`
* `temporal_fusion_transformer` (TFT): `get_temporal_fusion_transformer_forecasting_pipeline_and_parameters`

**You can combine multiple features:**
1. You can use `TiDE` or `L2L` with probabilistic inference enabled.
2. With probabilistic inference enabled, you don't have to set the optimization objective to `minimize-quantile-loss` to train the model for quantile forecasts.

## L2L - learn-to-learn algorithm

In [29]:
worker_pool_specs_override = [
    {"machine_spec": {"machine_type": "n1-standard-8"}},  # override for TF chief node
    {},  # override for TF worker node, since it's not used, leave it empty
    {},  # override for TF ps node, since it's not used, leave it empty
    {
        "machine_spec": {
            "machine_type": "n1-standard-4"  # override for TF evaluator node
        }
    },
]

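# Number of weak models in the final ensemble model.
num_selected_trials = 5

train_budget_milli_node_hours = 250  # 0.25 node hours (15 minutes)

# BigQuery dataset that receives the evaluated examples.
# Same definition as in the commented-out dataset-creation cell later in this notebook.
BIGQUERY_DATASET_NAME = f"{PREFIX}".replace("-", "_").lower()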

(
    template_path,
    parameter_values,
) = automl_forecasting_utils.get_learn_to_learn_forecasting_pipeline_and_parameters(
    project=PROJECT_ID,
    location=REGION,
    root_dir=ROOT_DIR,
    evaluated_examples_bigquery_path=f"bq://{PROJECT_ID}.{BIGQUERY_DATASET_NAME}", # `bq://project.dataset`.
    target_column=target_column,
    optimization_objective=optimization_objective,
    transformations=transformations,
    train_budget_milli_node_hours=train_budget_milli_node_hours,
    data_source_csv_filenames=data_source_csv_filenames,
    data_source_bigquery_table_path=data_source_bigquery_table_path,
    weight_column=weight_column,
    predefined_split_key=predefined_split_key,
    training_fraction=training_fraction,
    validation_fraction=validation_fraction,
    test_fraction=test_fraction,
    num_selected_trials=num_selected_trials,
    time_column=time_column,
    time_series_identifier_column=time_series_identifier_column,
    time_series_attribute_columns=time_series_attribute_columns,
    available_at_forecast_columns=available_at_forecast_columns,
    unavailable_at_forecast_columns=unavailable_at_forecast_columns,
    forecast_horizon=forecast_horizon,
    context_window=context_window,
    stage_1_tuner_worker_pool_specs_override=worker_pool_specs_override,
    # feature_transform_engine_dataflow_subnetwork=dataflow_subnetwork,
    dataflow_subnetwork=dataflow_subnetwork,
    # feature_transform_engine_dataflow_use_public_ips=dataflow_use_public_ips,
    dataflow_use_public_ips=dataflow_use_public_ips,
    # quantile forecast, L2L without probabilistic inference requires `minimize-quantile-loss`
    # quantiles=",".join(map(lambda x: str(x), [0.25, 0.5, 0.9])),
)

job_id = "l2l-forecasting-{}".format(EXPERIMENT_NAME)
job = aiplatform.PipelineJob(
    display_name=job_id,
    location=REGION,  # launches the pipeline job in the specified region
    template_path=template_path,
    job_id=job_id,
    pipeline_root=ROOT_DIR,
    parameter_values=parameter_values,
    enable_caching=False,
    # Uncomment the following line if you want to use Vertex managed dataset.
    # input_artifacts={'vertex_dataset': vertex_dataset_artifact_id},
)

# job.run(sync=False,experiment=EXPERIMENT_NAME)
job.submit(
    experiment=EXPERIMENT_NAME,
    # sync=False,
    service_account=VERTEX_SA,
)

Creating PipelineJob
PipelineJob created. Resource name: projects/934903580331/locations/us-central1/pipelineJobs/l2l-forecasting-forecast-refresh-v1-v1
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/934903580331/locations/us-central1/pipelineJobs/l2l-forecasting-forecast-refresh-v1-v1')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/l2l-forecasting-forecast-refresh-v1-v1?project=934903580331
Associating projects/934903580331/locations/us-central1/pipelineJobs/l2l-forecasting-forecast-refresh-v1-v1 to Experiment: forecast-refresh-v1-v1
PipelineJob projects/934903580331/locations/us-central1/pipelineJobs/l2l-forecasting-d4191e93-9e7c-4ccc-aedd-2729ac2543a7 current state:
PipelineState.PIPELINE_STATE_RUNNING
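
The `# 15 minutes` comment on `train_budget_milli_node_hours` follows from a unit conversion: 1,000 milli node hours equal one node hour. A short sketch of that arithmetic is below; the function names are hypothetical.

```python
def milli_node_hours_to_minutes(milli_node_hours: float) -> float:
    """Convert a training budget in milli node hours to node minutes."""
    return milli_node_hours / 1000 * 60


def minutes_to_milli_node_hours(minutes: float) -> float:
    """Convert node minutes to a training budget in milli node hours."""
    return minutes / 60 * 1000


print(milli_node_hours_to_minutes(250))  # 15.0
print(minutes_to_milli_node_hours(60))   # 1000.0
```

The result is a budget in node time; elapsed wall-clock time for the whole pipeline can differ, since other steps such as feature transformation also take time.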


## L2L - Skip architecture search
Instead of running architecture search every time, we can reuse an existing architecture search result. This helps to:
1. reduce the variation of the output model
2. reduce training cost

The existing architecture search result is stored in the `tuning_result_output` output of the `automl-forecasting-stage-1-tuner` component. We can manually input it or get it programmatically.
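
The programmatic lookup can be sketched offline with stand-in objects that mimic only the attributes used here (`task_name`, `outputs`, `artifacts`, `uri`). The `SimpleNamespace` objects and the example URI are hypothetical; in the notebook, the real list comes from `job.task_details`.

```python
from types import SimpleNamespace


def find_task(task_details, task_name):
    """Return the first task whose task_name matches, or None."""
    return next((t for t in task_details if t.task_name == task_name), None)


# Stand-ins for pipeline task details (illustration only).
fake_task_details = [
    SimpleNamespace(task_name="feature-transform-engine", outputs={}),
    SimpleNamespace(
        task_name="automl-forecasting-stage-1-tuner",
        outputs={
            "tuning_result_output": SimpleNamespace(
                artifacts=[SimpleNamespace(uri="gs://example-bucket/tuning_result_output")]
            )
        },
    ),
]

tuner_task = find_task(fake_task_details, "automl-forecasting-stage-1-tuner")
if tuner_task is None:
    raise LookupError("Stage 1 tuner task not found; did the pipeline run it?")

tuning_result_uri = tuner_task.outputs["tuning_result_output"].artifacts[0].uri
print(tuning_result_uri)
```

Checking for `None` before reading `outputs` gives a clearer error when the source pipeline itself skipped architecture search and therefore has no stage 1 tuner task.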

In [37]:
# CREATE_NEW_ASSETS = True

# BIGQUERY_DATASET_NAME = f"{PREFIX}".replace("-","_").lower()

# if CREATE_NEW_ASSETS:
#     ds = bigquery.Dataset(f"{PROJECT_ID}.{BIGQUERY_DATASET_NAME}")
#     ds.location = BQ_LOCATION
#     ds = bqclient.create_dataset(dataset = ds, exists_ok = False)

#     print(ds.full_dataset_id)

In [102]:
pipeline_task_details = job.task_details

for task_deets in pipeline_task_details:
    print(task_deets.task_name)

automl-tabular-finalizer
feature-transform-engine
model-upload-2
automl-forecasting-ensemble-2
string-not-empty
calculate-training-parameters-2
condition-2
l2l-forecasting-forecast-refresh-v1-v1
condition-3
get-prediction-image-uri-2
training-configurator-and-validator
exit-handler-1
split-materialized-data
automl-forecasting-stage-1-tuner
calculate-transformations


In [109]:
stage_1_tuner_task = get_task_detail(
    pipeline_task_details, "automl-forecasting-stage-1-tuner"
)

stage_1_tuning_result_artifact_uri = (
    stage_1_tuner_task.outputs["tuning_result_output"].artifacts[0].uri
)

upload_model_task = get_task_detail(
    pipeline_task_details, "model-upload-2"
)

forecasting_mp_model_artifact = (
    upload_model_task.outputs["model"].artifacts[0]
)

forecasting_mp_model = aiplatform.Model(forecasting_mp_model_artifact.metadata['resourceName'])
forecasting_mp_model

<google.cloud.aiplatform.models.Model object at 0x7f1cfeac1090> 
resource name: projects/934903580331/locations/us-central1/models/3409416232942698496

In [41]:
# Number of weak models in the final ensemble model.
num_selected_trials = 5

train_budget_milli_node_hours = 250  # 15 minutes

JOB_ID = f"l2l-skip-a-search-{EXPERIMENT_NAME}-v7"

print(f"JOB_ID                 = {JOB_ID}")
print(f"TRANSFORM_CONFIG_PATH  = {TRANSFORM_CONFIG_PATH}")

JOB_ID                 = l2l-skip-a-search-forecast-refresh-v1-v1-v7
TRANSFORM_CONFIG_PATH  = gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/run-28ec73a7-646e-420b-b883-2aa16ea2e518/transform_config_40ac07bd-c92b-4914-beda-18a382062acd.json


In [42]:
# stage_1_tuning_result_artifact_uri = 'gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/run-28ec73a7-646e-420b-b883-2aa16ea2e518/934903580331/l2l-forecasting-forecast-refresh-v1-v1/automl-forecasting-stage-1-tuner_7439017359651635200/tuning_result_output'

In [43]:
# run pipeline run
(
    template_path,
    parameter_values,
) = automl_forecasting_utils.get_learn_to_learn_forecasting_pipeline_and_parameters(
    project=PROJECT_ID,
    location=REGION,
    root_dir=ROOT_DIR,
    evaluated_examples_bigquery_path=f"bq://{PROJECT_ID}.{BIGQUERY_DATASET_NAME}", # `bq://project.dataset`.
    target_column=target_column,
    optimization_objective=optimization_objective,
    transformations=transformations,
    train_budget_milli_node_hours=train_budget_milli_node_hours,
    data_source_csv_filenames=data_source_csv_filenames,
    data_source_bigquery_table_path=data_source_bigquery_table_path,
    weight_column=weight_column,
    predefined_split_key=predefined_split_key,
    training_fraction=training_fraction,
    validation_fraction=validation_fraction,
    test_fraction=test_fraction,
    num_selected_trials=num_selected_trials,
    time_column=time_column,
    # time_series_identifier_column=time_series_identifier_column,
    time_series_identifier_columns=[time_series_identifier_column],
    time_series_attribute_columns=time_series_attribute_columns,
    available_at_forecast_columns=available_at_forecast_columns,
    unavailable_at_forecast_columns=unavailable_at_forecast_columns,
    forecast_horizon=forecast_horizon,
    context_window=context_window,
    stage_1_tuning_result_artifact_uri=stage_1_tuning_result_artifact_uri,
    # feature_transform_engine_dataflow_subnetwork=dataflow_subnetwork,
    dataflow_subnetwork=dataflow_subnetwork,
    # feature_transform_engine_dataflow_use_public_ips=dataflow_use_public_ips,
    dataflow_use_public_ips=dataflow_use_public_ips,
)

job = aiplatform.PipelineJob(
    display_name=JOB_ID,
    location=REGION,  # launches the pipeline job in the specified region
    template_path=template_path,
    job_id=JOB_ID,
    pipeline_root=ROOT_DIR,
    parameter_values=parameter_values,
    enable_caching=True,
)

# job.run(sync=False,experiment=EXPERIMENT_NAME)
job.submit(
    experiment=EXPERIMENT_NAME,
    # sync=False,
    service_account=VERTEX_SA,
)

Creating PipelineJob
PipelineJob created. Resource name: projects/934903580331/locations/us-central1/pipelineJobs/l2l-skip-a-search-forecast-refresh-v1-v1-v7
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/934903580331/locations/us-central1/pipelineJobs/l2l-skip-a-search-forecast-refresh-v1-v1-v7')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/l2l-skip-a-search-forecast-refresh-v1-v1-v7?project=934903580331
Associating projects/934903580331/locations/us-central1/pipelineJobs/l2l-skip-a-search-forecast-refresh-v1-v1-v7 to Experiment: forecast-refresh-v1-v1


In [44]:
pipeline_task_details = job.task_details

for task_deets in pipeline_task_details:
    print(task_deets.task_name)

automl-forecasting-stage-2-tuner
exit-handler-1
finalize-eval-quantile-parameters
l2l-skip-a-search-forecast-refresh-v1-v1-v7
split-materialized-data
get-prediction-image-uri
model-evaluation-forecasting
condition-4
condition-3
model-batch-explanation
training-configurator-and-validator
model-upload
table-to-uri
automl-forecasting-ensemble
automl-tabular-finalizer
feature-attribution
feature-transform-engine
get-or-create-model-description
model-batch-predict
condition-2
calculate-training-parameters
set-optional-inputs
get-predictions-column
string-not-empty
importer
model-evaluation-import


In [None]:
# Get the task details of the skip-architecture-search pipeline run
skip_architecture_search_pipeline_task_details = (
    job.gca_resource.job_detail.task_details
)
skip_architecture_search_pipeline_task_details

## TiDE - Time series dense encoder

> (aka `FastNN`)


In [46]:
# Number of weak models in the final ensemble model.
num_selected_trials = 5

train_budget_milli_node_hours = 250  # 15 minutes

JOB_ID = f"tide-full-{EXPERIMENT_NAME}-v2"

print(f"JOB_ID = {JOB_ID}")

JOB_ID = tide-full-forecast-refresh-v1-v1-v2


In [47]:
(
    template_path,
    parameter_values,
) = automl_forecasting_utils.get_time_series_dense_encoder_forecasting_pipeline_and_parameters(
    project=PROJECT_ID,
    location=REGION,
    root_dir=ROOT_DIR,
    evaluated_examples_bigquery_path=f"bq://{PROJECT_ID}.{BIGQUERY_DATASET_NAME}", # `bq://project.dataset`.
    target_column=target_column,
    optimization_objective=optimization_objective,
    transformations=transformations,
    train_budget_milli_node_hours=train_budget_milli_node_hours,
    data_source_csv_filenames=data_source_csv_filenames,
    data_source_bigquery_table_path=data_source_bigquery_table_path,
    weight_column=weight_column,
    predefined_split_key=predefined_split_key,
    training_fraction=training_fraction,
    validation_fraction=validation_fraction,
    test_fraction=test_fraction,
    num_selected_trials=num_selected_trials,
    time_column=time_column,
    # time_series_identifier_column=time_series_identifier_column,
    time_series_identifier_columns=[time_series_identifier_column],
    time_series_attribute_columns=time_series_attribute_columns,
    available_at_forecast_columns=available_at_forecast_columns,
    unavailable_at_forecast_columns=unavailable_at_forecast_columns,
    forecast_horizon=forecast_horizon,
    context_window=context_window,
    # feature_transform_engine_dataflow_subnetwork=dataflow_subnetwork,
    dataflow_subnetwork=dataflow_subnetwork,
    # feature_transform_engine_dataflow_use_public_ips=dataflow_use_public_ips,
    dataflow_use_public_ips=dataflow_use_public_ips,
    # enable_probabilistic_inference=True,
    # quantile forecast, TiDE without probabilistic inference requires `minimize-quantile-loss`
    # quantiles=",".join(map(lambda x: str(x), [0.25, 0.5, 0.9])),
)

# job_id = "tide-forecasting-{}".format(EXPERIMENT_NAME)
job = aiplatform.PipelineJob(
    display_name=JOB_ID,
    location=REGION,  # launches the pipeline job in the specified region
    template_path=template_path,
    job_id=JOB_ID,
    pipeline_root=ROOT_DIR,
    parameter_values=parameter_values,
    enable_caching=False,
)

# job.run(sync=False,experiment=EXPERIMENT_NAME)
job.submit(
    experiment=EXPERIMENT_NAME,
    # sync=False,
    service_account=VERTEX_SA,
)

Creating PipelineJob
PipelineJob created. Resource name: projects/934903580331/locations/us-central1/pipelineJobs/tide-full-forecast-refresh-v1-v1-v2
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/934903580331/locations/us-central1/pipelineJobs/tide-full-forecast-refresh-v1-v1-v2')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/tide-full-forecast-refresh-v1-v1-v2?project=934903580331
Associating projects/934903580331/locations/us-central1/pipelineJobs/tide-full-forecast-refresh-v1-v1-v2 to Experiment: forecast-refresh-v1-v1


In [50]:
# pipeline_task_details = job.gca_resource.job_detail.task_details

pipeline_task_details = job.task_details

for task_deets in pipeline_task_details:
    print(task_deets.task_name)

finalize-eval-quantile-parameters-2
exit-handler-1
string-not-empty
model-evaluation-forecasting-2
condition-2
condition-5
model-batch-explanation-2
automl-forecasting-stage-1-tuner
table-to-uri-2
feature-transform-engine
tide-full-forecast-refresh-v1-v1-v2
get-prediction-image-uri-2
feature-attribution-2
model-upload-2
automl-tabular-finalizer
split-materialized-data
model-batch-predict-2
calculate-training-parameters-2
get-predictions-column-2
set-optional-inputs
training-configurator-and-validator
automl-forecasting-ensemble-2
model-evaluation-import-2
condition-4
get-or-create-model-description-2


## TiDE - Skip architecture search

> After retrieving the tuning result from the stage 1 tuner, you can pass it to a new pipeline run to skip the model architecture search.

In [55]:
# Number of weak models in the final ensemble model.
num_selected_trials = 5

train_budget_milli_node_hours = 250  # 15 minutes

JOB_ID = f"tide-skip-search-{EXPERIMENT_NAME}"

RUN_EVALUATION = False

print(f"JOB_ID = {JOB_ID}")

JOB_ID = tide-skip-search-forecast-refresh-v1-v1
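
The `# 15 minutes` comment in the cell above follows from the unit: 1,000 milli node hours make one node hour. A minimal sketch of that arithmetic, where `milli_node_hours_to_minutes` is a hypothetical helper name used only for illustration and not part of any SDK:

```python
def milli_node_hours_to_minutes(milli_node_hours: float) -> float:
    """Convert a training budget in milli node hours to minutes of one node."""
    node_hours = milli_node_hours / 1000  # 1,000 milli node hours = 1 node hour
    return node_hours * 60


skip_search_minutes = milli_node_hours_to_minutes(250)
probabilistic_minutes = milli_node_hours_to_minutes(500)
print(skip_search_minutes, probabilistic_minutes)  # 15.0 30.0
```

This is node time, not wall-clock time, so the elapsed time of the whole pipeline can differ from these figures.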


In [56]:
# Retrieve the tuning result output from the previous training pipeline.
stage_1_tuner_task = helpers.get_task_detail(
    pipeline_task_details, "automl-forecasting-stage-1-tuner"
)

stage_1_tuning_result_artifact_uri = (
    stage_1_tuner_task.outputs["tuning_result_output"].artifacts[0].uri
)

print(stage_1_tuning_result_artifact_uri)
train_budget_milli_node_hours = 250.0  # 15 minutes

gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/forecast-refresh-v1-v1/run-2106_14_34_399161/934903580331/tide-full-forecast-refresh-v1-v1-v2/automl-forecasting-stage-1-tuner_-2508097211070414848/tuning_result_output
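
The body of `helpers.get_task_detail` is not shown in this section. As a rough sketch of what such a lookup might do, assuming it simply returns the first task whose `task_name` matches, the function below (a hypothetical stand-in named `get_task_detail_sketch`) is exercised against fake task objects rather than a real pipeline job:

```python
from types import SimpleNamespace
from typing import Any, Iterable


def get_task_detail_sketch(task_details: Iterable[Any], task_name: str) -> Any:
    """Return the first task whose task_name matches, or raise if none does."""
    for task in task_details:
        if task.task_name == task_name:
            return task
    raise ValueError(f"No task named {task_name!r} in the pipeline task details")


# Fake task objects for illustration; real entries come from job.task_details.
fake_tasks = [
    SimpleNamespace(task_name="feature-transform-engine"),
    SimpleNamespace(task_name="automl-forecasting-stage-1-tuner"),
]

found = get_task_detail_sketch(fake_tasks, "automl-forecasting-stage-1-tuner")
print(found.task_name)
```

Raising on a missing name, instead of returning `None`, surfaces a mistyped task name immediately rather than at the later `.outputs[...]` access.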


In [57]:

(
    template_path,
    parameter_values,
) = automl_forecasting_utils.get_time_series_dense_encoder_forecasting_pipeline_and_parameters(
    project=PROJECT_ID,
    location=REGION,
    root_dir=ROOT_DIR,
    target_column=target_column,
    optimization_objective=optimization_objective,
    transformations=transformations,
    train_budget_milli_node_hours=train_budget_milli_node_hours,
    data_source_csv_filenames=data_source_csv_filenames,
    data_source_bigquery_table_path=data_source_bigquery_table_path,
    weight_column=weight_column,
    predefined_split_key=predefined_split_key,
    training_fraction=training_fraction,
    validation_fraction=validation_fraction,
    test_fraction=test_fraction,
    num_selected_trials=num_selected_trials,
    time_column=time_column,
    # time_series_identifier_column=time_series_identifier_column,
    time_series_identifier_columns=[time_series_identifier_column],
    time_series_attribute_columns=time_series_attribute_columns,
    available_at_forecast_columns=available_at_forecast_columns,
    unavailable_at_forecast_columns=unavailable_at_forecast_columns,
    forecast_horizon=forecast_horizon,
    context_window=context_window,
    # feature_transform_engine_dataflow_subnetwork=dataflow_subnetwork,
    dataflow_subnetwork=dataflow_subnetwork,
    # feature_transform_engine_dataflow_use_public_ips=dataflow_use_public_ips,
    dataflow_use_public_ips=dataflow_use_public_ips,
    stage_1_tuning_result_artifact_uri=stage_1_tuning_result_artifact_uri,
    run_evaluation=RUN_EVALUATION,
    # enable_probabilistic_inference=True,
    # quantile forecast, TiDE without probabilistic inference requires `minimize-quantile-loss`
    # quantiles=",".join(map(lambda x: str(x), [0.25, 0.5, 0.9])),
)

# job_id = "tide-forecasting-{}".format(EXPERIMENT_NAME)
job = aiplatform.PipelineJob(
    display_name=JOB_ID,
    location=REGION,  # launches the pipeline job in the specified region
    template_path=template_path,
    job_id=JOB_ID,
    pipeline_root=ROOT_DIR,
    parameter_values=parameter_values,
    enable_caching=False,
)

# job.run(sync=False,experiment=EXPERIMENT_NAME)
job.submit(
    experiment=EXPERIMENT_NAME,
    # sync=False,
    service_account=VERTEX_SA,
)

Creating PipelineJob
PipelineJob created. Resource name: projects/934903580331/locations/us-central1/pipelineJobs/tide-skip-search-forecast-refresh-v1-v1
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/934903580331/locations/us-central1/pipelineJobs/tide-skip-search-forecast-refresh-v1-v1')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/tide-skip-search-forecast-refresh-v1-v1?project=934903580331
Associating projects/934903580331/locations/us-central1/pipelineJobs/tide-skip-search-forecast-refresh-v1-v1 to Experiment: forecast-refresh-v1-v1


## Probabilistic training

In [None]:
# Number of weak models in the final ensemble model.
num_selected_trials = 5

train_budget_milli_node_hours = 500  # 30 minutes

(
    template_path,
    parameter_values,
) = automl_forecasting_utils.get_learn_to_learn_forecasting_pipeline_and_parameters(
    project=PROJECT_ID,
    location=REGION,
    root_dir=root_dir,
    target_column=target_column,
    optimization_objective=optimization_objective,
    transformations=transform_config_path,
    train_budget_milli_node_hours=train_budget_milli_node_hours,
    data_source_csv_filenames=data_source_csv_filenames,
    data_source_bigquery_table_path=data_source_bigquery_table_path,
    weight_column=weight_column,
    predefined_split_key=predefined_split_key,
    training_fraction=training_fraction,
    validation_fraction=validation_fraction,
    test_fraction=test_fraction,
    num_selected_trials=num_selected_trials,
    time_column=time_column,
    time_series_identifier_column=time_series_identifier_column,
    time_series_attribute_columns=time_series_attribute_columns,
    available_at_forecast_columns=available_at_forecast_columns,
    unavailable_at_forecast_columns=unavailable_at_forecast_columns,
    forecast_horizon=forecast_horizon,
    context_window=context_window,
    feature_transform_engine_dataflow_subnetwork=dataflow_subnetwork,
    feature_transform_engine_dataflow_use_public_ips=dataflow_use_public_ips,
    enable_probabilistic_inference=True,
    # quantile forecast
    quantiles=",".join(map(str, [0.25, 0.5, 0.9])),
)

job_id = "l2l-forecasting-probabilistic-inference-{}".format(EXPERIMENT_NAME)
job = aiplatform.PipelineJob(
    display_name=job_id,
    location=REGION,  # launches the pipeline job in the specified region
    template_path=template_path,
    job_id=job_id,
    pipeline_root=root_dir,
    parameter_values=parameter_values,
    enable_caching=False,
)

# job.run(sync=False,experiment=EXPERIMENT_NAME)
job.submit(
    experiment=EXPERIMENT_NAME,
    # sync=False,
    service_account=VERTEX_SA,
)

Creating PipelineJob
PipelineJob created. Resource name: projects/294348452381/locations/us-central1/pipelineJobs/l2l-forecasting-probabilistic-inference-bb3f26ad-af03-48e2-b320-d2c89ac8b620
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/294348452381/locations/us-central1/pipelineJobs/l2l-forecasting-probabilistic-inference-bb3f26ad-af03-48e2-b320-d2c89ac8b620')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/l2l-forecasting-probabilistic-inference-bb3f26ad-af03-48e2-b320-d2c89ac8b620?project=294348452381
PipelineJob projects/294348452381/locations/us-central1/pipelineJobs/l2l-forecasting-probabilistic-inference-bb3f26ad-af03-48e2-b320-d2c89ac8b620 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/294348452381/locations/us-central1/pipelineJobs/l2l-forecasting-probabilistic-inference-bb3f26ad-af03-48e2-b320-d2c89ac8b620


In [None]:
pipeline_task_details = job.gca_resource.job_detail.task_details
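
The probabilistic training cell above passes `quantiles` as a single comma-separated string. The sketch below shows what that expression evaluates to, with a sanity check on the levels added for illustration. Whether a string or a list is expected may depend on the installed pipeline components version, so treat the format as an assumption and check the function signature.

```python
# Quantile levels taken from the probabilistic training cell.
quantile_levels = [0.25, 0.5, 0.9]

# Each level must lie strictly between 0 and 1 to be a meaningful quantile.
if not all(0.0 < q < 1.0 for q in quantile_levels):
    raise ValueError("quantile levels must be strictly between 0 and 1")

# Builds the same comma-separated string as the join expression in the cell.
quantiles_arg = ",".join(str(q) for q in quantile_levels)
print(quantiles_arg)  # 0.25,0.5,0.9
```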

## Experiment logging

For background, see the Vertex AI documentation on [experiments with pipelines](https://cloud.google.com/vertex-ai/docs/experiments/add-pipelinerun-experiment#associate-pipeline-run-with-experiment-run).

In [None]:
from typing import Any, Dict, Optional


def log_pipeline_job_to_experiment_sample(
    experiment_name: str,
    pipeline_job_display_name: str,
    template_path: str,
    pipeline_root: str,
    parameter_values: Optional[Dict[str, Any]],
    project: str,
    location: str,
):
    """Create a pipeline job and associate it with an experiment at submit time."""
    aiplatform.init(project=project, location=location)

    pipeline_job = aiplatform.PipelineJob(
        display_name=pipeline_job_display_name,
        template_path=template_path,
        pipeline_root=pipeline_root,
        parameter_values=parameter_values,
    )

    # Passing `experiment` links this pipeline run to the named experiment.
    pipeline_job.submit(experiment=experiment_name)

In [None]:
def log_pipeline_job_sample(
    experiment_name: str,
    run_name: str,
    pipeline_job: aiplatform.PipelineJob,
    project: str,
    location: str,
):
    """Log an existing pipeline job to an experiment run."""
    aiplatform.init(experiment=experiment_name, project=project, location=location)

    # Resume the named run rather than starting a new one.
    aiplatform.start_run(run=run_name, resume=True)

    aiplatform.log(pipeline_job=pipeline_job)

## Batch prediction

### For liquor dataset

In [None]:
print(f"Running Batch prediction for model: {forecasting_mp_model.display_name}")


batch_predict_bq_output_uri_prefix = f"bq://{PROJECT_ID}"


# Don't use the public table directly: FTE doesn't support a dataset in the US
# location when running in us-central1. Copy this BigQuery table to your own
# BigQuery dataset located in us-central1 instead.
# PREDICTION_DATASET_BQ_PATH = (
#     "bq://bigquery-public-data:iowa_liquor_sales_forecasting.2021_sales_predict"
# )

PREDICTION_DATASET_BQ_PATH = (
    f"bq://{PROJECT_ID}.iowa_liquor_sales_forecasting_us_central1.2021_sales_predict"
)



batch_prediction_job = forecasting_mp_model.batch_predict(
    job_display_name="forecasting_iowa_liquor_sales_forecasting_predictions",
    bigquery_source=PREDICTION_DATASET_BQ_PATH,
    instances_format="bigquery",
    bigquery_destination_prefix=batch_predict_bq_output_uri_prefix,
    predictions_format="bigquery",
    generate_explanation=False,
    sync=True,
)

print(batch_prediction_job)

Creating BatchPredictionJob
BatchPredictionJob created. Resource name: projects/294348452381/locations/us-central1/batchPredictionJobs/4296530389217837056
To use this BatchPredictionJob in another session:
bpj = aiplatform.BatchPredictionJob('projects/294348452381/locations/us-central1/batchPredictionJobs/4296530389217837056')
View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/4296530389217837056?project=294348452381
BatchPredictionJob projects/294348452381/locations/us-central1/batchPredictionJobs/4296530389217837056 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/294348452381/locations/us-central1/batchPredictionJobs/4296530389217837056 current state:
JobState.JOB_STATE_SUCCEEDED
BatchPredictionJob run completed. Resource name: projects/294348452381/locations/us-central1/batchPredictionJobs/4296530389217837056


<google.cloud.aiplatform.jobs.BatchPredictionJob object at 0x7f8878d0cb80> 
resource name: projects/294348452381/locations/us-central1/batchPredictionJobs/4296530389217837056
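
The prediction source in the batch prediction cell is addressed as `bq://<project>.<dataset>.<table>`. A small sketch of building that path from its parts follows. `bq_table_uri` is a hypothetical helper and `my-project` is a placeholder project ID; neither comes from the notebook.

```python
def bq_table_uri(project: str, dataset: str, table: str) -> str:
    """Build a bq://project.dataset.table URI, rejecting empty parts."""
    for part in (project, dataset, table):
        if not part:
            raise ValueError("project, dataset, and table must all be non-empty")
    return f"bq://{project}.{dataset}.{table}"


# "my-project" is a placeholder; substitute your own PROJECT_ID.
uri = bq_table_uri(
    "my-project",
    "iowa_liquor_sales_forecasting_us_central1",
    "2021_sales_predict",
)
print(uri)
```

Validating the parts up front catches an unset `PROJECT_ID` before the batch prediction job is submitted.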
