# Forecasting with Vertex Tabular Workflows

[Tabular Workflows for Forecasting](https://cloud.google.com/vertex-ai/docs/tabular-data/tabular-workflows/overview#f) is the complete pipeline for forecasting tasks. It is similar to the AutoML API, but lets you choose what to control and what to automate. Rather than configuring the pipeline only as a whole, you can configure each step individually. The configurable steps include:

* Data splitting
* Feature engineering
* Architecture search
* Model training
* Model ensembling

**Objectives of this notebook**
* Train and forecast with the *Iowa liquor sales BigQuery public dataset*
* Use Tabular Workflows to orchestrate the Vertex Forecast pipeline
* Track experiments
* Run model evaluations for trained forecast models

**TODOs**
* `skip architecture search` in a retraining pipeline
* upload v2 of a model and its evals

In [1]:
# !pip3 install {USER_FLAG} google-cloud-aiplatform kfp google-cloud-pipeline-components --upgrade
# !pip3 install --no-cache-dir {USER_FLAG} PyYAML==5.3.1 

In [2]:
!python3 -c "import kfp; print('KFP SDK version: {}'.format(kfp.__version__))"
!python3 -c "import google_cloud_pipeline_components; print('google_cloud_pipeline_components version: {}'.format(google_cloud_pipeline_components.__version__))"

KFP SDK version: 2.4.0
google_cloud_pipeline_components version: 2.3.0


## Load notebook config

> use the prefix defined in 00-env-setup

In [3]:
CREATE_NEW_ASSETS = True

In [4]:
# naming convention for all cloud resources
VERSION        = "v1"              # TODO
PREFIX         = f'forecast-refresh-{VERSION}'   # TODO

print(f"PREFIX = {PREFIX}")

PREFIX = forecast-refresh-v1


In [5]:
# staging GCS
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

# ! gcloud config set project $PROJECT_ID

# GCS bucket and paths
BUCKET_NAME              = f'{PREFIX}-{PROJECT_ID}-gcs'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

config = !gsutil cat {BUCKET_URI}/config/notebook_env.py
print(config.n)
exec(config.n)


PROJECT_ID               = "hybrid-vertex"
PROJECT_NUM              = "934903580331"
LOCATION                 = "us-central1"

REGION                   = "us-central1"
BQ_LOCATION              = "US"
VPC_NETWORK_NAME         = "ucaip-haystack-vpc-network"

VERTEX_SA                = "934903580331-compute@developer.gserviceaccount.com"

PREFIX                   = "forecast-refresh-v1"
VERSION                  = "v1"

BUCKET_NAME              = "forecast-refresh-v1-hybrid-vertex-gcs"
BUCKET_URI               = "gs://forecast-refresh-v1-hybrid-vertex-gcs"

DATA_GCS_PREFIX          = "data"
DATA_PATH                = "gs://forecast-refresh-v1-hybrid-vertex-gcs/data"


VPC_NETWORK_FULL         = "projects/934903580331/global/networks/ucaip-haystack-vpc-network"
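
The cell above downloads `notebook_env.py` and runs it with `exec`, which executes whatever the file contains. If the file is expected to hold only plain `NAME = "value"` assignments, as the printed output suggests, one alternative is to parse the text without executing it. The following is a minimal sketch under that assumption; `parse_config` is a hypothetical helper, not part of this repository.

```python
import ast


def parse_config(text: str) -> dict:
    """Collect simple `NAME = <literal>` assignments without executing the text."""
    values = {}
    for node in ast.parse(text).body:
        is_simple_assign = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not is_simple_assign:
            continue
        try:
            values[node.targets[0].id] = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            # Not a plain literal (for example a function call or an f-string); skip it.
            continue
    return values


# Two lines in the same style as the printed config above.
sample = 'REGION                   = "us-central1"\nBQ_LOCATION              = "US"\n'
settings = parse_config(sample)
print(settings["REGION"], settings["BQ_LOCATION"])
```

The trade-off is that values built from expressions are skipped, so this only fits config files that stay purely declarative.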



In [6]:
# List the top-level contents of the staging bucket
!gsutil ls $BUCKET_URI

gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/
gs://forecast-refresh-v1-hybrid-vertex-gcs/config/


### Define Vertex Experiment

In [7]:
EXPERIMENT_TAG     = "tide-twrkflow-eval"
EXPERIMENT_VERSION = "v1"

EXPERIMENT_NAME = f"{EXPERIMENT_TAG}-{EXPERIMENT_VERSION}"

print(EXPERIMENT_NAME)

tide-twrkflow-eval-v1


## Imports

In [8]:
# Import required modules
import json
import datetime
from pprint import pprint
from typing import Any, Dict, List, Optional

from google.cloud import aiplatform, storage, bigquery

# from google_cloud_pipeline_components.types.artifact_types import VertexDataset
from google_cloud_pipeline_components.preview.automl.forecasting import \
    utils as automl_forecasting_utils


# Construct a BigQuery client object.
bq_client = bigquery.Client(project=PROJECT_ID)

aiplatform.init(
    experiment=EXPERIMENT_NAME, 
    project=PROJECT_ID, 
    location=REGION
)

import sys
sys.path.append("..")
from src import helpers

## Create BigQuery Dataset

In [10]:
BIGQUERY_DATASET_NAME = EXPERIMENT_NAME.replace("-","_")

if CREATE_NEW_ASSETS:
    ds = bigquery.Dataset(f"{PROJECT_ID}.{BIGQUERY_DATASET_NAME}")
    ds.location = BQ_LOCATION
    ds = bq_client.create_dataset(dataset = ds, exists_ok = True)
    # print(ds.full_dataset_id)
else:
    ds = bigquery.Dataset(f"{PROJECT_ID}.{BIGQUERY_DATASET_NAME}")
    
ds 
# ds.dataset_id
# ds.full_dataset_id

Dataset(DatasetReference('hybrid-vertex', 'tide_twrkflow_eval_v1'))
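
The hyphens in `EXPERIMENT_NAME` are replaced because BigQuery dataset IDs accept letters, digits, and underscores but not hyphens. A small sketch of that conversion with an explicit check follows; `to_bq_dataset_id` is a hypothetical helper, and the pattern reflects my understanding of the naming rules rather than an official validator.

```python
import re

# Assumed rule: letters, digits, and underscores only, up to 1,024 characters.
_VALID_DATASET_ID = re.compile(r"[A-Za-z0-9_]{1,1024}")


def to_bq_dataset_id(name: str) -> str:
    """Replace hyphens with underscores and reject anything still invalid."""
    dataset_id = name.replace("-", "_")
    if not _VALID_DATASET_ID.fullmatch(dataset_id):
        raise ValueError(f"not a valid BigQuery dataset ID: {dataset_id!r}")
    return dataset_id


print(to_bq_dataset_id("tide-twrkflow-eval-v1"))
```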

## Prepare train job

In [11]:
# Dataflow's fully qualified subnetwork name; when empty, the default subnetwork is used.
dataflow_subnetwork           = None 

# Specifies whether Dataflow workers use public IP addresses.
dataflow_use_public_ips       = True

NOW                           = datetime.datetime.now().strftime("%d %H:%M:%S.%f").replace(" ","").replace(":","_").replace(".","_")
ROOT_DIR                      = f"{BUCKET_URI}/automl_forecasting_pipeline/{EXPERIMENT_NAME}/run-{NOW}"
time_column                   = "date"
time_series_identifier_column = "store_name"
target_column                 = "sale_dollars"
data_source_csv_filenames     = None

print(f"ROOT_DIR = {ROOT_DIR}")

ROOT_DIR = gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/tide-twrkflow-eval-v1/run-0216_31_00_023491
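
The `NOW` suffix is built from day of month, time, and microseconds only. The sketch below applies the same `strftime` and `replace` chain to a fixed timestamp so the resulting format is easy to see; the specific date is an assumption chosen to reproduce the suffix printed above, and `run_suffix` is a hypothetical name.

```python
import datetime


def run_suffix(now: datetime.datetime) -> str:
    """Same strftime/replace chain as the notebook cell, applied to a given time."""
    return (
        now.strftime("%d %H:%M:%S.%f")
        .replace(" ", "")
        .replace(":", "_")
        .replace(".", "_")
    )


# Assumed timestamp: day 02 at 16:31:00.023491 (month and year are not encoded).
fixed = datetime.datetime(2024, 1, 2, 16, 31, 0, 23491)
suffix = run_suffix(fixed)
print(suffix)                    # 0216_31_00_023491
print(suffix.replace("_", "-"))  # form used later in the job ID
```

Because the suffix omits month and year, runs on the same day of different months share a prefix; a format such as `%Y%m%d` could be used if sortable names matter.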


In [12]:
data_source_bigquery_table_path = (
    "bq://bigquery-public-data.iowa_liquor_sales_forecasting.2020_sales_train"
)

training_fraction = 0.8
validation_fraction = 0.1
test_fraction = 0.1

predefined_split_key = None
if predefined_split_key:
    training_fraction = None
    validation_fraction = None
    test_fraction = None

weight_column = None

features = [
    time_column,
    target_column,
    "city",
    "zip_code",
    "county",
]

available_at_forecast_columns = [time_column]
unavailable_at_forecast_columns = [target_column]
time_series_attribute_columns = ["city", "zip_code", "county"]

forecast_horizon = 150
context_window = 150

print(f"available_at_forecast_columns    = {available_at_forecast_columns}")
print(f"unavailable_at_forecast_columns  = {unavailable_at_forecast_columns}")
print(f"time_series_attribute_columns    = {time_series_attribute_columns}")

available_at_forecast_columns    = ['date']
unavailable_at_forecast_columns  = ['sale_dollars']
time_series_attribute_columns    = ['city', 'zip_code', 'county']
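
Before submitting a pipeline, it can help to sanity-check the split fractions and column roles defined above. This is a minimal sketch, assuming the three fractions should sum to 1 when no predefined split column is used and that a column should not be listed as both available and unavailable at forecast time; `check_split` and `check_column_roles` are hypothetical helpers.

```python
import math
from typing import Optional, Sequence


def check_split(
    training: Optional[float],
    validation: Optional[float],
    test: Optional[float],
) -> None:
    """Raise ValueError if the fractions are partially set or do not sum to 1."""
    fractions = (training, validation, test)
    if all(f is None for f in fractions):
        return  # a predefined split column is in use
    if any(f is None for f in fractions):
        raise ValueError("set all three fractions or none of them")
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError(f"fractions sum to {sum(fractions)}, expected 1.0")


def check_column_roles(available: Sequence[str], unavailable: Sequence[str]) -> None:
    """Raise ValueError if a column appears in both forecast-time lists."""
    overlap = set(available) & set(unavailable)
    if overlap:
        raise ValueError(f"columns listed in both roles: {sorted(overlap)}")


check_split(0.8, 0.1, 0.1)
check_column_roles(["date"], ["sale_dollars"])
print("split and column roles look consistent")
```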


In [13]:
transformations = helpers.generate_transformation(auto_column_names=features)

print(f"transformations       = {transformations}\n")

transformations       = {'auto': ['date', 'sale_dollars', 'city', 'zip_code', 'county'], 'numeric': [], 'categorical': [], 'text': [], 'timestamp': []}
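
The source of `helpers.generate_transformation` is not shown in this notebook. Judging only from the printed output, it groups column names by transformation type. The sketch below is a hypothetical stand-in that produces the same dictionary shape; the parameter names other than `auto_column_names` are assumptions.

```python
from typing import Dict, List, Optional


def sketch_generate_transformation(
    auto_column_names: Optional[List[str]] = None,
    numeric_column_names: Optional[List[str]] = None,
    categorical_column_names: Optional[List[str]] = None,
    text_column_names: Optional[List[str]] = None,
    timestamp_column_names: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Group column names by transformation type; missing groups become empty lists."""
    return {
        "auto": list(auto_column_names or []),
        "numeric": list(numeric_column_names or []),
        "categorical": list(categorical_column_names or []),
        "text": list(text_column_names or []),
        "timestamp": list(timestamp_column_names or []),
    }


example = sketch_generate_transformation(
    auto_column_names=["date", "sale_dollars", "city", "zip_code", "county"]
)
print(example)
```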



In [14]:
# List the top-level contents of the staging bucket
!gsutil ls $BUCKET_URI/

gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/
gs://forecast-refresh-v1-hybrid-vertex-gcs/config/


# Vertex Forecast Training

**Optimization Objectives** ([docs](https://cloud.google.com/vertex-ai/docs/tabular-data/forecasting-parameters#optimization-objectives))

| Objective  | API                      | Use case |
| :--------: | :------------:           | :------------------------------------- |
| RMSE       | `minimize-rmse`          | Minimize root-mean-squared error (RMSE). Captures more extreme values accurately and is less biased when aggregating predictions. This is the default. |
| MAE        | `minimize-mae`           | Minimize mean-absolute error (MAE). Views extreme values as outliers with less impact on model. |
| RMSLE      | `minimize-rmsle`         | Minimize root-mean-squared log error (RMSLE). Penalizes error on relative size rather than absolute value. Useful when both predicted and actual values can be large. |
| RMSPE      | `minimize-rmspe`         | Minimize root-mean-squared percentage error (RMSPE). Captures a large range of values accurately. Similar to RMSE, but relative to target magnitude. Useful when the range of values is large. |
| WAPE       | `minimize-wape-mae`      | Minimize the combination of weighted absolute percentage error (WAPE) and mean-absolute-error (MAE). Useful when the actual values are low. |
| QUANTILE   | `minimize-quantile-loss` | Minimize the scaled pinball loss of the defined quantiles to quantify uncertainty in estimates. Quantile predictions quantify the uncertainty of predictions. They measure the likelihood of a prediction being within a range. |

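To make the differences between these objectives concrete, the sketch below computes the underlying error measures on two hand-picked points. These are textbook definitions written in plain Python; they are not Vertex AI's implementation, and details such as percentage scaling, the WAPE and MAE combination, or the scaling of the pinball loss are not reproduced.

```python
import math
from typing import List, Sequence


def _errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    return [a - p for a, p in zip(actual, predicted)]


def rmse(actual, predicted):
    errs = _errors(actual, predicted)
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def mae(actual, predicted):
    errs = _errors(actual, predicted)
    return sum(abs(e) for e in errs) / len(errs)


def rmsle(actual, predicted):
    # Error on log(1 + value), so relative size matters more than absolute size.
    diffs = [math.log1p(p) - math.log1p(a) for a, p in zip(actual, predicted)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def rmspe(actual, predicted):
    # Each error is divided by its actual value; actuals must be nonzero.
    rel = [(a - p) / a for a, p in zip(actual, predicted)]
    return math.sqrt(sum(r * r for r in rel) / len(rel))


def wape(actual, predicted):
    return sum(abs(e) for e in _errors(actual, predicted)) / sum(abs(a) for a in actual)


def pinball_loss(actual, predicted, quantile):
    # Unscaled pinball loss for one quantile.
    losses = [max(quantile * e, (quantile - 1) * e) for e in _errors(actual, predicted)]
    return sum(losses) / len(losses)


actual = [100.0, 200.0]
predicted = [110.0, 180.0]
for fn in (rmse, mae, rmsle, rmspe, wape):
    print(f"{fn.__name__:6s} {fn(actual, predicted):.4f}")
print(f"pinball(q=0.9) {pinball_loss(actual, predicted, 0.9):.4f}")
```

On this toy data the two points have the same 10% relative error, so RMSPE and WAPE agree, while RMSE is pulled toward the larger absolute miss.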

**TiDE on Vertex Tabular Workflows**
* [src](https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/preview/automl/forecasting/utils.py#L413)

#### TODO

* add `with dsl.ParallelFor(LIST) as cw:` to run parallel jobs with different parameters (e.g., statmike [example](https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20Forecasting/Vertex%20AI%20Pipelines%20-%20Forecasting%20Tournament%20with%20Kubeflow%20Pipelines%20(KFP).ipynb))

In [19]:
# Number of weak models in the final ensemble model.
num_selected_trials           = 5
train_budget_milli_node_hours = 250  # 0.25 node hours (15 minutes)

optimization_objective        = "minimize-rmspe" 

RUN_EVALUATION                = True

PROBABILISTIC_INFER           = False
# QUANTILES                     = [0.25, 0.5, 0.9] # [0.05, 0.25, 0.50, 0.75, 0.95]

JOB_ID                        = f"{EXPERIMENT_NAME}-{NOW}".replace("_","-")

print(f"JOB_ID = {JOB_ID}")

JOB_ID = tide-twrkflow-eval-v1-0216-31-00-023491
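
The training budget is expressed in milli node hours, where 1,000 equals one node hour. A short sketch of the conversion follows; `milli_node_hours_to_minutes` is a hypothetical helper name.

```python
def milli_node_hours_to_minutes(milli_node_hours: float) -> float:
    """Convert a budget in milli node hours to minutes of one node's time."""
    node_hours = milli_node_hours / 1000
    return node_hours * 60


print(milli_node_hours_to_minutes(250))   # 15.0
print(milli_node_hours_to_minutes(1000))  # 60.0
```

This is node time, not wall-clock time: with several trials running in parallel, the elapsed time of the pipeline can differ from the budget.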


## (1) TiDE - full AutoML train & eval

TiDE stands for "Time series Dense Encoder". It is a newer model type in Vertex Forecasting, designed to train and serve predictions faster without giving up model quality.

For more details, please see https://ai.googleblog.com/2023/04/recent-advances-in-deep-long-horizon.html

This section runs the full AutoML Forecasting pipeline for TiDE, including architecture search, with evaluation enabled:
- `run_evaluation` is set from `RUN_EVALUATION`, so the pipeline evaluates the trained model
- Evaluated examples are written to the BigQuery dataset created above

In [20]:
(
    template_path,
    parameter_values,
) = automl_forecasting_utils.get_time_series_dense_encoder_forecasting_pipeline_and_parameters(
    project=PROJECT_ID,
    location=REGION,
    root_dir=ROOT_DIR,
    target_column=target_column,
    optimization_objective=optimization_objective,
    transformations=transformations,
    train_budget_milli_node_hours=train_budget_milli_node_hours,
    data_source_csv_filenames=data_source_csv_filenames,
    data_source_bigquery_table_path=data_source_bigquery_table_path,
    weight_column=weight_column,
    predefined_split_key=predefined_split_key,
    training_fraction=training_fraction,
    validation_fraction=validation_fraction,
    test_fraction=test_fraction,
    num_selected_trials=num_selected_trials,
    time_column=time_column,
    time_series_identifier_columns=[time_series_identifier_column],
    time_series_attribute_columns=time_series_attribute_columns,
    available_at_forecast_columns=available_at_forecast_columns,
    unavailable_at_forecast_columns=unavailable_at_forecast_columns,
    forecast_horizon=forecast_horizon,
    context_window=context_window,
    dataflow_subnetwork=dataflow_subnetwork,
    dataflow_use_public_ips=dataflow_use_public_ips,
    run_evaluation=RUN_EVALUATION,                          # set True to eval on test/valid set
    evaluated_examples_bigquery_path=f'bq://{PROJECT_ID}.{BIGQUERY_DATASET_NAME}',
    enable_probabilistic_inference=PROBABILISTIC_INFER,
    # holiday_regions=['US','AE'],
    
    ### quantile forecast
    # quantiles=QUANTILES,
    
    ### hierarchical forecast
    # group_columns=XXXX,
    # group_total_weight=XXXX,
    # temporal_total_weight=XXXX,
    # group_temporal_total_weight=XXXX,
)


job = aiplatform.PipelineJob(
    display_name=JOB_ID,
    location=REGION,  # launches the pipeline job in the specified region
    template_path=template_path,
    job_id=JOB_ID,
    pipeline_root=ROOT_DIR,
    parameter_values=parameter_values,
    enable_caching=True,
    # Uncomment the following line if you want to use Vertex managed dataset.
    # input_artifacts={'vertex_dataset': vertex_dataset_artifact_id},
)

# job.run(sync=False,experiment=EXPERIMENT_NAME)
job.submit(
    experiment=EXPERIMENT_NAME,
    # sync=False,
    service_account=VERTEX_SA,
)

Creating PipelineJob
PipelineJob created. Resource name: projects/934903580331/locations/us-central1/pipelineJobs/tide-twrkflow-eval-v1-0216-31-00-023491
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/934903580331/locations/us-central1/pipelineJobs/tide-twrkflow-eval-v1-0216-31-00-023491')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/tide-twrkflow-eval-v1-0216-31-00-023491?project=934903580331
Associating projects/934903580331/locations/us-central1/pipelineJobs/tide-twrkflow-eval-v1-0216-31-00-023491 to Experiment: tide-twrkflow-eval-v1


In [42]:
template_path

'/home/jupyter/.local/lib/python3.10/site-packages/google_cloud_pipeline_components/preview/automl/forecasting/time_series_dense_encoder_forecasting_pipeline.yaml'

In [47]:
parameter_values #['context_window']

{'project': 'hybrid-vertex',
 'location': 'us-central1',
 'root_dir': 'gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/tide-twrkflow-eval-v1/run-0218_06_46_910402',
 'evaluated_examples_bigquery_path': 'bq://hybrid-vertex.tide_twrkflow_eval_v1',
 'target_column': 'sale_dollars',
 'optimization_objective': 'minimize-rmspe',
 'transformations': {'auto': ['date',
   'sale_dollars',
   'city',
   'zip_code',
   'county'],
  'numeric': [],
  'categorical': [],
  'text': [],
  'timestamp': []},
 'train_budget_milli_node_hours': 250.0,
 'time_column': 'date',
 'time_series_identifier_columns': ['store_name'],
 'time_series_attribute_columns': ['city', 'zip_code', 'county'],
 'available_at_forecast_columns': ['date'],
 'unavailable_at_forecast_columns': ['sale_dollars'],
 'forecast_horizon': 150,
 'context_window': 150,
 'stage_1_tuning_result_artifact_uri': 'gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/tide-twrkflow-eval-v1/run-0216_31_00_02349

In [21]:
pipeline_task_details = job.task_details

for task_deets in pipeline_task_details:
    print(task_deets.task_name)

model-evaluation-forecasting-2
string-not-empty
get-or-create-model-description-2
condition-2
automl-forecasting-ensemble-2
finalize-eval-quantile-parameters-2
table-to-uri-2
model-upload-2
exit-handler-1
feature-transform-engine
calculate-training-parameters-2
model-batch-explanation-2
model-batch-predict-2
set-optional-inputs
split-materialized-data
condition-5
tide-twrkflow-eval-v1-0216-31-00-023491
feature-attribution-2
model-evaluation-import-2
training-configurator-and-validator
get-prediction-image-uri-2
automl-tabular-finalizer
condition-4
automl-forecasting-stage-1-tuner
get-predictions-column-2


### Get trained model

In [61]:
for put in pipeline_task_details[0].outputs:
    print(put)
    #['outputs']

evaluation_metrics


In [63]:
pipeline_task_details[0].outputs == "evaluation_metrics"

False

In [22]:
stage_1_tuner_task = helpers.get_task_detail(
    pipeline_task_details, "automl-forecasting-stage-1-tuner"
)
stage_1_tuning_result_artifact_uri = (
    stage_1_tuner_task.outputs["tuning_result_output"].artifacts[0].uri
)
print(f"stage_1_tuning_result_artifact_uri: {stage_1_tuning_result_artifact_uri}")

# get uploaded model
upload_model_task = helpers.get_task_detail(
    pipeline_task_details, "model-upload-2"
)

forecasting_mp_model_artifact = (
    upload_model_task.outputs["model"].artifacts[0]
)

forecasting_mp_model = aiplatform.Model(forecasting_mp_model_artifact.metadata['resourceName'])
print(f"forecasting_mp_model: {forecasting_mp_model}")

stage_1_tuning_result_artifact_uri: gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/tide-twrkflow-eval-v1/run-0216_31_00_023491/934903580331/tide-twrkflow-eval-v1-0216-31-00-023491/automl-forecasting-stage-1-tuner_7922991591873052672/tuning_result_output
forecasting_mp_model: <google.cloud.aiplatform.models.Model object at 0x7fe368435c90> 
resource name: projects/934903580331/locations/us-central1/models/6045359021093814272
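
The source of `helpers.get_task_detail` is not shown here. From the way it is called, it looks up one task by name in the list returned by `job.task_details`. The sketch below is a hypothetical equivalent, exercised with stand-in objects instead of real pipeline task details.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def find_task(task_details: Iterable[Any], task_name: str) -> Any:
    """Return the first task whose `task_name` matches, or raise LookupError."""
    for task in task_details:
        if task.task_name == task_name:
            return task
    raise LookupError(f"no task named {task_name!r}")


# Stand-ins for pipeline task details; only `task_name` is modeled.
tasks = [
    SimpleNamespace(task_name="feature-transform-engine"),
    SimpleNamespace(task_name="automl-forecasting-stage-1-tuner"),
    SimpleNamespace(task_name="model-upload-2"),
]
print(find_task(tasks, "model-upload-2").task_name)
```

Task names depend on the pipeline branch that ran: in this notebook the full run exposes `model-upload-2`, while the skip-architecture-search run exposes `model-upload`, so raising on a missing name makes a wrong lookup visible early.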


In [64]:
forecasting_mp_model

<google.cloud.aiplatform.models.Model object at 0x7fe368435c90> 
resource name: projects/934903580331/locations/us-central1/models/6045359021093814272

### Model Evaluations

In [23]:
if RUN_EVALUATION:
    forecast_EVALS = forecasting_mp_model.list_model_evaluations()

    for model_evaluation in forecast_EVALS:
        pprint(model_evaluation.to_dict())
        
else:
    print(f"Model evaluations were set to: {RUN_EVALUATION}")

{'createTime': '2024-01-02T18:02:35.473399Z',
 'displayName': 'Vertex Forecasting pipeline',
 'metadata': {'evaluation_dataset_path': ['bq://hybrid-vertex.vertex_feature_transform_engine_staging_us.vertex_ai_fte_split_output_test_staging_ida7e27b170ff745988461a32b7dbde998'],
              'evaluation_dataset_type': 'bigquery',
              'pipeline_job_id': '2379523757292126208',
              'pipeline_job_resource_name': 'projects/934903580331/locations/us-central1/pipelineJobs/tide-twrkflow-eval-v1-0216-31-00-023491'},
 'metrics': {'meanAbsoluteError': 8433.899,
             'meanAbsolutePercentageError': 98.310875,
             'rSquared': 0.019479696,
             'rootMeanSquaredError': 15775.902,
             'rootMeanSquaredLogError': 5.336631,
             'rootMeanSquaredPercentageError': 98.75063,
             'weightedAbsolutePercentageError': 99.72643},
 'metricsSchemaUri': 'gs://google-cloud-aiplatform/schema/modelevaluation/forecasting_metrics_1.0.0.yaml',
 'modelExpla

In [24]:
if RUN_EVALUATION:
    # Get evaluations
    model_evaluations = forecasting_mp_model.list_model_evaluations()

    # Print the evaluation metrics
    for evaluation in model_evaluations:
        evaluation = evaluation.to_dict()
        print("Model's evaluation metrics from training:\n")
        metrics = evaluation["metrics"]
        for metric in metrics.keys():
            print(f"metric: {metric}, value: {metrics[metric]}\n")

Model's evaluation metrics from training:

metric: meanAbsolutePercentageError, value: 98.310875

metric: rSquared, value: 0.019479696

metric: weightedAbsolutePercentageError, value: 99.72643

metric: rootMeanSquaredError, value: 15775.902

metric: rootMeanSquaredPercentageError, value: 98.75063

metric: meanAbsoluteError, value: 8433.899

metric: rootMeanSquaredLogError, value: 5.336631



In [40]:
metrics

{'weightedAbsolutePercentageError': 99.77842,
 'rSquared': 0.0033285806,
 'rootMeanSquaredPercentageError': 98.67152,
 'rootMeanSquaredError': 15777.557,
 'meanAbsolutePercentageError': 98.48151,
 'rootMeanSquaredLogError': 5.496845,
 'meanAbsoluteError': 8438.295}

## (2) TiDE - skip architecture search

Instead of running architecture search every time, you can reuse the result of an earlier search. This can:
1. reduce variation in the output model
2. reduce training cost

The existing architecture search result is stored in the `tuning_result_output` output of the `automl-forecasting-stage-1-tuner` component. You can manually input it or get it programmatically.

**Optional parameters**
* `stage_1_tuning_result_artifact_uri` (str): URI of the hyperparameter tuning result from a previous pipeline run.
* `stage_1_tuner_worker_pool_specs_override`: overrides the number and type of machines used for training. The override defined in the cells below is written as a list of dicts.

In [26]:
NOW      = datetime.datetime.now().strftime("%d %H:%M:%S.%f").replace(" ","").replace(":","_").replace(".","_")
ROOT_DIR = f"{BUCKET_URI}/automl_forecasting_pipeline/{EXPERIMENT_NAME}/run-{NOW}"

JOB_ID   = f"skip-arch-{EXPERIMENT_NAME}-{NOW}".replace("_","-")

print(f"JOB_ID: {JOB_ID}")
print(f"ROOT_DIR: {ROOT_DIR}")

print(forecasting_mp_model)
print(stage_1_tuning_result_artifact_uri)

JOB_ID: skip-arch-tide-twrkflow-eval-v1-0218-06-46-910402
ROOT_DIR: gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/tide-twrkflow-eval-v1/run-0218_06_46_910402
<google.cloud.aiplatform.models.Model object at 0x7fe368435c90> 
resource name: projects/934903580331/locations/us-central1/models/6045359021093814272
gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/tide-twrkflow-eval-v1/run-0216_31_00_023491/934903580331/tide-twrkflow-eval-v1-0216-31-00-023491/automl-forecasting-stage-1-tuner_7922991591873052672/tuning_result_output


In [36]:
worker_pool_specs_override = [
  {
      "machine_spec": {
          "machine_type": "n1-standard-8"
      }
  }, # override for TF chief node
  {},  # override for TF worker node, since it's not used, leave it empty
  {},  # override for TF ps node, since it's not used, leave it empty
  {
    "machine_spec": {
        "machine_type": "n1-standard-4" # override for TF evaluator node
    }
  }
]
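
The override list is positional: following the comments in the cell above, the four entries correspond to the TF chief, worker, ps, and evaluator nodes, and unused roles are left as empty dicts. The sketch below builds the same structure from named arguments so the positions cannot be mixed up; `build_worker_pool_override` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def build_worker_pool_override(
    chief: Optional[str] = None,
    worker: Optional[str] = None,
    ps: Optional[str] = None,
    evaluator: Optional[str] = None,
) -> List[Dict]:
    """Return the four-entry override list in chief, worker, ps, evaluator order."""

    def spec(machine_type: Optional[str]) -> Dict:
        # An empty dict means "no override" for that role.
        if machine_type is None:
            return {}
        return {"machine_spec": {"machine_type": machine_type}}

    return [spec(chief), spec(worker), spec(ps), spec(evaluator)]


override = build_worker_pool_override(chief="n1-standard-8", evaluator="n1-standard-4")
print(override)
```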

In [27]:
# Number of weak models in the final ensemble model.
num_selected_trials = 5

train_budget_milli_node_hours = 250  # 15 minutes

(
    template_path,
    parameter_values,
) = automl_forecasting_utils.get_time_series_dense_encoder_forecasting_pipeline_and_parameters(
    project=PROJECT_ID,
    location=REGION,
    root_dir=ROOT_DIR,
    target_column=target_column,
    optimization_objective=optimization_objective,
    transformations=transformations,
    train_budget_milli_node_hours=train_budget_milli_node_hours,
    data_source_csv_filenames=data_source_csv_filenames,
    data_source_bigquery_table_path=data_source_bigquery_table_path,
    weight_column=weight_column,
    predefined_split_key=predefined_split_key,
    training_fraction=training_fraction,
    validation_fraction=validation_fraction,
    test_fraction=test_fraction,
    num_selected_trials=num_selected_trials,
    time_column=time_column,
    time_series_identifier_columns=[time_series_identifier_column],
    time_series_attribute_columns=time_series_attribute_columns,
    available_at_forecast_columns=available_at_forecast_columns,
    unavailable_at_forecast_columns=unavailable_at_forecast_columns,
    forecast_horizon=forecast_horizon,
    context_window=context_window,
    dataflow_subnetwork=dataflow_subnetwork,
    dataflow_use_public_ips=dataflow_use_public_ips,
    stage_1_tuning_result_artifact_uri=stage_1_tuning_result_artifact_uri,
    # stage_1_tuner_worker_pool_specs_override=worker_pool_specs_override,
    # stage_2_trainer_worker_pool_specs_override=worker_pool_specs_override,
    run_evaluation=RUN_EVALUATION,
    evaluated_examples_bigquery_path=f'bq://{PROJECT_ID}.{BIGQUERY_DATASET_NAME}',
    enable_probabilistic_inference=PROBABILISTIC_INFER,
    # holiday_regions=['US','AE'],
)

# job_id = "tide-forecasting-skip-architecture-search-{}".format(uuid.uuid4())
job = aiplatform.PipelineJob(
    display_name=JOB_ID,
    location=REGION,  # launches the pipeline job in the specified region
    template_path=template_path,
    job_id=JOB_ID,
    pipeline_root=ROOT_DIR,
    parameter_values=parameter_values,
    enable_caching=False,
    # Uncomment the following line if you want to use Vertex managed dataset.
    # input_artifacts={'vertex_dataset': vertex_dataset_artifact_id},
)

job.submit(
    experiment=EXPERIMENT_NAME,
    # sync=False,
    service_account=VERTEX_SA,
)

Creating PipelineJob
PipelineJob created. Resource name: projects/934903580331/locations/us-central1/pipelineJobs/skip-arch-tide-twrkflow-eval-v1-0218-06-46-910402
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/934903580331/locations/us-central1/pipelineJobs/skip-arch-tide-twrkflow-eval-v1-0218-06-46-910402')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/skip-arch-tide-twrkflow-eval-v1-0218-06-46-910402?project=934903580331
Associating projects/934903580331/locations/us-central1/pipelineJobs/skip-arch-tide-twrkflow-eval-v1-0218-06-46-910402 to Experiment: tide-twrkflow-eval-v1


In [29]:
skip_arch_search_pipeline_task_details = job.task_details

for task_deets in skip_arch_search_pipeline_task_details:
    print(task_deets.task_name)

string-not-empty
model-evaluation-forecasting
importer
calculate-training-parameters
condition-2
finalize-eval-quantile-parameters
automl-forecasting-ensemble
table-to-uri
exit-handler-1
get-or-create-model-description
feature-transform-engine
skip-arch-tide-twrkflow-eval-v1-0218-06-46-910402
model-batch-explanation
split-materialized-data
model-batch-predict
get-prediction-image-uri
automl-forecasting-stage-2-tuner
feature-attribution
set-optional-inputs
training-configurator-and-validator
model-evaluation-import
automl-tabular-finalizer
model-upload
condition-3
condition-4
get-predictions-column


### Get trained model

In [31]:
# get tuning stage task
stage_2_tuner_task = helpers.get_task_detail(
    skip_arch_search_pipeline_task_details, "automl-forecasting-stage-2-tuner"
)
stage_2_tuning_result_artifact_uri = stage_2_tuner_task.outputs["tuning_result_output"].artifacts[0].uri
print(f"stage-2 result URI     : \n{stage_2_tuning_result_artifact_uri}\n")

stage-2 result URI     : 
gs://forecast-refresh-v1-hybrid-vertex-gcs/automl_forecasting_pipeline/tide-twrkflow-eval-v1/run-0218_06_46_910402/934903580331/skip-arch-tide-twrkflow-eval-v1-0218-06-46-910402/automl-forecasting-stage-2-tuner_3380548417716486144/tuning_result_output



In [32]:
# get uploaded model
upload_model_task_v2 = helpers.get_task_detail(
    skip_arch_search_pipeline_task_details, "model-upload"
)
forecasting_mp_model_v2_artifact = upload_model_task_v2.outputs["model"].artifacts[0]

forecasting_mp_model_v2 = aiplatform.Model(forecasting_mp_model_v2_artifact.metadata['resourceName'])

print(f"forecasting_mp_model_v2 : \n{forecasting_mp_model_v2}")

forecasting_mp_model_v2 : 
<google.cloud.aiplatform.models.Model object at 0x7fe368423730> 
resource name: projects/934903580331/locations/us-central1/models/8995216777021489152


In [33]:
# get values for stage-2 trials from the root task, whose name is the job ID
stage_2_parallel_trials = None
stage_2_worker_pool_spec = None

for task_deets in skip_arch_search_pipeline_task_details:
    if task_deets.task_name == JOB_ID:
        stage_2_parallel_trials = task_deets.execution.metadata.get(key="input:stage_2_num_parallel_trials")
        stage_2_worker_pool_spec = task_deets.execution.metadata.get(key="input:stage_2_trainer_worker_pool_specs_override")
        break

print(f"stage_2_parallel_trials  : {stage_2_parallel_trials}")
print(f"stage_2_worker_pool_spec : {stage_2_worker_pool_spec}")

stage_2_parallel_trials  : 35.0
stage_2_worker_pool_spec : []


### Model Evaluations

In [34]:
if RUN_EVALUATION:
    forecast_EVALS = forecasting_mp_model_v2.list_model_evaluations()

    for model_evaluation in forecast_EVALS:
        pprint(model_evaluation.to_dict())
        
else:
    print(f"Model evaluations were set to: {RUN_EVALUATION}")

{'createTime': '2024-01-02T19:22:27.042018Z',
 'displayName': 'Vertex Forecasting pipeline',
 'metadata': {'evaluation_dataset_path': ['bq://hybrid-vertex.vertex_feature_transform_engine_staging_us.vertex_ai_fte_split_output_test_staging_id60fc0696a7344a6f8ccc674c5611627f'],
              'evaluation_dataset_type': 'bigquery',
              'pipeline_job_id': '8371844536485281792',
              'pipeline_job_resource_name': 'projects/934903580331/locations/us-central1/pipelineJobs/skip-arch-tide-twrkflow-eval-v1-0218-06-46-910402'},
 'metrics': {'meanAbsoluteError': 8438.295,
             'meanAbsolutePercentageError': 98.48151,
             'rSquared': 0.0033285806,
             'rootMeanSquaredError': 15777.557,
             'rootMeanSquaredLogError': 5.496845,
             'rootMeanSquaredPercentageError': 98.67152,
             'weightedAbsolutePercentageError': 99.77842},
 'metricsSchemaUri': 'gs://google-cloud-aiplatform/schema/modelevaluation/forecasting_metrics_1.0.0.yaml',
 '

In [35]:
if RUN_EVALUATION:
    # Get evaluations
    model_evaluations = forecasting_mp_model_v2.list_model_evaluations()

    # Print the evaluation metrics
    for evaluation in model_evaluations:
        evaluation = evaluation.to_dict()
        print("Model's evaluation metrics from training:\n")
        metrics = evaluation["metrics"]
        for metric in metrics.keys():
            print(f"metric: {metric}, value: {metrics[metric]}\n")

Model's evaluation metrics from training:

metric: weightedAbsolutePercentageError, value: 99.77842

metric: rSquared, value: 0.0033285806

metric: rootMeanSquaredPercentageError, value: 98.67152

metric: rootMeanSquaredError, value: 15777.557

metric: meanAbsolutePercentageError, value: 98.48151

metric: rootMeanSquaredLogError, value: 5.496845

metric: meanAbsoluteError, value: 8438.295
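
With both runs evaluated, the two sets of metrics can be put side by side. The sketch below copies the values printed in this notebook for the full run and the skip-architecture-search run and prints the difference per metric; the variable names are illustrative.

```python
# Metrics copied from the evaluation outputs in this notebook.
full_search = {
    "meanAbsoluteError": 8433.899,
    "meanAbsolutePercentageError": 98.310875,
    "rSquared": 0.019479696,
    "rootMeanSquaredError": 15775.902,
    "rootMeanSquaredLogError": 5.336631,
    "rootMeanSquaredPercentageError": 98.75063,
    "weightedAbsolutePercentageError": 99.72643,
}
skip_search = {
    "meanAbsoluteError": 8438.295,
    "meanAbsolutePercentageError": 98.48151,
    "rSquared": 0.0033285806,
    "rootMeanSquaredError": 15777.557,
    "rootMeanSquaredLogError": 5.496845,
    "rootMeanSquaredPercentageError": 98.67152,
    "weightedAbsolutePercentageError": 99.77842,
}

# Difference per metric: skip-architecture-search run minus full run.
deltas = {name: skip_search[name] - full_search[name] for name in full_search}

print(f"{'metric':34s}{'full':>14s}{'skip':>14s}{'delta':>12s}")
for name in sorted(deltas):
    print(f"{name:34s}{full_search[name]:>14.4f}{skip_search[name]:>14.4f}{deltas[name]:>+12.4f}")
```

In these two runs the metrics are close to each other, and both show percentage errors near 98 with an R-squared near zero, so the comparison says more about run-to-run similarity than about forecast quality.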



**Finished**