# Continuous Training with AutoML Vertex Pipelines

**Learning Objectives:**
1. Learn how to use Vertex AutoML pre-built components
1. Learn how to build a Vertex AutoML pipeline with these components using BigQuery as a data source
1. Learn how to compile, upload, and run the Vertex AutoML pipeline


In this lab, you will build, deploy, and run a Vertex AutoML pipeline that orchestrates the **Vertex AI AutoML** services to train, tune, and deploy a model.

## Setup

In [37]:
from google.cloud import aiplatform

In [38]:
REGION = "us-central1"
PROJECT = !(gcloud config get-value project)
PROJECT = PROJECT[0]



In [39]:
# Set `PATH` to include the directory containing KFP CLI
PATH = %env PATH
%env PATH=/home/jupyter/.local/bin:{PATH}

env: PATH=/home/jupyter/.local/bin:/home/jupyter/.local/bin:/home/jupyter/.local/bin:/usr/local/cuda/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
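The output above lists `/home/jupyter/.local/bin` three times because every run of the cell prepends it again. Here is a minimal plain-Python sketch of an idempotent alternative. `prepend_to_path` is a hypothetical helper, not part of any library. It drops existing copies of the directory before putting it first, so re-running the cell leaves `PATH` unchanged.

```python
import os


def prepend_to_path(directory: str, path: str) -> str:
    """Return `path` with `directory` first, without duplicating it."""
    # Split on the platform separator (":" on Linux), dropping empty entries
    # and any existing copies of `directory` so repeated calls are idempotent.
    entries = [p for p in path.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory, *entries])


if __name__ == "__main__":
    # Equivalent of the %env cell above, safe to run more than once.
    os.environ["PATH"] = prepend_to_path(
        "/home/jupyter/.local/bin", os.environ.get("PATH", "")
    )
    print(os.environ["PATH"])
```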


## Understanding the pipeline design


The workflow implemented by the pipeline is defined using a Python-based domain-specific language (DSL). The pipeline's DSL code lives in the `pipeline_vertex/pipeline_vertex_automl.py` file, which the `%%writefile` cell below generates.

The pipeline's DSL has been designed to avoid hardcoding any environment-specific settings, such as file paths or connection strings. These settings are provided to the pipeline code through a set of environment variables.
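The pipeline module reads these settings with `os.getenv` at import time. Required values have no default, so a variable left unset does not raise an error. It silently becomes `None` when the pipeline is compiled. The sketch below mirrors the variable names and defaults used in the pipeline file. `load_settings` and `missing_required` are hypothetical helpers added for illustration. They accept any mapping, so they can be checked without touching the real environment.

```python
import os
from typing import Mapping, Optional

# Settings the pipeline module reads without a default value.
REQUIRED = ("PIPELINE_ROOT", "PROJECT", "DATASET_SOURCE")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve settings the same way the pipeline module does."""
    env = os.environ if env is None else env
    pipeline_name = env.get("PIPELINE_NAME", "covertype")
    return {
        # Required: None when unset.
        "PIPELINE_ROOT": env.get("PIPELINE_ROOT"),
        "PROJECT": env.get("PROJECT"),
        "DATASET_SOURCE": env.get("DATASET_SOURCE"),
        # Optional: same defaults as the pipeline module.
        "PIPELINE_NAME": pipeline_name,
        "DISPLAY_NAME": env.get("MODEL_DISPLAY_NAME", pipeline_name),
        "TARGET_COLUMN": env.get("TARGET_COLUMN", "Cover_Type"),
        "SERVING_MACHINE_TYPE": env.get("SERVING_MACHINE_TYPE", "n1-standard-16"),
    }


def missing_required(settings: dict) -> list:
    """Names of required settings that are unset or empty."""
    return [name for name in REQUIRED if not settings[name]]
```

Calling `missing_required(load_settings())` before compiling is one way to fail fast instead of compiling a pipeline with `None` values.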


## Building and deploying the pipeline

### Exercise

Complete the pipeline below:

In [40]:
%%writefile ./pipeline_vertex/pipeline_vertex_automl.py
# Copyright 2021 Google LLC

# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy of
# the License at

# https://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS"
# BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
# express or implied. See the License for the specific language governing
# permissions and limitations under the License.

"""Kubeflow Covertype Pipeline."""

import os

from google_cloud_pipeline_components.aiplatform import (
    AutoMLTabularTrainingJobRunOp,
    EndpointCreateOp,
    ModelDeployOp,
    TabularDatasetCreateOp,
)
from kfp.v2 import dsl

PIPELINE_ROOT = os.getenv("PIPELINE_ROOT")
PROJECT = os.getenv("PROJECT")
DATASET_SOURCE = os.getenv("DATASET_SOURCE")
PIPELINE_NAME = os.getenv("PIPELINE_NAME", "covertype")
DISPLAY_NAME = os.getenv("MODEL_DISPLAY_NAME", PIPELINE_NAME)
TARGET_COLUMN = os.getenv("TARGET_COLUMN", "Cover_Type")
SERVING_MACHINE_TYPE = os.getenv("SERVING_MACHINE_TYPE", "n1-standard-16")


@dsl.pipeline(
    name=f"{PIPELINE_NAME}-vertex-automl-pipeline",
    description=f"AutoML Vertex Pipeline for {PIPELINE_NAME}",
    pipeline_root=PIPELINE_ROOT,
)
def create_pipeline():

    dataset_create_task = TabularDatasetCreateOp(
        display_name=DISPLAY_NAME,
        bq_source=DATASET_SOURCE,
        project=PROJECT
    )

    automl_training_task = AutoMLTabularTrainingJobRunOp(
        project=PROJECT,
        display_name=DISPLAY_NAME,
        optimization_prediction_type="classification",
        dataset=dataset_create_task.outputs["dataset"],
        target_column=TARGET_COLUMN
    )

    endpoint_create_task = EndpointCreateOp(
        project=PROJECT,
        display_name=DISPLAY_NAME
    )

    model_deploy_task = ModelDeployOp(  # pylint: disable=unused-variable
        model=automl_training_task.outputs["model"],
        endpoint=endpoint_create_task.outputs["endpoint"],
        deployed_model_display_name=DISPLAY_NAME,
        dedicated_resources_machine_type=SERVING_MACHINE_TYPE,
        dedicated_resources_min_replica_count=1,
        dedicated_resources_max_replica_count=1
    )


Overwriting ./pipeline_vertex/pipeline_vertex_automl.py
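KFP derives task ordering from the data passed between tasks: a task that consumes another task's `outputs[...]` runs after it. In `create_pipeline`, `endpoint_create_task` consumes no upstream outputs, so nothing in the DSL forces it to wait for dataset creation or training. Only `model_deploy_task` joins the two branches. The plain-Python model below is a sketch, not a KFP API. It uses the standard library's `graphlib` (Python 3.9+) to group the four tasks into "waves" that have no mutual dependencies. The actual scheduling is decided by Vertex AI Pipelines.

```python
from graphlib import TopologicalSorter

# Each task maps to the tasks whose outputs it consumes, mirroring the
# `.outputs[...]` references in `create_pipeline`.
PIPELINE_DEPENDENCIES = {
    "dataset_create_task": set(),
    "automl_training_task": {"dataset_create_task"},
    "endpoint_create_task": set(),
    "model_deploy_task": {"automl_training_task", "endpoint_create_task"},
}


def execution_waves(dependencies: dict) -> list:
    """Group tasks into waves; tasks within a wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PIPELINE_DEPENDENCIES), start=1):
        print(number, wave)
```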


### Compile the pipeline

Start by defining the environment variables that the pipeline module reads when it is compiled:

In [41]:
ARTIFACT_STORE = f"gs://{PROJECT}-kfp-artifact-store"
PIPELINE_ROOT = f"{ARTIFACT_STORE}/pipeline"
DATASET_SOURCE = f"bq://{PROJECT}.covertype_dataset.covertype"

%env PIPELINE_ROOT={PIPELINE_ROOT}
%env PROJECT={PROJECT}
%env REGION={REGION}
%env DATASET_SOURCE={DATASET_SOURCE}

env: PIPELINE_ROOT=gs://qwiklabs-gcp-01-9a9d18213c32-kfp-artifact-store/pipeline
env: PROJECT=qwiklabs-gcp-01-9a9d18213c32
env: REGION=us-central1
env: DATASET_SOURCE=bq://qwiklabs-gcp-01-9a9d18213c32.covertype_dataset.covertype
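The `%env` magics above are IPython-specific. In a plain Python script, the same variables can be set through `os.environ` before the pipeline module is imported or compiled. The sketch below derives the values from the project ID using the same naming scheme as the cell above. `export_pipeline_env` is a hypothetical helper. It accepts a target mapping so it can be exercised without changing the real environment.

```python
import os
from typing import MutableMapping, Optional


def export_pipeline_env(
    project: str, region: str, env: Optional[MutableMapping[str, str]] = None
) -> dict:
    """Set the variables the pipeline module reads, derived from the project ID."""
    env = os.environ if env is None else env
    # Same naming scheme as the notebook cell; the bucket itself is not exported.
    artifact_store = f"gs://{project}-kfp-artifact-store"
    values = {
        "PIPELINE_ROOT": f"{artifact_store}/pipeline",
        "PROJECT": project,
        "REGION": region,
        "DATASET_SOURCE": f"bq://{project}.covertype_dataset.covertype",
    }
    env.update(values)
    return values
```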


Make sure that the `ARTIFACT_STORE` bucket exists, and create it if it does not:

In [42]:
!gsutil ls | grep ^{ARTIFACT_STORE}/$ || gsutil mb -l {REGION} {ARTIFACT_STORE}



gs://qwiklabs-gcp-01-9a9d18213c32-kfp-artifact-store/
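The `gsutil ls | grep ... || gsutil mb` one-liner lists every bucket in the project and greps for an exact match. The sketch below performs the same check with the `google-cloud-storage` client library, assuming that library is installed and authenticated. `bucket_name_from_uri` and `ensure_bucket` are hypothetical helper names. `Client.lookup_bucket` returns `None` when the bucket does not exist, so the check does not depend on listing all buckets. It may still raise a permission error if the name is already taken in another project.

```python
def bucket_name_from_uri(uri: str) -> str:
    """Strip the gs:// scheme and any trailing slash or object path."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return uri[len("gs://"):].split("/", 1)[0]


def ensure_bucket(uri: str, project: str, location: str):
    """Return the bucket for `uri`, creating it in `location` if it is missing."""
    # Imported here so bucket_name_from_uri() works without the client library.
    from google.cloud import storage

    client = storage.Client(project=project)
    name = bucket_name_from_uri(uri)
    bucket = client.lookup_bucket(name)  # None if the bucket does not exist
    if bucket is None:
        bucket = client.create_bucket(name, location=location)
    return bucket
```

In the notebook, this would be called as `ensure_bucket(ARTIFACT_STORE, PROJECT, REGION)`.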


#### Use the CLI compiler to compile the pipeline

The pipeline is compiled from the generated Python file into a JSON pipeline specification. Start by choosing the name of the output file:

In [43]:
PIPELINE_JSON = "covertype_automl_vertex_pipeline.json"

In [44]:
!pip install kfp google-cloud-pipeline-components

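The `pip install` above does not pin versions. However, this lab imports `kfp.v2` and calls `dsl-compile-v2`, which come from the 1.8.x line of the KFP SDK. To our knowledge, KFP SDK 2.x replaced that CLI with `kfp dsl compile`, and later Google Cloud Pipeline Components releases reorganized the `aiplatform` module. Verify this against the release notes for your versions. The sketch below checks which major versions were installed, using only the standard library's `importlib.metadata`. The helper names are hypothetical. Pinning, for example with `pip install "kfp<2"`, is one option to consider rather than a tested recommendation.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Major component of a version string such as '1.8.22'."""
    return int(version.split(".", 1)[0])


def installed_major(package: str) -> Optional[int]:
    """Major version of an installed distribution, or None if it is absent."""
    try:
        return major_version(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for package in ("kfp", "google-cloud-pipeline-components"):
        print(package, "major version:", installed_major(package))
```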




### Exercise

Compile the pipeline with the `dsl-compile-v2` command-line tool:

In [45]:
!dsl-compile-v2 --py pipeline_vertex/pipeline_vertex_automl.py --output $PIPELINE_JSON





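**Note:** You can also use the Python SDK to compile the pipeline:

```python
from kfp.v2 import compiler

# Importing the module reads the environment variables set above, so run this
# after the `%env` cell.
from pipeline_vertex.pipeline_vertex_automl import create_pipeline

compiler.Compiler().compile(
    pipeline_func=create_pipeline,
    package_path=PIPELINE_JSON,
)
```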

The result is the compiled pipeline package, a JSON file. Its first lines look like this:

In [46]:
!head {PIPELINE_JSON}

{
  "pipelineSpec": {
    "components": {
      "comp-automl-tabular-training-job": {
        "executorLabel": "exec-automl-tabular-training-job",
        "inputDefinitions": {
          "artifacts": {
            "dataset": {
              "artifactType": {
                "schemaTitle": "google.VertexDataset",


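The `head` output shows that the compiled file nests each component under `pipelineSpec.components`, with an `executorLabel` that points at its executor. The sketch below lists those entries from Python, relying only on the keys visible in the output above. `load_pipeline_json` and `executor_labels` are hypothetical helpers. The code uses `.get` because we have not confirmed that every component entry carries an `executorLabel`.

```python
import json


def load_pipeline_json(path: str) -> dict:
    """Read a compiled pipeline JSON file from disk."""
    with open(path) as f:
        return json.load(f)


def executor_labels(spec: dict) -> dict:
    """Map each component ID in `pipelineSpec.components` to its executor label."""
    components = spec["pipelineSpec"]["components"]
    return {
        name: component.get("executorLabel")
        for name, component in sorted(components.items())
    }


# Example (assumes the compiled file from the previous step exists):
# print(executor_labels(load_pipeline_json(PIPELINE_JSON)))
```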


### Deploy the pipeline package

### Exercise

Upload the pipeline to Vertex AI and run it using `aiplatform.PipelineJob`:
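`PipelineJob.run()` blocks the notebook until the pipeline finishes. The sketch below wraps the same steps in a helper that submits the job and returns immediately. It assumes a `google-cloud-aiplatform` release that provides `PipelineJob.submit()` and `PipelineJob.wait()`. `submit_pipeline` and `lab_service_account` are hypothetical helpers. The service account pattern `{project}@{project}.iam.gserviceaccount.com` is inferred from the Qwiklabs project used in this lab and will not hold for other projects.

```python
def lab_service_account(project: str) -> str:
    """Service account email pattern used in this lab (assumed, Qwiklabs-specific)."""
    return f"{project}@{project}.iam.gserviceaccount.com"


def submit_pipeline(
    project: str,
    region: str,
    template_path: str,
    display_name: str = "covertype_kfp_pipeline",
):
    """Create a PipelineJob and submit it without waiting for it to finish."""
    # Imported lazily so lab_service_account() works without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    job = aiplatform.PipelineJob(
        display_name=display_name,
        template_path=template_path,
        enable_caching=False,
    )
    # submit() returns once the job is created; call job.wait() to block later.
    job.submit(service_account=lab_service_account(project))
    return job
```

In the notebook, `job = submit_pipeline(PROJECT, REGION, PIPELINE_JSON)` frees the kernel for other cells, and `job.wait()` can be called when the result is needed.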

In [48]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT, location=REGION)

pipeline = aiplatform.PipelineJob(
    display_name="covertype_kfp_pipeline",
    template_path=PIPELINE_JSON,
    enable_caching=False,
)
# In this lab project, the service account email is built from the project ID.
pipeline.run(service_account=f"{PROJECT}@{PROJECT}.iam.gserviceaccount.com")

INFO:google.cloud.aiplatform.pipeline_jobs:Creating PipelineJob
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob created. Resource name: projects/785019792420/locations/us-central1/pipelineJobs/covertype-vertex-automl-pipeline-20220330164517
INFO:google.cloud.aiplatform.pipeline_jobs:To use this PipelineJob in another session:
INFO:google.cloud.aiplatform.pipeline_jobs:pipeline_job = aiplatform.PipelineJob.get('projects/785019792420/locations/us-central1/pipelineJobs/covertype-vertex-automl-pipeline-20220330164517')
INFO:google.cloud.aiplatform.pipeline_jobs:View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/covertype-vertex-automl-pipeline-20220330164517?project=785019792420
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob projects/785019792420/locations/us-central1/pipelineJobs/covertype-vertex-automl-pipeline-20220330164517 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.pipeline_jobs:Pipeli

PermissionDenied: 403 Vertex AI API has not been used in project 785019792420 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/aiplatform.googleapis.com/overview?project=785019792420 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry. [links {
  description: "Google developers console API activation"
  url: "https://console.developers.google.com/apis/api/aiplatform.googleapis.com/overview?project=785019792420"
}
, reason: "SERVICE_DISABLED"
domain: "googleapis.com"
metadata {
  key: "consumer"
  value: "projects/785019792420"
}
metadata {
  key: "service"
  value: "aiplatform.googleapis.com"
}
]
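The `403 SERVICE_DISABLED` error above means that the Vertex AI API (`aiplatform.googleapis.com`) was not enabled for the project. The message itself suggests enabling it in the console, waiting a few minutes, and retrying. The sketch below does the same from the notebook with `gcloud services enable`. It assumes `gcloud` is on `PATH` and authenticated as a principal allowed to enable services. `enable_api_command` and `enable_api` are hypothetical helpers.

```python
import subprocess


def enable_api_command(project: str, service: str = "aiplatform.googleapis.com") -> list:
    """Build the gcloud command that enables `service` for `project`."""
    return ["gcloud", "services", "enable", service, f"--project={project}"]


def enable_api(project: str, service: str = "aiplatform.googleapis.com") -> None:
    """Run the command; raises CalledProcessError if gcloud exits non-zero."""
    subprocess.run(enable_api_command(project, service), check=True)
```

After `enable_api(PROJECT)` succeeds, rerun the `PipelineJob` cell, allowing time for the change to propagate as the error message notes.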

Copyright 2021 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.