# 04 - Test and Deploy Training Pipeline to Vertex Pipelines

The purpose of this notebook is to test, deploy, and run the `TFX` pipeline on `Vertex Pipelines`. The notebook covers the following tasks:
1. Run the tests locally.
2. Run the pipeline using `Vertex Pipelines`.
3. Execute the pipeline deployment `CI/CD` steps using `Cloud Build`.

## Setup

### Import libraries

In [107]:
import os
import kfp
import tfx

print("TFX Version:", tfx.__version__)
print("KFP Version:", kfp.__version__)

TFX Version: 1.2.0
KFP Version: 1.7.1


### Setup Google Cloud project

In [108]:
PROJECT = 'aiops-industrialization' # Change to your project id.
REGION = 'us-central1' # Change to your region.
BUCKET = 'aiops-industrialization-bucket-ravi'  # Change to your bucket name.
SERVICE_ACCOUNT = "175728527123-compute@developer.gserviceaccount.com"

if PROJECT == "" or PROJECT is None or PROJECT == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT = shell_output[0]
    
if SERVICE_ACCOUNT == "" or SERVICE_ACCOUNT is None or SERVICE_ACCOUNT == "[your-service-account]":
    # Get the active account from the gcloud configuration
    shell_output = !gcloud config list --format 'value(core.account)' 2>/dev/null
    SERVICE_ACCOUNT = shell_output[0]
    
if BUCKET == "" or BUCKET is None or BUCKET == "[your-bucket-name]":
    # Default the bucket name to the GCP project ID
    BUCKET = PROJECT
    # Try to create the bucket if it doesn't exist
    ! gsutil mb -l $REGION gs://$BUCKET
    print("")
    
print("Project ID:", PROJECT)
print("Region:", REGION)
print("Bucket name:", BUCKET)
print("Service Account:", SERVICE_ACCOUNT)

Project ID: aiops-industrialization
Region: us-central1
Bucket name: aiops-industrialization-bucket-ravi
Service Account: 175728527123-compute@developer.gserviceaccount.com
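
The three `if` blocks in the cell above share one pattern: keep the configured value unless it is empty, `None`, or still the template placeholder, and otherwise compute a fallback. A minimal sketch of that pattern follows. The helper name `resolve_setting` and the stand-in fallback values are assumptions for illustration; in the notebook the fallbacks shell out to `gcloud`.

```python
def resolve_setting(value, placeholder, fallback):
    """Return `value` unless it is unset, empty, or still the template placeholder."""
    if value is None or value == "" or value == placeholder:
        # `fallback` is a zero-argument callable so it only runs when needed.
        return fallback()
    return value


# Hypothetical stand-ins for the gcloud lookups used in the notebook.
project = resolve_setting("[your-project-id]", "[your-project-id]", lambda: "my-project")
bucket = resolve_setting("", "[your-bucket-name]", lambda: project)
region = resolve_setting("us-central1", "[your-region]", lambda: "unused-default")

print(project, bucket, region)
```

Passing the fallback as a callable means the lookup runs only when the configured value is actually missing.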


### Set configurations

In [109]:
BQ_LOCATION = 'US'
BQ_DATASET_NAME = 'playground_us' # Change to your BQ dataset name.
BQ_TABLE_NAME = 'chicago_taxitrips_prep'

VERSION = 'v01'
DATASET_DISPLAY_NAME = 'chicago-taxi-tips'
MODEL_DISPLAY_NAME = f'{DATASET_DISPLAY_NAME}-classifier-{VERSION}'
PIPELINE_NAME = f'{MODEL_DISPLAY_NAME}-train-pipeline'

CICD_IMAGE_NAME = 'cicd:latest'
CICD_IMAGE_URI = f"gcr.io/{PROJECT}/{CICD_IMAGE_NAME}"

In [99]:
!rm -r src/raw_schema/.ipynb_checkpoints/

rm: cannot remove 'src/raw_schema/.ipynb_checkpoints/': No such file or directory


## 1. Run the CI/CD steps locally

### Set pipeline configurations for the local run

In [100]:
os.environ["DATASET_DISPLAY_NAME"] = DATASET_DISPLAY_NAME
os.environ["MODEL_DISPLAY_NAME"] = MODEL_DISPLAY_NAME
os.environ["PIPELINE_NAME"] = PIPELINE_NAME
os.environ["PROJECT"] = PROJECT
os.environ["REGION"] = REGION
os.environ["BQ_LOCATION"] = BQ_LOCATION
os.environ["BQ_DATASET_NAME"] = BQ_DATASET_NAME
os.environ["BQ_TABLE_NAME"] = BQ_TABLE_NAME
os.environ["GCS_LOCATION"] = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}/e2e_tests"
os.environ["TRAIN_LIMIT"] = "1000"
os.environ["TEST_LIMIT"] = "100"
os.environ["UPLOAD_MODEL"] = "0"
os.environ["ACCURACY_THRESHOLD"] = "0.1"
os.environ["BEAM_RUNNER"] = "DirectRunner"
os.environ["TRAINING_RUNNER"] = "local"

In [101]:
from src.tfx_pipelines import config
import importlib
importlib.reload(config)

for key, value in config.__dict__.items():
    if key.isupper():
        print(f'{key}: {value}')

PROJECT: aiops-industrialization
REGION: us-central1
GCS_LOCATION: gs://aiops-industrialization-bucket-ravi/chicago-taxi-tips/e2e_tests
ARTIFACT_STORE_URI: gs://aiops-industrialization-bucket-ravi/chicago-taxi-tips/e2e_tests/tfx_artifacts
MODEL_REGISTRY_URI: gs://aiops-industrialization-bucket-ravi/chicago-taxi-tips/model_registry
DATASET_DISPLAY_NAME: chicago-taxi-tips
MODEL_DISPLAY_NAME: chicago-taxi-tips-classifier-v01
PIPELINE_NAME: chicago-taxi-tips-classifier-v01-train-pipeline
ML_USE_COLUMN: ml_use
EXCLUDE_COLUMNS: trip_start_timestamp
TRAIN_LIMIT: 1000
TEST_LIMIT: 100
SERVE_LIMIT: 0
NUM_TRAIN_SPLITS: 4
NUM_EVAL_SPLITS: 1
ACCURACY_THRESHOLD: 0.1
USE_KFP_SA: False
TFX_IMAGE_URI: gcr.io/aiops-industrialization/chicago-taxi-tips:v01
BEAM_RUNNER: DirectRunner
BEAM_DIRECT_PIPELINE_ARGS: ['--project=aiops-industrialization', '--temp_location=gs://aiops-industrialization-bucket-ravi/chicago-taxi-tips/e2e_tests/temp']
BEAM_DATAFLOW_PIPELINE_ARGS: ['--project=aiops-industrialization', '-
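
The printed values suggest that `config` reads its settings from the environment variables set in the previous cell and derives others, such as `ARTIFACT_STORE_URI`, from them. The actual `src/tfx_pipelines/config.py` is not shown here, so the following is only a sketch of that pattern; the function `load_config` and the defaults it uses are assumptions.

```python
import os


def load_config(env=None):
    """Build a settings dict from environment variables (hypothetical layout)."""
    env = os.environ if env is None else env
    gcs_location = env.get("GCS_LOCATION", "").rstrip("/")
    return {
        "GCS_LOCATION": gcs_location,
        # Derived setting: artifacts live under the base GCS location.
        "ARTIFACT_STORE_URI": f"{gcs_location}/tfx_artifacts",
        # Environment variables are always strings, so cast numeric settings.
        "TRAIN_LIMIT": int(env.get("TRAIN_LIMIT", "0")),
        "TEST_LIMIT": int(env.get("TEST_LIMIT", "0")),
        "ACCURACY_THRESHOLD": float(env.get("ACCURACY_THRESHOLD", "0.0")),
        "BEAM_RUNNER": env.get("BEAM_RUNNER", "DirectRunner"),
    }


settings = load_config({
    "GCS_LOCATION": "gs://my-bucket/chicago-taxi-tips/e2e_tests",
    "TRAIN_LIMIT": "1000",
    "TEST_LIMIT": "100",
    "ACCURACY_THRESHOLD": "0.1",
})
for key, value in settings.items():
    print(f"{key}: {value}")
```

Because such a module would read the environment at import time, the notebook calls `importlib.reload(config)` after changing `os.environ` so the new values take effect.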

### Run unit tests

In [None]:
!py.test ./src/tests/datasource_utils_tests.py -s

In [None]:
!py.test src/tests/model_tests.py -s

### Run e2e pipeline test

In [None]:
!py.test src/tests/pipeline_deployment_tests.py::test_e2e_pipeline -s

platform linux -- Python 3.7.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/jupyter/mlops-with-vertex-ai
plugins: anyio-3.3.0
collecting ... 2021-08-31 13:58:38.662547: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
collected 1 item                                                               

src/tests/pipeline_deployment_tests.py upload_model: 0
Pipeline e2e test artifacts stored in: gs://aiops-industrialization-bucket-ravi/chicago-taxi-tips/e2e_tests
ML metadata store is ready.
F

______________________________ test_e2e_pipeline _______________________________

    def test_e2e_pipeline():
    
        project = os.getenv("PROJECT")
        region = os.getenv("REGION")
        model_display_name = os.getenv("MODEL_DISPLAY_NAME")
        dataset_display_name = os.getenv("DATASET_DISPLAY_NAME")
        gcs_location = os.getenv("GCS_LOCATION")
        model_registry = os.getenv("MODEL_REGISTRY_URI")
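
The test reads its settings with `os.getenv`, which returns `None` for any variable that is not set, so a missing setting may only surface later as a less obvious failure. A small pre-flight check, sketched below, can report missing variables up front. The helper `missing_env_vars` is hypothetical and is not part of the test module; the variable names are the ones the test reads.

```python
import os

# Variables read by test_e2e_pipeline in the traceback above.
REQUIRED_VARS = [
    "PROJECT",
    "REGION",
    "MODEL_DISPLAY_NAME",
    "DATASET_DISPLAY_NAME",
    "GCS_LOCATION",
    "MODEL_REGISTRY_URI",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"PROJECT": "my-project", "REGION": "us-central1", "GCS_LOCATION": ""}
missing = missing_env_vars(REQUIRED_VARS, example_env)
print("Missing:", missing)
```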
   

## 2. Run the training pipeline using Vertex Pipelines



### Set the pipeline configurations for the Vertex AI run

In [110]:
os.environ["DATASET_DISPLAY_NAME"] = DATASET_DISPLAY_NAME
os.environ["MODEL_DISPLAY_NAME"] = MODEL_DISPLAY_NAME
os.environ["PIPELINE_NAME"] = PIPELINE_NAME
os.environ["PROJECT"] = PROJECT
os.environ["REGION"] = REGION
os.environ["GCS_LOCATION"] = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}"
os.environ["TRAIN_LIMIT"] = "85000"
os.environ["TEST_LIMIT"] = "15000"
os.environ["BEAM_RUNNER"] = "DataflowRunner"
os.environ["TRAINING_RUNNER"] = "vertex"
os.environ["TFX_IMAGE_URI"] = f"gcr.io/{PROJECT}/{DATASET_DISPLAY_NAME}:{VERSION}"

In [111]:
from src.tfx_pipelines import config
import importlib
importlib.reload(config)

for key, value in config.__dict__.items():
    if key.isupper():
        print(f'{key}: {value}')

PROJECT: aiops-industrialization
REGION: us-central1
GCS_LOCATION: gs://aiops-industrialization-bucket-ravi/chicago-taxi-tips
ARTIFACT_STORE_URI: gs://aiops-industrialization-bucket-ravi/chicago-taxi-tips/tfx_artifacts
MODEL_REGISTRY_URI: gs://aiops-industrialization-bucket-ravi/chicago-taxi-tips/model_registry
DATASET_DISPLAY_NAME: chicago-taxi-tips
MODEL_DISPLAY_NAME: chicago-taxi-tips-classifier-v01
PIPELINE_NAME: chicago-taxi-tips-classifier-v01-train-pipeline
ML_USE_COLUMN: ml_use
EXCLUDE_COLUMNS: trip_start_timestamp
TRAIN_LIMIT: 85000
TEST_LIMIT: 15000
SERVE_LIMIT: 0
NUM_TRAIN_SPLITS: 4
NUM_EVAL_SPLITS: 1
ACCURACY_THRESHOLD: 0.1
USE_KFP_SA: False
TFX_IMAGE_URI: gcr.io/aiops-industrialization/chicago-taxi-tips:v01
BEAM_RUNNER: DataflowRunner
BEAM_DIRECT_PIPELINE_ARGS: ['--project=aiops-industrialization', '--temp_location=gs://aiops-industrialization-bucket-ravi/chicago-taxi-tips/temp']
BEAM_DATAFLOW_PIPELINE_ARGS: ['--project=aiops-industrialization', '--temp_location=gs://aiops

### Build the ML container image

This is the `TFX` runtime environment for the training pipeline steps.

In [112]:
!echo $TFX_IMAGE_URI

gcr.io/aiops-industrialization/chicago-taxi-tips:v01


In [113]:
!gcloud builds submit --tag $TFX_IMAGE_URI . --timeout=15m --machine-type=e2-highcpu-8

Creating temporary tarball archive of 61 file(s) totalling 1.4 MiB before compression.
Some files were not included in the source upload.

Check the gcloud log [/home/jupyter/.config/gcloud/logs/2021.08.31/14.04.16.479943.log] to see which files and the contents of the
default gcloudignore file used (see `$ gcloud topic gcloudignore` to learn
more).

Uploading tarball of [.] to [gs://aiops-industrialization_cloudbuild/source/1630418656.561653-a4e8d2ea597a4b0fbad1c8317dca12e3.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/aiops-industrialization/locations/global/builds/71160d87-0081-4bb0-80a0-ed8cb3462259].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/71160d87-0081-4bb0-80a0-ed8cb3462259?project=175728527123].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "71160d87-0081-4bb0-80a0-ed8cb3462259"

FETCHSOURCE
Fetching storage object: gs://aiops-industrialization_cloudbuild/source/1630418656.561653-

In [114]:
from tfx.components import (
    StatisticsGen,
    ExampleValidator,
    Transform,
    Trainer,
    Evaluator,
    Pusher,
)

### Compile pipeline

In [140]:
from src.tfx_pipelines import runner

pipeline_definition_file = f'{config.PIPELINE_NAME}.json'
#print(pipeline_definition_file)
pipeline_definition = runner.compile_training_pipeline(pipeline_definition_file)

NameError: name 'statistics_gen' is not defined

In [143]:
PIPELINES_STORE = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}/compiled_pipelines/"
print(PIPELINES_STORE)
#!gsutil cp {pipeline_definition_file} {PIPELINES_STORE}

gs://aiops-industrialization-bucket-ravi/chicago-taxi-tips/compiled_pipelines/


### Submit run to Vertex Pipelines

In [144]:
from kfp.v2.google.client import AIPlatformClient

pipeline_client = AIPlatformClient(
    project_id=PROJECT, region=REGION)
                 
job = pipeline_client.create_run_from_job_spec(
    job_spec_path=pipeline_definition_file,
    parameter_values={
        'learning_rate': 0.003,
        'batch_size': 512,
        'hidden_units': '128,128',
        'num_epochs': 30,
    }
)

FileNotFoundError: [Errno 2] No such file or directory: 'chicago-taxi-tips-classifier-v01-train-pipeline.json'
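
The `FileNotFoundError` most likely follows from the failed compile step: the compile cell raised a `NameError`, so the JSON file was never written. One way to surface this earlier is to check for the compiled file before submitting. The sketch below uses a hypothetical helper, `require_compiled_pipeline`, and a temporary file that stands in for the compiled pipeline definition.

```python
import os
import tempfile


def require_compiled_pipeline(path):
    """Return `path` if the compiled pipeline file exists, else raise a descriptive error."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Compiled pipeline not found at {path!r}; "
            "rerun the compile step and confirm it finished without errors."
        )
    return path


# Demonstration with a temporary file standing in for the compiled JSON.
with tempfile.TemporaryDirectory() as workdir:
    compiled = os.path.join(workdir, "example-pipeline.json")
    with open(compiled, "w") as handle:
        handle.write("{}")
    checked_path = require_compiled_pipeline(compiled)
    print("Ready to submit:", os.path.basename(checked_path))
```

In the notebook, the check would wrap `pipeline_definition_file` before it is passed as `job_spec_path`.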

### Extract pipeline run metadata

In [None]:
from google.cloud import aiplatform as vertex_ai

pipeline_df = vertex_ai.get_pipeline_df(PIPELINE_NAME)
pipeline_df = pipeline_df[pipeline_df.pipeline_name == PIPELINE_NAME]
pipeline_df.T

## 3. Execute the pipeline deployment CI/CD steps in Cloud Build

The CI/CD routine is defined in the [pipeline-deployment.yaml](pipeline-deployment.yaml) file, and consists of the following steps:
1. Clone the repository to the build environment.
2. Run unit tests.
3. Run a local e2e test of the pipeline.
4. Build the ML container image for pipeline steps.
5. Compile the pipeline.
6. Upload the pipeline to Cloud Storage.

### Build the CI/CD container image for Cloud Build

This image is the runtime environment in which the pipeline's test and deployment steps run.

In [None]:
!echo $CICD_IMAGE_URI

In [None]:
!gcloud builds submit --tag $CICD_IMAGE_URI build/. --timeout=15m --machine-type=e2-highcpu-8

### Run CI/CD for pipeline deployment using Cloud Build

In [None]:
REPO_URL = "https://github.com/ksalama/ucaip-labs.git" # Change to your GitHub repo.
BRANCH = "main"

GCS_LOCATION = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}/"
TEST_GCS_LOCATION = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}/e2e_tests"
CI_TRAIN_LIMIT = 1000
CI_TEST_LIMIT = 100
CI_UPLOAD_MODEL = 0
CI_ACCURACY_THRESHOLD = 0.1
BEAM_RUNNER = "DataflowRunner"
TRAINING_RUNNER = "vertex"
VERSION = 'tfx-0-30'
PIPELINE_NAME = f'{MODEL_DISPLAY_NAME}-train-pipeline'
PIPELINES_STORE = os.path.join(GCS_LOCATION, "compiled_pipelines")

TFX_IMAGE_URI = f"gcr.io/{PROJECT}/{DATASET_DISPLAY_NAME}:{VERSION}"

SUBSTITUTIONS=f"""\
_REPO_URL='{REPO_URL}',\
_BRANCH={BRANCH},\
_CICD_IMAGE_URI={CICD_IMAGE_URI},\
_PROJECT={PROJECT},\
_REGION={REGION},\
_GCS_LOCATION={GCS_LOCATION},\
_TEST_GCS_LOCATION={TEST_GCS_LOCATION},\
_BQ_LOCATION={BQ_LOCATION},\
_BQ_DATASET_NAME={BQ_DATASET_NAME},\
_BQ_TABLE_NAME={BQ_TABLE_NAME},\
_DATASET_DISPLAY_NAME={DATASET_DISPLAY_NAME},\
_MODEL_DISPLAY_NAME={MODEL_DISPLAY_NAME},\
_CI_TRAIN_LIMIT={CI_TRAIN_LIMIT},\
_CI_TEST_LIMIT={CI_TEST_LIMIT},\
_CI_UPLOAD_MODEL={CI_UPLOAD_MODEL},\
_CI_ACCURACY_THRESHOLD={CI_ACCURACY_THRESHOLD},\
_BEAM_RUNNER={BEAM_RUNNER},\
_TRAINING_RUNNER={TRAINING_RUNNER},\
_TFX_IMAGE_URI={TFX_IMAGE_URI},\
_PIPELINE_NAME={PIPELINE_NAME},\
_PIPELINES_STORE={PIPELINES_STORE}\
"""

!echo $SUBSTITUTIONS

In [None]:
!gcloud builds submit --no-source --timeout=60m --config build/pipeline-deployment.yaml --substitutions {SUBSTITUTIONS} --machine-type=e2-highcpu-8
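
The `SUBSTITUTIONS` string above is a comma-separated list of `_KEY=VALUE` pairs. As an alternative to the hand-written f-string, the same string could be assembled from a dictionary, which makes it harder to drop a comma or misspell a key. This is only a sketch; the helper `build_substitutions` is an assumption and is not part of the repository.

```python
def build_substitutions(values):
    """Join a mapping into the _KEY=VALUE,_KEY=VALUE form passed to --substitutions."""
    pairs = []
    for key, value in values.items():
        text = str(value)
        # A comma inside a value would be read as a pair separator.
        if "," in text:
            raise ValueError(f"Value for {key} contains a comma: {text!r}")
        pairs.append(f"_{key}={text}")
    return ",".join(pairs)


substitutions = build_substitutions({
    "BRANCH": "main",
    "CI_TRAIN_LIMIT": 1000,
    "BEAM_RUNNER": "DataflowRunner",
})
print(substitutions)
```

Dictionaries preserve insertion order in Python 3.7 and later, so the pairs appear in the order they are listed.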