In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Getting Started with Vertex AI Turbo Templates

This notebook sets up infrastructure to run production-ready pipelines on Google Cloud. Follow this three-part notebook series to get started in a local Jupyter notebook or in [Vertex AI Workbench](https://cloud.google.com/vertex-ai-notebooks):

1. **[Infrastructure Setup](./02_run_pipelines.ipynb) - this notebook**
1. [Run Pipelines](./02_run_pipelines.ipynb)
1. [Infrastructure Clean Up](./02_run_pipelines.ipynb)


**Prerequisites:**

- [Google Cloud SDK (gcloud)](https://cloud.google.com/sdk/docs/quickstart)
- Make
- [Terraform](https://www.terraform.io)

**For Vertex AI Workbench users**: 
Uncomment and execute the following cell to install Terraform.
Restart the notebook kernel or the Workbench instance to ensure `terraform` is available in the `PATH`.
Then return to this notebook and continue with the next section.

In [1]:
# ! bash ./scripts/install_terraform.sh

./scripts/install_terraform.sh: line 11: terraform: command not found


## Authenticate

Set your project ID and authenticate using your Google Account:

In [11]:
! gcloud config get-value project

uk-gap-proximity-dev


In [12]:
VERTEX_PROJECT_ID = "uk-gap-proximity-dev"
GOOGLE_ACCOUNT = "thierry.dacae@sky.uk"
! gcloud config set project {VERTEX_PROJECT_ID} --quiet
! gcloud config set account {GOOGLE_ACCOUNT} --quiet
! printf 'Y' gcloud auth login

Updated property [core/project].
Updated property [core/account].
Y

## Clone Code

**If you haven't cloned the template, yet:** Uncomment and execute the following cell to clone the code.

In [13]:
# ! git clone -b develop https://github.com/teamdatatonic/vertex-pipelines-end-to-end-samples

Cloning into 'vertex-pipelines-end-to-end-samples'...
remote: Enumerating objects: 4041, done.[K
remote: Counting objects: 100% (1071/1071), done.[K
remote: Compressing objects: 100% (462/462), done.[K
remote: Total 4041 (delta 699), reused 678 (delta 597), pack-reused 2970[K
Receiving objects: 100% (4041/4041), 3.42 MiB | 24.29 MiB/s, done.
Resolving deltas: 100% (2323/2323), done.


Switch to the folder in which the template code is cloned to:

In [14]:
%cd vertex-pipelines-end-to-end-samples/

/home/jupyter/010 ASN Model v2/MLOPS Hackathlon/mlops-hackathon/docs/notebooks/vertex-pipelines-end-to-end-samples


In [15]:
%pwd

'/home/jupyter/010 ASN Model v2/MLOPS Hackathlon/mlops-hackathon/docs/notebooks/vertex-pipelines-end-to-end-samples'

Configure your code by setting the variables:
- `VERTEX_PROJECT_ID` - as set above
- `VERTEX_LOCATION` - location of the cloud project
- `BQ_LOCATION` - location of the BigQuery dataset, for this notebook example you can leave this as-is
- `RESOURCE_SUFFIX` - suffix (e.g. `<your name>`) to facilitate running concurrent pipelines in the same Google Cloud project. Change if working in a team to avoid overwriting resources during development 

In [16]:
%%writefile env.sh
#!/bin/bash
VERTEX_PROJECT_ID=VERTEX_PROJECT_ID
VERTEX_LOCATION=europe-west2
BQ_LOCATION=US
RESOURCE_SUFFIX=default

Writing env.sh


For most use cases you won't need to change the following variables unless you've modified the Terraform code.

In [17]:
%%writefile -a env.sh
# Optional
VERTEX_CMEK_IDENTIFIER=
VERTEX_NETWORK=
# Leave as-is
VERTEX_SA_EMAIL=vertex-pipelines@${VERTEX_PROJECT_ID}.iam.gserviceaccount.com
VERTEX_PIPELINE_ROOT=gs://${VERTEX_PROJECT_ID}-pl-root
CONTAINER_IMAGE_REGISTRY=${VERTEX_LOCATION}-docker.pkg.dev/${VERTEX_PROJECT_ID}/vertex-images

Appending to env.sh


## Deploy Infrastructure


The cloud infrastructure is managed using Terraform and is defined in the [`terraform`](terraform) directory. There are three Terraform modules defined in [`terraform/modules`](terraform/modules):

- `cloudfunction` - deploys a (Pub/Sub-triggered) Cloud Function from local source code
- `scheduled_pipelines` - deploys Cloud Scheduler jobs that will trigger Vertex Pipeline runs (via the above Cloud Function)
- `vertex_deployment` - deploys Cloud infrastructure required for running Vertex Pipelines, including enabling APIs, creating buckets, Artifact Registry repos, service accounts, and IAM permissions.

**Enable APIs**:

In [21]:
!gcloud auth login --quiet

Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=32555940559.apps.googleusercontent.com&redirect_uri=https%3A%2F%2Fsdk.cloud.google.com%2Fauthcode.html&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.login+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&state=osIw7TKTmwEK7SRlclvzfAVJwOfWKn&prompt=consent&access_type=offline&code_challenge=_cr9soPg04PTp9Ot83zBz6rBEh_EMLAgDS0ReH-rRas&code_challenge_method=S256

Enter authorization code: ^C


Command killed by keyboard interrupt



In [18]:
! gcloud services enable cloudresourcemanager.googleapis.com serviceusage.googleapis.com

[1;31mERROR:[0m (gcloud.services.enable) Your current active account [thierry.dacae@sky.uk] does not have any valid credentials
Please run:

  $ gcloud auth login

to obtain new credentials.

For service account, please activate it first:

  $ gcloud auth activate-service-account ACCOUNT


**Create Cloud Storage bucket:**

Store the [Terraform state files](https://developer.hashicorp.com/terraform/language/state/remote) in the bucket `[project-id]-tfstate`:

In [None]:
! source env.sh && gsutil mb -l $VERTEX_LOCATION -p $VERTEX_PROJECT_ID gs://$VERTEX_PROJECT_ID-tfstate

**Deploy:**

In [None]:
! make deploy auto-approve=true

You've successfully deployed a `dev` environment! 🎉 
Continue with [this notebook](./02_run_pipelines.ipynb) to run your first Vertex AI Pipelines in the deployed project.

**Note:** If you'd like to deploy separate cloud environments as shown below, try out `make deploy env=dev` where you can replace `dev` with `test` or `prod`.

**Troubleshooting:** If enabling of APIs or the deployment fails, check whether your Google user account has the appropriate permissions.