# Install necessary Python packages

In [None]:
! pip3 install -U google-cloud-aiplatform

In [None]:
! pip3 install google-cloud-storage

# Parameter settings

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

In [None]:
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

In [None]:
!gcloud config set project $PROJECT_ID

In [None]:
REGION = "europe-west1"  # @param {type: "string"}

Name of the model that we'll deploy. This name is given to both the model and the endpoint created in this notebook.

In [None]:
# TODO: uncomment and fill in
# MODEL_NAME="<MODEL_NAME>"

Models to be served on an endpoint must first be created in [Vertex AI Models](https://console.cloud.google.com/vertex-ai/models). They can be created by a training job or by importing your model artifacts. In the case of a TensorFlow model, the artifacts are the folder containing your variables and .pb file.

In [None]:
# TODO: uncomment and fill in
# ARTIFACT_LOCATION_GCS = "<GCS_PATH>"
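
Before filling in `ARTIFACT_LOCATION_GCS`, it can help to check that the value is a well-formed `gs://` URI pointing at a folder. The following is a minimal sketch, not part of the original notebook: the helper name `split_gcs_uri` and the example path are hypothetical, and the function only parses the string without contacting Cloud Storage.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a URI such as 'gs://bucket/path/to/model' into (bucket, prefix).

    Hypothetical helper for illustration; it does not check that the
    bucket or the folder actually exist.
    """
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"Expected a URI starting with {scheme!r}, got {uri!r}")
    # Everything up to the first "/" is the bucket, the rest is the folder prefix.
    bucket, _, prefix = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"No bucket name found in {uri!r}")
    return bucket, prefix.strip("/")


# Made-up path, for illustration only.
bucket, prefix = split_gcs_uri("gs://my-bucket/models/my_model/")
print("Bucket:", bucket)
print("Prefix:", prefix)
```

The resulting bucket and prefix could then be passed to the `google-cloud-storage` client installed above to confirm that the folder contains the expected files.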

Machine type for serving. GPUs can be added if required. See the [guidelines](https://cloud.google.com/vertex-ai/docs/predictions/configure-compute#gpus) for which GPUs each machine type supports.

In [None]:
SERVING_MACHINE_TYPE="n1-standard-2"
SERVING_GPU, SERVING_NGPU = (None, None)  # example: (aip.gapic.AcceleratorType.NVIDIA_TESLA_K80.name, 2)

Serving from Vertex AI endpoints is done from Docker images that run an HTTP server. The image is selected here.

In [None]:
TF = "2-4"  # 1.15 to 2.4 is supported at the time of writing (26/05/2021) 
if TF[0] == "2":
    if SERVING_GPU:
        SERVING_VERSION = "tf2-gpu.{}".format(TF)
    else:
        SERVING_VERSION = "tf2-cpu.{}".format(TF)
else:
    if SERVING_GPU:
        SERVING_VERSION = "tf-gpu.{}".format(TF)
    else:
        SERVING_VERSION = "tf-cpu.{}".format(TF)

SERVING_IMAGE_URI = "europe-docker.pkg.dev/vertex-ai/prediction/{}:latest".format(SERVING_VERSION)

print("Deployment:", SERVING_IMAGE_URI, SERVING_GPU, SERVING_NGPU)
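
The same selection logic can be wrapped in a small function, which makes it easier to reuse for other TensorFlow versions or registries. This is a sketch under assumptions: the function name `serving_image_uri` and its `registry` parameter are hypothetical, and the URI pattern is simply the one used in the cell above.

```python
def serving_image_uri(
    tf_version: str,
    use_gpu: bool,
    registry: str = "europe-docker.pkg.dev",
) -> str:
    """Build a prebuilt TensorFlow serving image URI.

    Hypothetical helper mirroring the cell above: TF 2.x images use the
    "tf2" prefix, TF 1.x images use "tf", and the device is "cpu" or "gpu".
    """
    framework = "tf2" if tf_version.startswith("2") else "tf"
    device = "gpu" if use_gpu else "cpu"
    return f"{registry}/vertex-ai/prediction/{framework}-{device}.{tf_version}:latest"


# Same inputs as the cell above: TF 2.4 served on CPU.
print(serving_image_uri("2-4", use_gpu=False))
```

Whether a given version and device combination is actually published should still be checked against the list of prebuilt prediction containers.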

In [None]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

In [None]:
from google.cloud import aiplatform as aip
from typing import Optional, Sequence, Dict, Tuple

In [None]:
# API service endpoint
API_ENDPOINT = "{}-aiplatform.googleapis.com".format(REGION)

In [None]:
def upload_model_sample(
    project: str,
    location: str,
    display_name: str,
    serving_container_image_uri: str,
    artifact_uri: Optional[str] = None,
    serving_container_predict_route: Optional[str] = None,
    serving_container_health_route: Optional[str] = None,
    description: Optional[str] = None,
    serving_container_command: Optional[Sequence[str]] = None,
    serving_container_args: Optional[Sequence[str]] = None,
    serving_container_environment_variables: Optional[Dict[str, str]] = None,
    serving_container_ports: Optional[Sequence[int]] = None,
    instance_schema_uri: Optional[str] = None,
    parameters_schema_uri: Optional[str] = None,
    prediction_schema_uri: Optional[str] = None,
    explanation_metadata: Optional[aip.explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[aip.explain.ExplanationParameters] = None,
    sync: bool = True):
    """Function to upload a model to Vertex AI Models.
    
    Args:
        project (str): Required. Project ID.
        location (str): Required. Region you want to upload the model to.
        display_name (str): Required. The display name of the Model. The name can be up to 128
            characters long and can consist of any UTF-8 characters.
        serving_container_image_uri (str): Required. The URI of the Model serving container.
    Information about the remaining parameters can be found here:
    https://github.com/googleapis/python-aiplatform/blob/master/google/cloud/aiplatform/models.py
    """

    aip.init(project=project, location=location)

    model = aip.Model.upload(
        display_name=display_name,
        artifact_uri=artifact_uri,
        serving_container_image_uri=serving_container_image_uri,
        serving_container_predict_route=serving_container_predict_route,
        serving_container_health_route=serving_container_health_route,
        instance_schema_uri=instance_schema_uri,
        parameters_schema_uri=parameters_schema_uri,
        prediction_schema_uri=prediction_schema_uri,
        description=description,
        serving_container_command=serving_container_command,
        serving_container_args=serving_container_args,
        serving_container_environment_variables=serving_container_environment_variables,
        serving_container_ports=serving_container_ports,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

Upload the model. Since we're importing a model from a GCS bucket, we need to specify the artifact URI.

The artifact URI would not be required if we had created the model with custom training. That approach requires a custom container, as explained [here](https://cloud.google.com/vertex-ai/docs/predictions/use-custom-container#aiplatform_upload_model_highlight_container-python), and the parameters starting with `serving_container_*` then need to be set accordingly.

It would also not be required if we had created an AutoML model. As can be seen [here](https://cloud.google.com/vertex-ai/docs/training/automl-api#aiplatform_create_training_pipeline_image_classification_sample-python), `aiplatform.AutoMLImageTrainingJob.run()` returns an `aiplatform.models.Model` directly, so no artifact URI has to be specified.

In [None]:
model = upload_model_sample(
    project=PROJECT_ID,
    location=REGION,
    display_name=f"{MODEL_NAME}_{TIMESTAMP}",
    serving_container_image_uri = SERVING_IMAGE_URI,
    artifact_uri=ARTIFACT_LOCATION_GCS
    )

Endpoints are easy to create and have no models assigned to them initially. We can consider them as placeholders. Once the endpoint is created, its ID, which is needed to request predictions, is fixed.

In [None]:
def create_endpoint_sample(
    project: str,
    display_name: str,
    location: str):
    """Function to create an endpoint on Vertex AI Endpoints.
    
    Args:
        project (str): Required. Project ID.
        location (str): Required. Region to create the endpoint in.
        display_name (str): Required. The display name of the Endpoint. The name can be up to 128
            characters long and can consist of any UTF-8 characters.
    """
    aip.init(project=project, location=location)

    endpoint = aip.Endpoint.create(
        display_name=display_name, project=project, location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint

In [None]:
endpoint = create_endpoint_sample(
    project=PROJECT_ID,
    display_name=f"{MODEL_NAME}_{TIMESTAMP}",
    location=REGION)

### Endpoint ID:

In [None]:
endpoint.name
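
The endpoint ID is the last segment of the endpoint's full resource name, which has the form `projects/{project}/locations/{location}/endpoints/{endpoint_id}`. The sketch below shows how the two relate; the helper name `parse_endpoint_resource_name` and the numeric values in the example are made up for illustration.

```python
import re
from typing import Dict

# Pattern for a full endpoint resource name.
_ENDPOINT_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/endpoints/(?P<endpoint_id>[^/]+)$"
)


def parse_endpoint_resource_name(resource_name: str) -> Dict[str, str]:
    """Split an endpoint resource name into project, location and endpoint_id.

    Hypothetical helper for illustration only.
    """
    match = _ENDPOINT_PATTERN.match(resource_name)
    if match is None:
        raise ValueError(f"Not an endpoint resource name: {resource_name!r}")
    return match.groupdict()


# Made-up resource name; in the notebook this would be endpoint.resource_name.
parts = parse_endpoint_resource_name(
    "projects/123456789/locations/europe-west1/endpoints/987654321"
)
print(parts["endpoint_id"])
```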

### No models have been deployed on the endpoint yet 

In [None]:
endpoint.list_models()

# Deploy model on endpoint

In [None]:
def deploy_model_with_dedicated_resources_sample(
    project: str,
    location: str,
    model_name: str,
    machine_type: str,
    endpoint: Optional[aip.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    explanation_metadata: Optional[aip.explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[aip.explain.ExplanationParameters] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """Function to deploy a model on an endpoint.
    
    Args:
        project (str): Required. Project ID.
        location (str): Required. Region where the endpoint is set up.
        model_name (str): Required. ID of the model to deploy.
        machine_type (str): Required. Machine type used for serving.
        deployed_model_display_name (str): Optional. The display name of the DeployedModel.
            The name can be up to 128 characters long and can consist of any UTF-8 characters.
    Information about the remaining parameters can be found here:
    https://github.com/googleapis/python-aiplatform/blob/master/google/cloud/aiplatform/models.py
    """

    aip.init(project=project, location=location)

    model = aip.Model(model_name=model_name)

    # The explanation_metadata and explanation_parameters should only be
    # provided for a custom trained model and not an AutoML model.
    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model


## Some important parameters for deployment:

min_replica_count: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the prediction load, up to the maximum number of nodes, but will never fall below this number.

max_replica_count: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the prediction load, but will never exceed the maximum. If you omit the max_replica_count parameter, then the maximum number of nodes is set to the value of min_replica_count.

traffic_percentage (int): Optional. Desired traffic to newly deployed model. Defaults to 0 if there are pre-existing deployed models. Defaults to 100 if there are no pre-existing deployed models. Negative values should not be provided. Traffic of previously deployed models at the endpoint will be scaled down to accommodate new deployed model's traffic. Should not be provided if traffic_split is provided.

traffic_split (Dict[str, int]): Optional. A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment. The key for the model being deployed is "0". Should not be provided if traffic_percentage is provided.
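
The rules for `traffic_split` described above can be checked locally before calling `deploy`. The following is a minimal sketch, assuming only what the description states (non-negative percentages that add up to 100, or an empty map); the helper name `validate_traffic_split` and the deployed model ID in the example are hypothetical.

```python
from typing import Dict


def validate_traffic_split(traffic_split: Dict[str, int]) -> None:
    """Raise ValueError if the traffic split breaks the rules described above.

    Hypothetical helper for illustration; it does not contact Vertex AI.
    """
    if not traffic_split:
        # An empty map means the endpoint accepts no traffic for now.
        return
    for deployed_model_id, percentage in traffic_split.items():
        if percentage < 0:
            raise ValueError(
                f"Negative traffic percentage for {deployed_model_id!r}: {percentage}"
            )
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"Traffic percentages must add up to 100, got {total}")


# "0" refers to the model being deployed; the other ID is made up.
validate_traffic_split({"0": 20, "1234567890": 80})
print("Traffic split is valid")
```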


In [None]:
deploy_model_with_dedicated_resources_sample(
    location=REGION,
    project=PROJECT_ID,
    endpoint=endpoint,
    model_name=model.name,
    deployed_model_display_name=model.display_name,
    machine_type=SERVING_MACHINE_TYPE,
    accelerator_type=SERVING_GPU,
    accelerator_count=SERVING_NGPU
)

In [None]:
# List the models that are now deployed on the endpoint
endpoint.list_models()