# Automating Image Classification Model Training and Deployment with Vertex Pipeline (AutoML)

## Step 1: Environment Setup

### 1.1 Install Dependencies

In [1]:
import os

# Google Cloud Notebook
if os.path.exists("/opt/deeplearning/metadata/env_version"):
    USER_FLAG = "--user"
else:
    USER_FLAG = ""

In [None]:
! pip3 install --upgrade google-cloud-aiplatform $USER_FLAG
! pip3 install -U google-cloud-storage $USER_FLAG
! pip3 install --upgrade kfp google-cloud-pipeline-components $USER_FLAG

Restart the kernel so that the newly installed packages take effect.

In [3]:
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

### 1.2 Set Environment Variables

In [None]:
PROJECT_ID = "" 
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

Replace <PROJECT_ID> with the project ID printed by the previous step.

In [34]:
PROJECT_ID = "<PROJECT_ID>"

In [None]:
! gcloud config set project $PROJECT_ID

In [4]:
REGION = "us-central1" 

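### 1.3 Create Resources

Create a GCS bucket. The bucket is used to:  
- Store the source files for the Vertex AI Dataset.  
- Store the output data produced by each step of the Vertex Pipeline (a pipeline usually consists of several steps, and in many scenarios a step consumes the output of one or more earlier steps).  
- Store the trained model. When a Vertex training job finishes, it writes the model to this bucket, and the Vertex Predict Endpoint later downloads the model from this bucket when deploying it.

Replace <your_name> with an abbreviation of your own name.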

In [5]:
BUCKET_NAME = "gs://vertex-ai-pipeline-automl-<your_name>"

In [None]:
! gsutil mb -l $REGION $BUCKET_NAME
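
Bucket creation fails if the `<your_name>` placeholder is left in place or the name breaks Cloud Storage naming rules. The following is a minimal sketch of a pre-flight check. `is_valid_bucket_uri` is a hypothetical helper, not part of any Google SDK, and it only covers the basic rules for names without dots (3 to 63 characters; lowercase letters, digits, dashes, and underscores; starting and ending with a letter or digit). It does not check reserved prefixes or global uniqueness.

```python
import re

# Basic Cloud Storage bucket-name rules for names without dots:
# 3-63 characters, lowercase letters, digits, dashes and underscores,
# starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")


def is_valid_bucket_uri(uri: str) -> bool:
    """Return True if `uri` looks like gs://<valid-bucket-name>."""
    if not uri.startswith("gs://"):
        return False
    name = uri[len("gs://"):]
    return bool(_BUCKET_RE.match(name))


# The unreplaced placeholder contains '<' and '>', so it is rejected.
print(is_valid_bucket_uri("gs://vertex-ai-pipeline-automl-<your_name>"))
print(is_valid_bucket_uri("gs://vertex-ai-pipeline-automl-abc"))
```

Running the check on `BUCKET_NAME` before `gsutil mb` gives a clearer error than the one returned by the service.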

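### 1.4 Configure Permissions

Vertex AI automatically creates and assigns a service account for the Workbench instance. Granting that service account permissions on the bucket lets you upload and download data from the Workbench instance.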

In [None]:
SERVICE_ACCOUNT = "" 
if (
    SERVICE_ACCOUNT == ""
    or SERVICE_ACCOUNT is None
    or SERVICE_ACCOUNT == "[your-service-account]"
):
    # Get the active service account from gcloud
    shell_output = !gcloud auth list 2>/dev/null
    SERVICE_ACCOUNT = shell_output[2].replace("*", "").strip()
    print("Service Account:", SERVICE_ACCOUNT)
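
Reading a fixed line index from the `gcloud auth list` output is fragile, because the position of the active account depends on how many header lines and accounts are printed. A minimal sketch of a more tolerant parser follows. `active_account` is a hypothetical helper, and the sample lines are an assumed shape of the output (the active account marked with `*`), not a captured log.

```python
from typing import List, Optional


def active_account(lines: List[str]) -> Optional[str]:
    """Return the account on the line marked with '*', or None if absent."""
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("*"):
            return stripped.lstrip("*").strip()
    return None


# Illustrative sample only; real output depends on the gcloud version.
sample = [
    "   Credentialed Accounts",
    "ACTIVE  ACCOUNT",
    "*       123456789-compute@developer.gserviceaccount.com",
]
print(active_account(sample))
```

In the notebook, the list captured by `shell_output = !gcloud auth list 2>/dev/null` could be passed to this function directly.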

Replace <SERVICE_ACCOUNT> with the service account printed by the previous step.

In [8]:
SERVICE_ACCOUNT = "<SERVICE_ACCOUNT>"

In [9]:
! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_NAME
! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_NAME

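### 1.5 Upload the Dataset Description File

The description file is a CSV file that lists, for each image, its location on GCS and its class. Vertex AI uses this file to generate the dataset automatically.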

In [None]:
! gsutil cp /home/jupyter/vertex-ai-lab/dataset.csv $BUCKET_NAME
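
As a rough illustration of what such a description file contains, the sketch below renders (image URI, label) pairs as CSV rows. `build_dataset_csv` is a hypothetical helper, and the bucket and object names are made up. It assumes the simplest layout, one `gs://` URI followed by one label per row.

```python
import csv
import io
from typing import Iterable, Tuple


def build_dataset_csv(rows: Iterable[Tuple[str, str]]) -> str:
    """Render (gcs_uri, label) pairs as CSV text, one image per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for uri, label in rows:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri}")
        writer.writerow([uri, label])
    return buf.getvalue()


# Hypothetical bucket and object names, for illustration only.
print(build_dataset_csv([
    ("gs://my-bucket/flowers/daisy/001.jpg", "daisy"),
    ("gs://my-bucket/flowers/tulips/002.jpg", "tulips"),
]), end="")
```

Using the `csv` module rather than string concatenation keeps the file valid if a path or label ever contains a comma or quote.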

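## Step 2: Write the Vertex Pipeline

Vertex AI Pipeline provides a full-featured SDK that lets developers quickly build machine learning workflows and orchestrate their tasks in a specific order and logic. Vertex AI Pipeline is also deeply integrated with other Vertex AI features and other GCP services, so workflow tasks such as creating a Vertex AI training job, deploying a Vertex AI model, or transferring data to BigQuery can be created quickly and easily. Its interface is compatible with the open-source Kubeflow Pipelines and extends it, so developers familiar with Kubeflow Pipelines can pick up Vertex AI Pipeline quickly.

In this lab, we use the Vertex AI Pipeline SDK to define an end-to-end machine learning workflow. The workflow consists of four steps:  
- Create a Vertex AI dataset. This task reads the dataset source file from the GCS bucket and creates the dataset.
- Create an AutoML training job. The AutoML job automatically loads the Vertex AI dataset, analyzes it, and selects the best model to train.
- Create a Vertex AI Endpoint.
- Deploy the Vertex AI Model to the Vertex AI Endpoint.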
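
In Kubeflow-style pipelines, the execution order is derived from the data dependencies between steps rather than from the order in which they are written. The sketch below models the four steps above as a dependency graph using only the standard library. The step names are illustrative labels, not SDK identifiers, and the sketch assumes that creating the endpoint does not depend on the dataset or the training job.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each key is a step; its value is the set of steps it depends on.
# Step names are illustrative labels, not identifiers from the SDK.
dependencies = {
    "create_dataset": set(),
    "train_automl_model": {"create_dataset"},
    "create_endpoint": set(),
    "deploy_model": {"train_automl_model", "create_endpoint"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
    waves.append(ready)
    sorter.done(*ready)

for i, wave in enumerate(waves, start=1):
    print(f"wave {i}: {wave}")
```

Under these assumptions, dataset creation and endpoint creation can start together, training follows the dataset, and deployment waits for both the trained model and the endpoint.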


- Import dependencies

In [30]:
import google.cloud.aiplatform as aip
import kfp
from google_cloud_pipeline_components import aiplatform as gcc_aip
import os

- Set environment variables

In [12]:
API_ENDPOINT = "{}-aiplatform.googleapis.com".format(REGION)
PIPELINE_ROOT = "{}/pipeline_root/flowers".format(BUCKET_NAME)
DATA_SOURCE = BUCKET_NAME + "/dataset.csv"
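
Building these paths by string formatting works as long as `BUCKET_NAME` has no trailing slash. The following is a small sketch of a join helper that tolerates stray slashes. `gcs_join` is a hypothetical name, and the bucket is made up.

```python
def gcs_join(base: str, *parts: str) -> str:
    """Join path segments onto a gs:// URI with exactly one '/' between them."""
    if not base.startswith("gs://"):
        raise ValueError(f"not a Cloud Storage URI: {base}")
    # Drop empty segments and surrounding slashes before joining.
    segments = [base.rstrip("/")] + [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(segments)


# Hypothetical bucket name, for illustration only.
print(gcs_join("gs://my-bucket/", "pipeline_root", "flowers"))
print(gcs_join("gs://my-bucket", "/dataset.csv"))
```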

- Initialize Vertex AI

In [24]:
aip.init(project=PROJECT_ID, staging_bucket=BUCKET_NAME)

- Define the pipeline

In [14]:
@kfp.dsl.pipeline(name="automl-training-pipeline")
def pipeline(project: str = PROJECT_ID, region: str = REGION, data_source: str = DATA_SOURCE):
    ds_op = gcc_aip.ImageDatasetCreateOp(
        project=project,
        display_name="train-automl-flowers-dataset",
        gcs_source=data_source,
        import_schema_uri=aip.schema.dataset.ioformat.image.single_label_classification,
    )

    training_job_run_op = gcc_aip.AutoMLImageTrainingJobRunOp(
        project=project,
        display_name="train-automl-flowers",
        prediction_type="classification",
        model_type="CLOUD",
        dataset=ds_op.outputs["dataset"],
        model_display_name="train-automl-flowers",
        training_fraction_split=0.6,
        validation_fraction_split=0.2,
        test_fraction_split=0.2,
        budget_milli_node_hours=8000,
    )

    endpoint_op = gcc_aip.EndpointCreateOp(
        project=project,
        location=region,
        display_name="train-automl-flowers",
    )

    gcc_aip.ModelDeployOp(
        model=training_job_run_op.outputs["model"],
        endpoint=endpoint_op.outputs["endpoint"],
        automatic_resources_min_replica_count=1,
        automatic_resources_max_replica_count=1,
    )

## Step 3: Compile the Vertex Pipeline

Compile the Vertex AI Pipeline. Compilation converts the pipeline defined in the previous step into a JSON description file.

In [None]:
from kfp.v2 import compiler  

compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path="automl training pipeline.json".replace(" ", "-"),
)
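
The compiled file is plain JSON, so it can be inspected with the standard library. The sketch below lists each top-level task and the tasks it depends on. `task_dependencies` is a hypothetical helper, and the key names (`pipelineSpec`, `root`, `dag`, `tasks`, `dependentTasks`) are an assumption about the compiled layout that may differ between KFP versions. The example input is hand-written, not real compiler output.

```python
import json
from typing import Dict, List


def task_dependencies(spec: dict) -> Dict[str, List[str]]:
    """Map each root-level task name to the tasks it depends on."""
    # Assumption: some compiler versions wrap the spec under "pipelineSpec".
    pipeline_spec = spec.get("pipelineSpec", spec)
    tasks = pipeline_spec.get("root", {}).get("dag", {}).get("tasks", {})
    return {name: sorted(task.get("dependentTasks", [])) for name, task in tasks.items()}


# Tiny hand-written stand-in for a compiled file; task names are illustrative.
example = json.loads("""
{"pipelineSpec": {"root": {"dag": {"tasks": {
  "dataset-create": {},
  "training-job": {"dependentTasks": ["dataset-create"]}
}}}}}
""")
print(task_dependencies(example))
```

To try it on the real file, load `automl-training-pipeline.json` with `json.load` and pass the result to the function. An empty result suggests the layout assumption does not hold for the installed version.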

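## Step 4: Run the Pipeline

There are two ways to create and run a pipeline: the console or the command line. In either case, creating the pipeline requires the JSON file produced by the compilation step.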

In [None]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

DISPLAY_NAME = "automl-training-pipeline_" + TIMESTAMP

job = aip.PipelineJob(
    display_name=DISPLAY_NAME,
    template_path="/home/jupyter/vertex-ai-lab/automl training pipeline.json".replace(" ", "-"),
    pipeline_root=PIPELINE_ROOT,
    enable_caching=False,
)

job.run()

After the pipeline is created, you can check its progress in the Google Cloud console.

## Step 5: Predict Image Classes

- Define environment variables

Replace <ENDPOINT_ID> with the actual Vertex AI Endpoint ID.

In [None]:
IMG_WIDTH = 128
COLUMNS = ['dandelion', 'daisy', 'tulips', 'sunflowers', 'roses']
ENDPOINT_ID = "<ENDPOINT_ID>"

aip_client = aip.gapic.PredictionServiceClient(client_options={
    'api_endpoint': 'us-central1-aiplatform.googleapis.com'
})

aip_endpoint_name = f'projects/{PROJECT_ID}/locations/us-central1/endpoints/{ENDPOINT_ID}'

- Define the image preprocessing function and the Vertex AI prediction functions

In [None]:
import tensorflow as tf
import logging
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

def preprocess_image(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [128, 128])
    image /= 255.0 
    return image.numpy().tolist()

def get_prediction_v1(instance):
    logging.info('Sending prediction request to AI Platform ...')
    try:
        pb_instance = json_format.ParseDict(instance, Value())
        response = aip_client.predict(endpoint=aip_endpoint_name,
                                      instances=[pb_instance])
        print(response.predictions[0])
    except Exception as err:
        logging.error(f'Prediction request failed: {type(err)}: {err}')
        return None

def get_prediction_v2(instance):
    logging.info('Sending prediction request to AI Platform ...')
    try:
        pb_instance = json_format.ParseDict(instance, Value())
        response = aip_client.predict(endpoint=aip_endpoint_name,
                                      instances=[pb_instance])
        max_value = max(response.predictions[0])
        max_index = response.predictions[0].index(max_value)
        print(COLUMNS[max_index])
    except Exception as err:
        logging.error(f'Prediction request failed: {type(err)}: {err}')
        return None

- Load an image. Try different images from the ~/img folder and observe the predictions.

In [None]:
instance = preprocess_image("/home/jupyter/vertex-ai-lab/img/tulips.jpeg")

- View the prediction. For multi-class classification, the Vertex AI Endpoint returns a list of class probabilities.

In [None]:
get_prediction_v1(instance)

- A raw probability list is hard to read. In real applications, the prediction needs further processing so that the result is easier to understand.

In [None]:
get_prediction_v2(instance)
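
Beyond printing only the single most likely class, it is often useful to show the few best candidates with their probabilities. The following is a minimal sketch of that post-processing step. `top_k` is a hypothetical helper, the probabilities are made up, and it assumes the returned list is ordered the same way as the label list.

```python
from typing import List, Sequence, Tuple


def top_k(probs: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up probabilities, for illustration only.
labels = ["dandelion", "daisy", "tulips", "sunflowers", "roses"]
probs = [0.02, 0.05, 0.80, 0.03, 0.10]
for label, p in top_k(probs, labels, k=2):
    print(f"{label}: {p:.0%}")
```

The length check guards against silently mislabeling results when the model and the label list fall out of sync.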

## Step 6: Clean Up

In [59]:
DISPLAY_NAME="train-automl-flowers"

In [None]:
try:
    endpoints = aip.Endpoint.list(
        filter=f"display_name={DISPLAY_NAME}", order_by="create_time"
    )
    endpoint = endpoints[0]
    model_id = endpoint.list_models()[0]._pb.id
    endpoint.undeploy(model_id)
    endpoint.delete()
    print("Deleted endpoint:", endpoint)
except Exception as e:
    print(e)

In [None]:
try:
    models = aip.Model.list(
        filter=f"display_name={DISPLAY_NAME}", order_by="create_time"
    )
    model = models[0]
    model.delete()
    print("Deleted model:", model)
except Exception as e:
    print(e)

In [None]:
job.delete()