In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

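Starting September 15, 2024, you can only customize classification, entity extraction, and sentiment analysis models by migrating to Vertex AI Gemini prompts and tuning. Training or updating models for the Vertex AI AutoML text classification, entity extraction, and sentiment analysis objectives is no longer available. You can continue using existing Vertex AI AutoML text objectives until June 15, 2025. For more information about how Gemini offers an enhanced user experience through improved prompting capabilities, see [Introduction to tuning](https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune-gemini-overview).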

# Vertex AI Pipelines: AutoML text classification pipelines using google-cloud-pipeline-components

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/model_evaluation/automl_text_classification_model_evaluation.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/model_evaluation/automl_text_classification_model_evaluation.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/model_evaluation/automl_text_classification_model_evaluation.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
     </a>
  </td>
</table>
<br/><br/><br/>

**Note:** This notebook has been tested in the following environment:

* Python version = 3.9

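## Overview

This notebook demonstrates how to use the Vertex AI classification model evaluation component to evaluate an AutoML text classification model. Model evaluation helps you determine your model's performance based on evaluation metrics and improve the model if necessary.

Learn more about [Vertex AI Model Evaluation](https://cloud.google.com/vertex-ai/docs/evaluation/introduction) and [Classification for text data](https://cloud.google.com/vertex-ai/docs/training-overview#classification_for_text).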

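### Objective

In this tutorial, you learn how to use `Vertex AI Pipelines` and `Google Cloud Pipeline Components` to build and evaluate an `AutoML` text classification model.

This tutorial uses the following Google Cloud ML services and resources:

- Vertex AI `Datasets`
- Vertex AI `Training` (AutoML text classification)
- Vertex AI `Model Registry`
- Vertex AI `Pipelines`
- Vertex AI `Batch Predictions`

The steps performed include:

- Create a Vertex AI `Dataset`.
- Train an AutoML text classification model on the `Dataset` resource.
- Import the trained `AutoML model resource` into the pipeline.
- Run a `Batch Prediction` job.
- Evaluate the AutoML model using the `classification evaluation component`.
- Import the evaluation metrics into the AutoML model resource.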

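### Dataset

The dataset used for this tutorial is the [Happy Moments dataset](https://www.kaggle.com/ritresearch/happydb) from [Kaggle Datasets](https://www.kaggle.com/ritresearch/happydb). The version of the dataset used in this tutorial is stored in a public Cloud Storage bucket.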

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage pricing](https://cloud.google.com/storage/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Installation

Install the packages required to run this notebook.

In [None]:
! pip3 install --upgrade google-cloud-aiplatform \
                         google-cloud-storage \
                         kfp google-cloud-pipeline-components==1.0.25 \
                         ndjson --quiet

### Colab only: Uncomment the following cell to restart the kernel

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

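## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. [Enable the Vertex AI and Dataflow APIs](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,dataflow.googleapis.com).

4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

#### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)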

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

#### Region

You can also change the `REGION` variable used by Vertex AI. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "us-central1"  # @param {type: "string"}

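### Authenticate your Google Cloud account

Depending on your Jupyter environment, you may have to authenticate manually. Follow the relevant instructions below.

**1. Vertex AI Workbench**
* Do nothing, as you are already authenticated.

**2. Local JupyterLab instance, uncomment and run:**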

In [None]:
# ! gcloud auth login

**3. Colab, uncomment and run:**

In [None]:
# from google.colab import auth
# auth.authenticate_user()

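See how to grant Cloud Storage permissions to your service account at https://cloud.google.com/storage/docs/gsutil/commands/iam#ch-examples.

### Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.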

In [None]:
BUCKET_URI = f"gs://your-bucket-name-{PROJECT_ID}-unique"  # @param {type:"string"}

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}

#### Service Account

**If you don't know your service account**, try to get it using the `gcloud` command by executing the second cell below.
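
The Colab branch of the second cell below derives the default Compute Engine service account from the project number, which it reads from the last line of the `gcloud projects describe` output. As a rough sketch of that parsing step, the snippet below looks up the `projectNumber` line by key instead of by position. The sample output lines and the helper name `parse_project_number` are illustrative assumptions, not real output for your project.

```python
def parse_project_number(describe_lines):
    """Return the project number found in `gcloud projects describe` output lines."""
    for line in describe_lines:
        key, _, value = line.partition(":")
        if key.strip() == "projectNumber":
            # Strip surrounding whitespace and quotes from the value.
            return value.strip().strip("'\"")
    raise ValueError("projectNumber not found in the describe output")


# Hypothetical sample output, for illustration only.
sample_output = [
    "createTime: '2022-01-01T00:00:00.000Z'",
    "lifecycleState: ACTIVE",
    "name: my-project",
    "projectId: my-project",
    "projectNumber: '123456789012'",
]

project_number = parse_project_number(sample_output)
default_service_account = f"{project_number}-compute@developer.gserviceaccount.com"
print(default_service_account)
```

Looking the value up by key keeps the parsing working even if the order of the output fields changes.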

In [None]:
SERVICE_ACCOUNT = "[your-service-account]"  # @param {type:"string"}

In [None]:
import sys

IS_COLAB = "google.colab" in sys.modules

if (
    SERVICE_ACCOUNT == ""
    or SERVICE_ACCOUNT is None
    or SERVICE_ACCOUNT == "[your-service-account]"
):
    # Get your service account from gcloud
    if not IS_COLAB:
        shell_output = !gcloud auth list 2>/dev/null
        SERVICE_ACCOUNT = shell_output[2].replace("*", "").strip()

    if IS_COLAB:
        shell_output = ! gcloud projects describe  $PROJECT_ID
        project_number = shell_output[-1].split(":")[1].strip().replace("'", "")
        SERVICE_ACCOUNT = f"{project_number}-compute@developer.gserviceaccount.com"

    print("Service Account:", SERVICE_ACCOUNT)

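#### Set service account access for Vertex AI Pipelines

Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account.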

In [None]:
! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI

! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI

### Import libraries

In [None]:
import json

import kfp
import matplotlib.pyplot as plt
import ndjson
from google.cloud import aiplatform, aiplatform_v1, storage
from kfp.v2 import compiler  # noqa: F811

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

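## Train and deploy the AutoML text classification model

In this notebook, you perform all the steps from dataset creation to model deployment and evaluation using Vertex AI Pipelines.

As a first step, you build the training and deployment pipeline. The pipeline includes the following tasks:
1. Create a Vertex AI text dataset.
2. Train an AutoML text classification model.
3. Create a Vertex AI endpoint.
4. Deploy the AutoML model to the Vertex AI endpoint.

The pipeline uses pre-built components from the `Google Cloud Pipeline Components` package for each task.

Learn more about [Google Cloud Pipeline Components](https://cloud.google.com/vertex-ai/docs/pipelines/components-introduction).

Set the required training and deployment pipeline parameters.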

In [None]:
# Specify the GCS path for the text dataset
IMPORT_FILE = "gs://cloud-ml-data/NL-classification/happiness.csv"

# provide dataset display name
DATASET_DISPLAY_NAME = "happydb-dataset-unique"

# provide training job display name
TRAINING_JOB_DISPLAY_NAME = "happydb-automl-job-unique"

# provide model display name
MODEL_DISPLAY_NAME = "happydb-automl-model-unique"

# provide endpoint display name
ENDPOINT_DISPLAY_NAME = "happydb-classification-endpoint-unique"

# provide pipeline job display name
TRAINING_PIPELINE_DISPLAY_NAME = "happydb-training-pipeline-unique"

# provide Cloud Storage root folder path for saving the artifacts
PIPELINE_ROOT = f"{BUCKET_URI}/pipeline_root/happydb"

# provide path to store the compiled pipeline package
TRAINING_PIPELINE_PATH = "automl_text_classification_pipeline.json"

Define the Vertex AI pipeline.

Learn how to build a [Vertex AI pipeline](https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline).

In [None]:
@kfp.dsl.pipeline(name=TRAINING_PIPELINE_DISPLAY_NAME)
def pipeline(
    import_file: str,
    dataset_display_name: str,
    training_job_display_name: str,
    model_display_name: str,
    endpoint_display_name: str,
    project: str = PROJECT_ID,
    region: str = REGION,
    training_split: float = 0.4,
    validation_split: float = 0.3,
    test_split: float = 0.3,
):
    from google_cloud_pipeline_components import aiplatform as gcc_aip
    from google_cloud_pipeline_components.v1.endpoint import (EndpointCreateOp,
                                                              ModelDeployOp)

    # component to create the dataset
    dataset_create_task = gcc_aip.TextDatasetCreateOp(
        display_name=dataset_display_name,
        gcs_source=import_file,
        import_schema_uri=aiplatform.schema.dataset.ioformat.text.multi_label_classification,
        project=project,
    )
    # component to run AutoML training job
    training_run_task = gcc_aip.AutoMLTextTrainingJobRunOp(
        dataset=dataset_create_task.outputs["dataset"],
        display_name=training_job_display_name,
        prediction_type="classification",
        multi_label=True,
        training_fraction_split=training_split,
        validation_fraction_split=validation_split,
        test_fraction_split=test_split,
        model_display_name=model_display_name,
        project=project,
    )
    # component to create an endpoint
    endpoint_op = EndpointCreateOp(
        project=project,
        location=region,
        display_name=endpoint_display_name,
    )
    # component to deploy the model to the endpoint
    _ = ModelDeployOp(
        model=training_run_task.outputs["model"],
        endpoint=endpoint_op.outputs["endpoint"],
        automatic_resources_min_replica_count=1,
        automatic_resources_max_replica_count=1,
    )

### Compile the pipeline

Next, compile the pipeline into a JSON package.

In [None]:
compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path=TRAINING_PIPELINE_PATH,
)

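### Run the training and deployment pipeline

Now, create a Vertex AI pipeline job to run the pipeline. Note that the pipeline definition sets the training, validation, and test split fractions to 0.4, 0.3, and 0.3 by default. Change them as needed.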
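
If you change the split fractions, it can help to check them before submitting the job. The sketch below assumes that the three fractions should be non-negative and add up to 1, which holds for the defaults above; `check_splits` is a hypothetical helper, not part of the Vertex AI SDK.

```python
import math


def check_splits(training, validation, test):
    """Return the fractions as a tuple, or raise ValueError if they look invalid."""
    fractions = (training, validation, test)
    if any(fraction < 0 for fraction in fractions):
        raise ValueError(f"Split fractions must be non-negative: {fractions}")
    # Compare with a tolerance because of floating-point rounding.
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError(f"Split fractions must sum to 1, got {sum(fractions)}")
    return fractions


# The defaults used in the pipeline definition.
splits = check_splits(0.4, 0.3, 0.3)
print(splits)
```

Custom values could then be passed to the pipeline through the `training_split`, `validation_split`, and `test_split` entries of `parameter_values`.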

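To create the pipeline job, specify the following parameters:

- `display_name`: The name of the pipeline, shown in the Google Cloud console.
- `template_path`: The path of the PipelineJob or PipelineSpec JSON or YAML file. It can be a local path, a Google Cloud Storage URI, or an Artifact Registry URI.
- `parameter_values`: The mapping from runtime parameter names to the values that control the pipeline run.
- `enable_caching`: Set to True to turn on caching for the run.

Learn more about [PipelineJob](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).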

In [None]:
# set the values to be passed as input parameters to the pipeline
training_parameters = {
    "import_file": IMPORT_FILE,
    "dataset_display_name": DATASET_DISPLAY_NAME,
    "training_job_display_name": TRAINING_JOB_DISPLAY_NAME,
    "model_display_name": MODEL_DISPLAY_NAME,
    "endpoint_display_name": ENDPOINT_DISPLAY_NAME,
}

# create a pipeline job
training_job = aiplatform.PipelineJob(
    display_name=TRAINING_PIPELINE_DISPLAY_NAME,
    template_path=TRAINING_PIPELINE_PATH,
    pipeline_root=PIPELINE_ROOT,
    parameter_values=training_parameters,
    enable_caching=False,
)

# run the job
training_job.run(sync=True)

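Click the generated link to see your run in the Cloud Console.

In the UI, many of the pipeline DAG nodes expand or collapse when you click them. Here is a partially expanded view of the DAG (click the image to see a larger version).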

<a href="https://storage.googleapis.com/amy-jo/images/mp/automl_text_classif.png" target="_blank"><img src="https://storage.googleapis.com/amy-jo/images/mp/automl_text_classif.png" width="40%"/></a>

Get the created model by filtering on the display name.

In [None]:
models = aiplatform.Model.list(
    filter=f"display_name={MODEL_DISPLAY_NAME}", order_by="create_time"
)
if models:
    model = models[0]
print(model)

Get the evaluation metrics available for the model.

In [None]:
# Get evaluations
model_evaluations = model.list_model_evaluations()

model_evaluation = list(model_evaluations)[0]

# Print the evaluation metrics
for evaluation in model_evaluations:
    evaluation = evaluation.to_dict()
    print("Model's evaluation metrics from Training:\n")
    metrics = evaluation["metrics"]
    for metric in metrics.keys():
        print(f"metric: {metric}, value: {metrics[metric]}\n")

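### Run batch prediction for the model

To evaluate the model, you need a batch of test data along with the ground truth. Before evaluating the model, you run a batch prediction job on it to check that the model can generate predictions in batch. In Vertex AI, you don't need to deploy the model to run a batch prediction job on it.

To create a batch prediction job, you must first format the input instances (in JSONL format) and store them in a Cloud Storage bucket. You also need to provide a Cloud Storage location to save the results.

#### Format the input instances
In this step, the instances are formatted as JSONL. Each line in the JSONL document needs to be formatted as follows.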

```
{ "content": "gs://sourcebucket/datasets/texts/source_text.txt", "mimeType": "text/plain"}
```

The `content` field in the JSON structure must be a Cloud Storage URI pointing to a document that contains the text input for the model.

Learn more about [batch prediction](https://cloud.google.com/ai-platform-unified/docs/predictions/batch-predictions#text).
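
As a minimal sketch of the JSONL layout described above, the snippet below serializes two records with `json.dumps`, one per line, and parses them back. The second file name (`other_text.txt`) is a made-up placeholder. Using `json.dumps` rather than `str()` matters here: `str()` on a Python dict produces single-quoted text, which is not valid JSON.

```python
import json

# Placeholder records in the layout shown above; the URIs are illustrative.
records = [
    {
        "content": "gs://sourcebucket/datasets/texts/source_text.txt",
        "mimeType": "text/plain",
    },
    {
        "content": "gs://sourcebucket/datasets/texts/other_text.txt",
        "mimeType": "text/plain",
    },
]

# One JSON document per line.
jsonl_text = "\n".join(json.dumps(record) for record in records)

# Each line parses back into the original dictionary.
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
print(jsonl_text)
```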

In [None]:
# define a set of test samples
instances = [
    {
        "Text": "I went on a successful date with someone I felt sympathy and connection with.",
        "Labels": "affection",
    },
    {
        "Text": "I was happy when my son got 90% marks in his examination",
        "Labels": "affection",
    },
    {"Text": "I went to the gym this morning and did yoga.", "Labels": "exercise"},
    {
        "Text": "We had a serious talk with some friends of ours who have been flaky lately. They understood and we had a good evening hanging out.",
        "Labels": "bonding",
    },
    {
        "Text": "I went with grandchildren to butterfly display at Crohn Conservatory",
        "Labels": "affection",
    },
    {"Text": "I meditated last night.", "Labels": "leisure"},
    {
        "Text": "I made a new recipe for peasant bread, and it came out spectacular!",
        "Labels": "achievement",
    },
    {
        "Text": "I got gift from my elder brother which was really surprising me",
        "Labels": "affection",
    },
    {"Text": "YESTERDAY MY MOMS BIRTHDAY SO I ENJOYED", "Labels": "enjoy_the_moment"},
    {
        "Text": "Watching cupcake wars with my three teen children",
        "Labels": "affection",
    },
    {"Text": "I came in 3rd place in my Call of Duty video game.", "Labels": "leisure"},
    {
        "Text": "I completed my 5 miles run without break. It makes me feel strong.",
        "Labels": "exercise",
    },
    {"Text": "went to movies with my friends it was fun", "Labels": "bonding"},
    {
        "Text": "I was shorting Gold and made $200 from the trade.",
        "Labels": "achievement",
    },
    {
        "Text": "Hearing Songs It can be nearly impossible to go from angry to happy, so you're just looking for the thought that eases you out of your angry feeling and moves you in the direction of happiness. It may take a while, but as long as you're headed in a more positive direction youall be doing yourself a world of good.",
        "Labels": "enjoy_the_moment",
    },
    {
        "Text": "My son performed very well for a test preparation.",
        "Labels": "affection",
    },
    {"Text": "I helped my neighbour to fix their car damages.", "Labels": "bonding"},
    {
        "Text": "Managed to get the final trophy in a game I was playing.",
        "Labels": "achievement",
    },
    {
        "Text": "A hot kiss with my girl friend last night made my day",
        "Labels": "bonding",
    },
    {
        "Text": "My new BCAAs came in the mail. Yay! Strawberry Lemonade flavored aminos make my heart happy.",
        "Labels": "affection",
    },
    {"Text": "Got A in class.", "Labels": "achievement"},
    {
        "Text": "My sister called me from abroad this morning after some long years. Such a happy occassion for all family members.",
        "Labels": "affection",
    },
    {
        "Text": "The cake I made today came out amazing. It tasted amazing as well.",
        "Labels": "achievement",
    },
    {
        "Text": "There are two types of people in the world: those who choose to be happy, and those who choose to be unhappy. Contrary to popular belief, happiness doesn't come from fame, fortune, other people, or material possessions",
        "Labels": "enjoy_the_moment",
    },
    {
        "Text": "My grandmother start to walk from the bed after a long time.",
        "Labels": "affection",
    },
    {"Text": "i was able to hit a top spin serve in tennis", "Labels": "achievement"},
    {
        "Text": "I napped with my husband on the bed this afternoon and it was sweet to cuddle so close to him.",
        "Labels": "affection",
    },
    {
        "Text": "My co-woker started playing a Carley Rae Jepsen song from her phone while ringing out customers.",
        "Labels": "leisure",
    },
    {
        "Text": "My son woke me up to a fantastic breakfast of eggs, his special hamburger patty and pancakes.",
        "Labels": "affection",
    },
    {
        "Text": "After a long time my brother gave a suprise visit to my house yesterday.",
        "Labels": "affection",
    },
]

# define the input file name
BATCH_JOB_INPUT_FILE = "happiness-batch-prediction-input.jsonl"

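#### Save the data to the Cloud Storage bucket

Create a new Cloud Storage blob for each instance, upload the instance to the bucket as a text file, and then create the JSONL file that contains the instance URIs.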
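
The cell below obtains the bucket name by slicing the `gs://` prefix off `BUCKET_URI`. A slightly more explicit way to do the same thing is sketched here; `split_gcs_uri` is a hypothetical helper and the bucket name is a placeholder.

```python
def split_gcs_uri(uri):
    """Split a gs:// URI into (bucket_name, blob_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a Cloud Storage URI: {uri}")
    # Everything up to the first slash is the bucket; the rest is the object name.
    bucket_name, _, blob_name = uri[len(prefix):].partition("/")
    return bucket_name, blob_name


bucket_name, blob_name = split_gcs_uri(
    "gs://your-bucket-name/batch-prediction-input/input_0.txt"
)
print(bucket_name, blob_name)
```

The bucket name is what `storage_client.bucket(...)` expects, and the blob name is what `bucket.blob(...)` expects.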

In [None]:
# Instantiate the Storage client and get a handle to the existing bucket
storage_client = storage.Client()
bucket = storage_client.bucket(BUCKET_URI[5:])
# Iterate over the prediction instances and create a new text file
input_file_data = []
for count, instance in enumerate(instances):
    instance_name = f"input_{count}.txt"
    instance_file_uri = f"{BUCKET_URI}/batch-prediction-input/{instance_name}"
    # Add the data to store in the JSONL input file.
    tmp_data = {"content": instance_file_uri, "mimeType": "text/plain"}
    input_file_data.append(tmp_data)

    # Create the new instance file
    blob = bucket.blob("batch-prediction-input/" + instance_name)
    blob.upload_from_string(instance["Text"])


# Serialize each record with json.dumps so that every line is valid JSON
input_str = "\n".join([json.dumps(d) for d in input_file_data])
file_blob = bucket.blob(f"{BATCH_JOB_INPUT_FILE}")
file_blob.upload_from_string(input_str)

#### Create and run the batch prediction job

In [None]:
# provide display name for the batch prediction job
BATCH_JOB_DISPLAY_NAME = "happydb-batch-prediction-job-unique"

# create the batch prediction job
batch_prediction_job = model.batch_predict(
    job_display_name=BATCH_JOB_DISPLAY_NAME,
    gcs_source=f"{BUCKET_URI}/{BATCH_JOB_INPUT_FILE}",
    gcs_destination_prefix=f"{BUCKET_URI}/output",
    sync=True,
)
batch_prediction_job_name = batch_prediction_job.resource_name

In [None]:
# fetch the job details
batch_job = aiplatform.jobs.BatchPredictionJob(batch_prediction_job_name)
print(f"Batch prediction job state: {str(batch_job.state)}")

#### Get predictions from the batch prediction job

Load the batch prediction results saved at the specified output Cloud Storage path.
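
Once the result lines are loaded, you might want the most confident label for each instance. The sketch below assumes each output line holds an `instance` object and a `prediction` object with parallel `displayNames` and `confidences` lists; that layout and the sample values are assumptions made for illustration, so compare them with the output of your own job.

```python
import json

# Hypothetical result line; the confidence values are made up.
sample_line = json.dumps(
    {
        "instance": {
            "content": "gs://your-bucket-name/batch-prediction-input/input_0.txt",
            "mimeType": "text/plain",
        },
        "prediction": {
            "ids": ["1", "2", "3"],
            "displayNames": ["affection", "bonding", "leisure"],
            "confidences": [0.91, 0.06, 0.03],
        },
    }
)


def top_label(prediction_line):
    """Return the (label, confidence) pair with the highest confidence."""
    prediction = json.loads(prediction_line)["prediction"]
    pairs = zip(prediction["displayNames"], prediction["confidences"])
    return max(pairs, key=lambda pair: pair[1])


label, confidence = top_label(sample_line)
print(label, confidence)
```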

In [None]:
bp_iter_outputs = batch_job.iter_outputs()

prediction_results = list()
for blob in bp_iter_outputs:
    if blob.name.split("/")[-1].startswith("prediction"):
        prediction_results.append(blob.name)

for prediction_result in prediction_results:
    gfile_name = f"gs://{bp_iter_outputs.bucket.name}/{prediction_result}".replace(
        BUCKET_URI + "/", ""
    )
    data = bucket.get_blob(gfile_name).download_as_string()
    data = ndjson.loads(data)
    print(data)

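### Create an input file with ground truth data for evaluation

The evaluation component needs the ground truth data as part of the input file so that it can compare the predictions against it.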
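
To illustrate the record layout before the upload code, the sketch below builds ground-truth records for three of the test samples and counts the examples per label. The `base_uri` value is a placeholder, and nothing is uploaded.

```python
import json
from collections import Counter

# A few of the test samples defined earlier in this notebook.
sample_instances = [
    {"Text": "I went to the gym this morning and did yoga.", "Labels": "exercise"},
    {"Text": "I meditated last night.", "Labels": "leisure"},
    {"Text": "Got A in class.", "Labels": "achievement"},
]

# Placeholder location; the notebook derives the real one from BUCKET_URI.
base_uri = "gs://your-bucket-name/evaluation-batch-prediction-input"

records = [
    {
        "content": f"{base_uri}/input_{index}.txt",
        "mimeType": "text/plain",
        "ground_truth": instance["Labels"],
    }
    for index, instance in enumerate(sample_instances)
]

jsonl_text = "\n".join(json.dumps(record) for record in records)
label_counts = Counter(record["ground_truth"] for record in records)
print(jsonl_text)
print(label_counts)
```

Counting the labels is a quick way to see how many test examples each class contributes to the evaluation.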

In [None]:
# set the file name for saving the input with ground truth data
BATCH_JOB_INPUT_EVAL_FILE = "happydb-input-with-groundtruth.jsonl"

In [None]:
# Instantiate the Storage client and get a handle to the existing bucket
storage_client = storage.Client()
bucket = storage_client.bucket(BUCKET_URI[5:])
# Iterate over the prediction instances, creating a new TXT file
# for each.
input_file_data = []
for count, instance in enumerate(instances):
    instance_name = f"input_{count}.txt"
    instance_file_uri = (
        f"{BUCKET_URI}/evaluation-batch-prediction-input/{instance_name}"
    )
    # Add the data to store in the JSONL input file.
    # ground_truth variable in each json instance is needed to act as ground_truth for the evaluation task
    tmp_data = {
        "content": instance_file_uri,
        "mimeType": "text/plain",
        "ground_truth": instance["Labels"],
    }
    input_file_data.append(tmp_data)

    # Create the new instance file
    blob = bucket.blob("evaluation-batch-prediction-input/" + instance_name)
    blob.upload_from_string(instance["Text"])

input_str = json.dumps(input_file_data[0])
for i in input_file_data[1:]:
    input_str = input_str + "\n" + json.dumps(i)
file_blob = bucket.blob(f"{BATCH_JOB_INPUT_EVAL_FILE}")
file_blob.upload_from_string(input_str)

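## Create a pipeline for model evaluation

In this section, you run a batch prediction job and evaluate its results from a Vertex AI pipeline by calling the `evaluate` function. Learn more about the [evaluate function](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/models.py#L5127).

Set the parameters for the evaluation pipeline.

### Define parameters to run the evaluate function

Specify the parameters required to run the `evaluate` function.

The `evaluate` function parameters are as follows:

- `prediction_type`: The problem type of the evaluation run. Currently, the supported problem types are "classification" and "regression".
- `target_field_name`: The name of the column to use as the classification target.
- `gcs_source_uris`: The list of Cloud Storage URIs of the batch prediction input instances.
- `class_labels`: The list of all class names for the target field in the dataset.
- `generate_feature_attributions`: Optional. Whether the model evaluation job should generate feature attributions. Defaults to False if not specified.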
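
Because `class_labels` has to cover every value that appears in the target field, a quick consistency check before calling `evaluate` can catch typos. The sketch below compares the ground-truth labels used in this notebook's test samples against the class list; `compare_labels` is a hypothetical helper.

```python
CLASS_LABELS = [
    "affection",
    "exercise",
    "bonding",
    "leisure",
    "achievement",
    "enjoy_the_moment",
    "nature",
]

# Distinct ground-truth labels that occur in the test samples above.
ground_truth_labels = [
    "affection",
    "exercise",
    "bonding",
    "leisure",
    "achievement",
    "enjoy_the_moment",
]


def compare_labels(ground_truth, class_labels):
    """Return (labels missing from class_labels, class labels with no examples)."""
    seen = set(ground_truth)
    known = set(class_labels)
    return sorted(seen - known), sorted(known - seen)


unknown, unused = compare_labels(ground_truth_labels, CLASS_LABELS)
print("Unknown labels:", unknown)
print("Classes without test examples:", unused)
```

Here `nature` is a valid class with no test examples, so any metric reported for it would not be backed by test data.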

In [None]:
DATA_SOURCE = f"{BUCKET_URI}/{BATCH_JOB_INPUT_EVAL_FILE}"
CLASS_LABELS = [
    "affection",
    "exercise",
    "bonding",
    "leisure",
    "achievement",
    "enjoy_the_moment",
    "nature",
]

evaluation_job = model.evaluate(
    prediction_type="classification",
    target_field_name="ground_truth",
    gcs_source_uris=[DATA_SOURCE],
    class_labels=CLASS_LABELS,
    generate_feature_attributions=False,
)

print("Waiting for the model evaluation to complete")
evaluation_job.wait()

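### Check the evaluation results

To check whether the pipeline ran successfully, click the link generated above to view the pipeline graph in the Cloud Console.

In the displayed pipeline, nodes expand or collapse when you click them. Here is a partially expanded view of the pipeline (click the image to see a larger version).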

<img src="images/automl-text-classification-evaluation-image.PNG">

### Get the model evaluation results

After the evaluation pipeline finishes, run the cell below to print the evaluation metrics.

In [None]:
model_evaluation = evaluation_job.get_model_evaluation()

In [None]:
# Iterate over the pipeline tasks
for (
    task
) in model_evaluation._backing_pipeline_job._gca_resource.job_detail.task_details:
    # Obtain the artifacts from the evaluation task
    if (
        ("model-evaluation" in task.task_name)
        and ("model-evaluation-import" not in task.task_name)
        and (
            task.state == aiplatform_v1.types.PipelineTaskDetail.State.SUCCEEDED
            or task.state == aiplatform_v1.types.PipelineTaskDetail.State.SKIPPED
        )
    ):
        evaluation_metrics = task.outputs.get("evaluation_metrics").artifacts[
            0
        ]  # ['artifacts']
        evaluation_metrics_gcs_uri = evaluation_metrics.uri

print(evaluation_metrics)
print(evaluation_metrics_gcs_uri)

### Visualize the metrics

Visualize the available metrics, such as `auRoc` and `logLoss`, using a bar chart.
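
The plotting cell below passes every metadata entry to `plt.bar`. If the metadata also contains nested entries such as a confusion matrix (an assumption about the artifact's contents; inspect your own output), filtering for scalar values first keeps the chart readable. The sample dictionary and its values are made up for illustration.

```python
def scalar_metrics(metadata):
    """Keep only entries whose values are plain numbers."""
    return {
        name: value
        for name, value in metadata.items()
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


# Hypothetical metadata with one nested entry.
sample_metadata = {
    "auRoc": 0.97,
    "logLoss": 0.21,
    "confusionMatrix": {"rows": [[3, 0], [1, 2]]},
}

plottable = scalar_metrics(sample_metadata)
print(plottable)
```

The keys and values of `plottable` could then be used in place of the full metadata when building the bar chart.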

In [None]:
metrics = []
values = []
for i in evaluation_metrics.metadata.items():
    metrics.append(i[0])
    values.append(i[1])
plt.figure(figsize=(15, 5))
plt.bar(x=metrics, height=values)
plt.title("Evaluation Metrics")
plt.ylabel("Value")
plt.show()

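### Check the model evaluation in the Model Registry

To make sure the model evaluation was imported into the model resource, list the evaluations and print them.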

In [None]:
# get the model evaluation configuration from the pipeline job
for (
    task
) in model_evaluation._backing_pipeline_job._gca_resource.job_detail.task_details:
    if "model-evaluation-import" in task.task_name:
        val = json.loads(task.execution.metadata.get("output:gcp_resources"))
        model_evaluation = val["resources"][0]

In [None]:
# Print the evaluation metrics
model_evaluation_id = model_evaluation["resourceUri"].split("/")[-1]
print(model_evaluation_id)

# get evaluations from the model
evaluation = model.get_model_evaluation(evaluation_id=model_evaluation_id)
evaluation = evaluation.to_dict()
print("Model's evaluation metrics:\n")
metrics = evaluation["metrics"]
for metric in metrics.keys():
    print(f"metric: {metric}, value: {metrics[metric]}\n")

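## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud project used for this tutorial](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects).

Otherwise, you can delete the individual resources you created in this tutorial:

- Evaluation job
- Batch prediction job
- Training and deployment job
- Endpoint
- Model
- Dataset
- Cloud Storage bucket (set `delete_bucket` to True to delete it)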

In [None]:
delete_bucket = False

# delete the evaluation job
evaluation_job.delete()

# delete the batch prediction job
batch_prediction_job.delete()

# delete the training job
training_job.delete()

# list the endpoints filtering the display name
endpoints = aiplatform.Endpoint.list(
    filter=f"display_name={ENDPOINT_DISPLAY_NAME}", order_by="create_time"
)

# delete the endpoint
if endpoints:
    endpoint = endpoints[0]
    endpoint.undeploy_all()
    endpoint.delete()
    print("Deleted endpoint:", endpoint)

# list the models filtering the display name
models = aiplatform.Model.list(
    filter=f"display_name={MODEL_DISPLAY_NAME}", order_by="create_time"
)
# delete the model
if models:
    model = models[0]
    model.delete()
    print("Deleted model:", model)

# list the datasets filtering the display name
datasets = aiplatform.TextDataset.list(
    filter=f"display_name={DATASET_DISPLAY_NAME}", order_by="create_time"
)
# delete the dataset
if datasets:
    dataset = datasets[0]
    dataset.delete()
    print("Deleted dataset:", dataset)

# delete the Cloud Storage bucket
import os

if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil rm -r $BUCKET_URI