## Deploy a llama inference service with an online endpoint

This example shows how to deploy a `text-generation` model to an online endpoint for inference.

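### Task
`text-generation` is the task of producing new text. These models can complete partial text or paraphrase it. Common applications include code generation and story generation.

### Model
Models that can perform the `text-generation` task are tagged with `task: text-generation`. This notebook uses the `gpt2` model.

### Inference data
We use the [book corpus](https://huggingface.co/datasets/bookcorpus) dataset.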

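### Outline
* Set up prerequisites
* Pick a model to deploy
* Download and prepare data for inference
* Deploy the model for real-time inference
* Test the endpoint
* Clean up resources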

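### 1. Set up prerequisites

**Requirements**
- A basic understanding of machine learning
- An Azure account with an active subscription - [create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [configure a workspace](https://aka.ms/azureml-workspace-configuration)
- A Python environment
- The Azure Machine Learning Python SDK v2 installed - [installation instructions](https://aka.ms/azureml-sdkv2-install) - see the getting started section


* Connect to the AzureML workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace the values of `workspace_name`, `resource_group_name`, and `subscription_id` below with your own.
* Connect to the `azureml` system registry

Define a few resource variables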

In [6]:
subscription_id = "b1d92895-527c-4a67-91d2-8f653d9ee248"
resource_group_name = "test-ml-rg2"
workspace_name = "test-ml-ws"
model_name = "gpt2"
region = "westus3"
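Hard-coded identifiers are easy to forget when the notebook is shared or reused. As a minimal sketch, the settings could instead be read from environment variables with the notebook's values as fallbacks. The environment variable names and the `load_settings` helper below are illustrative assumptions, not something the original notebook or the Azure SDK defines.

```python
import os

# Hypothetical environment variable names, chosen for illustration only.
ENV_NAMES = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
    "region": "AZURE_REGION",
}


def load_settings(defaults, env=None):
    """Return settings, preferring non-empty environment values over defaults."""
    env = os.environ if env is None else env
    settings = {}
    for key, default in defaults.items():
        env_name = ENV_NAMES.get(key)
        value = env.get(env_name, "") if env_name else ""
        settings[key] = value or default
    return settings


settings = load_settings(
    {
        "subscription_id": "<SUBSCRIPTION_ID>",
        "resource_group_name": "test-ml-rg2",
        "workspace_name": "test-ml-ws",
        "region": "westus3",
    }
)
print(settings["region"])
```

The resulting dictionary can then be unpacked into the variables used by the remaining cells.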

### 1.1 Create a resource group

Check whether the resource group exists

In [5]:
!az group exists --name {resource_group_name}

true


Create the resource group if it does not exist

In [3]:
!az group create --name {resource_group_name} --location {region}

(InvalidResourceGroup) The provided resource group name '{resource_group_name}' has these invalid characters: '{}'. The name can only be a letter, digit, '-', '.', '(', ')' or '_'. For more details, visit https://aka.ms/ResourceGroupNamingRestrictions .
Code: InvalidResourceGroup
Message: The provided resource group name '{resource_group_name}' has these invalid characters: '{}'. The name can only be a letter, digit, '-', '.', '(', ')' or '_'. For more details, visit https://aka.ms/ResourceGroupNamingRestrictions .
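The error output above shows the literal text `{resource_group_name}` reaching the CLI, which suggests the variable was not defined when that cell ran. One way to sidestep shell interpolation is to call the same two `az group` commands from Python with an explicit argument list. The sketch below assumes the `az` CLI is on the `PATH`; the `ensure_resource_group` helper is hypothetical.

```python
import subprocess


def ensure_resource_group(name, location, run=subprocess.run):
    """Create the resource group only when `az group exists` reports false.

    Returns True if a group was created, False if it already existed.
    """
    check = run(
        ["az", "group", "exists", "--name", name],
        capture_output=True,
        text=True,
        check=True,
    )
    if check.stdout.strip().lower() == "true":
        return False  # already present, nothing to create
    run(
        ["az", "group", "create", "--name", name, "--location", location],
        capture_output=True,
        text=True,
        check=True,
    )
    return True


# Usage in the notebook (requires a logged-in az CLI):
# ensure_resource_group(resource_group_name, region)
```

Passing `run` as a parameter keeps the helper testable without touching a real subscription.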


In [7]:
from azure.ai.ml import MLClient
from azure.identity import (
    DefaultAzureCredential,
    InteractiveBrowserCredential
)

try:
    credential = DefaultAzureCredential()
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    credential = InteractiveBrowserCredential()

workspace_ml_client = MLClient(
    credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group_name,
)
# the models, fine tuning pipelines and environments are available in the AzureML system registry, "azureml"
registry_ml_client = MLClient(credential, registry_name="azureml")

Overriding of current TracerProvider is not allowed
Overriding of current LoggerProvider is not allowed
Overriding of current MeterProvider is not allowed
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented


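### 1.2 Create a basic workspace

Creating a workspace takes the following parameters:

- name - the workspace name
- location - the region
- display_name - the workspace display name
- description - the workspace description
- tags - (optional) workspace tags

Create the workspace with the `MLClient` object created earlier: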

In [25]:
# Create the workspace using the name and region defined above
from azure.ai.ml.entities import Workspace

ws_basic = Workspace(
    name=workspace_name,
    location=region,
    display_name=workspace_name,
    description="This example shows how to create a basic workspace",
    hbi_workspace=False,
    tags=dict(purpose="demo"),
)

ws_basic = workspace_ml_client.workspaces.begin_create(ws_basic).result()
print(ws_basic)

The deployment request test-ml-ws-3394499 was accepted. ARM deployment URI for reference: 
https://portal.azure.com//#blade/HubsExtension/DeploymentDetailsBlade/overview/id/%2Fsubscriptions%2Fb1d92895-527c-4a67-91d2-8f653d9ee248%2FresourceGroups%2Ftest-ml-rg2%2Fproviders%2FMicrosoft.Resources%2Fdeployments%2Ftest-ml-ws-3394499
Creating Log Analytics Workspace: (testmlwslogalyti8fff3011  ) ..  Done (20s)
Creating Application Insights: (testmlwsinsightsb7cbc563  )  Done (25s)
Creating Key Vault: (testmlwskeyvaultdb6d2371  )  Done (22s)
Creating Storage Account: (testmlwsstoragebe5f1e438  )  Done (23s)
Creating AzureML Workspace: (test-ml-ws  ) ..  Done (21s)
Total time : 52s



allow_roleassignment_on_rg: true
application_insights: /subscriptions/b1d92895-527c-4a67-91d2-8f653d9ee248/resourceGroups/test-ml-rg2/providers/Microsoft.insights/components/testmlwsinsightsb7cbc563
description: This example shows how to create a basic workspace
discovery_url: https://westus3.api.azureml.ms/discovery
display_name: test-ml-ws
enable_data_isolation: false
hbi_workspace: false
id: /subscriptions/b1d92895-527c-4a67-91d2-8f653d9ee248/resourceGroups/test-ml-rg2/providers/Microsoft.MachineLearningServices/workspaces/test-ml-ws
identity:
  principal_id: 9c707698-cdec-4430-9111-cbd2088d1eea
  tenant_id: 16b3c013-d300-468d-ac64-7eda0820b6d3
  type: system_assigned
key_vault: /subscriptions/b1d92895-527c-4a67-91d2-8f653d9ee248/resourceGroups/test-ml-rg2/providers/Microsoft.Keyvault/vaults/testmlwskeyvaultdb6d2371
location: westus3
managed_network:
  isolation_mode: disabled
  outbound_rules: []
mlflow_tracking_uri: azureml://westus3.api.azureml.ms/mlflow/v1.0/subscriptions/b1d928

Recreate the `MLClient`, this time scoped to the workspace

In [8]:
workspace_ml_client = MLClient(
    credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group_name,
    workspace_name=workspace_name
)

Overriding of current TracerProvider is not allowed
Overriding of current LoggerProvider is not allowed
Overriding of current MeterProvider is not allowed
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented
Attempting to instrument while already instrumented


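### 2. Pick a model to deploy

Browse models in the model catalog in AzureML Studio, filtering by the `text-generation` task. This example uses the `gpt2` model. If you opened this notebook for a different model, replace the model name and version accordingly.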

In [9]:
version_list = list(registry_ml_client.models.list(model_name))
if len(version_list) == 0:
    print("Model not found in registry")
else:
    model_version = version_list[0].version
    foundation_model = registry_ml_client.models.get(model_name, model_version)
    print(
        "\n\nUsing model name: {0}, version: {1}, id: {2} for inferencing".format(
            foundation_model.name, foundation_model.version, foundation_model.id
        )
    )



Using model name: gpt2, version: 18, id: azureml://registries/azureml/models/gpt2/versions/18 for inferencing
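The cell above takes `version_list[0]`, which depends on the order in which the registry happens to return versions. If the latest version is wanted regardless of that order, a small helper can pick the highest one explicitly. This is a sketch that assumes versions are mostly numeric strings such as `"18"`; the `latest_version` helper and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def latest_version(models):
    """Return the highest version string, comparing numerically when possible."""
    versions = [str(m.version) for m in models]
    if not versions:
        raise ValueError("no versions found")

    def sort_key(v):
        # Numeric versions rank above non-numeric ones and compare as integers.
        return (1, int(v), "") if v.isdigit() else (0, 0, v)

    return max(versions, key=sort_key)


# Stand-in objects for illustration; in the notebook this would be version_list.
sample = [SimpleNamespace(version=v) for v in ("9", "18", "10")]
print(latest_version(sample))
```

A plain string comparison would rank `"9"` above `"18"`, which is why the helper converts to integers first.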


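### 3. Download and prepare data for inference

The next few cells show basic data preparation:
* Visualize a few rows of data
* Save a few samples in a format that can be passed as input to the online inference endpoint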

In [10]:
# Download a small sample of the dataset into the ./book-corpus-dataset directory
%run ./book-corpus-dataset/download-dataset.py --download_dir ./book-corpus-dataset

Loading train split of bookcorpus dataset...


In [11]:
# load the ./book-corpus-dataset/train.jsonl file into a pandas dataframe and show the first 2 rows
import pandas as pd

pd.set_option(
    "display.max_colwidth", 0
)  # set the max column width to 0 to display the full text
train_df = pd.read_json("./book-corpus-dataset/train.jsonl", lines=True)
train_df.head(2)

Unnamed: 0,text
0,megan smiled .
1,"he stared at her for a moment , unblinking and unmoving ."


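### 4. Deploy the model to an online endpoint
Online endpoints provide a durable REST API that can be used to integrate with applications that need to use the model.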


In [10]:
import time, sys
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    OnlineRequestSettings,
)

# Create online endpoint - endpoint names need to be unique in a region.
# The name is hardcoded here (it reuses an earlier timestamp); use
# "text-generation-" + str(timestamp) instead to generate a fresh, unique name.
timestamp = int(time.time())
online_endpoint_name = "text-generation-1747146946"
# create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Online endpoint for "
    + foundation_model.name
    + ", for text-generation task",
    auth_mode="key",
)
workspace_ml_client.begin_create_or_update(endpoint).wait()
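Since the endpoint name has to be unique within a region, generating it from a timestamp and checking it before calling the service can save a failed round trip. The sketch below assumes the naming rule is 3 to 32 characters, starting with a letter, followed by letters, digits, or hyphens; treat that rule and the `make_endpoint_name` helper as assumptions to verify against the Azure documentation.

```python
import re
import time

# Assumed naming rule: 3-32 characters, starts with a letter,
# then letters, digits, or hyphens.
_ENDPOINT_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")


def make_endpoint_name(prefix="text-generation", now=None):
    """Build '<prefix>-<unix timestamp>' and validate it against the assumed rule."""
    timestamp = int(time.time() if now is None else now)
    name = f"{prefix}-{timestamp}"
    if not _ENDPOINT_NAME.match(name):
        raise ValueError(f"invalid endpoint name: {name!r}")
    return name


print(make_endpoint_name())
```

The optional `now` argument makes the result reproducible, for example when the same endpoint should be targeted again later.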

Create an online deployment, specifying the deployment name, endpoint name, model ID, instance type, instance count, and request settings

In [12]:
# create a deployment
demo_deployment = ManagedOnlineDeployment(
    name="demo",
    endpoint_name=online_endpoint_name,
    model=foundation_model.id,
    environment_variables={
        "ENGINE_NAME": "vllm"
    },
    instance_type="STANDARD_NC4AS_T4_V3",
    instance_count=1,
    request_settings=OnlineRequestSettings(
        request_timeout_ms=30000,
    ),
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
endpoint.traffic = {"demo": 100}
workspace_ml_client.begin_create_or_update(endpoint).result()

Check: endpoint text-generation-1747146946 exists


...................................................................................................................................................................................................................................................

HttpResponseError: (BadArgument) Startup task failed due to authorization error. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-badargument
Code: BadArgument
Message: Startup task failed due to authorization error. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-badargument

### 5. Test the endpoint

We take a sample from the downloaded dataset and submit it to the online endpoint for inference.


In [45]:
import json
import os

# read the ./book-corpus-dataset/train.jsonl file into a pandas dataframe
df = pd.read_json("./book-corpus-dataset/train.jsonl", lines=True)
# escape single and double quotes in the text column
df["text"] = df["text"].str.replace("'", "\\'").str.replace('"', '\\"')
# pick 1 random row
sample_df = df.sample(1)
# build the request body: "inputs" holds an "input_string" list with the values from the text column of sample_df
test_json = {"inputs": {"input_string": sample_df["text"].tolist()}}
# save the json object to a file named sample_score.json in the ./book-corpus-dataset folder
with open(os.path.join(".", "book-corpus-dataset", "sample_score.json"), "w") as f:
    json.dump(test_json, f)
sample_df.head()

Unnamed: 0,text
0,megan questioned .
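`json.dump` already escapes quotes when it serializes a string, so escaping them by hand beforehand adds literal backslashes to the text that the model receives. Whether the deployed scoring script expects that is not stated in this notebook. As a minimal sketch, the request file can be written without manual escaping and checked with a round trip; the `write_request_file` helper and the sample sentence are hypothetical.

```python
import json
import os
import tempfile


def write_request_file(texts, path):
    """Write the {"inputs": {"input_string": [...]}} body used in this notebook."""
    payload = {"inputs": {"input_string": list(texts)}}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)  # json.dump handles quote escaping itself
    return payload


# Round-trip check in a temporary directory, with a made-up sentence.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_score.json")
    write_request_file(['she said "hi" and didn\'t wait .'], path)
    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

print(restored["inputs"]["input_string"][0])
```

Reading the file back returns the original sentence unchanged, which shows that the quotes survive serialization without extra escaping.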


Test by calling the invoke method with the following parameters:
- endpoint_name - the endpoint name
- deployment_name - the deployment name
- request_file - the path to the request file

In [None]:
# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method
response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="demo",
    request_file="./book-corpus-dataset/sample_score.json",
)
print("raw response: \n", response, "\n")
# convert the json response to a pandas dataframe
response_df = pd.read_json(response)
response_df.head()
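The invoke cell has no recorded output, so the shape of the response is not known from this notebook. Recent pandas versions also deprecate passing a literal JSON string to `pd.read_json`. A cautious alternative is to parse the string with the standard library first and normalize it to a list of rows. The `response_to_rows` helper and the sample response text below are hypothetical.

```python
import json


def response_to_rows(raw):
    """Parse a raw JSON string into a list of rows, whatever its top-level type."""
    data = json.loads(raw)
    if isinstance(data, dict):
        return [data]  # single object: wrap it so callers always get a list
    if isinstance(data, list):
        return data
    return [{"value": data}]  # bare string or number


# Made-up response text, for illustration only.
rows = response_to_rows('[{"0": "megan questioned . she smiled ."}]')
print(len(rows))
```

The resulting list can be handed to `pd.DataFrame(rows)` if a dataframe view is still wanted.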

### 6. Delete the online endpoint
Don't forget to delete the online endpoint, otherwise you will keep being billed for the compute it uses.

In [20]:
workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()

...................................................................................
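Because a forgotten endpoint keeps incurring cost, it can help to tie deletion to a `with` block so that it runs even when a test cell raises. The sketch below wraps the same `online_endpoints.begin_delete(name=...).wait()` call used above; the `endpoint_cleanup` helper is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def endpoint_cleanup(ml_client, endpoint_name):
    """Delete the endpoint on exit, whether or not the body raised."""
    try:
        yield endpoint_name
    finally:
        ml_client.online_endpoints.begin_delete(name=endpoint_name).wait()


# Usage in the notebook:
# with endpoint_cleanup(workspace_ml_client, online_endpoint_name):
#     ...  # invoke the endpoint, inspect responses
```

This pattern suits short experiments; for an endpoint that should stay up, keep the explicit delete cell instead.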

### 7. Troubleshooting

#### 7.1 Fix storage account network access issues

Allow public network access

In [None]:
# Enable public network access for Azure storage account using az cli
!az storage account update \
    --name $(az storage account list -g $resource_group_name --query "[0].name" -o tsv) \
    --resource-group $resource_group_name \
    --public-network-access Enabled \
    --default-action Allow

Allow storage account key access

In [None]:
# Enable storage account key access using az cli
!az storage account update \
    --name $(az storage account list -g $resource_group_name --query "[0].name" -o tsv) \
    --resource-group $resource_group_name \
    --allow-shared-key-access true
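The commands above select the storage account with `[0].name`, that is, the first account listed in the resource group, which may not be the workspace's account when the group holds several. A more targeted sketch derives the name from the workspace's storage account resource ID. It assumes the workspace object exposes that ID (for example as `ws_basic.storage_account`); both helpers and the ID in the example are illustrative.

```python
def resource_name_from_arm_id(arm_id):
    """Return the last path segment of an ARM resource ID."""
    parts = [p for p in arm_id.strip().split("/") if p]
    if not parts:
        raise ValueError("empty resource ID")
    return parts[-1]


def storage_update_command(arm_id, resource_group):
    """Build the az argument list that enables public network access."""
    return [
        "az", "storage", "account", "update",
        "--name", resource_name_from_arm_id(arm_id),
        "--resource-group", resource_group,
        "--public-network-access", "Enabled",
        "--default-action", "Allow",
    ]


# Illustrative ID built from the account name shown in the creation log.
example_id = (
    "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/test-ml-rg2"
    "/providers/Microsoft.Storage/storageAccounts/testmlwsstoragebe5f1e438"
)
print(" ".join(storage_update_command(example_id, "test-ml-rg2")))
```

The returned list can be passed to `subprocess.run` as-is, which avoids quoting issues in the shell.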