Huggingface Model Deployer #2376

Merged
merged 48 commits into from
Mar 8, 2024
Changes from 27 commits
0f7840d
Initial implementation of huggingface model deployer
dudeperf3ct Jan 30, 2024
0c1a1de
Add missing step init
dudeperf3ct Jan 30, 2024
1355422
Simplify modify_endpoint_name function and fix docstrings
dudeperf3ct Jan 31, 2024
f7ddd6b
Merge branch 'develop' into huggingface-model-deployer
dudeperf3ct Jan 31, 2024
2fdde4b
Formatting logger
dudeperf3ct Jan 31, 2024
44c8cd0
Add License to new files
dudeperf3ct Feb 1, 2024
a6231fa
Enhancements as per PR review comments
dudeperf3ct Feb 1, 2024
51e83d1
Merge branch 'develop' into huggingface-model-deployer
dudeperf3ct Feb 1, 2024
3c34a80
Add logging message to catch KeyError
dudeperf3ct Feb 1, 2024
efb9158
Merge branch 'develop' into huggingface-model-deployer
avishniakov Feb 1, 2024
446454d
Remove duplicate variable
dudeperf3ct Feb 1, 2024
3d5593a
Reorder lines for clarity
dudeperf3ct Feb 1, 2024
d619519
Merge branch 'develop' into huggingface-model-deployer
safoinme Feb 1, 2024
33ea8a4
Add docs for huggingface model deployer
dudeperf3ct Feb 5, 2024
147ec07
Merge branch 'develop' into huggingface-model-deployer
dudeperf3ct Feb 5, 2024
d9b16fa
Merge branch 'develop' into huggingface-model-deployer
strickvl Feb 5, 2024
cd17777
Fix CI errors
dudeperf3ct Feb 5, 2024
d9f3069
Merge remote-tracking branch 'origin/huggingface-model-deployer' into…
dudeperf3ct Feb 5, 2024
479166b
Merge branch 'develop' into huggingface-model-deployer
dudeperf3ct Feb 5, 2024
d2dc135
Fix get_model_info function arguments
dudeperf3ct Feb 5, 2024
b2452aa
More CI fixes
dudeperf3ct Feb 6, 2024
f88a675
Merge branch 'develop' into huggingface-model-deployer
dudeperf3ct Feb 6, 2024
3d97c26
Add minimal supported version for Inference Endpoint API in huggingfa…
dudeperf3ct Feb 6, 2024
7e6a7d9
Merge branch 'develop' into huggingface-model-deployer
strickvl Feb 6, 2024
8c96a8f
Relax 'adlfs' package requirement in azure integrations
dudeperf3ct Feb 6, 2024
f141fed
update TOC (#2406)
strickvl Feb 6, 2024
1a1bd40
Relax 's3fs' version in s3 integration
dudeperf3ct Feb 6, 2024
fe8534a
Merge branch 'develop' into huggingface-model-deployer
dudeperf3ct Feb 6, 2024
f98ed17
Bugs fixed running a test deployment pipeline
dudeperf3ct Feb 7, 2024
e85a0e8
Add deployment pipelines to huggingface integration test
dudeperf3ct Feb 7, 2024
06d52ec
Remove not required check on service running in tests
dudeperf3ct Feb 7, 2024
9a400b4
Merge branch 'develop' into huggingface-model-deployer
strickvl Feb 7, 2024
3acaa80
Address PR comments on documentation and suggested renaming in code
dudeperf3ct Feb 8, 2024
5b96840
Merge remote-tracking branch 'origin/huggingface-model-deployer' into…
dudeperf3ct Feb 8, 2024
5dd45d8
Merge branch 'develop' into huggingface-model-deployer
dudeperf3ct Feb 8, 2024
2c9c5a2
Add partial test for huggingface_deployment
dudeperf3ct Feb 8, 2024
c3caa59
Fix typo in test function
dudeperf3ct Feb 8, 2024
4dde857
Merge branch 'develop' into huggingface-model-deployer
dudeperf3ct Feb 8, 2024
3eab2e0
Merge branch 'develop' into huggingface-model-deployer
strickvl Feb 12, 2024
3de4783
Merge branch 'develop' into huggingface-model-deployer
strickvl Feb 27, 2024
a8f5e1d
Update pyproject.toml
strickvl Feb 27, 2024
9e17cb1
Update pyproject.toml
strickvl Feb 27, 2024
85277a7
Merge branch 'develop' into huggingface-model-deployer
strickvl Feb 27, 2024
68557b3
Relax gcfs
strickvl Feb 27, 2024
6eb5b64
Update model deployers table
dudeperf3ct Mar 7, 2024
fa17af1
Merge 'develop' branch into 'huggingface-model-deployer' branch
dudeperf3ct Mar 7, 2024
f7daf26
Fix lint issue
dudeperf3ct Mar 7, 2024
76206a9
Merge branch 'develop' into huggingface-model-deployer
strickvl Mar 7, 2024
@@ -0,0 +1,153 @@
---
description: Deploying models to Huggingface Inference Endpoints with Hugging Face :hugging_face:.
---

# Hugging Face :hugging_face:

Hugging Face Inference Endpoints provides a secure production solution to easily deploy any `transformers`, `sentence-transformers`, and `diffusers` models on a dedicated and autoscaling infrastructure managed by Hugging Face. An Inference Endpoint is built from a model from the [Hub](https://huggingface.co/models).

Because Hugging Face manages this infrastructure, you can deploy models without dealing with containers and GPUs.

## When to use it?

You should use Hugging Face Model Deployer:

* if you want to deploy [Transformers, Sentence-Transformers, or Diffusion models](https://huggingface.co/docs/inference-endpoints/supported_tasks) on dedicated and secure infrastructure.
* if you prefer a fully-managed production solution for inference without the need to handle containers and GPUs.
* if your goal is to turn your models into production-ready APIs with minimal infrastructure or MLOps involvement.
* if cost-effectiveness is crucial, and you want to pay only for the raw compute resources you use.
* if enterprise security is a priority, and you need to deploy models into secure offline endpoints accessible only via a direct connection to your Virtual Private Cloud (VPC).

If you are looking for an easier way to deploy your models locally, you can use the [MLflow Model Deployer](mlflow.md) flavor.

## How to deploy it?

The Huggingface Model Deployer flavor is provided by the Huggingface ZenML integration, so you need to install it on your local machine to be able to deploy your models. You can do this by running the following command:

```bash
zenml integration install huggingface -y
```

To register the Huggingface model deployer with ZenML you need to run the following command:

```bash
zenml model-deployer register <MODEL_DEPLOYER_NAME> --flavor=huggingface --token=<YOUR_HF_TOKEN> --namespace=<YOUR_HF_NAMESPACE>
```

Here,

* The `token` parameter is the Hugging Face authentication token. It can be managed through [huggingface settings](https://huggingface.co/settings/tokens).
* The `namespace` parameter is used for listing and creating the inference endpoints. It can be a username, an organization name, or `*`, depending on where the inference endpoint should be created.

We can now use the model deployer in our stack.

```bash
zenml stack update <CUSTOM_STACK_NAME> --model-deployer=<MODEL_DEPLOYER_NAME>
```

See the [huggingface_model_deployer_step](https://sdkdocs.zenml.io/latest/integration_code_docs/integrations-seldon/#zenml.integrations.huggingface.steps.huggingface_deployer.huggingface_model_deployer_step) for an example of using the Huggingface Model Deployer to deploy a model inside a ZenML pipeline step.

## Configuration

Within the `HuggingFaceServiceConfig` you can configure:

* `model_name`: the name of the model in ZenML.
* `endpoint_name`: the name of the inference endpoint. ZenML adds the prefix `zenml-` and appends the first 8 characters of the service UUID as a suffix to the endpoint name.
* `repository`: The repository name in the user’s namespace (`{username}/{model_id}`) or in the organization namespace (`{organization}/{model_id}`) from the Hugging Face hub.
* `framework`: The machine learning framework used for the model (e.g. `"custom"`, `"pytorch"`).
* `accelerator`: The hardware accelerator to be used for inference (e.g. `"cpu"`, `"gpu"`).
* `instance_size`: The size of the instance to be used for hosting the model (e.g. `"large"`, `"xxlarge"`).
* `instance_type`: The instance type, chosen from the curated CPU and GPU instances that Inference Endpoints offers (e.g. `"c6i"`, `"g5.12xlarge"`).
* `region`: The cloud region in which the Inference Endpoint will be created (e.g. `"us-east-1"` or `"eu-west-1"` for the `"aws"` vendor, `"eastus"` for the Microsoft Azure vendor).
* `vendor`: The cloud provider or vendor where the Inference Endpoint will be hosted (e.g. `"aws"`).
* `token`: The Hugging Face authentication token. It can be managed through [huggingface settings](https://huggingface.co/settings/tokens). The same token that was used while registering the Huggingface model deployer can be passed here.
* `account_id`: (Optional) The account ID used to link a VPC to a private Inference Endpoint (if applicable).
* `min_replica`: (Optional) The minimum number of replicas (instances) to keep running for the Inference Endpoint. Defaults to `0`.
* `max_replica`: (Optional) The maximum number of replicas (instances) to scale to for the Inference Endpoint. Defaults to `1`.
* `revision`: (Optional) The specific revision of the Hugging Face repository to deploy on the Inference Endpoint.
* `task`: The supported [Machine Learning Task](https://huggingface.co/docs/inference-endpoints/supported_tasks) to run (e.g. `"text-classification"`, `"text-generation"`).
* `custom_image`: (Optional) A custom Docker image to use for the Inference Endpoint.
* `namespace`: The namespace where the Inference Endpoint will be created. The same namespace that was used while registering the Huggingface model deployer can be passed here.
* `endpoint_type`: (Optional) The type of the Inference Endpoint, which can be `"protected"`, `"public"` (default) or `"private"`.
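
The `endpoint_name` rule described in the list above (a `zenml-` prefix plus the first 8 characters of the service UUID as a suffix) can be sketched roughly as follows. This is only an illustration, not the integration's actual code: the helper name `compose_endpoint_name` and the way the parts are joined with a `-` are assumptions.

```python
from uuid import UUID

PREFIX = "zenml-"  # prefix mentioned in the `endpoint_name` description


def compose_endpoint_name(endpoint_name: str, service_uuid: UUID) -> str:
    """Hypothetical helper: add the prefix and an 8-character UUID suffix."""
    # First 8 characters of the canonical string form of the UUID.
    suffix = str(service_uuid)[:8]
    # Assumption: do not add the prefix twice if the caller already used it.
    base = endpoint_name if endpoint_name.startswith(PREFIX) else PREFIX + endpoint_name
    # Assumption: join with a single dash, avoiding a doubled "--".
    separator = "" if base.endswith("-") else "-"
    return f"{base}{separator}{suffix}"


example_uuid = UUID("12345678-1234-5678-1234-567812345678")
print(compose_endpoint_name("sentiment", example_uuid))  # zenml-sentiment-12345678
```

Keeping the suffix short matters in practice because endpoint names are usually length-limited; the exact limit is not stated here, so check the Inference Endpoints documentation before relying on long names.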

For more information and a full list of configurable attributes of the Huggingface Model Deployer, check out
the [API Docs]().
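
To make the defaults in this section concrete, here is a small self-contained sketch. `EndpointConfigSketch` is a hypothetical stand-in, not `HuggingFaceServiceConfig`: it mirrors only a few of the fields listed above so that it runs without ZenML installed, and the validation rules are assumptions about sensible values rather than checks the integration is known to perform. The repository name is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# Endpoint types named in the configuration list above.
ENDPOINT_TYPES = ("public", "protected", "private")


@dataclass
class EndpointConfigSketch:
    """Hypothetical stand-in mirroring a few of the documented fields."""

    repository: str
    task: str
    vendor: str
    region: str
    accelerator: str
    instance_size: str
    instance_type: str
    min_replica: int = 0  # documented default
    max_replica: int = 1  # documented default
    endpoint_type: str = "public"  # documented default
    revision: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed sanity checks, not taken from the integration.
        if self.min_replica < 0 or self.max_replica < self.min_replica:
            raise ValueError("expected 0 <= min_replica <= max_replica")
        if self.endpoint_type not in ENDPOINT_TYPES:
            raise ValueError(f"endpoint_type must be one of {ENDPOINT_TYPES}")


config = EndpointConfigSketch(
    repository="my-org/my-model",  # placeholder repository
    task="text-classification",
    vendor="aws",
    region="us-east-1",
    accelerator="cpu",
    instance_size="large",
    instance_type="c6i",
)
print(config.min_replica, config.max_replica, config.endpoint_type)
```

With the documented default of `min_replica=0`, the replica range starts at zero instances, which fits the pay-for-what-you-use goal mentioned earlier.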

### Run inference on a provisioned inference endpoint

The following code example shows how to run inference against a provisioned inference endpoint:

```python
from typing import Annotated

from zenml import pipeline, step
from zenml.integrations.huggingface.model_deployers import HuggingFaceModelDeployer
from zenml.integrations.huggingface.services import HuggingFaceDeploymentService


# Load a prediction service deployed in another pipeline
@step(enable_cache=False)
def prediction_service_loader(
    pipeline_name: str,
    pipeline_step_name: str,
    running: bool = True,
    model_name: str = "default",
) -> HuggingFaceDeploymentService:
    """Get the prediction service started by the deployment pipeline.

    Args:
        pipeline_name: name of the pipeline that deployed the Huggingface
            inference endpoint
        pipeline_step_name: the name of the step that deployed the
            Huggingface inference endpoint
        running: when this flag is set, the step only returns a running service
        model_name: the name of the model that is deployed
    """
    # get the Huggingface model deployer stack component
    model_deployer = HuggingFaceModelDeployer.get_active_model_deployer()

    # fetch existing services with same pipeline name, step name and model name
    existing_services = model_deployer.find_model_server(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
        model_name=model_name,
        running=running,
    )

    if not existing_services:
        raise RuntimeError(
            f"No Huggingface inference endpoint deployed by step "
            f"'{pipeline_step_name}' in pipeline '{pipeline_name}' with name "
            f"'{model_name}' is currently running."
        )

    return existing_services[0]


# Use the service for inference
@step
def predictor(
    service: HuggingFaceDeploymentService,
    data: str,
) -> Annotated[str, "predictions"]:
    """Run an inference request against a prediction service."""
    prediction = service.predict(data)
    return prediction


@pipeline
def huggingface_deployment_inference_pipeline(
    pipeline_name: str,
    pipeline_step_name: str = "huggingface_model_deployer_step",
):
    inference_data = ...
    model_deployment_service = prediction_service_loader(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
    )
    predictions = predictor(model_deployment_service, inference_data)
```

For more information and a full list of configurable attributes of the Huggingface Model Deployer, check out
the [SDK Docs]().

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
2 changes: 2 additions & 0 deletions docs/mocked_libs.json
@@ -106,6 +106,8 @@
"great_expectations.types",
"hvac",
"hvac.exceptions",
"huggingface_hub",
"huggingface_hub.utils",
"kfp",
"kfp.compiler",
"kfp.v2",
1 change: 1 addition & 0 deletions pyproject.toml
@@ -448,5 +448,6 @@ module = [
"mlstacks.*",
"matplotlib.*",
"IPython.*",
"huggingface_hub.*"
]
ignore_missing_imports = true
2 changes: 1 addition & 1 deletion src/zenml/integrations/azure/__init__.py
@@ -39,7 +39,7 @@ class AzureIntegration(Integration):

NAME = AZURE
REQUIREMENTS = [
"adlfs==2021.10.0",
"adlfs>=2021.10.0",
"azure-keyvault-keys",
"azure-keyvault-secrets",
"azure-identity==1.10.0",
2 changes: 1 addition & 1 deletion src/zenml/integrations/huggingface/__init__.py
@@ -26,7 +26,7 @@ class HuggingfaceIntegration(Integration):
"""Definition of Huggingface integration for ZenML."""

NAME = HUGGINGFACE
REQUIREMENTS = ["transformers<=4.31", "datasets", "huggingface_hub"]
REQUIREMENTS = ["transformers<=4.31", "datasets", "huggingface_hub>0.19.0"]

@classmethod
def activate(cls) -> None:
13 changes: 13 additions & 0 deletions src/zenml/integrations/huggingface/flavors/__init__.py
@@ -1,3 +1,16 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""Huggingface integration flavors."""

from zenml.integrations.huggingface.flavors.huggingface_model_deployer_flavor import ( # noqa
@@ -1,9 +1,21 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""Huggingface model deployer flavor."""
from typing import TYPE_CHECKING, Dict, Optional, Type
from typing import TYPE_CHECKING, Any, Dict, Optional, Type

from pydantic import BaseModel

from zenml.config.base_settings import BaseSettings
from zenml.integrations.huggingface import HUGGINGFACE_MODEL_DEPLOYER_FLAVOR
from zenml.model_deployers.base_model_deployer import (
BaseModelDeployerConfig,
@@ -20,7 +32,7 @@
class HuggingFaceBaseConfig(BaseModel):
"""Huggingface Inference Endpoint configuration."""

endpoint_name: Optional[str] = "zenml-"
endpoint_name: str = "zenml-"
repository: Optional[str] = None
framework: Optional[str] = None
accelerator: Optional[str] = None
@@ -30,21 +42,17 @@ class HuggingFaceBaseConfig(BaseModel):
vendor: Optional[str] = None
token: Optional[str] = None
account_id: Optional[str] = None
min_replica: Optional[int] = 0
max_replica: Optional[int] = 1
min_replica: int = 0
max_replica: int = 1
revision: Optional[str] = None
task: Optional[str] = None
custom_image: Optional[Dict] = None
custom_image: Optional[Dict[str, Any]] = None
namespace: Optional[str] = None
endpoint_type: str = "public"


class HuggingFaceModelDeployerSettings(HuggingFaceBaseConfig, BaseSettings):
"""Settings for the Huggingface model deployer."""


class HuggingFaceModelDeployerConfig(
BaseModelDeployerConfig, HuggingFaceModelDeployerSettings
BaseModelDeployerConfig, HuggingFaceBaseConfig
):
"""Configuration for the Huggingface model deployer.

14 changes: 14 additions & 0 deletions src/zenml/integrations/huggingface/model_deployers/__init__.py
@@ -1,4 +1,18 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""Initialization of the Huggingface model deployers."""

from zenml.integrations.huggingface.model_deployers.huggingface_model_deployer import ( # noqa
HuggingFaceModelDeployer,
)