zenml-io · safoinme · Mar 8, 2024 · Jan 30, 2024 · Jan 30, 2024 · Jan 31, 2024
diff --git a/docs/book/stacks-and-components/component-guide/model-deployers/huggingface.md b/docs/book/stacks-and-components/component-guide/model-deployers/huggingface.md
@@ -0,0 +1,153 @@
+---
+description: Deploying models to Hugging Face Inference Endpoints with Hugging Face :hugging_face:.
+---
+
+# Hugging Face :hugging_face:
+
+Hugging Face Inference Endpoints provides a secure production solution to easily deploy any `transformers`, `sentence-transformers`, and `diffusers` models on a dedicated and autoscaling infrastructure managed by Hugging Face. An Inference Endpoint is built from a model from the [Hub](https://huggingface.co/models).
+
+Because Hugging Face manages the underlying infrastructure, you can deploy models without dealing with containers and GPUs.
+
+## When to use it?
+
+You should use the Hugging Face Model Deployer:
+
+* if you want to deploy [Transformers, Sentence-Transformers, or Diffusion models](https://huggingface.co/docs/inference-endpoints/supported_tasks) on dedicated and secure infrastructure.
+* if you prefer a fully-managed production solution for inference without the need to handle containers and GPUs.
+* if your goal is to turn your models into production-ready APIs with minimal infrastructure or MLOps involvement.
+* if cost-effectiveness is crucial and you want to pay only for the raw compute resources you use.
+* if enterprise security is a priority and you need to deploy models into secure offline endpoints accessible only via a direct connection to your Virtual Private Cloud (VPC).
+
+If you are looking for an easier way to deploy your models locally, you can use the [MLflow Model Deployer](mlflow.md) flavor.
+
+## How to deploy it?
+
+The Hugging Face Model Deployer flavor is provided by the Hugging Face ZenML integration, so you need to install it on your local machine to be able to deploy your models. You can do this by running the following command:
+
+```bash
+zenml integration install huggingface -y
+```
+
+To register the Hugging Face model deployer with ZenML, run the following command:
+
+```bash
+zenml model-deployer register <MODEL_DEPLOYER_NAME> --flavor=huggingface --token=<YOUR_HF_TOKEN> --namespace=<YOUR_HF_NAMESPACE>
+```
+
+Here,
+
+* the `token` parameter is the Hugging Face authentication token. You can manage it through your [Hugging Face settings](https://huggingface.co/settings/tokens).
+* the `namespace` parameter is used for listing and creating the inference endpoints. It can be a username, an organization name, or `*`, depending on where the inference endpoint should be created.
+
+We can now use the model deployer in our stack.
+
+```bash
+zenml stack update <CUSTOM_STACK_NAME> --model-deployer=<MODEL_DEPLOYER_NAME>
+```
+
+See the [huggingface_model_deployer_step](https://sdkdocs.zenml.io/latest/integration_code_docs/integrations-huggingface/#zenml.integrations.huggingface.steps.huggingface_deployer.huggingface_model_deployer_step) for an example of using the Hugging Face Model Deployer to deploy a model inside a ZenML pipeline step.
+
+## Configuration
+
+Within the `HuggingFaceServiceConfig` you can configure:
+
+* `model_name`: the name of the model in ZenML.
+* `endpoint_name`: the name of the inference endpoint. ZenML adds the prefix `zenml-` and appends the first 8 characters of the service UUID as a suffix.
+* `repository`: The repository name in the user’s namespace (`{username}/{model_id}`) or in the organization namespace (`{organization}/{model_id}`) from the Hugging Face hub.
+* `framework`: The machine learning framework used for the model (e.g. `"custom"`, `"pytorch"`)
+* `accelerator`: The hardware accelerator to be used for inference. (e.g. `"cpu"`, `"gpu"`)
+* `instance_size`: The size of the instance to be used for hosting the model (e.g. `"large"`, `"xxlarge"`)
+* `instance_type`: The instance type to use, chosen from the curated CPU and GPU instances that Inference Endpoints offers (e.g. `"c6i"`, `"g5.12xlarge"`)
+* `region`: The cloud region in which the Inference Endpoint will be created (e.g. `"us-east-1"` or `"eu-west-1"` for the `"aws"` vendor, `"eastus"` for the Microsoft Azure vendor).
+* `vendor`: The cloud provider or vendor where the Inference Endpoint will be hosted (e.g. `"aws"`).
+* `token`: The Hugging Face authentication token. It can be managed through your [Hugging Face settings](https://huggingface.co/settings/tokens). This can be the same token that was used when registering the Hugging Face model deployer.
+* `account_id`: (Optional) The account ID used to link a VPC to a private Inference Endpoint (if applicable).
+* `min_replica`: (Optional) The minimum number of replicas (instances) to keep running for the Inference Endpoint. Defaults to `0`.
+* `max_replica`: (Optional) The maximum number of replicas (instances) to scale to for the Inference Endpoint. Defaults to `1`.
+* `revision`: (Optional) The specific model revision of the Hugging Face repository to deploy on the Inference Endpoint.
+* `task`: The supported [Machine Learning Task](https://huggingface.co/docs/inference-endpoints/supported_tasks) to run (e.g. `"text-classification"`, `"text-generation"`)
+* `custom_image`: (Optional) A custom Docker image to use for the Inference Endpoint.
+* `namespace`: The namespace where the Inference Endpoint will be created. This can be the same namespace that was used when registering the Hugging Face model deployer.
+* `endpoint_type`: (Optional) The type of the Inference Endpoint, which can be `"protected"`, `"public"` (default) or `"private"`.
+
+For more information and a full list of configurable attributes of the Huggingface Model Deployer, check out
+the [API Docs]().
+
+### Run inference on a provisioned inference endpoint
+
+The following code example shows how to run inference against a provisioned inference endpoint:
+
+```python
+from typing import Annotated
+from zenml import step, pipeline
+from zenml.integrations.huggingface.model_deployers import HuggingFaceModelDeployer
+from zenml.integrations.huggingface.services import HuggingFaceDeploymentService
+
+
+# Load a prediction service deployed in another pipeline
+@step(enable_cache=False)
+def prediction_service_loader(
+    pipeline_name: str,
+    pipeline_step_name: str,
+    running: bool = True,
+    model_name: str = "default",
+) -> HuggingFaceDeploymentService:
+    """Get the prediction service started by the deployment pipeline.
+
+    Args:
+        pipeline_name: name of the pipeline that deployed the Hugging Face
+            inference endpoint
+        pipeline_step_name: the name of the step that deployed the Hugging
+            Face inference endpoint
+        running: when this flag is set, the step only returns a running service
+        model_name: the name of the model that is deployed
+    """
+    # get the Huggingface model deployer stack component
+    model_deployer = HuggingFaceModelDeployer.get_active_model_deployer()
+
+    # fetch existing services with same pipeline name, step name and model name
+    existing_services = model_deployer.find_model_server(
+        pipeline_name=pipeline_name,
+        pipeline_step_name=pipeline_step_name,
+        model_name=model_name,
+        running=running,
+    )
+
+    if not existing_services:
+        raise RuntimeError(
+            f"No Huggingface inference endpoint deployed by step "
+            f"'{pipeline_step_name}' in pipeline '{pipeline_name}' with name "
+            f"'{model_name}' is currently running."
+        )
+
+    return existing_services[0]
+
+
+# Use the service for inference
+@step
+def predictor(
+    service: HuggingFaceDeploymentService,
+    data: str
+) -> Annotated[str, "predictions"]:
+    """Run an inference request against a prediction service."""
+
+    prediction = service.predict(data)
+    return prediction
+
+
+@pipeline
+def huggingface_deployment_inference_pipeline(
+    pipeline_name: str, pipeline_step_name: str = "huggingface_model_deployer_step",
+):
+    inference_data = ...
+    model_deployment_service = prediction_service_loader(
+        pipeline_name=pipeline_name,
+        pipeline_step_name=pipeline_step_name,
+    )
+    predictions = predictor(model_deployment_service, inference_data)
+```
+
+For more information and a full list of configurable attributes of the Huggingface Model Deployer, check out
+the [SDK Docs]().
+
+<!-- For scarf -->
+<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
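The endpoint naming rule described in the Configuration section of this page (a `zenml-` prefix plus the first 8 characters of the service UUID as a suffix) can be sketched roughly as follows. This is an illustration only: `build_endpoint_name`, the `-` separator, and both constants are hypothetical names chosen here and are not taken from the ZenML source, so the real implementation may compose the name differently.

```python
import uuid

ENDPOINT_PREFIX = "zenml-"  # prefix named in the docs page
UUID_SLICE_LENGTH = 8  # "first 8 characters of service uuid"


def build_endpoint_name(endpoint_name: str, service_uuid: uuid.UUID) -> str:
    """Hypothetical helper: prefix + user-chosen name + short UUID suffix."""
    suffix = str(service_uuid)[:UUID_SLICE_LENGTH]
    # The "-" separator is an assumption made for readability.
    return f"{ENDPOINT_PREFIX}{endpoint_name}-{suffix}"


example_uuid = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(build_endpoint_name("sentiment", example_uuid))  # zenml-sentiment-12345678
```

A short UUID suffix like this keeps endpoint names unique per service while leaving the user-chosen part readable in the Hugging Face console.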
diff --git a/docs/mocked_libs.json b/docs/mocked_libs.json
@@ -106,6 +106,8 @@
     "great_expectations.types",
     "hvac",
     "hvac.exceptions",
+    "huggingface_hub",
+    "huggingface_hub.utils",
     "kfp",
     "kfp.compiler",
     "kfp.v2",

diff --git a/pyproject.toml b/pyproject.toml
@@ -448,5 +448,6 @@ module = [
     "mlstacks.*",
     "matplotlib.*",
     "IPython.*",
+    "huggingface_hub.*"
 ]
 ignore_missing_imports = true
diff --git a/src/zenml/integrations/azure/__init__.py b/src/zenml/integrations/azure/__init__.py
@@ -39,7 +39,7 @@ class AzureIntegration(Integration):
 
     NAME = AZURE
     REQUIREMENTS = [
-        "adlfs==2021.10.0",
+        "adlfs>=2021.10.0",
         "azure-keyvault-keys",
         "azure-keyvault-secrets",
         "azure-identity==1.10.0",

diff --git a/src/zenml/integrations/huggingface/__init__.py b/src/zenml/integrations/huggingface/__init__.py
@@ -26,7 +26,7 @@ class HuggingfaceIntegration(Integration):
     """Definition of Huggingface integration for ZenML."""
 
     NAME = HUGGINGFACE
-    REQUIREMENTS = ["transformers<=4.31", "datasets", "huggingface_hub"]
+    REQUIREMENTS = ["transformers<=4.31", "datasets", "huggingface_hub>0.19.0"]
 
     @classmethod
     def activate(cls) -> None:

diff --git a/src/zenml/integrations/huggingface/flavors/__init__.py b/src/zenml/integrations/huggingface/flavors/__init__.py
@@ -1,3 +1,16 @@
+#  Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at:
+#
+#       https://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+#  or implied. See the License for the specific language governing
+#  permissions and limitations under the License.
 """Huggingface integration flavors."""
 
 from zenml.integrations.huggingface.flavors.huggingface_model_deployer_flavor import (  # noqa

diff --git a/src/zenml/integrations/huggingface/flavors/huggingface_model_deployer_flavor.py b/src/zenml/integrations/huggingface/flavors/huggingface_model_deployer_flavor.py
@@ -1,9 +1,21 @@
+#  Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at:
+#
+#       https://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+#  or implied. See the License for the specific language governing
+#  permissions and limitations under the License.
 """Huggingface model deployer flavor."""
-from typing import TYPE_CHECKING, Dict, Optional, Type
+from typing import TYPE_CHECKING, Any, Dict, Optional, Type
 
 from pydantic import BaseModel
 
-from zenml.config.base_settings import BaseSettings
 from zenml.integrations.huggingface import HUGGINGFACE_MODEL_DEPLOYER_FLAVOR
 from zenml.model_deployers.base_model_deployer import (
     BaseModelDeployerConfig,
@@ -20,7 +32,7 @@
 class HuggingFaceBaseConfig(BaseModel):
     """Huggingface Inference Endpoint configuration."""
 
-    endpoint_name: Optional[str] = "zenml-"
+    endpoint_name: str = "zenml-"
     repository: Optional[str] = None
     framework: Optional[str] = None
     accelerator: Optional[str] = None
@@ -30,21 +42,17 @@ class HuggingFaceBaseConfig(BaseModel):
     vendor: Optional[str] = None
     token: Optional[str] = None
     account_id: Optional[str] = None
-    min_replica: Optional[int] = 0
-    max_replica: Optional[int] = 1
+    min_replica: int = 0
+    max_replica: int = 1
     revision: Optional[str] = None
     task: Optional[str] = None
-    custom_image: Optional[Dict] = None
+    custom_image: Optional[Dict[str, Any]] = None
     namespace: Optional[str] = None
     endpoint_type: str = "public"
 
 
-class HuggingFaceModelDeployerSettings(HuggingFaceBaseConfig, BaseSettings):
-    """Settings for the Huggingface model deployer."""
-
-
 class HuggingFaceModelDeployerConfig(
-    BaseModelDeployerConfig, HuggingFaceModelDeployerSettings
+    BaseModelDeployerConfig, HuggingFaceBaseConfig
 ):
     """Configuration for the Huggingface model deployer.
 

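A minimal stdlib sketch of what the `Optional[int]` to `int` change above appears to buy: with plain integer defaults, code that consumes the replica bounds can compare them directly without `None` checks. `EndpointConfigSketch` and its `__post_init__` check are hypothetical; the real `HuggingFaceBaseConfig` is a pydantic model, and the diff itself adds no such validation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class EndpointConfigSketch:
    """Stand-in mirroring a few HuggingFaceBaseConfig fields (not the real class)."""

    endpoint_name: str = "zenml-"
    min_replica: int = 0
    max_replica: int = 1
    custom_image: Optional[Dict[str, Any]] = None
    endpoint_type: str = "public"

    def __post_init__(self) -> None:
        # Hypothetical check: plain ints compare directly, no None handling needed.
        if self.min_replica > self.max_replica:
            raise ValueError("min_replica must not exceed max_replica")


default_cfg = EndpointConfigSketch()
scaled_cfg = EndpointConfigSketch(endpoint_name="zenml-demo", max_replica=3)
print(default_cfg.min_replica, default_cfg.max_replica)
print(scaled_cfg.max_replica)
```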
diff --git a/src/zenml/integrations/huggingface/model_deployers/__init__.py b/src/zenml/integrations/huggingface/model_deployers/__init__.py
@@ -1,4 +1,18 @@
+#  Copyright (c) ZenML GmbH 2024. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at:
+#
+#       https://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+#  or implied. See the License for the specific language governing
+#  permissions and limitations under the License.
 """Initialization of the Huggingface model deployers."""
+
 from zenml.integrations.huggingface.model_deployers.huggingface_model_deployer import (  # noqa
     HuggingFaceModelDeployer,
 )