Huggingface Model Deployer #2376
Conversation
Important: Auto Review Skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the `.coderabbit.yaml` file in this repository. To trigger a single review, invoke the `@coderabbitai review` command.

Walkthrough

The update introduces Huggingface integration into ZenML, allowing machine learning models to be deployed on Huggingface's infrastructure. It adds the necessary configurations, implements a model deployer with methods for managing deployments, and provides a deployment service for handling inference endpoints. This integration enables continuous deployment pipelines within ZenML, leveraging Huggingface's capabilities for model serving.
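For orientation, here is a minimal sketch of how the new pieces might be wired together in a pipeline. The step and config names come from this PR, but the specific config fields and step parameters shown are illustrative assumptions, not the final API:

```python
# Sketch only: names come from this PR, but the exact config fields and
# step parameters are illustrative assumptions.
from zenml import pipeline
from zenml.integrations.huggingface.services import HuggingFaceServiceConfig
from zenml.integrations.huggingface.steps import huggingface_model_deployer_step


@pipeline(enable_cache=False)
def huggingface_deployment_pipeline():
    # Describe the Inference Endpoint that should back the deployment.
    service_config = HuggingFaceServiceConfig(
        model_name="my-model",        # assumed field
        endpoint_name="my-endpoint",  # assumed field
    )
    # Deploy a new endpoint (or re-use an existing one) as a pipeline step.
    huggingface_model_deployer_step(
        service_config=service_config,
        deploy_decision=True,
    )
```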
Review Status
Actionable comments generated: 6
Configuration used: .coderabbit.yaml
Files selected for processing (9)
- src/zenml/integrations/huggingface/__init__.py (1 hunks)
- src/zenml/integrations/huggingface/flavors/__init__.py (1 hunks)
- src/zenml/integrations/huggingface/flavors/huggingface_model_deployer_flavor.py (1 hunks)
- src/zenml/integrations/huggingface/model_deployers/__init__.py (1 hunks)
- src/zenml/integrations/huggingface/model_deployers/huggingface_model_deployer.py (1 hunks)
- src/zenml/integrations/huggingface/services/__init__.py (1 hunks)
- src/zenml/integrations/huggingface/services/huggingface_deployment.py (1 hunks)
- src/zenml/integrations/huggingface/steps/__init__.py (1 hunks)
- src/zenml/integrations/huggingface/steps/huggingface_deployer.py (1 hunks)
Files skipped from review due to trivial changes (1)
- src/zenml/integrations/huggingface/model_deployers/__init__.py
Additional comments: 22
src/zenml/integrations/huggingface/steps/__init__.py (1)

- 3-5: The import statement is correctly structured and follows Python best practices.

src/zenml/integrations/huggingface/services/__init__.py (1)

- 3-6: The import statement is correctly structured, adhering to Python best practices. The use of `# noqa` suppresses linting errors for unused imports, which is acceptable in `__init__.py` files where the purpose is often to expose public interfaces.
src/zenml/integrations/huggingface/flavors/__init__.py (2)

- 3-7: The import statement is correctly structured, adhering to Python best practices. The use of `# noqa` suppresses linting errors for unused imports, which is acceptable in `__init__.py` files where the purpose is often to expose public interfaces.
- 9-13: The `__all__` declaration is correctly used to define the public interface of the module. This is a good practice as it explicitly specifies which names are intended to be imported when `from module import *` is used.
src/zenml/integrations/huggingface/__init__.py (3)

- 15-22: The constants `HUGGINGFACE_MODEL_DEPLOYER_FLAVOR` and `HUGGINGFACE_SERVICE_ARTIFACT` are correctly defined and follow Python naming conventions for constants. This enhances readability and maintainability of the code.
- 29-29: The `REQUIREMENTS` list correctly specifies the dependencies for the Huggingface integration, including a specific version constraint for `transformers`. This is important for ensuring compatibility and preventing potential conflicts with other packages.
- 37-48: The `flavors` method is correctly implemented to declare stack component flavors for the Huggingface integration. This method enhances the modularity and extensibility of the ZenML framework by allowing it to dynamically recognize and utilize different deployment flavors (see the sketch after this list).
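For readers unfamiliar with ZenML integrations, the declaration pattern the comment refers to looks roughly like this. This is a sketch, and the requirement pins shown are placeholders rather than the versions used in the PR:

```python
from typing import List, Type

from zenml.integrations.integration import Integration
from zenml.stack import Flavor

HUGGINGFACE = "huggingface"  # integration name constant


class HuggingfaceIntegration(Integration):
    """Sketch of the Huggingface integration declaration."""

    NAME = HUGGINGFACE
    # Placeholder pins; the PR constrains `transformers` to a specific range.
    REQUIREMENTS = ["transformers", "huggingface_hub"]

    @classmethod
    def flavors(cls) -> List[Type[Flavor]]:
        """Declare the stack component flavors shipped by this integration."""
        # Imported lazily so the integration can be registered without its
        # heavy dependencies being installed.
        from zenml.integrations.huggingface.flavors import (
            HuggingFaceModelDeployerFlavor,
        )

        return [HuggingFaceModelDeployerFlavor]
```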
src/zenml/integrations/huggingface/steps/huggingface_deployer.py (5)

- 20-25: The decorator `@step(enable_cache=False)` is correctly applied to the `huggingface_model_deployer_step` function, indicating that caching is disabled for this step. This is appropriate for deployment steps where the outcome might depend on external state or services that are not captured in the step's inputs.
- 39-42: The use of `cast` to ensure the correct type of `model_deployer` is a good practice for type safety. This helps maintain the integrity of the code by ensuring that `model_deployer` is indeed an instance of `HuggingFaceModelDeployer`.
- 51-54: Modifying the `service_config` with runtime information from the pipeline context is a good practice. It ensures that the deployment service is correctly associated with the specific pipeline run, enhancing traceability and manageability of deployed models.
- 64-80: The logic to reuse the last model server if the deployment decision is negative and an existing model server is not running is sound. It ensures that a model server is available at all times, which is crucial for maintaining service availability.
- 82-95: The logic to deploy or update a model based on the `deploy_decision` and the existence of previous deployments is correctly implemented. This approach allows for flexibility in managing deployments and ensures that the latest model version is served (the flow is sketched below).
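The deploy-or-reuse flow described in the last two comments can be condensed as follows. Names and signatures are assumed from the PR and the base `ModelDeployer` interface, not quoted verbatim:

```python
def deploy_or_reuse(model_deployer, config, deploy_decision: bool, timeout: int):
    """Sketch of the deploy-or-reuse decision inside the deployer step."""
    existing_services = model_deployer.find_model_server(
        pipeline_name=config.pipeline_name,
        pipeline_step_name=config.pipeline_step_name,
        model_name=config.model_name,
    )

    if not deploy_decision and existing_services:
        # Deployment was declined: keep serving from the last model server
        # so a prediction endpoint stays available at all times.
        service = existing_services[0]
        if not service.is_running:
            service.start(timeout=timeout)
        return service

    # Deploy a new endpoint, or update an existing one in place.
    return model_deployer.deploy_model(config, replace=True, timeout=timeout)
```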
src/zenml/integrations/huggingface/flavors/huggingface_model_deployer_flavor.py (3)

- 20-39: The `HuggingFaceBaseConfig` class is well-defined with optional attributes for configuring the Huggingface Inference Endpoint. Using `Optional` for these attributes provides flexibility in configuration, allowing users to specify only the necessary parameters for their deployment scenario.
- 46-56: The `HuggingFaceModelDeployerConfig` class correctly extends `BaseModelDeployerConfig` and `HuggingFaceModelDeployerSettings`, combining general model deployer configuration with Huggingface-specific settings. The use of `SecretField` for the `token` attribute is a good practice for handling sensitive information securely.
- 63-122: The `HuggingFaceModelDeployerFlavor` class is correctly implemented with properties that define the flavor's characteristics, such as `name`, `docs_url`, `sdk_docs_url`, `logo_url`, and `config_class`. This implementation follows best practices for defining stack component flavors in ZenML, enhancing the framework's extensibility (a condensed sketch follows this list).
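A condensed sketch of the config classes described above. The field set is abbreviated and partly assumed, and in the PR the deployer config also inherits from `HuggingFaceModelDeployerSettings`:

```python
from typing import Optional

from pydantic import BaseModel

from zenml.model_deployers.base_model_deployer import BaseModelDeployerConfig
from zenml.utils.secret_utils import SecretField


class HuggingFaceBaseConfig(BaseModel):
    """Optional Inference Endpoint knobs (abbreviated)."""

    repository: Optional[str] = None
    accelerator: Optional[str] = None
    instance_size: Optional[str] = None
    region: Optional[str] = None
    vendor: Optional[str] = None


class HuggingFaceModelDeployerConfig(BaseModelDeployerConfig):
    """Deployer config; the token is marked as a secret, not stored in plain text."""

    token: Optional[str] = SecretField()
    namespace: Optional[str] = None
```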
src/zenml/integrations/huggingface/services/huggingface_deployment.py (4)

- 27-34: The `HuggingFaceServiceConfig` class is well-defined, extending `HuggingFaceBaseConfig` and `ServiceConfig`. This design allows for a rich configuration specific to Huggingface services while maintaining compatibility with ZenML's service management framework.
- 41-55: The `HuggingFaceDeploymentService` class correctly specifies its `SERVICE_TYPE` with relevant metadata. This is important for the ZenML service registry to correctly identify and manage instances of this service type.
- 61-68: The constructor of `HuggingFaceDeploymentService` is correctly implemented, calling the superclass constructor with the provided configuration. This ensures that the service is correctly initialized with its specific configuration.
- 104-124: The `provision` method correctly implements the logic to create or update a Huggingface inference endpoint. The use of `.wait(timeout=POLLING_TIMEOUT)` ensures that the method waits for the endpoint to be provisioned, enhancing reliability (see the sketch after this list).
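For context, provisioning maps onto `huggingface_hub`'s Inference Endpoint API, roughly as below. All argument values are placeholders, and `POLLING_TIMEOUT` stands in for the constant referenced in the comment above:

```python
from huggingface_hub import create_inference_endpoint

POLLING_TIMEOUT = 1200  # placeholder for the constant used in the PR

# Placeholder arguments throughout; pick values that match your account.
endpoint = create_inference_endpoint(
    "zenml-example-endpoint",
    repository="distilbert-base-uncased",
    framework="pytorch",
    task="text-classification",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x2",
    instance_type="intel-icl",
    token="hf_...",  # a user access token with Inference Endpoints access
)
# Block until the endpoint leaves the pending/initializing states.
endpoint.wait(timeout=POLLING_TIMEOUT)
print(endpoint.url)
```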
src/zenml/integrations/huggingface/model_deployers/huggingface_model_deployer.py (3)

- 340-382: The `_matches_search_criteria` method correctly checks if an existing service matches the input criteria. This method enhances the flexibility of the `find_model_server` method by allowing partial matches based on provided criteria.
- 384-418: The methods `stop_model_server`, `start_model_server`, and `delete_model_server` are correctly implemented to manage the lifecycle of model servers. These methods enhance the manageability of deployed models by providing straightforward mechanisms to start, stop, and delete model servers.
- 443-457: The `get_model_server_info` method correctly returns implementation-specific information that might be relevant to the user. This method enhances the usability of the deployment service by providing easy access to important information such as the prediction URL (a usage sketch follows this list).
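Taken together, these lifecycle methods support interactions like the following. The sketch uses the generic model deployer interface, and the pipeline/step names are placeholders:

```python
from zenml.client import Client

# Fetch the model deployer registered in the active stack.
model_deployer = Client().active_stack.model_deployer

# Look up servers by partial criteria (placeholder names).
services = model_deployer.find_model_server(
    pipeline_name="huggingface_deployment_pipeline",
    pipeline_step_name="huggingface_model_deployer_step",
    running=True,
)

if services:
    service = services[0]
    # Implementation-specific details such as the prediction URL.
    print(model_deployer.get_model_server_info(service))
    # Stop, and eventually delete, the endpoint once it is no longer needed.
    model_deployer.stop_model_server(service.uuid, timeout=300)
    model_deployer.delete_model_server(service.uuid)
```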
```python
def check_status(self) -> Tuple[ServiceState, str]:
    """Check the current operational state of the HuggingFace deployment.

    Returns:
        The operational state of the HuggingFace deployment and a message
        providing additional information about that state (e.g. a
        description of the error, if one is encountered).
    """
    # TODO: Support all different InferenceEndpointStatus
    try:
        _ = self.hf_endpoint.status
    except (InferenceEndpointError, HfHubHTTPError):
        return (ServiceState.INACTIVE, "")

    if self.hf_endpoint.status == InferenceEndpointStatus.RUNNING:
        return (
            ServiceState.ACTIVE,
            "HuggingFace Inference Endpoint deployment is available",
        )

    if self.hf_endpoint.status == InferenceEndpointStatus.SCALED_TO_ZERO:
        return (
            ServiceState.ACTIVE,
            "HuggingFace Inference Endpoint deployment is scaled to zero",
        )

    if self.hf_endpoint.status == InferenceEndpointStatus.FAILED:
        return (
            ServiceState.ERROR,
            "HuggingFace Inference Endpoint deployment failed: ",
        )

    if self.hf_endpoint.status == InferenceEndpointStatus.PENDING:
        return (
            ServiceState.PENDING_STARTUP,
            "HuggingFace Inference Endpoint deployment is being created: ",
        )
```
The `check_status` method is correctly implemented to check the operational state of the Huggingface deployment. However, the TODO comment indicates that support for all the different `InferenceEndpointStatus` values is not yet implemented. This should be addressed to ensure comprehensive status reporting.

Consider implementing support for all `InferenceEndpointStatus` values to provide comprehensive status reporting for Huggingface deployments.
for this I think we can probably implement the other statuses, no? Those all seem reasonable statuses to want to represent somehow. WDYT?
@strickvl: Yes, implementing support for all `InferenceEndpointStatus` values would indeed enhance the comprehensiveness of status reporting for Huggingface deployments. It's a good idea to represent all possible statuses to provide clearer insights into the deployment state. I recommend looking into the `huggingface_hub` documentation or source code to understand all available `InferenceEndpointStatus` values. Then, you can map each status to the most appropriate `ServiceState` and provide meaningful status messages accordingly. This approach will ensure that users have a clear understanding of the operational state of their deployments at any given time. If you need further assistance or have specific questions as you implement this, feel free to ask!
@safoinme do you have an opinion about this? The proposal on the table would be:

```python
class InferenceEndpointStatus(str, Enum):
    PENDING = "pending"              # --> PENDING_STARTUP
    INITIALIZING = "initializing"    # --> PENDING_STARTUP
    UPDATING = "updating"            # --> ACTIVE
    UPDATE_FAILED = "updateFailed"   # --> ERROR
    RUNNING = "running"              # --> ACTIVE
    PAUSED = "paused"                # not sure about this one as we don't handle it; users can manually pause an endpoint via the UI
    FAILED = "failed"                # --> ERROR
    SCALED_TO_ZERO = "scaledToZero"  # --> ACTIVE
```
But we'll reflect the status back to the user in the CLI and on the dashboard, so I think `paused` is a reasonable one to have, even if we're not manipulating that ourselves. I would consider adding 'updating' and 'paused' to the ZenML status options. WDYT?
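One way to realize this proposal is a lookup table instead of chained `if` blocks. This is only a sketch of the mapping under discussion; whether ZenML grows dedicated `UPDATING`/`PAUSED` states (here `PAUSED` is tentatively folded into `INACTIVE`) is exactly the open question:

```python
from typing import Dict, Tuple

from huggingface_hub import InferenceEndpointStatus

from zenml.services import ServiceState

# Sketch of the proposed status mapping; PAUSED is tentatively mapped to
# INACTIVE and UPDATING to ACTIVE, pending the outcome of this discussion.
ENDPOINT_STATE_MAP: Dict[InferenceEndpointStatus, Tuple[ServiceState, str]] = {
    InferenceEndpointStatus.PENDING: (ServiceState.PENDING_STARTUP, "being created"),
    InferenceEndpointStatus.INITIALIZING: (ServiceState.PENDING_STARTUP, "initializing"),
    InferenceEndpointStatus.UPDATING: (ServiceState.ACTIVE, "updating"),
    InferenceEndpointStatus.UPDATE_FAILED: (ServiceState.ERROR, "update failed"),
    InferenceEndpointStatus.RUNNING: (ServiceState.ACTIVE, "available"),
    InferenceEndpointStatus.PAUSED: (ServiceState.INACTIVE, "paused manually via the UI"),
    InferenceEndpointStatus.FAILED: (ServiceState.ERROR, "failed"),
    InferenceEndpointStatus.SCALED_TO_ZERO: (ServiceState.ACTIVE, "scaled to zero"),
}
```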
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
```python
def _create_new_service(
    self, timeout: int, config: HuggingFaceServiceConfig
) -> HuggingFaceDeploymentService:
    """Creates a new HuggingFaceDeploymentService.

    Args:
        timeout: the timeout in seconds to wait for the Huggingface
            inference endpoint to be provisioned and successfully
            started or updated.
        config: the configuration of the model to be deployed with
            the Huggingface model deployer.

    Returns:
        The HuggingFaceDeploymentService object that can be used to
        interact with the Huggingface inference endpoint.
    """
    # create a new service for the new model
    service = HuggingFaceDeploymentService(config)

    # Use first 8 characters of UUID as artifact version
    # Add same 8 characters as suffix to endpoint name
    service_metadata = service.dict()
    artifact_version = str(service_metadata["uuid"])[:8]

    service.config.endpoint_name = self.modify_endpoint_name(
        service.config.endpoint_name, artifact_version
    )

    logger.info(
        f"Creating an artifact {HUGGINGFACE_SERVICE_ARTIFACT} with service instance attached as metadata."
        " If there's an active pipeline and/or model this artifact will be associated with it."
    )

    service_metadata = service.dict()

    save_artifact(
        service,
        HUGGINGFACE_SERVICE_ARTIFACT,
        version=artifact_version,
        is_deployment_artifact=True,
    )
    # UUID object is not json serializable
    service_metadata["uuid"] = str(service_metadata["uuid"])
    log_artifact_metadata(
        artifact_name=HUGGINGFACE_SERVICE_ARTIFACT,
        artifact_version=artifact_version,
        metadata={HUGGINGFACE_SERVICE_ARTIFACT: service_metadata},
    )

    service.start(timeout=timeout)
    return service
```
The `_create_new_service` method is well implemented, covering the creation of a new `HuggingFaceDeploymentService`, including modifying the endpoint name, saving the service as an artifact, and starting the service. However, the method logs the creation of an artifact before actually creating it, which might be misleading if the artifact creation fails.

Consider moving the log statement after the artifact has been successfully saved to ensure accurate logging.
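Concretely, the suggestion amounts to reordering the two calls, for example:

```python
# Save first, then log, so the message is only emitted on success.
save_artifact(
    service,
    HUGGINGFACE_SERVICE_ARTIFACT,
    version=artifact_version,
    is_deployment_artifact=True,
)
logger.info(
    f"Created an artifact {HUGGINGFACE_SERVICE_ARTIFACT} with the service "
    "instance attached as metadata."
)
```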
Really nice addition! I had some comments and nits etc., but the two big pieces to add would be docs updates and testing (where possible). I see, for example, that HF offers a way to emulate custom endpoints in the https://github.com/huggingface/hf-endpoints-emulator repo/package.
For docs, we'll need a new doc in the deployers section explaining exactly how to set this up on HF and how to use it within ZenML.
Really excited about this one! Thanks for putting in the work!
Thank you for all the work and effort you've put into this PR. It's definitely a great addition to our deployers. I've got a couple of small suggestions.
Can't wait to see this released!
@dudeperf3ct can you check the CI errors and update as appropriate? There are some linting issues and also some docstrings to be added, as far as I can see.
Force-pushed from b500b29 to 6eb5b64 ("This should allow the dependencies to resolve.")
Let's
* Initial implementation of huggingface model deployer
* Add missing step init
* Simplify modify_endpoint_name function and fix docstrings
* Formatting logger
* Add License to new files
* Enhancements as per PR review comments
* Add logging message to catch KeyError
* Remove duplicate variable
* Reorder lines for clarity
* Add docs for huggingface model deployer
* Fix CI errors
* Fix get_model_info function arguments
* More CI fixes
* Add minimal supported version for Inference Endpoint API in huggingface_hub
* Relax 'adlfs' package requirement in azure integrations
* update TOC (zenml-io#2406)
* Relax 's3fs' version in s3 integration
* Bugs fixed running a test deployment pipeline
* Add deployment pipelines to huggingface integration test
* Remove not required check on service running in tests
* Address PR comments on documentation and suggested renaming in code
* Add partial test for huggingface_deployment
* Fix typo in test function
* Update pyproject.toml (This should allow the dependencies to resolve.)
* Update pyproject.toml
* Relax gcfs
* Update model deployers table
* Fix lint issue

Co-authored-by: Andrei Vishniakov <31008759+avishniakov@users.noreply.github.com>
Co-authored-by: Safoine El Khabich <34200873+safoinme@users.noreply.github.com>
Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
Describe changes
I implemented the ModelDeployer component to work with Huggingface repos.
Pre-requisites
Please ensure you have done the following:
- My branch is based on `develop` and the open PR is targeting `develop`. If your branch wasn't based on develop, read the Contribution guide on rebasing your branch to develop.

Types of changes
Summary by CodeRabbit