
Huggingface Model Deployer #2376

Merged
48 commits merged on Mar 8, 2024

Conversation

dudeperf3ct (Contributor) commented Jan 30, 2024

Describe changes

I implemented a ModelDeployer component to work with Huggingface repos.

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop, read the contribution guide on rebasing a branch to develop.
  • If my changes require changes to the dashboard, these changes are communicated/requested.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

Summary by CodeRabbit

  • New Features
    • Introduced Huggingface integration for deploying machine learning models using Huggingface's infrastructure.
    • Added support for configuring and managing Huggingface inference endpoints.
    • New deployment functionality for Huggingface models, including creation, update, and stopping of model deployment services.
    • Implemented a pipeline step for continuous deployment with Huggingface Inference Endpoint.

coderabbitai bot (Contributor) commented Jan 30, 2024

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The update introduces Huggingface integration into ZenML, allowing for deploying machine learning models using Huggingface's infrastructure. It adds necessary configurations, implements a model deployer with methods for managing deployments, and provides a deployment service for handling inference endpoints. This integration facilitates continuous deployment pipelines within ZenML, leveraging Huggingface's capabilities for model serving.
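
For orientation, below is a minimal sketch of how the new deployer step could be wired into a ZenML pipeline. The import paths, step parameters, and configuration fields are assumptions based on the file layout in the Changes list and the review comments, not the verified merged API:

from zenml import pipeline, step
from zenml.integrations.huggingface.services import HuggingFaceServiceConfig
from zenml.integrations.huggingface.steps import huggingface_model_deployer_step


@step
def deployment_trigger() -> bool:
    # Placeholder gate; a real pipeline would compare evaluation metrics here.
    return True


@pipeline
def huggingface_deployment_pipeline():
    deploy_decision = deployment_trigger()
    huggingface_model_deployer_step(
        deploy_decision=deploy_decision,
        # Field names below are assumptions based on HuggingFaceBaseConfig.
        service_config=HuggingFaceServiceConfig(
            model_name="my-model",
            endpoint_name="my-endpoint",
            repository="my-user/my-model-repo",
            framework="pytorch",
            task="text-classification",
            accelerator="cpu",
            vendor="aws",
            region="us-east-1",
            instance_size="medium",
            instance_type="c6i",
        ),
    )


if __name__ == "__main__":
    huggingface_deployment_pipeline()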

Changes

File(s) and change summary:
  • src/zenml/integrations/huggingface/__init__.py: Introduced imports and constants for the Huggingface integration, including model deployer flavor and service artifact constants. Updated requirements to include "huggingface_hub".
  • src/zenml/integrations/huggingface/flavors/...: Added classes for Huggingface integration flavors, including model deployer configuration and base config.
  • src/zenml/integrations/huggingface/model_deployers/...: Added the Huggingface Model Deployer with methods for deployment management.
  • src/zenml/integrations/huggingface/services/...: Added the Huggingface service, implementing deployment and management functionality for Huggingface models.
  • src/zenml/integrations/huggingface/steps/...: Implemented a deployment step for continuous deployment with Huggingface, including service configuration and deployment logic.


htahir1 (Contributor) commented Jan 30, 2024

@CodeRabbit review

htahir1 (Contributor) commented Jan 30, 2024

@coderabbitai review

coderabbitai bot (Contributor) left a comment

Review Status

Actionable comments generated: 6

Configuration used: .coderabbit.yaml

Commits: files that changed from the base of the PR, between 9f31c6b and 0c1a1de.
Files selected for processing (9)
  • src/zenml/integrations/huggingface/__init__.py (1 hunks)
  • src/zenml/integrations/huggingface/flavors/__init__.py (1 hunks)
  • src/zenml/integrations/huggingface/flavors/huggingface_model_deployer_flavor.py (1 hunks)
  • src/zenml/integrations/huggingface/model_deployers/__init__.py (1 hunks)
  • src/zenml/integrations/huggingface/model_deployers/huggingface_model_deployer.py (1 hunks)
  • src/zenml/integrations/huggingface/services/__init__.py (1 hunks)
  • src/zenml/integrations/huggingface/services/huggingface_deployment.py (1 hunks)
  • src/zenml/integrations/huggingface/steps/__init__.py (1 hunks)
  • src/zenml/integrations/huggingface/steps/huggingface_deployer.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • src/zenml/integrations/huggingface/model_deployers/__init__.py
Additional comments: 22
src/zenml/integrations/huggingface/steps/__init__.py (1)
  • 3-5: The import statement is correctly structured and follows Python best practices.
src/zenml/integrations/huggingface/services/__init__.py (1)
  • 3-6: The import statement is correctly structured, adhering to Python best practices. The use of # noqa suppresses linting errors for unused imports, which is acceptable in __init__.py files where the purpose is often to expose public interfaces.
src/zenml/integrations/huggingface/flavors/__init__.py (2)
  • 3-7: The import statement is correctly structured, adhering to Python best practices. The use of # noqa suppresses linting errors for unused imports, which is acceptable in __init__.py files where the purpose is often to expose public interfaces.
  • 9-13: The __all__ declaration is correctly used to define the public interface of the module. This is a good practice as it explicitly specifies which names are intended to be imported when from module import * is used.
src/zenml/integrations/huggingface/__init__.py (3)
  • 15-22: The constants HUGGINGFACE_MODEL_DEPLOYER_FLAVOR and HUGGINGFACE_SERVICE_ARTIFACT are correctly defined and follow Python naming conventions for constants. This enhances readability and maintainability of the code.
  • 29-29: The REQUIREMENTS list correctly specifies the dependencies for the Huggingface integration, including a specific version constraint for transformers. This is important for ensuring compatibility and preventing potential conflicts with other packages.
  • 37-48: The flavors method is correctly implemented to declare stack component flavors for the Huggingface integration. This method enhances the modularity and extensibility of the ZenML framework by allowing it to dynamically recognize and utilize different deployment flavors.
src/zenml/integrations/huggingface/steps/huggingface_deployer.py (5)
  • 20-25: The decorator @step(enable_cache=False) is correctly applied to the huggingface_model_deployer_step function, indicating that caching is disabled for this step. This is appropriate for deployment steps where the outcome might depend on external state or services that are not captured in the step's inputs.
  • 39-42: The use of cast to ensure the correct type of model_deployer is a good practice for type safety. This helps maintain the integrity of the code by ensuring that model_deployer is indeed an instance of HuggingFaceModelDeployer.
  • 51-54: Modifying the service_config with runtime information from the pipeline context is a good practice. It ensures that the deployment service is correctly associated with the specific pipeline run, enhancing traceability and manageability of deployed models.
  • 64-80: The logic to reuse the last model server if the deployment decision is negative and an existing model server is not running is sound. It ensures that a model server is available at all times, which is crucial for maintaining service availability.
  • 82-95: The logic to deploy or update a model based on the deploy_decision and the existence of previous deployments is correctly implemented. This approach allows for flexibility in managing deployments and ensures that the latest model version is served.
src/zenml/integrations/huggingface/flavors/huggingface_model_deployer_flavor.py (3)
  • 20-39: The HuggingFaceBaseConfig class is well-defined with optional attributes for configuring the Huggingface Inference Endpoint. Using Optional for these attributes provides flexibility in configuration, allowing users to specify only the necessary parameters for their deployment scenario.
  • 46-56: The HuggingFaceModelDeployerConfig class correctly extends BaseModelDeployerConfig and HuggingFaceModelDeployerSettings, combining general model deployer configuration with Huggingface-specific settings. The use of SecretField for the token attribute is a good practice for handling sensitive information securely.
  • 63-122: The HuggingFaceModelDeployerFlavor class is correctly implemented with properties that define the flavor's characteristics, such as name, docs_url, sdk_docs_url, logo_url, and config_class. This implementation follows best practices for defining stack component flavors in ZenML, enhancing the framework's extensibility.
src/zenml/integrations/huggingface/services/huggingface_deployment.py (4)
  • 27-34: The HuggingFaceServiceConfig class is well-defined, extending HuggingFaceBaseConfig and ServiceConfig. This design allows for a rich configuration specific to Huggingface services while maintaining compatibility with ZenML's service management framework.
  • 41-55: The HuggingFaceDeploymentService class correctly specifies its SERVICE_TYPE with relevant metadata. This is important for the ZenML service registry to correctly identify and manage instances of this service type.
  • 61-68: The constructor of HuggingFaceDeploymentService is correctly implemented, calling the superclass constructor with the provided configuration. This ensures that the service is correctly initialized with its specific configuration.
  • 104-124: The provision method correctly implements the logic to create or update a Huggingface inference endpoint. The use of .wait(timeout=POLLING_TIMEOUT) ensures that the method waits for the endpoint to be provisioned, enhancing reliability.
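
For context, the endpoint provisioning that this service wraps can be sketched directly against the public huggingface_hub API roughly as follows; all endpoint parameters are placeholders, and the timeout value stands in for the POLLING_TIMEOUT constant mentioned above:

from huggingface_hub import create_inference_endpoint

# Create a dedicated Inference Endpoint and block until it is provisioned.
endpoint = create_inference_endpoint(
    name="my-endpoint-abc12345",         # endpoint name with the 8-char suffix
    repository="my-user/my-model-repo",  # placeholder model repository
    framework="pytorch",
    task="text-classification",
    accelerator="cpu",
    vendor="aws",
    region="us-east-1",
    instance_size="medium",
    instance_type="c6i",
    token="hf_...",                      # placeholder access token
)
endpoint.wait(timeout=1200)              # analogous to .wait(timeout=POLLING_TIMEOUT)
print(endpoint.url)
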
src/zenml/integrations/huggingface/model_deployers/huggingface_model_deployer.py (3)
  • 340-382: The _matches_search_criteria method correctly checks if an existing service matches the input criteria. This method enhances the flexibility of the find_model_server method by allowing partial matches based on provided criteria.
  • 384-418: The methods stop_model_server, start_model_server, and delete_model_server are correctly implemented to manage the lifecycle of model servers. These methods enhance the manageability of deployed models by providing straightforward mechanisms to start, stop, and delete model servers.
  • 443-457: The get_model_server_info method correctly returns implementation-specific information that might be relevant to the user. This method enhances the usability of the deployment service by providing easy access to important information such as the prediction URL.
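
To make the lifecycle described above concrete, here is a hedged sketch of how client code might locate and stop such a deployment. Method names follow ZenML's BaseModelDeployer interface as described in the review; exact signatures and the import path are assumptions:

from typing import cast

from zenml.integrations.huggingface.model_deployers import HuggingFaceModelDeployer

# Fetch the Huggingface model deployer registered in the active stack.
model_deployer = cast(
    HuggingFaceModelDeployer, HuggingFaceModelDeployer.get_active_model_deployer()
)

# Find endpoints previously deployed by a given pipeline step.
services = model_deployer.find_model_server(
    pipeline_name="huggingface_deployment_pipeline",
    pipeline_step_name="huggingface_model_deployer_step",
    running=True,
)

if services:
    # Implementation-specific info, e.g. the prediction URL.
    print(model_deployer.get_model_server_info(services[0]))
    model_deployer.stop_model_server(services[0].uuid, timeout=300)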

Comment on lines 135 to 171
def check_status(self) -> Tuple[ServiceState, str]:
    """Check the current operational state of the HuggingFace deployment.

    Returns:
        The operational state of the HuggingFace deployment and a message
        providing additional information about that state (e.g. a
        description of the error, if one is encountered).
    """
    # TODO: Support all different InferenceEndpointStatus
    try:
        _ = self.hf_endpoint.status
    except (InferenceEndpointError, HfHubHTTPError):
        return (ServiceState.INACTIVE, "")

    if self.hf_endpoint.status == InferenceEndpointStatus.RUNNING:
        return (
            ServiceState.ACTIVE,
            "HuggingFace Inference Endpoint deployment is available",
        )

    if self.hf_endpoint.status == InferenceEndpointStatus.SCALED_TO_ZERO:
        return (
            ServiceState.ACTIVE,
            "HuggingFace Inference Endpoint deployment is scaled to zero",
        )

    if self.hf_endpoint.status == InferenceEndpointStatus.FAILED:
        return (
            ServiceState.ERROR,
            "HuggingFace Inference Endpoint deployment failed: ",
        )

    if self.hf_endpoint.status == InferenceEndpointStatus.PENDING:
        return (
            ServiceState.PENDING_STARTUP,
            "HuggingFace Inference Endpoint deployment is being created: ",
        )

coderabbitai bot (Contributor) commented Jan 30, 2024

The check_status method is correctly implemented to check the operational state of the Huggingface deployment. However, the TODO comment indicates that support for all different InferenceEndpointStatus values is not yet implemented. This should be addressed to ensure comprehensive status reporting.

Consider implementing support for all InferenceEndpointStatus values to provide comprehensive status reporting for Huggingface deployments.

strickvl (Contributor) commented:

for this I think we can probably implement the other statuses, no? Those all seem reasonable statuses to want to represent somehow. WDYT?

coderabbitai bot (Contributor) replied:

@strickvl: Yes, implementing support for all InferenceEndpointStatus values would indeed enhance the comprehensiveness of status reporting for Huggingface deployments. It's a good idea to represent all possible statuses to provide clearer insights into the deployment state. I recommend looking into the huggingface_hub documentation or source code to understand all available InferenceEndpointStatus values. Then, you can map each status to the most appropriate ServiceState and provide meaningful status messages accordingly. This approach will ensure that users have a clear understanding of the operational state of their deployments at any given time. If you need further assistance or have specific questions as you implement this, feel free to ask!

Contributor commented:

@safoinme do you have an opinion about this? The proposal on the table would be:

class InferenceEndpointStatus(str, Enum):
    PENDING = "pending"              # -> PENDING_STARTUP
    INITIALIZING = "initializing"    # -> PENDING_STARTUP
    UPDATING = "updating"            # -> ACTIVE
    UPDATE_FAILED = "updateFailed"   # -> ERROR
    RUNNING = "running"              # -> ACTIVE
    PAUSED = "paused"                # not sure about this one as we don't handle it; users can manually pause an endpoint via the UI
    FAILED = "failed"                # -> ERROR
    SCALED_TO_ZERO = "scaledToZero"  # -> ACTIVE

but we'll reflect the status back to the user in the CLI and on the dashboard, so I think paused is a reasonable one to have, even if we're not manipulating that ourselves. I would consider adding 'updating' and 'paused' to the zenml status options. WDYT?
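
For illustration only, that mapping could be written as a simple lookup table. This assumes the huggingface_hub enum quoted above and ZenML's ServiceState; PAUSED is mapped to INACTIVE here purely as a placeholder until a dedicated state exists:

from huggingface_hub import InferenceEndpointStatus
from zenml.services import ServiceState

# Proposed status mapping, including the 'updating' and 'paused' cases discussed above.
HF_STATUS_TO_ZENML_STATE = {
    InferenceEndpointStatus.PENDING: ServiceState.PENDING_STARTUP,
    InferenceEndpointStatus.INITIALIZING: ServiceState.PENDING_STARTUP,
    InferenceEndpointStatus.UPDATING: ServiceState.ACTIVE,
    InferenceEndpointStatus.UPDATE_FAILED: ServiceState.ERROR,
    InferenceEndpointStatus.RUNNING: ServiceState.ACTIVE,
    InferenceEndpointStatus.PAUSED: ServiceState.INACTIVE,  # placeholder mapping
    InferenceEndpointStatus.FAILED: ServiceState.ERROR,
    InferenceEndpointStatus.SCALED_TO_ZERO: ServiceState.ACTIVE,
}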

coderabbitai bot commented:

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines 94 to 142
def _create_new_service(
    self, timeout: int, config: HuggingFaceServiceConfig
) -> HuggingFaceDeploymentService:
    """Creates a new HuggingFaceDeploymentService.

    Args:
        timeout: the timeout in seconds to wait for the Huggingface inference endpoint
            to be provisioned and successfully started or updated.
        config: the configuration of the model to be deployed with Huggingface model deployer.

    Returns:
        The HuggingFaceDeploymentService object that can be used to interact
        with the Huggingface inference endpoint.
    """
    # create a new service for the new model
    service = HuggingFaceDeploymentService(config)

    # Use first 8 characters of UUID as artifact version
    # Add same 8 characters as suffix to endpoint name
    service_metadata = service.dict()
    artifact_version = str(service_metadata["uuid"])[:8]

    service.config.endpoint_name = self.modify_endpoint_name(
        service.config.endpoint_name, artifact_version
    )

    logger.info(
        f"Creating an artifact {HUGGINGFACE_SERVICE_ARTIFACT} with service instance attached as metadata."
        " If there's an active pipeline and/or model this artifact will be associated with it."
    )

    service_metadata = service.dict()

    save_artifact(
        service,
        HUGGINGFACE_SERVICE_ARTIFACT,
        version=artifact_version,
        is_deployment_artifact=True,
    )
    # UUID object is not json serializable
    service_metadata["uuid"] = str(service_metadata["uuid"])
    log_artifact_metadata(
        artifact_name=HUGGINGFACE_SERVICE_ARTIFACT,
        artifact_version=artifact_version,
        metadata={HUGGINGFACE_SERVICE_ARTIFACT: service_metadata},
    )

    service.start(timeout=timeout)
    return service

Contributor commented:

The _create_new_service method is well-implemented, covering the creation of a new HuggingFaceDeploymentService, including modifying the endpoint name, saving the service as an artifact, and starting the service. However, the method logs the creation of an artifact before actually creating it, which might be misleading if the artifact creation fails.

Consider moving the log statement after the artifact has been successfully saved to ensure accurate logging.

avishniakov marked this pull request as draft on January 31, 2024 08:07
avishniakov marked this pull request as ready for review on January 31, 2024 08:07
strickvl added the "enhancement" (New feature or request) and "dependencies" (Pull requests that update a dependency file) labels on Jan 31, 2024
strickvl (Contributor) left a comment:

Really nice addition! I had some comments and nits etc, but the two big pieces to add would be docs updates and testing (where possible). I see, for example, that hf offers a way to emulate custom endpoints in the https://github.com/huggingface/hf-endpoints-emulator repo / package.

For docs, we'll need a new doc in the deployers section explaining exactly how to set this up on HF and how to use within ZenML.

Really excited about this one! Thanks for putting in the work!

safoinme (Contributor) left a comment:

Thank you for all the work and the effort you've put in this PR. It's definitely a great addition to our deployers. I've got a couple of small suggestions.
Can't wait to see this released!

strickvl (Contributor) commented Feb 5, 2024

@dudeperf3ct can you check the CI errors and update as appropriate? There are some linting issues and also some docstrings to be added, as far as I can see.

strickvl self-requested a review on February 8, 2024 09:38
pyproject.toml: outdated review thread (resolved)
strickvl self-requested a review on March 7, 2024 17:35
safoinme (Contributor) left a comment:

Let's :shipit:

safoinme merged commit edebc51 into zenml-io:develop on Mar 8, 2024
60 checks passed
adtygan pushed a commit to adtygan/zenml that referenced this pull request Mar 21, 2024
* Initial implementation of huggingface model deployer

* Add missing step init

* Simplify modify_endpoint_name function and fix docstrings

* Formatting logger

* Add License to new files

* Enhancements as per PR review comments

* Add logging message to catch KeyError

* Remove duplicate variable

* Reorder lines for clarity

* Add docs for huggingface model deployer

* Fix CI errors

* Fix get_model_info function arguments

* More CI fixes

* Add minimal supported version for Inference Endpoint API in huggingface_hub

* Relax 'adlfs' package requirement in azure integrations

* update TOC (zenml-io#2406)

* Relax 's3fs' version in s3 integration

* Bugs fixed running a test deployment pipeline

* Add deployment pipelines to huggingface integration test

* Remove not required check on service running in tests

* Address PR comments on documentation and suggested renaming in code

* Add partial test for huggingface_deployment

* Fix typo in test function

* Update pyproject.toml

This should allow the dependencies to resolve.

* Update pyproject.toml

* Relax gcfs

* Update model deployers table

* Fix lint issue

---------

Co-authored-by: Andrei Vishniakov <31008759+avishniakov@users.noreply.github.com>
Co-authored-by: Safoine El Khabich <34200873+safoinme@users.noreply.github.com>
Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
Labels
  • dependencies (Pull requests that update a dependency file)
  • enhancement (New feature or request)
  • run-slow-ci

Projects
None yet

Development
Successfully merging this pull request may close these issues: none yet.

5 participants