Implement Serverside batching for Async API #2065

Closed
@vishalbollu

Description

Currently, AsyncAPI workers poll for a single message from the SQS queue and invoke the predict function once per message. Add configuration to the AsyncAPI spec so that workers can retrieve multiple messages in a single SQS poll request.

Proposed solution

Update the AsyncAPI configuration to allow server-side batching keys to be specified. The predictor's response should be validated to be a list with the same number of elements as the input batch. The list elements are assumed to be in the same order as the inputs, so that each request ID can be matched with its response.

# api.yaml
- name: my-api
  kind: AsyncAPI
  predictor:
    ...
    server_side_batching:  # (optional)
      max_batch_size: <int>  # the maximum number of requests to aggregate before running inference
      batch_interval: <duration>  # the maximum amount of time to spend waiting for additional requests before running inference on the batch of requests

The predictor needs to respond with a list of predictions, one per request in the batch.
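The aggregation and validation logic described above could be sketched as follows. This is a minimal illustration, not the actual Cortex implementation; `collect_batch`, `run_batch`, and the `poll` callable are hypothetical names, and a real worker would poll SQS rather than an in-process callable.

```python
import time
from typing import Callable, Dict, List, Tuple

# A message is a (request_id, payload) pair.
Message = Tuple[str, dict]


def collect_batch(
    poll: Callable[[int], List[Message]],
    max_batch_size: int,
    batch_interval: float,
) -> List[Message]:
    """Aggregate up to max_batch_size messages, waiting at most
    batch_interval seconds for the batch to fill."""
    batch: List[Message] = []
    deadline = time.monotonic() + batch_interval
    while len(batch) < max_batch_size:
        msgs = poll(max_batch_size - len(batch))
        if not msgs:
            time.sleep(0.01)  # avoid busy-waiting; a real worker would long-poll SQS
        batch.extend(msgs)
        if time.monotonic() >= deadline:
            break
    return batch


def run_batch(predict: Callable[[List[dict]], list], batch: List[Message]) -> Dict[str, object]:
    """Invoke predict once on the whole batch and match responses to request IDs."""
    request_ids = [rid for rid, _ in batch]
    payloads = [payload for _, payload in batch]
    predictions = predict(payloads)
    # Validate the contract: the predictor must return a list with one
    # element per input, in the same order as the inputs.
    if not isinstance(predictions, list):
        raise TypeError("predictor must return a list when server-side batching is enabled")
    if len(predictions) != len(payloads):
        raise ValueError(
            f"predictor returned {len(predictions)} results for {len(payloads)} requests"
        )
    return dict(zip(request_ids, predictions))
```

If the predictor returns the wrong number of elements, the worker cannot safely match responses to request IDs, which is why the length check fails the whole batch rather than guessing.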

Questions

  • Verify whether `sqs_client.receive_message` waits for the specified duration when there aren't enough messages in the queue to satisfy `max_batch_size`
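Per the AWS SQS documentation, `ReceiveMessage` with a nonzero `WaitTimeSeconds` long-polls but returns as soon as at least one message is available; it does not wait until `MaxNumberOfMessages` messages have accumulated, so the worker loop itself must aggregate across multiple polls to fill `max_batch_size`. As a sketch, a hypothetical helper could clamp the batching config to the SQS API limits (at most 10 messages per call, at most 20 seconds of long polling) before passing it to boto3's `receive_message`:

```python
def receive_kwargs(queue_url: str, max_batch_size: int, wait_seconds: int) -> dict:
    """Build keyword arguments for sqs_client.receive_message, clamped to
    the SQS API limits (hypothetical helper, not part of Cortex)."""
    return {
        "QueueUrl": queue_url,
        # SQS returns at most 10 messages per ReceiveMessage call
        "MaxNumberOfMessages": min(max_batch_size, 10),
        # SQS caps long polling at 20 seconds per call
        "WaitTimeSeconds": min(wait_seconds, 20),
    }
```

Because a single call can return fewer messages than requested, filling a batch of `max_batch_size` generally requires repeated calls until the `batch_interval` deadline expires.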

Metadata

Labels

AsyncAPI (Something related to the AsyncAPI kind), enhancement (New feature or request)