Closed
Description
Currently the AsyncAPI workers poll for a single message from the SQS queue and invoke the predict function once per message. Expose configuration in the AsyncAPI spec that allows workers to retrieve multiple messages in a single SQS poll request.
Proposed solution
Update the AsyncAPI configuration to allow specifying server-side batching keys. The predictor's response should be validated to be a list. We can validate that the returned list has the same number of elements as the batch and, assuming the list elements are in the same order as the inputs, match each request id with its response.
```yaml
# api.yaml
- name: my-api
  kind: AsyncAPI
  predictor:
    ...
    server_side_batching:  # (optional)
      max_batch_size: <int>  # the maximum number of requests to aggregate before running inference
      batch_interval: <duration>  # the maximum amount of time to spend waiting for additional requests before running inference on the batch of requests
```
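A worker loop honoring these two keys could look like the sketch below. This is an illustration, not the actual implementation: `poll_batch` is a hypothetical helper, and it assumes a boto3-style SQS client whose `receive_message` accepts `MaxNumberOfMessages` (capped at 10 by SQS) and `WaitTimeSeconds` (long-poll cap of 20 seconds).

```python
import time

def poll_batch(sqs_client, queue_url, max_batch_size, batch_interval):
    """Aggregate up to max_batch_size messages, waiting at most batch_interval seconds."""
    messages = []
    deadline = time.monotonic() + batch_interval
    while len(messages) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # batch_interval elapsed; run inference on what we have
        response = sqs_client.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=min(max_batch_size - len(messages), 10),  # SQS per-call cap
            WaitTimeSeconds=min(int(remaining), 20),  # SQS long-poll cap
        )
        messages.extend(response.get("Messages", []))
    return messages
```

Because each `receive_message` call can return fewer messages than requested, the loop keeps polling until either the batch is full or the interval expires.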
The predictor needs to respond with a list of predictions, one per request in the batch.
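The validation described above could be sketched as follows; `match_predictions` is a hypothetical helper that pairs request ids with results, relying on the assumption that the predictor preserves input order.

```python
def match_predictions(request_ids, predictions):
    """Pair each request id with its prediction, assuming order is preserved."""
    if not isinstance(predictions, list):
        raise TypeError(f"predictor must return a list, got {type(predictions).__name__}")
    if len(predictions) != len(request_ids):
        raise ValueError(
            f"predictor returned {len(predictions)} results for {len(request_ids)} requests"
        )
    return dict(zip(request_ids, predictions))
```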
Questions
- Verify that `sqs_client.receive_message` waits for the specified duration if there aren't enough messages in the queue to satisfy `max_batch_size`