A fault-tolerant LLM routing system that decouples inference from AWS Bedrock by routing prefill and decode tasks through SQS and ensuring zero-downtime scaling with a graceful drain sidecar.
aws bedrock disaggregation disaggregate disaggregated-serving disaggregated-prefill-decode disaggregated-inference
-
Updated
Mar 24, 2026 - Python