Text Generation Inference (TGI) is a Rust, Python, and gRPC server for serving text generation models.
This notebook shows how to deploy bigscience/bloom-7b1, an open-access multilingual language model, to an Amazon SageMaker real-time endpoint with a TGI backend.
A list of model architectures optimized for hosting with TGI can be found here.
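Before deploying, the TGI container is configured through environment variables. The sketch below builds such a configuration for bloom-7b1; the variable names (`HF_MODEL_ID`, `SM_NUM_GPUS`, `MAX_INPUT_LENGTH`, `MAX_TOTAL_TOKENS`) follow the Hugging Face LLM container's documented settings, while the helper function and the specific token limits are illustrative assumptions, not part of any SDK.

```python
def tgi_env(model_id: str, num_gpus: int,
            max_input_length: int = 1024,
            max_total_tokens: int = 2048) -> dict:
    """Sketch: build the env dict you would pass as `env=` to
    sagemaker.huggingface.HuggingFaceModel when deploying with TGI.
    All values must be strings, since they become container env vars."""
    return {
        "HF_MODEL_ID": model_id,            # model to pull from the Hugging Face Hub
        "SM_NUM_GPUS": str(num_gpus),       # number of GPUs TGI shards the model across
        "MAX_INPUT_LENGTH": str(max_input_length),   # max prompt length in tokens
        "MAX_TOTAL_TOKENS": str(max_total_tokens),   # prompt + generated tokens
    }

config = tgi_env("bigscience/bloom-7b1", num_gpus=1)
print(config["HF_MODEL_ID"])  # bigscience/bloom-7b1
```

In a real deployment this dict would accompany the TGI container image (obtained via the SDK's `get_huggingface_llm_image_uri` helper) and an IAM execution role when constructing the model object and calling `.deploy()`.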