0.0.7

@rmccorm4 released this 08 May 02:58
8c491be

What's Changed

  • Sync with Triton 24.04
  • Bump TRT-LLM version to 0.9.0
  • Add support for llama-2-7b-chat, llama-3-8b, and llama-3-8b-instruct for both vLLM and TRT-LLM
  • Improve error checking and error messages when building TRT-LLM engines
  • Log the underlying convert_checkpoint.py and trtllm-build commands for reproducibility/visibility
  • Don't call convert_checkpoint.py if converted weights are already found
  • Call convert_checkpoint.py via subprocess to reduce total memory usage (see the sketch after this list)
  • Attempt to clean up failed TRT-LLM models in the model repository if import or engine building fails, rather than leaving it in an unfinished state
  • Update tests to wait for both HTTP and GRPC server endpoints to be ready before testing (see the readiness-poll sketch after this list)
    • Fixes intermittent ConnectionRefusedError in CI tests
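
The three engine-build items above (logging the exact commands, skipping conversion when converted weights already exist, and running the conversion in a subprocess) can be illustrated with a minimal sketch. The paths, flags, file-presence check, and logger name below are assumptions for illustration, not the CLI's actual implementation:

```python
import logging
import subprocess
import sys
from pathlib import Path

logger = logging.getLogger("triton_cli")  # assumed logger name


def convert_checkpoint(model_dir: Path, output_dir: Path) -> None:
    """Run convert_checkpoint.py in a child process, skipping work already done."""
    # Skip the conversion entirely if converted weights are already present
    # (checking for *.safetensors here is an assumption for this sketch).
    if any(output_dir.glob("*.safetensors")):
        logger.info("Converted weights found in %s, skipping convert_checkpoint.py", output_dir)
        return

    # Illustrative arguments only; the real convert_checkpoint.py flags vary per model.
    cmd = [
        sys.executable, "convert_checkpoint.py",
        "--model_dir", str(model_dir),
        "--output_dir", str(output_dir),
    ]
    # Log the exact command so the conversion can be reproduced outside the CLI.
    logger.info("Running: %s", " ".join(cmd))

    # Running in a subprocess keeps the conversion's memory out of the CLI
    # process and releases it as soon as the child exits.
    subprocess.run(cmd, check=True)
```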
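
The test fix amounts to a readiness poll against both endpoints before any requests are sent. Below is a minimal sketch using the tritonclient package; the default ports, timeout, and poll interval are assumptions:

```python
import time

import tritonclient.grpc as grpcclient
import tritonclient.http as httpclient


def wait_for_server_ready(timeout_s: float = 60.0) -> None:
    """Poll the HTTP and GRPC endpoints until Triton reports ready on both."""
    http_client = httpclient.InferenceServerClient("localhost:8000")  # default HTTP port
    grpc_client = grpcclient.InferenceServerClient("localhost:8001")  # default GRPC port

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if http_client.is_server_ready() and grpc_client.is_server_ready():
                return
        except Exception:
            # Connection refused while the server is still starting up; retry.
            pass
        time.sleep(1.0)
    raise TimeoutError("Triton did not become ready on both HTTP and GRPC endpoints")
```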

Full Changelog: 0.0.6...0.0.7