
Deserializing Engine Version Mismatch #531

@LanceB57

Description


System Info

  • GPU: NVIDIA H100
  • TensorRT-LLM v0.10.0
  • tensorrtllm_backend v0.10.0
  • tritonserver 24.03

Who can help?

@byshiue @sche

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Follow this article on TensorRT-LLM performance, using v0.10.0, to build a Llama3-8B-Instruct engine. Then try to start a Triton Inference Server in the Docker container nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3; I chose the 24.03 version as suggested here.

Expected behavior

The Triton Inference Server runs.

Actual behavior

When the Triton Inference Server starts and tries to deserialize the engine, it fails with: Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed. Version tag does not match. Note: Current Version: 228, Serialized Engine Version: 237.

Additional notes

I know this is the result of a version mismatch, but how do I resolve it? Based on the release notes for TensorRT-LLM v0.10.0, I expected TensorRT-LLM to be compatible with the 24.03 tritonserver containers, but it isn't working.
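For what it's worth, the two numbers in the error can be read directly out of the log line. A minimal sketch (the regex and parse_versions helper below are my own, not part of TensorRT or Triton) that confirms the engine's serialization version (237) is newer than what the runtime in the 24.03 container understands (228), i.e. the engine was built against a newer TensorRT than the one shipped in the serving container:

```python
import re

# Error text copied verbatim from the Triton log above.
msg = ("Serialization assertion stdVersionRead == kSERIALIZATION_VERSION "
       "failed.Version tag does not match. Note: Current Version: 228, "
       "Serialized Engine Version: 237.")

# Hypothetical helper: extract the runtime ("Current") and engine
# ("Serialized Engine") serialization versions from the message.
def parse_versions(text):
    m = re.search(r"Current Version: (\d+), Serialized Engine Version: (\d+)", text)
    return int(m.group(1)), int(m.group(2))

runtime_ver, engine_ver = parse_versions(msg)
print(runtime_ver, engine_ver)  # 228 237

# The serialized engine carries a newer format version than the runtime,
# so the TensorRT in the 24.03 container cannot deserialize it; the build
# and serving environments need matching TensorRT versions.
assert engine_ver > runtime_ver
```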

Labels: bug