Description
The model is vicuna_13b. I used build.py to build a 2-GPU engine, and loading it fails with the following errors:
E1025 09:17:53.664847 126 backend_model.cc:553] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.
E1025 09:17:53.664952 126 model_lifecycle.cc:622] failed to load 'tensorrt_llm' version 1: Internal: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.
I1025 09:17:53.664979 126 model_lifecycle.cc:757] failed to load 'tensorrt_llm'
E1025 09:17:53.665119 126 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error: version 1 is at UNAVAILABLE state: Internal: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.;
I1025 09:17:53.665245 126 server.cc:604]