How to load llama. I tried, but I got an error #40

@UncleFB

Description

The model is vicuna_13b. I used build.py to generate a 2-GPU engine, but Triton fails to load it:

E1025 09:17:53.664847 126 backend_model.cc:553] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.
E1025 09:17:53.664952 126 model_lifecycle.cc:622] failed to load 'tensorrt_llm' version 1: Internal: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.
I1025 09:17:53.664979 126 model_lifecycle.cc:757] failed to load 'tensorrt_llm'
E1025 09:17:53.665119 126 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error: version 1 is at UNAVAILABLE state: Internal: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.;
I1025 09:17:53.665245 126 server.cc:604]
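The error states that the in-flight batching backend (`TrtGptModelInflightBatching`) requires an engine built with the GPT attention plugin, packed (padding-removed) input, and a paged KV cache. A likely fix is to rebuild the engine with those features enabled. The sketch below assumes the `build.py` from the TensorRT-LLM examples; the flag names are taken from older example scripts and may differ in your version, and the model/output paths are placeholders:

```shell
# Hypothetical rebuild of the 2-GPU vicuna_13b engine.
# --use_gpt_attention_plugin, --remove_input_padding, and --paged_kv_cache
# correspond to the three requirements named in the error message.
python build.py \
    --model_dir ./vicuna_13b \
    --dtype float16 \
    --world_size 2 \
    --use_gpt_attention_plugin float16 \
    --remove_input_padding \
    --paged_kv_cache \
    --output_dir ./engines/vicuna_13b/2-gpu
```

Alternatively, if the engine cannot be rebuilt, switching the `gpt_model_type` parameter in the `tensorrt_llm` model's config.pbtxt from in-flight batching to `V1` may avoid the `TrtGptModelInflightBatching` code path entirely; check your backend version's documentation for the supported values.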

Labels: triaged (issue has been triaged by maintainers)