How to load llama. I tried, but I got an error #40

@UncleFB

Description

The model is vicuna_13b. I used build.py to generate a 2-GPU engine, but Triton fails to load it:

E1025 09:17:53.664847 126 backend_model.cc:553] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.
E1025 09:17:53.664952 126 model_lifecycle.cc:622] failed to load 'tensorrt_llm' version 1: Internal: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.
I1025 09:17:53.664979 126 model_lifecycle.cc:757] failed to load 'tensorrt_llm'
E1025 09:17:53.665119 126 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error: version 1 is at UNAVAILABLE state: Internal: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.;
I1025 09:17:53.665245 126 server.cc:604]
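The error states that the in-flight batching backend (`TrtGptModelInflightBatching`) requires an engine built with the GPT attention plugin, packed (padding-removed) input, and a paged KV cache. A likely fix is to rebuild the engine with those features enabled. The sketch below assumes the `build.py` from the TensorRT-LLM examples; the flag names are taken from older example scripts and may differ in your version, and the model/output paths are placeholders:

```shell
# Hypothetical rebuild of the 2-GPU vicuna_13b engine.
# --use_gpt_attention_plugin, --remove_input_padding, and --paged_kv_cache
# correspond to the three requirements named in the error message.
python build.py \
    --model_dir ./vicuna_13b \
    --dtype float16 \
    --world_size 2 \
    --use_gpt_attention_plugin float16 \
    --remove_input_padding \
    --paged_kv_cache \
    --output_dir ./engines/vicuna_13b/2-gpu
```

Alternatively, if the engine cannot be rebuilt, switching the `gpt_model_type` parameter in the `tensorrt_llm` model's config.pbtxt from in-flight batching to `V1` may avoid the `TrtGptModelInflightBatching` code path entirely; check your backend version's documentation for the supported values.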

Labels: triaged (issue has been triaged by maintainers)