Loading checkpoint shards takes forever: llama-13b #962
Unanswered
Cherchercher asked this question in Q&A
Replies: 0 comments
What does "Loading checkpoint shards" do, and is it needed? If it is not strictly needed, how can I skip it?

I'm running the model as-is to test simple queries, but "Loading checkpoint shards" takes forever:
docker run --rm -it -p 3000:3000 ghcr.io/bentoml/openllm start llama --model-id huggyllama/llama-13b --backend pt
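For context on what that step is doing: with the PyTorch backend, Hugging Face transformers stores large models like llama-13b as several weight files ("shards") plus an index file (`pytorch_model.bin.index.json`) whose `weight_map` tells the loader which shard holds each parameter; the "Loading checkpoint shards" progress bar is each shard being read from disk into memory, so it cannot be skipped, only sped up (e.g. faster storage, half-precision weights, or the safetensors format). The sketch below uses a small hand-written index with made-up parameter names and sizes purely to illustrate the index layout; it does not download anything:

```python
# Sketch of the sharded-checkpoint index layout used by transformers
# (pytorch_model.bin.index.json). The entries below are illustrative,
# not the real llama-13b weight map.
index = {
    "metadata": {"total_size": 26031728640},
    "weight_map": {
        "model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
        "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
        "model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
        "lm_head.weight": "pytorch_model-00003-of-00003.bin",
    },
}

def params_per_shard(index: dict) -> dict:
    """Group parameter names by the shard file that stores them."""
    shards: dict = {}
    for name, shard in index["weight_map"].items():
        shards.setdefault(shard, []).append(name)
    return shards

# The loader reads each shard once and assigns its tensors to the model;
# the progress bar advances once per shard file.
shards = params_per_shard(index)
for shard, names in sorted(shards.items()):
    print(f"{shard}: {len(names)} parameter(s)")
```

In other words, the time you see is dominated by reading tens of gigabytes of weights; passing `torch_dtype=torch.float16` or using a safetensors copy of the checkpoint typically shortens it, but some load step will always run.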