docker run --gpus 'all' errors out; are multiple GPUs not supported? #53

Open
guiniao opened this issue Jan 19, 2024 · 1 comment

Comments

guiniao commented Jan 19, 2024

On a server with two A40 GPUs, when deploying the service with GPU support:

docker run --gpus 'all' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
    --env LOG_LEVEL="info,text_generation_router=debug" \
    ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
    --model-id /data/CodeShell-7B-Chat --num-shard 1 \
    --max-total-tokens 5000 --max-input-length 4096 \
    --max-stop-sequences 12 --trust-remote-code

The error is as follows:
2024-01-19T08:15:44.995533Z ERROR warmup{max_input_length=4096 max_prefill_tokens=4096}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
Error: Warmup(Generation("Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!"))
2024-01-19T08:15:45.052858Z ERROR text_generation_launcher: Webserver Crashed
2024-01-19T08:15:45.052873Z INFO text_generation_launcher: Shutting down shards
2024-01-19T08:15:45.395141Z INFO shard-manager: text_generation_launcher: Shard terminated rank=0
Error: WebserverFailed
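
(Not from the thread, just a hedged sketch of a possible workaround.) The command asks for --num-shard 1 while exposing both GPUs, so the unsharded model can end up split across cuda:0 and cuda:1 during warmup. Pinning the container to a single GPU avoids that split; the device index 0 below is an assumption, pick whichever card is free:

# assumption: use GPU 0 only; remaining flags copied from the original command
docker run --gpus '"device=0"' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
    --env LOG_LEVEL="info,text_generation_router=debug" \
    ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
    --model-id /data/CodeShell-7B-Chat --num-shard 1 \
    --max-total-tokens 5000 --max-input-length 4096 \
    --max-stop-sequences 12 --trust-remote-code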

@lomoonmoonbird
NotImplementedError: sharded is not supported for AutoModel

It looks like that's the case; CodeShell is loaded via AutoModel.
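
(Also not from the thread, only a hedged way to confirm the diagnosis.) As far as I understand, TGI selects a dedicated, shardable implementation based on the checkpoint's model_type and falls back to the generic AutoModel path for anything it doesn't recognize, which is where the NotImplementedError above comes from. The path below is copied from the original command:

# assumption: a custom model_type not natively supported by TGI 1.0.3 means the
# server falls back to AutoModel, so --num-shard must stay at 1
grep '"model_type"' $HOME/models/CodeShell-7B-Chat/config.json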
