docker run --gpus 'all' errors out; are multiple GPUs not supported? #53

Open
guiniao opened this issue Jan 19, 2024 · 1 comment

Comments

guiniao commented Jan 19, 2024

On a server with two A40 GPUs, when deploying the service with GPU support:

docker run --gpus 'all' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
    --env LOG_LEVEL="info,text_generation_router=debug" \
    ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
    --model-id /data/CodeShell-7B-Chat --num-shard 1 \
    --max-total-tokens 5000 --max-input-length 4096 \
    --max-stop-sequences 12 --trust-remote-code

The error is as follows:
2024-01-19T08:15:44.995533Z ERROR warmup{max_input_length=4096 max_prefill_tokens=4096}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
Error: Warmup(Generation("Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!"))
2024-01-19T08:15:45.052858Z ERROR text_generation_launcher: Webserver Crashed
2024-01-19T08:15:45.052873Z INFO text_generation_launcher: Shutting down shards
2024-01-19T08:15:45.395141Z INFO shard-manager: text_generation_launcher: Shard terminated rank=0
Error: WebserverFailed
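
(Not from the thread, just a hedged sketch of a possible workaround.) The command asks for --num-shard 1 while exposing both GPUs, so the unsharded model can end up split across cuda:0 and cuda:1 during warmup. Pinning the container to a single GPU avoids that split; the device index 0 below is an assumption, pick whichever card is free:

# assumption: use GPU 0 only; remaining flags copied from the original command
docker run --gpus '"device=0"' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
    --env LOG_LEVEL="info,text_generation_router=debug" \
    ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
    --model-id /data/CodeShell-7B-Chat --num-shard 1 \
    --max-total-tokens 5000 --max-input-length 4096 \
    --max-stop-sequences 12 --trust-remote-code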

@lomoonmoonbird
NotImplementedError: sharded is not supported for AutoModel

It looks like that's the case; CodeShell is loaded via AutoModel.
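
(Also not from the thread, only a hedged way to confirm the diagnosis.) As far as I understand, TGI selects a dedicated, shardable implementation based on the checkpoint's model_type and falls back to the generic AutoModel path for anything it doesn't recognize, which is where the NotImplementedError above comes from. The path below is copied from the original command:

# assumption: a custom model_type not natively supported by TGI 1.0.3 means the
# server falls back to AutoModel, so --num-shard must stay at 1
grep '"model_type"' $HOME/models/CodeShell-7B-Chat/config.json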
