This repository was archived by the owner on Oct 9, 2024. It is now read-only.
Issues: huggingface/transformers-bloom-inference
#92: ds_inference succeeds but OOM when using tp_presharded_mode=True (by LiuShixing, closed Jun 1, 2023)
#89: Bloom176B RuntimeError: expected scalar type Half but found BFloat16 (by wohenniubi, closed Jun 9, 2023)
#80: Cannot generate text correctly after loading an int8 model (by moonlightian, closed Jul 8, 2023)
#79: Why does ds-inference int8 run slower than ds-inference fp16? (by DominickZhang, closed May 10, 2023)
#68: "bloom-ds-zero-inference.py" works but "inference_server.cli --deployment_framework ds_zero" fails (by richarddwang, closed Jun 17, 2024)
#59: Why is the throughput of DS-inference doubled when using 4 A100 GPUs compared to 8 A100 GPUs? (by DominickZhang, closed Apr 6, 2023)
#55: Max tokens generated remains constant regardless of the input token size (by vamsikrishnav, closed Feb 21, 2023)