This repository was archived by the owner on Oct 9, 2024. It is now read-only.
Issues: huggingface/transformers-bloom-inference
#92: ds_inference succeeds but OOM when using tp_presharded_mode=True (by LiuShixing, closed Jun 1, 2023)
#89: Bloom176B RuntimeError: expected scalar type Half but found BFloat16 (by wohenniubi, closed Jun 9, 2023)
#80: Cannot generate text correctly after loading an int8 model (by moonlightian, closed Jul 8, 2023)
#79: Why does ds-inference int8 run slower than ds-inference fp16? (by DominickZhang, closed May 10, 2023)
#68: "bloom-ds-zero-inference.py" works but "inference_server.cli --deployment_framework ds_zero" fails (by richarddwang, closed Jun 17, 2024)
#59: Why is the throughput of DS-inference doubled when using 4 A100 GPUs compared to 8 A100 GPUs? (by DominickZhang, closed Apr 6, 2023)
#55: Max tokens generated remains constant regardless of the input token size (by vamsikrishnav, closed Feb 21, 2023)