This repository was archived by the owner on Oct 9, 2024. It is now read-only.

Issues: huggingface/transformers-bloom-inference

DeepSpeed runtime partition failed (#18, by lanking520, closed Oct 11, 2022)
transformers_bloom_parallel link 404 (#2, by zcrypt0, closed Sep 19, 2022)
Inference returns NaN log-probability (#38, by vinhngx, closed Feb 7, 2023)
CUDA OOM when using one GPU (#60, by xiongjun19, closed May 31, 2023)
How to run the server? (#64, by raihan0824, closed Mar 29, 2023)
Does this work for LLaMA 65B? (#98, by GradientGuru, closed Jul 31, 2023)
Inference hangs after GPU OOM (#32, by xiang-deng, closed Nov 22, 2022)