huggingface / accelerate Public


Issues: huggingface/accelerate


103 Open 1,641 Closed


Issues list

Add Support for MiCS Shards on Deepspeed

#3442 opened Mar 13, 2025 by sam-h-bean

Issue with distribution strategy for sagemaker

#3440 opened Mar 13, 2025 by felix-dumont-neoxia

ppc64le version is significantly outdated.

#3436 opened Mar 11, 2025 by yaniker

Using Deepspeed zero 3 to save Checkpoint

#3435 opened Mar 11, 2025 by JYX1216

2 of 4 tasks

Issue with model generations and using GRPOTrainer with device_map='auto' passed while loading model.

#3434 opened Mar 10, 2025 by debdeepsanyal

2 of 4 tasks

Could not load random states in Pytorch >= 2.4

#3433 opened Mar 10, 2025 by XiaoyuBIE1994

2 of 4 tasks

Slowdown caused by accelerate or pytorch?

#3431 opened Mar 9, 2025 by ZeshengLiu22

Can I run a big model on several machines?

#3429 opened Mar 8, 2025 by shaofengzeng

Problem with accelerate == 1.4.0 and deepspeed ==0.16.4 when training NLP models

#3428 opened Mar 8, 2025 by CaoYiwei

4 tasks

Multi-Node Training Fails with NCCL Communication Errors on NVIDIA DGX Cloud

#3426 opened Mar 7, 2025 by mahdip72

Unwrapping for generation with FSDP

#3425 opened Mar 6, 2025 by VityaVitalich

2 of 4 tasks

Partial accelerator.prepare() Usage

#3422 opened Mar 6, 2025 by zhanglixuan0720

How to sync distributed model parameters when training in a continual learning fashion?

#3421 opened Mar 5, 2025 by Iranb

Add an option in set_seed to disable cuDNN benchmarking

#3418 opened Mar 2, 2025 by Orimalca

GPU/CPU Offloading and preload_module_classes

#3415 opened Feb 27, 2025 by Giuseppe5

1 of 4 tasks

Kubeflow + Accelerate for distributed GPU training

#3414 opened Feb 27, 2025 by githubthunder

Unable to create tensorboard log file when passing parameters through Dictionary Unpacking

#3412 opened Feb 25, 2025 by Zhuofeng-Li

2 of 4 tasks

AttributeError: 'AcceleratorState' object has no attribute 'distributed_type'

#3410 opened Feb 24, 2025 by ErwinZhou

2 of 4 tasks

How do multi-GPUs make ids?

#3407 opened Feb 23, 2025 by lilitao0517

No instructions or script available to run training DL models on multiple CPUs.

#3406 opened Feb 22, 2025 by madhavi1102

I try to train our model with stylegan-2, find a bug, how I can fix it

#3404 opened Feb 20, 2025 by lingtengqiu

Transformers test_cpu_offload tests fail with KeyError: 'xpu:0'

#3402 opened Feb 20, 2025 by dvrogozh

Save the model before SLURM kill the job

#3400 opened Feb 17, 2025 by SamuelLarkin

Something WRONG when I saving the trained model with deepspeed stage 3 optimization config

#3399 opened Feb 16, 2025 by ZYM66

2 of 4 tasks

Randomization with seed not working

#3398 opened Feb 15, 2025 by nasosger

2 of 4 tasks
