
Multi GPU with PEFT on LLM #102

Merged: 29 commits into main on Jun 27, 2024
Conversation

avishniakov
Contributor

@avishniakov avishniakov commented Apr 17, 2024

This PR adds a multi-GPU DDP showcase to the PEFT training example. It uses the function helper from ZenML core; the other changes mostly relate to Accelerate specifically.

ZenML core companion PR: zenml-io/zenml#2746 (blocks merging of this one)
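For context, launching DDP training through Accelerate from a pipeline typically means assembling an `accelerate launch`-style command line. A minimal sketch of that idea (the helper name `build_accelerate_cmd` is hypothetical and is not the ZenML core helper this PR uses):

```python
import sys

def build_accelerate_cmd(script: str, num_processes: int, *script_args: str) -> list[str]:
    # Hypothetical helper: assemble an `accelerate launch`-equivalent command
    # line for multi-GPU DDP. The real ZenML helper works differently.
    return [
        sys.executable, "-m", "accelerate.commands.launch",
        "--multi_gpu",
        f"--num_processes={num_processes}",
        script,
        *script_args,
    ]

cmd = build_accelerate_cmd("finetune.py", 2, "--use_peft")
print(cmd[3:5])  # ['--multi_gpu', '--num_processes=2']
```

The command would then be handed to a subprocess runner, which is exactly the point debated below.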


dagshub bot commented Apr 17, 2024

@avishniakov avishniakov changed the title [WIP] Multi GPU with PEFT on LLM Multi GPU with PEFT on LLM May 3, 2024
@avishniakov avishniakov marked this pull request as ready for review May 3, 2024 14:24
@avishniakov
Contributor Author

@schustmi @htahir1 you are optional reviewers, just in case you have interest 🙂

@strickvl strickvl added enhancement New feature or request internal labels May 3, 2024
Contributor

@htahir1 htahir1 left a comment


I haven't given this a fair shake, but in general, what is the better way of doing this vs. using subprocess? :-D Any ideas?

llm-lora-finetuning/README.md (review comments outdated, resolved)
llm-lora-finetuning/utils/cuda.py (review comments outdated, resolved)
@avishniakov
Contributor Author

I haven't given this a fair shake, but in general, what is the better way of doing this vs. using subprocess? :-D Any ideas?

Not sure if this answers your question, but I plan to extend the ZenML core to provide automated capabilities for creating the wrappers and making the calls. https://zenml.atlassian.net/jira/software/c/projects/OSSK/boards/13?selectedIssue=OSSK-535
What, in general, is off with subprocessing in your opinion?

@htahir1
Contributor

htahir1 commented May 6, 2024

@avishniakov Personally I feel they are quite unstable and unreliable... a better way would be to use the internal library and do this in code, right?

@avishniakov
Contributor Author

@avishniakov Personally I feel they are quite unstable and unreliable... a better way would be to use the internal library and do this in code, right?

In theory, we can hack around the accelerate.commands.launch module, but that module will still call subprocess.Popen for you, so you cannot get away from subprocessing anyway. I will explore whether we can use the module directly.
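To illustrate the point being made: even the "in-code" route through Accelerate's launch module bottoms out in a child process, since the module itself invokes subprocess.Popen. A hedged sketch of the thin wrapper pattern under discussion (`run_launcher` is illustrative, not code from this PR; a trivial Python command stands in for a real `accelerate launch` invocation):

```python
import subprocess
import sys

def run_launcher(cmd: list[str]) -> tuple[int, str]:
    # Illustrative wrapper: since even accelerate's launch module ends up in
    # subprocess.Popen, a wrapper like this can at least capture combined
    # output and surface the return code to the calling step.
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    out, _ = proc.communicate()
    return proc.returncode, out

# Stand-in for an `accelerate launch ...` command line:
rc, out = run_launcher([sys.executable, "-c", "print('worker ok')"])
print(rc)  # 0
```

The design trade-off is then not "subprocess vs. no subprocess" but where the process boundary is managed: by the CLI, or by a wrapper the pipeline controls.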

@avishniakov
Contributor Author

I reworked the function preparation in this project fairly heavily. It is tightly coupled with the changes on the ZenML side.

Looking forward to some conceptual feedback. There are definitely a few weak points:

  • The cache-invalidation mechanism is far from perfect, due to the use of an external function.
  • The calls are handled via a function from inside the step. It is not straightforward to make the step a "script-function" by itself; this is doable but would need more effort and a shake-up of how we work with steps in the core.
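On the first weak point: one common workaround (a sketch of the general idea, not what this PR implements) is to fold a fingerprint of the external function into the step's cache key, so that editing the function invalidates the cached step. Here the fingerprint is derived from the compiled code object; a real implementation would more likely hash the source file contents:

```python
import hashlib

def code_fingerprint(fn) -> str:
    # Hash the function's bytecode plus its constants as a cheap change
    # detector; include this value in the step's cache key so changing
    # the external function busts the cache.
    payload = fn.__code__.co_code + repr(fn.__code__.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

def external_train_fn():
    # Hypothetical stand-in for the external training function.
    return "peft-ddp-training"

fp = code_fingerprint(external_train_fn)
print(fp)  # 16 hex characters, stable for unchanged code
```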

@avishniakov avishniakov mentioned this pull request May 14, 2024
9 tasks
Base automatically changed from feature/OSSK-499-llm-finetune-with-peft to main May 23, 2024 14:42
@avishniakov
Contributor Author

To be merged after 0.58.3/0.59.0 is released

@avishniakov avishniakov merged commit f733589 into main Jun 27, 2024
1 of 3 checks passed
@avishniakov avishniakov deleted the feature/OSSK-514-multi-gpu-with-peft branch June 27, 2024 07:56
Labels: enhancement (New feature or request), internal