
Multi GPU with PEFT on LLM #102

Merged: 29 commits into main on Jun 27, 2024
Conversation

avishniakov
Contributor

@avishniakov avishniakov commented Apr 17, 2024

This PR adds a multi-GPU DDP showcase to the PEFT training example. It uses the function helper from ZenML core; the other changes mostly relate to Accelerate specifically.

ZenML core companion PR: zenml-io/zenml#2746 (blocks merging of this one)
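For context, launching DDP training through Accelerate from a pipeline typically means assembling an `accelerate launch`-style command line. A minimal sketch of that idea (the helper name `build_accelerate_cmd` is hypothetical and is not the ZenML core helper this PR uses):

```python
import sys

def build_accelerate_cmd(script: str, num_processes: int, *script_args: str) -> list[str]:
    # Hypothetical helper: assemble an `accelerate launch`-equivalent command
    # line for multi-GPU DDP. The real ZenML helper works differently.
    return [
        sys.executable, "-m", "accelerate.commands.launch",
        "--multi_gpu",
        f"--num_processes={num_processes}",
        script,
        *script_args,
    ]

cmd = build_accelerate_cmd("finetune.py", 2, "--use_peft")
print(cmd[3:5])  # ['--multi_gpu', '--num_processes=2']
```

The command would then be handed to a subprocess runner, which is exactly the point debated below.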


dagshub bot commented Apr 17, 2024

@avishniakov avishniakov changed the title [WIP] Multi GPU with PEFT on LLM Multi GPU with PEFT on LLM May 3, 2024
@avishniakov avishniakov marked this pull request as ready for review May 3, 2024 14:24
@avishniakov
Contributor Author

@schustmi @htahir1 you are optional reviewers, just in case you have interest 🙂

@strickvl strickvl added enhancement New feature or request internal labels May 3, 2024
Contributor

@htahir1 htahir1 left a comment


I haven't given this a fair shake, but in general, what is the better way of doing this vs. using subprocess? :-D Any ideas?

llm-lora-finetuning/README.md (review comments outdated, resolved)
llm-lora-finetuning/utils/cuda.py (review comments outdated, resolved)
@avishniakov
Contributor Author

I haven't given this a fair shake, but in general, what is the better way of doing this vs. using subprocess? :-D Any ideas?

Not sure if this answers your question, but I plan to extend the ZenML core to provide automated capabilities for creating the wrappers and making the calls. https://zenml.atlassian.net/jira/software/c/projects/OSSK/boards/13?selectedIssue=OSSK-535
What, in general, is off with subprocessing in your opinion?

@htahir1
Contributor

htahir1 commented May 6, 2024

@avishniakov Personally I feel they are quite unstable and unreliable... a better way would be to use the internal library and do this in code, right?

@avishniakov
Contributor Author

@avishniakov Personally I feel they are quite unstable and unreliable... a better way would be to use the internal library and do this in code, right?

In theory, we can hack around the accelerate.commands.launch module, but that module will still call subprocess.Popen for you, so you cannot get away from subprocessing anyway. I will explore whether we can use the module directly.
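To illustrate the point being made: even the "in-code" route through Accelerate's launch module bottoms out in a child process, since the module itself invokes subprocess.Popen. A hedged sketch of the thin wrapper pattern under discussion (`run_launcher` is illustrative, not code from this PR; a trivial Python command stands in for a real `accelerate launch` invocation):

```python
import subprocess
import sys

def run_launcher(cmd: list[str]) -> tuple[int, str]:
    # Illustrative wrapper: since even accelerate's launch module ends up in
    # subprocess.Popen, a wrapper like this can at least capture combined
    # output and surface the return code to the calling step.
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    out, _ = proc.communicate()
    return proc.returncode, out

# Stand-in for an `accelerate launch ...` command line:
rc, out = run_launcher([sys.executable, "-c", "print('worker ok')"])
print(rc)  # 0
```

The design trade-off is then not "subprocess vs. no subprocess" but where the process boundary is managed: by the CLI, or by a wrapper the pipeline controls.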

@avishniakov
Contributor Author

I reworked the function preparation in this project fairly heavily. It is tightly coupled with the changes on the ZenML side.

Looking forward to some conceptual feedback. There are definitely a few weak points:

  • The cache-invalidation mechanism is far from perfect, due to the use of an external function.
  • The calls are handled via a function from inside the step. It is not straightforward to make the step a "script-function" by itself; this is doable but would need more effort and a shake-up of how we work with steps in the core.
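On the first weak point: one common workaround (a sketch of the general idea, not what this PR implements) is to fold a fingerprint of the external function into the step's cache key, so that editing the function invalidates the cached step. Here the fingerprint is derived from the compiled code object; a real implementation would more likely hash the source file contents:

```python
import hashlib

def code_fingerprint(fn) -> str:
    # Hash the function's bytecode plus its constants as a cheap change
    # detector; include this value in the step's cache key so changing
    # the external function busts the cache.
    payload = fn.__code__.co_code + repr(fn.__code__.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

def external_train_fn():
    # Hypothetical stand-in for the external training function.
    return "peft-ddp-training"

fp = code_fingerprint(external_train_fn)
print(fp)  # 16 hex characters, stable for unchanged code
```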

@avishniakov avishniakov mentioned this pull request May 14, 2024
9 tasks
Base automatically changed from feature/OSSK-499-llm-finetune-with-peft to main May 23, 2024 14:42
@avishniakov
Contributor Author

To be merged after 0.58.3/0.59.0 is released

@avishniakov avishniakov merged commit f733589 into main Jun 27, 2024
1 of 3 checks passed
@avishniakov avishniakov deleted the feature/OSSK-514-multi-gpu-with-peft branch June 27, 2024 07:56
Labels: enhancement (New feature or request), internal