Updated TRL integration docs #25684
Code Review
This pull request updates the TRL integration documentation, which is a valuable improvement. The new information about supported trainers and modes is helpful. I've provided a couple of suggestions to enhance clarity and consistency in the documentation. Specifically, I've pointed out an inconsistency between the list of trainers and the list of configurations, and noted that the usage of the `vllm_mode` parameter could be clarified.
Please fix. Also, you can fix the pre-commit checks with:
`pip install pre-commit`
`pre-commit install`
`pre-commit run -a --hook-stage manual markdownlint`
(For future changes, pre-commit will now run all the non-manual hooks automatically.)
56bebc5 to e87da96
303251a to 46a52da
Looks like some unrelated changes have been picked up in a merge. Can they please be reverted?
docs/training/trl.md
Outdated
- [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
- [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
- [`trl.RLOOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/rloo_trainer#trl.RLOOConfig.use_vllm)
Is this list necessary now that the more complete link has been added above?
If yes, can we make it match the list above?
To simplify, I've removed the list, since the links to the docs above already cover it.
I've also added some more details about the modes of using vLLM during TRL training.
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
e6a282d to 3a266fe
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
LGTM! Thanks for improving this doc!
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Purpose
Update TRL integration docs with more recent details.
@hmellor
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.