
Support to Multi-node Training #989

Closed

BiEchi opened this issue Nov 8, 2023 · 3 comments
Labels
enhancement: Feature that is not a new algorithm or an algorithm enhancement
on hold: Won't be worked on for now, but maybe later

Comments

@BiEchi

BiEchi commented Nov 8, 2023

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

Hello, we're using Tianshou for a large model, which requires multi-node training. I see the multi-GPU training tutorial here: https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#multi-gpu. However, I'm not sure whether that approach can be extended to multi-node training. Any pointers to previous attempts would be greatly appreciated!

@MischaPanch
Collaborator

In principle this should work since tianshou supports ray for parallelization, which in turn supports multiple nodes. @Trinkle23897 may know more.

I haven't used tianshou for LLMs yet (trlx seemed like a good fit for that) but would love to hear experiences and use cases. Could you provide some info on your use case and why you want to use tianshou, or is it non-disclosable for your case?
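For reference, a minimal sketch of what the ray-based route could look like, assuming a standard Ray cluster and tianshou's `RayVectorEnv`; the address and port below are placeholders, and this has not been validated in a multi-node setting:

```shell
# On the head node: start the Ray head process.
# 6379 is Ray's default GCS port; replace HEAD_IP with the head node's address.
ray start --head --port=6379

# On each worker node: join the cluster formed above.
ray start --address='HEAD_IP:6379'

# In the training script, connect to the already-running cluster instead of
# starting a local Ray instance, and use tianshou's RayVectorEnv so that
# environment workers can be scheduled across all nodes, e.g.:
#   import ray; ray.init(address="auto")
#   from tianshou.env import RayVectorEnv
#   envs = RayVectorEnv([make_env for _ in range(num_envs)])
```

Note this parallelizes environment rollouts across nodes; distributing the gradient computation itself would be a separate concern (e.g. via torch.distributed).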

@MischaPanch MischaPanch added the question Further information is requested label Nov 8, 2023
@BiEchi
Author

BiEchi commented Nov 9, 2023

Thank you for the prompt response. Unfortunately, we cannot provide additional details on the specific use cases, as this is an ongoing research project. Thank you again for your help and your understanding!

@MischaPanch
Collaborator

I see, thanks. Currently, multi-node parallelization and LLM support are not high on my radar, so I won't be able to help with this until at least 2024. But other users/devs might have more experience with this.

@MischaPanch MischaPanch added the enhancement (Feature that is not a new algorithm or an algorithm enhancement) and on hold (Won't be worked on for now, but maybe later) labels and removed the question (Further information is requested) label Jan 8, 2024
@BiEchi BiEchi closed this as completed Feb 2, 2024