[feat] Added support for Tensorflow2 strategy distributed training and Horovod AllToAll synchronous distributed training. #347

Merged
merged 6 commits into from
Aug 2, 2023

Conversation

MoFHeka
Contributor

@MoFHeka MoFHeka commented Jul 3, 2023

Distributed training with TFRA can now be run through the Keras API, using either a TF strategy or Horovod.
See the demos 'demo/dynamic_embedding/movielens-1m-keras-ps' and 'demo/dynamic_embedding/movielens-1m-keras-with-horovod'.
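
For readers unfamiliar with the all-to-all pattern named in the title, here is a minimal pure-Python sketch of how a sharded embedding lookup can be routed with two all-to-all exchanges. It is a simulation only: it does not use the TFRA or Horovod APIs, and the modulo sharding rule and the names `owner_of`, `all_to_all`, and `distributed_lookup` are hypothetical, chosen for illustration rather than taken from this PR.

```python
def owner_of(key, num_workers):
    """Hypothetical sharding rule (an assumption): key lives on worker key % num_workers."""
    return key % num_workers


def all_to_all(send_buffers):
    """Simulate an all-to-all exchange.

    send_buffers[src][dst] is what worker `src` sends to worker `dst`.
    The result recv[dst][src] is what worker `dst` received from `src`.
    """
    n = len(send_buffers)
    return [[send_buffers[src][dst] for src in range(n)] for dst in range(n)]


def distributed_lookup(batches, shards):
    """batches[w]: ids worker w needs; shards[w]: dict id -> vector owned by worker w."""
    n = len(batches)
    # Step 1: every worker buckets the ids it needs by owning worker.
    requests = [
        [[k for k in batch if owner_of(k, n) == dst] for dst in range(n)]
        for batch in batches
    ]
    # Step 2: the first all-to-all routes each id to the worker that owns it.
    received = all_to_all(requests)
    # Step 3: every owner looks up the requested ids in its local shard.
    replies = [
        [[shards[w][k] for k in keys] for keys in received[w]]
        for w in range(n)
    ]
    # Step 4: the second all-to-all sends the vectors back to the requesters.
    returned = all_to_all(replies)
    # Step 5: every worker restores the original order of its batch.
    results = []
    for w, batch in enumerate(batches):
        table = {}
        for dst in range(n):
            table.update(zip(requests[w][dst], returned[w][dst]))
        results.append([table[k] for k in batch])
    return results


# Two simulated workers: worker 0 owns even ids, worker 1 owns odd ids.
shards = [{0: [0.0], 2: [2.0]}, {1: [1.0], 3: [3.0]}]
batches = [[3, 0, 1], [2, 2]]
lookup_result = distributed_lookup(batches, shards)
```

In a real synchronous job each worker would run the same steps in lockstep and the two exchanges would be collective operations; the sketch only shows the data movement, not gradient updates.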

This PR also fixes the GitHub CI running out of disk space.

It also fixes some bugs.

Description

Brief Description of the PR:

Fixes # (issue)

Type of change

  • Bug fix
  • New Tutorial
  • Updated or additional documentation
  • Additional Testing
  • New Feature

Checklist:

  • I've properly formatted my code according to the guidelines
    • By running yapf
    • By running clang-format
  • This PR addresses an already submitted issue for TensorFlow Recommenders-Addons
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

How Has This Been Tested?

Ran the new tests and the new demos.

@MoFHeka MoFHeka requested a review from rhdong as a code owner July 3, 2023 14:05
@MoFHeka MoFHeka requested a review from Lifann July 3, 2023 14:09
@MoFHeka MoFHeka force-pushed the master-dev branch 7 times, most recently from d095bf1 to debe847 Compare July 4, 2023 11:34
@MoFHeka MoFHeka force-pushed the master-dev branch 3 times, most recently from 6d75cf5 to b43964f Compare July 17, 2023 18:05
@fuhailin
Contributor

A kind suggestion for the pull request workflow: please avoid force-pushing, because the reviewer then has to review the whole branch all over again. There is no diff between versions of the branch, and it brings unrelated histories into the pull branch. When I worked on DeepRec, they suggested that I reopen a clean PR without force-pushing.

@MoFHeka MoFHeka force-pushed the master-dev branch 3 times, most recently from 77bae4c to cefa36a Compare July 18, 2023 13:57
…d Horovod AllToAll synchronous distributed training.

[fix] Fix some git auto merge mistake. And also fix to be compatible with latest Keras optimizer.
…rtions of the command lambda function in Redis backend.
Member

@rhdong rhdong left a comment

LGTM

@rhdong rhdong merged commit 81050bc into tensorflow:master Aug 2, 2023
33 checks passed