DC-ASGD (Delay Compensated Asynchronous Stochastic Gradient Descent)? #8744
Comments
I am also very interested in this. Any update?
I can do this
@aselle TensorFlow's documentation on distributed computing leads me to believe that the end user is responsible for assigning workers and parameter servers to different devices. Should I then implement only gradient descent with delay compensation, which the user would have to run asynchronously themselves?
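For reference, a minimal sketch (TF 1.x API) of the between-graph setup the distributed-TensorFlow docs describe, where the user assigns parameter servers and workers explicitly. Host names, ports, and the job/task values are placeholders:

```python
import tensorflow as tf

# Placeholder host names/ports; each process would pass its own job name and
# task index when it starts.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Pin variables to the parameter servers and ops to the local worker.
with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:0", cluster=cluster)):
    w = tf.get_variable("w", shape=[10], initializer=tf.zeros_initializer())
    # ... build the model and optimizer here; with asynchronous training each
    # worker applies its own updates to the shared variables on the ps.
```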
I haven't read the paper in depth, but from a quick skim it looks like you could implement the update rule as a custom optimizer.
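For anyone landing here, the delay-compensated update from the paper is simple to express. A rough NumPy sketch (variable names and hyperparameter values are just placeholders, not TensorFlow code):

```python
import numpy as np

def dc_asgd_update(w_global, w_backup, grad, lr=0.1, lam=0.04):
    """One delay-compensated update.

    w_global: current parameters on the parameter server.
    w_backup: the stale copy of the parameters at which `grad` was computed.
    lam:      variance-control hyperparameter (placeholder value).
    """
    # Plain ASGD would apply `grad` directly.  DC-ASGD adds the correction
    # grad * grad * (w_global - w_backup), a cheap diagonal approximation of
    # a Hessian-vector product that compensates for the gradient's delay.
    compensated = grad + lam * grad * grad * (w_global - w_backup)
    return w_global - lr * compensated
```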
I already implemented this. Currently I am verifying its impact on metrics such as AUC on our ads data. I will send a pull request if it turns out to be really useful.
Is there a plan to add support for the adaptive variance parameter version of DC-ASGD? The paper found that it performed better in all cases than the constant version, which is what is currently implemented.
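For context, the adaptive variant ties the variance-control parameter to a running estimate of the squared-gradient magnitude so the compensation term stays well-scaled. A rough sketch building on the constant-λ update above; the moving-average form and the constants here are my own assumptions, not necessarily the paper's exact formulation:

```python
import numpy as np

def dc_asgd_adaptive_update(w_global, w_backup, grad, state,
                            lr=0.1, lam0=2.0, decay=0.95, eps=1e-7):
    # `state` carries a running mean of the squared gradient across steps
    # (assumed bookkeeping, kept per variable on the parameter server).
    state["msq"] = (decay * state.get("msq", 0.0)
                    + (1.0 - decay) * float(np.mean(grad * grad)))
    lam = lam0 / (np.sqrt(state["msq"]) + eps)  # adaptively scaled lambda
    compensated = grad + lam * grad * grad * (w_global - w_backup)
    return w_global - lr * compensated
```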
Closing this issue, as the PR for DC-ASGD was merged (#9551). Thank you! 👍
DC-ASGD is Microsoft's very useful algorithm for distributed asynchronous training. Compared with the ordinary ASGD algorithm, DC-ASGD has no significant loss in speed but achieves almost the same accuracy as sequential SGD. As far as I know, other mainstream open-source deep learning frameworks, such as CNTK, MXNet, and Paddle, have already implemented this algorithm, but I have not found a similar module in TensorFlow.
What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?
microsoft/CNTK#1295
PaddlePaddle/Paddle#185
apache/mxnet#3614
What other attempted solutions have you tried?
I tried to implement this algorithm in TensorFlow by myself, but I do not currently have the ability to do so.
Link to the paper:
Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning