
DC-ASGD (Delay Compensated Asynchronous Stochastic Gradient Descent)? #8744

Closed
liufei1656 opened this issue Mar 27, 2017 · 8 comments
Labels: stat:contribution welcome, type:feature

Comments

liufei1656 commented Mar 27, 2017

DC-ASGD is a useful algorithm from Microsoft for distributed asynchronous training. Compared with the ordinary ASGD algorithm, DC-ASGD has no significant loss in speed, but achieves almost the same accuracy as sequential SGD. As far as I know, the other mainstream open-source deep learning tools have implemented this algorithm, e.g. CNTK, MXNet, and Paddle. But I have not found a similar module in TensorFlow.

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

microsoft/CNTK#1295
PaddlePaddle/Paddle#185
apache/mxnet#3614

What other attempted solutions have you tried?

I tried to implement this algorithm in TensorFlow myself, but I do not currently have the expertise to do so.

Link to the paper:

Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning
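
For reference, my understanding of the paper's delay-compensated update rule (where w_bak is the snapshot of the parameters the worker pulled before computing its gradient, η the learning rate, λ the variance control parameter, and ⊙ element-wise multiplication):

```latex
w_{t+1} = w_t - \eta \left( g(w_{\mathrm{bak}})
        + \lambda \, g(w_{\mathrm{bak}}) \odot g(w_{\mathrm{bak}})
          \odot \left( w_t - w_{\mathrm{bak}} \right) \right)
```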

@renyi533

I am also very interested in this. Any update?

@aselle added the stat:contribution welcome and type:feature labels on Mar 28, 2017
@just-a-jazz

I can do this

@just-a-jazz

@aselle TensorFlow's documentation on distributed computing leads me to believe that the end user is responsible for assigning workers and parameter servers to different devices. Should I then implement only Gradient Descent with Delay Compensation, leaving the user to run it asynchronously?

aselle (Contributor) commented Apr 4, 2017

@mrry might be able to give you more direction in this vein (also @tfboyd).

mrry (Contributor) commented Apr 8, 2017

I haven't read the paper in depth, but from a quick skim it looks like you could implement the update rule as a tf.train.Optimizer subclass, and the asynchronous execution would be provided by whatever training loop the user used. (It would presumably "work" in a synchronous case as well, because w_t − w_bak would be zero, and hence it would devolve into classic SGD.)
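
Something like the following minimal sketch, against the TF 1.x `tf.train.Optimizer` internals (`_create_slots`/`_prepare`/`_apply_dense`); the class name, the `variance_parameter` argument, and the `"shadow"` slot name are all illustrative, not anything that exists in TensorFlow:

```python
import tensorflow as tf

class DCASGDOptimizer(tf.train.Optimizer):
    """Sketch of the DC-ASGD update rule as an Optimizer subclass.

    Applies w <- w - lr * (g + lambda * g * g * (w - w_bak)), where
    w_bak is a per-variable "shadow" copy of the last value this
    optimizer wrote, standing in for the (possibly stale) parameters
    the worker computed its gradient against.
    """

    def __init__(self, learning_rate, variance_parameter=2.0,
                 use_locking=False, name="DCASGD"):
        super(DCASGDOptimizer, self).__init__(use_locking, name)
        self._lr = learning_rate
        self._lambda = variance_parameter

    def _create_slots(self, var_list):
        # Initialize each shadow slot to the variable's starting value,
        # so the first compensation term is exactly zero.
        for v in var_list:
            self._get_or_make_slot(v, v.initialized_value(),
                                   "shadow", self._name)

    def _prepare(self):
        self._lr_t = tf.convert_to_tensor(self._lr, name="learning_rate")

    def _apply_dense(self, grad, var):
        lr = tf.cast(self._lr_t, var.dtype.base_dtype)
        shadow = self.get_slot(var, "shadow")
        # First-order delay compensation: g + lambda * g * g * (w_t - w_bak).
        compensated = grad + self._lambda * grad * grad * (var - shadow)
        var_update = tf.assign_sub(var, lr * compensated,
                                   use_locking=self._use_locking)
        with tf.control_dependencies([var_update]):
            # Record the freshly written value as the next w_bak.
            shadow_update = tf.assign(shadow, tf.identity(var),
                                      use_locking=self._use_locking)
        return tf.group(var_update, shadow_update)
```

It would be used like any other optimizer (`opt = DCASGDOptimizer(0.1)`, then `opt.minimize(loss)`). One caveat: in the paper the parameter server keeps one backup copy per worker, keyed by which worker sent the gradient; the single shadow slot above is a simplification of that.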

YirenBj commented Apr 19, 2017

I have already implemented this. Currently I am verifying its impact on metrics such as AUC on our ads data; I will send a pull request if it proves genuinely useful.

@MorganGellert

Is there a plan to add support for the adaptive-variance-parameter version of DC-ASGD? The paper found that it outperformed the constant version, which is what is currently implemented, in all cases.
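
If I'm reading the paper correctly, the adaptive variant replaces the constant λ with λ divided by the square root of a running average of the squared gradients. Against the sketch earlier in this thread, that would amount to something like the following in `_apply_dense` (the `mean_square` slot name and the decay/epsilon constants are illustrative):

```python
# Assumes a "mean_square" slot created in _create_slots alongside
# "shadow"; the 0.95 decay and 1e-7 epsilon are illustrative.
mean_square = self.get_slot(var, "mean_square")
ms_new = tf.assign(mean_square,
                   0.95 * mean_square + 0.05 * grad * grad,
                   use_locking=self._use_locking)
adaptive_lambda = self._lambda / (tf.sqrt(ms_new) + 1e-7)
compensated = grad + adaptive_lambda * grad * grad * (var - shadow)
```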

@dynamicwebpaige (Contributor)
Closing this issue, as the PR for DC-ASGD was merged (#9551). Thank you! 👍
