Backwards fault recovery #2

Open
NikolayBlagoev opened this issue Mar 6, 2024 · 0 comments

Great work on this paper, and congratulations!

I have a quick question about how disconnects are handled during the backward pass. From the paper, it seems that timeouts are only triggered during the forward pass. During the backward pass, however, the gradients with respect to a node's input have to be returned to a node that the batch actually passed through on the forward pass. I couldn't find an explanation of how this is handled in the paper, and from what I can see in the code, a new expert is simply chosen at random, which doesn't seem like a sound solution.

I was wondering if I am missing something.
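
To make the concern concrete, here is a toy sketch of what I mean. It is not taken from the repository: `ToyExpert`, `saved_inputs`, and the scalar "model" are all hypothetical, and the sketch assumes each node caches its forward inputs per batch. Under that assumption, a randomly chosen replacement has nothing to backpropagate through.

```python
class ToyExpert:
    """Hypothetical stand-in for a pipeline stage computing y = w * x."""

    def __init__(self, w):
        self.w = w
        self.saved_inputs = {}  # batch_id -> input seen on the forward pass

    def forward(self, batch_id, x):
        # Cache the input so the matching backward call can use it.
        self.saved_inputs[batch_id] = x
        return self.w * x

    def backward(self, batch_id, grad_out):
        # Raises KeyError if this node never ran forward for batch_id.
        x = self.saved_inputs.pop(batch_id)
        grad_w = grad_out * x        # gradient w.r.t. this node's weight
        grad_in = grad_out * self.w  # gradient sent to the previous stage
        return grad_in, grad_w


original = ToyExpert(w=2.0)
replacement = ToyExpert(w=3.0)

# Case 1: the node that ran forward is still reachable for backward.
y = original.forward(batch_id=0, x=5.0)
grad_in, grad_w = original.backward(batch_id=0, grad_out=1.0)

# Case 2: the original node disconnects after forward, and backward is
# routed to a different node that never saw this batch.
original.forward(batch_id=1, x=5.0)
try:
    replacement.backward(batch_id=1, grad_out=1.0)
    replacement_ok = True
except KeyError:
    replacement_ok = False
```

Even if the replacement reran the forward pass to rebuild its cache, its weights differ from the original node's, so the gradients it returns would not correspond to the activations the later stages actually received.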

Also, to replicate the results, which commands do you use to run setup.py? Both install and build seem to be insufficient.
