Second replication begins if first replication is not finished #8

Closed
woodsb02 opened this issue Oct 1, 2017 · 7 comments

@woodsb02
Contributor

woodsb02 commented Oct 1, 2017

During the first replication of many gigabytes of data, I initially had the interval of the pull job set to 10m, so the first replication had not finished by the time the second one was scheduled to start. When I checked the status many hours later, I could see numerous ssh sessions running, which led me to believe that multiple replication jobs were now running at once (which I don't think should ever happen). I expected that if another replication job was due to start before the previous one had finished, the new job would just be cancelled entirely.

I did not look into the state of my replicated data, or if the replications were proceeding ok. It was purely the fact that multiple zrepl ssh sessions were running that led me to believe this was the behaviour.
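
The expected behavior described above (a trigger that arrives while a run is still in progress is dropped rather than started in parallel) can be sketched in Go roughly as follows. This is an illustration only, not zrepl's code; `skipGuard`, `tryRun`, and `countStarted` are hypothetical names introduced for the example.

```go
package main

import "fmt"

// skipGuard (hypothetical) allows at most one run in flight.
// The channel has capacity 1 and holds a token while a run is active.
type skipGuard struct {
	slot chan struct{}
}

func newSkipGuard() *skipGuard {
	return &skipGuard{slot: make(chan struct{}, 1)}
}

// tryRun executes job unless another run is still in progress.
// It reports whether the job was actually executed.
func (g *skipGuard) tryRun(job func()) bool {
	select {
	case g.slot <- struct{}{}: // slot was free: claim it
		defer func() { <-g.slot }() // release the slot when the job returns
		job()
		return true
	default: // a run is in flight: drop this trigger
		return false
	}
}

// countStarted fires `triggers` triggers while the first job is still
// running and returns how many jobs were actually executed.
func countStarted(triggers int) int {
	g := newSkipGuard()
	running := make(chan struct{})
	release := make(chan struct{})
	done := make(chan bool)

	// First trigger: a slow job that blocks until released.
	go func() {
		done <- g.tryRun(func() {
			close(running)
			<-release
		})
	}()
	<-running

	// Remaining triggers arrive while the first job is still running.
	started := 0
	for i := 1; i < triggers; i++ {
		if g.tryRun(func() {}) {
			started++
		}
	}

	close(release)
	if <-done {
		started++
	}
	return started
}

func main() {
	fmt.Println(countStarted(5)) // prints 1
}
```

Dropping the trigger, rather than queueing it, matches the expectation stated in the report; a queueing variant would instead run the missed jobs back to back.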

@problame problame added the bug label Oct 2, 2017
@problame problame added this to the 0.0.2 milestone Oct 2, 2017
@problame
Member

problame commented Oct 2, 2017

This is a bug. Will fix.

problame added a commit that referenced this issue Oct 3, 2017
The documentation describes intended behavior.

Apparently, there are some bugs regarding *patient* tasks.

refs #8
refs #13
@problame
Member

problame commented Oct 5, 2017

So I guess this was a pull job? I cannot reproduce the issue.

cmd/config_job_pull.go:94 JobStart() is strictly sequential and will not reconnect unless the previous pull finished.

What's still an open issue: where did all the dangling ssh sessions come from?
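
For readers unfamiliar with the pattern, a strictly sequential interval loop of the kind described above might look like the following sketch. It is not the code in cmd/config_job_pull.go; `pullLoop` and `maxOverlap` are hypothetical names, and the bounded `runs` parameter exists only so the example terminates.

```go
package main

import (
	"fmt"
	"time"
)

// pullLoop (hypothetical) calls pull once per interval from a single
// goroutine. Because pull is called inline, the next run cannot start
// before the previous one has returned. time.Ticker drops ticks for slow
// receivers, so intervals missed during a long pull do not pile up.
func pullLoop(interval time.Duration, runs int, pull func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < runs; i++ {
		pull()
		<-ticker.C // wait for the next tick (or consume one that is already pending)
	}
}

// maxOverlap runs a pull that is slower than the interval and returns the
// highest number of pulls that were in progress at the same time.
func maxOverlap(runs int) int {
	inFlight, peak := 0, 0
	pullLoop(5*time.Millisecond, runs, func() {
		inFlight++
		if inFlight > peak {
			peak = inFlight
		}
		time.Sleep(20 * time.Millisecond) // simulated pull, longer than the interval
		inFlight--
	})
	return peak
}

func main() {
	fmt.Println(maxOverlap(3)) // prints 1
}
```

With a loop of this shape, overlapping replications cannot occur, which is consistent with the remaining question of where the extra ssh sessions came from.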

@woodsb02
Contributor Author

woodsb02 commented Oct 5, 2017

I will try to reproduce it this weekend and will report back on my findings. I didn’t spend the time to investigate and record my findings last time... I just remember seeing numerous lines of output from “sudo pgrep -lf zrepl”.

@problame
Member

Were you able to replicate the described behavior?

@problame
Member

problame commented Nov 4, 2017

OK, I was able to observe the issue on a testing system. I saw lots of defunct processes, most likely ssh processes that timed out but were never waited for (waitpid()) by zrepl.
Sadly, the logs are gone because the testing system was also used to test the TCP logger, which doesn't handle timeouts on the connection well, see #26.
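
As background on the defunct processes: a child that has exited stays in the process table as a zombie until its parent collects the exit status. In Go, that happens in `(*exec.Cmd).Wait`, so a child that is started but never waited for remains defunct. The sketch below shows one way to kill a child on timeout and still reap it. It is an illustration under assumptions, not the actual fix: `runWithTimeout` and `killedAndReaped` are hypothetical names, and the demo assumes a `sleep` binary is available in `PATH`.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runWithTimeout (hypothetical) starts a command, kills it if it outlives
// timeout, and always calls Wait so the exited child is reaped.
// It reports whether an exit status was collected, plus Wait's error.
func runWithTimeout(timeout time.Duration, name string, args ...string) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// CommandContext kills the process when ctx expires.
	cmd := exec.CommandContext(ctx, name, args...)
	if err := cmd.Start(); err != nil {
		return false, err
	}

	// Wait blocks until the child exits and collects its status.
	// Skipping this call is what leaves a defunct process behind.
	err := cmd.Wait()
	return cmd.ProcessState != nil, err
}

// killedAndReaped runs a child that outlives its timeout and reports
// whether it was terminated with an error and its status was collected.
// Assumes a `sleep` binary is available in PATH.
func killedAndReaped() bool {
	reaped, err := runWithTimeout(50*time.Millisecond, "sleep", "5")
	return reaped && err != nil
}

func main() {
	fmt.Println(killedAndReaped())
}
```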

@problame
Member

So I think I fixed the issue in 6b5bd0a, which just landed in zrepl master.
Are you in a situation where you can just build zrepl master and check if the issue is resolved?

@woodsb02
Contributor Author

woodsb02 commented Mar 4, 2018

Hi Christian,
Yes, you are correct - this issue was the same as the one reported in #56.
I have just finished testing with the latest (unreleased) version of zrepl, and can confirm this is now fixed.
Thanks for your work on fixing this!
Cheers,
Ben

@problame problame closed this as completed Mar 4, 2018