Master doesn't send heartbeats to replica while scanning wals #4461

Closed
rtokarev opened this issue Aug 28, 2019 · 1 comment
Labels: bug, replication

Comments

@rtokarev
Contributor

I ran into a situation where a replica couldn't recover from a master because the master spent too long scanning WALs before sending the first row. The master doesn't send heartbeats to the replica while scanning, so the replication disconnect timeout fired on the replica.
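To illustrate the behaviour one would expect instead, here is a minimal sketch of a scan loop that keeps emitting heartbeats while it skips rows the replica already has. It is not Tarantool's relay code: `scan_with_heartbeats`, `send_heartbeat`, `heartbeat_interval` and the injectable `clock` are all hypothetical names chosen for this example, and rows are modelled as bare LSN integers.

```python
import time
from typing import Callable, Iterable, Iterator


def scan_with_heartbeats(
    rows: Iterable[int],
    start_lsn: int,
    send_heartbeat: Callable[[], None],
    heartbeat_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[int]:
    """Yield rows with LSN > start_lsn, sending heartbeats while skipping.

    Hypothetical model: each row is just its LSN, and `send_heartbeat`
    stands in for whatever keeps the replica connection alive.
    """
    last_sent = clock()
    for lsn in rows:
        now = clock()
        if lsn > start_lsn:
            # A real row is about to go out; it counts as activity too.
            last_sent = now
            yield lsn
        elif now - last_sent >= heartbeat_interval:
            # Still skipping old rows: tell the replica we are alive.
            send_heartbeat()
            last_sent = now
```

The point of the sketch is only that the "skip" branch checks the elapsed time, so a long scan through already-applied rows never stays silent for longer than `heartbeat_interval`.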

In my case, the first row to recover from appears to be located near the end of 00000000000003991061.xlog.

2019-08-28 11:29:33.793 [21848] main/4851/main I> subscribed replica ffdaa1d1-c57c-4795-9bb7-33179fadfbe0 at fd 15, aka 10.246.1.39:3301, peer of 10.246.1.6:36686
2019-08-28 11:29:33.793 [21848] main/4851/main I> remote vclock {1: 4376075, 2: 5} local vclock {1: 4376095, 2: 5}
2019-08-28 11:29:33.800 [21848] relay/10.246.1.6:36686/101/main I> recover from `/var/lib/tarantool/xtaz_2//00000000000003991061.xlog'
2019-08-28 11:29:39.845 [21848] relay/10.246.1.6:36686/101/main I> done `/var/lib/tarantool/xtaz_2//00000000000003991061.xlog'
2019-08-28 11:29:39.846 [21848] relay/10.246.1.6:36686/101/main I> recover from `/var/lib/tarantool/xtaz_2//00000000000004376090.xlog'
2019-08-28 11:29:39.846 [21848] relay/10.246.1.6:36686/101/main I> done `/var/lib/tarantool/xtaz_2//00000000000004376090.xlog'
2019-08-28 11:29:39.846 [21848] relay/10.246.1.6:36686/101/main coio.cc:370 !> SystemError unexpected EOF when reading from socket, called on fd 15, aka 10.246.1.39:3301, peer of 10.246.1.6:36686: Broken pipe
2019-08-28 11:29:39.846 [21848] relay/10.246.1.6:36686/101/main C> exiting the relay loop
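The numbers in this log can be checked with a little arithmetic. The sketch below rests on two assumptions that the log itself does not confirm: that an xlog file name is the vclock signature (the sum of the vclock components) at which the file starts, and that each row advances the signature by one. The 4-second disconnect timeout is likewise an assumed value for illustration; it depends on the replica's `replication_timeout` setting.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps of the "recover from" and "done" lines for
# 00000000000003991061.xlog, copied from the log above.
scan_start = datetime.strptime("2019-08-28 11:29:33.800", FMT)
scan_done = datetime.strptime("2019-08-28 11:29:39.845", FMT)
scan_seconds = (scan_done - scan_start).total_seconds()

# vclocks from the "remote vclock ... local vclock ..." line.
remote_vclock = {1: 4376075, 2: 5}  # replica
local_vclock = {1: 4376095, 2: 5}   # master


def signature(vclock: dict) -> int:
    """Sum of all vclock components."""
    return sum(vclock.values())


# Assumption: the file name is the signature at which the file starts.
first_xlog_start = 3991061
next_xlog_start = 4376090

rows_in_first_xlog = next_xlog_start - first_xlog_start
rows_to_skip = signature(remote_vclock) - first_xlog_start
rows_missing = signature(local_vclock) - signature(remote_vclock)

# Assumption for illustration only: the replica gives up after 4 seconds
# without any data or heartbeat from the master.
assumed_disconnect_timeout = 4.0

print(f"scan took {scan_seconds:.3f} s")
print(f"rows skipped before the first useful row: {rows_to_skip} of {rows_in_first_xlog}")
print(f"rows the replica is actually missing: {rows_missing}")
print(f"silent longer than assumed timeout: {scan_seconds > assumed_disconnect_timeout}")
```

Under these assumptions the master reads past roughly 385,019 rows in about six seconds to deliver 20 missing rows, which is consistent with the first needed row sitting near the end of the file.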
@kyukhin added the bug and replication labels Sep 26, 2019
@kyukhin added this to the 2.4.1 milestone Sep 26, 2019
@kyukhin modified the milestones: 2.4.1, 2.4.2 Apr 10, 2020
@kyukhin modified the milestones: 2.4.2, 2.4.3 Jun 22, 2020
@kyukhin modified the milestones: 2.4.3, wishlist Oct 23, 2020
@kyukhin added the teamS label Jun 20, 2022
@kyukhin removed this from the wishlist milestone Jun 20, 2022
@sergos
Contributor

sergos commented Jun 20, 2022

Closing as a duplicate of #6706 (resolved).

@sergos closed this as completed Jun 20, 2022