RFC: Point in time recovery (PITR) Part 2 #6267

Closed
arindamnayak opened this issue Jun 4, 2020 · 8 comments · Fixed by #6408

@arindamnayak (Contributor) commented Jun 4, 2020

This is an extension of #4886.

Feature Description

With the current PITR support in Vitess, it is only possible to restore to the timestamp of the last backup; the delta between that backup and the desired restore time cannot be applied. For example, say the last backup was taken at 12:00 AM and we need to restore up to 3:15 AM. As of now, we can only restore to 12:00 AM. With this change, it becomes possible to restore the data up to 3:15 AM.

Use cases

The use cases remain the same as in part 1 (#4886):

  • accidental deletion of data
  • corruption of data due to application bugs

Precondition

  • Regular backups should be taken, so that we don’t have to replay all the binlogs from the start.
  • Binlogs should be available up to the required point in time.
  • All preconditions of the part 1 RFC (#4886) should be met.

Proposed Design

[Design diagram: Screenshot from 2020-06-04 12-00-09]

A binlog server connects to the MySQL server of the master tablet; in a sharded cluster with n shards, there will be n binlog servers.
Scheduled backups are taken at regular intervals.
Say we have to recover the data to 6:15 AM: we create a recovery keyspace from the 6 AM backup, and the restored tablet connects to the binlog server to fetch the incremental data for the remaining 15 minutes.

Binlog server

There should be a binlog server backed by reliable file storage. It should be highly available so that we don’t miss any binlogs. In a sharded environment, we need to run a separate binlog server for each shard. mysql-ripple can be used as the binlog server. The lifecycle of the binlog server has to be managed independently of Vitess.

Applying binlogs

While creating the recovery keyspace, we accept a timestamp. Using that information, we extract the GTID up to which binlogs will be applied on top of the restored backup. The recovered replica then replicates from the binlog server to apply the binlogs needed to reach that GTID, using the MySQL replication command START SLAVE UNTIL SQL_BEFORE_GTIDS = ‘xxxx-xx-xx:y-z’.

Note: we will choose the last GTID before the provided recovery timestamp.
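
A minimal sketch of this step, assuming Go with the standard database/sql package and the go-sql-driver/mysql driver; the binlog server address, the replication user, and the stop GTID are placeholders, and this is a sketch of the idea rather than the actual vttablet implementation:

```go
package pitr

import (
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql"
)

// applyBinlogsUntil points the restored replica at the binlog server and
// replicates up to (but not including) the transaction named by stopGTID.
func applyBinlogsUntil(db *sql.DB, binlogHost string, binlogPort int, stopGTID string) error {
	// Point replication at the binlog server. MASTER_AUTO_POSITION=1 makes
	// MySQL resume from the GTID set already recorded in the restored
	// backup. Credentials are placeholders, elided for brevity.
	changeMaster := fmt.Sprintf(
		"CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, MASTER_USER='repl', MASTER_AUTO_POSITION=1",
		binlogHost, binlogPort)
	if _, err := db.Exec(changeMaster); err != nil {
		return err
	}
	// Replicate until the replica is about to execute stopGTID, then halt
	// the SQL thread; stopGTID itself is not applied.
	_, err := db.Exec(fmt.Sprintf("START SLAVE UNTIL SQL_BEFORE_GTIDS = '%s'", stopGTID))
	return err
}
```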

Getting GTID from timestamp

While creating the recovery keyspace, we have the required timestamp (#Ref) to restore up to. We also have the GTID position of the most recent backup before that time; e.g. for a PITR to 6:15 AM with a 6-hour backup schedule, the most recent backup is the 6 AM one. We then connect to the binlog server as a replica, with start_pos set to the GTID position of that backup, and read events sequentially while the event timestamp is less than or equal to the requested timestamp (#Ref); once we pass that point, we note the GTID.
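
A minimal sketch of this scan, using the third-party go-mysql replication package and google/uuid (illustrative choices, not necessarily what vttablet would use; server ID and credentials are placeholders). Because SQL_BEFORE_GTIDS stops before applying the named transaction, the sketch returns the first GTID after the cutoff, so that the last transaction at or before the recovery timestamp, per the note above, is the final one applied:

```go
package pitr

import (
	"context"
	"fmt"
	"time"

	"github.com/go-mysql-org/go-mysql/mysql"
	"github.com/go-mysql-org/go-mysql/replication"
	"github.com/google/uuid"
)

// stopGTIDForTimestamp connects to the binlog server as a replica, starting
// from the GTID set of the last backup, and returns the GTID of the first
// transaction stamped after the requested restore time. Passing this GTID
// to START SLAVE UNTIL SQL_BEFORE_GTIDS makes replication stop right after
// the last transaction at or before the restore time.
func stopGTIDForTimestamp(host string, port uint16, backupGTIDs string, restoreTime time.Time) (string, error) {
	syncer := replication.NewBinlogSyncer(replication.BinlogSyncerConfig{
		ServerID: 100, // must be unique among replicas of this binlog server
		Flavor:   "mysql",
		Host:     host,
		Port:     port,
		User:     "repl", // placeholder replication credentials
	})
	defer syncer.Close()

	start, err := mysql.ParseGTIDSet("mysql", backupGTIDs)
	if err != nil {
		return "", err
	}
	streamer, err := syncer.StartSyncGTID(start)
	if err != nil {
		return "", err
	}

	for {
		ev, err := streamer.GetEvent(context.Background())
		if err != nil {
			return "", err
		}
		// A GTID event precedes its transaction, so the first GTID event
		// stamped after the cutoff names the transaction to stop before.
		if g, ok := ev.Event.(*replication.GTIDEvent); ok {
			if int64(ev.Header.Timestamp) > restoreTime.Unix() {
				sid, err := uuid.FromBytes(g.SID)
				if err != nil {
					return "", err
				}
				return fmt.Sprintf("%s:%d", sid, g.GNO), nil
			}
		}
	}
}
```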

Getting the data up to the desired point in time

At this point we have the following:

  • The last available backup.
  • The GTID up to which we need to replicate from the binlog server (the incremental data).

First, we restore the last available backup. Then we connect to the binlog server as a replica with the START SLAVE UNTIL SQL_BEFORE_GTIDS = ‘xxxx-xx-xx:y-z’ option, which applies the incremental data up to the desired point in time.
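
Putting the pieces together, the restore flow could look like the sketch below, in the same hypothetical package as the two functions above; restoreLastBackup is a made-up helper standing in for Vitess's existing restore-from-backup step, assumed to return the GTID set recorded in the backup:

```go
// recoverToTime restores the most recent backup taken before restoreTime,
// then replays the delta from the binlog server up to the stop GTID.
func recoverToTime(db *sql.DB, binlogHost string, binlogPort uint16, restoreTime time.Time) error {
	// 1. Restore the last backup before restoreTime (hypothetical helper).
	backupGTIDs, err := restoreLastBackup(db, restoreTime)
	if err != nil {
		return err
	}
	// 2. Ask the binlog server for the first GTID past the restore time.
	stopGTID, err := stopGTIDForTimestamp(binlogHost, binlogPort, backupGTIDs, restoreTime)
	if err != nil {
		return err
	}
	// 3. Replay binlogs up to (but not including) that GTID.
	return applyBinlogsUntil(db, binlogHost, int(binlogPort), stopGTID)
}
```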

FAQ

New configuration

While restoring the tablet, you have to specify the binlog server details as command-line arguments to the vttablet process.

If the keyspace has multiple shards, you need to spawn one binlog server per shard, and when recovering a particular shard (or shards), pass the corresponding binlog server details in the command-line arguments.

Binlog server and its state management

As of now, no binlog server is provided out of the box in Vitess. You will have to spawn the binlog server yourself and connect it to the master tablet’s database. Since the master tablet can change via reparenting or other means, you have to repoint the binlog server to the new master. The binlog server also needs to be highly available, as the binlog files are critical for the restore. If you have a sharded database, you will need a separate binlog server for each shard’s master.

@zmagg (Contributor) commented Jun 4, 2020

> As of now, no binlog server is provided out of the box in Vitess.

I've been confused about the binlog server part of the PITR approach. Why are we using ripple instead of VRep? What do we get out of it?

@arindamnayak (Contributor, Author)

> > As of now, no binlog server is provided out of the box in Vitess.
>
> I've been confused about the binlog server part of the PITR approach. Why are we using ripple instead of VRep? What do we get out of it?

To restore to an exact point in time, we need continuous binlogs to be available, which is what a binlog server manages (i.e. it saves the binlog files to storage). VReplication, by contrast, is for replicating a database: it reads binlogs and applies them to a target database. That is why we need a binlog server, and ripple can be used as one.

@deepthi (Member) commented Jun 4, 2020

To elaborate a bit more on @arindamnayak's point above:

  • VReplication is essentially an application built on top of MySQL replication. As such, it adds complexity. We don't need anything more than MySQL replication for PITR.
  • Recovery tablets are intended to be "read-only". This is not a problem for MySQL replication, but VReplication requires a master / writable tablet as the target.

@derekperkins (Member)

> VReplication is essentially an application built on top of MySQL replication. As such, it adds complexity.

It's a known quantity, and if you're running Vitess, you're running VReplication. That's certainly less complexity than introducing an entire new binlog server that you have to manage outside the context of Vitess. I wouldn't want to have to deploy and learn ripple or any other binlog server when I shouldn't have to. That's how we ended up in the current undesirable Orchestrator state.

@derekperkins (Member)

I'm not sure how this compares functionality-wise, but if we are trying to keep things simple, maybe we could make a binlog server a first-class citizen and use https://github.com/flike/kingbus, which is also written in Go.

@deepthi (Member) commented Jun 9, 2020

@derekperkins we are trying first to address the situation where people already have a binlog server that we can connect to. I agree that a VReplication-based solution would be more native, with fewer moving parts.
kingbus looks interesting, though it doesn't seem very active.

@derekperkins (Member) commented Jun 9, 2020

I totally get tackling it a piece at a time, just hoping that the solution is built with that in mind. The dream would be for Vitess to own binlogs entirely, for replication purposes, backups, PITR, etc.
see discussion in #3581

@derekperkins (Member)

Also, rereading that conversation makes me wish that @alainjobart would make the jump to PlanetScale. :)
