Amazon RDS periodically deletes binlog files; `bin/maxwell` throws an error and stops #282

Closed
jitendra-1217 opened this Issue Feb 19, 2016 · 11 comments

4 participants
jitendra-1217 commented Feb 19, 2016

Amazon RDS periodically deletes binlog files, so when I re-run bin/maxwell it throws the following:

11:53:37,999 INFO  Maxwell - Maxwell is booting (StdoutProducer), starting at BinlogPosition[mysql-bin-changelog.000321:962]
11:53:38,004 INFO  SchemaStore - Restoring schema id 2 (last modified at BinlogPosition[mysql-bin-changelog.000258:354])
11:53:38,819 INFO  TransportImpl - connecting to host: test-abc.cmrsvp71mtvr.ap-southeast-1.rds.amazonaws.com, port: 3306
11:53:38,858 INFO  TransportImpl - connected to host: test-abc.cmrsvp71mtvr.ap-southeast-1.rds.amazonaws.com, port: 3306, context: AbstractTransport.Context[threadId=399,scramble=@o2IVU)kb)BQR(mn_r@&,protocolVersion=10,serverHost=test-abc.cmrsvp71mtvr.ap-southeast-1.rds.amazonaws.com,serverPort=3306,serverStatus=2,serverCollation=8,serverVersion=5.6.23-log,serverCapabilities=65535]
11:53:38,866 INFO  AuthenticatorImpl - start to login, user: maxwell, host: test-abc.cmrsvp71mtvr.ap-southeast-1.rds.amazonaws.com, port: 3306
11:53:38,870 INFO  AuthenticatorImpl - login successfully, user: maxwell, detail: OKPacket[packetMarker=0,affectedRows=0,insertId=0,serverStatus=2,warningCount=0,message=<null>]
11:53:38,896 ERROR MaxwellReplicator - Missing binlog 'mysql-bin-changelog.000321' on test-abc.cmrsvp71xr9m.ap-southeast-1.rds.amazonaws.com
11:53:38,896 ERROR MaxwellReplicator - Transport exception #1236

Workaround: if I manually delete the entries in maxwell.positions that point at binlog files RDS has already deleted, it works.

Is there any option in Maxwell to skip binlog files that can't be found and continue with what is available? Or, if the above is a valid and accepted issue with no other alternative, can I make a code change?

Can anyone suggest something?
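The failure above is MySQL error 1236: the binlog file recorded in maxwell.positions is no longer on the server. A minimal sketch, assuming you have already fetched the file names returned by MySQL's `SHOW BINARY LOGS` (fetching them, e.g. over JDBC, is left out), of checking before startup whether the stored position can still be replayed. `BinlogAvailability` and its methods are hypothetical names, not part of Maxwell, and the file names in `main` are made-up example values.

```java
import java.util.List;

/** Hypothetical pre-flight check; not part of Maxwell. */
final class BinlogAvailability {

    /** Numeric suffix of a binlog name, e.g. "mysql-bin-changelog.000321" -> 321. */
    static long sequenceOf(String binlogFile) {
        int dot = binlogFile.lastIndexOf('.');
        if (dot < 0 || dot == binlogFile.length() - 1) {
            throw new IllegalArgumentException("not a binlog file name: " + binlogFile);
        }
        return Long.parseLong(binlogFile.substring(dot + 1));
    }

    /** True if the stored file is still listed by SHOW BINARY LOGS. */
    static boolean isAvailable(String storedFile, List<String> serverLogs) {
        return serverLogs.contains(storedFile);
    }

    /**
     * True if the stored file is older than the oldest log the server still keeps.
     * Assumes all names share one basename, so suffixes are comparable.
     */
    static boolean wasPurged(String storedFile, List<String> serverLogs) {
        if (serverLogs.isEmpty()) {
            return true; // nothing left on the server to resume from
        }
        long oldest = serverLogs.stream()
            .mapToLong(BinlogAvailability::sequenceOf)
            .min()
            .getAsLong();
        return sequenceOf(storedFile) < oldest;
    }

    public static void main(String[] args) {
        List<String> serverLogs = List.of("mysql-bin-changelog.000322", "mysql-bin-changelog.000323");
        String stored = "mysql-bin-changelog.000321";
        System.out.println(isAvailable(stored, serverLogs)); // false
        System.out.println(wasPurged(stored, serverLogs));   // true
    }
}
```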

Collaborator

osheroff commented Feb 19, 2016

@jitendra-1217 to confirm: you left Maxwell down for a while, and when you came back its binlogs had been deleted and it couldn't continue?

Collaborator

osheroff commented Feb 19, 2016

If that's the case, there's really no safe way to automatically pick up from anywhere other than the current position. #183 explores the option of a command-line flag that forces Maxwell to simply drop its database and recapture the schema and all that, but I suppose another way to do it would be a little utility that forces Maxwell's position to a specific binlog file and offset:

  bin/maxwell-set-position --host=XXX --user=XX --file=mysql-bin-changelog.000322 --position=4

which would at least let you recover by hand.
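The `bin/maxwell-set-position` command above is only a proposal in this thread. A minimal sketch, assuming just its `--file` and `--position` flags, of parsing and validating the target coordinates before anything is written to maxwell.positions. `SetPositionArgs` and `Target` are hypothetical names (the record needs Java 16+), and the actual update of maxwell.positions is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical argument parsing for the proposed bin/maxwell-set-position; not Maxwell's code. */
final class SetPositionArgs {

    /** Target binlog coordinates; offset 4 is the first event after the 4-byte binlog magic header. */
    record Target(String file, long position) {}

    static Target parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (String arg : args) {
            // Accept only --key=value pairs, like the example invocation above.
            int eq = arg.indexOf('=');
            if (!arg.startsWith("--") || eq < 0) {
                throw new IllegalArgumentException("expected --key=value, got: " + arg);
            }
            opts.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        String file = opts.get("file");
        String pos = opts.get("position");
        if (file == null || file.isEmpty() || pos == null) {
            throw new IllegalArgumentException("--file and --position are required");
        }
        // Long.parseLong throws NumberFormatException, itself an IllegalArgumentException.
        long position = Long.parseLong(pos);
        if (position < 4) {
            throw new IllegalArgumentException("binlog positions start at 4, got: " + position);
        }
        return new Target(file, position);
    }

    public static void main(String[] args) {
        Target t = parse(new String[] {
            "--host=XXX", "--user=XX", "--file=mysql-bin-changelog.000322", "--position=4"});
        System.out.println(t.file() + ":" + t.position());
    }
}
```

Validating up front keeps a typo from silently pointing Maxwell at a nonsense offset; whether the chosen file is safe to resume from (no DDL in the gap) is still the operator's call.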

Collaborator

osheroff commented Feb 19, 2016

any interest in trying to code that up @jitendra-1217 ?

jitendra-1217 commented Feb 21, 2016

> @jitendra-1217 to confirm: you left Maxwell down for a while, and when you came back its binlogs had been deleted and it couldn't continue?

Yes, I did that (while testing). Are you saying that if Maxwell is always running this won't happen, because it keeps maxwell.positions updated? I tried it, and it works fine when Maxwell is never shut down. Just wanted to confirm with you.

> any interest in trying to code that up @jitendra-1217 ?

Yes, I will do that; it is needed.

Quoting from the other issue, #183:

> also we should catch TransportException and advise the user that they may reset the master position with the flag.

Can you suggest a way that, when the exception indicates a missing binlog file, Maxwell updates the position automatically and re-runs the code at https://github.com/zendesk/maxwell/blob/master/src/main/java/com/zendesk/maxwell/MaxwellReplicator.java#L87, instead of just advising the user to update the position with the utility?
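Following the #183 suggestion quoted above (catch the error and advise the user rather than act automatically), a minimal sketch of turning MySQL error 1236 into a human-readable hint. It deliberately avoids Maxwell's and the binlog client's real exception types: the error code and the stored binlog file name are passed in as plain values, and `ReplicationErrorAdvice` and the message wording are hypothetical. Note that 1236 covers other fatal binlog read errors too, so a real check may also want to inspect the server's message.

```java
import java.util.Optional;

/** Hypothetical helper: turns a replication error into advice, never into an automatic reset. */
final class ReplicationErrorAdvice {

    /** MySQL ER_MASTER_FATAL_ERROR_READING_BINLOG; raised, for example, when the requested binlog is gone. */
    static final int ER_MASTER_FATAL_ERROR_READING_BINLOG = 1236;

    static Optional<String> adviceFor(int errorCode, String storedFile) {
        if (errorCode != ER_MASTER_FATAL_ERROR_READING_BINLOG) {
            return Optional.empty(); // not a missing-binlog case; the caller should rethrow
        }
        return Optional.of(
            "Binlog '" + storedFile + "' could not be read from the server (it may have been purged). "
            + "Schema changes in the gap cannot be ruled out, so replication will not resume on its own. "
            + "If you know no DDL happened (or accept the risk), reset the stored position by hand and restart.");
    }

    public static void main(String[] args) {
        adviceFor(1236, "mysql-bin-changelog.000321").ifPresent(System.err::println);
    }
}
```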

Collaborator

osheroff commented Feb 21, 2016

> Are you saying that if Maxwell is always running this won't happen, because it keeps maxwell.positions updated?

yup.

> Can you suggest a way that, when the exception indicates a missing binlog file, Maxwell updates the position automatically and re-runs the code here, instead of just advising the user to update the position with the utility?

Nope. There's no safe way to do this. Here's the deal:

  • In order to output rows from the binlog, Maxwell needs an up-to-date copy of the MySQL schema.
  • In order to do that, Maxwell must act as a full MySQL replica -- it especially can't miss DDL (ALTER TABLE) statements, or else it risks crashing or outputting corrupted data.
  • If the binlog we were expecting goes missing, there's no way to tell whether we also missed some DDL and our copy is now out of sync with the master. In the same way, you'd never just point a MySQL replica at a random master position.
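To make the second and third points concrete: in MySQL 5.6 row-based binlogs, row events carry column values by position rather than by name, so the reader maps them using its own cached copy of the table schema. A toy sketch (hypothetical class, not Maxwell's schema code) of a missed `ALTER TABLE ... ADD COLUMN`: the stale cache no longer matches the row, and a count check can at least detect it. A rename or a same-width type change would keep the counts equal and decode silently under the wrong names, which is the "corrupted data" case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of decoding row events with a cached schema; hypothetical, not Maxwell's code. */
final class SchemaDriftDemo {

    /** Replays a toy "ADD COLUMN" DDL against the cached column list (append only). */
    static void applyAddColumn(List<String> cachedColumns, String column) {
        cachedColumns.add(column);
    }

    /** Maps positional row values onto cached column names, refusing on a count mismatch. */
    static Map<String, Object> decode(List<String> cachedColumns, Object[] rowValues) {
        if (rowValues.length != cachedColumns.size()) {
            throw new IllegalStateException("row has " + rowValues.length
                + " values but cached schema has " + cachedColumns.size() + " columns");
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < rowValues.length; i++) {
            row.put(cachedColumns.get(i), rowValues[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> cached = new ArrayList<>(List.of("id", "name"));
        Object[] afterAlter = {1, "bob", "bob@example.com"}; // the table gained an "email" column

        try {
            decode(cached, afterAlter); // the ALTER was in a purged binlog, so the cache is stale
        } catch (IllegalStateException e) {
            System.out.println("stale schema detected: " + e.getMessage());
        }

        applyAddColumn(cached, "email"); // what replaying the missed DDL would have done
        System.out.println(decode(cached, afterAlter)); // {id=1, name=bob, email=bob@example.com}
    }
}
```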

So while it'd be good to give the user a tool to reset the position by hand (if they know that no DDL has happened, or they just want to risk it), it's not something I want to try to do automatically.

jitendra-1217 commented Feb 21, 2016

> So while it'd be good to give the user a tool to reset the position by hand (if they know that no DDL has happened, or they just want to risk it), it's not something I want to try to do automatically.

👍 understood, thanks.

Let me open a PR for the utility proposed above.

xmlking commented Feb 23, 2016

For Maxwell high availability, it would be nice to run two or more Maxwell instances together in active-passive mode, sharing the same Maxwell schema database, with only one of them replicating change records.
They could keep the state of which Maxwell is active in the Maxwell schema database itself. The secondary Maxwell could detect that the primary is down by 1. heartbeats, 2. watching how far maxwell.positions lags behind the current binlog position, or 3. tracking via ZooKeeper.
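A minimal sketch of the standby side of options 1 and 2, assuming the active instance periodically writes a heartbeat timestamp to the shared schema database (the storage, the 30-second timeout, and all names here are hypothetical; this is not an existing Maxwell feature). The standby takes over only on a stale heartbeat; lag is computed for reporting but not treated as death, since a busy primary can lag while healthy.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical standby-side check for an active/passive Maxwell pair; not an existing Maxwell feature. */
final class StandbyMonitor {

    enum Decision { STAY_PASSIVE, TAKE_OVER }

    private final Duration heartbeatTimeout;

    StandbyMonitor(Duration heartbeatTimeout) {
        this.heartbeatTimeout = heartbeatTimeout;
    }

    /** Option 1: take over once the active instance's last heartbeat is older than the timeout. */
    Decision decide(Instant lastHeartbeat, Instant now) {
        Duration silence = Duration.between(lastHeartbeat, now);
        return silence.compareTo(heartbeatTimeout) > 0 ? Decision.TAKE_OVER : Decision.STAY_PASSIVE;
    }

    /** Option 2: how many binlog files the stored position trails the server's current file. */
    static long filesBehind(long storedFileSeq, long currentFileSeq) {
        return Math.max(0, currentFileSeq - storedFileSeq);
    }

    public static void main(String[] args) {
        StandbyMonitor monitor = new StandbyMonitor(Duration.ofSeconds(30));
        Instant now = Instant.parse("2016-02-23T12:00:00Z");
        System.out.println(monitor.decide(now.minusSeconds(10), now)); // STAY_PASSIVE
        System.out.println(monitor.decide(now.minusSeconds(45), now)); // TAKE_OVER
        System.out.println(filesBehind(321, 323));                     // 2
    }
}
```

Deciding to take over is only half of it: the claim itself still has to be atomic (a conditional update in the shared database, or a ZooKeeper lock) so two standbys cannot both become active.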

Contributor

smferguson commented Dec 29, 2016

@osheroff re: #282 (comment): are you certain about this in an AWS RDS setting? I think I'm seeing otherwise, but would like to be wrong.

Collaborator

osheroff commented Dec 29, 2016

@smferguson, am I certain about what?

Contributor

smferguson commented Dec 30, 2016

@osheroff I meant this answer, but never mind; I got the answer and started that PR (#523):

> Are you saying that if Maxwell is always running this won't happen, because it keeps maxwell.positions updated?

yup.

Collaborator

osheroff commented Jun 24, 2018

thinking this was an issue with heartbeats/filtering anyway.

osheroff closed this Jun 24, 2018
