Maxwell loses connection during RDS backups #326
We're experiencing outages on Maxwell that correlate with the timing of our RDS backups.
From the docs, it shows:
The stacktraces look like:
Our setup is currently:
We're currently looking at pointing Maxwell at our production DB instance, or spinning up a manual replica that is Multi-AZ (not Amazon's "read replica"-style instance). Any guidance is much appreciated! Thank you.
We have enabled binlog retention, and the process seems to work fine except during these backup periods, when I/O is suspended.
I'm not sure there's much Maxwell (or any other replicating process) could do to fix this, but I figured it was worth a shot to get a discussion going. I've filed an AWS support ticket as well to see if there are workarounds for this behavior.
I've perused the Maxwell source a bit and haven't been able to find anything regarding a configurable connection timeout. Is my assessment accurate?
@ckampfe I'm assuming raising this will help you:
I'm curious why you're running it against the replica. Are you writing Maxwell's binlog position to the replica too? As you said, pointing Maxwell at your master would also solve this.
Finally, we use runit to manage Maxwell; you could do the same, or use something similar such as supervisord. That way the process restarts if it exits for any reason.
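runit or supervisord is the suggested route here. Purely to illustrate the restart-on-exit behavior those tools provide, here is a minimal Python sketch with exponential backoff between restarts. `supervise` and `run_command` are hypothetical names, not part of Maxwell, runit, or supervisord.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def supervise(
    start: Callable[[], int],
    max_restarts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Run `start` repeatedly, restarting it whenever it exits non-zero.

    Returns the exit codes observed. Stops after a clean exit (0) or once
    `max_restarts` restarts have been used up.
    """
    exit_codes: List[int] = []
    delay = initial_delay
    restarts = 0
    while True:
        code = start()
        exit_codes.append(code)
        if code == 0 or restarts >= max_restarts:
            return exit_codes
        # Wait before restarting; double the wait each time (capped at
        # max_delay) so a long outage such as a backup window isn't hammered.
        sleep(delay)
        delay = min(delay * 2, max_delay)
        restarts += 1


def run_command(argv: Sequence[str]) -> Callable[[], int]:
    """Hypothetical helper: wrap a command line as a `start` callable.

    The argv you pass (e.g. how Maxwell is launched) is an assumption
    about your install.
    """
    return lambda: subprocess.run(list(argv)).returncode
```

For example, `supervise(run_command(["bin/maxwell", "--config", "config.properties"]), max_restarts=10)`, where the command line is an assumption to adjust to your setup. A dedicated supervisor is still preferable in production, since it also handles logging, signals, and starting on boot.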
AWS Support told us that binlogs are rotated during an I/O freeze (regardless of binlog retention being set). That explains why we're getting that error. They opened a feature request on our behalf to address this, but there is no ETA.
To mitigate it, we'll either need to run Maxwell against the primary DB, or spin up a manual replica where we can enable binlogs without needing backups enabled. We'll probably do the former.
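If rotation during the I/O freeze is the cause, the symptom would be that the binlog file Maxwell last recorded no longer exists on the server. A minimal sketch of checking for that, assuming you have already fetched the `(Log_name, File_size)` columns of MySQL's `SHOW BINARY LOGS` and know the stored file and offset. `position_still_available` is a hypothetical helper, not part of Maxwell, and passing it is a necessary rather than sufficient condition for resuming.

```python
from typing import Iterable, Tuple


def position_still_available(
    stored_file: str,
    stored_pos: int,
    binary_logs: Iterable[Tuple[str, int]],
) -> bool:
    """Hypothetical check: can a saved binlog position plausibly be resumed?

    `binary_logs` holds (Log_name, File_size) rows as returned by
    `SHOW BINARY LOGS`. The stored file must still be listed, and the
    offset must fall inside it.
    """
    sizes = dict(binary_logs)
    if stored_file not in sizes:
        # The file has been purged or rotated away; resuming from it will fail.
        return False
    # Binlog files start with a 4-byte magic header, so the first event is
    # at offset 4; an offset equal to the file size means "end of file".
    return 4 <= stored_pos <= sizes[stored_file]
```

Running such a check before restarting Maxwell would at least distinguish "the connection dropped" from "the position is gone", which need different fixes.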