
improvement: run the batch ingestion process in parallel #14

Closed
ghost opened this issue Jul 24, 2017 · 7 comments

@ghost commented Jul 24, 2017

As an improvement:
run the batch process (executing fn_process_batch) in parallel (thread/fork) with the parsing of the binary logs when using start_replica.

That would help when the replica needs to catch up with the database.

Note: I am not asking for parallel table processing when running init_replica, but that would also help ;)
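
Something like this as a rough sketch (parse_binlog and process_batch are just stand-ins for illustration, not pg_chameleon's actual internals):

```python
import multiprocessing as mp
import time

def parse_binlog():
    """Stand-in for the binlog parser: yields fake batch ids."""
    for batch_id in range(5):
        time.sleep(0.1)  # simulate reading a chunk of binlog
        yield batch_id

def process_batch(batch_id):
    """Stand-in for calling fn_process_batch on the PostgreSQL side."""
    time.sleep(0.2)  # simulate the replay work
    print(f"replayed batch {batch_id}")

def reader(q):
    # Producer: keep parsing the binlog, hand finished batches over.
    for batch_id in parse_binlog():
        q.put(batch_id)
    q.put(None)  # sentinel: no more batches

def replayer(q):
    # Consumer: replay batches while the reader keeps parsing.
    while True:
        batch_id = q.get()
        if batch_id is None:
            break
        process_batch(batch_id)

if __name__ == "__main__":
    q = mp.Queue()
    procs = [mp.Process(target=reader, args=(q,)),
             mp.Process(target=replayer, args=(q,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With this shape the binlog reader never waits for PostgreSQL: batches queue up and the replayer drains them as fast as the database allows.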

@ghost
Copy link
Author

ghost commented Jul 24, 2017

One idea: why not run fn_process_batch entirely in the database, using a trigger on the t_replica_batch table?
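
Roughly like this (untested sketch; the fn_process_batch call signature is a guess here and would need to be checked against the installed sch_chameleon schema):

```python
import psycopg2

TRIGGER_DDL = """
CREATE OR REPLACE FUNCTION sch_chameleon.fn_batch_trigger()
RETURNS trigger AS
$body$
BEGIN
    -- Hypothetical wrapper: replay the batch as soon as it is inserted.
    -- Caveat: this runs inside the inserting transaction, so a slow
    -- replay would block the binlog reader that inserted the row.
    PERFORM sch_chameleon.fn_process_batch();
    RETURN NEW;
END;
$body$
LANGUAGE plpgsql;

CREATE TRIGGER trg_process_batch
AFTER INSERT ON sch_chameleon.t_replica_batch
FOR EACH ROW EXECUTE PROCEDURE sch_chameleon.fn_batch_trigger();
"""

connection = psycopg2.connect("dbname=my_replica_db")  # assumed DSN
with connection, connection.cursor() as cursor:
    cursor.execute(TRIGGER_DDL)
connection.close()
```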

@the4thdoctor (owner) commented Jul 24, 2017

Hi, the version 2 I'm currently working on will split the read and replay into two separate subprocesses. I'll try to release it by the end of the year. It will also run init_replica in parallel, with a less invasive flush process.

@the4thdoctor (owner)

Btw, I'm getting curious about your migration. Any chance you'll be able to write a case study? :-)

@ghost commented Jul 24, 2017

Well, in principle we have a single MySQL instance/process running 3 schemas.
We want to migrate those 3 to Postgres (2 to the same database as 2 different schemas, the last maybe to a different instance) and we want the switchover to run with minimal downtime.
pg_chameleon is the tool that makes this possible.

As for speed we found that:
a) the initial migration of the biggest database takes about 6 hours (~50 GB) - we have not tested the others yet, but they are smaller
b) when running start_replica (recovering), there are long stretches where we only process binlogs while Postgres sits idle.

So it is not something "special".

The only thing is that I keep finding things that - to me - look strange, and as we want to have all the data in PostgreSQL we have to make sure there is no data loss. Hence all those questions regarding logging and mismatches...

Besides that I am totally happy with the tool itself - thanks for providing it!

the4thdoctor added this to the Version 2.0 milestone on Jul 25, 2017
@the4thdoctor (owner)

thanks for sharing :)
I'm happy the tool is proving useful.
The replica process is the biggest issue of this initial version, as it does not provide read and replay in parallel. And I found it quite difficult to change this approach, because I made wrong decisions when I wrote the initial implementation.

I'll try to speed up the development of version 2 to provide a better experience :)

@the4thdoctor (owner) commented Jul 31, 2017

@martinsperl-kognitiv I've just pushed an improvement for the replay function. My tests showed faster execution, with reduced CPU load and I/O wait. Feel free to give it a try. :)

The upgrade procedure will add an extra table used by the replay function. Be sure to stop all the replica processes before upgrading the schema, and take a backup of sch_chameleon first.
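
For the backup, something along these lines does the job (assuming the replica database is called my_replica_db; pg_dump's -n flag restricts the dump to the named schema):

```python
import subprocess

# Dump only the sch_chameleon schema before upgrading.
subprocess.run(
    [
        "pg_dump",
        "-n", "sch_chameleon",
        "-f", "sch_chameleon_backup.sql",
        "my_replica_db",
    ],
    check=True,
)
```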

the4thdoctor modified the milestones: ver1.7, ver2.0 on Aug 4, 2017
@the4thdoctor (owner)

version 1.7 will have a threaded option for running read and replay in parallel
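
Roughly, the threaded flavour looks like this (illustrative stand-ins only, not the actual 1.7 code):

```python
import queue
import threading

def reader(q):
    # Stand-in for the binlog reader thread.
    for batch_id in range(5):
        q.put(batch_id)
    q.put(None)  # sentinel: reading finished

def replayer(q):
    # Stand-in for the replay thread calling fn_process_batch.
    while True:
        batch_id = q.get()
        if batch_id is None:
            break
        print(f"replaying batch {batch_id}")

q = queue.Queue()
threads = [threading.Thread(target=reader, args=(q,)),
           threading.Thread(target=replayer, args=(q,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both sides are mostly I/O bound (MySQL on one end, PostgreSQL on the other), so threads work fine here despite the GIL.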
