Real-time data replication from OLTP to OLAP dbs

yunqiqiliang/rt_transfer

 
 

Artie Transfer

⚡️ Blazing fast data replication between OLTP and OLAP databases ⚡️


Learn more »

Artie Transfer is a real-time data replication solution for databases and data warehouses/data lakes.

Typical ETL solutions rely on batched processes or schedulers (DAGs, Airflow), which means the data in the downstream data warehouse is often several hours to days old. The problem gets worse as data volumes grow, because batched processes take increasingly longer to run.

Artie leverages change data capture (CDC) and stream processing to perform data syncs in a more efficient way, which enables sub-minute latency. Use Artie Transfer to reduce data latency from several hours to seconds!

Benefits of Artie Transfer:

  • Sub-minute data latency so you always have access to live production data.
  • Easy to use: just set up a simple configuration file, and you're good to go!
  • Automatic table creation and schema detection.
  • Artie has automatic retries and its processing is idempotent.
  • Built to scale: handle anywhere from 1GB to 100+ TB of data.
  • Built-in error reporting along with rich telemetry statistics.

See the Getting started guide to set up Artie Transfer!

Architecture

Pre-requisites

As the architecture diagram shows, Artie Transfer sits behind Kafka and expects CDC messages to be in a particular format. See the "What is currently supported?" section for the supported sources and destinations.

The optimal set-up looks something like this:

  • One Kafka topic per table (such that we can toggle the number of partitions based on throughput)
  • The partition key is the primary key for the table (to avoid out-of-order writes at the row level)
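
The second point can be illustrated with a small Go sketch. When the partition is derived from the primary key alone, every change to a given row lands in the same partition, and Kafka preserves order within a partition, so updates to one row cannot overtake each other. The `partitionFor` helper and the sample events are hypothetical, and FNV-1a is only a stand-in hash: Kafka's default partitioner uses a different hash function (murmur2), so the partition numbers printed here will not match a real cluster.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a row's primary key to one of numPartitions partitions.
// FNV-1a is a stand-in for illustration; it is not the hash Kafka uses.
func partitionFor(primaryKey string, numPartitions uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(primaryKey)) // Write on a hash.Hash never returns an error
	return h.Sum32() % numPartitions
}

func main() {
	const numPartitions = 4

	// Hypothetical CDC events for one table, keyed by primary key.
	events := []struct{ pk, op string }{
		{"customer-1", "create"},
		{"customer-2", "create"},
		{"customer-1", "update"},
		{"customer-1", "delete"},
	}

	// Every event for "customer-1" is assigned the same partition,
	// so a consumer sees create, update, delete in that order.
	for _, e := range events {
		fmt.Printf("pk=%s op=%s -> partition %d\n",
			e.pk, e.op, partitionFor(e.pk, numPartitions))
	}
}
```

Because the mapping depends on the partition count, changing the number of partitions on a live topic can move a key to a different partition, which is worth keeping in mind when tuning partitions for throughput.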

To see all of the supported databases, check out the "What is currently supported?" section.

Examples

To run Artie Transfer's stack locally, please refer to the examples folder.

Getting started

Getting started guide

What is currently supported?

Transfer aims to provide coverage across all OLTP and OLAP databases. Currently Transfer supports:

  • Message queues:

    • Kafka (default)
    • Google Pub/Sub
  • Destinations:

    • Snowflake
    • BigQuery
    • Redshift
    • S3
  • Sources:

    • MongoDB
    • PostgreSQL (supported replication slot plug-ins: pgoutput, decoderbufs, wal2json)
    • MySQL

If the database you are using is not on the list, feel free to file a feature request.

Configuration File

Telemetry

Artie Transfer's telemetry guide

Tests

Transfer is written in Go and uses counterfeiter to generate mocks. To run the tests, run the following commands:

make generate
make test

Release

Artie Transfer is released through GoReleaser, which we use to cross-compile the binaries attached to each release as well as the images published to Docker Hub. If your operating system or architecture is not supported, please file a feature request!

License

Artie Transfer is licensed under ELv2. Please see the LICENSE file for additional information. If you have any licensing questions, please email hi@artie.so.
