tapdata/tapdata

English Readme

Brief documentation in Chinese

Online Document: https://tapdata.github.io/

What is Tapdata?

Tapdata is a live data platform designed to connect data silos and provide fresh data to downstream operational applications and operational analytics.

Environment Preparation

  1. Make sure Docker is installed on your machine before you get started.
  2. Tapdata has so far been tested only on Linux (no specific distribution is required).
  3. Clone the repo: git clone https://github.com/tapdata/tapdata.git && cd tapdata
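
The prerequisites above can be checked before running any build script. The following is a minimal sketch, not part of the Tapdata repository: the function name preflight and its injectable parameters are hypothetical, and it only checks that docker and git are on PATH and that the OS reports itself as Linux.

```python
import platform
import shutil


def preflight(which=shutil.which, system=platform.system):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # Docker must be on PATH before the build/quick-*.sh scripts are run.
    if which("docker") is None:
        problems.append("docker not found on PATH")
    # git is needed for the clone step.
    if which("git") is None:
        problems.append("git not found on PATH")
    # The README states that only Linux has been tested.
    if system() != "Linux":
        problems.append("untested OS: " + system())
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

The lookups are passed in as parameters so the check can be exercised without touching the real machine state.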

Latest Release Branch

release-v2.9

Quick Use

This is the easiest way to experiment with Tapdata:

Run bash build/quick-use.sh to pull the Docker image and start an all-in-one container.

Quick Build

Alternatively, you can build the project with the following command:

  1. Run bash build/quick-dev.sh to build a Docker image from source and start an all-in-one container.

To build inside Docker, install Docker and set tapdata_build_env in build/env.sh to "docker" (the default).

To build locally, install the following and set tapdata_build_env in build/env.sh to "local":

  1. JDK
  2. Maven

Run bash build/clean.sh to clean the build output.

Quick Steps

If everything went well, you should now be in a terminal window. Follow the steps below to try it out.

Create New DataSource

# 1. mongodb
source = DataSource("mongodb", "$name").uri("$uri")

# 2. mysql
source = DataSource("mysql", "$name").host("$host").port($port).username("$username").db("$db")

# 3. pg
source = DataSource("postgres", "$name").host("$host").port($port).username("$username").db("$db").schema("$schema").logPluginName("wal2json")

# save() checks the whole configuration and loads the schema from the source
source.save()
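
The $uri placeholder for MongoDB is a standard mongodb:// connection string, and credentials containing characters such as @ or : must be percent-encoded in it. The sketch below shows one way to assemble such a string in plain Python; the helper name mongodb_uri is hypothetical and is not part of Tapdata.

```python
from urllib.parse import quote_plus


def mongodb_uri(host, port, db, username=None, password=None):
    """Build a mongodb:// connection string, escaping the credentials."""
    credentials = ""
    if username is not None:
        # Percent-encode reserved characters so they cannot break the URI.
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return "mongodb://{}{}:{}/{}".format(credentials, host, port, db)


print(mongodb_uri("localhost", 27017, "test"))
```

The resulting string can then be passed to .uri(...) in place of "$uri".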

Preview Table

  1. use $name switches the datasource context
  2. show tables displays all tables in the current datasource
  3. desc $table_name displays the table schema

Migrate A Table

Migrate jobs run in real time by default.

# 1. create a pipeline
p = Pipeline("$name")

# 2. use readFrom and writeTo to describe a migrate job
p.readFrom("$source_name.$table").writeTo("$sink_name.$table")

# 3. start job
p.start()

# 4. monitor job
p.monitor()
p.logs()

# 5. stop job
p.stop()
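
When these steps are scripted rather than typed interactively, it helps to guarantee that stop() runs even if something fails in between. The following is a minimal sketch using only the start() and stop() calls shown above. The running helper and the RecordingPipeline stand-in are hypothetical; the stand-in exists only so the example runs outside the Tapdata shell.

```python
from contextlib import contextmanager


@contextmanager
def running(pipeline):
    """Start the job on entry and always stop it on exit."""
    pipeline.start()
    try:
        yield pipeline
    finally:
        # Runs on normal exit and when an exception is raised.
        pipeline.stop()


class RecordingPipeline:
    """Stand-in for illustration only; it just records the calls it receives."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")


demo = RecordingPipeline()
with running(demo):
    demo.calls.append("work")
print(demo.calls)
```

Inside the Tapdata shell, the same helper would be used as with running(p): around calls such as p.monitor().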

Migrate A Table With UDF

The current version does not support changing the record schema; support is expected soon.

If you need to change the record schema, use MongoDB as the sink.

# 1. define a python function
def fn(record):
    record["x"] = 1
    return record

# 2. use processor to apply the function between source and target
p.readFrom(...).processor(fn).writeTo(...)
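
Because a UDF is an ordinary Python function that takes a record and returns it, it can be tried on sample records before being attached to a pipeline. The sketch below assumes records behave like plain dictionaries, as the example above suggests; the helper try_udf and the sample data are hypothetical.

```python
import copy


def fn(record):
    # Same UDF as above: add a constant field to every record.
    record["x"] = 1
    return record


def try_udf(udf, samples):
    """Apply udf to deep copies of the samples so the originals stay untouched."""
    return [udf(copy.deepcopy(sample)) for sample in samples]


samples = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
results = try_udf(fn, samples)
print(results)
```

Working on copies keeps the sample data reusable, since the UDF mutates the record it receives.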

Migrate Multi Tables

Migrate jobs run in real time by default.

# 1. create a pipeline
p = Pipeline("$name")

# 2. use readFrom and writeTo to describe a migrate job; the multi-table syntax is slightly different
# select the tables either with an explicit list
source = Source("$datasource_name", ["table1", "table2"...])
# or with a regular expression on the table name
source = Source("$datasource_name", table_re="xxx.*")

# 3. use prefix/suffix to add a prefix/suffix to the table names
p.readFrom(source).writeTo("$datasource_name", prefix="", suffix="")

# 4. start job
p.start()
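
To preview which tables a pattern would select and what they would be called on the target side, the mapping can be sketched in plain Python. This is an illustration under stated assumptions, not Tapdata's implementation: it assumes table_re must match the whole table name and that prefix and suffix are simply concatenated around it. The function name plan_targets is hypothetical.

```python
import re


def plan_targets(tables, table_re, prefix="", suffix=""):
    """Map each selected source table to its prefixed/suffixed target name."""
    pattern = re.compile(table_re)
    return {
        name: prefix + name + suffix
        for name in tables
        # ASSUMPTION: the pattern must match the whole table name.
        if pattern.fullmatch(name)
    }


plan = plan_targets(["orders", "orders_2023", "users"], r"orders.*", prefix="stg_")
print(plan)
```

If the real matching turns out to be a substring search, swapping fullmatch for search reproduces that behavior.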

Manager

  1. show datasources lists all data sources; use delete datasource $name to delete one if no job is using it
  2. show jobs lists all jobs and their stats
  3. logs job $job_name [limit=20] [t=5] [tail=True] shows the job log
  4. monitor job $job_name continuously displays the job metrics
  5. status job $job_name shows the job status (running/stopped...)

License

Tapdata uses multiple licenses.

The license for a particular work is determined by the following rules, in order of priority:

  • License directly present in the file
  • LICENSE file in the same directory as the work
  • First LICENSE found when exploring parent directories up to the project top level directory

If none of these rules applies, the work defaults to the Server Side Public License. For PDK Connectors, the license is Apache V2.
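
The lookup order above can be sketched as a small function. This is an illustration only: detecting an in-file license through an SPDX-License-Identifier tag in the first 20 lines is an assumption, since the rules do not say how an in-file license is marked, and the function name resolve_license is hypothetical. The sketch also does not special-case PDK Connectors.

```python
from pathlib import Path

DEFAULT_LICENSE = "Server Side Public License"


def resolve_license(work, project_root):
    """Return a description of where the license for `work` comes from."""
    work = Path(work).resolve()
    root = Path(project_root).resolve()

    # Rule 1: a license stated in the file itself.
    # ASSUMPTION: detected via an SPDX tag in the first 20 lines.
    with open(work, encoding="utf-8", errors="ignore") as handle:
        for _, line in zip(range(20), handle):
            if "SPDX-License-Identifier:" in line:
                tag = line.split("SPDX-License-Identifier:", 1)[1].strip()
                return "in-file: " + tag

    # Rules 2 and 3: LICENSE next to the work, then in each parent
    # directory up to the project top-level directory.
    directory = work.parent
    while True:
        candidate = directory / "LICENSE"
        if candidate.is_file():
            return "file: " + str(candidate)
        if directory == root or directory == directory.parent:
            break
        directory = directory.parent

    return "default: " + DEFAULT_LICENSE
```

The walk stops at the project root, so a LICENSE file outside the repository is never picked up.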