PipelineWise

PipelineWise is a Data Pipeline Framework using the Singer.io specification to ingest and replicate data from various sources to various destinations. Documentation is available at https://transferwise.github.io/pipelinewise/

Features

  • Built with ELT in mind: PipelineWise fits into the ELT landscape and does not do traditional ETL. It first ingests data into the DWH in its original format, and the “transformation” step shifts to the end of the data pipeline. Load-time transformations are still supported, but complex mappings and joins have to be done once the data is replicated into the Data Warehouse.
  • Replication Methods: CDC (Log Based), Key-Based Incremental and Full Table snapshots
  • Managed Schema Changes: When source data changes, PipelineWise detects the change and alters the schema in your DWH automatically
  • Load time transformations: Ideal place to obfuscate, mask or filter sensitive data that should never be replicated in the Data Warehouse
  • YAML based configuration: Data pipelines are defined as YAML files, ensuring that the entire configuration is kept under version control
  • Lightweight: No daemons or database setup are required
  • Extensible: PipelineWise uses Singer.io-compatible tap and target connectors. New connectors can be added to PipelineWise with relatively little effort
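
To make the difference between the Full Table and Key-Based Incremental replication methods concrete, here is a minimal sketch. It is not PipelineWise code: the function names, the `updated_at` replication key and the in-memory "source table" are all assumptions made for illustration, and the strictly-greater-than comparison is a simplification.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def full_table(rows: List[Row]) -> List[Row]:
    """Full Table: every run re-reads all rows from the source."""
    return list(rows)


def key_based_incremental(
    rows: List[Row], replication_key: str, bookmark: Optional[Any]
) -> Tuple[List[Row], Optional[Any]]:
    """Key-Based Incremental (illustrative): return only rows whose
    replication key is strictly greater than the saved bookmark,
    together with the new bookmark to save for the next run."""
    if bookmark is None:
        selected = list(rows)  # first run: nothing saved yet, take everything
    else:
        selected = [r for r in rows if r[replication_key] > bookmark]
    selected.sort(key=lambda r: r[replication_key])
    new_bookmark = selected[-1][replication_key] if selected else bookmark
    return selected, new_bookmark


# Hypothetical source table held in memory.
source = [
    {"id": 1, "updated_at": "2020-01-01"},
    {"id": 2, "updated_at": "2020-01-05"},
]

first_batch, bookmark = key_based_incremental(source, "updated_at", None)

# A new row arrives between two runs.
source.append({"id": 3, "updated_at": "2020-02-01"})
second_batch, bookmark = key_based_incremental(source, "updated_at", bookmark)

print([r["id"] for r in first_batch])          # [1, 2]
print([r["id"] for r in second_batch])         # [3]
print([r["id"] for r in full_table(source)])   # [1, 2, 3]
```

The second run transfers only the new row, whereas a Full Table run would copy all three rows again. Log-based (CDC) replication is not sketched here because it depends on the source database's change log.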

Connectors

A tap extracts data from any source and writes it to a standard stream in a JSON-based format. A target consumes data from taps and does something with it, such as loading it into a file, API or database.
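
As a rough illustration of that flow, the sketch below emulates a tap and a target exchanging Singer-style messages (`SCHEMA`, `RECORD` and `STATE`, one JSON object per line). The `users` stream, the sample rows and the `run_tap`/`run_target` helpers are hypothetical; a real tap writes to stdout and a real target reads from stdin, which an in-memory buffer stands in for here.

```python
import io
import json
from typing import Any, Dict, List, TextIO


def run_tap(rows: List[Dict[str, Any]], out: TextIO) -> None:
    """Write one SCHEMA message, one RECORD per row, then a STATE message."""
    schema = {
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            }
        },
        "key_properties": ["id"],
    }
    out.write(json.dumps(schema) + "\n")
    for row in rows:
        record = {"type": "RECORD", "stream": "users", "record": row}
        out.write(json.dumps(record) + "\n")
    if rows:
        # The layout of the state value is up to the tap; this one is made up.
        state = {"type": "STATE", "value": {"users": {"last_id": rows[-1]["id"]}}}
        out.write(json.dumps(state) + "\n")


def run_target(inp: TextIO) -> Dict[str, List[Dict[str, Any]]]:
    """Read messages line by line and collect records per stream."""
    loaded: Dict[str, List[Dict[str, Any]]] = {}
    for line in inp:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        # SCHEMA and STATE messages are ignored in this sketch.
    return loaded


pipe = io.StringIO()  # stands in for the stdout -> stdin pipe
run_tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}], pipe)
pipe.seek(0)
loaded = run_target(pipe)
print(loaded)
```

Because both sides only agree on the message format, any tap can in principle be combined with any target.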

| Type | Name | Description |
|------|------|-------------|
| Tap | Postgres | Extracts data from PostgreSQL databases. Supports Log-Based Incremental, Key-Based Incremental and Full Table replication |
| Tap | MySQL | Extracts data from MySQL databases. Supports Log-Based Incremental, Key-Based Incremental and Full Table replication |
| Tap | Oracle | Extracts data from Oracle databases. Supports Log-Based Incremental, Key-Based Incremental and Full Table replication |
| Tap | Kafka | Extracts data from Kafka topics |
| Tap | AdWords | Extracts data from the Google Ads API (formerly Google AdWords) using OAuth and supports incremental loading based on input state |
| Tap | S3 CSV | Extracts data from S3 CSV files (currently a fork of tap-s3-csv because we wanted to use our own auth method) |
| Tap | Zendesk | Extracts data from Zendesk using OAuth and Key-Based Incremental replication |
| Tap | Snowflake | Extracts data from Snowflake databases. Supports Key-Based Incremental and Full Table replication |
| Tap | Salesforce | Extracts data from Salesforce using the BULK and REST extraction APIs with Key-Based Incremental replication |
| Tap | Jira | Extracts data from Atlassian Jira using Basic auth or OAuth credentials |
| Target | Postgres | Loads data from any tap into a PostgreSQL database |
| Target | Redshift | Loads data from any tap into an Amazon Redshift Data Warehouse |
| Target | Snowflake | Loads data from any tap into a Snowflake Data Warehouse |
| Target | S3 CSV | Uploads data from any tap to S3 in CSV format |
| Transform | Field | Transforms fields from any tap and sends the results to any target. Recommended for data masking/obfuscation |
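
The kind of field-level masking that a transform step performs between a tap and a target could look roughly like this. The rule names, the `RULES` mapping and the sample record are assumptions for illustration only and do not reflect the Transform Field connector's actual configuration or API.

```python
import hashlib
from typing import Any, Callable, Dict


def hash_value(value: Any) -> Any:
    """Replace a value with its SHA-256 hex digest; keep None as None."""
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def set_null(value: Any) -> None:
    """Drop the value entirely."""
    return None


# Hypothetical rule set: field name -> masking function.
RULES: Dict[str, Callable[[Any], Any]] = {
    "email": hash_value,
    "ssn": set_null,
}


def transform_record(
    record: Dict[str, Any], rules: Dict[str, Callable[[Any], Any]]
) -> Dict[str, Any]:
    """Apply the matching rule to each field; pass other fields through."""
    return {
        field: rules[field](value) if field in rules else value
        for field, value in record.items()
    }


masked = transform_record(
    {"id": 7, "email": "ada@example.com", "ssn": "123-45-6789"}, RULES
)
print(masked["id"], masked["ssn"], len(masked["email"]))
```

Hashing keeps equal inputs equal, so a masked column can still be used for joins and counts, while setting a field to null removes the information altogether.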

Running from docker

If you have Docker installed, using Docker is the easiest and recommended way to start using PipelineWise.

  1. Build an executable Docker image that has every required dependency and is isolated from your host system:

    $ docker build -t pipelinewise:latest .
  2. Once the image is ready, create an alias to the docker wrapper script:

    $ alias pipelinewise="$(pwd)/bin/pipelinewise-docker"
  3. Check that the installation was successful by running the pipelinewise status command:

    $ pipelinewise status
    
    Tap ID    Tap Type      Target ID     Target Type      Enabled    Status    Last Sync    Last Sync Result
    --------  ------------  ------------  ---------------  ---------  --------  -----------  ------------------
    0 pipeline(s)

You can run any pipelinewise command at this point. Tutorials on creating and running pipelines are at https://transferwise.github.io/pipelinewise/installation_guide/creating_pipelines.html .

Note: To run the tests, run the commands inside a PipelineWise container. To create and start the container and get a bash shell inside it:

$ docker-compose up -d pipelinewise
$ docker exec -it pipelinewise_dev bash

Building from source

  1. Make sure that every dependency is installed on your system:

    • Python 3.x
    • python3-dev
    • python3-venv
    • postgresql
  2. Run the install script, which installs the PipelineWise CLI and every supported singer connector into separate virtual environments:

    $ ./install.sh --connectors=all

    Press Y to accept the license agreement of the required singer components. To automate the installation and accept every license agreement, run ./install.sh --acceptlicenses. Use the optional --connectors=...,... argument to install only a specific list of singer connectors.

  3. To start the CLI, activate the CLI virtual environment and set the PIPELINEWISE_HOME environment variable:

    $ source {ACTUAL_ABSOLUTE_PATH}/.virtualenvs/pipelinewise/bin/activate
    $ export PIPELINEWISE_HOME={ACTUAL_ABSOLUTE_PATH}

    (ACTUAL_ABSOLUTE_PATH differs on every system. Once the installation has completed, the install script prints the correct commands for your system.)

  4. Check that the installation was successful by running the pipelinewise status command:

    $ pipelinewise status
    
    Tap ID    Tap Type      Target ID     Target Type      Enabled    Status    Last Sync    Last Sync Result
    --------  ------------  ------------  ---------------  ---------  --------  -----------  ------------------
    0 pipeline(s)

You can run any pipelinewise command at this point. Tutorials on creating and running pipelines are at https://transferwise.github.io/pipelinewise/installation_guide/creating_pipelines.html .
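
If the status command fails after a source install, a small pre-flight check of the two prerequisites from step 3 may help narrow down the cause. This is a hypothetical helper, not part of PipelineWise: it only verifies that PIPELINEWISE_HOME points to an existing directory and that some virtual environment is active, not that it is the right one.

```python
import os
import sys
from typing import List, Mapping


def check_cli_environment(
    env: Mapping[str, str], prefix: str, base_prefix: str
) -> List[str]:
    """Return a list of human-readable problems; empty means both checks pass."""
    problems: List[str] = []

    home = env.get("PIPELINEWISE_HOME")
    if not home:
        problems.append("PIPELINEWISE_HOME is not set")
    elif not os.path.isdir(home):
        problems.append(f"PIPELINEWISE_HOME is not a directory: {home}")

    # Inside a venv, sys.prefix differs from sys.base_prefix.
    if prefix == base_prefix:
        problems.append("no virtual environment is active")

    return problems


problems = check_cli_environment(os.environ, sys.prefix, sys.base_prefix)
for problem in problems:
    print("WARNING:", problem)
```

Passing the environment and prefixes in as arguments, rather than reading them inside the function, keeps the check easy to exercise with made-up values.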

To run unit tests:

$ pytest --ignore tests/end_to_end

Note: End-to-end tests are ignored because they require a specific environment with source and target databases. Use the Docker development environment to run end-to-end tests. Read more details at Developing with Docker.

To run unit tests and generate code coverage:

$ coverage run -m pytest --ignore tests/end_to_end && coverage report

To generate an HTML coverage report:

$ coverage run -m pytest --ignore tests/end_to_end && coverage html -d coverage_html

Note: The HTML report will be generated in coverage_html/index.html

Developing with Docker

If you have Docker and Docker Compose installed, you can create a local development environment that includes not only the PipelineWise executables but also a pre-configured development project, with some databases as sources and targets, for a more convenient development experience.

For further instructions on setting up a local development environment, go to Test Project for Docker Development Environment.

Contribution

To add new taps and targets, follow the instructions on the Contribution Page.

License

Apache License Version 2.0

See LICENSE for the full text.
