Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
1 contributor

Users who have contributed to this file

176 lines (130 sloc) 10.8 KB

Outflux - Migrate InfluxDB to TimescaleDB

Go Report Card

This repo contains code for exporting complete InfluxDB databases or selected measurements to TimescaleDB.

Installation

Installing from source

Outflux is a Go project managed by dep (The go dependency management tool). To download the proper dependency versions, dep must be installed on your system. Instructions can be found on the official documentation page.

# Fetch the source code of Outflux
$ go get github.com/timescale/outflux
$ cd $GOPATH/src/github.com/timescale/outflux

# Fetch the required dependencies
$ dep ensure -v

# Install the Outflux binary:
$ cd cmd/outlux
$ go install 

Binary releases

We upload prepackaged binaries available for GNU/Linux, Windows and MacOS in the releases. Just download the binary, extract the compressed tarball and run the executable

How to use

Outflux supports InfluxDB versions 1.0 and upwards. We explicitly test for compatibility for versions 1.0, 1.5, 1.6, 1.7 and the latest tag of the InfluxDB docker container.

Before using it

It is recommended that you have some InfluxDB database with some data. For testing purposes you can check out the TSBS Data Loader Tool part of the Time Series Benchmark Suite. It can generate large ammounts of data for and load them in influx. Data can be generated with one command, just specify the format as 'influx', and then load it in with another command.

⚠️ Connection params ⚠️

Detailed information about how to pass the connection parameters to Outflux can be found at the bottom of this document at the Connection

Schema Transfer

The Outflux CLI has two commands. The first one is schema-transfer. This command will discover the schema of a InfluxDB database, or specific measurements in a InfluxDB database, and depending on the strategy selected create or verify a TimescaleDB database that could hold the data.

The possible flags for the command can be seen by running

$ cd $GOPATH/bin/
$ ./outflux schema-transfer --help

Usage of the is outflux schema-transfer database [measure1 measure2 ...] [flags]. Where database is the name of the InfluxDB database you wish to export. [measure1 ...] are optional and if specified will export only those measurements from the selected database. Additionally you can specify the retention policy as retention_policy.measure or "retention-policy"."measure name" if some of the identifiers contain a space or dash. ⚠️ The resulting target table will be named "retention_policy.measure" in TimescaleDB

For example outflux schema-transfer benchmark cpu mem will discover the schema for the cpu and mem measurements from the benchmark database.

Available flags for schema-transfer are:

flag type default description
input-server string http://localhost:8086 Location of the input database, http(s)://location:port.
input-pass string Password to use when connecting to the input database
input-user string Username to use when connecting to the input database
output-conn string sslmode=disable Connection string to use to connect to the output database
output-schema string The schema of the output database that the data will be inserted into
schema-strategy string CreateIfMissing Strategy to use for preparing the schema of the output database. Valid options: ValidateOnly, CreateIfMissing, DropAndCreate, DropCascadeAndCreate
tags-as-json bool false If this flag is set to true, then the Tags of the influx measures being exported will be combined into a single JSONb column in Timescale
tags-column string tags When tags-as-json is set, this column specifies the name of the JSON column for the tags
fields-as-json bool false If this flag is set to true, then the Fields of the influx measures being exported will be combined into a single JSONb column in Timescale
fields-column string fields When fields-as-json is set, this column specifies the name of the JSON column for the fields
quiet bool false If specified will suppress any log to STDOUT

Migrate

The second command of the Outflux CLI is migrate. The possible flags for the command can be seen by running:

$ cd $GOPATH/bin/
$ ./outflux migrate --help

Usage of the command is outflux migrate database [measure1 measure2 ...] [flags]. Where database is the name of the database you wish to export. [measure1 measure2 ...] are optional and if specified will export only those measurements from the selected database.

For example outflux migrate benchmark cpu mem will export the cpu and mem measurements from the benchmark database. On the other hand outflux migrate benchmark will export all measurements in the benchmark database.

Available flags are:

flag type default description
input-server string http://localhost:8086 Location of the input database, http(s)://location:port.
input-pass string Password to use when connecting to the input database
input-user string Username to use when connecting to the input database
limit uint64 0 If specified will limit the export points to its value. 0 = NO LIMIT
from string If specified will export data with a timestamp >= of its value. Accepted format: RFC3339
to string If specified will export data with a timestamp <= of its value. Accepted format: RFC3339
output-conn string sslmode=disable Connection string to use to connect to the output database
output-schema string public The schema of the output database that the data will be inserted into.
schema-strategy string CreateIfMissing Strategy to use for preparing the schema of the output database. Valid options: ValidateOnly, CreateIfMissing, DropAndCreate, DropCascadeAndCreate
chunk-size uint16 15000 The export query will request data in chunks of this size. Must be > 0
batch-size uint16 8000 The size of the batch inserted in to the output database
data-buffer uint16 15000 Size of the buffer holding exported data ready to be inserted in the output database
max-parallel uint8 2 Number of parallel measure extractions. One InfluxDB measure is exported using 1 worker
rollback-on-external-error bool true If set, when an error occurs while extracting the data, the insertion will be rollbacked. Otherwise it will try to commit
tags-as-json bool false If this flag is set to true, then the Tags of the influx measures being exported will be combined into a single JSONb column in Timescale
tags-column string tags When tags-as-json is set, this column specifies the name of the JSON column for the tags
fields-as-json bool false If this flag is set to true, then the Fields of the influx measures being exported will be combined into a single JSONb column in Timescale
fields-column string fields When fields-as-json is set, this column specifies the name of the JSON column for the fields
quiet bool false If specified will suppress any log to STDOUT

Examples

  • Use environment variables for determining output db connection
$ PGPORT=5433
$ PGDATABASE=test
$ PGUSER=test
...
$ ./outflux schema-transfer benchmark
  • Export the complete 'benchmark' database on 'localhost:8086' to the 'targetdb' database on localhost:5432. Use environment variable to set InfluxDB password
$ PGDATABASE=some_default_db
$ INFLUX_PASSWORD=test
...
$ outflux migrate benchmark \
> --input-user=test \
> --input-pass=test \
> --output-conn='dbname=targetdb user=test password=test' \
  • Export only measurement 'cpu' from 'two_week' retention policy in the 'benchmark' database. Drop the existing '"two_week.cpu"' table in 'targetdb' if exists, create if not
$ outflux migrate benchmark two_week.cpu \
> --input-user=test \
> --input-pass=test \
> --output-con='dbname=targetdb user=test pass=test'\
> --schema-strategy=DropAndCreate
  • Export only the 1,000,000 rows from measurements 'cpu' and 'mem' from 'benchmark', starting from Jan 14th 2019 09:00
$ ./outflux migrate benchmark cpu mem \
> --input-user=test \
> --input-pass=test \
> --limit=1000000 \
> --from=2019-01-01T09:00:00Z

Connection

TimescaleDB connection params

The connection parameters to the TimescaleDB instance can be passed to Outflux in several ways. One is through the Postgres Environment Variables. Supported environment variables are: PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD, PGSSLMODE, PGSSLCERT, PGSSLKEY, PGSSLROOTCERT, PGAPPNAME, PGCONNECT_TIMEOUT. If they are not specified defaults used are: host=localhost, dbname=postgres, pguser=$USER, and sslmode=disable.

The values of the enviroment variables can be OVERRIDEN by specifying the '--output-con' flag when executing Outflux.

The connection string can be in the format URI or DSN format:

  • example URI: "postgresql://username:password@host:port/dbname?connect_timeout=10"
  • example DSN: "user=username password=password host=1.2.3.4 port=5432 dbname=mydb sslmode=disable"

InfluxDB connection params

The connection parameters to the InfluxDB instance can be passed also through flags or environment variables. Supported/Expected environment variables are: INFLUX_USERNAME, INFLUX_PASSWORD. These are the same environment variables that the InfluxDB CLI uses.

If they are not set, or if you wish to override them, you can do so with the --input-user and --input-pass.

You can’t perform that action at this time.