pgstream is an open source CDC (change data capture) command-line tool and library that offers Postgres replication support with DDL changes to any provided output.
- Schema change tracking and replication of DDL changes
- Multiple out of the box supported replication outputs
  - Elasticsearch/OpenSearch
  - Webhooks
  - PostgreSQL
- Initial and on demand PostgreSQL snapshots (for when you don't need continuous replication)
- Column value transformations (anonymise your data on the go!)
- Modular deployment configuration, only requires Postgres
- Kafka support with schema based partitioning
- Extendable support for custom output plugins
pgstream can be used via the readily available CLI or as a library.
Binaries are available for Linux, macOS and Windows; see our Releases.
To install pgstream from the source, run the following command:
go install github.com/xataio/pgstream@latest
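If the shell cannot find the binary after go install, the Go binary directory is probably not on PATH. The following is a minimal sketch, assuming a standard Go toolchain; GO_BIN_DIR is a hypothetical helper variable used only for illustration.

```shell
# go install writes the binary to GOBIN, or to GOPATH/bin when GOBIN is unset.
if command -v go >/dev/null 2>&1; then
  GO_BIN_DIR="$(go env GOBIN)"
  # Fall back to GOPATH/bin when GOBIN is empty.
  [ -n "$GO_BIN_DIR" ] || GO_BIN_DIR="$(go env GOPATH)/bin"
  export PATH="$PATH:$GO_BIN_DIR"
fi

# Print the resolved path of the binary, or a hint if it is still missing.
command -v pgstream || echo "pgstream not found on PATH"
```

To make the change permanent, add the export line to your shell profile.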
To install pgstream with homebrew, run the following command:
# macOS or Linux
brew tap xataio/pgstream
brew install pgstream
If you already have an environment with at least Postgres and the resources for whichever modules you plan to run, you can skip this step. Otherwise, this repository provides a docker setup that starts Postgres, Kafka and OpenSearch (as well as OpenSearch Dashboards for easy visualisation).
docker-compose -f build/docker/docker-compose.yml up
The docker-compose file has profiles that can be used to bring up only the relevant containers. If, for example, you only want to run PostgreSQL to PostgreSQL pgstream replication, you can use the pg2pg profile as follows:
docker-compose -f build/docker/docker-compose.yml --profile pg2pg up
You can also run multiple profiles. For example, to start two PostgreSQL instances and Kafka:
docker-compose -f build/docker/docker-compose.yml --profile pg2pg --profile kafka up
List of supported docker profiles:
- pg2pg
- pg2os
- pg2webhook
- kafka
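As an alternative to repeating the --profile flag, Docker Compose also reads the active profiles from the COMPOSE_PROFILES environment variable as a comma-separated list. The sketch below assumes it is run from the repository root and that your Compose version supports profiles.

```shell
# Equivalent to passing: --profile pg2pg --profile kafka
export COMPOSE_PROFILES="pg2pg,kafka"

# Only start the containers when run from the repository root,
# where the compose file lives.
if [ -f build/docker/docker-compose.yml ]; then
  docker-compose -f build/docker/docker-compose.yml up
fi
```

Setting the variable once per shell session keeps later docker-compose commands, such as stopping the containers, scoped to the same profiles.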
The init command creates the pgstream schema in the configured Postgres database, along with the tables/functions/triggers required to keep track of the schema changes. See the Tracking schema changes section for more details. It also creates a replication slot for the configured database, which will be used by the pgstream service.
pgstream init --pgurl "postgres://postgres:postgres@localhost?sslmode=disable"
If you want to provide the name of the replication slot to be created instead of using the default value (pgstream_<dbname>_slot), you can use the --replication-slot flag or set the environment variable PGSTREAM_POSTGRES_REPLICATION_SLOT_NAME.
pgstream init --pgurl "postgres://postgres:postgres@localhost?sslmode=disable" --replication-slot test
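The environment-variable form of the same command can be sketched as follows, together with a quick check of what initialisation created. The PGURL shell variable and the use of psql are assumptions for illustration and are not part of pgstream; the queries rely only on the standard pg_replication_slots view and information_schema.schemata.

```shell
# Hypothetical helper variable; adjust to your own connection string.
PGURL="postgres://postgres:postgres@localhost?sslmode=disable"

# Same effect as passing --replication-slot test to pgstream init.
export PGSTREAM_POSTGRES_REPLICATION_SLOT_NAME="test"

# Assumption: pgstream is installed and on PATH.
if command -v pgstream >/dev/null 2>&1; then
  pgstream init --pgurl "$PGURL" || echo "init failed: is Postgres reachable?"
fi

# Assumption: psql is installed and on PATH.
if command -v psql >/dev/null 2>&1; then
  # List replication slots; the slot created by init should appear here.
  psql "$PGURL" -c "SELECT slot_name, plugin, active FROM pg_replication_slots;" \
    || echo "query failed: is Postgres reachable?"
  # Confirm that the pgstream schema exists.
  psql "$PGURL" -c "SELECT schema_name FROM information_schema.schemata WHERE schema_name = 'pgstream';" \
    || echo "query failed: is Postgres reachable?"
fi
```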
If there are any issues or if you want to clean up the pgstream setup, you can run the following.
pgstream tear-down --pgurl "postgres://postgres:postgres@localhost?sslmode=disable"
This command will clean up all pgstream state.
The run command requires the configuration to be provided via environment variables, a config file, or a combination of both. The repo includes some sample configuration files that can be used as guidelines.
Example running pgstream with Postgres -> OpenSearch:
pgstream run -c pg2os.env --log-level trace
Example running pgstream with Postgres -> Kafka and, in a separate terminal, Kafka -> OpenSearch:
pgstream run -c pg2kafka.env --log-level trace
pgstream run -c kafka2os.env --log-level trace
Example running pgstream with PostgreSQL -> PostgreSQL with initial snapshot enabled:
pgstream run -c pg2pg.env --log-level trace
Example running pgstream with PostgreSQL snapshot only mode -> PostgreSQL:
pgstream run -c snapshot2pg.env --log-level trace
The run command will parse the configuration provided, and initialise the configured modules. It requires at least one listener and one processor.
- PostgreSQL replication to PostgreSQL
- PostgreSQL replication to OpenSearch
- PostgreSQL replication to webhooks
- PostgreSQL replication using Kafka
- PostgreSQL snapshots
- PostgreSQL column transformations
For more advanced usage, implementation details, and detailed configuration settings, please refer to the full Documentation.
Some of the limitations of the initial release include:
- Single Kafka topic support
- Postgres plugin support limited to wal2json
- Limited filtering
- Primary key/unique not null column required for replication
- Kafka serialisation support limited to JSON
We welcome contributions from the community! If you'd like to contribute to pgstream, please follow these guidelines:
- Create an issue for any questions, bug reports, or feature requests.
- Check the documentation and existing issues before opening a new issue.
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and write tests if applicable.
- Ensure your code passes linting and tests.
  - There's a pre-commit configuration available in the root directory (.pre-commit-config.yaml), which can be used to validate the CI checks locally.
- Submit a pull request.
For this project, we pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you have any questions, encounter issues, or need assistance, open an issue in this repository or join our Discord, and our community will be happy to help.
Made with ❤️ by Xata 🦋