Archival and Restoration for Postgres

WAL-G

WAL-G is an archival and restoration tool for Postgres.

WAL-G is the successor of WAL-E with a number of key differences. WAL-G uses LZ4, LZMA or Brotli compression, multiple processors and non-exclusive base backups for Postgres. More information on the design and implementation of WAL-G can be found in the Citus Data blog post "Introducing WAL-G by Citus: Faster Disaster Recovery for Postgres".

Installation

A precompiled binary of the latest version of WAL-G for Linux AMD64 can be obtained under the Releases tab.

To decompress the binary, use:

tar -zxvf wal-g.linux-amd64.tar.gz

For other systems, please consult the Development section for more information.

Configuration

Required

To connect to Amazon S3, WAL-G requires that these variables be set:

  • WALG_S3_PREFIX (e.g. s3://bucket/path/to/folder) (alternative form WALE_S3_PREFIX)

WAL-G determines AWS credentials like other AWS tools. You can set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (optionally with AWS_SECURITY_TOKEN), or ~/.aws/credentials (optionally with AWS_PROFILE), or you can set nothing to automatically fetch credentials from the EC2 metadata service.

WAL-G uses the usual PostgreSQL environment variables to configure its connection, especially including PGHOST, PGPORT, PGUSER, and PGPASSWORD/PGPASSFILE/~/.pgpass.

PGHOST can point to a UNIX socket. This mode is preferred for localhost connections; set PGHOST=/var/run/postgresql to use it. WAL-G will connect over TCP if PGHOST is an IP address.
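
As a rough illustration of the required settings, the following shell sketch exports the variables for a local socket connection. The bucket path, port and user name are placeholders chosen for this example, not values taken from the project.

```shell
# Placeholder values throughout; substitute your own bucket, port and user.
export WALG_S3_PREFIX="s3://bucket/path/to/folder"

# AWS credentials are deliberately not set here: with nothing set, WAL-G
# falls back to ~/.aws/credentials or the EC2 metadata service.

# Connect over the UNIX socket, the preferred mode for localhost connections.
export PGHOST="/var/run/postgresql"
export PGPORT="5432"
export PGUSER="postgres"
```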

Optional

WAL-G can automatically determine the S3 bucket's region using s3:GetBucketLocation, but if you wish to avoid this API call or forbid it in the applicable IAM policy, specify:

  • AWS_REGION (e.g. us-west-2)

Concurrency values can be configured using:

  • WALG_DOWNLOAD_CONCURRENCY

To configure how many goroutines to use during backup-fetch and wal-fetch, use WALG_DOWNLOAD_CONCURRENCY. By default, WAL-G uses the minimum of the number of files to extract and 10.

  • WALG_UPLOAD_CONCURRENCY

To configure how many concurrent streams to use during backup uploading, use WALG_UPLOAD_CONCURRENCY. By default, WAL-G uses 10 streams.

  • WALG_UPLOAD_DISK_CONCURRENCY

To configure how many concurrent streams read from disk during backup-push, use WALG_UPLOAD_DISK_CONCURRENCY. By default, WAL-G uses 1 stream.
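
The default for download concurrency can be sketched in shell as below. The helper default_download_concurrency is hypothetical and only restates the rule described above; the override values at the end are arbitrary examples, not recommendations.

```shell
# Default number of download goroutines as described above:
# the smaller of the number of files to extract and 10.
# default_download_concurrency is a hypothetical helper, not a WAL-G command.
default_download_concurrency() {
    files="$1"
    if [ "$files" -lt 10 ]; then
        echo "$files"
    else
        echo 10
    fi
}

default_download_concurrency 3     # prints 3
default_download_concurrency 250   # prints 10

# Explicit overrides (arbitrary example values).
export WALG_DOWNLOAD_CONCURRENCY=4
export WALG_UPLOAD_CONCURRENCY=16
export WALG_UPLOAD_DISK_CONCURRENCY=2
```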

  • WALG_SENTINEL_USER_DATA

This setting allows backup automation tools to add extra information to the JSON sentinel file during backup-push. It can be used, for example, to give user-defined names to backups.

  • WALG_PREVENT_WAL_OVERWRITE

If this setting is specified, WAL-G will check during wal-push whether the WAL file already exists before uploading it. If a different file is already archived under the same name, WAL-G will return a non-zero exit code to prevent PostgreSQL from removing the WAL file.

  • AWS_ENDPOINT

Overrides the default hostname to connect to an S3-compatible service, e.g. http://s3-like-service:9000

  • AWS_S3_FORCE_PATH_STYLE

Set this to true to enable path-style addressing (i.e., http://s3.amazonaws.com/BUCKET/KEY) when connecting to an S3-compatible service that lacks support for sub-domain style bucket URLs (i.e., http://BUCKET.s3.amazonaws.com/KEY). Defaults to false.

Example: Using Minio.io S3-compatible storage

AWS_ACCESS_KEY_ID: "<minio-key>"
AWS_SECRET_ACCESS_KEY: "<minio-secret>"
WALE_S3_PREFIX: "s3://my-minio-bucket/sub-dir"
AWS_ENDPOINT: "http://minio:9000"
AWS_S3_FORCE_PATH_STYLE: "true"
AWS_REGION: "us-east-1"

  • WALG_S3_STORAGE_CLASS

To configure the S3 storage class used for backup files, use WALG_S3_STORAGE_CLASS. By default, WAL-G uses the "STANDARD" storage class. Other supported values include "STANDARD_IA" for Infrequent Access and "REDUCED_REDUNDANCY" for Reduced Redundancy.

  • WALG_S3_SSE

To enable S3 server-side encryption, set this to the algorithm to use when storing the objects in S3 (e.g. AES256 or aws:kms).

  • WALG_S3_SSE_KMS_ID

If using S3 server-side encryption with aws:kms, set this to the KMS Key ID to use for object encryption.

  • WALG_GPG_KEY_ID (alternative form WALE_GPG_KEY_ID)

To configure the GPG key for encryption and decryption, use WALG_GPG_KEY_ID. By default, no encryption is used. The public keyring is cached in the file "/.walg_key_cache".

  • WALG_DELTA_MAX_STEPS

A delta backup contains the difference between a previously taken backup and the present state. WALG_DELTA_MAX_STEPS determines how many delta backups can be taken between full backups. Defaults to 0. The restoration process automatically fetches all necessary deltas and the base backup and composes a valid restored backup (you still need the WALs written after the start of the last backup to restore a consistent cluster). Delta computation is based on the file system ModTime and the LSN of pages in data files.

  • WALG_DELTA_ORIGIN

To configure the base for the next delta backup (only if WALG_DELTA_MAX_STEPS is not exceeded), use WALG_DELTA_ORIGIN. It can be LATEST (chaining increments) or LATEST_FULL (for databases where the volatile part is compact and chaining has no meaning, so deltas overwrite each other). Defaults to LATEST.
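
To make the step limit concrete, here is a minimal shell sketch. It assumes one reading of the description above: with WALG_DELTA_ORIGIN set to LATEST, every (max steps + 1)-th backup is a full one. The helper backup_kind is hypothetical and only models that reading; WAL-G's own bookkeeping may differ, particularly for LATEST_FULL.

```shell
# Allow up to three delta backups between full backups, each delta based on
# the most recent backup (illustrative values).
export WALG_DELTA_MAX_STEPS=3
export WALG_DELTA_ORIGIN=LATEST

# backup_kind is a hypothetical helper: given the zero-based index of a
# backup-push run and the step limit, print the kind of backup expected
# if every (max_steps + 1)-th backup is a full one.
backup_kind() {
    index="$1"
    max_steps="$2"
    if [ $(( index % (max_steps + 1) )) -eq 0 ]; then
        echo full
    else
        echo delta
    fi
}

# Prints: full, delta, delta, delta, full (one per line).
for i in 0 1 2 3 4; do
    backup_kind "$i" "$WALG_DELTA_MAX_STEPS"
done
```

With the default of 0, the same helper reports every backup as full, which matches delta backups being disabled by default.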

  • WALG_COMPRESSION_METHOD

To configure the compression method used for backups, use WALG_COMPRESSION_METHOD. Possible options are lz4, lzma and brotli. The default method is lz4. LZ4 is the fastest method, but its compression ratio is poor. LZMA is much slower, but it compresses backups about 6 times better than LZ4. Brotli is a good trade-off between speed and compression ratio: it compresses about 3 times better than LZ4.
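
A wrapper script might check the chosen method against the options listed above before exporting it. The sketch below does that; is_supported_method is a hypothetical helper and not part of WAL-G.

```shell
# is_supported_method is a hypothetical check against the options listed
# above; it is not part of WAL-G.
is_supported_method() {
    case "$1" in
        lz4|lzma|brotli) return 0 ;;
        *) return 1 ;;
    esac
}

method="brotli"
if is_supported_method "$method"; then
    export WALG_COMPRESSION_METHOD="$method"
else
    echo "unsupported compression method: $method" >&2
fi
```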

  • WALG_DISK_RATE_LIMIT

To configure disk read rate limit during backup-push in bytes per second.

  • WALG_NETWORK_RATE_LIMIT

To configure network upload rate limit during backup-push in bytes per second.
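
Both rate limits are given in bytes per second, so a small conversion helper can keep the values readable. In this sketch mib_to_bytes is a hypothetical helper and the limits are arbitrary examples.

```shell
# Convert MiB per second to bytes per second.
# mib_to_bytes is a hypothetical helper, not part of WAL-G.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Illustrative limits: read from disk at 100 MiB/s, upload at 50 MiB/s.
export WALG_DISK_RATE_LIMIT="$(mib_to_bytes 100)"     # 104857600
export WALG_NETWORK_RATE_LIMIT="$(mib_to_bytes 50)"   # 52428800
```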

Usage

WAL-G currently supports these commands:

  • backup-fetch

When fetching base backups, the user should pass in the name of the backup and a path to a directory to extract to. If this directory does not exist, WAL-G will create it and any dependent subdirectories.

wal-g backup-fetch ~/extract/to/here example-backup

WAL-G can also fetch the latest backup using:

wal-g backup-fetch ~/extract/to/here LATEST

  • backup-push

When uploading backups to S3, the user should pass in the path containing the backup started by Postgres as in:

wal-g backup-push /backup/directory/path

If the backup is pushed from a replication slave, WAL-G will monitor the timeline of the server. In case of promotion to master or a timeline switch, the backup will be uploaded but not finalized, and WAL-G will exit with an error. In this case the logs will contain the information necessary to finalize the backup. You can use the backed-up data if you clearly understand the risks involved.

  • wal-fetch

When fetching WAL archives from S3, the user should pass in the archive name and the name of the file to download to. This file should not exist as WAL-G will create it for you.

WAL-G will also prefetch WAL files ahead of the requested WAL file. These files are cached in the ./.wal-g/prefetch directory. Cache files older than the most recently requested WAL file are deleted to prevent cache bloat. Requesting a file with wal-fetch also removes it from the cache and triggers refilling of the cache with a new file.

wal-g wal-fetch example-archive new-file-name

  • wal-push

When uploading WAL archives to S3, the user should pass in the absolute path to where the archive is located.

wal-g wal-push /path/to/archive
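
Because wal-push expects an absolute path, a caller that may receive a relative path could normalize it first. The sketch below is one way to do that; to_absolute and push_wal are hypothetical helper names, and push_wal is only defined here, not executed.

```shell
# to_absolute is a hypothetical helper: print the argument unchanged if it is
# already absolute, otherwise prefix it with the current working directory.
to_absolute() {
    case "$1" in
        /*) echo "$1" ;;
        *)  echo "$(pwd)/$1" ;;
    esac
}

# push_wal wraps the wal-push call shown above with the normalized path.
push_wal() {
    wal-g wal-push "$(to_absolute "$1")"
}
```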

  • backup-list

Lists names and creation time of available backups.

  • delete

Deletes backups and the WALs that precede them. By default, delete performs a dry run. To actually execute the deletion, add the --confirm flag at the end of the command.

delete can operate in two modes: retain and before.

retain [FULL|FIND_FULL] %number%

If FULL is specified, WAL-G keeps %number% full backups and everything in between.

before [FIND_FULL] %name%

If FIND_FULL is specified, WAL-G will calculate the minimum backup needed to keep all deltas alive. If FIND_FULL is not specified and the call would produce orphaned deltas, the call will fail and list them.

retain 5 will fail if the 5th backup is a delta

retain FULL 5 will keep 5 full backups and all of their deltas

retain FIND_FULL 5 will find the full backup necessary for the 5th

before base_000010000123123123 will fail if base_000010000123123123 is a delta

before FIND_FULL base_000010000123123123 will keep everything after base of base_000010000123123123
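
The dry-run and confirmed forms of delete differ only in the trailing --confirm flag. The shell sketch below prints both forms for the modes described above; delete_command is a hypothetical helper that only builds the command line and does not run wal-g.

```shell
# delete_command is a hypothetical helper that prints the command line for a
# delete invocation. If the first argument is "confirm", --confirm is appended
# at the end; otherwise the dry-run form is printed.
delete_command() {
    if [ "$1" = "confirm" ]; then
        shift
        echo "wal-g delete $* --confirm"
    else
        echo "wal-g delete $*"
    fi
}

delete_command retain FULL 5            # wal-g delete retain FULL 5
delete_command confirm retain FULL 5    # wal-g delete retain FULL 5 --confirm
delete_command before FIND_FULL base_000010000123123123
```

Reviewing the dry-run output before repeating the command with --confirm is a reasonable way to check which backups would be removed.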

Development

Installing

To compile and build the binary:

go get github.com/wal-g/wal-g
make all

Users can also install WAL-G by using make install. Setting the GOBIN environment variable before installing lets the user choose the installation location. By default, make install puts the compiled binary in go/bin.

export GOBIN=/usr/local/bin
make install

Testing

WAL-G relies heavily on unit tests. These tests do not require S3 configuration as the upload/download parts are tested using mocked objects. For more information on testing, please consult test_tools.

WAL-G will perform a round-trip compression/decompression test that generates directories for data (e.g. data...), compressed files (e.g. compressed), and extracted files (e.g. extracted). These directories are only cleaned up if the files in the original data directory match the files in the extracted one.

Test coverage can be obtained using:

go test -v -coverprofile=coverage.out
go tool cover -html=coverage.out

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the Apache License, Version 2.0, but the lzo support is licensed under GPL 3.0+. Please refer to the LICENSE.md file for more details.

Acknowledgements

WAL-G would not have happened without the support of Citus Data.

WAL-G came into existence as a result of the collaboration between a summer engineering intern at Citus, Katie Li, and Daniel Farina, the original author of WAL-E who currently serves as a principal engineer on the Citus Cloud team. Citus Data also has an open source extension to Postgres that distributes database queries horizontally to deliver scale and performance.

Chat

We have a Slack group to discuss WAL-G usage and development. To join the PostgreSQL Slack, use the invite app.