# Migrate Large Objects from Postgres to S3
This software is used by Tocco AG, the creator of the Tocco Business Framework, to transition from Postgres Large Object storage to an AWS-S3-compatible storage (Ceph Object Storage). In addition, the key used for referencing binaries is updated from SHA1 to SHA2.
## Available Command Line Arguments
```
$ ./lo_migrate_cli --help
Postgres Large Object to S3 Migrator

USAGE:
    lo_migrate_cli [OPTIONS] --s3-url <URL> --access-key <KEY> --secret-key <KEY> --bucket <NAME> --pg-url <URL>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --commit-chunk <INT>         Number of SHA2 hashes committed per DB transaction
        --committer-queue <INT>      Size of the committer queue
        --committer-threads <INT>    Number of committer threads
        --in-mem-max <INT>           Max. size of Large Object to keep in memory (in KiB)
    -i, --interval <SECS>            Interval in which stats are shown (in secs)
    -p, --pg-url <URL>               Url to connect to postgres (postgresql://USER:PASS@HOST/DB_NAME)
        --receiver-queue <INT>       Size of the receiver queue
        --receiver-threads <INT>     Number of receiver threads
    -k, --access-key <KEY>           S3 access key
    -b, --bucket <NAME>              Name of the S3 bucket
    -s, --secret-key <KEY>           S3 secret key
    -u, --s3-url <URL>               URL to S3 endpoint
        --storer-queue <INT>         Size of the storer queue
        --storer-threads <INT>       Number of storer threads
```
Migrate binaries from the database `nice2_ecap` to the `ecap` bucket:

```
$ lo_migrate_cli --pg-url postgresql://email@example.com/nice2_ecap --access-key my_key --secret-key my_secret --bucket ecap --s3-url "https://s3.tocco.ch"
```
## Compiling the Code
This code uses features only available in nightly builds of the Rust compiler. Nightly is unstable and breaking changes may be introduced at any time; you may want to install a particular version of nightly to make sure the code keeps compiling. See also Troubleshooting.
1. Install rustup (use the default settings).
2. Clone this repository:

   ```
   $ git clone https://github.com/pgerber/lo-migrate.git
   ```

3. `cd` into the newly checked-out repository:

   ```
   $ cd lo-migrate
   ```

4. Build the code:

   ```
   $ cargo build --release
   ```

5. Execute the binary:

   ```
   $ cargo run --release -- …
   ```

   ('…' are the parameters passed on to the executable.)

You can also run the binary directly; it is located at
## Troubleshooting

- If compilation of `openssl-sys` fails, make sure you have the development packages for OpenSSL or LibreSSL installed, as well as the `pkg-config` build tool. On Debian-based systems you can install them via `$ sudo apt-get install libssl-dev pkg-config`. See `openssl-sys`' README for more details.
## Design

The objects are represented as `Lo`s within the software. These objects are passed through multiple groups of worker threads. To allow the `Lo`s to be handed from thread to thread, multi-reader, multi-writer queues are used. `Lo`s are passed from thread group to thread group in this order:
### The Observer Thread

This is a single thread that reads the list of Large Objects from the `_nice_binary` table. It creates the initial `Lo` objects containing the SHA1 hash, oid, mime type and size of the object, then passes them on to the next thread group.
### The Receiver Threads

Threads in this group fetch the Large Objects from Postgres and embed the data in the `Lo` objects. For files larger than 1 MiB (configurable), the data is written to a temporary file; smaller objects are kept in memory. Additionally, the SHA2 hash of the object is calculated and stored on the `Lo`. The `Lo`s are then passed on to the next thread group.
### The Storer Threads

This group of threads is responsible for writing the data to the S3 storage and setting the correct metadata (SHA2 hash, mime type, etc.). After dropping the data held in memory and removing temporary files, the `Lo`s are passed on to the next thread group.
### The Committer Threads

These threads commit the generated SHA2 hashes to the database, by default committing 100 hashes per transaction. Afterwards, the `Lo` objects are dropped for good.

Only objects that have no SHA2 hash set are copied, and the hash is only set once the object is in the S3 storage. This way, copying can be stopped and resumed at any time without risking any data loss.
### The Monitor Thread

The monitor shows the number of `Lo`s processed by each thread group, the status of the queues, and the total number of objects migrated. To enable global stats, the atomic counters in `ThreadStat` are shared amongst all threads. It is the threads' responsibility to increase their respective counters.
Messages printed by the monitor thread look like this:
```
*******************************************************************
Status at 2017-03-10 10:10:53 (updated every: 10s), ETA: 2017-03-10 11:15:33 (1h 04m 40s)

Progress Overview:
    0.01%, 1200 of 22618378 object have been migrated

Processed Objects by Thread Pools:
    observer thread   - processed: 9752, current speed: 892.7 Lo/s, average speed: 441.7 Lo/s
    receiver threads  - processed: 1558, current speed: 142.6 Lo/s, average speed: 70.6 Lo/s
    storer threads    - processed: 1432, current speed: 131.1 Lo/s, average speed: 64.9 Lo/s
    committer threads - processed: 1200, current speed: 109.8 Lo/s, average speed: 54.4 Lo/s

Queue Usage:
    receive queue - used 8192 of 8192, 100.00% full, changed by: +4235
    store queue   - used 532 of 1024, 11.82% full, changed by: -50
    commit queue  - used 83 of 8192, 0.39% full, changed by: +32
    commit queue  - used 0 of 8192, 0.00% full, changed by: +0
```