This software is used by Tocco AG, the creator of the Tocco Business Framework, to transition from Postgres Large Object storage to an AWS S3-compatible storage (Ceph Object Storage). In addition, the key used to reference binaries is updated from SHA1 to SHA2.
```
$ ./lo_migrate_cli --help
Postgres Large Object to S3 Migrator

USAGE:
    lo_migrate_cli [OPTIONS] --s3-url <URL> --access-key <KEY> --secret-key <KEY> --bucket <NAME> --pg-url <URL>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --commit-chunk <INT>         Number of SHA2 hashes committed per DB transaction
        --committer-queue <INT>      Size of the committer queue
        --committer-threads <INT>    Number of committer threads
        --in-mem-max <INT>           Max. size of Large Object to keep in memory (in KiB)
    -i, --interval <SECS>            Interval in which stats are shown (in secs)
    -p, --pg-url <URL>               URL to connect to Postgres (postgresql://USER:PASS@HOST/DB_NAME)
        --receiver-queue <INT>       Size of the receiver queue
        --receiver-threads <INT>     Number of receiver threads
    -k, --access-key <KEY>           S3 access key
    -b, --bucket <NAME>              Name of the S3 bucket
    -s, --secret-key <KEY>           S3 secret key
    -u, --s3-url <URL>               URL to S3 endpoint
        --storer-queue <INT>         Size of the storer queue
        --storer-threads <INT>       Number of storer threads
```
Migrate binaries from database `nice2_ecap` to the `ecap` bucket:

```
$ lo_migrate_cli --pg-url postgresql://ecap@postgres.tocco.ch/nice2_ecap --access-key my_key --secret-key my_secret --bucket ecap --s3-url "https://s3.tocco.ch"
```
Logging can be enabled via the `RUST_LOG` environment variable; for instance, `RUST_LOG=lo_migrate=info` enables all info messages. Logging is provided by env_logger, whose documentation provides more examples.
This code uses features only available in nightly builds of the Rust compiler. Nightly is unstable and breaking changes may be introduced at any time. You may want to install a particular version of nightly to make sure the code keeps compiling. See also Troubleshooting.
- install rustup (use default settings)

- clone this repository:

  ```
  $ git clone https://github.com/pgerber/lo-migrate.git
  ```

- cd into the newly checked-out repository:

  ```
  $ cd lo-migrate
  ```

- build the code:

  ```
  $ cargo build --release
  ```

- execute the binary:

  ```
  $ cargo run --release -- …
  ```

  ('…' are the parameters passed on to the executable)

  You can also run the binary directly; it is located at `target/release/lo_migrate_cli`.
- If compilation of openssl-sys fails, make sure you have the development packages for OpenSSL or LibreSSL installed, as well as the pkg-config build tool. On Debian-based systems you can install them via:

  ```
  $ sudo apt-get install libssl-dev pkg-config
  ```

  See openssl-sys' README for more details.
The objects are represented as `lo::Lo`s within the software. These objects are passed through multiple groups of worker threads. To allow the `Lo`s to be passed from thread to thread, multi-reader and multi-writer queues are used.

The `Lo`s are passed from thread group to thread group in this order:
- The Observer Thread (`thread::Observer`)

  This is just a single thread that reads the list of Large Objects from the `_nice_binary` table. It creates the initial `Lo` objects containing the SHA1 hash, OID, MIME type and size of the object. Then it passes them on to the next thread group.

  info: The queue between `Observer` and `Receiver` is referred to as the receiver queue throughout the code.

- The Receiver Threads (`thread::Receiver`)

  Threads in this group fetch the Large Objects from Postgres and embed the data in the `Lo` objects. For files larger than 1 MiB (configurable), the data is written to a temporary file; smaller objects are kept in memory. Additionally, the SHA2 hash of the object is calculated and also stored on the `Lo`s. The `Lo`s are then passed on to the next thread group.

  info: The queue between `Receiver` and `Storer` is referred to as the storer queue throughout the code.

- The Storer Threads (`thread::Storer`)

  This group of threads is responsible for writing the data to the S3 storage and setting the correct metadata (SHA2 hash, MIME type, etc.). After dropping the data held in memory and removing temporary files, the `Lo`s are passed on to the next thread group.

  info: The queue between `Storer` and `Committer` is referred to as the committer queue throughout the code.

- The Committer Threads (`thread::Committer`)

  These threads commit the generated SHA2 hashes to the database, by default committing 100 hashes per transaction. Afterwards, the `Lo` objects are dropped for good.
Only objects that have no SHA2 hash set are copied, and the hash is only set once the object is in the S3 storage. This way, copying can be stopped and resumed at any time without risking any data loss.
This thread, `thread::Monitor`, shows the number of `Lo`s processed by each thread group, the status of the queues, and the total number of objects migrated. To enable global stats, the atomic counters in `thread::ThreadStat` are shared amongst all threads. It is the threads' responsibility to increase their respective counters.
Messages printed by the monitor thread look like this:
```
*******************************************************************
Status at 2017-03-10 10:10:53 (updated every: 10s), ETA: 2017-03-10 11:15:33 (1h 04m 40s)

Progress Overview:
    0.01%, 1200 of 22618378 object have been migrated

Processed Objects by Thread Pools:
    observer thread   - processed: 9752, current speed: 892.7 Lo/s, average speed: 441.7 Lo/s
    receiver threads  - processed: 1558, current speed: 142.6 Lo/s, average speed: 70.6 Lo/s
    storer threads    - processed: 1432, current speed: 131.1 Lo/s, average speed: 64.9 Lo/s
    committer threads - processed: 1200, current speed: 109.8 Lo/s, average speed: 54.4 Lo/s

Queue Usage:
    receive queue - used 8192 of 8192, 100.00% full, changed by: +4235
    store queue   - used 532 of 1024, 11.82% full, changed by: -50
    commit queue  - used 83 of 8192, 0.39% full, changed by: +32
    commit queue  - used 0 of 8192, 0.00% full, changed by: +0
```