Skip to content

Migrate data from Postgres Large Objects to S3

License

Notifications You must be signed in to change notification settings

tocco/lo-migrate

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Migrate Large Objects from Postgres to S3

Build Status codecov

This software is used by Tocco AG the creator of the Tocco Business Framework to transition from Postgres Large Object storage to an AWS S3 compatible storage (Ceph Object Storage). In addition, the key for referencing binaries is updated from SHA1 to SHA2.

Usage

Available Command Line Arguments

$ ./lo_migrate_cli --help
Postgres Large Object to S3 Migrator

USAGE:
    lo_migrate_cli [OPTIONS] --s3-url <URL> --access-key <KEY> --secret-key <KEY> --bucket <NAME> --pg-url <URL>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --commit-chunk <INT>        Number of SHA2 hashes commited per DB transaction
        --committer-queue <INT>     Size of the committer queue
        --committer-threads <INT>   Number of committer threads
        --in-mem-max <INT>          Max. size of Large Object to keep in memory (in KiB)
    -i, --interval <SECS>           Interval in which stats are shown (in secs)
    -p, --pg-url <URL>              Url to connect to postgres (postgresql://USER:PASS@HOST/DB_NAME)
        --receiver-queue <INT>      Size of the receiver queue
        --receiver-threads <INT>    Number of receiver threads
    -k, --access-key <KEY>          S3 access key
    -b, --bucket <NAME>             Name of the S3 bucket
    -s, --secret-key <KEY>          S3 secret key
    -u, --s3-url <URL>              URL to S3 endpoint
        --storer-queue <INT>        Size of the storer queue
        --storer-threads <INT>      Number of storer threads

Example

Migrate binaries from database nice2_ecap to the ecap bucket:

lo_migrate_cli --pg-url postgresql://ecap@postgres.tocco.ch/nice2_ecap --access-key my_key --secret-key my_secret --bucket ecap --s3-url "https://s3.tocco.ch"

Debugging

Logging can be enabled via RUST_LOG environment variable, for instance RUST_LOG=lo_migrate=info enables all info messages. Logging is provided env_logger whose documentation provides more examples.

Installation

Compiling the Code

This code uses features only available in the nightly builds of the Rust Compiler. Nightly is unstable and breaking changes may be introduced at any time. You may want to install a particular verison of nightly to make sure it keeps compiling. See also Troubleshooting.

  1. install rustup (use default settings)

  2. clone this repository: $ git clone https://github.com/pgerber/lo-migrate.git

  3. cd into the newly checked out repository: $ cd lo-migrate

  4. build the code: $ cargo build --release

  5. execute the binary: $ cargo run --release -- … ('…' are the parameters passed on to the executable)

    You can also run the binary directly, it is located at target/release/lo_migrate_cli.

Troubleshooting

  • If compilation of openssl-sys fails, make sure you have the dev packages for openSSL or LibreSSL installed as well as the pkg-config build tool. On Debian based systems you can do so via $ sudo apt-get install libssl-dev pkg-config. See openssl-sys' README for more details.

Design

The objects are represented as lo::Los within the software. This objects are passed through multiple groups of thread workers. To allow the Los to be passed from thread to thread multi-reader and multi-writer queues are used.

Worker Threads

The Los are passed from thread group to thread group in this order:

  1. The Observer Thread (thread::Observer)
    This is just a single thread that reads the list of Large Object from the _nice_binary table. It creates the initial Lo objects containing the sha1 hash, oid, mime type and size of the object. Then, it passes them on to the next thread group.

    info: The queue between Observer and Receiver is referred to as receiver queue throughout the code.

  2. The Receiver Threads (thread::Receiver)
    Threads in this group fetch the Large Objects from Postgres and embed the data in the Lo objects. For files larger than 1 MiB (configurable), the data is written to a temporary file, smaller object are kept in memory. Additionally, the SHA2 hash of the object is calculated and also stored on the Los. Los are then passed on to the next thread group.

    info: The queue between Receiver and Storer is referred to as storer queue throughout the code.

  3. The Storer Threads (thread::Storer)
    This group of threads is responsible for writing the data to the S3 storage and setting the correct meta data (SHA2 hash, mime type, etc.). After dropping the data held in memory and removing temporary files the Loo are passed on to the next thread group.

    info: The queue between Storer and Committer is referred to as committer queue throughout the code.

  4. The Committer Threads (thread::Committer)
    This thread commits the generated SHA2 hashes to the database. By default, committing 100 hashes per transaction. Afterwards, the Lo objects are dropped for good.

Only objects that have no SHA2 set are copied and the hash is only set once the object is in the S3 storage. This way copying can be stopped and resumed at any time without risking any data loss.

The Monitor Thread

This thread, thread::Monitor, shows the number of Los processed by each thread group, status of the queue and the total amount of object migrated. To enable global stats, the atomic counters in thread::ThreadStat are shared amongst all threads. It is the threads' responsibility to increase their respective counters.

Message printed by the monitor thread look like this:

*******************************************************************
    Status at 2017-03-10 10:10:53 (updated every: 10s), ETA: 2017-03-10 11:15:33  (1h 04m 40s)

Progress Overview:
    0.01%, 1200 of 22618378 object have been migrated

Processed Objects by Thread Pools:
    observer thread   - processed:    9752, current speed:  892.7 Lo/s, average speed:  441.7 Lo/s
    receiver threads  - processed:    1558, current speed:  142.6 Lo/s, average speed:   70.6 Lo/s
    storer threads    - processed:    1432, current speed:  131.1 Lo/s, average speed:   64.9 Lo/s
    committer threads - processed:    1200, current speed:  109.8 Lo/s, average speed:   54.4 Lo/s

Queue Usage:
    receive queue     - used   8192 of   8192, 100.00% full, changed by:  +4235
    store queue       - used    532 of   1024,  11.82% full, changed by:    -50
    commit queue      - used     83 of   8192,   0.39% full, changed by:    +32
    commit queue      - used      0 of   8192,   0.00% full, changed by:     +0

About

Migrate data from Postgres Large Objects to S3

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Rust 96.3%
  • PLpgSQL 3.7%