Performs deep row copy from one database to another, including all rows from child and parent tables.
Ever needed to test something locally on a couple of rows of data from staging/uat/prod data? You only need a couple of entities, but the tables are intertwined with all the relationships, so you spend an hour copying all the required data from a dozen different tables just to have that two rows that you're interested in. Then next day you need to do the same, but with different rows. And you've shut down your Postgres container, so the database is clean, and you need to start from scratch.
Enters rowcp
. Rowcp has a very limited purpose - copy a couple of rows from the table in one database into the same
table in the different database while satisfying all the dependencies. All the rows from the tables that your table has
a foreign key to, and all the rows they have a foreign key to, and all rows from the tables that have a foreign key
pointing to these two rows that you're copying and so on.
rowcp copies table rows from the source database to the target database. Source and target schemas must be the same. Source and target DBMS can be different. You can use rowcp to copy data from MySQL to Postgres and vice versa.
Currently, only Postgresql, Mysql, MariaDB and h2 are supported.
To copy rows run
java -jar rowcp.jar [options] Query
To see all supported command line arguments run rowcp with --help
parameter.
Required parameters are --source-connection
and --target-connection
that specify JDBC connection string for the
source and target databases respectively.
"Query" parameter is a seed query that returns rows that need to be copied to the target database and is also required.
There are some rules this query must abide by:
- query must be a
SELECT *
query - it must have only one table in the
FROM
clause. No joins are allowed. WHERE
clause must be present to limit the number of rows selected- it must be a valid SQL query in the SQL dialect of the source database
There is no limitation on what you can have in WHERE
clause, but it is highly recommended limiting the number of rows
copied to be small (in the 1 to 10 range), as, depending on the number of relationships, the actual number of copied
rows can grow exponentially and rowcp is not build for speed and efficiency.
Instead of inserting/updating the target database, rowcp can create SQL statements for each row from the source database.
If --target-connection
starts with file:
schema instead of jdbc:
schema, then the rest of the connection line will
be used as a name of the file to which the INSERT INTO
statements will be written.
Some options, such as --target-user
, --target-password
and --update
are ignored when the target is a file. This
mode is experimental, please review the generated SQL statements before executing them.
Or, at least, some of them.
--skip-tables
- a comma-separated list of tables in the source database that are ignored. Any relationships that start
or end at this table are also ignored. Instead of table name one might use regular expression.
--skip-unknown-columns
- ignore columns that do not exist in the target database's tables. Without this option,
missing columns will be treated as errors.
--update
- allows updating target database if the row with the same primary key already exists. This is considerably
slower than default insert mode and so is not recommended.
--ignore-existing
- Ignore insert statement if the row already exists. This allows to skip rows that already exist in
target database. This option is faster than --update
but may not be supported by all databases. It
uses ON CONFLICT DO NOTHING
with Postgres and INSERT IGNORE
with Mysql
--chunk-size
- When reading from the source database this will limit the number of conditions in generated queries.
This helps to avoid generating queries that are too large to be handled by the source server. When writing to the target
database, this will set the size of the INSERT
batches.
--source-user
and --target-user
- username for connecting to the source and target database respectively.
You may also pass username as part of the connection string.
--source-password
and --target-password
. Password for the source or target database. If the argument to the option
is not provided then the user will be prompted to type the password. Password may be omitted completely if the database
does not require one or the password is passed as part of the connection string. For security purposes it is recommended
not to provide password as CLI or connection string parameter.
--schema-cache
controls usage of schema graph cache. It might take a long time to retrieve metadata from the remote
schema. Providing option --schema-cache use
will cache the schema metadata locally and will use it during the
following invocations with --schema-cache use
parameter. If schema has changed, one should use --schema-cache clear
to clear the cache. Schema is cached inside $USER/.rowcp directory. WARNING This option is considered in beta and
it's behaviour could change, use at your own risk.
All the options can be stored in a file, path to which can be passed to --parameter-file
argument.
Parameter file must contain one option per line. The query must be the last option and separated from the rest of the options by an empty line.
Lines starting with #
are treated as comments and are ignored.
--source-connection jdbc:mariadb://some_server:3306
--source-user src_user
--source-password src_password
--target-connection jdbc:mysql://localhost:3306/database
--target-user mysql
--target-password mysql
# don't forget to remove
--verbose 2
--skip-tables table1, table2
--skip-unknown-columns
SELECT * FROM table3 WHERE id in (1,2,3)
AND name IS NOT NULL;
Any options passed on command-line will override the ones specified in the parameter file.
If source table contains a column that is missing from the target table, then copy process will fail.
Use --skip-unknown-columns
option to ignore such columns.
If target table has a non-null column that is missing from the source table, copy process will fail, if target column is
nullable, then null
value will be inserted.
Rowcp is intended to help with testing, not to copy databases. It is highly recommended that seed query select a limited number of rows.
It is assumed that the target database is empty or almost empty and there would be no conflicts in the data. Rowcp does
not try to clean up target tables or update existing data, if there is a duplicate key situation, then copying will
fail (unless --update
option is selected).
Autogenerated fields in the target database are not supported. All the autoincremented fields will be inserted as they are in the source database.