GitHub - thundercumt/data_transit: a ruby gem/app used to migrate among different relational databases, supporting customized migration procedure

data_transit

data_transit is a ruby gem/app used to migrate between different relational databases, supporting customized migration procedure.

1 Introduction

data_transit relies on activerecord to generate database Models on the fly. Tt is executed within a database transaction, and should any error occur during data transit, it will cause the transaction to rollback. So don't worry about introducing dirty data into your target database.

2 Install

data_transit can be installed using gem

gem install data_transit

or

download data_transit.gem

gem install --local /where/you/put/data_transit.gem

In your command line, input data_transit, You can proceed to "how to use" if you can see message prompt as below.

data_transit

usage: data_transit command args. 4 commands are listed below...

3 How to use

3.1 Config DataBase Connection

data_transit setup_db_conn /your/yaml/database/config/file

Your db config file should be compatible with the activerecord adapter you are using and configured properly. As the name suggests, source designates which database you plan to copy data from, while target means the copy destination. Note the key 'source' and 'target' should be kept unchanged!

For example, here is sample file for oracle dbms.

database.yml

source:#don't change this line

adapter: oracle_enhanced

database: dbserver

username: xxx

password: secret

target:#don't change this line

adapter: oracle_enhanced

database: orcl

username: copy1

password: cipher

3.2 Create Database Schema(optional)

If you can have your target database schema created, move to 3.3 "copy data", otherwise use your database specific tools to generate an identical schema in your target database. Or if you don't have a handy tool to generate the schema, for instance when you need to migrate between different database systems, you can use data_transit to dump a schema description file based on your source database schema, and then use this file to create your target schema.

Note this gem is built on activerecord, so it should work well for the database schema compatible with rails conventions. Example, a single-column primary key rather than a compound primary key (primary key with more than one columns), primary key of integer type instead of other types like guid, timestamp etc.

In data_transit, I coded against the situation where non-integer is used, therefore resulting a minor problem that the batch-query feature provided by

activerecord can not be used because of its dependency on integer primary key. In this special case, careless selection of copy range might overburden network, database server because all data in the specified range are transmitted from the source database and then inserted into the target database.

3.2.1 Dump Source Database Schema

data_transit dump_schema [schema_file] [rule_file]

[schema_file] will be used to contain dumped schema, [rule_file] describes how your want data_transit to copy your data.

Note if your source schema is somewhat a legacy, you might need some manual work to adjust the generated schema_file to better meet your needs.

For example, in my test, the source schema uses obj_id as id, and uses guid as primary key type, so I need to impede activerecord auto-generating "id" column by removing primary_key => "obj_id" and adding :id=>false, and then appending primary key constraint in the end of each table definition. See below.

here is an example dumped schema file

ActiveRecord::Schema.define(:version => 0) do

create_table "table_one", primary_key => "obj_id", :force => true do |t|

#other fields

end

#other tables

end

and I manually changed the schema definition to

ActiveRecord::Schema.define(:version => 0) do

create_table "table_one", :id => false, :force => true do |t|

t.string "obj_id", :limit => 42

#other fields

end

execute "alter table table_one add primary key(obj_id)"

end

3.2.2 Create Target Database Schema

data_transit create_table [schema_file]

If everything goes well, you will see a bunch of ddl execution history.

3.3 Copy Data

data_transit copy_data [rule_file]

[rule_file] contains your copy logic. For security reasons, I changed table names

and it looks as follows.

#start of rule

start_date = "2015-01-01 00:00:00"

end_date = "2015-02-01 00:00:00"

migrate do

choose_table "APP.TABLE1","APP.TABLE2","APP.TABLE3","APP.TABLE4","APP.TABLE5","APP.TABLE6"

batch_by "ACQUISITION_TIME BETWEEN TO_DATE('#{start_date}','yyyy-mm-dd hh24:mi:ss') AND TO_DATE('#{end_date}', 'yyyy-mm-dd hh24:mi:ss')"

register_primary_key 'OBJ_ID'

end

migrate do

choose_table "APP.TABLE7","APP.TABLE8","APP.TABLE9","APP.TABLE10","APP.TABLE11","APP.TABLE12"

batch_by "ACKTIME BETWEEN TO_DATE('#{start_date}','yyyy-mm-dd hh24:mi:ss') AND TO_DATE('#{end_date}', 'yyyy-mm-dd hh24:mi:ss')"

register_primary_key 'OBJ_ID'

end

migrate do

choose_table "APP.TABLE13","APP.TABLE14","APP.TABLE15","APP.TABLE16","APP.TABLE17","APP.TABLE18"

batch_by "1>0" #query all data because these tables don't have a reasonable range

register_primary_key 'OBJ_ID'

pre_work do |targetCls| targetCls.delete_all("1>0") end #delete all in target

#post_work do |targetCls| end

end

#end of rule

Each migrate block contains a data_transit task.

"choose_table" describes which tables are included in this task. These tables

share some nature in common, and can be processed with the same rule.

"batch_by" is the query condition

"register_primary_key" describes the primary key of the tables.

"pre_work" is a block executed before each table is processed.

"post_work" is a block executed after each table is processed.

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
bin		bin
lib/datatransit		lib/datatransit
vendor/cache		vendor/cache
.gitignore		.gitignore
Database.yml		Database.yml
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
LICENSE		LICENSE
README		README
README.md		README.md
Rakefile		Rakefile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bin

bin

lib/datatransit

lib/datatransit

vendor/cache

vendor/cache

.gitignore

.gitignore

Database.yml

Database.yml

Gemfile

Gemfile

Gemfile.lock

Gemfile.lock

LICENSE

LICENSE

README

README

README.md

README.md

Rakefile

Rakefile

Repository files navigation

About

Releases

Packages

Languages

License

thundercumt/data_transit

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Languages