Kiba is an ETL framework for Ruby.
What is ETL?
If you are unfamiliar with the notion of ETL, you will find introductions here:
- The Wikipedia page on Extract, Transform, Load
- The article "Rubyists - are you doing ETL unknowingly?" on the blog of Kiba's author
Sources, transforms and destinations
Kiba "core" (the kiba gem) does not implement sources, transforms and destinations itself.
Instead, it provides:
- A way for you to declare ETL jobs
- A structure and conventions for implementing sources, transforms and destinations
- A "runner" able to execute the job
A data pipeline or job is schematically organised like this:
- Sources are responsible for reading the data (generally row by row); they typically implement file reading, a database connection, or API calls to extract the data.
- Kiba then passes each row to each transform, in order. A transform can return the modified row, generate multiple output rows, or return no row at all.
- Finally, the rows are sent to the destinations, which are responsible for writing them wherever you see fit (database, file system, API storage, etc.).
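The flow above can be sketched in plain Ruby, without the kiba gem. The example below assumes the usual conventions (a source responds to `each`, a transform to `process`, a destination to `write` and `close`). The class names and the `run_pipeline` helper are hypothetical and only stand in for Kiba's runner. Treat it as an illustration of the data flow, not as Kiba's implementation. The case of one row producing multiple output rows is left out for brevity.

```ruby
# Plain-Ruby sketch of the source -> transforms -> destinations flow.
# All names below are illustrative; none of them ship with Kiba.

# Source: reads the data and yields it row by row.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  def each
    @rows.each { |row| yield row }
  end
end

# Transform that returns a modified copy of the row.
class UpcaseName
  def process(row)
    row.merge(name: row[:name].upcase)
  end
end

# Transform that drops a row by returning nil.
class RejectBlankEmail
  def process(row)
    row[:email].to_s.empty? ? nil : row
  end
end

# Destination: sends each row somewhere (here, an in-memory array).
class ArrayDestination
  def initialize(output)
    @output = output
  end

  def write(row)
    @output << row
  end

  def close
    # Nothing to release for an in-memory array.
  end
end

# Hypothetical stand-in for a runner: passes each row through the
# transforms in order, stops early when a transform returns nil,
# then writes surviving rows to every destination.
def run_pipeline(source:, transforms:, destinations:)
  source.each do |row|
    transforms.each do |transform|
      row = transform.process(row)
      break if row.nil?
    end
    next if row.nil?

    destinations.each { |destination| destination.write(row) }
  end
  destinations.each(&:close)
end

output = []
run_pipeline(
  source: ArraySource.new([
    { name: "alice", email: "alice@example.com" },
    { name: "bob", email: "" }
  ]),
  transforms: [UpcaseName.new, RejectBlankEmail.new],
  destinations: [ArrayDestination.new(output)]
)
```

After the run, `output` holds only the row for Alice, with the name upcased; the row with a blank email was dropped by the second transform and never reached the destination.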
You can also split the work into multiple jobs that run sequentially, each producing an output that the next job uses as its input.
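As a rough illustration of chaining jobs, the plain-Ruby sketch below runs two small jobs one after the other, with the first job's output used as the second job's input. The `run_job` helper, the lambdas and the sample data are all hypothetical. In a real setup the intermediate output would more likely be a file or a database table than an in-memory array.

```ruby
# Hypothetical helper standing in for "declare and run one job".
# Each transform is a callable; returning nil drops the row.
def run_job(source, transforms, destination)
  source.each do |row|
    transforms.each do |transform|
      row = transform.call(row)
      break if row.nil?
    end
    destination << row unless row.nil?
  end
  destination
end

raw_orders = [
  { id: 1, amount: "10.50" },
  { id: 2, amount: "n/a" },
  { id: 3, amount: "4.25" }
]

# Job 1: parse amounts, dropping rows whose amount is not numeric.
parse_amount = lambda do |row|
  amount = Float(row[:amount], exception: false)
  amount && row.merge(amount: amount)
end
cleaned = run_job(raw_orders, [parse_amount], [])

# Job 2: reads the output of job 1 and adds a computed column.
add_total = lambda do |row|
  row.merge(total: (row[:amount] * 1.2).round(2))
end
totals = run_job(cleaned, [add_total], [])
```

Keeping each job small in this way means an intermediate result such as `cleaned` can be inspected or reused on its own.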