Thibaut Barrère edited this page Sep 9, 2021 · 3 revisions

Kiba is a Ruby framework for ETL (Extract, Transform, Load).

What is ETL?

If you are unfamiliar with the notion of ETL, you will find introductions here:

Sources, transforms and destinations

Kiba "core" (the kiba gem) does not implement sources, transforms and destinations itself.

Instead, it provides:

  • A way for you to declare ETL jobs
  • A structure & conventions to implement sources/transforms/destinations
  • A "runner" able to execute the job

You can either implement those components yourself, or use the ones provided in kiba-common (open source) or Kiba Pro (commercial extension).

A data pipeline or job is schematically organised like this:

[Figure: Data pipeline]

In detail:

  • Sources are responsible for reading the data (generally row by row); they typically implement file reading, a database connection, or API calls to extract the data.
  • Kiba then passes each row to each transform, in order. A transform can return the modified row, generate multiple output rows, or return no row at all.
  • Finally, the rows are sent to the destinations, which are responsible for sending the rows wherever you see fit (database, file system, API storage, etc.).
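
The three roles above can be sketched in plain Ruby, without loading the gem. The method names follow the conventions Kiba uses for its components (`each` for a source, `process` for a transform, `write` and `close` for a destination). Everything else is an assumption made for illustration: the class names, the sample rows, and the `push_row` and `run_pipeline` helpers, which are only a simplified stand-in for Kiba's runner and not its actual implementation.

```ruby
# Source: yields rows one at a time.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  def each
    @rows.each { |row| yield row }
  end
end

# Transform that returns no row: nil drops the row from the pipeline.
class RejectBlankNames
  def process(row)
    row[:name].empty? ? nil : row
  end
end

# Transform that returns the row modified.
class UpcaseName
  def process(row)
    row.merge(name: row[:name].upcase)
  end
end

# Transform that generates multiple output rows: one per tag.
class SplitTags
  def process(row)
    row[:tags].split(",").each do |tag|
      yield({ name: row[:name], tag: tag })
    end
    nil # the original row itself is not passed on
  end
end

# Destination: receives each final row, then is closed at the end.
class ArrayDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end

  def close
    # nothing to release for an in-memory destination
  end
end

# Hypothetical helper: pushes one row through the remaining transforms,
# in order, and hands every surviving row to the `emit` block.
def push_row(transforms, row, &emit)
  return emit.call(row) if transforms.empty?

  first, *rest = transforms
  result = first.process(row) { |extra| push_row(rest, extra, &emit) }
  push_row(rest, result, &emit) unless result.nil?
end

# Hypothetical stand-in for the runner: source -> transforms -> destinations.
def run_pipeline(source:, transforms:, destinations:)
  source.each do |row|
    push_row(transforms, row) do |output|
      destinations.each { |destination| destination.write(output) }
    end
  end
  destinations.each { |destination| destination.close if destination.respond_to?(:close) }
end

destination = ArrayDestination.new
run_pipeline(
  source: ArraySource.new([{ name: "alice", tags: "x,y" }, { name: "", tags: "z" }]),
  transforms: [RejectBlankNames.new, UpcaseName.new, SplitTags.new],
  destinations: [destination]
)
p destination.rows
```

In this sketch the row with a blank name is dropped by the first transform, and the remaining row reaches the destination as two rows, one per tag.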

You can also split the work into multiple jobs that run sequentially, each producing an output that the next job uses as its input.
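
As a rough illustration of chaining, the sketch below runs two jobs one after the other, with an intermediate file acting as the destination of the first job and the source of the second. It is plain Ruby and does not use the gem: `run_job`, `run_two_jobs`, the JSON Lines classes, and the order fields are all hypothetical names chosen for this example, and the lambdas stand in for transforms.

```ruby
require "json"
require "tmpdir"

# Hypothetical destination: writes one JSON document per line.
class JsonLinesDestination
  def initialize(path)
    @file = File.open(path, "w")
  end

  def write(row)
    @file.puts(JSON.generate(row))
  end

  def close
    @file.close
  end
end

# Hypothetical source: reads back the file written by the destination above.
class JsonLinesSource
  def initialize(path)
    @path = path
  end

  def each
    File.foreach(@path) { |line| yield JSON.parse(line) }
  end
end

# Hypothetical in-memory destination used to collect the final rows.
class CollectDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end

  def close
  end
end

# Simplified stand-in for running one job: one source, ordered transforms
# (lambdas here), one destination. A transform returning nil drops the row.
def run_job(source, transforms, destination)
  source.each do |row|
    row = transforms.reduce(row) { |current, t| current.nil? ? nil : t.call(current) }
    destination.write(row) unless row.nil?
  end
  destination.close
end

def run_two_jobs(orders, intermediate_path)
  # Job 1: keep paid orders and write them to the intermediate file.
  keep_paid = ->(row) { row["paid"] ? row : nil }
  run_job(orders, [keep_paid], JsonLinesDestination.new(intermediate_path))

  # Job 2: read the intermediate file and add a computed total.
  add_total = ->(row) { row.merge("total" => row["quantity"] * row["unit_price"]) }
  result = CollectDestination.new
  run_job(JsonLinesSource.new(intermediate_path), [add_total], result)
  result.rows
end

Dir.mktmpdir do |dir|
  orders = [
    { "id" => 1, "paid" => true, "quantity" => 2, "unit_price" => 5 },
    { "id" => 2, "paid" => false, "quantity" => 1, "unit_price" => 9 }
  ]
  p run_two_jobs(orders, File.join(dir, "paid_orders.jsonl"))
end
```

Keeping the intermediate output on disk means the second job can be rerun on its own, which is one common reason to split a pipeline into several jobs.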

Next: How to define ETL jobs with Kiba