
How to define ETL jobs with Kiba


Kiba provides a DSL to let you define ETL jobs.

The recommended way to declare a job is by creating a dedicated module, which will use the Kiba.parse API:

```ruby
module ETL
  module SyncJob
    module_function

    def setup(config)
      Kiba.parse do
        # called only once per run
        pre_process do
          # ...
        end

        # responsible for reading the data
        source SomeSource, source_config...

        # then transforming it
        transform SomeTransform, transform_config...
        transform SomeOtherTransform, transform_config...

        # alternate block form
        transform do |row|
          # return row, modified
        end

        destination SomeDestination, destination_config...

        # a final block which will be called only if the pipeline succeeded
        post_process do
          # ...
        end
      end
    end
  end
end
```

When you write `source SomeClass, some_config`, you instruct Kiba to register that source at this point in the pipeline.

At runtime (see next section), Kiba will instantiate the class with the provided arguments. The same goes for transforms and destinations.
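
For instance, a source is a plain Ruby class responding to `each` and yielding one row at a time (commonly a `Hash`). Here is a minimal sketch; the `CsvSource` class, its `filename:` keyword and the CSV layout are hypothetical, not part of Kiba itself:

```ruby
require 'csv'

# Hypothetical CSV-reading source. Kiba instantiates it with the
# arguments passed to `source`, then calls `each`, which must yield
# rows one by one.
class CsvSource
  def initialize(filename:)
    @filename = filename
  end

  def each
    CSV.foreach(@filename, headers: true, header_converters: :symbol) do |row|
      yield row.to_h
    end
  end
end
```

Such a source would be registered in the job with `source CsvSource, filename: 'input.csv'`.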

Alternate block forms are available for transforms, for convenience.
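
In the class form, a transform is an object responding to `process(row)`: Kiba calls it once per row, and the return value becomes the row handed to the next step (returning `nil` dismisses the row from the pipeline). A minimal sketch, using a hypothetical `DowncaseEmail` transform and `:email` key:

```ruby
# Hypothetical transform normalizing an email field. `process` is
# called once per row; the return value is passed on to the next
# transform or destination.
class DowncaseEmail
  def process(row)
    return nil if row[:email].nil? # drop rows without an email
    row[:email] = row[:email].downcase
    row
  end
end
```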

Pre-processors and post-processors are simple blocks which are called once per pipeline run; post-processors are only called if the pipeline succeeded.
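
A typical use is setup and reporting around the run. A minimal sketch (the logging output is purely illustrative):

```ruby
Kiba.parse do
  pre_process do
    # runs once, before any row is read
    puts "Starting job at #{Time.now}"
  end

  # source / transform / destination declarations go here

  post_process do
    # runs once, only if the whole pipeline succeeded
    puts "Job finished at #{Time.now}"
  end
end
```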

The combination of pre-processors, sources, transforms, destinations and post-processors defines your data processing pipeline for this job.
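
For completeness, a destination is again a plain class that Kiba instantiates with the provided arguments: its `write(row)` method is called once per row, and its optional `close` method is called at the end of the run. A minimal sketch, with a hypothetical `CsvDestination` class:

```ruby
require 'csv'

# Hypothetical CSV-writing destination. `write` is called once per
# row; `close` is called when the run finishes, which is a good
# place to release resources.
class CsvDestination
  def initialize(filename:)
    @csv = CSV.open(filename, 'w')
    @headers_written = false
  end

  def write(row)
    unless @headers_written
      @csv << row.keys
      @headers_written = true
    end
    @csv << row.values
  end

  def close
    @csv.close
  end
end
```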

Next: How to run ETL jobs with Kiba