Implementing ETL sources
Kiba sources are components you can either implement yourself, or pick from other projects (such as Kiba Common and Kiba Pro).
The sources are components responsible for the extraction of data.
Sources are classes implementing:
- a constructor (to which Kiba will pass the arguments provided in the DSL)
- an `each` method (which should yield rows one by one)
Rows are usually `Hash` instances, but could be other structures as long as the next steps of your pipeline know how to handle them.
Since sources are classes, you can (and are encouraged to) unit test them and reuse them.
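As a minimal sketch of that idea, the snippet below defines a tiny source and exercises it without Kiba at all. `NumbersSource` is a hypothetical class invented for this illustration; it is not part of Kiba or Kiba Common.

```ruby
# Hypothetical source used only for illustration: yields one Hash per number.
class NumbersSource
  def initialize(range)
    @range = range
  end

  # Yield rows one by one, as a Kiba source is expected to do.
  def each
    @range.each do |n|
      yield({ number: n, square: n * n })
    end
  end
end

# Exercise the source on its own: instantiate it, then collect what it yields.
rows = []
NumbersSource.new(1..3).each { |row| rows << row }
```

In a real test suite, the collected `rows` would typically be compared against the expected rows using an assertion from the test framework of your choice.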
Here is a simple CSV source:
```ruby
require 'csv'

class MyCsvSource
  attr_reader :input_file

  def initialize(input_file)
    @input_file = input_file
  end

  def each
    # Block form: the file is closed when the block exits
    CSV.open(input_file, headers: true, header_converters: :symbol) do |csv|
      csv.each do |row|
        yield(row.to_hash)
      end
    end
  end
end
```
Once implemented, you can use your source within a Kiba job definition:

```ruby
job = Kiba.parse do
  source MyCsvSource, filename
  # SNIP
end
```
The first argument for `source` is the class name. The other arguments will be passed to the source constructor (`initialize`) when Kiba runs your pipeline.
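To make the argument passing concrete, here is a simplified illustration of the contract described above. It is not Kiba's actual code: `run_source` and `EchoSource` are hypothetical names introduced only for this sketch, which assumes nothing beyond "instantiate the class with the remaining arguments, then call `each`".

```ruby
# Hypothetical source: remembers its constructor arguments and yields them back.
class EchoSource
  def initialize(*values)
    @values = values
  end

  def each
    @values.each { |value| yield({ value: value }) }
  end
end

# Simplified stand-in for what happens with `source EchoSource, "a", "b"`.
# This is NOT Kiba's implementation, only an illustration of the contract.
def run_source(source_class, *args)
  source = source_class.new(*args)  # the extra arguments reach `initialize`
  rows = []
  source.each { |row| rows << row } # rows are then pulled one by one
  rows
end

rows = run_source(EchoSource, "a", "b")
```

The point of the sketch is that the class is only instantiated at run time, so any expensive setup in `initialize` is deferred until the pipeline actually runs.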
Ideally, open and close resources inside `each` using the block form (as in the CSV source above), so that the resources are closed even if the pipeline is interrupted.
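The following sketch shows why the block form matters. It uses a hypothetical `LinesSource` (not part of Kiba) and simulates an interruption by breaking out of `each` after the first row. The `last_io` reader exists only so the sketch can inspect the file handle afterwards; a real source would not need it.

```ruby
require 'tempfile'

# Hypothetical source that yields one Hash per line of a text file.
class LinesSource
  attr_reader :last_io # exposed only so this sketch can inspect the handle

  def initialize(path)
    @path = path
  end

  def each
    # Block form: File.open closes the file when the block exits,
    # including when the consumer stops early or an exception is raised.
    File.open(@path) do |io|
      @last_io = io
      io.each_line { |line| yield({ line: line.chomp }) }
    end
  end
end

# Prepare a small input file for the demonstration.
file = Tempfile.new('lines')
file.write("first\nsecond\n")
file.close # flush the content to disk; the path stays valid until unlink

source = LinesSource.new(file.path)
first_row = nil
source.each do |row|
  first_row = row
  break # simulate an interrupted pipeline after the first row
end

closed_after_interrupt = source.last_io.closed?
file.unlink
```

Had the file been opened in `initialize` and closed in a separate step, the early exit would have left the handle open until garbage collection.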
A couple of sources are available in kiba-common, if you want to see how they are implemented.
This wiki is tracked by git and publicly editable. You are welcome to fix errors and typos. Any defacing or vandalism of content will result in your changes being reverted and you being blocked.