Allow transforms to yield more than one row #4
Hi,
Hi @marcosbeirigo! #4 is not required to achieve batch inserts. It is meant for scenarios that need to denormalize one row into many rows. Imagine you had this CSV file as a source:
and you wanted to transform such a row into:
This would currently require a 3-pass ETL process with Kiba. If a transform were able to return an array of rows, this would become supported. Does that clear this up?

Now back to batch inserting: you can do it today, without #4. You can add batching support to a destination with something similar to this:

```ruby
class MysqlDestination
  # `xxx` stands for whatever connection settings the destination needs
  def initialize(xxx, batch_size:)
    @batch_size = batch_size
    @rows = []
  end

  # Called by Kiba once per row: buffer it, flush when the batch is full
  def write(row)
    @rows << row
    flush_rows if @rows.size >= @batch_size
  end

  def flush_rows
    return if @rows.empty?
    # do the batch write here (e.g. one multi-row INSERT)
    @rows.clear
  end

  # Called by Kiba at the end of the run
  def close
    # flush remaining rows
    flush_rows
  end
end
```

Note that I may later introduce native support for batching in Kiba, maybe via some kind of middleware stacking as in Rack/Sidekiq, or similar.

Another way of doing faster inserts is to use MySQL bulk import. For this you'd create a destination that outputs to a delimited file, then use a

Hope this helps :-)
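A minimal sketch of the delimited-file destination for the bulk-import route, assuming a hypothetical `DelimitedFileDestination` class with the same `write(row)` / `close` shape as the destination above. It writes tab-separated values using Ruby's standard `csv` library; the MySQL loading step itself is not shown, and the quoting/escaping settings would need to match whatever the loader expects.

```ruby
require "csv"

# Hypothetical destination: writes each row as one tab-separated line,
# producing a file that a bulk-load step can consume afterwards.
class DelimitedFileDestination
  def initialize(path, columns:, col_sep: "\t")
    @columns = columns
    @csv = CSV.open(path, "w", col_sep: col_sep)
  end

  # Emit values in a fixed column order so the file layout is stable,
  # regardless of the key order inside each row hash.
  def write(row)
    @csv << @columns.map { |column| row[column] }
  end

  def close
    @csv.close
  end
end
```

Usage: `dest = DelimitedFileDestination.new("out.tsv", columns: [:id, :name])`, then `dest.write({ id: 1, name: "Alice" })` for each row and `dest.close` at the end. Fixing the column list up front keeps the file aligned with the target table's column order.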
Hi @thbar, I think the Thanks for clarifying!
@marcosbeirigo you're welcome 👍
Adding some thoughts: some semantics are needed to allow returning more than one row per transform (a common request). One way to express this would be to explicitly use a different keyword, indicating that the transform is expected to yield zero or more rows, instead of returning the row as usual. This would give:

```ruby
# if the :bought_for field is an array
denormalize do |row|
  row[:bought_for].each do |value|
    yield(row.dup.merge(bought_for: value))
  end
end
```
Supporting this properly requires either a rewrite using fibers (I have a prototype) or a potential slowdown for everything else. Putting this in standby mode for now. I will likely implement #15 first.
Closing this. For now, exploding multi-valued attributes can be done at the source level. See this article for a detailed how-to.
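A rough illustration of exploding a multi-valued attribute at the source level. Kiba sources are classes that respond to `each` and yield rows, so a source can emit one row per value before any transform runs. The `ExplodingSource` name, the in-memory `rows` argument, and the reuse of the `:bought_for` field are assumptions for this sketch; a real source would read from a file or database.

```ruby
# Hypothetical source that denormalizes one multi-valued field into many rows.
class ExplodingSource
  def initialize(rows, field:)
    @rows = rows
    @field = field
  end

  def each
    @rows.each do |row|
      # Array() wraps a single value and turns nil into [] (row is skipped).
      Array(row[@field]).each do |value|
        # One output row per value; the other fields are copied unchanged.
        yield row.merge(@field => value)
      end
    end
  end
end
```

With this shape, every transform downstream sees ordinary single-valued rows.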
Kiba v2 supports yielding multiple rows from a class transform. See https://github.com/thbar/kiba/releases/tag/v2.0.0.
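A sketch of that v2 style, assuming the behaviour described in the release notes linked above: a class transform's `process` method calls `yield` once per extra row and returns `nil` (or one final row). In v2 this may also require enabling the streaming runner; check the release notes. To keep the example runnable without the gem, `run_transform` is a hypothetical, simplified stand-in for Kiba's runner, not its actual implementation.

```ruby
# Class transform in the v2 style: yields one row per value of
# :bought_for, then returns nil so no additional row is emitted.
class ExplodeBoughtFor
  def process(row)
    Array(row[:bought_for]).each do |value|
      yield row.merge(bought_for: value)
    end
    nil
  end
end

# Hypothetical, simplified runner: collects every yielded row,
# plus the return value of process when it is not nil.
def run_transform(transform, rows)
  output = []
  rows.each do |row|
    last = transform.process(row) { |yielded| output << yielded }
    output << last unless last.nil?
  end
  output
end
```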
Currently, and unlike activewarehouse-etl, it is not possible to yield multiple rows from a transform.
I'd like to implement such a feature because it would be useful, but I need to fully think through the consequences first. For instance, it could work by yielding an Array of a specific type (so that the row itself could be an Array too, without risk of collision).
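The "Array of a specific type" idea can be sketched in plain Ruby: if a transform signals "several rows" by returning a dedicated `Array` subclass, a runner can tell it apart from a single row that happens to be an `Array`. `MultipleRows` and `normalize_result` are hypothetical names for illustration only; Kiba v2 went with yielding rows from class transforms instead.

```ruby
# Marker type: only instances of this class mean "several rows".
class MultipleRows < Array
end

# Turns a transform's return value into a list of output rows.
def normalize_result(result)
  case result
  when MultipleRows then result.to_a # explicit multi-row result
  when nil then []                   # the row was dropped
  else [result]                      # a single row, even if it is an Array
  end
end
```

Because the check is on the class rather than on "is an Array", a plain `[1, 2]` row stays a single row.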