Allow to pass parameters to run when called programmatically #18

thbar · 2015-07-10T06:38:55Z

When calling Kiba.run programmatically (which apparently a growing number of users are doing), it would be very useful to be able to pass parameters to configure the run.

Points to be investigated

Must think about the best way to implement this.
Some internal classes (Control) should be made public and documented properly, maybe renamed (JobDefinition?).
This will encourage calling run multiple time on the same Control, so I must consider the potential issues with this scenario (e.g. if you store some @state in the control, you'd have to initialize it)
When used from Sidekiq or MT in general, give advice to eager load classes?
Will this leak memory or not?

Potential work-arounds for now

See 97758bb and ca5240a which may provide a way to do that now.

Related issues

Implementation points to consider

At this point I believe an input is mostly useful at parse time, for instance:

to loop over a passed variable and declare one source per item.
to use the input to feed a Source configuration.

We should also make sure that there is a way to ultimately pass that from both API calls and command-line.

The text was updated successfully, but these errors were encountered:

bcherrington · 2015-07-10T07:49:26Z

I have bene giving the passing-in and returning of parameters some thought too.

I have been toying with the idea of extending the DSL language to include “param” and “return” statements.

The “param” statement would be used to unpack parameters passed into the script to instance variables.

Similarly the “return” statement would pack any instance variables for returning back to the caller.

I agree that executing ‘run’ multiple times on the ‘control’ class needs careful consideration. One thought I had, was to parse the DSL and create a class which could then be instantiated and run by calling it’s methods directly. I think this approach would allow the garbage collector to free up unreferenced instances. However, there is a risk that approach this might detract from the simplicity of the code.

I have not given much thought to eager loading of the scripts because of the previous consideration. From a SideKiq worker, I call Kiba.run (including the script loading and parsing) each time with the script name and parameters (using my fork). I instantiate a new worker for each different tenant configured on the application.

Ideally, it would be nice to initialise (load and parse) all the different scripts and then just call the worker with parameters for the script. I still need to learn how I would try to accomplish this with SideKiq. I assume I would need to add some code to the ‘Sidekiq.configure_server’ block in the initialiser to tell SideKiq which scripts to load and label them for calling later.

From: Thibaut Barrère [mailto:notifications@github.com]
Sent: 10 July 2015 08:39
To: thbar/kiba
Subject: [kiba] Allow to pass parameters to run when called programmatically (#18)

When calling Kiba.run programmatically (which apparently a growing number of users are doing), it would be very useful to be able to pass parameters to configure the run.

Points to be investigated

Must think about the best way to implement this.
Some internal classes (Control) should be made public and documented properly, maybe renamed (JobDefinition?).
This will encourage calling run multiple time on the same Control, so I must consider the potential issues with this scenario (e.g. if you store some @State in the control, you'd have to initialize it)
When used from Sidekiq or MT in general, give advice to eager load classes?
Will this leak memory or not?

Related issues

—
Reply to this email directly or view it on GitHub #18 . https://github.com/notifications/beacon/AA8qKsZHYT-wY6X7TEEOgfVCQ3S65jNzks5ob2AAgaJpZM4FV3AS.gif

pcriv · 2015-07-14T21:21:02Z

+1

thbar · 2015-07-15T14:47:41Z

@bcherrington thanks for the thoughts, appreciated! I'm leaning toward a kind of context registry that would be accessible from the script, with two keys (or more) for input and ouput. I probably need to decouple things a bit more (but make sure to keep things simple).

In all cases thanks, I'll try to play around with these ideas and see if I can implement a clean, expandable pattern.

This is useful when doing programmatic calls to `Kiba.parse`. Note that this is experimental and implementation may change.

This so far underlines that it’s probably going to be hard to call `run` and expect to get an output, without calling `parse` each time before.

thbar · 2015-07-15T21:35:34Z

A first experiment is available. This implementation makes it more or less mandatory to call parse before each call to run (well depending on what is needed).

Maybe a better implementation could rely on some sort of Kiba.registry[:variable], but then I'll need to make sure this works with multi-threading (#19).

thbar · 2015-07-15T21:50:29Z

I also note that I initially wanted a clear separation between the parsing and run, but ultimately this approach may have drawbacks like this one too. Will ponder that.

thbar · 2016-11-14T12:49:02Z

Closing this one for now, since programmatic calls via API are not the main intended use today. I'll revisit this if/when I officially support programmatic calls.

joshRpowell · 2017-02-20T16:41:47Z

@thbar thanks for this great gem!

One of our projects was using the parsing-params branch from the main repo for the last 9+ months (gem 'kiba', git: 'https://github.com/thbar/kiba.git', branch: 'parsing-params') as noted above:

A first experiment is available. This implementation makes it more or less mandatory to call parse before each call to run (well depending on what is needed).

This branch was recently deleted. The dev who worked on the implementation is no longer with us. Can you remind me what this branch accomplished? We're getting the following:

ArgumentError: wrong number of arguments (given 3, expected 0..2)

we're running this method in a rake task:

  def run_etl(carrier, etl_filename)
    script_content = IO.read(etl_filename)
    stack = carrier.id
    # pass etl_filename to line numbers on errors
    job_definition = Kiba.parse(script_content, etl_filename, stack)
    Kiba.run(job_definition)
  end

thbar · 2017-02-20T16:48:48Z

@joshRpowell glad you like Kiba, and sorry for your trouble today!

Indeed this was an experimental branch which ultimately is not exactly how I want to solve that problem, so I deleted it (but hadn't thought someone would actually be using it).

I will make sure to mark such branches explicitely as "experimental" in the future, sorry about that!

To solve the problem right now, I suggest you fork Kiba then cherry-pick the commits here, since I think it should mostly be this.

Let me know how it goes!

thbar · 2017-02-20T16:56:14Z

@joshRpowell another possibility (since I do think you are using Kiba from a background job, right?) would be to rely on the block form of Kiba.parse to handle your definitions. This is what I use now from Sidekiq. Please note that this is a bit "fresh" as an approach, but here it comes, still:

module ETL
  module MySync
    module_function
    
    # variables could be a string, but also a Sequel connection for instance
    def setup(source_file, sequel_connection)
      Kiba.parse do
        source CSVSource, filename: source_file
        # SNIP
        destination MyDBDestination, sequel_connection: sequel_connection
      end
    end
  end
end

Which you then call with:

Kiba.run(ETL::MySync.setup(source_file, sequel_connection))

Please note that in case of errors here, Kiba.run will not call close on your destinations, so this is up to you to handle resources properly.

This approach will require a bit of refactoring on your side but it also more future-proof if you are using background jobs.

I'll address that in an upcoming blog post - hope this helps though!

joshRpowell · 2017-02-21T02:52:58Z

@thbar the cherry-pick worked. back up and running.

We are using Kiba from a Sidekiq job. I like your recommended approach and agree, some refactoring is required. I'll be in touch after I give it a try.

Thanks again!

thbar · 2017-02-21T06:08:49Z

@joshRpowell glad to hear everything's fine now 😄 - you welcome!

thbar · 2018-01-23T17:12:17Z

As a heads-up for anyone following this issue, Kiba (from v2 and upward) officially supports the programmatic job declaration which I hinted @joshRpowell about.

This is the best way to handle any form of input & output parameters, much more flexible than parsing an external text file.

thbar added the enhancement label Jul 10, 2015

thbar added a commit that referenced this issue Jul 15, 2015

Allow to pass and return variables to fine-tune parsing [#18]

e1626bc

This is useful when doing programmatic calls to `Kiba.parse`. Note that this is experimental and implementation may change.

thbar added a commit that referenced this issue Jul 15, 2015

Attempt to provide run output [#18]

a2bfa06

This so far underlines that it’s probably going to be hard to call `run` and expect to get an output, without calling `parse` each time before.

chrisb mentioned this issue Jul 30, 2015

Kiba::Job Mixin #21

Closed

thbar closed this as completed Nov 14, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow to pass parameters to run when called programmatically #18

Allow to pass parameters to run when called programmatically #18

thbar commented Jul 10, 2015

bcherrington commented Jul 10, 2015

pcriv commented Jul 14, 2015

thbar commented Jul 15, 2015

thbar commented Jul 15, 2015

thbar commented Jul 15, 2015

thbar commented Nov 14, 2016

joshRpowell commented Feb 20, 2017

thbar commented Feb 20, 2017

thbar commented Feb 20, 2017 •

edited

Loading

joshRpowell commented Feb 21, 2017

thbar commented Feb 21, 2017

thbar commented Jan 23, 2018

Allow to pass parameters to run when called programmatically #18

Allow to pass parameters to run when called programmatically #18

Comments

thbar commented Jul 10, 2015

Points to be investigated

Potential work-arounds for now

Related issues

Implementation points to consider

bcherrington commented Jul 10, 2015

pcriv commented Jul 14, 2015

thbar commented Jul 15, 2015

thbar commented Jul 15, 2015

thbar commented Jul 15, 2015

thbar commented Nov 14, 2016

joshRpowell commented Feb 20, 2017

thbar commented Feb 20, 2017

thbar commented Feb 20, 2017 • edited Loading

joshRpowell commented Feb 21, 2017

thbar commented Feb 21, 2017

thbar commented Jan 23, 2018

thbar commented Feb 20, 2017 •

edited

Loading