Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to pass parameters to run when called programmatically #18

Closed
thbar opened this issue Jul 10, 2015 · 12 comments
Closed

Allow to pass parameters to run when called programmatically #18

thbar opened this issue Jul 10, 2015 · 12 comments

Comments

@thbar
Copy link
Owner

thbar commented Jul 10, 2015

When calling Kiba.run programmatically (which apparently a growing number of users are doing), it would be very useful to be able to pass parameters to configure the run.

Points to be investigated

  • Must think about the best way to implement this.
  • Some internal classes (Control) should be made public and documented properly, maybe renamed (JobDefinition?).
  • This will encourage calling run multiple time on the same Control, so I must consider the potential issues with this scenario (e.g. if you store some @state in the control, you'd have to initialize it)
  • When used from Sidekiq or MT in general, give advice to eager load classes?
  • Will this leak memory or not?

Potential work-arounds for now

See 97758bb and ca5240a which may provide a way to do that now.

Related issues

Implementation points to consider

At this point I believe an input is mostly useful at parse time, for instance:

  • to loop over a passed variable and declare one source per item.
  • to use the input to feed a Source configuration.

We should also make sure that there is a way to ultimately pass that from both API calls and command-line.

@bcherrington
Copy link

I have bene giving the passing-in and returning of parameters some thought too.

I have been toying with the idea of extending the DSL language to include “param” and “return” statements.

The “param” statement would be used to unpack parameters passed into the script to instance variables.

Similarly the “return” statement would pack any instance variables for returning back to the caller.

I agree that executing ‘run’ multiple times on the ‘control’ class needs careful consideration. One thought I had, was to parse the DSL and create a class which could then be instantiated and run by calling it’s methods directly. I think this approach would allow the garbage collector to free up unreferenced instances. However, there is a risk that approach this might detract from the simplicity of the code.

I have not given much thought to eager loading of the scripts because of the previous consideration. From a SideKiq worker, I call Kiba.run (including the script loading and parsing) each time with the script name and parameters (using my fork). I instantiate a new worker for each different tenant configured on the application.

Ideally, it would be nice to initialise (load and parse) all the different scripts and then just call the worker with parameters for the script. I still need to learn how I would try to accomplish this with SideKiq. I assume I would need to add some code to the ‘Sidekiq.configure_server’ block in the initialiser to tell SideKiq which scripts to load and label them for calling later.

From: Thibaut Barrère [mailto:notifications@github.com]
Sent: 10 July 2015 08:39
To: thbar/kiba
Subject: [kiba] Allow to pass parameters to run when called programmatically (#18)

When calling Kiba.run programmatically (which apparently a growing number of users are doing), it would be very useful to be able to pass parameters to configure the run.

Points to be investigated

  • Must think about the best way to implement this.
  • Some internal classes (Control) should be made public and documented properly, maybe renamed (JobDefinition?).
  • This will encourage calling run multiple time on the same Control, so I must consider the potential issues with this scenario (e.g. if you store some @State in the control, you'd have to initialize it)
  • When used from Sidekiq or MT in general, give advice to eager load classes?
  • Will this leak memory or not?

Related issues


Reply to this email directly or view it on GitHub #18 . https://github.com/notifications/beacon/AA8qKsZHYT-wY6X7TEEOgfVCQ3S65jNzks5ob2AAgaJpZM4FV3AS.gif

@pcriv
Copy link

pcriv commented Jul 14, 2015

+1

@thbar
Copy link
Owner Author

thbar commented Jul 15, 2015

@bcherrington thanks for the thoughts, appreciated! I'm leaning toward a kind of context registry that would be accessible from the script, with two keys (or more) for input and ouput. I probably need to decouple things a bit more (but make sure to keep things simple).

In all cases thanks, I'll try to play around with these ideas and see if I can implement a clean, expandable pattern.

thbar added a commit that referenced this issue Jul 15, 2015
This is useful when doing programmatic calls to `Kiba.parse`. Note that
this is experimental and implementation may change.
thbar added a commit that referenced this issue Jul 15, 2015
This so far underlines that it’s probably going to be hard to call
`run` and expect to get an output, without calling `parse` each time
before.
@thbar
Copy link
Owner Author

thbar commented Jul 15, 2015

A first experiment is available. This implementation makes it more or less mandatory to call parse before each call to run (well depending on what is needed).

Maybe a better implementation could rely on some sort of Kiba.registry[:variable], but then I'll need to make sure this works with multi-threading (#19).

@thbar
Copy link
Owner Author

thbar commented Jul 15, 2015

I also note that I initially wanted a clear separation between the parsing and run, but ultimately this approach may have drawbacks like this one too. Will ponder that.

@chrisb chrisb mentioned this issue Jul 30, 2015
@thbar
Copy link
Owner Author

thbar commented Nov 14, 2016

Closing this one for now, since programmatic calls via API are not the main intended use today. I'll revisit this if/when I officially support programmatic calls.

@thbar thbar closed this as completed Nov 14, 2016
@joshRpowell
Copy link

@thbar thanks for this great gem!

One of our projects was using the parsing-params branch from the main repo for the last 9+ months (gem 'kiba', git: 'https://github.com/thbar/kiba.git', branch: 'parsing-params') as noted above:

A first experiment is available. This implementation makes it more or less mandatory to call parse before each call to run (well depending on what is needed).

This branch was recently deleted. The dev who worked on the implementation is no longer with us. Can you remind me what this branch accomplished? We're getting the following:

ArgumentError: wrong number of arguments (given 3, expected 0..2)

we're running this method in a rake task:

  def run_etl(carrier, etl_filename)
    script_content = IO.read(etl_filename)
    stack = carrier.id
    # pass etl_filename to line numbers on errors
    job_definition = Kiba.parse(script_content, etl_filename, stack)
    Kiba.run(job_definition)
  end

@thbar
Copy link
Owner Author

thbar commented Feb 20, 2017

@joshRpowell glad you like Kiba, and sorry for your trouble today!

Indeed this was an experimental branch which ultimately is not exactly how I want to solve that problem, so I deleted it (but hadn't thought someone would actually be using it).

I will make sure to mark such branches explicitely as "experimental" in the future, sorry about that!

To solve the problem right now, I suggest you fork Kiba then cherry-pick the commits here, since I think it should mostly be this.

Let me know how it goes!

@thbar
Copy link
Owner Author

thbar commented Feb 20, 2017

@joshRpowell another possibility (since I do think you are using Kiba from a background job, right?) would be to rely on the block form of Kiba.parse to handle your definitions. This is what I use now from Sidekiq. Please note that this is a bit "fresh" as an approach, but here it comes, still:

module ETL
  module MySync
    module_function
    
    # variables could be a string, but also a Sequel connection for instance
    def setup(source_file, sequel_connection)
      Kiba.parse do
        source CSVSource, filename: source_file
        # SNIP
        destination MyDBDestination, sequel_connection: sequel_connection
      end
    end
  end
end

Which you then call with:

Kiba.run(ETL::MySync.setup(source_file, sequel_connection))

Please note that in case of errors here, Kiba.run will not call close on your destinations, so this is up to you to handle resources properly.

This approach will require a bit of refactoring on your side but it also more future-proof if you are using background jobs.

I'll address that in an upcoming blog post - hope this helps though!

@joshRpowell
Copy link

@thbar the cherry-pick worked. back up and running.

We are using Kiba from a Sidekiq job. I like your recommended approach and agree, some refactoring is required. I'll be in touch after I give it a try.

Thanks again!

@thbar
Copy link
Owner Author

thbar commented Feb 21, 2017

@joshRpowell glad to hear everything's fine now 😄 - you welcome!

@thbar
Copy link
Owner Author

thbar commented Jan 23, 2018

As a heads-up for anyone following this issue, Kiba (from v2 and upward) officially supports the programmatic job declaration which I hinted @joshRpowell about.

This is the best way to handle any form of input & output parameters, much more flexible than parsing an external text file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants