Discussion: External Config Design #180

Open
dorranh opened this issue Oct 28, 2020 · 2 comments

dorranh commented Oct 28, 2020

As noted in the PR (#69) containing the initial version of external configuration support, there are some open design questions regarding the granularity at which a pipeline author should specify configurable task arguments. This issue is to record those thoughts and serve as a place for further discussion.


This breaks down into two related questions:

  1. Should flow declarations specify the source for their configuration (e.g. FromEnv/FromFile) or should the user specify this
    at a higher level, prior to calling runFlow?
  2. Should runFlow handle parsing config values from external sources (e.g. reading config files), or should this be
    done by the user prior to calling runFlow?

These questions could be addressed in a couple of different ways which would produce the following interfaces:

Case 1 (implementation in tweag/funflow2#69)

flow = dockerFlow DockerTaskConfig {args=[FromEnv "FOO"]}
runFlow flow

Pros:

  • Simpler to invoke pipeline - runFlow automatically checks the environment for $FOO
  • Easier to reason about - config for a task can only come from one place

Cons:

  • If you build a library of specific tasks, users will have to pass in the config the way you specify
    • For organizations, this may not be much of an issue - e.g. we specify password inputs as environment variables and
      all employees must adhere to this.
    • For flows shared publicly this might be more annoying since people have different preferences. How often will flows
      themselves be shared outside of an organization?

Case 2.

Move config loading out one level

flow = dockerFlow DockerTaskConfig {args=[FromEnv "FOO"]}
config = readConfig $ getFlowConfigKeys flow
runFlow flow config

Pros:

  • Same as 1
  • Also separates out the logic for reading config from the environment and allows one to pass
    in specific values for testing
  • Already need a getFlowConfigKeys function to be able to automatically generate a CLI anyways

Cons:

  • Same as 1
  • Invoking the pipeline takes an extra step
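
To make the testing benefit of Case 2 concrete, here is a minimal sketch of what a readConfig step could look like. It is not the funflow implementation: the `Config` type and the names `readConfigWith` and `readConfigFromEnv` are hypothetical, and it assumes that config keys and values are plain strings. The lookup function is passed in as a parameter, so tests can supply fixed values instead of touching the real environment.

```haskell
import qualified Data.Map.Strict as Map
import Data.Either (partitionEithers)
import System.Environment (lookupEnv)

-- | Hypothetical config type: resolved values keyed by name.
type Config = Map.Map String String

-- | Pure core: resolve every key with the supplied lookup function.
-- Returns the names of all missing keys, or the complete config.
readConfigWith :: (String -> Maybe String) -> [String] -> Either [String] Config
readConfigWith lookupKey keys =
  case partitionEithers (map resolve keys) of
    ([], found)  -> Right (Map.fromList found)
    (missing, _) -> Left missing
  where
    resolve k = maybe (Left k) (\v -> Right (k, v)) (lookupKey k)

-- | IO wrapper (hypothetical): reads the keys from the process environment.
readConfigFromEnv :: [String] -> IO (Either [String] Config)
readConfigFromEnv keys = do
  values <- mapM lookupEnv keys
  let env = [(k, v) | (k, Just v) <- zip keys values]
  pure (readConfigWith (`lookup` env) keys)
```

In a test, `readConfigWith (\k -> lookup k [("FOO", "bar")]) ["FOO"]` resolves the key without any environment access, while production code would call `readConfigFromEnv` on the keys gathered from the flow.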

Case 3.

Move specification of config sources out of flow declarations and have the user provide configs:

flow = dockerFlow DockerTaskConfig {args=[Configurable "FOO"]}
config = HashMap.fromList [("FOO", fromEnv "FOO")]
runFlow flow config

Pros:

  • Allows flows to be shared more flexibly since it abstracts the source of a config out of the task and leaves it to the user to handle getting the config.
  • Might be simpler to test since you could pass in values to runFlow without having it read from external sources

Cons:

  • Requires extra work on the part of the user - they have to be sure to pass in the required config when calling runFlow
  • If configurable args in a flow are a more general type, it becomes more difficult to automatically construct a CLI, unless any configurable argument can be configured by the CLI.
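
As a rough illustration of Case 3, the sketch below resolves named placeholders against a config map supplied by the caller. The `Arg` type and `resolveArgs` are hypothetical stand-ins, not the types from the PR, and it uses `Data.Map` with string values rather than the `HashMap` shown above to stay within the packages that ship with GHC.

```haskell
import qualified Data.Map.Strict as Map

-- | Hypothetical argument type: a fixed value or a named placeholder.
data Arg = Literal String | Configurable String
  deriving (Show, Eq)

-- | Replace each placeholder with its value from the user-supplied config.
-- Fails with a message naming the first key the caller did not provide.
resolveArgs :: Map.Map String String -> [Arg] -> Either String [String]
resolveArgs config = traverse resolve
  where
    resolve (Literal s)      = Right s
    resolve (Configurable k) =
      maybe (Left ("missing config key: " ++ k)) Right (Map.lookup k config)
```

This also shows the first con in practice: a caller who forgets to supply a key only finds out when the args are resolved, so a check like this would probably need to run before any task starts.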

GuillaumeDesforges commented

Thanks a lot for this clarifying post :)


I don't really understand Case 2. Could you please elaborate on the line:

config = readConfig $ getFlowConfigKeys flow

One could also consider

Case 4.

runFlow tries to load configuration from all sources, in this order:

  1. CLI
  2. config file
  3. environment

flow = dockerFlow DockerTaskConfig {args=[Configurable "foo.bar"]}

runFlow flow input ()

and this foo.bar value can be passed:

  1. from CLI myexecutable --foo-bar someValue
  2. from file, e.g. YAML foo: { bar: someValue }
  3. from environment FOO_BAR="someValue" myexecutable
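
The lookup described in Case 4 could be sketched as follows. Everything here is an assumption for illustration: the `Sources` record, the helper names, and the rule that an earlier source wins over a later one (CLI, then file, then environment). It also assumes each source has already been read into a map of strings, so no actual CLI, YAML, or environment parsing happens in this block.

```haskell
import Control.Applicative ((<|>))
import Data.Char (toUpper)
import qualified Data.Map.Strict as Map

-- | "foo.bar" becomes "--foo-bar".
cliFlag :: String -> String
cliFlag key = "--" ++ map (\c -> if c == '.' then '-' else c) key

-- | "foo.bar" becomes "FOO_BAR".
envVar :: String -> String
envVar = map (\c -> if c == '.' then '_' else toUpper c)

-- | Hypothetical container for values already read from each source.
data Sources = Sources
  { fromCli  :: Map.Map String String  -- keyed by flag, e.g. "--foo-bar"
  , fromFile :: Map.Map String String  -- keyed by dotted path, e.g. "foo.bar"
  , fromEnv  :: Map.Map String String  -- keyed by variable name, e.g. "FOO_BAR"
  }

-- | Try the CLI first, then the config file, then the environment.
lookupLayered :: Sources -> String -> Maybe String
lookupLayered s key =
      Map.lookup (cliFlag key) (fromCli s)
  <|> Map.lookup key (fromFile s)
  <|> Map.lookup (envVar key) (fromEnv s)
```

Keeping the name translation in small pure functions means the same dotted key can drive CLI flag generation and environment lookup without the flow author spelling out each variant.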

My guess is that the config values will always come as string/text (no matter if it is from a file, from CLI or env).

Parsing to other types (Int, Float, etc.) could take place in specific interpreters where needed.
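
One way an interpreter could do that parsing is sketched below, using `readMaybe` from base. The helper name `parseValue` and the error message format are hypothetical; the only assumption carried over from the text is that raw values arrive as strings.

```haskell
import Text.Read (readMaybe)

-- | Hypothetical helper: parse a raw config string into the type an
-- interpreter needs, reporting the key name when parsing fails.
parseValue :: Read a => String -> String -> Either String a
parseValue key raw =
  maybe (Left ("cannot parse value for " ++ key ++ ": " ++ show raw))
        Right
        (readMaybe raw)
```

For example, `parseValue "retries" "3" :: Either String Int` succeeds, while `"three"` yields a `Left` that names the key. Values that are meant to stay as text should bypass this helper, because the `Read` instance for `String` expects quoted input.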

dorranh commented Oct 29, 2020

> I don't really understand Case 2. Could you please elaborate on the line:

Ya, so one thing I realized is that to generate CLI flags using the configurables in a flow, we probably need to provide a function, getFlowConfigKeys or something similar, which can gather required config keys from the various tasks in a flow (e.g. prior to weaving it) so that we know which arguments the CLI will accept.
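
To illustrate the idea only, here is a sketch of key gathering over a toy flow type. `ToyFlow` and `Arg` are hypothetical stand-ins; a real funflow `Flow` is structured differently, so an actual getFlowConfigKeys would need its own traversal.

```haskell
import Data.List (nub)

-- | Hypothetical argument type: a fixed value or a named placeholder.
data Arg = Literal String | Configurable String

-- | Toy stand-in for a flow: a single task with args, or two flows in sequence.
data ToyFlow
  = Task [Arg]
  | Seq ToyFlow ToyFlow

-- | Collect the distinct config keys in order of first appearance.
getFlowConfigKeys :: ToyFlow -> [String]
getFlowConfigKeys = nub . go
  where
    go (Task args) = [k | Configurable k <- args]
    go (Seq a b)   = go a ++ go b
```

The resulting list is what a CLI generator would iterate over to decide which flags to accept.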

Since there might be additional top-level CLI arguments that we want to add besides those for the configurable values, it probably makes the most sense to do this outside of runFlow. That way, the user could do something like the following (e.g. with optparse-applicative):

import Options.Applicative

runFlowWithCLI :: Flow a b -> a -> IO b
runFlowWithCLI flow input = do

   let  -- Create the main CLI options, maybe with other sub-commands, etc.
        topLevelCLI = (...) :: Parser
        -- Create the flow-specific configuration options
        flowCLIOpts = flowCLI flow

   -- Run the command line parser. execParser expects a ParserInfo,
   -- so the combined parser is wrapped with `info`.
   cliOpts <- execParser (info (topLevelCLI <*> flowCLIOpts) fullDesc)

   -- Pass in the parsed options to runFlowWithConfig. We could add
   -- a field to `FlowConfig` for explicitly passing in config values.
   -- (The argument order of runFlowWithConfig is assumed here.)
   runFlowWithConfig flow (configFromOpts cliOpts) input

   -- or, we can also handle reading env variables and the config file here prior
   -- to calling runFlowWithConfig.

  where
    -- Signatures only; the implementations are left open.
    -- | Traverses a flow and builds a set of CLI options using the flow's
    -- configurable fields. This could call the `getFlowConfigKeys` function I mentioned.
    flowCLI :: Flow a b -> Parser CLIOpts
    -- | Converts a parsed CLIOpts to a FlowConfig
    configFromOpts :: CLIOpts -> FlowConfig

dorranh transferred this issue from tweag/funflow2 on Jul 21, 2021