Short answer: Employing the factory method pattern allows us to **decouple the pipeline from the specific steps**: both depend only on a shared interface (how to create a step from the factory, and how to run a given step). Beyond that, the pipeline does not need to know which kind of step it is running. This also lets users extend the application (for example, by adding new types of steps).
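
To make the decoupling concrete, here is a minimal sketch of the factory method pattern. The names (`Step`, `StepFactory`, `Pipeline`, and the two toy steps) are hypothetical and not taken from the actual code. The pipeline only calls `create_step()` and `run()`, so a user can add a new step type by writing a new factory subclass without touching `Pipeline`.

```python
from abc import ABC, abstractmethod


class Step(ABC):
    """Shared interface: how to run a step."""

    @abstractmethod
    def run(self, data: str) -> str: ...


class StepFactory(ABC):
    """Shared interface: how to create a step (the factory method)."""

    @abstractmethod
    def create_step(self) -> Step: ...


class Pipeline:
    """Depends only on the two interfaces, never on concrete step types."""

    def __init__(self, factories: list[StepFactory]) -> None:
        self.factories = factories

    def run(self, data: str) -> str:
        for factory in self.factories:
            data = factory.create_step().run(data)
        return data


# Hypothetical user-defined extensions: new step types plug in without changing Pipeline.
class StripStep(Step):
    def run(self, data: str) -> str:
        return data.strip()


class StripStepFactory(StepFactory):
    def create_step(self) -> Step:
        return StripStep()


class UpperStep(Step):
    def run(self, data: str) -> str:
        return data.upper()


class UpperStepFactory(StepFactory):
    def create_step(self) -> Step:
        return UpperStep()


print(Pipeline([StripStepFactory(), UpperStepFactory()]).run("  hello  "))  # HELLO
```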


# Questions/Challenges:


## Not all arguments are available until configs are loaded
- The pipeline facade can't own the step config (unless it queries the step factory first).
- The step factory could return both the step config and the step...
- ...but the best place to store the step config is in the step itself (a step "has a" step_config). The alternative is to store steps as a mapping of steps to step configs, which is less intuitive (and results in awkward naming).
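
One way to realize "a step has a step_config" is sketched below; `StepConfig`, its fields, and the `create_step(config)` signature are hypothetical. Since the config only exists once it has been loaded, it is passed to the factory at creation time and stored on the step, so the facade never has to own it.

```python
from dataclasses import dataclass, field


@dataclass
class StepConfig:
    """Hypothetical per-step settings, available only once configs are loaded."""
    step_name: str
    settings: dict[str, str] = field(default_factory=dict)


class Step:
    def __init__(self, config: StepConfig) -> None:
        self.config = config  # the step "has a" step_config

    def run(self) -> str:
        return f"running {self.config.step_name} with {self.config.settings}"


class StepFactory:
    def create_step(self, config: StepConfig) -> Step:
        # The loaded config is handed to the new step, not kept by the facade.
        return Step(config)


step = StepFactory().create_step(StepConfig("extract", {"SOURCE": "db"}))
print(step.run())  # running extract with {'SOURCE': 'db'}
# The alternative would be a separate mapping such as dict[Step, StepConfig].
```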

## How to link steps to their configs?
- **To discover relevant configs for a given step, adopt the convention that the step name corresponds to subfolder name**, i.e. `configs/${env}/${step_name}/.env`.
- Thus, the only remaining problem is to match a given step_factory to its config folder (which will be a 1:1 mapping).
- Possible solutions when creating the pipeline facade (?):
  - Pass a list of dataclasses, named dicts, etc., each consisting of a step factory and a ~~config path~~ step name.
    - This works, but makes naming awkward, so user-facing code is less readable.
  - Initialize each step factory with only the step *name* (which in turn identifies its configs)?
    - This would simplify the facade.
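
Under the folder convention above, the step name alone is enough to locate a step's configs. A minimal sketch, assuming a hypothetical helper `step_config_path` and a `configs` root relative to the working directory:

```python
from pathlib import Path


def step_config_path(env: str, step_name: str, root: Path = Path("configs")) -> Path:
    """Map a step name to its config file: configs/<env>/<step_name>/.env."""
    return root / env / step_name / ".env"


class StepFactory:
    def __init__(self, step_name: str) -> None:
        # The name is the only thing the factory needs to find its configs (1:1 mapping).
        self.step_name = step_name

    def config_path(self, env: str) -> Path:
        return step_config_path(env, self.step_name)


print(StepFactory("extract").config_path("dev").as_posix())  # configs/dev/extract/.env
```

Because the path is derived from the name, the facade needs no separate table pairing factories with config folders.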

## How should user define steps?
- Option 1: Initialize the pipeline facade with step factory *classes* + step names.
  - Advantage: Since initialization takes place in the facade, which holds references to the shared config, etc., we can pass these references on to the step factory. As a result, we don't need a wrapper method on the facade to create a step, but can call it directly on the step factory.
  - Disadvantage: Results in awkward naming, and requires creating a custom class (for type checking):


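```python
from dataclasses import dataclass

@dataclass
class StepFactoryAndName:
    step_factory: "type[StepFactory]"  # a factory *class* (Option 1); StepFactory is defined elsewhere
    step_name: str
```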

- Option 2: **Initialize the pipeline facade with *instances* of step factories**
  - Advantage: Avoids awkward naming and the need to create a custom data class.

Decision: Go with Option 2, since it makes the code the user interacts with simpler.

## Should the factory be instantiated with env, shared_config, etc., or should these references be passed as parameters in method calls?
- Option 1: **Pass config objects as parameters in method calls to the step factory.** Since the pipeline_facade would then hold these references, we would have to **call the methods that create a step, etc., from the facade**, not the factory.
  - Advantage: Passing them during method calls **simplifies the code written by the library user**, who instantiates the step factories passed to the pipeline facade and only has to supply the step name as an argument (to associate each factory with the right configs).
- Option 2: Instantiate the factory with env, shared_config, etc.
  - Advantage: Avoids needing wrapper methods on the pipeline_facade, which leads to simpler code overall.
  - Disadvantage: The user can't do the instantiation (because we want to abstract loading configs, etc., away from them), so we need another way for the user to associate a factory with a step name (e.g., passing a tuple of factory and step name). This is a little more awkward and less type safe, unless we use a custom data class, which makes the code a little more complicated.
    - -> We will have to pass configs as args when instantiating the step actor (processor, etc.).
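
For comparison, Option 2 might look like the following sketch; all class names and signatures are hypothetical. The factory needs no facade wrapper, but because only the facade knows `env` and `shared_config`, the user has to hand over `(factory class, step name)` pairs instead of ready-made factories.

```python
from dataclasses import dataclass, field


@dataclass
class SharedConfig:
    """Hypothetical settings shared by all steps."""
    values: dict[str, str] = field(default_factory=dict)


class Step:
    def __init__(self, description: str) -> None:
        self.description = description

    def run(self) -> str:
        return self.description


class StepFactory:
    # Option 2: env and shared_config arrive at construction time,
    # so the facade, not the user, must instantiate the factory.
    def __init__(self, step_name: str, env: str, shared_config: SharedConfig) -> None:
        self.step_name = step_name
        self.env = env
        self.shared_config = shared_config

    def create_step(self) -> Step:
        # No facade wrapper needed: the factory already holds every reference.
        return Step(f"{self.step_name} in {self.env}")


class PipelineFacade:
    def __init__(self, env: str, shared_config: SharedConfig,
                 steps: list[tuple[type[StepFactory], str]]) -> None:
        # The user can only pass (factory class, step name) pairs.
        self.step_factories = [cls(name, env, shared_config) for cls, name in steps]

    def run(self) -> list[str]:
        return [factory.create_step().run() for factory in self.step_factories]


facade = PipelineFacade("dev", SharedConfig(), [(StepFactory, "extract"), (StepFactory, "load")])
print(facade.run())  # ['extract in dev', 'load in dev']
```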

Decision: Choose Option 1, because simplifying the code the user has to write matters more: user code is written more often and changes more often.
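
A minimal sketch of the chosen design, combined with the earlier decision to pass factory *instances*. Names such as `PipelineFacade.create_step`, `SharedConfig`, and `EchoStepFactory` are hypothetical. The user only writes `EchoStepFactory("extract")`, while the facade keeps `env` and `shared_config` and passes them to the factory through a wrapper method.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SharedConfig:
    """Hypothetical settings shared by all steps."""
    values: dict[str, str] = field(default_factory=dict)


class Step(ABC):
    @abstractmethod
    def run(self) -> str: ...


class StepFactory(ABC):
    def __init__(self, step_name: str) -> None:
        # The user supplies only the step name, which identifies the config folder.
        self.step_name = step_name

    @abstractmethod
    def create_step(self, env: str, shared_config: SharedConfig) -> Step: ...


class PipelineFacade:
    def __init__(self, env: str, shared_config: SharedConfig,
                 step_factories: list[StepFactory]) -> None:
        self.env = env
        self.shared_config = shared_config
        self.step_factories = step_factories

    def create_step(self, factory: StepFactory) -> Step:
        # Wrapper method: the facade holds the references and passes them on.
        return factory.create_step(self.env, self.shared_config)

    def run(self) -> list[str]:
        return [self.create_step(f).run() for f in self.step_factories]


# Hypothetical user-defined step and factory.
class EchoStep(Step):
    def __init__(self, message: str) -> None:
        self.message = message

    def run(self) -> str:
        return self.message


class EchoStepFactory(StepFactory):
    def create_step(self, env: str, shared_config: SharedConfig) -> Step:
        return EchoStep(f"{self.step_name} in {env}")


facade = PipelineFacade("dev", SharedConfig(), [EchoStepFactory("extract"), EchoStepFactory("load")])
print(facade.run())  # ['extract in dev', 'load in dev']
```

The cost of this choice is the `create_step` wrapper on the facade; the gain is that user code contains no tuples or helper data classes.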