Dear DVC folk,
Motivation
You mention it yourself in your documentation: fully versioned hyperparameter optimization comes to mind when using DVC.
A Little Research
I did some quick research, and it quickly becomes apparent that this needs a DVC-specific implementation.
All the existing hyperparameter optimizers, like Python's hyperopt,
- assume their own hyperparameter API for how the hyperoptimization orchestration process communicates with the single algorithm
- and distribute the computation using their individual distribution machinery
Suggestions on how to integrate with DVC
It seems to me the following is needed for hyperparameter optimization to be a natural addition to DVC:
- each triggered hyperoptimization orchestration should have its own git branch subfolder
- each single hyperoptimization run should have its own sub-branch under that subfolder
- a file-based hyperparameter API, probably based on JSON
  - i.e. the hyperparameter configurations should be stored in a file format
  - and, in addition, also the chosen parameters for a concrete run
  - everything I found so far either passes the hyperparameters Python-internally as arguments to a function, or on the command line as arguments to a script... so there is no convention to copy, but in any case it is just a dictionary with values
  - using a common JSON format would enable easy tracking/comparing of parameters across hyperoptimization git branches, similar to how `dvc metrics` already works
  - and the final run could easily be written as a .dvc routine itself by calling `dvc repro`
- it would be unbelievably awesome not to reinvent the wheel entirely, but to provide wrappers around existing hyperoptimization packages like hyperopt or SMAC or others.
  The basic idea is simple: instead of running a concrete algorithm with the specific framework, you run a wrapper which
  - checks out a new hyperoptimization branch
  - grabs the hyperparameters from the framework-specific API (e.g. as command-line args) and writes them into the new JSON file format
  - runs `dvc repro myalgorithm.dvc` on a previously specified routine `myalgorithm.dvc`
  - commits everything on the branch
  - somehow finds out the winner of the hyperoptimization, creates a specific branch for it, and commits everything nicely
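The wrapper steps above can be sketched in a few lines of stdlib Python. This is only a sketch under the conventions proposed here: `write_params` and `plan_trial` are hypothetical names, and the exact git/dvc invocations are assumptions, not DVC API:

```python
import json
from pathlib import Path


def write_params(params, path="params.json"):
    """File-based hyperparameter API: one plain JSON dict per run (sketch)."""
    Path(path).write_text(json.dumps(params, indent=2, sort_keys=True))


def plan_trial(branch, stage="myalgorithm.dvc"):
    """Return the commands one hyperoptimization run would execute.

    Returning the command lists (instead of calling subprocess) keeps this
    inspectable; a real wrapper would pass each one to subprocess.run().
    """
    return [
        ["git", "checkout", "-b", branch],  # one sub-branch per run
        ["dvc", "repro", stage],            # reproduce the single-run pipeline
        ["git", "add", "-A"],
        ["git", "commit", "-m", f"hyperopt run {branch}"],
    ]
```

The point of the split is that the parameter file is the only contract between the orchestrator and the algorithm; everything else is plain git/dvc plumbing.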
Wrapping existing optimization frameworks has several advantages:
- less code to maintain, and only against stable APIs
- a monitoring web UI and other tooling for evaluating or live-inspecting the hyperoptimization may already be available
- the community could contribute new wrappers
Of course more details will pop up while actually implementing this, e.g. how to integrate hyperoptimization with .dvc pipeline files as neatly as possible (for instance, we may want to commit both the single `run.dvc` and a `hyperopt.dvc` to the same repository -- these need to interact seamlessly).
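To make the interaction concrete, here is a minimal random-search loop standing in for a framework like hyperopt or SMAC. All names are hypothetical; the `evaluate` callback is the seam where a real wrapper would check out a run branch, run `dvc repro` on the single-run pipeline, and read the metric file back:

```python
import json
import random
from pathlib import Path


def random_search(space, evaluate, max_evals=10, seed=0, workdir="."):
    """Minimal stand-in for an external optimizer (sketch).

    space    -- dict mapping parameter names to lists of candidate values
    evaluate -- callback playing the role of `dvc repro` + metric readout
    Returns (best_params, best_loss).
    """
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for i in range(max_evals):
        params = {name: rng.choice(values) for name, values in space.items()}
        # persist the chosen parameters -- the file-based hyperparameter API
        Path(workdir, f"params_run{i}.json").write_text(json.dumps(params))
        loss = evaluate(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    # a real wrapper would now create the winner branch and commit it
    return best_params, best_loss
```

Because each trial is materialized as a params file before `evaluate` runs, every run is reproducible from its branch alone, independent of the optimizer's in-memory state.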
What do you think about this suggested approach?