Options to save model, combine and rank sub-leaderboards #26

tmastny opened this issue May 3, 2018 · 0 comments

Drake Challenges

I encountered a few challenges using board in a drake pipeline.

  1. Working with DAGs in drake should be parallelizable. However, board is currently designed to write to leadrboard.RDS on each call. That means you can't take advantage of drake's parallelism, for fear of a write conflict.

  2. An output file can only be the target of one command in drake. This means that the current behavior of board is problematic, even if you run everything sequentially (avoiding the problem in 1.). For example,

drake_plan(
  model = train(outcome ~ ., data = iris, method = method__),
  board(model__); file_out("leadrboard.RDS")
)

is problematic because leadrboard.RDS can only be the target of one of the board calls.

  3. Drake works best when the targets are the outputs of functions.

Proposed Solutions

Here is how leadr::board could be redesigned to better work with a drake workflow.

  • Have an option to not save to leadrboard.RDS. This would resolve the potential write conflicts described in item 1.
  • Have an option where board can aggregate sub leaderboards into a main leaderboard.

The idea for the latter is as follows:

plan <- drake_plan(
  model = train(outcome ~ ., data = iris, method = method__),
  results = board(model__, save = FALSE)
)

Here, board(model__, save = FALSE) would return a leaderboard tibble with only one row for model__.
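
To make the intended return value concrete, here is a minimal base R sketch. It is not leadr's implementation: board_row, its arguments, and the accuracy column are hypothetical names standing in for board(model__, save = FALSE), and a plain data frame stands in for the tibble.

```r
# Hypothetical stand-in for board(model, save = FALSE): it builds a
# one-row leaderboard and returns it without touching leadrboard.RDS.
board_row <- function(model_name, accuracy) {
  data.frame(
    model = model_name,
    accuracy = accuracy,
    stringsAsFactors = FALSE
  )
}

# Each call is a pure function of its inputs, so calls can run in parallel.
row <- board_row("rf", 0.95)
```

Because nothing is written to disk, each such call can be its own drake target with no shared output file.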

Then:

plan <- gather_plan(
  plan,
  leadrboard  = [gather results by binding rows into leaderboard tibble],
)
plan <- new_plan_row(
  board(aggregate = leadrboard); file_out("leadrboard.RDS")
)
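
The aggregation step could look roughly like the base R sketch below. Everything in it is an assumption for illustration: the model names, the accuracy values, and the rank column are made up, and do.call(rbind, ...) stands in for whatever board(aggregate = ...) would do internally.

```r
# Hypothetical one-row sub-leaderboards, as separate targets would return them.
results <- list(
  data.frame(model = "rf", accuracy = 0.95, stringsAsFactors = FALSE),
  data.frame(model = "glmnet", accuracy = 0.91, stringsAsFactors = FALSE),
  data.frame(model = "knn", accuracy = 0.93, stringsAsFactors = FALSE)
)

# Bind the rows into one leaderboard, then rank by the metric (best first).
leadrboard <- do.call(rbind, results)
leadrboard <- leadrboard[order(-leadrboard$accuracy), ]
leadrboard$rank <- seq_len(nrow(leadrboard))
rownames(leadrboard) <- NULL

# A single write, performed by one step only (temp dir used for the sketch).
path <- file.path(tempdir(), "leadrboard.RDS")
saveRDS(leadrboard, path)
```

With this shape, leadrboard.RDS is the output of exactly one step, which satisfies drake's one-target-per-output-file rule.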