implement future.apply in forecast as it is in model #268

Using fable at any kind of scale gets painful if computation of forecasts isn't also parallelized. I've got a possible implementation I'll PR tomorrow.
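For context, model() already distributes estimation across keys when {future} is attached, while forecast() runs sequentially. A minimal sketch of that asymmetry, assuming the tourism dataset from {tsibble} as a stand-in workload:

```r
library(fable)
library(tsibble)
library(future)

plan(multisession)

# Estimation is distributed across workers because {future} is attached
fit <- tourism %>%
  model(ets = ETS(Trips))

# Forecasting currently runs sequentially; this issue asks for it to use
# future.apply in the same way
fc <- fit %>% forecast(h = "2 years")
```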
Comments
A PR for this would be great, thanks!
My suggestion for this would be to drop the use of …
I've got to pull a couple of errant commits out, but I'll send it along momentarily. I reversed the order of the …
I'm happy to make a util function that does the iteration piece. It would essentially start at the point of having been …
I think it could be safer / more efficient if the internal util would also handle the …
Makes sense, and shouldn't be any tougher. It won't …
An example of returning to the previous structure is when the operation returns a … For example, … Adding/extending the apply-util for this can be done later.
Makes sense. Should be straightforward enough.
To make modeling and forecasting more memory efficient for … Otherwise, we might need to crawl across the functions within the call to find all of their …
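That crawling is what {future} does automatically via the {globals} package; a minimal illustration of the per-call scanning that `future.globals = FALSE` avoids (the variable names here are illustrative):

```r
library(globals)

offset <- 10
f <- function(x) x + offset  # `offset` is a free variable, not an argument

# globalsOf() walks the function to find every global it needs -- this is
# the scanning cost paid for each future when globals are auto-detected
globalsOf(f)
```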
Just following along as I use fable for millions of time series. If this is in the newest version of the GitHub release I'd be happy to help test; otherwise, just saying I appreciate the work here!
On further thought and a bug report, converting to a longer format before parallelising isn't appropriate, because the column structure defines the way in which the forecasts are handled (such as reconciliation). I'll have a go at adding the conditionally parallel mapply function.
Look at what's in my branch right now. There was an issue with how …
Using what's currently in the …
Output for me was: …

Obviously, the …
Thanks, looks promising. Do you know if the data transfer overhead for multicore is lower than multisession? I get a similar speed-up:

```
[1] "Sequential modeling took 8.717845 secs"
[1] "Sequential forecasting took 3.696366 secs"
[1] "Multicore modeling took 3.74861 secs"
[1] "Multicore forecasting took 2.048967 secs"
[1] "Multisession modeling took 11.08729 secs"
[1] "Multisession forecasting took 2.403781 secs"
```
So the underlying issue with the overhead is that …
You can reduce the overhead for …
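For instance, {future.apply}'s chunking options are one documented lever for cutting per-element overhead; a sketch with an illustrative workload:

```r
library(future.apply)
library(future)

plan(multisession)

x <- as.list(1:1000)
# By default each worker gets one chunk, amortising transfer overhead;
# future.chunk.size instead fixes the number of elements per future
res <- future_lapply(x, sqrt, future.chunk.size = 250)
```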
Don't hate me for being super hacky, but throw this in after you build …

My results were: …

It's an interesting question about the overhead. I need to run it on something much larger.
I extended the data out to 60 keys. Also, I killed off Chrome and a bunch of other things to free up more cores. …
I added a second key to them and it still worked just fine. Tomorrow morning, I'm going to test it on the process for work, which takes about 2 hours sequentially, but the machine has 80 cores that are used during modeling but not during forecasting. Yet. It's a bear.
Sounds great. I think there's some low-hanging fruit for performance improvements, which I'll try to get to today.
Here's the base of an … Anything written as an …
I agree that adding progressr reporting here is appropriate. This is what I'm working with at the moment:

```r
mapply_maybe_parallel <- function (.f, ..., MoreArgs = list(), SIMPLIFY = FALSE) {
  if (is_attached("package:future")) {
    require_package("future.apply")
    future.apply::future_mapply(
      FUN = .f,
      ...,
      MoreArgs = MoreArgs,
      SIMPLIFY = SIMPLIFY,
      future.globals = FALSE
    )
  } else {
    mapply(
      FUN = .f,
      ...,
      MoreArgs = MoreArgs,
      SIMPLIFY = SIMPLIFY
    )
  }
}

mable_apply <- function (.data, .f, ..., names_to = ".model") {
  mv <- mable_vars(.data)
  kv <- key_vars(.data)
  # Compute .f for all models in mable
  result <- lapply(
    as_tibble(.data)[mv],
    mapply_maybe_parallel,
    .f = .f,
    .data,
    MoreArgs = dots_list(...)
  )
  num_rows <- lapply(result, vapply, nrow, integer(1L))
  # Assume same tsibble structure for all outputs
  first_result <- result[[1]][[1]]
  result <- vec_rbind(!!!lapply(result, function(x) vec_rbind(!!!x)))
  # Compute .model label
  model <- rep.int(mv, vapply(num_rows, sum, integer(1L)))
  # Repeat key structure as needed
  .data <- .data[rep.int(seq_along(num_rows[[1]]), num_rows[[1]]), kv]
  # Combine into a single table
  .data <- bind_cols(.data, !!names_to := model, result)
  if (is_tsibble(first_result)) {
    .data <- build_tsibble(
      .data, key = c(kv, names_to, key_vars(first_result)),
      index = index_var(first_result), index2 = !!index2(first_result),
      ordered = is_ordered(first_result),
      interval = interval(first_result))
  }
  return(.data)
}
```
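A hedged usage sketch: both helpers above lean on fabletools internals (is_attached(), require_package(), mable_vars()), so this illustrates the intended calling pattern rather than an exported API; the dataset and models are illustrative:

```r
library(fable)
library(tsibble)
library(future)
plan(multisession)

fit <- tourism %>%
  model(ets = ETS(Trips), arima = ARIMA(Trips))

# Intended effect: compute forecasts for each model cell in parallel, then
# rebuild the usual structure with the key and .model columns reattached
fc <- mable_apply(fit, forecast, h = "2 years")
```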
I like it. Here's something for the progress bar (because sleep is silly). Obviously, you're paying a lot more attention to the additional arguments getting passed through, which is great.
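A minimal sketch of this kind of {progressr} hookup, assuming the standard progressor()/with_progress() pattern relayed through {future}; the toy workload and names are illustrative:

```r
library(progressr)
library(future)
library(future.apply)

plan(multisession)

keys <- 1:60  # stand-in for the series being modeled

with_progress({
  p <- progressor(along = keys)
  results <- future_mapply(
    FUN = function(k) {
      fit <- lm(y ~ x, data = data.frame(x = rnorm(100), y = rnorm(100)))
      p()  # signal one unit of progress back from the worker
      fit
    },
    keys,
    SIMPLIFY = FALSE
  )
})
```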
I was using it out on …

I definitely appreciate your refusal to play their …
As a note, the issue in mitchelloharawild/fable.prophet#17 continues. It doesn't happen with … One way to solve this would be to attach the holidays to the … It's an edge case and can be handled through a warning, but it seems like …
Long term I hope for a better holiday framework, likely powered by {almanac} and representing holidays as part of the data object. |
I assume that integrating … Perhaps, when parsing …
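On that globals point: {future.apply} documents that future.globals can also be a character vector of object names, which would let the parsed specials declare exactly what to ship to workers instead of all-or-nothing. A sketch where holidays, fit_with_holidays, and series_list are illustrative names, not fable API:

```r
library(future)
library(future.apply)

plan(multisession)

holidays <- data.frame(holiday = "new_year", ds = as.Date("2024-01-01"))
series_list <- list(rnorm(24), rnorm(24))

fit_with_holidays <- function(series) {
  # `holidays` is a free variable; workers only see it if it is exported
  length(series) + nrow(holidays)  # stand-in for a prophet-style fit
}

res <- future_lapply(
  series_list,
  fit_with_holidays,
  # export exactly this object, instead of scanning (TRUE) or nothing (FALSE)
  future.globals = "holidays"
)
```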
I don't know if you're using it, @mitchelloharawild, but …