NULL model blows up memory usage with parallelised fits #230

Closed
hongooi73 opened this issue Feb 15, 2020 · 17 comments

@hongooi73
Contributor

hongooi73 commented Feb 15, 2020

model() sometimes blows up and eats all the memory on my machine (32GB, Win 10 Pro). I'll try to make a reprex. It seems kind of random though.

@mitchelloharawild
Member

A reprex for this would be great.

@hongooi73
Contributor Author

Here's a reprex:

library(tidyr)
library(dplyr)
library(tsibble)
library(feasts)
library(fable)


data(orangeJuice, package="bayesm")

start_date <- as.Date("1970-01-01")

oj_data <- orangeJuice$yx %>%
    complete(store, brand, week) %>%
    mutate(week=yearweek(start_date + week*7)) %>%
    as_tsibble(index=week, key=c(store, brand))

subset_oj_data <- function(start, end)
{
    start <- yearweek(start_date + start*7)
    end <- yearweek(start_date + end*7)
    filter(oj_data, week >= start, week <= end)
}

ncores <- max(2, parallel::detectCores(logical=FALSE) - 2)
cl <- parallel::makeCluster(ncores, type="PSOCK")
parallel::clusterEvalQ(cl,
{
    library(feasts)
    library(fable)
    library(tsibble)
})

res_par <- parallel::parLapply(cl, list(subset_oj_data(40, 135)), function(df)
{
    model(df,
        ets=ETS(logmove ~ error("A") + trend("A") + season("N"))
    )
})

On my laptop, this uses 10GB of memory before returning.

  • The problem only occurs with a cluster. If I replace the parLapply with a regular lapply, everything is fine. The returned object is ~12MB in size.
  • The problem occurs if there are NAs in the data (which ETS can't handle). If I insert a fill(everything()) in the first pipeline, everything is also fine.

As an aside, it takes ETS quite a bit of time to realise that there are NAs... could this be made more efficient?

@mitchelloharawild
Member

By memory usage, are you referring to the object size of the returned object?
This is likely due to the environments being transferred/stored from the parallel calls, which is something that will be worked on. Parallel processing is supported natively using future::plan() (see the model() docs). It still needs work to transfer less information between nodes, but it will get better.

@hongooi73
Contributor Author

No, I mean the memory usage as shown in Task Manager.

@hongooi73
Contributor Author

Btw, I see the same problem in an Ubuntu VM. The object returned from the parLapply call is only 12MB, but the R process is taking up 10GB of memory.

hongooi73 changed the title from "Memory usage blows up sometimes" to "NULL model blows up memory usage with parallelised fits" on Feb 18, 2020

@hongooi73
Contributor Author

Is there a way to generate a null model from scratch? Would probably save a lot of time in trying to reproduce this problem.

@mitchelloharawild
Member

fabletools::null_model().

I'm surprised that it is the null models which cause this issue.

@hongooi73
Contributor Author

Ok, I just tried null_model() and that returns something completely different to what I get.

> str(null_model())
Classes 'mdl_defn', 'R6' <mdl_defn>
  Public:
    add_data: function (.data) 
    check: function (.data) 
    clone: function (deep = FALSE) 
    data: NULL
    env: environment
    extra: list
    formula: quosure, formula
    initialize: function (formula, ..., .env) 
    model: null_mdl
    prepare: function (...) 
    print: function (...) 
    recall_lag: function (x, n = 1L, ...) 
    recent_data: NULL
    remove_data: function () 
    specials: environment
    stage: NULL
    train: function (.data, ...)  

Here is an example failed model fit from ETS:

> foo$ets[[1]]
Series: logmove 
Model: NULL model 
NULL model

> z <- foo$ets[[1]]

> z
Series: logmove 
Model: NULL model 
NULL model

> object.size(z)
10064 bytes


> unclass(z)
$fit
$n
[1] 95

$vars
[1] "logmove"

attr(,"class")
[1] "null_mdl"

$model
<null_mdl model definition>

$data
# A tsibble: 95 x 2 [1W]
       week logmove
 *   <week>   <dbl>
 1 1990 W25    9.02
 2 1990 W26   NA   
 3 1990 W27   NA   
 4 1990 W28   NA   
 5 1990 W29   NA   
 6 1990 W30   NA   
 7 1990 W31    8.72
 8 1990 W32    8.25
 9 1990 W33    8.99
10 1990 W34   NA   
# … with 85 more rows

$response
$response[[1]]
logmove


$transformation
$transformation[[1]]
Transformation: .x
Backtransformation: .x

It's probably not the null model object that is the problem, but all the other bits that get returned from model.

@mitchelloharawild
Member

null_model() gives a model definition, much like ETS().

library(fabletools)
tsibble::pedestrian %>% 
  model(null_model(Count))
#> # A mable: 4 x 2
#> # Key:     Sensor [4]
#>   Sensor                        `null_model(Count)`
#>   <chr>                                     <model>
#> 1 Birrarung Marr                       <NULL model>
#> 2 Bourke Street Mall (North)           <NULL model>
#> 3 QV Market-Elizabeth St (West)        <NULL model>
#> 4 Southern Cross Station               <NULL model>

Created on 2020-02-20 by the reprex package (v0.3.0)

@mitchelloharawild
Member

My guess again is the environments held in the transformation.
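
To illustrate the kind of environment capture being guessed at here, a minimal base-R sketch follows. It is not fabletools code: the function `make_formula` and the variable `big` are hypothetical, made up for the example. A formula created inside a function keeps a reference to that function's environment. `object.size()` does not follow that reference, but `serialize()`, which PSOCK clusters use to pass objects between processes, does.

```r
# Hypothetical helper: builds a formula inside a function that also holds a large object.
make_formula <- function() {
  big <- numeric(1e6)  # about 8 MB of doubles living in this function's environment
  y ~ x                # the formula keeps a reference to that environment
}

f <- make_formula()

# object.size() ignores the attached environment, so the formula looks tiny.
reported <- as.numeric(object.size(f))

# serialize() writes the environment (including `big`) along with the formula.
serialized <- length(serialize(f, NULL))

print(c(reported = reported, serialized = serialized))
```

If something similar happens inside a fitted model, a small `object.size()` result would not rule out a large transfer between worker and master.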

@hongooi73
Contributor Author

Yup, I just tried the following:

mod_list <- parallel::parLapply(cl, oj_train, function(df) model(df, null=null_model(logmove)))
mod_list
# A mable: 913 x 3
# Key:     store, brand [913]
   store brand null        
   <int> <int> <model>     
 1     2     1 <NULL model>
 2     2     2 <NULL model>
 3     2     3 <NULL model>
 4     2     4 <NULL model>
 5     2     5 <NULL model>
 6     2     6 <NULL model>
 7     2     7 <NULL model>
 8     2     8 <NULL model>
 9     2     9 <NULL model>
10     2    10 <NULL model>
# … with 903 more rows
str(mod_list[[1]]$null[[1]])
List of 1
 $ fit           :List of 2
  ..$ n   : int 95
  ..$ vars: chr "logmove"
  ..- attr(*, "class")= chr "null_mdl"
 - attr(*, "class")= chr "mdl_ts"

That's very truncated compared to the "real" null model above.

@hongooi73
Contributor Author

hongooi73 commented Feb 20, 2020

No, cancel that, it appears to be the same structure. So there must be an environment that ETS is capturing that null_model doesn't.

unclass(mod_list[[1]]$null[[1]])
$fit
$n
[1] 95

$vars
[1] "logmove"

attr(,"class")
[1] "null_mdl"

$model
<null_mdl model definition>

$data
# A tsibble: 95 x 2 [1W]
       week logmove
 *   <week>   <dbl>
 1 1990 W25    9.02
 2 1990 W26   NA   
 3 1990 W27   NA   
 4 1990 W28   NA   
 5 1990 W29   NA   
 6 1990 W30   NA   
 7 1990 W31    8.72
 8 1990 W32    8.25
 9 1990 W33    8.99
10 1990 W34   NA   
# … with 85 more rows

$response
$response[[1]]
logmove


$transformation
$transformation[[1]]
Transformation: .x
Backtransformation: .x

@hongooi73
Contributor Author

hongooi73 commented Feb 20, 2020

Here's an even simpler reprex.

library(fable)
library(feasts)
library(tsibble)

set.seed(12345)
df <- expand.grid(x=1:1000, t=1:100)
df$y <- runif(nrow(df))
miss <- rbinom(nrow(df), 1, 0.25)
df$y[as.logical(miss)] <- NA
df$t <- as.Date("1970-01-01") + df$t

df <- as_tsibble(df, key=x, index=t)

cl <- parallel::makeCluster(4)
parallel::clusterEvalQ(cl,
{
    library(fable)
    library(feasts)
    library(tsibble)
})

bad <- parallel::parLapply(cl, list(df, df, df, df), function(df)
{
    model(df, ets=ETS(y ~ error("A") + trend("A") + season("N")))
})

This chews up memory on both my Windows laptop and an Ubuntu VM in Azure. Interestingly, if I replace the ETS model with a null_model, then there is no problem.

good <- parallel::parLapply(cl, list(df, df, df, df), function(df)
{
    model(df, null=null_model(y))
})

@mitchelloharawild
Member

Looks like this is also an issue without parallel (to a lesser extent).


> bench::mark(
+   model(df, null=ETS(y ~ error("A") + trend("A") + season("N"))),
+   model(df, null=null_model(y)),
+   check = FALSE
+ )
# A tibble: 2 x 13
  expression                                                            min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory         time   gc         
  <bch:expr>                                                       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>         <list> <list>     
1 model(df, null = ETS(y ~ error("A") + trend("A") + season("N")))   17.61s   17.61s    0.0568      51MB     3.86     1    68     17.61s <NULL> <df[,3] [149,… <bch:… <tibble [1…
2 model(df, null = null_model(y))                                     1.83s    1.83s    0.547     14.5MB     3.83     1     7      1.83s <NULL> <df[,3] [26,1… <bch:… <tibble [1…

@mitchelloharawild
Member

Added some performance improvements to the model parser.
There is still more that can be done here; I expect that the model parser can be made 2x faster without too much trouble.


> bench::mark(
+   model(df, null=ETS(y ~ error("A") + trend("A") + season("N"))),
+   model(df, null=null_model(y)),
+   check = FALSE
+ )
# A tibble: 2 x 13
  expression                                                            min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory         time   gc         
  <bch:expr>                                                       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>         <list> <list>     
1 model(df, null = ETS(y ~ error("A") + trend("A") + season("N")))   10.63s   10.63s    0.0940    40.1MB     4.14     1    44     10.63s <NULL> <df[,3] [100,… <bch:… <tibble [1…
2 model(df, null = null_model(y))                                     1.51s    1.51s    0.661     14.5MB     5.29     1     8      1.51s <NULL> <df[,3] [34,2… <bch:… <tibble [1…

@hongooi73
Contributor Author

hongooi73 commented Feb 20, 2020

Well, I would (maybe) expect ETS to take more time and resources, since it's actually trying to fit a model. null_model can return immediately since it doesn't have to do anything. This seems to be separate from the parallel issue, where returning a buggy object back to the master triggers a massive memory allocation. Note that if I replace the parLapply with a regular lapply, then there's no problem.

ETS could probably check for NAs right at the start, and return immediately if found. This should reduce the resource requirements to the minimum.
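
As a user-side stopgap in the meantime, series with missing values can be identified before any model is fitted. A minimal base-R sketch on a plain data frame shaped like the simpler reprex (columns `x`, `t`, `y`); the helper name `keys_with_na` is hypothetical, and this is not something fable does internally:

```r
# Hypothetical helper: returns the key values whose series contain at least one NA.
keys_with_na <- function(data, key, value) {
  has_na <- tapply(is.na(data[[value]]), data[[key]], any)
  names(has_na)[has_na]
}

# Small stand-in for the reprex data: 3 series of 5 observations each.
toy <- expand.grid(x = 1:3, t = 1:5)
toy$y <- seq_len(nrow(toy)) / 10
toy$y[toy$x == 2 & toy$t == 4] <- NA  # series 2 gets one missing value

bad_keys <- keys_with_na(toy, key = "x", value = "y")
print(bad_keys)

# Keep only the complete series for models that cannot handle NAs.
clean <- toy[!(toy$x %in% bad_keys), ]
```

The filtered data can then be converted with as_tsibble() and passed to model() as before, so that ETS never sees a series it is bound to reject.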

I'm trying to find a similar reprex for ARIMA, but it seems to be behaving well so far.

@mitchelloharawild
Member

Closing as this issue is largely driven by tidyverts/fabletools#146.

Parallel processing performance will be optimised in the next two months.
