NULL model blows up memory usage with parallelised fits #230

hongooi73 · 2020-02-15T12:09:57Z

model() sometimes blows up and eats all the memory on my machine (32GB, Win 10 Pro). I'll try to make a reprex. It seems kind of random though.


mitchelloharawild · 2020-02-15T12:21:24Z

A reprex for this would be great.

hongooi73 · 2020-02-16T08:04:29Z

Here's a reprex:

library(tidyr)
library(dplyr)
library(tsibble)
library(feasts)
library(fable)


data(orangeJuice, package="bayesm")

start_date <- as.Date("1970-01-01")

oj_data <- orangeJuice$yx %>%
    complete(store, brand, week) %>%
    mutate(week=yearweek(start_date + week*7)) %>%
    as_tsibble(index=week, key=c(store, brand))

subset_oj_data <- function(start, end)
{
    start <- yearweek(start_date + start*7)
    end <- yearweek(start_date + end*7)
    filter(oj_data, week >= start, week <= end)
}

ncores <- max(2, parallel::detectCores(logical=FALSE) - 2)
cl <- parallel::makeCluster(ncores, type="PSOCK")
parallel::clusterEvalQ(cl,
{
    library(feasts)
    library(fable)
    library(tsibble)
})

res_par <- parallel::parLapply(cl, list(subset_oj_data(40, 135)), function(df)
{
    model(df,
        ets=ETS(logmove ~ error("A") + trend("A") + season("N"))
    )
})

On my laptop, this uses 10GB of memory before returning.

The problem only occurs with a cluster. If I replace the parLapply with a regular lapply, everything is fine. The returned object is ~12MB in size.
The problem occurs if there are NAs in the data (which ETS can't handle). If I insert a fill(everything()) in the first pipeline, everything is also fine.

As an aside, it takes ETS quite a bit of time to realise that there are NAs... could this be made more efficient?
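One way to spot the affected series before any fitting is to count missing values per key up front. This is only a minimal base-R sketch, assuming a plain data frame with one key column and one response column; `na_per_series` is a hypothetical helper, not something provided by fable or tsibble.

```r
# Hypothetical helper: number of missing responses in each series.
na_per_series <- function(data, key, response) {
  tapply(data[[response]], data[[key]], function(v) sum(is.na(v)))
}

# Toy data: 3 series of 5 observations each, with NAs placed in series 2 only.
toy <- data.frame(x = rep(1:3, each = 5), y = as.numeric(1:15))
toy$y[c(7, 9)] <- NA

na_counts <- na_per_series(toy, "x", "y")

# Keys of the series that contain at least one missing value.
incomplete <- names(na_counts)[na_counts > 0]
incomplete
```

Series flagged this way could be filled or dropped before the data is handed to the workers.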

mitchelloharawild · 2020-02-16T08:40:38Z

By memory usage, are you referring to the object size of the returned object?
This is likely due to the environments being transferred/stored from the parallel calls, which is something that will be worked on. Parallel processing is supported natively (see the model() docs) using future::plan(). It still needs work to transfer less information between nodes, but it will get better.
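To illustrate the general mechanism (not the actual fabletools internals), here is a minimal base-R sketch of how a captured environment can make an object far more expensive to ship between processes than object.size() suggests. The names `serialized_size`, `make_heavy`, `f_heavy` and `f_light` are made up for this example.

```r
# Approximate number of bytes that would be sent between processes for `x`.
serialized_size <- function(x) length(serialize(x, NULL))

# A closure created inside a function keeps that function's environment alive,
# including the large vector `big`, even though the closure never uses it.
make_heavy <- function() {
  big <- runif(1e6)  # roughly 8 MB of doubles
  function(x) x + 1
}
f_heavy <- make_heavy()

# The same function with its environment reset to the global environment,
# which is serialized as a reference rather than by value.
f_light <- f_heavy
environment(f_light) <- globalenv()

object.size(f_heavy)      # small: the enclosing environment is not counted
serialized_size(f_heavy)  # large: the captured vector is serialized too
serialized_size(f_light)  # small
```

Comparing object.size() with the serialized length is a rough way to check whether a returned object is dragging an environment along with it.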

hongooi73 · 2020-02-16T08:41:24Z

No, I mean the memory usage as shown in Task Manager

hongooi73 · 2020-02-16T08:46:55Z

Btw, I see the same problem in an Ubuntu VM. The object returned from the parLapply call is only 12MB, but the R process is taking up 10GB of memory.

hongooi73 · 2020-02-20T07:43:51Z

Is there a way to generate a null model from scratch? Would probably save a lot of time in trying to reproduce this problem.

mitchelloharawild · 2020-02-20T08:40:05Z

fabletools::null_model().

I'm surprised that it is the null models which cause this issue.

hongooi73 · 2020-02-20T08:49:30Z

Ok, I just tried null_model() and that returns something completely different to what I get.

> str(null_model())
Classes 'mdl_defn', 'R6' <mdl_defn>
  Public:
    add_data: function (.data) 
    check: function (.data) 
    clone: function (deep = FALSE) 
    data: NULL
    env: environment
    extra: list
    formula: quosure, formula
    initialize: function (formula, ..., .env) 
    model: null_mdl
    prepare: function (...) 
    print: function (...) 
    recall_lag: function (x, n = 1L, ...) 
    recent_data: NULL
    remove_data: function () 
    specials: environment
    stage: NULL
    train: function (.data, ...)

Here is an example failed model fit from ETS:

> foo$ets[[1]]
Series: logmove 
Model: NULL model 
NULL model

> z <- foo$ets[[1]]

> z
Series: logmove 
Model: NULL model 
NULL model

> object.size(z)
10064 bytes


> unclass(z)
$fit
$n
[1] 95

$vars
[1] "logmove"

attr(,"class")
[1] "null_mdl"

$model
<null_mdl model definition>

$data
# A tsibble: 95 x 2 [1W]
       week logmove
 *   <week>   <dbl>
 1 1990 W25    9.02
 2 1990 W26   NA   
 3 1990 W27   NA   
 4 1990 W28   NA   
 5 1990 W29   NA   
 6 1990 W30   NA   
 7 1990 W31    8.72
 8 1990 W32    8.25
 9 1990 W33    8.99
10 1990 W34   NA   
# … with 85 more rows

$response
$response[[1]]
logmove


$transformation
$transformation[[1]]
Transformation: .x
Backtransformation: .x

It's probably not the null model object that is the problem, but all the other bits that get returned from model.

mitchelloharawild · 2020-02-20T08:53:00Z

null_model() gives a model definition, much like ETS().

library(fabletools)
tsibble::pedestrian %>% 
  model(null_model(Count))
#> # A mable: 4 x 2
#> # Key:     Sensor [4]
#>   Sensor                        `null_model(Count)`
#>   <chr>                                     <model>
#> 1 Birrarung Marr                       <NULL model>
#> 2 Bourke Street Mall (North)           <NULL model>
#> 3 QV Market-Elizabeth St (West)        <NULL model>
#> 4 Southern Cross Station               <NULL model>

Created on 2020-02-20 by the reprex package (v0.3.0)

mitchelloharawild · 2020-02-20T08:53:50Z

My guess again is the environments held in the transformation.

hongooi73 · 2020-02-20T08:59:35Z

Yup, I just tried the following:

mod_list <- parallel::parLapply(cl, oj_train, function(df) model(df, null=null_model(logmove)))
mod_list

# A mable: 913 x 3
# Key:     store, brand [913]
   store brand null        
   <int> <int> <model>     
 1     2     1 <NULL model>
 2     2     2 <NULL model>
 3     2     3 <NULL model>
 4     2     4 <NULL model>
 5     2     5 <NULL model>
 6     2     6 <NULL model>
 7     2     7 <NULL model>
 8     2     8 <NULL model>
 9     2     9 <NULL model>
10     2    10 <NULL model>
# … with 903 more rows

str(mod_list[[1]]$null[[1]])

List of 1
 $ fit           :List of 2
  ..$ n   : int 95
  ..$ vars: chr "logmove"
  ..- attr(*, "class")= chr "null_mdl"
 - attr(*, "class")= chr "mdl_ts"

That's very truncated compared to the "real" null model above.

hongooi73 · 2020-02-20T09:01:41Z

No, cancel that, it appears to be the same structure. So there must be an environment that ETS is capturing that null_model doesn't.

unclass(mod_list[[1]]$null[[1]])

$fit
$n
[1] 95

$vars
[1] "logmove"

attr(,"class")
[1] "null_mdl"

$model
<null_mdl model definition>

$data
# A tsibble: 95 x 2 [1W]
       week logmove
 *   <week>   <dbl>
 1 1990 W25    9.02
 2 1990 W26   NA   
 3 1990 W27   NA   
 4 1990 W28   NA   
 5 1990 W29   NA   
 6 1990 W30   NA   
 7 1990 W31    8.72
 8 1990 W32    8.25
 9 1990 W33    8.99
10 1990 W34   NA   
# … with 85 more rows

$response
$response[[1]]
logmove


$transformation
$transformation[[1]]
Transformation: .x
Backtransformation: .x

hongooi73 · 2020-02-20T11:16:20Z

Here's an even simpler reprex.

library(fable)
library(feasts)
library(tsibble)

set.seed(12345)
df <- expand.grid(x=1:1000, t=1:100)
df$y <- runif(nrow(df))
miss <- rbinom(nrow(df), 1, 0.25)
df$y[as.logical(miss)] <- NA
df$t <- as.Date("1970-01-01") + df$t

df <- as_tsibble(df, key=x, index=t)

cl <- parallel::makeCluster(4)
parallel::clusterEvalQ(cl,
{
    library(fable)
    library(feasts)
    library(tsibble)
})

bad <- parallel::parLapply(cl, list(df, df, df, df), function(df)
{
    model(df, ets=ETS(y ~ error("A") + trend("A") + season("N")))
})

This chews up memory on both my Windows laptop and an Ubuntu VM in Azure. Interestingly, if I replace the ETS model with a null_model, then there is no problem.

good <- parallel::parLapply(cl, list(df, df, df, df), function(df)
{
    model(df, null=null_model(y))
})

mitchelloharawild · 2020-02-20T13:05:38Z

Looks like this is also an issue without parallel (to a lesser extent).


> bench::mark(
+   model(df, null=ETS(y ~ error("A") + trend("A") + season("N"))),
+   model(df, null=null_model(y)),
+   check = FALSE
+ )
# A tibble: 2 x 13
  expression                                                            min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory         time   gc         
  <bch:expr>                                                       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>         <list> <list>     
1 model(df, null = ETS(y ~ error("A") + trend("A") + season("N")))   17.61s   17.61s    0.0568      51MB     3.86     1    68     17.61s <NULL> <df[,3] [149,… <bch:… <tibble [1…
2 model(df, null = null_model(y))                                     1.83s    1.83s    0.547     14.5MB     3.83     1     7      1.83s <NULL> <df[,3] [26,1… <bch:… <tibble [1…

mitchelloharawild · 2020-02-20T14:03:29Z

Added some performance improvements to the model parser.
There is still more that can be done here; I expect that the model parser can be made 2x faster without too much trouble.


> bench::mark(
+   model(df, null=ETS(y ~ error("A") + trend("A") + season("N"))),
+   model(df, null=null_model(y)),
+   check = FALSE
+ )
# A tibble: 2 x 13
  expression                                                            min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory         time   gc         
  <bch:expr>                                                       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>         <list> <list>     
1 model(df, null = ETS(y ~ error("A") + trend("A") + season("N")))   10.63s   10.63s    0.0940    40.1MB     4.14     1    44     10.63s <NULL> <df[,3] [100,… <bch:… <tibble [1…
2 model(df, null = null_model(y))                                     1.51s    1.51s    0.661     14.5MB     5.29     1     8      1.51s <NULL> <df[,3] [34,2… <bch:… <tibble [1…

hongooi73 · 2020-02-20T18:49:50Z

Well, I would (maybe) expect ETS to take more time and resources, since it's actually trying to fit a model. null_model can return immediately since it doesn't have to do anything. This seems to be separate from the parallel issue, where returning a buggy object back to the master triggers a massive memory allocation. Note that if I replace the parLapply with a regular lapply, then there's no problem.

ETS could probably check for NAs right at the start, and return immediately if found. This should reduce the resource requirements to the minimum.
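Roughly what I have in mind, as a base-R sketch only. `fit_if_complete` is a hypothetical wrapper, and `trend_fit` is a stand-in fitter (a linear trend via lm) used purely for illustration; neither is how ETS is actually implemented.

```r
# Hypothetical wrapper: skip the expensive fit when the response has NAs.
fit_if_complete <- function(y, fit_fun) {
  if (anyNA(y)) {
    return(NULL)  # stand-in for returning a null model
  }
  fit_fun(y)
}

# Stand-in fitter (linear trend), assumed here only to have something to call.
trend_fit <- function(y) {
  t <- seq_along(y)
  lm(y ~ t)
}

complete_series <- c(1.2, 1.9, 3.1, 4.0, 5.2)
gappy_series <- c(1.2, NA, 3.1, NA, 5.2)

ok <- fit_if_complete(complete_series, trend_fit)    # fitted as usual
skipped <- fit_if_complete(gappy_series, trend_fit)  # returns before fitting
```

The anyNA() check is a single pass over the response, so it should cost very little compared with parsing and estimating the model.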

I'm trying to find a similar reprex for ARIMA, but it seems to be behaving well so far.

mitchelloharawild · 2020-03-27T01:31:34Z

Closing as this issue is largely driven by tidyverts/fabletools#146

Parallel processing performance will be optimised in the next two months.

hongooi73 changed the title ~~Memory usage blows up sometimes~~ NULL model blows up memory usage with parallelised fits Feb 18, 2020

hongooi73 mentioned this issue Feb 20, 2020

Suggestion: throw errors rather than returning null models #235

Closed

mitchelloharawild closed this as completed Mar 27, 2020

seabbs mentioned this issue May 14, 2020

Memory leak epiforecasts/EpiNow#91

Closed
