
Memory Leak? #19

Closed
infus0815 opened this issue Dec 3, 2018 · 7 comments
Comments

@infus0815

I've been using C5.0 trees to solve a pairwise ranking problem.

That requires creating several C5.0 models for each problem.
The issue is that the memory allocated during a C5.0 call is never released to the system. If I make 100 or more C5.0 calls, the R session eventually uses all available memory. Even after the script finishes, the memory is still not released, forcing a session restart.

Tested on both Windows and Linux with the same results.

I wonder if there's a quick fix, as C5.0 trees are what give me the best prediction results.

@SamGG

SamGG commented Dec 3, 2018

Hi,
Dumb question: did you run the garbage collector with gc() after rm(mytree)?
Best.

@infus0815
Author

infus0815 commented Dec 3, 2018

Yep, gc() doesn't free it either. I tested everything I could think of, including clearing the environment and hidden variables.

Only restarting the R session frees the allocated memory.

@SamGG

SamGG commented Dec 3, 2018

Thanks for clarifying this. Let's wait for the developer's feedback.

@topepo
Owner

topepo commented Dec 3, 2018

Can you give some code to test with and the results of sessionInfo()?

@infus0815
Author

infus0815 commented Dec 4, 2018

While turning my own code into a script to show you, I noticed that the probable cause is a factor column with a large number of levels. I managed to reproduce the memory problem with this simple script using the churn dataset:

library(C50)

data(churn)

churnTrain[, 2] <- factor(churnTrain[, 2])

lapply(1:300, function(x) {
  treeModel <- C5.0(x = churnTrain[, 1:3], y = churnTrain$churn)
  remove(treeModel)
})

# OR
# for(i in 1:300) {
#   treeModel <- C5.0(x = churnTrain[, 1:3], y = churnTrain$churn)
#   remove(treeModel)
#   gc()
# }

I know it doesn't make sense to convert that column to a factor, but it's only there to reproduce the problem.

If you run that script more than once, you can also see that the memory allocated in the first run is not reused. In my problem I have datasets whose columns have even more factor levels, which pushes RAM usage to almost full (8 GB) in a couple of seconds.

Session Info:

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] LC_CTYPE=pt_PT.UTF-8 LC_NUMERIC=C LC_TIME=pt_PT.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=pt_PT.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=pt_PT.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] C50_0.1.2

loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 lattice_0.20-38 mvtnorm_1.0-8 grid_3.5.1 plyr_1.8.4
[6] magrittr_1.5 stringi_1.2.4 reshape2_1.4.3 rpart_4.1-13 Matrix_1.2-15
[11] partykit_1.2-2 splines_3.5.1 Formula_1.2-3 tools_3.5.1 stringr_1.3.1
[16] Cubist_0.2.2 survival_2.43-1 compiler_3.5.1 libcoin_1.0-1 inum_1.0-0
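
To check whether the growth described above shows up at the process level, the loop can be instrumented with the resident set size of the R process. This is a minimal sketch for Linux only, since it reads /proc/self/status; `rss_kb` and `track_rss` are hypothetical helper names and are not part of C50 or of this thread.

```r
# Resident set size of the current R process in kB (Linux only; reads procfs).
rss_kb <- function() {
  status <- readLines("/proc/self/status")
  line <- grep("^VmRSS:", status, value = TRUE)
  as.numeric(gsub("[^0-9]", "", line))
}

# Call `fit()` n times, discard each result, and record RSS after each call.
# `fit` is a hypothetical zero-argument function wrapping the model call.
track_rss <- function(fit, n = 10) {
  rss <- numeric(n + 1)
  rss[1] <- rss_kb()
  for (i in seq_len(n)) {
    invisible(fit())   # result is dropped immediately
    invisible(gc())    # let R reclaim whatever it manages itself
    rss[i + 1] <- rss_kb()
  }
  rss
}

# Usage with the reproduction script (requires the C50 package):
# rss <- track_rss(function() C5.0(x = churnTrain[, 1:3], y = churnTrain$churn), n = 300)
# diff(range(rss))  # spread of RSS over the run, in kB
```

If RSS keeps climbing while gc() reports a stable heap, that would point to memory allocated outside R's garbage collector, though this sketch alone cannot say where.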

@SamGG

SamGG commented Dec 4, 2018

I used Hadley's chapter on memory at http://adv-r.had.co.nz/memory.html. I think it's worth reading to understand this case. Interestingly, the first call takes memory that is not released by the remove call. Subsequent calls require only a small amount of memory, and the leak is very small but real.
Hope this helps.

library(C50)

data(churn)
head(churnTrain)
#>   state account_length     area_code international_plan voice_mail_plan
#> 1    KS            128 area_code_415                 no             yes
#> 2    OH            107 area_code_415                 no             yes
#> 3    NJ            137 area_code_415                 no              no
#> 4    OH             84 area_code_408                yes              no
#> 5    OK             75 area_code_415                yes              no
#> 6    AL            118 area_code_510                yes              no
#>   number_vmail_messages total_day_minutes total_day_calls total_day_charge
#> 1                    25             265.1             110            45.07
#> 2                    26             161.6             123            27.47
#> 3                     0             243.4             114            41.38
#> 4                     0             299.4              71            50.90
#> 5                     0             166.7             113            28.34
#> 6                     0             223.4              98            37.98
#>   total_eve_minutes total_eve_calls total_eve_charge total_night_minutes
#> 1             197.4              99            16.78               244.7
#> 2             195.5             103            16.62               254.4
#> 3             121.2             110            10.30               162.6
#> 4              61.9              88             5.26               196.9
#> 5             148.3             122            12.61               186.9
#> 6             220.6             101            18.75               203.9
#>   total_night_calls total_night_charge total_intl_minutes total_intl_calls
#> 1                91              11.01               10.0                3
#> 2               103              11.45               13.7                3
#> 3               104               7.32               12.2                5
#> 4                89               8.86                6.6                7
#> 5               121               8.41               10.1                3
#> 6               118               9.18                6.3                6
#>   total_intl_charge number_customer_service_calls churn
#> 1              2.70                             1    no
#> 2              3.70                             1    no
#> 3              3.29                             0    no
#> 4              1.78                             2    no
#> 5              2.73                             3    no
#> 6              1.70                             0    no
dim(churnTrain)
#> [1] 3333   20
summary(churnTrain)
#>      state      account_length          area_code    international_plan
#>  WV     : 106   Min.   :  1.0   area_code_408: 838   no :3010          
#>  MN     :  84   1st Qu.: 74.0   area_code_415:1655   yes: 323          
#>  NY     :  83   Median :101.0   area_code_510: 840                     
#>  AL     :  80   Mean   :101.1                                          
#>  OH     :  78   3rd Qu.:127.0                                          
#>  OR     :  78   Max.   :243.0                                          
#>  (Other):2824                                                          
#>  voice_mail_plan number_vmail_messages total_day_minutes total_day_calls
#>  no :2411        Min.   : 0.000        Min.   :  0.0     Min.   :  0.0  
#>  yes: 922        1st Qu.: 0.000        1st Qu.:143.7     1st Qu.: 87.0  
#>                  Median : 0.000        Median :179.4     Median :101.0  
#>                  Mean   : 8.099        Mean   :179.8     Mean   :100.4  
#>                  3rd Qu.:20.000        3rd Qu.:216.4     3rd Qu.:114.0  
#>                  Max.   :51.000        Max.   :350.8     Max.   :165.0  
#>                                                                         
#>  total_day_charge total_eve_minutes total_eve_calls total_eve_charge
#>  Min.   : 0.00    Min.   :  0.0     Min.   :  0.0   Min.   : 0.00   
#>  1st Qu.:24.43    1st Qu.:166.6     1st Qu.: 87.0   1st Qu.:14.16   
#>  Median :30.50    Median :201.4     Median :100.0   Median :17.12   
#>  Mean   :30.56    Mean   :201.0     Mean   :100.1   Mean   :17.08   
#>  3rd Qu.:36.79    3rd Qu.:235.3     3rd Qu.:114.0   3rd Qu.:20.00   
#>  Max.   :59.64    Max.   :363.7     Max.   :170.0   Max.   :30.91   
#>                                                                     
#>  total_night_minutes total_night_calls total_night_charge
#>  Min.   : 23.2       Min.   : 33.0     Min.   : 1.040    
#>  1st Qu.:167.0       1st Qu.: 87.0     1st Qu.: 7.520    
#>  Median :201.2       Median :100.0     Median : 9.050    
#>  Mean   :200.9       Mean   :100.1     Mean   : 9.039    
#>  3rd Qu.:235.3       3rd Qu.:113.0     3rd Qu.:10.590    
#>  Max.   :395.0       Max.   :175.0     Max.   :17.770    
#>                                                          
#>  total_intl_minutes total_intl_calls total_intl_charge
#>  Min.   : 0.00      Min.   : 0.000   Min.   :0.000    
#>  1st Qu.: 8.50      1st Qu.: 3.000   1st Qu.:2.300    
#>  Median :10.30      Median : 4.000   Median :2.780    
#>  Mean   :10.24      Mean   : 4.479   Mean   :2.765    
#>  3rd Qu.:12.10      3rd Qu.: 6.000   3rd Qu.:3.270    
#>  Max.   :20.00      Max.   :20.000   Max.   :5.400    
#>                                                       
#>  number_customer_service_calls churn     
#>  Min.   :0.000                 yes: 483  
#>  1st Qu.:1.000                 no :2850  
#>  Median :1.000                           
#>  Mean   :1.563                           
#>  3rd Qu.:2.000                           
#>  Max.   :9.000                           
#> 

hist(churnTrain[, 2])
rug(churnTrain[, 2])

head(sort(churnTrain[, 2]), 50)
#>  [1]  1  1  1  1  1  1  1  1  2  3  3  3  3  3  4  5  6  6  7  7  8  9  9
#> [24]  9 10 10 10 11 11 11 11 12 12 12 13 13 13 13 13 13 13 13 13 15 15 15
#> [47] 16 16 16 16

library(pryr)

object_size(churnTrain)
#> 382 kB
churnTrain[, 2] <- factor(churnTrain[, 2])
object_size(churnTrain)
#> 395 kB

for(i in 1:30) {
  cat(i, "\n", mem_used(), "\n", sep = "")
  cat(mem_change(treeModel <- C5.0(x = churnTrain[, 1:3], y = churnTrain$churn)), "\n")
  cat(mem_change(remove(treeModel)), "\n")
  gc()
  cat(mem_used(), "\n")
}
#> 1
#> 111131416
#> 243424 
#> -3624 
#> 111369584 
#> 2
#> 111370080
#> 5152 
#> -3680 
#> 111369952 
#> 3
#> 111370416
#> 5152 
#> -3680 
#> 111370096 
#> 4
#> 111370560
#> 5152 
#> -3680 
#> 111370264 
#> 5
#> 111370736
#> 5152 
#> -3680 
#> 111370448 
#> 6
#> 111370920
#> 5152 
#> -3680 
#> 111370632 
#> 7
#> 111371104
#> 5152 
#> -3680 
#> 111370816 
#> 8
#> 111371288
#> 5152 
#> -3680 
#> 111371000 
#> 9
#> 111371472
#> 5152 
#> -3680 
#> 111371184 
#> 10
#> 111371656
#> 5152 
#> -3680 
#> 111371368 
#> 11
#> 111371840
#> 5152 
#> -3680 
#> 111371552 
#> 12
#> 111372024
#> 5152 
#> -3680 
#> 111371736 
#> 13
#> 111372208
#> 5152 
#> -3680 
#> 111371920 
#> 14
#> 111372392
#> 5152 
#> -3680 
#> 111372104 
#> 15
#> 111372576
#> 5152 
#> -3680 
#> 111372288 
#> 16
#> 111372760
#> 5152 
#> -3680 
#> 111372472 
#> 17
#> 111372944
#> 5152 
#> -3680 
#> 111372656 
#> 18
#> 111373128
#> 5152 
#> -3680 
#> 111372840 
#> 19
#> 111373312
#> 5152 
#> -3680 
#> 111373024 
#> 20
#> 111373496
#> 5152 
#> -3680 
#> 111373208 
#> 21
#> 111373680
#> 5152 
#> -3680 
#> 111373392 
#> 22
#> 111373864
#> 5152 
#> -3680 
#> 111373576 
#> 23
#> 111374048
#> 5152 
#> -3680 
#> 111373760 
#> 24
#> 111374232
#> 5152 
#> -3680 
#> 111373944 
#> 25
#> 111374416
#> 5152 
#> -3680 
#> 111374128 
#> 26
#> 111374600
#> 5152 
#> -3680 
#> 111374312 
#> 27
#> 111374784
#> 5152 
#> -3680 
#> 111374496 
#> 28
#> 111374968
#> 5152 
#> -3680 
#> 111374680 
#> 29
#> 111375152
#> 5152 
#> -3680 
#> 111374864 
#> 30
#> 111375336
#> 5152 
#> -3680 
#> 111375048

sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 7 x64 (build 7601) Service Pack 1
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252 
#> [2] LC_CTYPE=English_United Kingdom.1252   
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] pryr_0.1.4 C50_0.1.2 
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.0       Formula_1.2-3    knitr_1.20       magrittr_1.5    
#>  [5] splines_3.5.1    lattice_0.20-38  stringr_1.3.1    plyr_1.8.4      
#>  [9] tools_3.5.1      grid_3.5.1       htmltools_0.3.6  yaml_2.2.0      
#> [13] survival_2.43-1  rprojroot_1.3-2  digest_0.6.18    inum_1.0-0      
#> [17] libcoin_1.0-1    Matrix_1.2-15    reshape2_1.4.3   codetools_0.2-15
#> [21] rpart_4.1-13     Cubist_0.2.2     evaluate_0.12    rmarkdown_1.10  
#> [25] stringi_1.2.4    compiler_3.5.1   backports_1.1.2  partykit_1.2-2  
#> [29] mvtnorm_1.0-8

Created on 2018-12-04 by the reprex package (v0.2.1)
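
The per-iteration growth in the log above can be worked out from the printed mem_used() values. The snippet below only redoes that arithmetic on numbers copied from the log; it does not rerun C5.0.

```r
# mem_used() readings (bytes) printed at the start of iterations 5 to 10
# in the log above.
start_bytes <- c(111370736, 111370920, 111371104,
                 111371288, 111371472, 111371656)

# Growth of the R heap from one iteration to the next.
per_iter <- diff(start_bytes)
per_iter
#> [1] 184 184 184 184 184

# The same rate holds through iteration 30 (reading: 111375336).
(111375336 - 111370736) / (30 - 5)
#> [1] 184

# Extrapolated to the 300 fits in the original script, in bytes.
300 * 184
#> [1] 55200
```

So the growth visible to mem_used() is about 184 bytes per fit, roughly 55 kB over 300 fits, which is far from the gigabytes reported earlier in the thread. Since mem_used() is computed from gc() and therefore covers only memory managed by R, one possible reading (an assumption, not something established here) is that the bulk of the growth happens in memory allocated by compiled code, which this measurement would not show.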

@topepo
Owner

topepo commented May 7, 2021

I have not been able to track this down. Please add a PR if you can find the issue.
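
Until the cause is found, one possible workaround is to run each fit in a short-lived worker process, so that whatever the fit allocates is returned to the operating system when the worker exits. This is not something proposed in this thread, and it is only a sketch: it uses the base `parallel` package, `fit_in_worker` is a hypothetical helper name, and it assumes the fitted C5.0 object survives being serialized back to the main session.

```r
library(parallel)

# Run `fun(...)` in a fresh R worker process and return its result.
# The worker is shut down when this function exits, releasing its memory.
fit_in_worker <- function(fun, ...) {
  cl <- makeCluster(1)                 # start one PSOCK worker
  on.exit(stopCluster(cl), add = TRUE) # always shut the worker down
  clusterCall(cl, fun, ...)[[1]]       # one worker, so one result
}

# Usage with the reproduction script (requires the C50 package):
# treeModel <- fit_in_worker(
#   function(x, y) C50::C5.0(x = x, y = y),
#   churnTrain[, 1:3], churnTrain$churn
# )
```

Starting a worker per fit adds noticeable overhead, so for hundreds of fits it may be more practical to reuse one worker for a batch of fits and restart it between batches.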

topepo closed this as completed May 7, 2021