ff() fails when the product of dim is too large to cast to an integer #3

Open
khughitt opened this issue Sep 1, 2020 · 5 comments

Comments

@khughitt

khughitt commented Sep 1, 2020

Greetings!

I'm attempting to use ff() via the bigcor() function in https://github.com/anspiess/propagate.

When the input matrix size exceeds a certain limit, however, ff() fails with an error:

Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 0 and .Machine$integer.max") : 
  missing value where TRUE/FALSE needed
Calls: ff
In addition: Warning message:
In ff(vmode = "double", dim = c(num_cols, num_cols)) :
  NAs introduced by coercion to integer range
Execution halted

I tracked down the issue to ff.R:2465:

n <- as.integer(prod(dim))

When the dimensions are large enough (in my case, about 4.65e4 per side or larger), their product exceeds .Machine$integer.max, so as.integer() returns NA:

r$> prod(c(4.65e4, 4.65e4))                                                                                                             
[1] 2162250000

r$> as.integer(prod(c(4.65e4, 4.65e4)))                                                                                                 
[1] NA
Warning message:
NAs introduced by coercion to integer range 

r$> .Machine$integer.max                                                                                                                
[1] 2147483647

Do you know if there is any way around this?

Otherwise, perhaps it would be worth performing a check against this early on and letting the user know that ff() cannot proceed?
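
For illustration, here is a minimal sketch of what such an early check could look like. The helper name `check_ff_dim` is hypothetical and not part of ff; it only assumes that the limit in question is `.Machine$integer.max`, as the error message above indicates.

```r
# Hypothetical helper (not part of ff): validate dims before calling ff().
check_ff_dim <- function(dim) {
  # prod() on doubles returns a double, so this product cannot overflow to NA
  n <- prod(as.double(dim))
  if (!is.finite(n) || n > .Machine$integer.max) {
    stop(sprintf(
      "prod(dim) = %.0f exceeds the maximum integer (%d); ff() cannot create this object",
      n, .Machine$integer.max
    ), call. = FALSE)
  }
  # Safe to cast now that the range has been checked
  invisible(as.integer(n))
}

check_ff_dim(c(4.6e4, 4.6e4))                        # passes silently
res <- try(check_ff_dim(c(4.65e4, 4.65e4)), silent = TRUE)
inherits(res, "try-error")                           # the oversized case is rejected
```

Doing the comparison in double precision before the cast is what avoids the NA that currently reaches the `if` condition.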

Related downstream issue: anspiess/propagate#4

System Information

Attaching package ff
- getOption("fftempdir")=="/tmp/Rtmp5BMuOZ/ff"

- getOption("ffextension")=="ff"

- getOption("ffdrop")==TRUE

- getOption("fffinonexit")==TRUE

- getOption("ffpagesize")==65536

- getOption("ffcaching")=="mmnoflush"  -- consider "ffeachflush" if your system stalls on large writes

- getOption("ffbatchbytes")==16777216 -- consider a different value for tuning your system

- getOption("ffmaxbytes")==536870912 -- consider a different value for tuning your system


Attaching package: ‘ff’

The following objects are masked from ‘package:utils’:

    write.csv, write.csv2

The following objects are masked from ‘package:base’:

    is.factor, is.ordered

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libopenblasp-r0.3.10.so
LAPACK: /usr/lib/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ff_4.0.2  bit_4.0.4

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  RJSONIO_1.3-1.4

Cheers,
Keith

@comicfans

I hit a similar problem here:

ary = array(1.0, dim=c(288,4076,7,54))
f = as.ff(ary)    # this is ok

but converting back

r = as.ram(f)

gives

Error in if (length * .rambytes[vmode] > getOption("ffbatchbytes")) warning("creating large ram object with ",  : missing value where TRUE/FALSE needed
length * .rambytes[vmode] : NAs produced by integer overflow

Using ff_4.0.2 and bit_4.0.4.
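
The arithmetic behind this case can be reproduced without ff. The sketch below assumes the per-element size for doubles is held as the integer `8L`, which is what the "integer overflow" warning suggests; the variable names are illustrative only.

```r
dims <- c(288L, 4076L, 7L, 54L)

# Element count: prod() returns a double, and the value fits in an integer
n <- prod(dims)
n                            # 443729664
n <= .Machine$integer.max    # TRUE, which is why as.ff() succeeds

# Byte count: integer * integer overflows and yields NA (with a warning)
len <- as.integer(n)
bytes_int <- suppressWarnings(len * 8L)
is.na(bytes_int)             # TRUE, so the if() condition sees NA

# Promoting one operand to double keeps the multiplication exact
bytes_dbl <- as.double(len) * 8
bytes_dbl                    # 3549837312
```

So the element count is within range here, but the byte count is not, and the failure comes from the size computation rather than from the array length itself.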

@zhiiiyang

I got a similar issue here too.

>   raw_big <- ff(1L, dim = c(300, 8e6), vmode="integer")
Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 0 and .Machine$integer.max") : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In ff(1L, dim = c(300, 8e+06), vmode = "integer") :
  NAs introduced by coercion to integer range
> .Machine$integer.max
[1] 2147483647

@dwuab

dwuab commented Jun 15, 2021

I ran into the same problem. On modern computers, a limit of 2147483647 elements is frustrating, as a matrix of ~23 GB in R will well exceed the limit. According to https://search.r-project.org/CRAN/refmans/ff/html/LimWarn.html, the C++ backend is ready for a larger limit. I hope the developers can extend the limit soon.

@hechth

hechth commented Jan 31, 2022

Same issue here - isn't there a way to overcome this?

@agrueneberg @terjekv @truecluster would it be possible to check how much system RAM is available and adjust the limit accordingly? Or to raise it to the maximum number of elements that can be handled overall? Many systems have more than 23 GB of RAM available these days.

If you need help with the implementation, just let me know and I'll take care of it.

@dengchunyu

I'm seeing similar problems.
