ff() fails when the product of dim is too large to cast to an integer #3

khughitt · 2020-09-01T13:53:06Z

Greetings!

I'm attempting to use ff() via the bigcor() function in https://github.com/anspiess/propagate.

When the input matrix size exceeds a certain limit, however, ff() fails with an error:

Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 0 and .Machine$integer.max") : 
  missing value where TRUE/FALSE needed
Calls: ff
In addition: Warning message:
In ff(vmode = "double", dim = c(num_cols, num_cols)) :
  NAs introduced by coercion to integer range
Execution halted

I tracked down the issue to ff.R:2465:

n <- as.integer(prod(dim))

When the dimensions are too large (in my case, about 4.65e4 per side or larger), their product exceeds .Machine$integer.max, and as.integer() returns NA:

r$> prod(c(4.65e4, 4.65e4))
[1] 2162250000

r$> as.integer(prod(c(4.65e4, 4.65e4)))
[1] NA
Warning message:
NAs introduced by coercion to integer range

r$> .Machine$integer.max
[1] 2147483647

Do you know if there is any way around this?

Otherwise, perhaps it would be worth performing a check against this early on and letting the user know that ff() cannot proceed?
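
A minimal sketch of what such an early check might look like, written as a standalone helper rather than as a patch to ff. The name check_ff_dim is hypothetical and not part of the ff API; it uses only base R, where prod() returns a double and so does not overflow at these sizes.

```r
# Hypothetical helper (not part of ff): validate dim before calling ff().
check_ff_dim <- function(dim) {
  # prod() returns a double, so the product itself does not become NA here.
  n <- prod(as.numeric(dim))
  if (n > .Machine$integer.max) {
    stop(
      sprintf(
        "prod(dim) = %.0f exceeds .Machine$integer.max (%d)",
        n, .Machine$integer.max
      ),
      call. = FALSE
    )
  }
  # Safe to convert now that the range has been checked.
  as.integer(n)
}

# Largest side of a square matrix whose element count still fits in an integer.
max_side <- floor(sqrt(.Machine$integer.max))
```

With this, check_ff_dim(c(4.65e4, 4.65e4)) stops with a readable message instead of the "missing value where TRUE/FALSE needed" error, while check_ff_dim(c(max_side, max_side)) passes.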

Related downstream issue: anspiess/propagate#4

System Information

Attaching package ff
- getOption("fftempdir")=="/tmp/Rtmp5BMuOZ/ff"
- getOption("ffextension")=="ff"
- getOption("ffdrop")==TRUE
- getOption("fffinonexit")==TRUE
- getOption("ffpagesize")==65536
- getOption("ffcaching")=="mmnoflush"  -- consider "ffeachflush" if your system stalls on large writes
- getOption("ffbatchbytes")==16777216 -- consider a different value for tuning your system
- getOption("ffmaxbytes")==536870912 -- consider a different value for tuning your system

Attaching package: ‘ff’

The following objects are masked from ‘package:utils’:

    write.csv, write.csv2

The following objects are masked from ‘package:base’:

    is.factor, is.ordered

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libopenblasp-r0.3.10.so
LAPACK: /usr/lib/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ff_4.0.2  bit_4.0.4

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  RJSONIO_1.3-1.4

Cheers,
Keith

comicfans · 2020-10-13T01:01:12Z

I hit a similar problem here:

ary = array(1.0, dim=c(288,4076,7,54))
f = as.ff(ary)    # this is ok

but converting back

r = as.ram(f)

fails with

Error in if (length * .rambytes[vmode] > getOption("ffbatchbytes")) warning("creating large ram object with ",  : missing value where TRUE/FALSE needed
length * .rambytes[vmode] : NAs produced by integer overflow

using
ff_4.0.2 bit_4.0.4
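
Note that this array has fewer elements than .Machine$integer.max, so the overflow here comes from the byte count rather than the length. A small sketch in base R, assuming 8 bytes per double and that the multiplication shown in the error message is done in integer arithmetic (both are inferences from the message, not confirmed against the ff source):

```r
dims <- c(288L, 4076L, 7L, 54L)
n <- prod(dims)                       # element count, returned as a double
fits <- n <= .Machine$integer.max     # the length alone is within range

bytes_per_double <- 8L                # assumption: integer-valued byte size
# Integer * integer overflows to NA (with a warning) once the result
# exceeds .Machine$integer.max.
bytes_int <- suppressWarnings(as.integer(n) * bytes_per_double)

# The same product in double arithmetic stays finite.
bytes_dbl <- n * 8
```

An NA byte count then makes the comparison against getOption("ffbatchbytes") evaluate to NA, which is what if() rejects.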

zhiiiyang · 2021-02-11T01:17:57Z

I got a similar issue here too.

>   raw_big <- ff(1L, dim = c(300, 8e6), vmode="integer")
Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 0 and .Machine$integer.max") : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In ff(1L, dim = c(300, 8e+06), vmode = "integer") :
  NAs introduced by coercion to integer range
> .Machine$integer.max
[1] 2147483647

dwuab · 2021-06-15T08:24:13Z

I ran into the same problem. On modern computers, a limit of 2147483647 elements is frustrating, since a ~23 GB matrix in R is well beyond it. According to https://search.r-project.org/CRAN/refmans/ff/html/LimWarn.html, the C++ backend is ready for a larger limit. I hope the developers can extend the limit soon.
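
For reference, the size of a double vector at the current element limit can be worked out in base R. This is only arithmetic on .Machine$integer.max, assuming 8 bytes per double; it says nothing about what the ff backend itself could address.

```r
max_elems <- .Machine$integer.max          # 2147483647
max_bytes <- as.numeric(max_elems) * 8     # double arithmetic avoids overflow
max_gib   <- max_bytes / 2^30              # size in GiB
max_gb    <- max_bytes / 1e9               # size in GB
```

Any double matrix larger than this cannot be held in a single ff object under the current limit.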

hechth · 2022-01-31T14:24:30Z

Same issue here. Is there a way to overcome this?

@agrueneberg @terjekv @truecluster would it be possible to check the available system RAM and adjust the limit accordingly? Or to raise it to the maximum number of elements that can be handled overall? Many systems have more than 23 GB of RAM available these days.

If you need help with the implementation, just let me know and I'll take care of it.

dengchunyu · 2022-03-23T11:20:58Z
