Double parser cannot handle numbers > 1e19 which are not in scientific notation #412

Closed
defconst opened this issue Jun 2, 2016 · 16 comments
Labels: bug (an unexpected problem or unintended behavior)

defconst commented Jun 2, 2016

I see parsing failures with large doubles that are written out in full.
The same values in scientific notation parse without problems.
This may be a problem with boost::spirit::qi::parse().

x <- "10000000000000000000"
as.double(x)
# [1] 1e+19
readr::parse_double(x)
# [1] 1e+19


x <- "100000000000000000000"
as.double(x)
# [1] 1e+20
readr::parse_double(x)
# Warning: 1 parsing failure.
# row col expected                actual
#   1  -- a double 100000000000000000000
# [1] NA
# attr(,"problems")
# Source: local data frame [1 x 4]
#
#     row   col expected                actual
#   (int) (int)    (chr)                 (chr)
#1     1    NA a double 100000000000000000000

readr::parse_double("1e+20")  # this works
# [1] 1e+20

readr::read_csv("x\n100000000000000000000.0", col_types="d", col_names=TRUE)
# Warning: 1 parsing failure.
# row col expected                  actual
#   1   x a double 100000000000000000000.0
#    x
#1 NA
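A quick way to confirm that these inputs are ordinary, in-range doubles (so the NA comes from the parser, not from overflow) is to parse the same strings with a different parser. A minimal C++ sketch, swapping in the C standard library's strtod for Boost.Spirit; the helper name parse_double_strtod is hypothetical and not part of readr:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse the whole string as a double with std::strtod.
// Returns true only if every character was consumed and no range error occurred.
bool parse_double_strtod(const std::string& s, double& out) {
    const char* begin = s.c_str();
    char* end = nullptr;
    errno = 0;
    double v = std::strtod(begin, &end);
    if (end == begin || *end != '\0' || errno == ERANGE) {
        return false;  // nothing parsed, trailing characters, or overflow/underflow
    }
    out = v;
    return true;
}
```

With glibc, all three strings from the report are accepted: the first gives 1e19 and the other two give 1e20, both far below the largest double (about 1.8e308).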
hadley (Member) commented Jun 2, 2016

Works for me:

readr::parse_double("10000000000000000000000")
#> [1] 1e+22

Can you please try with the dev version of readr?

defconst (Author) commented Jun 2, 2016

I did that already. I have the problem with CRAN version 0.2.2 and the latest git version (8312374) on Ubuntu 16.04 (and 14.04):

readr::parse_double("10000000000000000000000")
# Warning: 1 parsing failure.
# row col expected                  actual
#   1  -- a double 10000000000000000000000
# [1] NA
# attr(,"problems")
# Source: local data frame [1 x 4]
#
#     row   col expected                  actual
#   <int> <int>    <chr>                   <chr>
# 1     1    NA a double 10000000000000000000000

devtools::session_info()
# Session info -------------------------------------------------------------------
#  setting  value
#  version  R version 3.3.0 (2016-05-03)
#  system   x86_64, linux-gnu
#  ui       X11
#  language (EN)
#  collate  en_US.UTF-8
#  tz       <NA>
#  date     2016-06-02
#
# Packages -----------------------------------------------------------------------
#  package    * version    date       source
#  assertthat   0.1        2013-12-06 CRAN (R 3.3.0)
#  BH         * 1.60.0-2   2016-05-07 CRAN (R 3.3.0)
#  devtools     1.11.1     2016-04-21 CRAN (R 3.3.0)
#  digest       0.6.9      2016-01-08 CRAN (R 3.3.0)
#  memoise      1.0.0      2016-01-29 CRAN (R 3.3.0)
#  Rcpp         0.12.5     2016-05-14 CRAN (R 3.3.0)
#  readr        0.2.2.9000 2016-06-02 Github (hadley/readr@8312374)
#  tibble       1.0        2016-03-23 CRAN (R 3.3.0)
#  withr        1.0.1      2016-02-04 CRAN (R 3.3.0)

hadley (Member) commented Jun 2, 2016

And you have the same version of BH as me, so it's not a different version of boost::spirit. I'm at a loss.

defconst (Author) commented Jun 2, 2016

Was your test on a Mac?
I see the same problem on Windows 7 / R 3.3.0 with CRAN readr 0.2.2 (I don't have Rtools installed).
I can try it on OS X later.

defconst (Author) commented Jun 2, 2016

I can also reproduce the problem on my iMac:

readr::parse_double("100000000000000000000")
# Warning: 1 parsing failure.
# row col expected                actual
#   1  -- a double 100000000000000000000
# [1] NA
# attr(,"problems")
# Source: local data frame [1 x 4]
# 
#     row   col expected                actual
#   <int> <int>    <chr>                 <chr>
# 1     1    NA a double 100000000000000000000
devtools::session_info()
# Session info ------------------------------------------------------------------
#  setting  value                       
#  version  R version 3.3.0 (2016-05-03)
#  system   x86_64, darwin13.4.0        
#  ui       AQUA                        
#  language (EN)                        
#  collate  C                           
#  tz       Europe/Vienna               
#  date     2016-06-02                  
# 
# Packages ----------------------------------------------------------------------
#  package    * version    date       source                       
#  BH         * 1.60.0-2   2016-05-07 CRAN (R 3.3.0)               
#  R6           2.1.2      2016-01-26 CRAN (R 3.3.0)               
#  Rcpp         0.12.5     2016-05-14 CRAN (R 3.3.0)               
#  assertthat   0.1        2013-12-06 CRAN (R 3.3.0)               
#  curl         0.9.7      2016-04-10 CRAN (R 3.3.0)               
#  devtools     1.11.1     2016-04-21 CRAN (R 3.3.0)               
#  digest       0.6.9      2016-01-08 CRAN (R 3.3.0)               
#  git2r        0.15.0     2016-05-11 CRAN (R 3.3.0)               
#  httr         1.1.0      2016-01-28 CRAN (R 3.3.0)               
#  memoise      1.0.0      2016-01-29 CRAN (R 3.3.0)               
#  readr        0.2.2.9000 2016-06-02 Github (hadley/readr@19d6eaf)
#  tibble       1.0        2016-03-23 CRAN (R 3.3.0)               
#  withr        1.0.1      2016-02-04 CRAN (R 3.3.0)               

hadley (Member) commented Jun 2, 2016

Just to make sure we're as close as possible, can you please start R from the console with R --vanilla then run devtools::session_info("readr")? I get:

> devtools::session_info("readr")
Session info -------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.0 (2016-05-03)
 system   x86_64, darwin13.4.0        
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Chicago             
 date     2016-06-02                  

Packages -----------------------------------------------------------------------
 package    * version     date       source                          
 assertthat   0.1         2013-12-06 CRAN (R 3.3.0)                  
 BH           1.60.0-2    2016-05-07 CRAN (R 3.3.0)                  
 curl         0.9.7       2016-04-10 CRAN (R 3.3.0)                  
 lazyeval     0.1.10.9000 2016-05-24 Github (hadley/lazyeval@bce211b)
 Rcpp         0.12.5      2016-05-14 CRAN (R 3.3.0)                  
 readr      * 0.2.2.9000  2016-06-02 local                           
 tibble       1.0-5       2016-05-26 Github (hadley/tibble@64175a8)  

defconst (Author) commented Jun 2, 2016

I installed the dev versions of lazyeval and tibble and changed LC_COLLATE.
With R --vanilla I get the following:

> devtools::session_info("readr")
# Session info -------------------------------------------------------------------
#  setting  value                       
#  version  R version 3.3.0 (2016-05-03)
#  system   x86_64, darwin13.4.0        
#  ui       X11                         
#  language (EN)                        
#  collate  en_US.UTF-8                 
#  tz       Europe/Vienna               
#  date     2016-06-02                  
# 
# Packages -----------------------------------------------------------------------
#  package    * version     date       source                          
#  assertthat   0.1         2013-12-06 CRAN (R 3.3.0)                  
#  BH           1.60.0-2    2016-05-07 CRAN (R 3.3.0)                  
#  curl         0.9.7       2016-04-10 CRAN (R 3.3.0)                  
#  lazyeval     0.1.10.9000 2016-06-02 Github (hadley/lazyeval@bce211b)
#  Rcpp         0.12.5      2016-05-14 CRAN (R 3.3.0)                  
#  readr        0.2.2.9000  2016-06-02 Github (hadley/readr@19d6eaf)   
#  tibble       1.0-5       2016-06-02 Github (hadley/tibble@64175a8)  

readr::parse_double("100000000000000000000")
# Warning: 1 parsing failure.
# row col expected                actual
#   1  -- a double 100000000000000000000
# [1] NA

@defconst
Copy link
Author

defconst commented Jun 3, 2016

It works when using boost::spirit::qi::long_double instead of boost::spirit::qi::double_
in https://github.com/hadley/readr/blob/ef750db855f9434e78bd89e8944e8b1c547bf23a/src/QiParsers.h#L22
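The workaround above reads the value at long double precision and then stores it in a double. A minimal sketch of the same parse-wide-then-narrow idea, using the C standard library's strtold in place of qi::long_double (an assumption for illustration, not the Spirit code path; the helper name parse_via_long_double is hypothetical):

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: parse the whole string as a long double, then narrow
// the result to double, rejecting values that do not fit in a double.
bool parse_via_long_double(const std::string& s, double& out) {
    const char* begin = s.c_str();
    char* end = nullptr;
    errno = 0;
    long double wide = std::strtold(begin, &end);
    if (end == begin || *end != '\0' || errno == ERANGE) {
        return false;  // nothing parsed, trailing characters, or out of long double range
    }
    const long double dmax = std::numeric_limits<double>::max();
    if (wide > dmax || wide < -dmax) {
        return false;  // representable as long double but would overflow a double
    }
    out = static_cast<double>(wide);
    return true;
}
```

One caveat of this approach: rounding twice (decimal to long double, then long double to double) can, in rare halfway cases, land one ulp away from a direct, correctly rounded double parse. On platforms where long double is the same as double (e.g. MSVC), there is no extra precision or range to gain.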

hadley (Member) commented Jun 3, 2016

Hmmmmm, what compiler are you using? (If you don't know, you should be able to see it when installing one of the packages from source: look for either gcc or clang.)
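The compiler can also be identified from inside C++ via its predefined macros. A minimal sketch (the function name compiler_id is hypothetical). Note that R builds packages with the compiler configured for R (`R CMD config CXX` prints it), which is not necessarily the default compiler on your PATH:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Report which compiler built this translation unit, using predefined macros.
// __clang__ must be checked first because clang also defines __GNUC__.
std::string compiler_id() {
    std::ostringstream os;
#if defined(__clang__)
    os << "clang " << __clang_major__ << '.' << __clang_minor__ << '.' << __clang_patchlevel__;
#elif defined(__GNUC__)
    os << "gcc " << __GNUC__ << '.' << __GNUC_MINOR__ << '.' << __GNUC_PATCHLEVEL__;
#elif defined(_MSC_VER)
    os << "msvc " << _MSC_VER;
#else
    os << "unknown";
#endif
    return os.str();
}
```

Apple's clang reports its own version numbering (e.g. 7.3.0 for clang-703), not the upstream LLVM release.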

defconst (Author) commented Jun 3, 2016

This was with the default gcc on Ubuntu 16.04:

> g++ --version
g++ (Ubuntu 5.3.1-14ubuntu2.1) 5.3.1 20160413

I also did some tests directly in C++:

> cat parse_double.cpp
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
// g++ -I /usr/local/lib/R/site-library/BH/include/ -std=c++11 -o parse_double -Os parse_double.cpp

typedef std::string::const_iterator Iter;

int main() {
    double res1 = -1;
    double res2 = -1;
    double res3 = -1;
    std::string s = "1000000000000000000000.00";
    Iter first(begin(s)), last(end(s));
    // qi::parse takes `first` by reference and advances it past the consumed input.
    boost::spirit::qi::parse(first, last, boost::spirit::qi::long_double, res1);
    std::cout << res1 << std::endl;
    // After the successful long_double parse, `first == last`, so the double_
    // and float_ calls below run on an empty range and leave res2/res3 at -1.
    boost::spirit::qi::parse(first, last, boost::spirit::qi::double_, res2);
    std::cout << res2 << std::endl;
    boost::spirit::qi::parse(first, last, boost::spirit::qi::float_, res3);
    std::cout << res3 << std::endl;
}
> ./parse_double
1e+21
-1
-1

hadley (Member) commented Jun 3, 2016

I bet that's the difference: I use clang.

I think there are some non-obvious consequences to using long double, namely that it requires C++11.

defconst (Author) commented Jun 3, 2016

It does not require C++11.
I ran some further tests and got the same results with clang++ and g++ on Linux and with clang++ on OS X :/

C++ source:

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
int main() {
    std::string s = "1000000000000000000000.00";
    std::string::const_iterator first(boost::begin(s)), last(boost::end(s));

    // qi::parse takes `first` by reference and advances it past the consumed input.
    double res1 = -1;
    std::cout << "return value (qi::long_double): " << boost::spirit::qi::parse(first, last, boost::spirit::qi::long_double, res1) << std::endl;
    std::cout << "parse result: " << res1 << std::endl;

    // The long_double parse above consumed the whole string, so `first == last`
    // here and this double_ parse runs on an empty range.
    double res2 = -1;
    std::cout << "return value (qi::double_): " << boost::spirit::qi::parse(first, last, boost::spirit::qi::double_, res2) << std::endl;
    std::cout << "parse result: " << res2 << std::endl;
}

Ubuntu 16.04 (x64):

> clang++ --version
clang version 3.8.0-2ubuntu3 (tags/RELEASE_380/final)
> clang++ -I /usr/local/lib/R/site-library/BH/include/ -o parse_double_llvm parse_double.cpp
> ./parse_double_llvm 
return value (qi::long_double): 1
parse result: 1e+21
return value (qi::double_): 0
parse result: -1

> g++ --version    
g++ (Ubuntu 5.3.1-14ubuntu2.1) 5.3.1 20160413
> g++ -I /usr/local/lib/R/site-library/BH/include/ -o parse_double_gcc parse_double.cpp
> ./parse_double_gcc
return value (qi::long_double): 1
parse result: 1e+21
return value (qi::double_): 0
parse result: -1

OS X 10.11.5 with Xcode Command Line Tools 7.3:

> clang++ -I ~/Library/R/3.3/library/BH/include/ -o parse_double_llvm parse_double.cpp
> ./parse_double_llvm 
return value (qi::long_double): 1
parse result: 1e+21
return value (qi::double_): 0
parse result: -1

> pkgutil --pkg-info=com.apple.pkg.CLTools_Executables
package-id: com.apple.pkg.CLTools_Executables
version: 7.3.1.0.1.1461711523
...

> clang++ --version
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

hadley (Member) commented Jun 3, 2016

But long doubles are only available in C99, which is what C++11 uses.

defconst (Author)

Yesterday I spent some time testing several Boost versions with clang and gcc.
Large numbers parse correctly with Boost 1.58. From version 1.59 through 1.61, boost::spirit::qi::double_ fails on large numbers (but I still don't understand why it works on your computer ;)

There is a Trac ticket (reported Sep 2015): https://svn.boost.org/trac/boost/ticket/11608

hadley added the bug and ready labels Jun 14, 2016
jimhester (Collaborator)

long double was actually in C89 as well; see http://port70.net/~nsz/c/c89/c89-draft.html#3.1.2.5

defconst (Author)

Yes:

"The long double type was present in the original 1989 C standard but support was improved by the 1999 revision of the C standard ..."
(https://en.wikipedia.org/wiki/Long_double)
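To illustrate that long double needs nothing from C++11: the sketch below uses only C++98 facilities (std::numeric_limits from <limits>) to report how many mantissa bits long double has. The function names are hypothetical. The standard only guarantees that long double is at least as precise as double; with gcc or clang on x86-64 Linux and macOS it is typically the 80-bit x87 format (64 mantissa bits), while MSVC makes it identical to double (53 bits).

```cpp
#include <cassert>
#include <limits>

// Number of mantissa (significand) bits in long double on this platform.
int long_double_mantissa_bits() {
    return std::numeric_limits<long double>::digits;
}

// The C and C++ standards require long double to be at least as precise as double.
bool long_double_at_least_double() {
    return std::numeric_limits<long double>::digits >=
           std::numeric_limits<double>::digits;
}
```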

jimhester added a commit to jimhester/readr that referenced this issue Jun 15, 2016
lock bot locked and limited conversation to collaborators Sep 25, 2018