read_log() seems broken when col_names specified but not col_types #503

Closed
ilarischeinin opened this issue Aug 16, 2016 · 0 comments
Labels
bug an unexpected problem or unintended behavior read 📖

ilarischeinin commented Aug 16, 2016

read_log() works fine when no col_names argument is provided:

library(readr)
read_log(readr_example("example.log"))
Parsed with column specification:
cols(
  X1 = col_character(),
  X2 = col_character(),
  X3 = col_character(),
  X4 = col_character(),
  X5 = col_character(),
  X6 = col_integer(),
  X7 = col_integer()
)
# A tibble: 2 x 7
            X1    X2                 X3                         X4
         <chr> <chr>              <chr>                      <chr>
1 172.21.13.45  <NA> Microsoft\\JohnDoe 08/Apr/2001:17:39:04 -0800
2    127.0.0.1  <NA>              frank 10/Oct/2000:13:55:36 -0700
# ... with 3 more variables: X5 <chr>, X6 <int>, X7 <int>

But when col_names is provided, it fails:

read_log(readr_example("example.log"), col_names=c("ip", "identity", "user",
  "timestamp", "request", "status", "size"))
Error in .Call("readr_guess_types_", PACKAGE = "readr", sourceSpec, tokenizerSpec,  : 
  negative length vectors are not allowed

Since the error is raised while guessing column types, I also tried supplying col_types explicitly. That works:

read_log(readr_example("example.log"), col_names=c("ip", "identity", "user",
  "timestamp", "request", "status", "size"), col_types=cols(ip=col_character(),
  identity=col_character(), user=col_character(), timestamp=col_character(),
  request=col_character(), status=col_integer(), size=col_integer()))
# A tibble: 2 x 7
            ip identity               user                  timestamp
         <chr>    <chr>              <chr>                      <chr>
1 172.21.13.45     <NA> Microsoft\\JohnDoe 08/Apr/2001:17:39:04 -0800
2    127.0.0.1     <NA>              frank 10/Oct/2000:13:55:36 -0700
# ... with 3 more variables: request <chr>, status <int>, size <int>
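
Since the first call above (no col_names) succeeds, another possible workaround is to let read_log() assign its default X1..X7 names and rename the columns afterwards. This is only a sketch based on the behaviour shown above; the variable names `log_names` and `log` are made up for illustration:

```r
library(readr)

# Intended column names for the example log file (same as in the failing call)
log_names <- c("ip", "identity", "user", "timestamp", "request", "status", "size")

# Read with default names, which avoids passing col_names to read_log()
log <- read_log(readr_example("example.log"))

# Guard against a mismatch before renaming
stopifnot(length(log_names) == ncol(log))
names(log) <- log_names
```

This keeps the guessed column types (character and integer here) and avoids spelling out a full cols() specification.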

Searching the issues, I came across #331 and #403, which were fixed with #433. But with read_log(), this error still occurs with both version 1.0.0 from CRAN and the current master branch (7dd808d). In this case it also happens with a file of only two lines, whereas the earlier fix was for large files.

@hadley hadley added bug an unexpected problem or unintended behavior read 📖 labels Dec 22, 2016
@lock lock bot locked and limited conversation to collaborators Sep 24, 2018