Inconsistent behaviour with read_csv and skip > #rows #119
Comments
This bit me again recently, so I boiled it down to an even simpler issue. Issue: To reproduce:
If you don't supply the types, do you think it's best to return the most restrictive type (i.e. logical) or the least restrictive type (i.e. character)? I think that's better behaviour than throwing an error (esp. since it's useful to do |
Here's what I have now:

read_csv("a,b\n1,2")
#> Source: local data frame [1 x 2]
#>
#> a b
#> (int) (int)
#> 1 1 2
read_csv("a,b\n1,2", c("a", "b"), "ii", skip = 2)
#> Source: local data frame [0 x 2]
#>
#> Variables not shown: a (int), b (int)
read_csv("a,b\n1,2", c("a", "b"), skip = 2)
#> Warning: 1 parsing failure.
#> row col expected actual
#> -- -- 0 col names 2 col names
#> Source: local data frame [0 x 0]
read_csv("a,b\n1,2", skip = 2)
#> Source: local data frame [0 x 0]
read_csv("a,b\n1,2", n_max = 0)
#> Source: local data frame [0 x 2]
#>
#> Variables not shown: a (int), b (int)
read_csv("a,b\n")
#> Warning: 1 parsing failure.
#> row col expected actual
#> -- -- 0 col names 2 col names
#> Source: local data frame [0 x 0]

That seems reasonably consistent to me.
Hmmm, I think the main thing missing is that if you have column names, but no data and no column types, you get a 0 x 0 data frame, and that's not quite right. I've tweaked it to make sure there are always enough col types, using character to pad out:

read_csv("a,b\n1,2")
#> Source: local data frame [1 x 2]
#>
#> a b
#> (int) (int)
#> 1 1 2
read_csv("a,b\n1,2", c("a", "b"), "ii", skip = 2)
#> Source: local data frame [0 x 2]
#>
#> Variables not shown: a (int), b (int)
read_csv("a,b\n1,2", c("a", "b"), skip = 2)
#> Source: local data frame [0 x 2]
#>
#> Variables not shown: a (chr), b (chr)
read_csv("a,b\n1,2", skip = 2)
#> Source: local data frame [0 x 0]
read_csv("a,b\n1,2", n_max = 0)
#> Source: local data frame [0 x 2]
#>
#> Variables not shown: a (int), b (int)
read_csv("a,b\n")
#> Source: local data frame [0 x 2]
#>
#> Variables not shown: a (chr), b (chr)
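
For anyone reading along, the "pad out with character" idea can be sketched roughly as below. This is only an illustration under my own assumptions, not the actual readr change: `pad_col_types()` is a hypothetical helper, and it assumes the compact one-letter-per-column type string used in the calls above ("i" for integer, "c" for character).

```r
# Hypothetical helper, not part of readr: pad a compact type string
# so that there is one type letter per column name.
pad_col_types <- function(col_names, col_types = "") {
  n_missing <- length(col_names) - nchar(col_types)
  if (n_missing <= 0) {
    # Every column already has a type, so nothing to pad.
    return(col_types)
  }
  # Columns without a known type fall back to character ("c"),
  # the least restrictive type.
  paste0(col_types, strrep("c", n_missing))
}

pad_col_types(c("a", "b"), "ii")  # "ii": already complete
pad_col_types(c("a", "b"), "i")   # "ic": second column padded
pad_col_types(c("a", "b"))        # "cc": both columns padded
```

Padding with character means a header-only input still yields a 0 x 2 result, and any later data can always be stored in a character column without a parsing failure.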
Thanks! This looks great!
Case 1: If col_types is specified and skip is equal to or greater than the number of actual rows, read_csv() returns a data.frame with 1 row.
Case 2: If col_types is not specified and skip is equal to or greater than the number of actual rows, read_csv() throws an error.
I think Case 1 is wrong to return a row of results when there aren't any, and should probably return a zero-row data.frame.
EDIT: Case 2 is handled OK, i.e. if column types aren't specified and there are no rows from which to infer type, you can't really return anything sensible.
I found this inconsistency when doing chunked reads from a large CSV file, and a zero-row data.frame was going to be an indicator that I'd run out of data.
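
To make the chunked-read use case concrete, here is a minimal sketch of the loop I have in mind. It assumes the reader returns a zero-row data frame once skip is past the end of the data. The names `read_all_chunks()` and `fake_reader()` are hypothetical, and the reader here works on an in-memory data frame rather than a real file so the example runs on its own.

```r
# Hypothetical helper: `read_chunk` is any function(skip, n_max) that returns
# a data frame and yields zero rows once `skip` is past the end of the data.
read_all_chunks <- function(read_chunk, chunk_size) {
  chunks <- list()
  skip <- 0
  repeat {
    chunk <- read_chunk(skip = skip, n_max = chunk_size)
    # A zero-row result is the signal that the data is exhausted.
    if (nrow(chunk) == 0) break
    chunks[[length(chunks) + 1]] <- chunk
    skip <- skip + nrow(chunk)
  }
  chunks
}

# Stand-in reader over an in-memory data frame, used instead of a real file.
full <- data.frame(a = 1:5, b = 6:10)
fake_reader <- function(skip, n_max) {
  rows <- seq_len(nrow(full))
  keep <- rows[rows > skip & rows <= skip + n_max]
  full[keep, , drop = FALSE]
}

chunks <- read_all_chunks(fake_reader, chunk_size = 2)
length(chunks)             # 3 chunks: 2 + 2 + 1 rows
sum(sapply(chunks, nrow))  # 5 rows in total
```

With a real file, the reader would presumably wrap read_csv() with explicit column names and col_types, passing skip and n_max through (plus one extra skipped line for the header). The loop only terminates cleanly if an over-long skip gives back zero rows rather than one row or an error.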