In R, feather cannot read a data frame back if a column header is missing or empty #385

Closed
guojingyu opened this issue Feb 15, 2020 · 2 comments

Comments

@guojingyu

In R, calling read_feather to load a cached file fails with an error asking for the columns to be named if the original data frame has a column with an empty name. This is quite common in real datasets (e.g. an index/row-names column with no header).

The error seems to come from the tibble package, which read_feather uses to build the returned data frame (see tibble:::as_tibble in the backtrace below).

If column names are required, write_feather should reject a data frame with a missing name in the first place; otherwise, read_feather should accept empty column names.

The following example reproduces the bug:

> library(feather) 
> n = c(2, 3, 5) 
> s = c("aa", "bb", "cc") 
> b = c(TRUE, FALSE, TRUE) 
> df = data.frame(n, s, b)
> df
  n  s     b
1 2 aa  TRUE
2 3 bb FALSE
3 5 cc  TRUE
> colnames(df)[1] <- ""
> df
     s     b
1 2 aa  TRUE
2 3 bb FALSE
3 5 cc  TRUE
> write_feather(df, '~/cache/test.feather')
> df2 <- read_feather('~/cache/test.feather')
Error: Column 1 must be named.
Use .name_repair to specify repair.
Run `rlang::last_error()` to see where the error occurred.
>
> rlang::last_error()
<error/rlang_error>
Column 1 must be named.
Use .name_repair to specify repair.
Backtrace:
  1. feather::read_feather("~/cache/test.feather")
  3. tibble:::as_tibble.default(data)
  6. feather:::as.data.frame.feather(value, stringsAsFactors = FALSE)
  9. tibble:::as_tibble.data.frame(x[])
 10. tibble:::as_tibble.list(unclass(x), ..., .rows = .rows, .name_repair = .name_repair)
 11. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
 12. tibble:::set_repaired_names(x, .name_repair)
 17. tibble:::repaired_names(names(x), .name_repair = .name_repair)
 18. tibble:::check_unique(new_name)
Run `rlang::last_trace()` to see the full context.
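Until read_feather handles this case, one possible workaround is to make sure every column has a non-empty name before writing. The sketch below is a minimal example under that assumption: fill_blank_names and the "col" prefix are hypothetical names chosen for illustration, not part of feather. The renaming uses only base R, and the feather round trip runs only if the package is installed.

```r
# fill_blank_names() is a hypothetical helper, not part of feather:
# it replaces empty or NA column names with "<prefix><position>".
fill_blank_names <- function(df, prefix = "col") {
  nms <- names(df)
  blank <- is.na(nms) | nms == ""
  nms[blank] <- paste0(prefix, which(blank))
  # make.unique() guards against clashing with an existing column name
  names(df) <- make.unique(nms)
  df
}

# Same data as the reproduction above, with the first header blanked out
df <- data.frame(n = c(2, 3, 5), s = c("aa", "bb", "cc"), b = c(TRUE, FALSE, TRUE))
colnames(df)[1] <- ""
df <- fill_blank_names(df)
print(names(df))

# Round trip through feather only if the package is available
if (requireNamespace("feather", quietly = TRUE)) {
  path <- tempfile(fileext = ".feather")
  feather::write_feather(df, path)
  df2 <- feather::read_feather(path)
  print(df2)
}
```

The placeholder names are positional, so the first column becomes col1; naming the column explicitly before calling write_feather avoids the error in the same way.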
@wesm
Owner

wesm commented Feb 17, 2020

Is this issue present in the arrow::read_feather function? If so, could you open a JIRA issue?

@guojingyu
Author

@wesm Thanks for the tip. arrow::read_feather reads the file back without error and keeps the empty column name:

> library(arrow)
> n = c(2, 3, 5) 
> s = c("aa", "bb", "cc") 
> b = c(TRUE, FALSE, TRUE) 
> df = data.frame(n, s, b)
> df
  n  s     b
1 2 aa  TRUE
2 3 bb FALSE
3 5 cc  TRUE
> colnames(df)[1] <- ""
> write_feather(df, '~/cache/test.feather')
> df2 <- read_feather('~/cache/test.feather')
> df2
# A tibble: 3 x 3
     `` s     b    
  <dbl> <fct> <lgl>
1     2 aa    TRUE 
2     3 bb    FALSE
3     5 cc    TRUE 
> df2 <- as.data.frame(read_feather('~/cache/test.feather'))
> df2
     s     b
1 2 aa  TRUE
2 3 bb FALSE
3 5 cc  TRUE
