repair_names() feature requests #217
Comments
We could add arguments |
Yes, it certainly helps. It feels like the tidyverse should have a preferred remedy for nonexistent and duplicated column names. And tibble is currently the logical place to implement that. So I'd still like to reach a consensus across packages. What do you think of exporting |
I'd like to make I think it's probably a good idea to have two variants: one that works on a character vector of names, and the other that works on a named vector. |
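To make the two-variant idea concrete, here is a minimal base-R sketch. The names `repair_names_chr()` and `set_repaired_names()` are hypothetical, and `make.names()` only stands in for whatever repair rule gets agreed on; none of this is tibble's implementation.

```r
# Hypothetical variant 1: works on a character vector of names.
# make.names() is only a stand-in for the real repair rule.
repair_names_chr <- function(nms) {
  make.names(nms, unique = TRUE)
}

# Hypothetical variant 2: works on a named vector (or list) and returns it
# with repaired names.
set_repaired_names <- function(x) {
  nms <- names(x)
  # An object without a names attribute is treated as having all names missing.
  if (is.null(nms)) nms <- rep(NA_character_, length(x))
  names(x) <- repair_names_chr(nms)
  x
}

# Same input as the example in this thread: one empty, one NA, one duplicate.
x <- list(1:3, var2 = letters[1:3], c(1, 2, 3) + 0.1, var2 = letters[26:24])
names(x)[3] <- NA
names(set_repaired_names(x))
```

Keeping the character-vector variant as the core means the repair rule lives in one place, and the named-vector variant is just a thin wrapper around it.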
Perhaps my suggestion about leading underscores is not so bright. It creates non-syntactic names. I still think the convention should be something easily detectable via regex, i.e. unlikely to be present in original column names. I expected
x <- list(1:3,
var2 = letters[1:3],
c(1, 2, 3) + 0.1,
var2 = letters[26:24])
names(x)[3] <- NA
tibble::repair_names(tibble::as_data_frame(x, validate = FALSE),
prefix = "__X", sep = "__")
#> # A tibble: 3 × 4
#> `__X__1` var2 `__X__2` var2__1
#> <int> <chr> <dbl> <chr>
#> 1 1 a 1.1 z
#> 2 2 b 2.1 y
#> 3 3 c 3.1 x |
A leading |
@jimhester Do you propose |
Moving towards a consensus on what should happen in current example:
Or we could decide whenever we incorporate a number, it refers to absolute column position
Even more consistent? Drop the
Now every name that needs repair simply gets |
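One possible reading of the position-based proposal, as a base-R sketch. The function name `repair_by_position()` is hypothetical, and the `"__"` separator is an assumption borrowed from the `sep = "__"` example earlier in this thread, not a settled choice.

```r
# Hypothetical helper, not part of tibble: append the absolute column
# position to every name that is missing, empty, or duplicated.
repair_by_position <- function(nms, sep = "__") {
  # Treat NA names like empty names.
  nms[is.na(nms)] <- ""
  # Flag empty names and every member of a duplicated group, including
  # the first occurrence.
  needs_repair <- nms == "" |
    duplicated(nms) |
    duplicated(nms, fromLast = TRUE)
  if (any(needs_repair)) {
    nms[needs_repair] <- paste0(nms[needs_repair], sep, which(needs_repair))
  }
  nms
}

repair_by_position(c("", "var2", NA, "var2"))
#> [1] "__1"     "var2__2" "__3"     "var2__4"
```

Because the suffix is the absolute position, a repaired name tells you which column it came from, and names that were already fine are left alone. The sketch does not guard against a repaired name colliding with an existing one (say, an original column literally called `var2__2`).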
FWIW I think the doubled identifiers are quite ugly, but I suppose that is part of the point... |
Yeah, I think it is a feature if the repaired names are unsightly. |
I don't like the use of dots because it goes against the usual convention to use |
The way I particularly like @jennybc's idea of simply appending the absolute position, without using |
I think we should:
I'm not sure what we should do with respect to backward compatibility. I suspect few packages depend on this behaviour; it's going to be more user code. I think as long as there's a clear way to get to the old behaviour it shouldn't cause much hassle (especially since we're now describing what's happening) |
@krlmlr What's the outlook on this re: next tibble release? |
The next tibble release will have a better |
- New `set_tidy_names()` and `tidy_names()`, a simpler version of `repair_names()` which works unchanged for now (#217).
- The `print()`, `format()`, and `tbl_sum()` methods are now implemented for class `"tbl"` and not for `"tbl_df"`. This allows subclasses to use tibble's formatting facilities. The formatting of the header can be tweaked by implementing `tbl_sum()` for the subclass.
- New `set_tidy_names()` and `tidy_names()`, a simpler version of `repair_names()` which works unchanged for now (#217).
- Printing now uses `x` again instead of the Unicode multiplication sign, to avoid encoding issues (#216).
- `glimpse()` now properly displays tibbles with foreign characters in column names (#235).
- Subsetting zero columns no longer returns wrong number of rows (#241, @echasnovski).
- New `set_tidy_names()` and `tidy_names()`, a simpler version of `repair_names()` which works unchanged for now (#217).
- New `rowid_to_column()` that adds a `rowid` column as first column and removes row names (#243, @barnettjacob).
- The `all.equal.tbl_df()` method has been removed, calling `all.equal()` now forwards to `base::all.equal.data.frame()`. To compare tibbles ignoring row and column order, please use `dplyr::all_equal()` (#247).
- Printing now uses `x` again instead of the Unicode multiplication sign, to avoid encoding issues (#216).
- String values are now quoted when printing if they contain non-printable characters or quotes (#253).
- The `print()`, `format()`, and `tbl_sum()` methods are now implemented for class `"tbl"` and not for `"tbl_df"`. This allows subclasses to use tibble's formatting facilities. The formatting of the header can be tweaked by implementing `tbl_sum()` for the subclass, which is expected to return a named character vector. The `print.tbl_df()` method is still implemented for compatibility with downstream packages, but only calls `NextMethod()`.
- Own printing routine, not relying on `print.data.frame()` anymore. Now providing `format.tbl_df()` and full support for Unicode characters in names and data, also for `glimpse()` (#235).
- Improve formatting of error messages (#223).
- Using `rlang` instead of `lazyeval` (#225, @lionel-), and `rlang` functions (#244).
- `tribble()` now handles values that have a class (#237, @NikNakk).
- Minor efficiency gains by replacing `any(is.na())` with `anyNA()` (#229, @csgillespie).
- The `microbenchmark` package is now used conditionally (#245).
- `pkgdown` website.
Resolves #357 Our discussion about name repair is here: tidyverse/tibble#217
This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary. |
I've just started running readxl's output through `repair_names()` so it stops producing tibbles with empty, `NA`, or duplicated column names 🎉.
But I noticed that `tibble::repair_names()` and readr are not consistent.
Feature requests, some inspired by readr:
`__X1` and `var2__1`.
The first one is for a better interactive experience. Otherwise, aimed at programmatic work with tibbles that may have been subjected to name repair.