New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rowname to column as numeric func #243

Merged
merged 8 commits into from May 8, 2017

Conversation

Projects
None yet
3 participants
@barnettjacob

barnettjacob commented May 4, 2017

Adds a separate function called rownames_to_column_numeric which outputs the rownames as numeric where they are legitimate numbers.

Function could conceivably be called something slightly less verbose (but less accurate) such as rownum_to_column.

@krlmlr

Thanks. Looks good already, and timely for the upcoming CRAN release!

rn <- tibble(rownames(df))
names(rn) <- var

if(mean(grepl('[0-9]*', rn[[var]])) == 1){

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

Is this still important?

@@ -65,6 +65,31 @@ rownames_to_column <- function(df, var = "rowname") {
new_df
}

rownames_to_column_numeric <- function(df, var = "rowname") {

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

How about rowid_to_column()?

@@ -65,6 +65,31 @@ rownames_to_column <- function(df, var = "rowname") {
new_df
}

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

The function needs to be documented and exported. I think documentation can go to the same page as that for rownames_to_column().

This comment has been minimized.

@barnettjacob

barnettjacob May 4, 2017

Ok, I'll have another go. Agree with your comments about the function itself, will make some changes!

new_df <- c(rn, df)
attribs[["names"]] <- names(new_df)

attributes(new_df) <- attribs[names(attribs) != "row.names"]

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

I'm not sure this function needs to care about row names at all.

@krlmlr

This comment has been minimized.

Member

krlmlr commented May 4, 2017

I forgot to mention: You can push or even force-push to the same branch, no need to reopen a new pull request.

jabarnett
@codecov-io

This comment has been minimized.

codecov-io commented May 4, 2017

Codecov Report

Merging #243 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #243      +/-   ##
==========================================
- Coverage   89.48%   89.47%   -0.02%     
==========================================
  Files          21       21              
  Lines         875      874       -1     
==========================================
- Hits          783      782       -1     
  Misses         92       92
Impacted Files Coverage Δ
R/rownames.R 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08af6b0...ca94893. Read the comment docs.

jabarnett
@krlmlr

Thanks. Using add_column() would be great, but I can also do it myself.

if(nrow(df) == 0)
stopc('Data frame must have 1 or more rows')

rn <- tibble(seq(1, nrow(df)))

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

seq_len() also works for the corner case of zero rows, which I'd like to support.

This comment has been minimized.

@barnettjacob

rn <- tibble(seq(1, nrow(df)))
names(rn) <- var
new_df <- cbind(rn, df)

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

This code, and its equivalent in rownames_to_column(), really should be using add_column() via add_column(data, !!(var) := rn, ...) or perhaps with invoke(), see also ?rlang::invoke and ?"!!":

x <- "y"; tibble::tibble(!!(x) := 5)
#> # A tibble: 1 × 1
#>       y
#>   <dbl>
#> 1     5

This comment has been minimized.

@barnettjacob

barnettjacob May 4, 2017

you mean something like: add_column(df, !!(var) := seq_len(nrow(df)), .before = 1)?

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

Yes.

@@ -9,7 +9,9 @@
#' functions allow to you detect if a data frame has row names
#' (`has_rownames()`), remove them (`remove_rownames()`), or convert
#' them back-and-forth between an explicit column (`rownames_to_column()`
#' and `column_to_rownames()`).
#' and `column_to_rownames()`). Also included is a `rowid_to_column` which

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017

Member

Please add ().

This comment has been minimized.

@barnettjacob

jabarnett added some commits May 4, 2017

jabarnett
jabarnett
@barnettjacob

This comment has been minimized.

barnettjacob commented May 5, 2017

OK, I've changed it to use addcolumn(). Also did the same for rownames_to_column.

@krlmlr

Thanks, looks great. Would you mind adding a test for the new function? I prefer stripping spaces at EOL, too.

jabarnett
@krlmlr

Thanks, I think we're good to go after the next iteration.

@@ -26,6 +26,13 @@ test_that("rownames_to_column keeps the tbl classes (#882)", {
paste("There is a column named wt already!") )
})

test_that("rowid_to_column adds row names and errors if col name reused", {
res <- rowid_to_column( mtcars)
expect_true(ncol(res)==12)

This comment has been minimized.

@krlmlr

krlmlr May 5, 2017

Member

Where does the 12 come from?

@@ -26,6 +26,13 @@ test_that("rownames_to_column keeps the tbl classes (#882)", {
paste("There is a column named wt already!") )
})

test_that("rowid_to_column adds row names and errors if col name reused", {
res <- rowid_to_column( mtcars)

This comment has been minimized.

@krlmlr

krlmlr May 5, 2017

Member

I think you can adapt most of the tests for rownames_to_column() to the new function; in particular it seems important to test that the class is maintained (i.e., that it also works for data frames).

Unfortunately, that test isn't a particularly good example for good coding style; I prefer spaces around operators but not before/after parens, and an indent of two lines for function calls that occupy multiple lines, with nothing after the opening or before the closing paren. The full style guide is at http://style.tidyverse.org .

This comment has been minimized.

@barnettjacob

barnettjacob May 5, 2017

ok, thanks, will revisit.

jabarnett
@barnettjacob

This comment has been minimized.

barnettjacob commented May 5, 2017

Looking at the mtcars data I realised that the rowid_to_column function will remove row names if present (guessing because of add_column() but not 100%). Would you rather:

  • The function fail if the supplied df has row names.
  • Include an option where the user can choose to do rowname_to_column first.
@krlmlr

This comment has been minimized.

Member

krlmlr commented May 5, 2017

Currently, add_column() removes row names even for plain data frames. I'm fine with this behavior also for rowid_to_column() if it's documented and tested.

@barnettjacob

This comment has been minimized.

barnettjacob commented May 5, 2017

Got it!

jabarnett
@barnettjacob

This comment has been minimized.

barnettjacob commented May 6, 2017

OK so I've:

  • Replicated exactly the tests for rowname_to_column to do the same thing for rowid_to_column
  • Tidied up the style as best I can :-)
  • Added a line to the documentation about the function removing row names.

@krlmlr krlmlr merged commit e8132b5 into tidyverse:master May 8, 2017

1 of 2 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@krlmlr

This comment has been minimized.

Member

krlmlr commented May 8, 2017

Thanks!

@barnettjacob

This comment has been minimized.

barnettjacob commented May 8, 2017

Awesome! Thanks for being so understanding and supportive! It was a fun and educational experience.

@barnettjacob barnettjacob deleted the barnettjacob:rownum_function branch May 8, 2017

@krlmlr

This comment has been minimized.

Member

krlmlr commented May 8, 2017

You're welcome, always happy to review PRs ;-)

krlmlr added a commit that referenced this pull request May 8, 2017

Merge tag 'v1.3.0.9002'
- New `rowid_to_column()` that adds a `rowid` column as first column and removes row names (#243, @barnettjacob).
- Use `rlang` functions (#244).
- The `all.equal.tbl_df()` method has been removed, calling `all.equal()` now forwards to `base::all.equal.data.frame()`. To compare tibbles ignoring row and column order, please use `dplyr::all_equal()` (#247).
- The `microbenchmark` package is now used conditionaly (#245).
- Subsetting zero columns no longer returns wrong number of rows (#241, @echasnovski).
- `tribble()` now handles values that have a class (#237, @NikNakk).

krlmlr added a commit that referenced this pull request May 17, 2017

Merge tag 'v1.3.1'
- Subsetting zero columns no longer returns wrong number of rows (#241, @echasnovski).

- New `set_tidy_names()` and `tidy_names()`, a simpler version of `repair_names()` which works unchanged for now (#217).
- New `rowid_to_column()` that adds a `rowid` column as first column and removes row names (#243, @barnettjacob).
- The `all.equal.tbl_df()` method has been removed, calling `all.equal()` now forwards to `base::all.equal.data.frame()`. To compare tibbles ignoring row and column order, please use `dplyr::all_equal()` (#247).

- Printing now uses `x` again instead of the Unicode multiplication sign, to avoid encoding issues (#216).
- String values are now quoted when printing if they contain non-printable characters or quotes (#253).
- The `print()`, `format()`, and `tbl_sum()` methods are now implemented for class `"tbl"` and not for `"tbl_df"`. This allows subclasses to use tibble's formatting facilities. The formatting of the header can be tweaked by implementing `tbl_sum()` for the subclass, which is expected to return a named character vector. The `print.tbl_df()` method is still implemented for compatibility with downstream packages, but only calls `NextMethod()`.
- Own printing routine, not relying on `print.data.frame()` anymore. Now providing `format.tbl_df()` and full support for Unicode characters in names and data, also for `glimpse()` (#235).

- Improve formatting of error messages (#223).
- Using `rlang` instead of `lazyeval` (#225, @lionel-), and `rlang` functions (#244).
- `tribble()` now handles values that have a class (#237, @NikNakk).
- Minor efficiency gains by replacing `any(is.na())` with `anyNA()` (#229, @csgillespie).
- The `microbenchmark` package is now used conditionally (#245).
- `pkgdown` website.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment