Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rowname to column as numeric func #243

Merged
merged 8 commits into from May 8, 2017

Conversation

barnettjacob
Copy link

Adds a separate function called rownames_to_column_numeric which outputs the rownames as numeric where they are legitimate numbers.

Function could conceivably be called something slightly less verbose (but less accurate) such as rownum_to_column.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good already, and timely for the upcoming CRAN release!

R/rownames.R Outdated
rn <- tibble(rownames(df))
names(rn) <- var

if(mean(grepl('[0-9]*', rn[[var]])) == 1){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still important?

R/rownames.R Outdated
@@ -65,6 +65,31 @@ rownames_to_column <- function(df, var = "rowname") {
new_df
}

rownames_to_column_numeric <- function(df, var = "rowname") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about rowid_to_column()?

@@ -65,6 +65,31 @@ rownames_to_column <- function(df, var = "rowname") {
new_df
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function needs to be documented and exported. I think documentation can go to the same page as that for rownames_to_column().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll have another go. Agree with your comments about the function itself, will make some changes!

R/rownames.R Outdated
new_df <- c(rn, df)
attribs[["names"]] <- names(new_df)

attributes(new_df) <- attribs[names(attribs) != "row.names"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this function needs to care about row names at all.

@krlmlr
Copy link
Member

krlmlr commented May 4, 2017

I forgot to mention: You can push or even force-push to the same branch, no need to reopen a new pull request.

@codecov-io
Copy link

codecov-io commented May 4, 2017

Codecov Report

Merging #243 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #243      +/-   ##
==========================================
- Coverage   89.48%   89.47%   -0.02%     
==========================================
  Files          21       21              
  Lines         875      874       -1     
==========================================
- Hits          783      782       -1     
  Misses         92       92
Impacted Files Coverage Δ
R/rownames.R 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08af6b0...ca94893. Read the comment docs.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Using add_column() would be great, but I can also do it myself.

R/rownames.R Outdated
if(nrow(df) == 0)
stopc('Data frame must have 1 or more rows')

rn <- tibble(seq(1, nrow(df)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seq_len() also works for the corner case of zero rows, which I'd like to support.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

R/rownames.R Outdated

rn <- tibble(seq(1, nrow(df)))
names(rn) <- var
new_df <- cbind(rn, df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code, and its equivalent in rownames_to_column(), really should be using add_column() via add_column(data, !!(var) := rn, ...) or perhaps with invoke(), see also ?rlang::invoke and ?"!!":

x <- "y"; tibble::tibble(!!(x) := 5)
#> # A tibble: 1 × 1
#>       y
#>   <dbl>
#> 1     5

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean something like: add_column(df, !!(var) := seq_len(nrow(df)), .before = 1)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

R/rownames.R Outdated
@@ -9,7 +9,9 @@
#' functions allow to you detect if a data frame has row names
#' (`has_rownames()`), remove them (`remove_rownames()`), or convert
#' them back-and-forth between an explicit column (`rownames_to_column()`
#' and `column_to_rownames()`).
#' and `column_to_rownames()`). Also included is a `rowid_to_column` which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add ().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@barnettjacob
Copy link
Author

OK, I've changed it to use addcolumn(). Also did the same for rownames_to_column.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great. Would you mind adding a test for the new function? I prefer stripping spaces at EOL, too.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think we're good to go after the next iteration.

@@ -26,6 +26,13 @@ test_that("rownames_to_column keeps the tbl classes (#882)", {
paste("There is a column named wt already!") )
})

test_that("rowid_to_column adds row names and errors if col name reused", {
res <- rowid_to_column( mtcars)
expect_true(ncol(res)==12)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the 12 come from?

@@ -26,6 +26,13 @@ test_that("rownames_to_column keeps the tbl classes (#882)", {
paste("There is a column named wt already!") )
})

test_that("rowid_to_column adds row names and errors if col name reused", {
res <- rowid_to_column( mtcars)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can adapt most of the tests for rownames_to_column() to the new function; in particular it seems important to test that the class is maintained (i.e., that it also works for data frames).

Unfortunately, that test isn't a particularly good example for good coding style; I prefer spaces around operators but not before/after parens, and an indent of two lines for function calls that occupy multiple lines, with nothing after the opening or before the closing paren. The full style guide is at http://style.tidyverse.org .

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks, will revisit.

@barnettjacob
Copy link
Author

Looking at the mtcars data I realised that the rowid_to_column function will remove row names if present (guessing because of add_column() but not 100%). Would you rather:

  • The function fail if the supplied df has row names.
  • Include an option where the user can choose to do rowname_to_column first.

@krlmlr
Copy link
Member

krlmlr commented May 5, 2017

Currently, add_column() removes row names even for plain data frames. I'm fine with this behavior also for rowid_to_column() if it's documented and tested.

@barnettjacob
Copy link
Author

Got it!

@barnettjacob
Copy link
Author

OK so I've:

  • Replicated exactly the tests for rowname_to_column to do the same thing for rowid_to_column
  • Tidied up the style as best I can :-)
  • Added a line to the documentation about the function removing row names.

@krlmlr krlmlr merged commit e8132b5 into tidyverse:master May 8, 2017
@krlmlr
Copy link
Member

krlmlr commented May 8, 2017

Thanks!

@barnettjacob
Copy link
Author

Awesome! Thanks for being so understanding and supportive! It was a fun and educational experience.

@barnettjacob barnettjacob deleted the rownum_function branch May 8, 2017 09:41
@krlmlr
Copy link
Member

krlmlr commented May 8, 2017

You're welcome, always happy to review PRs ;-)

krlmlr added a commit that referenced this pull request May 8, 2017
- New `rowid_to_column()` that adds a `rowid` column as first column and removes row names (#243, @barnettjacob).
- Use `rlang` functions (#244).
- The `all.equal.tbl_df()` method has been removed, calling `all.equal()` now forwards to `base::all.equal.data.frame()`. To compare tibbles ignoring row and column order, please use `dplyr::all_equal()` (#247).
- The `microbenchmark` package is now used conditionaly (#245).
- Subsetting zero columns no longer returns wrong number of rows (#241, @echasnovski).
- `tribble()` now handles values that have a class (#237, @NikNakk).
krlmlr added a commit that referenced this pull request May 17, 2017
- Subsetting zero columns no longer returns wrong number of rows (#241, @echasnovski).

- New `set_tidy_names()` and `tidy_names()`, a simpler version of `repair_names()` which works unchanged for now (#217).
- New `rowid_to_column()` that adds a `rowid` column as first column and removes row names (#243, @barnettjacob).
- The `all.equal.tbl_df()` method has been removed, calling `all.equal()` now forwards to `base::all.equal.data.frame()`. To compare tibbles ignoring row and column order, please use `dplyr::all_equal()` (#247).

- Printing now uses `x` again instead of the Unicode multiplication sign, to avoid encoding issues (#216).
- String values are now quoted when printing if they contain non-printable characters or quotes (#253).
- The `print()`, `format()`, and `tbl_sum()` methods are now implemented for class `"tbl"` and not for `"tbl_df"`. This allows subclasses to use tibble's formatting facilities. The formatting of the header can be tweaked by implementing `tbl_sum()` for the subclass, which is expected to return a named character vector. The `print.tbl_df()` method is still implemented for compatibility with downstream packages, but only calls `NextMethod()`.
- Own printing routine, not relying on `print.data.frame()` anymore. Now providing `format.tbl_df()` and full support for Unicode characters in names and data, also for `glimpse()` (#235).

- Improve formatting of error messages (#223).
- Using `rlang` instead of `lazyeval` (#225, @lionel-), and `rlang` functions (#244).
- `tribble()` now handles values that have a class (#237, @NikNakk).
- Minor efficiency gains by replacing `any(is.na())` with `anyNA()` (#229, @csgillespie).
- The `microbenchmark` package is now used conditionally (#245).
- `pkgdown` website.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants