Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rowname to column as numeric func #243

Merged
merged 8 commits into from May 8, 2017

Conversation

@barnettjacob
Copy link

@barnettjacob barnettjacob commented May 4, 2017

Adds a separate function called rownames_to_column_numeric which outputs the rownames as numeric where they are legitimate numbers.

Function could conceivably be called something slightly less verbose (but less accurate) such as rownum_to_column.

Copy link
Member

@krlmlr krlmlr left a comment

Thanks. Looks good already, and timely for the upcoming CRAN release!

rn <- tibble(rownames(df))
names(rn) <- var

if(mean(grepl('[0-9]*', rn[[var]])) == 1){

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017
Member

Is this still important?

@@ -65,6 +65,31 @@ rownames_to_column <- function(df, var = "rowname") {
new_df
}

rownames_to_column_numeric <- function(df, var = "rowname") {

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017
Member

How about rowid_to_column()?

@@ -65,6 +65,31 @@ rownames_to_column <- function(df, var = "rowname") {
new_df
}

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017
Member

The function needs to be documented and exported. I think documentation can go to the same page as that for rownames_to_column().

This comment has been minimized.

@barnettjacob

barnettjacob May 4, 2017
Author

Ok, I'll have another go. Agree with your comments about the function itself, will make some changes!

new_df <- c(rn, df)
attribs[["names"]] <- names(new_df)

attributes(new_df) <- attribs[names(attribs) != "row.names"]

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017
Member

I'm not sure this function needs to care about row names at all.

@krlmlr
Copy link
Member

@krlmlr krlmlr commented May 4, 2017

I forgot to mention: You can push or even force-push to the same branch, no need to reopen a new pull request.

@codecov-io
Copy link

@codecov-io codecov-io commented May 4, 2017

Codecov Report

Merging #243 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #243      +/-   ##
==========================================
- Coverage   89.48%   89.47%   -0.02%     
==========================================
  Files          21       21              
  Lines         875      874       -1     
==========================================
- Hits          783      782       -1     
  Misses         92       92
Impacted Files Coverage Δ
R/rownames.R 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08af6b0...ca94893. Read the comment docs.

Copy link
Member

@krlmlr krlmlr left a comment

Thanks. Using add_column() would be great, but I can also do it myself.

if(nrow(df) == 0)
stopc('Data frame must have 1 or more rows')

rn <- tibble(seq(1, nrow(df)))

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017
Member

seq_len() also works for the corner case of zero rows, which I'd like to support.

This comment has been minimized.


rn <- tibble(seq(1, nrow(df)))
names(rn) <- var
new_df <- cbind(rn, df)

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017
Member

This code, and its equivalent in rownames_to_column(), really should be using add_column() via add_column(data, !!(var) := rn, ...) or perhaps with invoke(), see also ?rlang::invoke and ?"!!":

x <- "y"; tibble::tibble(!!(x) := 5)
#> # A tibble: 1 × 1
#>       y
#>   <dbl>
#> 1     5

This comment has been minimized.

@barnettjacob

barnettjacob May 4, 2017
Author

you mean something like: add_column(df, !!(var) := seq_len(nrow(df)), .before = 1)?

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017
Member

Yes.

@@ -9,7 +9,9 @@
#' functions allow to you detect if a data frame has row names
#' (`has_rownames()`), remove them (`remove_rownames()`), or convert
#' them back-and-forth between an explicit column (`rownames_to_column()`
#' and `column_to_rownames()`).
#' and `column_to_rownames()`). Also included is a `rowid_to_column` which

This comment has been minimized.

@krlmlr

krlmlr May 4, 2017
Member

Please add ().

This comment has been minimized.

@barnettjacob
Copy link
Author

@barnettjacob barnettjacob commented May 5, 2017

OK, I've changed it to use addcolumn(). Also did the same for rownames_to_column.

Copy link
Member

@krlmlr krlmlr left a comment

Thanks, looks great. Would you mind adding a test for the new function? I prefer stripping spaces at EOL, too.

Copy link
Member

@krlmlr krlmlr left a comment

Thanks, I think we're good to go after the next iteration.

@@ -26,6 +26,13 @@ test_that("rownames_to_column keeps the tbl classes (#882)", {
paste("There is a column named wt already!") )
})

test_that("rowid_to_column adds row names and errors if col name reused", {
res <- rowid_to_column( mtcars)
expect_true(ncol(res)==12)

This comment has been minimized.

@krlmlr

krlmlr May 5, 2017
Member

Where does the 12 come from?

@@ -26,6 +26,13 @@ test_that("rownames_to_column keeps the tbl classes (#882)", {
paste("There is a column named wt already!") )
})

test_that("rowid_to_column adds row names and errors if col name reused", {
res <- rowid_to_column( mtcars)

This comment has been minimized.

@krlmlr

krlmlr May 5, 2017
Member

I think you can adapt most of the tests for rownames_to_column() to the new function; in particular it seems important to test that the class is maintained (i.e., that it also works for data frames).

Unfortunately, that test isn't a particularly good example for good coding style; I prefer spaces around operators but not before/after parens, and an indent of two lines for function calls that occupy multiple lines, with nothing after the opening or before the closing paren. The full style guide is at http://style.tidyverse.org .

This comment has been minimized.

@barnettjacob

barnettjacob May 5, 2017
Author

ok, thanks, will revisit.

@barnettjacob
Copy link
Author

@barnettjacob barnettjacob commented May 5, 2017

Looking at the mtcars data I realised that the rowid_to_column function will remove row names if present (guessing because of add_column() but not 100%). Would you rather:

  • The function fail if the supplied df has row names.
  • Include an option where the user can choose to do rowname_to_column first.
@krlmlr
Copy link
Member

@krlmlr krlmlr commented May 5, 2017

Currently, add_column() removes row names even for plain data frames. I'm fine with this behavior also for rowid_to_column() if it's documented and tested.

@barnettjacob
Copy link
Author

@barnettjacob barnettjacob commented May 5, 2017

Got it!

@barnettjacob
Copy link
Author

@barnettjacob barnettjacob commented May 6, 2017

OK so I've:

  • Replicated exactly the tests for rowname_to_column to do the same thing for rowid_to_column
  • Tidied up the style as best I can :-)
  • Added a line to the documentation about the function removing row names.
@krlmlr krlmlr merged commit e8132b5 into tidyverse:master May 8, 2017
1 of 2 checks passed
1 of 2 checks passed
continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@krlmlr
Copy link
Member

@krlmlr krlmlr commented May 8, 2017

Thanks!

@barnettjacob
Copy link
Author

@barnettjacob barnettjacob commented May 8, 2017

Awesome! Thanks for being so understanding and supportive! It was a fun and educational experience.

@barnettjacob barnettjacob deleted the barnettjacob:rownum_function branch May 8, 2017
@krlmlr
Copy link
Member

@krlmlr krlmlr commented May 8, 2017

You're welcome, always happy to review PRs ;-)

krlmlr added a commit that referenced this pull request May 8, 2017
- New `rowid_to_column()` that adds a `rowid` column as first column and removes row names (#243, @barnettjacob).
- Use `rlang` functions (#244).
- The `all.equal.tbl_df()` method has been removed, calling `all.equal()` now forwards to `base::all.equal.data.frame()`. To compare tibbles ignoring row and column order, please use `dplyr::all_equal()` (#247).
- The `microbenchmark` package is now used conditionaly (#245).
- Subsetting zero columns no longer returns wrong number of rows (#241, @echasnovski).
- `tribble()` now handles values that have a class (#237, @NikNakk).
krlmlr added a commit that referenced this pull request May 17, 2017
- Subsetting zero columns no longer returns wrong number of rows (#241, @echasnovski).

- New `set_tidy_names()` and `tidy_names()`, a simpler version of `repair_names()` which works unchanged for now (#217).
- New `rowid_to_column()` that adds a `rowid` column as first column and removes row names (#243, @barnettjacob).
- The `all.equal.tbl_df()` method has been removed, calling `all.equal()` now forwards to `base::all.equal.data.frame()`. To compare tibbles ignoring row and column order, please use `dplyr::all_equal()` (#247).

- Printing now uses `x` again instead of the Unicode multiplication sign, to avoid encoding issues (#216).
- String values are now quoted when printing if they contain non-printable characters or quotes (#253).
- The `print()`, `format()`, and `tbl_sum()` methods are now implemented for class `"tbl"` and not for `"tbl_df"`. This allows subclasses to use tibble's formatting facilities. The formatting of the header can be tweaked by implementing `tbl_sum()` for the subclass, which is expected to return a named character vector. The `print.tbl_df()` method is still implemented for compatibility with downstream packages, but only calls `NextMethod()`.
- Own printing routine, not relying on `print.data.frame()` anymore. Now providing `format.tbl_df()` and full support for Unicode characters in names and data, also for `glimpse()` (#235).

- Improve formatting of error messages (#223).
- Using `rlang` instead of `lazyeval` (#225, @lionel-), and `rlang` functions (#244).
- `tribble()` now handles values that have a class (#237, @NikNakk).
- Minor efficiency gains by replacing `any(is.na())` with `anyNA()` (#229, @csgillespie).
- The `microbenchmark` package is now used conditionally (#245).
- `pkgdown` website.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants