Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add frame matrix #168

Merged
merged 6 commits into from Nov 30, 2016
Merged

Add frame matrix #168

merged 6 commits into from Nov 30, 2016

Conversation

anhqle
Copy link
Contributor

@anhqle anhqle commented Aug 30, 2016

I extract the code that parses tribble(...) arguments into extract_data(), which returns list(frame_names, frame_rest). Then I re-use extract_data() in both tribble() and frame_matrix(). I'd appreciate thoughts on this.

If this strategy / function names seem okay, I'll submit more PR for edge cases.

@codecov-io
Copy link

codecov-io commented Aug 30, 2016

Current coverage is 99.47% (diff: 97.67%)

Merging #168 into master will decrease coverage by 0.12%

@@             master       #168   diff @@
==========================================
  Files            16         16          
  Lines           740        757    +17   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            737        753    +16   
- Misses            3          4     +1   
  Partials          0          0          

Powered by Codecov. Last update 1e5adec...f9a5224

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good so far, I'd prefer if the code was split in even smaller functions.

frame_mat
}

extract_data <- function(...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name isn't very meaningful.

out <- rep(list(logical()), length(frame_names))
names(out) <- frame_names
return(as_tibble(out))
return(list(frame_names = frame_names, frame_rest = NULL))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the corner cases covered by the tests?

frame_mat
}

extract_data <- function(...) {
dots <- list(...)

# Extract the names.
frame_names <- character()
i <- 1
while (TRUE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind extracting the loop to a function?

# Create a tbl_df and return it
names(frame_col) <- frame_names
as_tibble(frame_col)
list(frame_names = frame_names, frame_rest = frame_rest)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error checking and the composition of the returned list could also form a separate function.

@krlmlr
Copy link
Member

krlmlr commented Oct 9, 2016

Oddly, GitHub includes my already committed changes to master in the diff. Could you please rebase against master?

@anhqle
Copy link
Contributor Author

anhqle commented Oct 10, 2016

Did you write some code for this yourselves? I pull rebase from master and saw some new code for edge cases. Just curious.

I think I've addressed all of your comments in the last PR (thanks as always!). Is there anything more I need to do to get Github's new "Requested changes" to go away?

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few nitpicks.

list(frame_names = frame_names, frame_rest = frame_rest)
}

extract_frame_names_from_dots <- function(dots) {
frame_names <- character()
i <- 1
while (TRUE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a for loop work here?

expect_equal(result, expected)
})

test_that("frame_matrix creates 0x0 matrix with column names when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double-check the test title.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really a 0x0 matrix that's created here?

frame_names <- extract_frame_names_from_dots(dots)

# Extract the data
if (length(frame_names) == length(dots)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this special case really necessary? The code might look cleaner if you just did frame_rest <- dots[-seq_along(frame_names)] .

dots <- list(...)

if (length(dots) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shortcut might become unnecessary, too.


if (is.null(data$frame_rest)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safer to check length() == 0 here.

@krlmlr
Copy link
Member

krlmlr commented Oct 11, 2016

I have no idea why GitHub is doing the things it does. Anyway, if you just do git rebase origin/master; git push -u LaDilettante HEAD --force on your branch (assuming origin is the upstream remote and LaDilettante is yours) the spurious changes should go away.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See last review.

@anhqle
Copy link
Contributor Author

anhqle commented Nov 22, 2016

My apology for being late with my follow-up -- I've been busying applying for jobs.

I'm committed to close this issue soon. Let me know if it needs more work.

@krlmlr
Copy link
Member

krlmlr commented Nov 22, 2016

Thanks. Could you please rebase your branch against current master, and force-push? GitHub seems to have a tough time isolating the changes from your PR from those that are already in master.

Anh Le added 3 commits November 22, 2016 13:57
- Test for frame_matrix() basic functionalities
- Extract the code that parses tribble(...) arguments. Re-use that code for both tribble and frame_matrix
@anhqle
Copy link
Contributor Author

anhqle commented Nov 27, 2016

@krlmlr Is this rebasing sufficient to your need?

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Would you mind doing another round?


if (length(dots) == 0) return(frame_names);

for (i in 1:length(dots)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can get rid of the shortcut above by iterating over seq_along(dots) instead of 1:... .

frame_rest <- dots[i:length(dots)]
n_elements <- length(frame_rest)
validate_rectangular_shape <- function(frame_names, frame_rest) {
if (length(frame_names) == 0 & length(frame_rest) == 0) return();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: && is a bit cleaner here.

test_that("frame_matrix constructs a matrix as expected", {
result <- frame_matrix(
~col1, ~col2,
10, 3,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind choosing two different values (3 vs. 3)?

frame_names <- extract_frame_names_from_dots(dots)

# Extract the data
if (length(frame_names) == 0 & length(dots) != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&&


if (length(data$frame_rest) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this special case by setting data$frame_rest to logical() ? The following works on my system:

matrix(logical(), ncol = 5)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later in extract_frame_data_from_dots(), frame_rest is elegantly returned as a list with frame_rest <- dots[-seq_along(frame_names)]. In the empty case, this returns frame_rest as list(). I think it's better to keep frame_list consistently as a list no matter empty or not.

If we remove the special case here, we have to add a special case in extract_frame_data_from_dots() so that frame_rest is logical().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you could do here:

if (length(data$frame_rest) == 0) {
  data$frame_rest <- logical()
}

I prefer to avoid shortcuts for edge cases if at all possible, because they are easy to overlook if the code needs to be edited.


frame_mat <- matrix(data$frame_rest, ncol = length(data$frame_names), byrow = TRUE)

frame_col <- lapply(seq_len(ncol(frame_mat)), function(i) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please extract everything after extract_frame_data_from_dots(...) to a separate function, and perhaps also function(i) {...} ?

Copy link
Contributor Author

@anhqle anhqle Nov 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but would appreciate more pointers about why. The implementation after extract_frame_data_from_dots() is very unique, so if we extract everything after it into a function, that function won't be reused anywhere but here. Note that the implementation of tribble and frame_matrix are quite different after extract_frame_data_from_dots(...), so I don't think any reuse is possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's based on the principles of "clean code" I learned from Robert C. Martin's book (not comprehensive):

  • Each function should do one thing (and do it well)
  • The purpose of the function should be obvious from its name
  • Prefer short pure functions with few arguments
  • All code in a function should operate on the same level of abstraction, so that the logic of each function is easy to understand

In addition, by extracting a function you could write a unit test specifically for that function, but that's currently not top priority.

frame_matrix <- function(...) {
data <- extract_frame_data_from_dots(...)

if (any(vapply(data$frame_rest, needs_list_col, logical(1)))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here -- please extract everything after extract_frame_data_from_dots(...) to a function.

@anhqle
Copy link
Contributor Author

anhqle commented Nov 28, 2016

@krlmlr I add a PR for the smaller stuff.

If I extract out the code after extract_frame_data_from_dots(...), those codes won't be reused anywhere else. I'm thus unsure about why we should do that. Could you explain a bit more about what your thinking behind requesting that extraction so that I can name the function / return appropriately?

@anhqle
Copy link
Contributor Author

anhqle commented Nov 29, 2016

@krlmlr Everything extracted out as requested.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, almost good to merge. Could you please fix, and also reorder the new functions:

  1. tribble(), its alias, and frame_matrix()
  2. extract_frame_data_from_dots() and its callees
  3. turn_frame_data_into_tibble() and callees
  4. turn_frame_data_into_matrix() and callees


if (length(dots) == 0) return(frame_names);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this shortcut?

#' \code{frame_names}-a character vector, and \code{frame_rest}-a list
turn_frame_data_into_tibble <- function(data) {
if (length(data$frame_rest) == 0) {
data$frame_rest <- matrix(logical(), ncol = length(data$frame_names))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be sufficient to set rest <- logical() ?

#'
#' @param data a list with 2 elements,
#' \code{frame_names}-a character vector, and \code{frame_rest}-a list
turn_frame_data_into_tibble <- function(data) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'll be easier to read if this function (and its matrix counterpart) take two arguments names and rest. We wouldn't need the documentation, too.

return(frame_col)
}

frame_matrix <- function(...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to export and document this function?

1. reorder functions
2. documentation for frame_matrix
3. other minor fixes
@anhqle
Copy link
Contributor Author

anhqle commented Nov 29, 2016

@krlmlr I moved things around as requested -- thanks for being patient with your guidance.

Two more things we may want to do

  • Should we use frame_matrix() while frame_data() will be phased out? Would mibble() be too obscure?
  • I realize that we should also allow frame_matrix() to take in row names. If you think it's a good idea, I can file an issue and pick this up in a month.

@krlmlr krlmlr merged commit 3075318 into tidyverse:master Nov 30, 2016
@krlmlr
Copy link
Member

krlmlr commented Nov 30, 2016

Thanks!

@hadley: Can you suggest an alternative name for frame_matrix() ? Are we going to keep the frame_data() alias?

mibble() has been suggested by @lionel-, too -- I'm fine with that. Open to row names / dim names extensions as described in #140.

krlmlr added a commit that referenced this pull request Nov 30, 2016
- New `frame_matrix()` (#140, #168, @LaDilettante).
- The `max.print` option is ignored when printing a tibble (#194, #195, @t-kalinowski).
- Fix typo in `obj_sum` documentation (#193, @etiennebr).
- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
- Now explicitly stating minimum Rcpp version 0.12.3.
krlmlr added a commit that referenced this pull request Apr 1, 2017
Bug fixes
=========

- Time series matrices (objects of class `mts` and `ts`) are now supported in `as_tibble()` (#184).
- The `all_equal()` function (called by `all.equal.tbl_df()`) now forwards to `dplyr` and fails with a helpful message if not installed. Data frames with list columns cannot be compared anymore, and differences in the declared class (`data.frame` vs. `tbl_df`) are ignored. The `all.equal.tbl_df()` method gives a warning and forwards to `NextMethod()` if `dplyr` is not installed; call `all.equal(as.data.frame(...), ...)` to avoid the warning. This ensures consistent behavior of this function, regardless if `dplyr` is loaded or not (#198).

Interface changes
=================

- Now requiring R 3.1.0 instead of R 3.1.3 (#189).
- Add `as.tibble()` as an alias to `as_tibble()` (#160, @LaDilettante).
- New `frame_matrix()`, similar to `frame_data()` but for matrices (#140, #168, @LaDilettante).
- New `deframe()` as reverse operation to `enframe()` (#146, #214).
- Removed unused dependency on `assertthat`.

Features
========

General
-------

- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
- Singular and plural variants for error messages that mention a list of objects (#116, #138, @LaDilettante).
- `add_column()` can add columns of length 1 (#162, #164, @LaDilettante).

Input validation
----------------

- An attempt to read or update a missing column now throws a clearer warning (#199).
- An attempt to call `add_row()` for a grouped data frame results in a helpful error message (#179).

Printing
--------

- Render Unicode multiplication sign as `x` if it cannot be represented in the current locale (#192, @ncarchedi).
- Backtick `NA` names in printing (#206, #207, @jennybc).
- `glimpse()` now uses `type_sum()` also for S3 objects (#185, #186, @holstius).
- The `max.print` option is ignored when printing a tibble (#194, #195, @t-kalinowski).

Documentation
=============

- Fix typo in `obj_sum` documentation (#193, @etiennebr).
- Reword documentation for `tribble()` (#191, @kwstat).
- Now explicitly stating minimum Rcpp version 0.12.3.

Internal
========

- Using registration of native routines.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants