New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add frame matrix #168

Merged
merged 6 commits into from Nov 30, 2016

Conversation

Projects
None yet
3 participants
@anhqle
Contributor

anhqle commented Aug 30, 2016

I extract the code that parses tribble(...) arguments into extract_data(), which returns list(frame_names, frame_rest). Then I re-use extract_data() in both tribble() and frame_matrix(). I'd appreciate thoughts on this.

If this strategy / function names seem okay, I'll submit more PR for edge cases.

@codecov-io

This comment has been minimized.

codecov-io commented Aug 30, 2016

Current coverage is 99.47% (diff: 97.67%)

Merging #168 into master will decrease coverage by 0.12%

@@             master       #168   diff @@
==========================================
  Files            16         16          
  Lines           740        757    +17   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            737        753    +16   
- Misses            3          4     +1   
  Partials          0          0          

Powered by Codecov. Last update 1e5adec...f9a5224

@krlmlr

Thanks! Looks good so far, I'd prefer if the code was split in even smaller functions.

R/tribble.R Outdated
frame_mat
}
extract_data <- function(...) {

This comment has been minimized.

@krlmlr

krlmlr Oct 3, 2016

Member

The function name isn't very meaningful.

R/tribble.R Outdated
out <- rep(list(logical()), length(frame_names))
names(out) <- frame_names
return(as_tibble(out))
return(list(frame_names = frame_names, frame_rest = NULL))

This comment has been minimized.

@krlmlr

krlmlr Oct 3, 2016

Member

Are the corner cases covered by the tests?

R/tribble.R Outdated
frame_mat
}
extract_data <- function(...) {
dots <- list(...)
# Extract the names.
frame_names <- character()
i <- 1
while (TRUE) {

This comment has been minimized.

@krlmlr

krlmlr Oct 3, 2016

Member

Would you mind extracting the loop to a function?

R/tribble.R Outdated
# Create a tbl_df and return it
names(frame_col) <- frame_names
as_tibble(frame_col)
list(frame_names = frame_names, frame_rest = frame_rest)

This comment has been minimized.

@krlmlr

krlmlr Oct 3, 2016

Member

The error checking and the composition of the returned list could also form a separate function.

@krlmlr

This comment has been minimized.

Member

krlmlr commented Oct 9, 2016

Oddly, GitHub includes my already committed changes to master in the diff. Could you please rebase against master?

@anhqle

This comment has been minimized.

Contributor

anhqle commented Oct 10, 2016

Did you write some code for this yourselves? I pull rebase from master and saw some new code for edge cases. Just curious.

I think I've addressed all of your comments in the last PR (thanks as always!). Is there anything more I need to do to get Github's new "Requested changes" to go away?

@krlmlr

krlmlr approved these changes Oct 11, 2016

Looks good, just a few nitpicks.

R/tribble.R Outdated
list(frame_names = frame_names, frame_rest = frame_rest)
}
extract_frame_names_from_dots <- function(dots) {
frame_names <- character()
i <- 1
while (TRUE) {

This comment has been minimized.

@krlmlr

krlmlr Oct 11, 2016

Member

Would a for loop work here?

expect_equal(result, expected)
})
test_that("frame_matrix creates 0x0 matrix with column names when

This comment has been minimized.

@krlmlr

krlmlr Oct 11, 2016

Member

Please double-check the test title.

This comment has been minimized.

@krlmlr

krlmlr Nov 22, 2016

Member

Is it really a 0x0 matrix that's created here?

R/tribble.R Outdated
frame_names <- extract_frame_names_from_dots(dots)
# Extract the data
if (length(frame_names) == length(dots)) {

This comment has been minimized.

@krlmlr

krlmlr Oct 11, 2016

Member

Is this special case really necessary? The code might look cleaner if you just did frame_rest <- dots[-seq_along(frame_names)] .

R/tribble.R Outdated
dots <- list(...)
if (length(dots) == 0) {

This comment has been minimized.

@krlmlr

krlmlr Oct 11, 2016

Member

This shortcut might become unnecessary, too.

R/tribble.R Outdated
if (is.null(data$frame_rest)) {

This comment has been minimized.

@krlmlr

krlmlr Oct 11, 2016

Member

Safer to check length() == 0 here.

@krlmlr

This comment has been minimized.

Member

krlmlr commented Oct 11, 2016

I have no idea why GitHub is doing the things it does. Anyway, if you just do git rebase origin/master; git push -u LaDilettante HEAD --force on your branch (assuming origin is the upstream remote and LaDilettante is yours) the spurious changes should go away.

@krlmlr

See last review.

@anhqle

This comment has been minimized.

Contributor

anhqle commented Nov 22, 2016

My apology for being late with my follow-up -- I've been busying applying for jobs.

I'm committed to close this issue soon. Let me know if it needs more work.

@krlmlr

This comment has been minimized.

Member

krlmlr commented Nov 22, 2016

Thanks. Could you please rebase your branch against current master, and force-push? GitHub seems to have a tough time isolating the changes from your PR from those that are already in master.

anhqle added some commits Aug 30, 2016

Resolve #140.
- Test for frame_matrix() basic functionalities
- Extract the code that parses tribble(...) arguments. Re-use that code for both tribble and frame_matrix

@anhqle anhqle force-pushed the anhqle:add-frame_matrix branch from f661a52 to e911285 Nov 22, 2016

@anhqle

This comment has been minimized.

Contributor

anhqle commented Nov 27, 2016

@krlmlr Is this rebasing sufficient to your need?

@krlmlr

Thanks. Would you mind doing another round?

R/tribble.R Outdated
if (length(dots) == 0) return(frame_names);
for (i in 1:length(dots)) {

This comment has been minimized.

@krlmlr

krlmlr Nov 27, 2016

Member

I think you can get rid of the shortcut above by iterating over seq_along(dots) instead of 1:... .

R/tribble.R Outdated
frame_rest <- dots[i:length(dots)]
n_elements <- length(frame_rest)
validate_rectangular_shape <- function(frame_names, frame_rest) {
if (length(frame_names) == 0 & length(frame_rest) == 0) return();

This comment has been minimized.

@krlmlr

krlmlr Nov 27, 2016

Member

Minor nit: && is a bit cleaner here.

test_that("frame_matrix constructs a matrix as expected", {
result <- frame_matrix(
~col1, ~col2,
10, 3,

This comment has been minimized.

@krlmlr

krlmlr Nov 27, 2016

Member

Would you mind choosing two different values (3 vs. 3)?

R/tribble.R Outdated
frame_names <- extract_frame_names_from_dots(dots)
# Extract the data
if (length(frame_names) == 0 & length(dots) != 0) {

This comment has been minimized.

@krlmlr
R/tribble.R Outdated
if (length(data$frame_rest) == 0) {

This comment has been minimized.

@krlmlr

krlmlr Nov 27, 2016

Member

Can we remove this special case by setting data$frame_rest to logical() ? The following works on my system:

matrix(logical(), ncol = 5)

This comment has been minimized.

@anhqle

anhqle Nov 28, 2016

Contributor

Later in extract_frame_data_from_dots(), frame_rest is elegantly returned as a list with frame_rest <- dots[-seq_along(frame_names)]. In the empty case, this returns frame_rest as list(). I think it's better to keep frame_list consistently as a list no matter empty or not.

If we remove the special case here, we have to add a special case in extract_frame_data_from_dots() so that frame_rest is logical().

This comment has been minimized.

@krlmlr

krlmlr Nov 28, 2016

Member

But you could do here:

if (length(data$frame_rest) == 0) {
  data$frame_rest <- logical()
}

I prefer to avoid shortcuts for edge cases if at all possible, because they are easy to overlook if the code needs to be edited.

R/tribble.R Outdated
frame_mat <- matrix(data$frame_rest, ncol = length(data$frame_names), byrow = TRUE)
frame_col <- lapply(seq_len(ncol(frame_mat)), function(i) {

This comment has been minimized.

@krlmlr

krlmlr Nov 27, 2016

Member

Could you please extract everything after extract_frame_data_from_dots(...) to a separate function, and perhaps also function(i) {...} ?

This comment has been minimized.

@anhqle

anhqle Nov 28, 2016

Contributor

I could, but would appreciate more pointers about why. The implementation after extract_frame_data_from_dots() is very unique, so if we extract everything after it into a function, that function won't be reused anywhere but here. Note that the implementation of tribble and frame_matrix are quite different after extract_frame_data_from_dots(...), so I don't think any reuse is possible.

This comment has been minimized.

@krlmlr

krlmlr Nov 28, 2016

Member

It's based on the principles of "clean code" I learned from Robert C. Martin's book (not comprehensive):

  • Each function should do one thing (and do it well)
  • The purpose of the function should be obvious from its name
  • Prefer short pure functions with few arguments
  • All code in a function should operate on the same level of abstraction, so that the logic of each function is easy to understand

In addition, by extracting a function you could write a unit test specifically for that function, but that's currently not top priority.

R/tribble.R Outdated
frame_matrix <- function(...) {
data <- extract_frame_data_from_dots(...)
if (any(vapply(data$frame_rest, needs_list_col, logical(1)))) {

This comment has been minimized.

@krlmlr

krlmlr Nov 27, 2016

Member

Same here -- please extract everything after extract_frame_data_from_dots(...) to a function.

@anhqle

This comment has been minimized.

Contributor

anhqle commented Nov 28, 2016

@krlmlr I add a PR for the smaller stuff.

If I extract out the code after extract_frame_data_from_dots(...), those codes won't be reused anywhere else. I'm thus unsure about why we should do that. Could you explain a bit more about what your thinking behind requesting that extraction so that I can name the function / return appropriately?

@anhqle

This comment has been minimized.

Contributor

anhqle commented Nov 29, 2016

@krlmlr Everything extracted out as requested.

@krlmlr

Thanks, almost good to merge. Could you please fix, and also reorder the new functions:

  1. tribble(), its alias, and frame_matrix()
  2. extract_frame_data_from_dots() and its callees
  3. turn_frame_data_into_tibble() and callees
  4. turn_frame_data_into_matrix() and callees
R/tribble.R Outdated
if (length(dots) == 0) return(frame_names);

This comment has been minimized.

@krlmlr

krlmlr Nov 29, 2016

Member

Do we still need this shortcut?

R/tribble.R Outdated
#' \code{frame_names}-a character vector, and \code{frame_rest}-a list
turn_frame_data_into_tibble <- function(data) {
if (length(data$frame_rest) == 0) {
data$frame_rest <- matrix(logical(), ncol = length(data$frame_names))

This comment has been minimized.

@krlmlr

krlmlr Nov 29, 2016

Member

Would it be sufficient to set rest <- logical() ?

R/tribble.R Outdated
#'
#' @param data a list with 2 elements,
#' \code{frame_names}-a character vector, and \code{frame_rest}-a list
turn_frame_data_into_tibble <- function(data) {

This comment has been minimized.

@krlmlr

krlmlr Nov 29, 2016

Member

I think it'll be easier to read if this function (and its matrix counterpart) take two arguments names and rest. We wouldn't need the documentation, too.

R/tribble.R Outdated
return(frame_col)
}
frame_matrix <- function(...) {

This comment has been minimized.

@krlmlr

krlmlr Nov 29, 2016

Member

Did you mean to export and document this function?

final touches
1. reorder functions
2. documentation for frame_matrix
3. other minor fixes
@anhqle

This comment has been minimized.

Contributor

anhqle commented Nov 29, 2016

@krlmlr I moved things around as requested -- thanks for being patient with your guidance.

Two more things we may want to do

  • Should we use frame_matrix() while frame_data() will be phased out? Would mibble() be too obscure?
  • I realize that we should also allow frame_matrix() to take in row names. If you think it's a good idea, I can file an issue and pick this up in a month.

@krlmlr krlmlr merged commit 3075318 into tidyverse:master Nov 30, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@krlmlr

This comment has been minimized.

Member

krlmlr commented Nov 30, 2016

Thanks!

@hadley: Can you suggest an alternative name for frame_matrix() ? Are we going to keep the frame_data() alias?

mibble() has been suggested by @lionel-, too -- I'm fine with that. Open to row names / dim names extensions as described in #140.

krlmlr added a commit that referenced this pull request Nov 30, 2016

Merge tag 'v1.2-13'
- New `frame_matrix()` (#140, #168, @LaDilettante).
- The `max.print` option is ignored when printing a tibble (#194, #195, @t-kalinowski).
- Fix typo in `obj_sum` documentation (#193, @etiennebr).
- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
- Now explicitly stating minimum Rcpp version 0.12.3.

krlmlr added a commit that referenced this pull request Apr 1, 2017

Merge tag 'v1.3.0'
Bug fixes
=========

- Time series matrices (objects of class `mts` and `ts`) are now supported in `as_tibble()` (#184).
- The `all_equal()` function (called by `all.equal.tbl_df()`) now forwards to `dplyr` and fails with a helpful message if not installed. Data frames with list columns cannot be compared anymore, and differences in the declared class (`data.frame` vs. `tbl_df`) are ignored. The `all.equal.tbl_df()` method gives a warning and forwards to `NextMethod()` if `dplyr` is not installed; call `all.equal(as.data.frame(...), ...)` to avoid the warning. This ensures consistent behavior of this function, regardless if `dplyr` is loaded or not (#198).

Interface changes
=================

- Now requiring R 3.1.0 instead of R 3.1.3 (#189).
- Add `as.tibble()` as an alias to `as_tibble()` (#160, @LaDilettante).
- New `frame_matrix()`, similar to `frame_data()` but for matrices (#140, #168, @LaDilettante).
- New `deframe()` as reverse operation to `enframe()` (#146, #214).
- Removed unused dependency on `assertthat`.

Features
========

General
-------

- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
- Singular and plural variants for error messages that mention a list of objects (#116, #138, @LaDilettante).
- `add_column()` can add columns of length 1 (#162, #164, @LaDilettante).

Input validation
----------------

- An attempt to read or update a missing column now throws a clearer warning (#199).
- An attempt to call `add_row()` for a grouped data frame results in a helpful error message (#179).

Printing
--------

- Render Unicode multiplication sign as `x` if it cannot be represented in the current locale (#192, @ncarchedi).
- Backtick `NA` names in printing (#206, #207, @jennybc).
- `glimpse()` now uses `type_sum()` also for S3 objects (#185, #186, @holstius).
- The `max.print` option is ignored when printing a tibble (#194, #195, @t-kalinowski).

Documentation
=============

- Fix typo in `obj_sum` documentation (#193, @etiennebr).
- Reword documentation for `tribble()` (#191, @kwstat).
- Now explicitly stating minimum Rcpp version 0.12.3.

Internal
========

- Using registration of native routines.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment