New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grammar fix index es #138

Merged
merged 2 commits into from Aug 30, 2016

Conversation

Projects
None yet
3 participants
@anhqle
Contributor

anhqle commented Aug 2, 2016

Resolve #116

Test cases modified and passed

@codecov-io

This comment has been minimized.

codecov-io commented Aug 2, 2016

Current coverage is 99.58% (diff: 100%)

Merging #138 into master will increase coverage by 0.14%

@@             master       #138   diff @@
==========================================
  Files            16         16          
  Lines           716        727    +11   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            712        724    +12   
+ Misses            4          3     -1   
  Partials          0          0          

Powered by Codecov. Last update 5feb38d...7859a5f

@@ -301,7 +301,7 @@ add_row <- function(.data, ...) {
extra_vars <- setdiff(names(df), names(.data))
if (length(extra_vars) > 0) {
stopc(
"This row would add new variables: ", format_n(extra_vars)
"This row would add new variable(s): ", format_n(extra_vars)

This comment has been minimized.

@krlmlr

krlmlr Aug 2, 2016

Member

Grammar is still wrong here...

@krlmlr

This comment has been minimized.

Member

krlmlr commented Aug 2, 2016

Thanks for the pull request. Correct grammar is a good thing, but the parentheses tend to distract me from the actual contents of the message. I'd be more inclined to review a pull request that prints correct grammar without parentheses, see #116 (comment) for an sketch.

@anhqle

This comment has been minimized.

Contributor

anhqle commented Aug 2, 2016

Noted and will fix. I thought the parentheses were literally what you want in #116

@krlmlr

This comment has been minimized.

Member

krlmlr commented Aug 2, 2016

Thanks, my comment was a bit unclear indeed.

@anhqle

This comment has been minimized.

Contributor

anhqle commented Aug 2, 2016

To get correct grammar I use ngettext, which admittedly obscures the error message a little. As always I would appreciate any feedback on anything.

@krlmlr

This comment has been minimized.

Member

krlmlr commented Aug 2, 2016

ngettext() doesn't seem to be a good choice here, it works best with NLS support. I was rather thinking about something like the following to keep the code clean:

format_n <- function(prefix, objects) {
  paste0(pluralise(prefix, length(objects)), quote_n(objects))
}

pluralise("object(s)", 1)
## object
pluralise("object(s)", 2)
## objects
pluralise("[an] index(es)", 1)
## an index
pluralise("[an] index(es)", 2)
## indexes
@anhqle

This comment has been minimized.

Contributor

anhqle commented Aug 13, 2016

I have a PR ready, but concerned about the design decision.

  1. format_n are used elsewhere besides check_names_df, (e.g. all-equal.R). There will be broken code if we change format_n 's number of args from 1 to 2 as in

    format_n <- function(prefix, objects) {
      paste0(pluralise(prefix, length(objects)), quote_n(objects))
    }
    
  2. IMO the message fed to stopc is not easily comprehensible, as follows:

    if (any(non_integer)) {
        stopc("Invalid non-integer column ", format_n("index", j[non_integer]))
      }
    
  3. It seems anti-UNIX philosophy that format_n now does not only format a number (great!) but also pluralises words (not so great).

Solution: If you are only concerned about the portability of ngettext and NLS, I can write an ad-hoc ngettext for us. If you have other concerns, please advise. Thank you!

@krlmlr

This comment has been minimized.

Member

krlmlr commented Aug 14, 2016

Thanks.

  1. I wasn't aware of that. Can we create a new function which then calls format_n() and pluralise() (from #138 (comment))?

    xxx <- function(prefix, objects) {
      paste0(pluralise(prefix, objects), format_n(objects))
    }
  2. I'd rather write stopc(xxx("Invalid non-integer column index(es)", j[non_integer]))

  3. IMO all callees of xxx() are at the same level of abstraction and this is not too much for a single function to do. I'm unsure if pluralise() should accept a vector or an integer scalar.

@anhqle anhqle force-pushed the anhqle:grammar-fix-index-es branch 2 times, most recently from f7c4c3e to e7aa6e9 Aug 23, 2016

@@ -77,9 +75,10 @@ check_names_before_after.character <- function(j, names) {
check_needs_no_dim(j)

pos <- safe_match(j, names)
browser()

This comment has been minimized.

@krlmlr

krlmlr Aug 26, 2016

Member

Can you please remove this?

stopifnot(length(objects) > 0)
if (length(objects) == 1) {
# strip [, remove everything within ()
gsub("\\[|\\]|\\([^\\)]+\\)", "", message)

This comment has been minimized.

@krlmlr

krlmlr Aug 26, 2016

Member

Can we match only pairs of opening and closing parens/brackets that don't have spaces inside, e.g.

"[[]([^] ]+)]" -> "\\1"
"[(]([^) ]+)[)]" -> ""

Example:

> gsub("[[]([^] ]+)]", "\\1", "[an] index(es)")
[1] "an index(es)"
> gsub("[(]([^) ]+)[)]", "", "[an] index(es)")
[1] "[an] index"

This comment has been minimized.

@anhqle

anhqle Aug 26, 2016

Contributor

I'm not sure I follow the examples or understand the use case, but I did follow your instruction. Now if pairs of parens / brackets have spaces inside, the regex leaves them and their contents alone.

NAMESPACE Outdated
@@ -24,7 +24,6 @@ S3method(as_tibble,tbl_df)
S3method(check_names_before_after,character)
S3method(check_names_before_after,default)
S3method(check_names_df,character)
S3method(check_names_df,default)

This comment has been minimized.

@krlmlr

krlmlr Aug 26, 2016

Member

Why did this go?

Add correct plural/singular error message. Test cases included.
leave parentheses with spaces inside alone

@anhqle anhqle force-pushed the anhqle:grammar-fix-index-es branch from e7aa6e9 to 61569b9 Aug 26, 2016

gsub("\\[|\\]|\\([^\\) ]+\\)", "", message, perl = TRUE)
} else {
# strip (, remove everything within []
gsub("\\(|\\)|\\[[^\\] ]+\\]\\s*", "", message, perl = TRUE)

This comment has been minimized.

@anhqle

anhqle Aug 26, 2016

Contributor

Note the added in the regex. If parentheses / brackets contain spaces inside, we don't remove them or their content.

test_that("pluralise leaves alone parentheses / square brackets that have spaces inside", {
expect_identical(pluralise("[an] index(es )", c("x")), "an index(es )")
expect_identical(pluralise("[an ] index(es)", c("x", "y")), "[an ] indexes")
})

This comment has been minimized.

@anhqle

anhqle Aug 26, 2016

Contributor

I added a test about leaving parens/brackets with spaces alone. I didn't understand the use case, so the test examples may be dumb. I hope I got your ideas right.

This comment has been minimized.

@krlmlr

krlmlr Aug 29, 2016

Member

These special characters may be part of the error message.

What does pluralise("[an ] index(es )", c("x")) return with your implementation?

This comment has been minimized.

@anhqle

anhqle Aug 29, 2016

Contributor

pluralise("[an ] index(es )", c("x")) returns "an index(es )".

I think I see what you want now. If parens / brackets have spaces inside, they are probably part of the error message and should be left alone. For example,

"[an] invalid index(es) (be careful!) [for real]"
-> singularize: "an invalid index (be careful!) [for real]"
-> pluralise: "invalid indexes (be careful!) [for real]"

@krlmlr krlmlr merged commit 76f3a01 into tidyverse:master Aug 30, 2016

3 checks passed

codecov/project 99.58% (+0.14%) compared to 5feb38d
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@krlmlr

This comment has been minimized.

Member

krlmlr commented Aug 30, 2016

Awesome, thanks!

krlmlr added a commit that referenced this pull request Aug 30, 2016

Merge tag 'v1.2-12'
- Simplify tests for `add_row()` and `add_column()` (#165, #166, @LaDilettante).
- `add_column()` can add columns of length 1 (#162, #164, @LaDilettante).
- Singular and plural variants for error messages that mention a list of objects (#116, #138, @LaDilettante).

krlmlr added a commit that referenced this pull request Apr 1, 2017

Merge tag 'v1.3.0'
Bug fixes
=========

- Time series matrices (objects of class `mts` and `ts`) are now supported in `as_tibble()` (#184).
- The `all_equal()` function (called by `all.equal.tbl_df()`) now forwards to `dplyr` and fails with a helpful message if not installed. Data frames with list columns cannot be compared anymore, and differences in the declared class (`data.frame` vs. `tbl_df`) are ignored. The `all.equal.tbl_df()` method gives a warning and forwards to `NextMethod()` if `dplyr` is not installed; call `all.equal(as.data.frame(...), ...)` to avoid the warning. This ensures consistent behavior of this function, regardless if `dplyr` is loaded or not (#198).

Interface changes
=================

- Now requiring R 3.1.0 instead of R 3.1.3 (#189).
- Add `as.tibble()` as an alias to `as_tibble()` (#160, @LaDilettante).
- New `frame_matrix()`, similar to `frame_data()` but for matrices (#140, #168, @LaDilettante).
- New `deframe()` as reverse operation to `enframe()` (#146, #214).
- Removed unused dependency on `assertthat`.

Features
========

General
-------

- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
- Singular and plural variants for error messages that mention a list of objects (#116, #138, @LaDilettante).
- `add_column()` can add columns of length 1 (#162, #164, @LaDilettante).

Input validation
----------------

- An attempt to read or update a missing column now throws a clearer warning (#199).
- An attempt to call `add_row()` for a grouped data frame results in a helpful error message (#179).

Printing
--------

- Render Unicode multiplication sign as `x` if it cannot be represented in the current locale (#192, @ncarchedi).
- Backtick `NA` names in printing (#206, #207, @jennybc).
- `glimpse()` now uses `type_sum()` also for S3 objects (#185, #186, @holstius).
- The `max.print` option is ignored when printing a tibble (#194, #195, @t-kalinowski).

Documentation
=============

- Fix typo in `obj_sum` documentation (#193, @etiennebr).
- Reword documentation for `tribble()` (#191, @kwstat).
- Now explicitly stating minimum Rcpp version 0.12.3.

Internal
========

- Using registration of native routines.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment