Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to implement $<-.tbl_df for clearer warning when trying a partial update of a missing column #199

Closed
sainathadapa opened this issue Nov 29, 2016 · 4 comments

Comments

@sainathadapa
Copy link

@sainathadapa sainathadapa commented Nov 29, 2016

This question is related to this stackoverflow question: http://stackoverflow.com/questions/39041115/fixing-a-multiple-warning-unknown-column

The following code produces a warning: (from stackoverflow answer)

library(tibble)
tibble_df <- tibble(id = c(1:3), name = c("mary", "jill","steve"))
tibble_df$test <- 'aa'
tibble_df$age[tibble_df$name == "mary"] <- 47
> Warning message:
> Unknown column 'age'

The warning arises ultimately from the check_names_df function, which is called within the $.tbl_df function. From what I understand the $.tbl_df, $<-.data.frame functions are being called in that order for the last step in the above example.

My questions are:

  1. Why is $<-.data.frame not being called directly since there is no $<-.tbl_df?
  2. Whenever df$col2[df$col1 == 'something'] <- val style is used, a warning is produced now. Do you think this style should be discouraged, since the warning is basically indirectly asking people to not use style like this.
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Dec 1, 2016

Thanks.

  1. I think $<-.data.frame is called directly, and then calls $.tbl_df to get the current value (which it needs anyway to carry out the partial update).
  2. I don't think tibbles should be supporting this syntax, because it's potentially dangerous -- too easy to misspell a column name and end up with a wrong result. Initializing with NA first, as in the StackOverflow answer, is the better way in my opinion.

@zeehio
Copy link

@zeehio zeehio commented Dec 1, 2016

If the aim of that warning is to suggest a safer and future proof alternative to tibble_df$age[tibble_df$name == "mary"] <- 47, then it would be great to change the warning message from Unknown column: 'age' to Uninitialized column: 'age'. See ?good_tibble_programming for further information.

I am all for more robust ways of coding, but the message as it is makes me think that tibble has an issue when adding new columns, rather than thinking that the code I have been writing and using for years may not be the recommended way of working with tibbles.

Thanks for your hard work on the package in any case

@krlmlr
Copy link
Member

@krlmlr krlmlr commented Dec 2, 2016

Fair enough, let's implement $<-.tbl_df.

@krlmlr krlmlr changed the title Question regarding the Unknown column '***' warning Need to implement $<-.tbl_df for clearer warning when trying a partial update of a missing column Dec 2, 2016
@krlmlr krlmlr added this to the 1.3.0 milestone Jan 4, 2017
krlmlr added a commit that referenced this issue Jan 7, 2017
@hadley: Is there a better way to warn about `df$col[indexes] <- values` if `col` doesn't exist yet? Should we warn anyway (#199)?
@krlmlr krlmlr closed this in 515c973 Jan 9, 2017
krlmlr added a commit that referenced this issue Jan 9, 2017
krlmlr added a commit that referenced this issue Jan 9, 2017
- Test R 3.1.3 and later in AppVeyor, using `Depends: R (>= 3.1.0)` in `DESCRIPTION`. Support for R 3.0.0 requires a `lazyeval` update (#189).
- An attempt to partially update a missing column now throws a clearer warning (#199).
- Time series matrices (objects of class `mts` and `ts`) are now supported in `as_tibble()` (#184).
- An attempt to call `add_row()` for a grouped data frame results in a helpful error message (#179).
- The `all_equal()` function (called by `all.equal.tbl_df()`) now forwards to `dplyr` and fails with a helpful message if not installed. Data frames with list columns cannot be compared anymore, and differences in the declard class (`data.frame` vs. `tbl_df`) are ignored. This ensures consistent behavior of this function, regardless if `dplyr` is loaded or not (#198).
- Backtick `NA` names in printing (#206, #207, @jennybc).
krlmlr added a commit that referenced this issue Apr 1, 2017
Bug fixes
=========

- Time series matrices (objects of class `mts` and `ts`) are now supported in `as_tibble()` (#184).
- The `all_equal()` function (called by `all.equal.tbl_df()`) now forwards to `dplyr` and fails with a helpful message if not installed. Data frames with list columns cannot be compared anymore, and differences in the declared class (`data.frame` vs. `tbl_df`) are ignored. The `all.equal.tbl_df()` method gives a warning and forwards to `NextMethod()` if `dplyr` is not installed; call `all.equal(as.data.frame(...), ...)` to avoid the warning. This ensures consistent behavior of this function, regardless if `dplyr` is loaded or not (#198).

Interface changes
=================

- Now requiring R 3.1.0 instead of R 3.1.3 (#189).
- Add `as.tibble()` as an alias to `as_tibble()` (#160, @LaDilettante).
- New `frame_matrix()`, similar to `frame_data()` but for matrices (#140, #168, @LaDilettante).
- New `deframe()` as reverse operation to `enframe()` (#146, #214).
- Removed unused dependency on `assertthat`.

Features
========

General
-------

- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
- Singular and plural variants for error messages that mention a list of objects (#116, #138, @LaDilettante).
- `add_column()` can add columns of length 1 (#162, #164, @LaDilettante).

Input validation
----------------

- An attempt to read or update a missing column now throws a clearer warning (#199).
- An attempt to call `add_row()` for a grouped data frame results in a helpful error message (#179).

Printing
--------

- Render Unicode multiplication sign as `x` if it cannot be represented in the current locale (#192, @ncarchedi).
- Backtick `NA` names in printing (#206, #207, @jennybc).
- `glimpse()` now uses `type_sum()` also for S3 objects (#185, #186, @holstius).
- The `max.print` option is ignored when printing a tibble (#194, #195, @t-kalinowski).

Documentation
=============

- Fix typo in `obj_sum` documentation (#193, @etiennebr).
- Reword documentation for `tribble()` (#191, @kwstat).
- Now explicitly stating minimum Rcpp version 0.12.3.

Internal
========

- Using registration of native routines.
@github-actions
Copy link

@github-actions github-actions bot commented Dec 14, 2020

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 14, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants