
Tibble vs tbl_df #82

Closed
hadley opened this issue May 25, 2016 · 4 comments

@hadley (Member) commented May 25, 2016

If you're new to tibble/dplyr, it's a bit confusing to understand the difference between tibbles and tbl_df. To help reduce this confusion we might:

  • Add a ?tibble help page explaining the history and definition of tibbles
  • Make obj_sum return "tibble" for tibbles (instead of "tbl_df")

@hadley (Member, Author) commented May 25, 2016

That would mean we'd need to deprecate (and eventually remove) tibble().

What would you think of that @jennybc?

@krlmlr (Member) commented May 25, 2016

Alternative option: prominently link to package?tibble from the first paragraph of the "Description" section of ?tibble.

@jennybc (Member) commented May 25, 2016

You're making me wonder if I know the difference between tibbles and tbl_df. When I say "a tibble", I think I mean a tbl_df. Am I confused? When you say tbl_df above, do you mean data.frame?

Regardless, I think you're saying it makes sense to use tibble as a place to document tibbles. So the function tibble() needs to get out of the way, i.e. only be known as frame_data(). That seems OK.

This brings up a question and it seems a decent place to ask it:

  • To turn something into a tibble, are you supposed to use tibble::as_data_frame()? Or dplyr::as.tbl()? Why isn't it tibble::as_tbl_df()? I guess because data_frame() exists? There are a lot of ways to get confused here.

krlmlr pushed a commit that referenced this issue May 26, 2016

Kirill Müller
add link
- Link to the package documentation from the `tibble` help page (#82).

krlmlr pushed a commit that referenced this issue Jun 13, 2016

Kirill Müller
Merge tag 'v1.0-6'
- Reworked output: More concise summary, removed empty line, showing number of hidden rows and columns (#51).
- Link to the package documentation from the `tibble` help page (#82).
- Don't rely on `knitr` internals for testing (#78).

@hadley (Member, Author) commented Jun 13, 2016

I think of data_frame() and as_data_frame() as parallels of data.frame() and as.data.frame() in the same way that read_csv() is a parallel of read.csv(). But maybe that's too clever?

It would be more consistent to have tibble() as the standard creation method and as_tibble() as the standard coercion method. We should probably keep data_frame() and as_data_frame() but de-emphasise them in the documentation.
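
The split proposed here (`tibble()` to create, `as_tibble()` to coerce) is how later tibble releases present the API, and it also shows why "tibble" and "tbl_df" get mixed up: the object is called a tibble, but its S3 class is still `tbl_df`. A minimal sketch, assuming a recent tibble release; the column values are arbitrary example data:

```r
library(tibble)

# Standard creation: build a tibble directly
tb1 <- tibble(x = 1:3, y = c("a", "b", "c"))

# Standard coercion: convert an existing data frame
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb2 <- as_tibble(df)

# Both objects are "tibbles", but the class vector still starts with tbl_df:
# "tbl_df" "tbl" "data.frame"
print(class(tb1))
print(class(tb2))

# is_tibble() tests for the tbl_df class, so the plain data frame is not a tibble
print(is_tibble(tb2))
print(is_tibble(df))
```

Because `tbl_df` still inherits from `data.frame`, code written for plain data frames continues to accept tibbles.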

krlmlr modified the milestone: 1.1 Jun 30, 2016

krlmlr self-assigned this Jun 30, 2016

krlmlr added the in progress label Jun 30, 2016

krlmlr closed this in 7ae74a6 Jun 30, 2016

krlmlr removed the in progress label Jun 30, 2016

krlmlr pushed a commit that referenced this issue Jun 30, 2016

Kirill Müller
Merge tag 'v1.0-15'
- Prefer `tibble()` and `as_tibble()` over `data_frame()` and `as_data_frame()` in code and documentation (#82).
- `tibble()` is no longer an alias for `frame_data()` (#82).
- Rename `is_data_frame()` to `is_tibble()`.
- `obj_sum()` and `type_sum()` show `"tibble"` instead of `"tbl_df"` for tibbles (#82).

krlmlr pushed a commit that referenced this issue Jul 4, 2016

Kirill Müller
Merge tag 'v1.1'
Follow-up release.

- `tibble()` is no longer an alias for `frame_data()` (#82).
- Remove `tbl_df()` (#57).
- `$` returns `NULL` with a warning if the column is not found, and no longer performs partial matching (#109).
- `[[` returns `NULL` if column not found (#109).

- Reworked output: More concise summary (begins with hash `#` and contains more text (#95)), removed empty line, showing number of hidden rows and columns (#51). The trailing metadata also begins with hash `#` (#101). Presence of row names is indicated by a star in printed output (#72).
- Format `NA` values in character columns as `<NA>`, like `print.data.frame()` does (#69).
- The number of printed extra cols is now an option (#68, @lionel-).
- Computation of column width properly handles wide (e.g., Chinese) characters; tests still fail on Windows (#100).
- `glimpse()` shows nesting structure for lists and uses angle brackets for type (#98).
- Tibbles with `POSIXlt` columns can be printed now, the text `<POSIXlt>` is shown as placeholder to encourage usage of `POSIXct` (#86).
- `type_sum()` shows only topmost class for S3 objects.

- Strict checking of integer and logical column indexes. For integers, passing a non-integer index or an out-of-bounds index raises an error. For logicals, only vectors of length 1 or `ncol` are supported. Passing a matrix or an array now raises an error in any case (#83).
- Warn if setting non-`NULL` row names (#75).
- Consistently surround variable names with single quotes in error messages.
- Use "Unknown column 'x'" as error message if column not found, like base R (#94).
- `stop()` and `warning()` are now always called with `call. = FALSE`.

- The `.Dim` attribute is silently stripped from columns that are 1d matrices (#84).
- Converting a tibble without row names to a regular data frame does not add explicit row names.
- `as_tibble.data.frame()` preserves attributes and calls `as_tibble.list()` directly instead of dispatching to possibly overridden methods, which may lead to endless recursion.

- New `has_name()` (#102).
- Prefer `tibble()` and `as_tibble()` over `data_frame()` and `as_data_frame()` in code and documentation (#82).
- New `is.tibble()` and `is_tibble()` (#79).
- New `enframe()` that converts vectors to two-column tibbles (#31, #74).
- `obj_sum()` and `type_sum()` show `"tibble"` instead of `"tbl_df"` for tibbles (#82).
- `as_tibble.data.frame()` gains a `validate` argument (as in `as_tibble.list()`); if `TRUE`, the input is validated.
- Implement `as_tibble.default()` (#71, tidyverse/dplyr#1752).
- `has_rownames()` supports arguments that are not data frames.

- Two-dimensional indexing with `[[` works (#58, #63).
- Subsetting with empty index (e.g., `x[]`) also removes row names.

- Document behavior of `as_tibble.tbl_df()` for subclasses (#60).
- Document and test that subsetting removes row names.

- Don't rely on `knitr` internals for testing (#78).
- Fix compatibility with `knitr` 1.13 (#76).
- Enhance `knit_print()` tests.
- Provide default implementations for `tbl_sum.tbl_sql()` and `tbl_sum.tbl_grouped_df()` so that `dplyr` can be released before `tibble`.
- Explicit tests for `format_v()` (#98).
- Test output for `NULL` value of `tbl_sum()`.
- Test subsetting in all variants (#62).
- Add missing test from dplyr.
- Use new `expect_output_file()` from `testthat`.
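
A few of the user-facing 1.1 items above (`enframe()`, `is_tibble()`, and `$` without partial matching) can be checked directly. A minimal sketch, assuming a recent tibble release, where details such as the warning wording may differ from 1.1; the vector `x` and the misspelled column name `nam` are arbitrary examples:

```r
library(tibble)

# enframe() turns a named vector into a two-column tibble
x <- c(a = 1, b = 2, c = 3)
tb <- enframe(x)
print(tb)  # columns "name" and "value", one row per element of x

# `$` does not partially match: "nam" is not expanded to "name".
# The unknown column yields NULL and signals a warning, which we record.
warned <- FALSE
res <- withCallingHandlers(
  tb$nam,
  warning = function(w) {
    warned <<- TRUE                 # remember that a warning was signalled
    invokeRestart("muffleWarning")  # keep the console quiet
  }
)
print(is.null(res))
print(warned)
```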
3 participants