add_row changes column type in empty data frame #171

Closed
atribe opened this issue Sep 7, 2016 · 9 comments

@atribe atribe commented Sep 7, 2016

When add_row() is called on an empty tbl_df with only some of the columns supplied as name-value pairs, the types of the omitted columns are changed to lgl.

Example:

example.empty.df <- structure(list(id = integer(0),
                                   install_id = integer(0),
                                   port = integer(0), 
                                   db_name = character(0),
                                   complete_db = integer(0)),
                              .Names = c("id", "install_id", "port", "db_name", "complete_db"),
                              row.names = integer(0),
                              class = c("tbl_df", "tbl", "data.frame"))

example.empty.df
# A tibble: 0 × 5
# ... with 5 variables: id <int>, install_id <int>, port <int>, db_name <chr>, complete_db <int>

changed.classes <- example.empty.df %>% tibble::add_row(install_id = 5)
changed.classes
# A tibble: 1 × 5
     id install_id  port db_name complete_db
  <lgl>      <dbl> <lgl>   <lgl>         <lgl>
1    NA          5    NA      NA            NA

I expected the column classes to be retained.

Contributor

@anhqle anhqle commented Sep 13, 2016

That's because of this part, which pads the missing columns with NA; a bare NA is logical by default.

na_value <- function(boilerplate) {
  if (is.list(boilerplate)) {
    list(NULL)
  } else {
    NA
  }
}

The NA is logical so that it can be rbind-ed to any column class, e.g. Date, DateTime. I wonder what would be a good solution.
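As a rough illustration of why the padding only misbehaves for empty inputs, here is a minimal base-R sketch (not tibble's own code). It relies on the documented behavior that the data frame method of rbind() drops zero-row arguments; the variable names are made up for the example.

```r
# Non-empty column: the logical NA is coerced to the existing integer column.
non_empty <- rbind(data.frame(a = 1L), data.frame(a = NA))
class(non_empty$a)  # "integer"

# Zero-row column: rbind() ignores the empty data frame, so only the
# logical NA remains and the original integer type is lost.
empty <- rbind(data.frame(a = integer(0)), data.frame(a = NA))
class(empty$a)      # "logical"
```

So the logical NA is harmless as long as there is at least one existing row to dictate the column type.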

Author

@atribe atribe commented Sep 14, 2016

How about something like:

na_value <- function(boilerplate) {
  # identical() keeps every condition length-one, even for multi-class objects
  if (is.list(boilerplate)) {
    list(NULL)
  } else if (identical(class(boilerplate), "integer")) {
    NA_integer_
  } else if (identical(class(boilerplate), "character")) {
    NA_character_
  } else if (identical(class(boilerplate), "numeric")) {
    NA_real_
  } else if (identical(class(boilerplate), "complex")) {
    NA_complex_
  } else {
    NA
  }
}

I did some basic testing and it did the trick in my instance.

Contributor

@anhqle anhqle commented Sep 14, 2016

I tried that, but it doesn't cover other classes like factor, Date, Datetime.
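One way to cover those classes without listing each of them might be to take the missing value from the column itself. The sketch below is only an illustration, not what tibble does: `typed_na()` is a hypothetical helper name, and the idea is that subsetting with `NA_integer_` goes through the class's own `[` method.

```r
# Hypothetical helper (not part of tibble): build a length-one missing value
# that keeps the class and attributes of the template column.
typed_na <- function(template) {
  # Subsetting with NA_integer_ returns a single missing element; the `[`
  # methods for factor, Date and POSIXct preserve the class.
  # For a list, the result is a list holding one NULL element.
  template[NA_integer_]
}

class(typed_na(integer(0)))                                  # "integer"
class(typed_na(character(0)))                                # "character"
class(typed_na(factor(character(0), levels = c("a", "b"))))  # "factor"
class(typed_na(as.Date(character(0))))                       # "Date"
```

The factor case also keeps its levels, which a bare NA cannot carry.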

Member

@krlmlr krlmlr commented Sep 14, 2016

One option would be to rely on dplyr::bind_rows(), which keeps the class of the empty column where rbind() does not:

> rbind(data.frame(a = character(), stringsAsFactors = FALSE), data.frame(a = NA))[[1]] %>% class
[1] "logical"
> dplyr::bind_rows(data.frame(a = character(), stringsAsFactors = FALSE), data.frame(a = NA))[[1]] %>% class
[1] "character"

Contributor

@anhqle anhqle commented Sep 14, 2016

Should we extract the code of bind_rows into tibble, or should we import dplyr?

Extracting the code seems to violate DRY, but importing all of dplyr for a single function seems like overkill. Plus, I don't know what your vision is for the dependencies within the tidyverse.

Author

@atribe atribe commented Sep 15, 2016

What about the approach of reapplying classes? Something like this:

# `.data` is the existing tibble, `df` holds the new row
missing_vars <- setdiff(names(.data), names(df))
df[missing_vars] <- lapply(.data[missing_vars], na_value)
df <- df[names(.data)]

# new bit: coerce each column of the new row back to the class of the original
df <- Map(f = function(col, .data.class) as(col, .data.class),
          col = df,
          .data.class = lapply(.data, class)) %>%
  as_tibble()

Contributor

@anhqle anhqle commented Sep 16, 2016

If I'm not wrong, some classes can't be coerced like this. For example you can't turn logical NA into a factor.
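To make the difficulty concrete: a factor needs its levels, which a generic coercion by class name has no way of knowing. A class-aware restore would have to copy that information from the original column. The following is a rough sketch under that assumption; `restore_class()` is a hypothetical name, and it only handles factors, dates and plain atomic vectors.

```r
# Hypothetical helper: coerce a padded logical NA back to the type of the
# original (possibly zero-length) column `template`.
restore_class <- function(padded, template) {
  if (is.factor(template)) {
    # Reuse the levels of the original column.
    factor(padded, levels = levels(template))
  } else if (inherits(template, "Date")) {
    as.Date(padded)
  } else {
    # Plain atomic vectors: switch to the storage mode of the template.
    storage.mode(padded) <- storage.mode(template)
    padded
  }
}

sizes <- factor(character(0), levels = c("low", "high"))
levels(restore_class(NA, sizes))                # "low" "high"
class(restore_class(NA, as.Date(character(0)))) # "Date"
class(restore_class(NA, integer(0)))            # "integer"
```

Other classes with their own attributes, such as POSIXct with a time zone, would need additional branches.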

Member

@krlmlr krlmlr commented Sep 16, 2016

Importing dplyr won't work. I'm looking into reorganizing the C++ code in dplyr, let me think of a good approach to the problem here. Until then I'd review a PR that reorganizes the if conditions.

anhqle pushed a commit to anhqle/tibble that referenced this issue Oct 2, 2016
When add_row to an empty tibble, all column classes were converted to logical.

We fix this by checking whether we're adding to an empty tibble. If yes, we coerce column classes back to the original.
anhqle pushed a commit to anhqle/tibble that referenced this issue Oct 9, 2016
When add_row to an empty tibble, all column classes were converted to logical.

We fix this by checking whether we're adding to an empty tibble. If yes, we coerce column classes back to the original.
@krlmlr krlmlr closed this in #177 Oct 9, 2016
krlmlr added a commit that referenced this issue Oct 9, 2016
…empty-tibble

- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
krlmlr added a commit that referenced this issue Nov 30, 2016
- New `frame_matrix()` (#140, #168, @LaDilettante).
- The `max.print` option is ignored when printing a tibble (#194, #195, @t-kalinowski).
- Fix typo in `obj_sum` documentation (#193, @etiennebr).
- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
- Now explicitly stating minimum Rcpp version 0.12.3.
krlmlr added a commit that referenced this issue Apr 1, 2017
Bug fixes
=========

- Time series matrices (objects of class `mts` and `ts`) are now supported in `as_tibble()` (#184).
- The `all_equal()` function (called by `all.equal.tbl_df()`) now forwards to `dplyr` and fails with a helpful message if not installed. Data frames with list columns cannot be compared anymore, and differences in the declared class (`data.frame` vs. `tbl_df`) are ignored. The `all.equal.tbl_df()` method gives a warning and forwards to `NextMethod()` if `dplyr` is not installed; call `all.equal(as.data.frame(...), ...)` to avoid the warning. This ensures consistent behavior of this function, regardless of whether `dplyr` is loaded (#198).

Interface changes
=================

- Now requiring R 3.1.0 instead of R 3.1.3 (#189).
- Add `as.tibble()` as an alias to `as_tibble()` (#160, @LaDilettante).
- New `frame_matrix()`, similar to `frame_data()` but for matrices (#140, #168, @LaDilettante).
- New `deframe()` as reverse operation to `enframe()` (#146, #214).
- Removed unused dependency on `assertthat`.

Features
========

General
-------

- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
- Singular and plural variants for error messages that mention a list of objects (#116, #138, @LaDilettante).
- `add_column()` can add columns of length 1 (#162, #164, @LaDilettante).

Input validation
----------------

- An attempt to read or update a missing column now throws a clearer warning (#199).
- An attempt to call `add_row()` for a grouped data frame results in a helpful error message (#179).

Printing
--------

- Render Unicode multiplication sign as `x` if it cannot be represented in the current locale (#192, @ncarchedi).
- Backtick `NA` names in printing (#206, #207, @jennybc).
- `glimpse()` now uses `type_sum()` also for S3 objects (#185, #186, @holstius).
- The `max.print` option is ignored when printing a tibble (#194, #195, @t-kalinowski).

Documentation
=============

- Fix typo in `obj_sum` documentation (#193, @etiennebr).
- Reword documentation for `tribble()` (#191, @kwstat).
- Now explicitly stating minimum Rcpp version 0.12.3.

Internal
========

- Using registration of native routines.

@github-actions github-actions bot commented Dec 14, 2020

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 14, 2020