New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type conversion of one var in spread(..., convert = TRUE) is influenced by the other vars #118

Closed
jennybc opened this Issue Sep 30, 2015 · 5 comments

Comments

Projects
None yet
2 participants
@jennybc
Member

jennybc commented Sep 30, 2015

All the variables created by spread() now share the same type, i.e. character < double < integer. This behavior is new in 0.3.x. It happened here a37274c, where type conversion was moved before the reshaping.

library(dplyr)
library(tidyr)

(x <- data_frame(
  row = rep(1:2, 3),
  column = rep(1:3, each = 2),
  cell_contents = as.character(c(rep("Argentina", 2),
                                 62.485, 64.399,
                                 1952, 1957))
))
#> Source: local data frame [6 x 3]
#> 
#>     row column cell_contents
#>   (int)  (int)         (chr)
#> 1     1      1     Argentina
#> 2     2      1     Argentina
#> 3     1      2        62.485
#> 4     2      2        64.399
#> 5     1      3          1952
#> 6     2      3          1957
var_names <- c("row", "country", "life_exp", "year")

## Watch (life_exp and) year go from character ...
x %>% 
  spread("column", "cell_contents", convert = TRUE) %>% 
  setNames(var_names)
#> Source: local data frame [2 x 4]
#> 
#>     row   country life_exp  year
#>   (int)     (chr)    (chr) (chr)
#> 1     1 Argentina   62.485  1952
#> 2     2 Argentina   64.399  1957

## to double ...
x %>%
  filter(column > 1) %>% 
  spread("column", "cell_contents", convert = TRUE) %>% 
  setNames(var_names[-2])
#> Source: local data frame [2 x 3]
#> 
#>     row life_exp  year
#>   (int)    (dbl) (dbl)
#> 1     1   62.485  1952
#> 2     2   64.399  1957

## to integer
x %>%
  filter(column > 2) %>% 
  spread("column", "cell_contents", convert = TRUE) %>% 
  setNames(var_names[-(2:3)])
#> Source: local data frame [2 x 2]
#> 
#>     row  year
#>   (int) (int)
#> 1     1  1952
#> 2     2  1957

session_info(c("dplyr", "tidyr"))
#> Session info --------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.2.2 (2015-08-14)
#>  system   x86_64, darwin13.4.0        
#>  ui       RStudio (0.99.700)          
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/Vancouver           
#>  date     2015-09-30
#> Packages ------------------------------------------------------------------
#>  package    * version    date       source                       
#>  assertthat   0.1        2013-12-06 CRAN (R 3.2.0)               
#>  BH           1.58.0-1   2015-05-21 CRAN (R 3.2.0)               
#>  DBI          0.3.1      2014-09-24 CRAN (R 3.2.0)               
#>  dplyr      * 0.4.3.9000 2015-09-16 Github (hadley/dplyr@00a0a74)
#>  lazyeval     0.1.10     2015-01-02 CRAN (R 3.2.0)               
#>  magrittr     1.5        2014-11-22 CRAN (R 3.2.0)               
#>  R6           2.1.1      2015-08-19 CRAN (R 3.2.0)               
#>  Rcpp         0.12.1     2015-09-10 CRAN (R 3.2.0)               
#>  stringi      0.5-5      2015-06-29 CRAN (R 3.2.0)               
#>  tidyr      * 0.3.1      2015-09-10 CRAN (R 3.2.0)
@jennybc

This comment has been minimized.

Member

jennybc commented Sep 30, 2015

On a happier note, another pass with readr::type_convert() basically fixes this. But even after that, the conversion of variables that consist solely of NA is still different between v0.2.0 (logical NA) and v0.3.x (NA_character_ or whatever the type of the non-missing variables).

jennybc added a commit to jennybc/googlesheets that referenced this issue Oct 2, 2015

jennybc added a commit to jennybc/googlesheets that referenced this issue Oct 2, 2015

@jennybc

This comment has been minimized.

Member

jennybc commented Oct 4, 2015

I am highly motivated to work on this but fear my fix would not be the fix you want. Was I using spread() in an unintended way = for the creation of variables of diverse types?

@hadley

This comment has been minimized.

Member

hadley commented Oct 6, 2015

You're using it correctly - I think the fix should be simple if you want to attempt it - I'm just calling type.convert() too early - you need to wait until after it's been converted to a data frame. A unit test would also be nice

@jennybc

This comment has been minimized.

Member

jennybc commented Oct 6, 2015

From a37274c I thought it was maybe an explicit goal to use matrixToDataframe()? But there will be no such matrix if the type conversion happens after reshaping.

But ok yes I will make a PR that goes to back to something closer to how it was before, but doesn't undo the progress re: #62 and #35. And add a test.

@hadley

This comment has been minimized.

Member

hadley commented Oct 6, 2015

I don't think I was thinking clearly when I moved type.convert in that commit.

@hadley hadley closed this in 60d1caa Oct 13, 2015

hadley added a commit that referenced this issue Oct 13, 2015

Merge pull request #120 from jennybc/master
do type conversion after reshaping in spread(); fixes #118
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment