Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type conversion of one var in spread(..., convert = TRUE) is influenced by the other vars #118

Closed
jennybc opened this issue Sep 30, 2015 · 5 comments

Comments

@jennybc
Copy link
Member

@jennybc jennybc commented Sep 30, 2015

All the variables created by spread() now share the same type, i.e. character < double < integer. This behavior is new in 0.3.x. It happened here a37274c, where type conversion was moved before the reshaping.

library(dplyr)
library(tidyr)

(x <- data_frame(
  row = rep(1:2, 3),
  column = rep(1:3, each = 2),
  cell_contents = as.character(c(rep("Argentina", 2),
                                 62.485, 64.399,
                                 1952, 1957))
))
#> Source: local data frame [6 x 3]
#> 
#>     row column cell_contents
#>   (int)  (int)         (chr)
#> 1     1      1     Argentina
#> 2     2      1     Argentina
#> 3     1      2        62.485
#> 4     2      2        64.399
#> 5     1      3          1952
#> 6     2      3          1957
var_names <- c("row", "country", "life_exp", "year")

## Watch (life_exp and) year go from character ...
x %>% 
  spread("column", "cell_contents", convert = TRUE) %>% 
  setNames(var_names)
#> Source: local data frame [2 x 4]
#> 
#>     row   country life_exp  year
#>   (int)     (chr)    (chr) (chr)
#> 1     1 Argentina   62.485  1952
#> 2     2 Argentina   64.399  1957

## to double ...
x %>%
  filter(column > 1) %>% 
  spread("column", "cell_contents", convert = TRUE) %>% 
  setNames(var_names[-2])
#> Source: local data frame [2 x 3]
#> 
#>     row life_exp  year
#>   (int)    (dbl) (dbl)
#> 1     1   62.485  1952
#> 2     2   64.399  1957

## to integer
x %>%
  filter(column > 2) %>% 
  spread("column", "cell_contents", convert = TRUE) %>% 
  setNames(var_names[-(2:3)])
#> Source: local data frame [2 x 2]
#> 
#>     row  year
#>   (int) (int)
#> 1     1  1952
#> 2     2  1957

session_info(c("dplyr", "tidyr"))
#> Session info --------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.2.2 (2015-08-14)
#>  system   x86_64, darwin13.4.0        
#>  ui       RStudio (0.99.700)          
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/Vancouver           
#>  date     2015-09-30
#> Packages ------------------------------------------------------------------
#>  package    * version    date       source                       
#>  assertthat   0.1        2013-12-06 CRAN (R 3.2.0)               
#>  BH           1.58.0-1   2015-05-21 CRAN (R 3.2.0)               
#>  DBI          0.3.1      2014-09-24 CRAN (R 3.2.0)               
#>  dplyr      * 0.4.3.9000 2015-09-16 Github (hadley/dplyr@00a0a74)
#>  lazyeval     0.1.10     2015-01-02 CRAN (R 3.2.0)               
#>  magrittr     1.5        2014-11-22 CRAN (R 3.2.0)               
#>  R6           2.1.1      2015-08-19 CRAN (R 3.2.0)               
#>  Rcpp         0.12.1     2015-09-10 CRAN (R 3.2.0)               
#>  stringi      0.5-5      2015-06-29 CRAN (R 3.2.0)               
#>  tidyr      * 0.3.1      2015-09-10 CRAN (R 3.2.0)
@jennybc
Copy link
Member Author

@jennybc jennybc commented Sep 30, 2015

On a happier note, another pass with readr::type_convert() basically fixes this. But even after that, the conversion of variables that consist solely of NA is still different between v0.2.0 (logical NA) and v0.3.x (NA_character_ or whatever the type of the non-missing variables).

jennybc pushed a commit to jennybc/googlesheets that referenced this issue Oct 2, 2015
jennybc pushed a commit to jennybc/googlesheets that referenced this issue Oct 2, 2015
@jennybc
Copy link
Member Author

@jennybc jennybc commented Oct 4, 2015

I am highly motivated to work on this but fear my fix would not be the fix you want. Was I using spread() in an unintended way = for the creation of variables of diverse types?

@hadley
Copy link
Member

@hadley hadley commented Oct 6, 2015

You're using it correctly - I think the fix should be simple if you want to attempt it - I'm just calling type.convert() too early - you need to wait until after it's been converted to a data frame. A unit test would also be nice

@jennybc
Copy link
Member Author

@jennybc jennybc commented Oct 6, 2015

From a37274c I thought it was maybe an explicit goal to use matrixToDataframe()? But there will be no such matrix if the type conversion happens after reshaping.

But ok yes I will make a PR that goes to back to something closer to how it was before, but doesn't undo the progress re: #62 and #35. And add a test.

@hadley
Copy link
Member

@hadley hadley commented Oct 6, 2015

I don't think I was thinking clearly when I moved type.convert in that commit.

@hadley hadley closed this in 60d1caa Oct 13, 2015
hadley added a commit that referenced this issue Oct 13, 2015
do type conversion after reshaping in spread(); fixes #118
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants