type_convert only touches character columns; doesn't check/subset col_types accordingly #369

Closed
jennybc opened this issue Feb 22, 2016 · 2 comments

Comments

@jennybc
Member

commented Feb 22, 2016

library(readr)
suppressPackageStartupMessages(library(dplyr))

If you can use read_csv(), the automatic variable type detection is great.

read_csv("integer,logical\n1,TRUE\n2,FALSE\n")
#> Source: local data frame [2 x 2]
#> 
#>   integer logical
#>     (int)   (lgl)
#> 1       1    TRUE
#> 2       2   FALSE

As I understand it, type_convert() is meant to do the same thing when you're stuck with an existing data frame that needs Hadleyverse type (re-)conversion.

(df <- data_frame(integer = c(1, 2),
                  logical = c("TRUE", "FALSE")))
#> Source: local data frame [2 x 2]
#> 
#>   integer logical
#>     (dbl)   (chr)
#> 1       1    TRUE
#> 2       2   FALSE

Right now, only character columns are re-processed, so the integer column below stays double. Could all columns be re-processed, at least when col_types = NULL?

type_convert(df)
#> Source: local data frame [2 x 2]
#> 
#>   integer logical
#>     (dbl)   (lgl)
#> 1       1    TRUE
#> 2       2   FALSE

Also, if you specify col_types, it isn't checked or subsetted against the character columns. Below, the first spec ("i") is applied to the only character column, logical, so you get warnings and an incorrect result.

type_convert(df, col_types = "il")
#> Warning: Insufficient `col_names`. Adding 1 names.
#> Warning in type_convert_col(char_cols[[i]], col_types[[i]],
#> which(is_character)[i], : [0, 2]: expected an integer, but got 'TRUE'
#> Warning in type_convert_col(char_cols[[i]], col_types[[i]],
#> which(is_character)[i], : [1, 2]: expected an integer, but got 'FALSE'
#> Source: local data frame [2 x 2]
#> 
#>   integer logical
#>     (dbl)   (int)
#> 1       1      NA
#> 2       2      NA

The current workaround is to convert every column to character before calling type_convert().

df[] <- lapply(df, as.character)
type_convert(df)
#> Source: local data frame [2 x 2]
#> 
#>   integer logical
#>     (int)   (lgl)
#> 1       1    TRUE
#> 2       2   FALSE
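
If this comes up often, the workaround above could be wrapped in a small helper. This is only a sketch: `retype_all()` is a hypothetical name, not part of readr, and it assumes every column survives a round trip through as.character(). Whether a whole-number column comes back as integer or double depends on the readr version's guessing rules.

```r
library(readr)

# Hypothetical helper (not a readr function): coerce every column to
# character so that type_convert() re-guesses all of them, not just the
# columns that were already character.
retype_all <- function(df, ...) {
  df[] <- lapply(df, as.character)
  # Extra arguments (e.g. col_types) are passed through unchanged.
  type_convert(df, ...)
}

df <- data.frame(integer = c(1, 2),
                 logical = c("TRUE", "FALSE"),
                 stringsAsFactors = FALSE)
out <- retype_all(df)
str(out)
```

Because every column is character by the time type_convert() sees it, a col_types spec passed through `...` lines up with the columns of the data frame again.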

This may be related to what was reported in #160, though it's hard to tell.

@hadley

Member

commented Jun 1, 2016

The problem is that there's no logic in readr to simplify anything other than character columns, so I think (e.g.) converting double to integer is out of scope (unless that is legitimately the only conversion, in which case we could make a special case).

We definitely need to rethink how col_types works in this scenario.

@hadley

Member

commented Jun 8, 2016

@jimhester I think the main thing here is probably to not allow character (string) specification of column types, e.g. col_types = "il".
