na="" not working for type 'c' columns #114

Closed
bearloga opened this issue Apr 10, 2015 · 4 comments

Comments

@bearloga (Contributor) commented Apr 10, 2015

Trying to read the CSV from the Planned Parenthood competition (can be downloaded by signing up)

R> col_types <- 'icn...nc...c' # shortened for GitHub
R> train_data <- read_csv('train_values.csv',col_types=col_types,na="")

Checking for empty string values:

R> train_data[,1370:1379] %>% apply(2,. %>% { .=="" } %>% sum)
c_1368 c_1369 c_1370 c_1371 c_1372 c_1373 c_1374 c_1375 c_1376 c_1377 
 14639  14561  13925  14642      4  13830   8852  13623  14636  11071 

Checking for NA values:

R> train_data[,1370:1379] %>% apply(2,. %>% is.na %>% sum)
c_1368 c_1369 c_1370 c_1371 c_1372 c_1373 c_1374 c_1375 c_1376 c_1377 
     0      0      0      0      0      0      0      0      0      0 

Comparing numeric and character type columns:

R> train_data[,115:125] %>% apply(2,. %>% is.na %>% sum)
n_0112 n_0113 n_0114 n_0115 o_0116 o_0117 o_0118 o_0119 o_0120 o_0121 o_0122 
 14594  12785  13833  14494      0      0      0      0      0      0      0 

R> train_data[,115:125] %>% apply(2,. %>% { .=="" } %>% sum)
n_0112 n_0113 n_0114 n_0115 o_0116 o_0117 o_0118 o_0119 o_0120 o_0121 o_0122 
    NA     NA     NA     NA  14322  14643  14597  14557     39  14643  14635 

Is this intended behavior?

@hadley (Member) commented Apr 10, 2015

Simple reproducible example:

read_csv("x,y
,
a,b", na = "")

This appears to be by design - I was using "" to indicate that you didn't want any missing values.

@mdlincoln commented Apr 21, 2015

If so, then how do you indicate that you do want NAs for missing values in your character type columns, rather than empty strings as seems to be the current default behavior:

read_csv("x,y
,
a,b")
Source: local data frame [2 x 2]

  x y
1    
2 a b

@hadley (Member) commented Apr 21, 2015

I should've been clear that it's a bad design, and I'll fix it ;)
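
For readers on a version that still shows this behavior, one possible workaround is to convert empty strings to NA after reading. This is a minimal base-R sketch, not readr's API: `blank_to_na` is a hypothetical helper name, and the data frame is a hand-built stand-in for the two-row example in this thread.

```r
# Hypothetical helper (not part of readr): turn "" into NA in character columns.
blank_to_na <- function(df) {
  for (name in names(df)) {
    col <- df[[name]]
    if (is.character(col)) {
      # %in% never returns NA, so values that are already NA are left alone
      col[col %in% ""] <- NA_character_
      df[[name]] <- col
    }
  }
  df
}

# Stand-in for the two-row example, built by hand rather than read from CSV
df <- data.frame(x = c("", "a"), y = c("", "b"), stringsAsFactors = FALSE)
cleaned <- blank_to_na(df)

# Count NAs per column, mirroring the checks in the original report
colSums(is.na(cleaned))
```

Non-character columns are skipped, so numeric columns that were already parsed with NA pass through unchanged.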

@bearloga (Contributor, Author) commented Jul 9, 2015

Awesome! Thank you for fixing this!

lock bot locked and limited conversation to collaborators Sep 25, 2018