na="" not working for type 'c' columns #114

Closed
bearloga opened this Issue Apr 10, 2015 · 4 comments

Contributor

bearloga commented Apr 10, 2015

I'm trying to read the CSV from the Planned Parenthood competition (it can be downloaded after signing up):

R> col_types <- 'icn...nc...c' # shortened for GitHub
R> train_data <- read_csv('train_values.csv',col_types=col_types,na="")

Checking for empty string values:

R> train_data[,1370:1379] %>% apply(2,. %>% { .=="" } %>% sum)
c_1368 c_1369 c_1370 c_1371 c_1372 c_1373 c_1374 c_1375 c_1376 c_1377 
 14639  14561  13925  14642      4  13830   8852  13623  14636  11071 

Checking for NA values:

R> train_data[,1370:1379] %>% apply(2,. %>% is.na %>% sum)
c_1368 c_1369 c_1370 c_1371 c_1372 c_1373 c_1374 c_1375 c_1376 c_1377 
     0      0      0      0      0      0      0      0      0      0 

Comparing numeric and character type columns:

R> train_data[,115:125] %>% apply(2,. %>% is.na %>% sum)
n_0112 n_0113 n_0114 n_0115 o_0116 o_0117 o_0118 o_0119 o_0120 o_0121 o_0122 
 14594  12785  13833  14494      0      0      0      0      0      0      0 

R> train_data[,115:125] %>% apply(2,. %>% { .=="" } %>% sum)
n_0112 n_0113 n_0114 n_0115 o_0116 o_0117 o_0118 o_0119 o_0120 o_0121 o_0122 
    NA     NA     NA     NA  14322  14643  14597  14557     39  14643  14635 

Is this intended behavior?
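
The `NA` entries in the last table arise because comparing a missing numeric value with `""` yields `NA`, and `sum()` without `na.rm = TRUE` propagates it. A minimal base-R sketch that counts both kinds of missingness per column in one pass; `count_missing` is a hypothetical helper name, not part of readr:

```r
# Hypothetical helper: per-column counts of NA values and empty strings.
count_missing <- function(df) {
  data.frame(
    column  = names(df),
    # Number of NA entries in each column.
    n_na    = vapply(df, function(col) sum(is.na(col)),
                     integer(1), USE.NAMES = FALSE),
    # Number of "" entries; na.rm = TRUE stops NA from poisoning the sum.
    n_blank = vapply(df, function(col) sum(col == "", na.rm = TRUE),
                     integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Small made-up data frame standing in for a slice of train_data.
toy <- data.frame(n_col = c(1, NA, 3),
                  c_col = c("", "a", ""),
                  stringsAsFactors = FALSE)
print(count_missing(toy))
```

Calling `count_missing(train_data[, 115:125])` would then report the numeric and character columns side by side without the `NA` sums.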

Member

hadley commented Apr 10, 2015

Simple reproducible example:

read_csv("x,y
,
a,b", na = "")

This appears to be by design - I was using "" to indicate that you didn't want any missing values.

mdlincoln commented Apr 21, 2015

If so, then how do you indicate that you do want NAs for missing values in your character type columns, rather than the empty strings that seem to be the current default behavior?

read_csv("x,y
,
a,b")
Source: local data frame [2 x 2]

  x y
1    
2 a b
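
One stopgap that does not depend on how `na` is interpreted is to convert empty strings to `NA` after reading. This is a base-R sketch only; `blank_to_na` is a hypothetical helper name, and it assumes every `""` in a character column really does mean "missing":

```r
# Hypothetical helper: turn "" into NA in character columns only.
blank_to_na <- function(df) {
  for (name in names(df)) {
    col <- df[[name]]
    if (is.character(col)) {
      # which() skips existing NAs, so only true empty strings are replaced.
      col[which(col == "")] <- NA_character_
      df[[name]] <- col
    }
  }
  df
}

# Made-up data frame mirroring the two-row example above.
toy <- data.frame(x = c("", "a"), y = c("", "b"), stringsAsFactors = FALSE)
print(blank_to_na(toy))
```

Numeric columns pass through unchanged, so the same call could be applied to the whole of `train_data`.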

Member

hadley commented Apr 21, 2015

I should've been clear that it's a bad design, and I'll fix it ;)

Contributor

bearloga commented Jul 9, 2015

Awesome! Thank you for fixing this!

lock bot locked and limited conversation to collaborators on Sep 25, 2018