New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NAs for cells with 'numbers as text' in a numeric column #217

Closed
ajdamico opened this Issue Nov 22, 2016 · 3 comments

Comments

4 participants
@ajdamico

ajdamico commented Nov 22, 2016

mre below. thanks hadley!

library(readxl)
library(gdata)

pums.layout <- paste0( tempfile() , '.xls' )

download.file( "http://www2.census.gov/census_2000/datasets/PUMS/FivePercent/5%25_PUMS_record_layout.xls" ,	pums.layout , mode = 'wb' )

readxl_result <- readxl::read_excel( pums.layout , sheet = 1 , skip = 1 )

gdata_result <- gdata::read.xls( pums.layout , sheet = 1 , skip = 1 )

# these values match what's in the file when you open in excel
gdata_result[ 350:527 , 3 ]

# here's the error: missings
readxl_result[ 350:527 , 3 ]
@jennybc

This comment has been minimized.

Member

jennybc commented Jan 5, 2017

These cells are "Number stored as text", according the ⚠️ pop-up in Excel.

Error from src/XlsWorkSheet.h#L145-L149. This block commented as "// Needs to compare to actual cell type to give warnings". So this is intentional behavior, though possibly too picky?

## https://github.com/hadley/readxl/issues/217

library(readxl)
library(gdata)

#pums.layout <- paste0( tempfile() , '.xls' )
pums.layout <- "5%25_PUMS_record_layout.xls"

download.file( "http://www2.census.gov/census_2000/datasets/PUMS/FivePercent/5%25_PUMS_record_layout.xls" , pums.layout , mode = 'wb' )

readxl_result <- readxl::read_excel( pums.layout , sheet = 1 , skip = 1 )
#> DEFINEDNAME: 20 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 06 3b 00 00 00 00 78 02 00 00 0d 00 
#> DEFINEDNAME: 20 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 06 3b 01 00 00 00 c3 04 00 00 0d 00 
#> DEFINEDNAME: 20 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 06 3b 00 00 00 00 78 02 00 00 0d 00 
#> DEFINEDNAME: 20 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 06 3b 01 00 00 00 c3 04 00 00 0d 00 
#> DEFINEDNAME: 20 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 06 3b 00 00 00 00 78 02 00 00 0d 00 
#> DEFINEDNAME: 20 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 06 3b 01 00 00 00 c3 04 00 00 0d 00 
#> DEFINEDNAME: 20 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 06 3b 00 00 00 00 78 02 00 00 0d 00 
#> DEFINEDNAME: 20 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 06 3b 01 00 00 00 c3 04 00 00 0d 00
#> Warning in xls_cols(path, sheet, col_names = col_names, col_types =
#> col_types, : Expecting numeric in [186, 3] got `123`
#> Warning in xls_cols(path, sheet, col_names = col_names, col_types =
#> col_types, : Expecting numeric in [187, 3] got `123`
#> <snip, snip... more of the same>

gdata_result <- gdata::read.xls( pums.layout , sheet = 1 , skip = 1 )

# these values match what's in the file when you open in excel
gdata_result[ 350:527 , 3 ]
#>   [1] 188 188 188 188 188 188 188 188 188 188 188 188 188 188 188 188 188
#>  [18] 189 189 190 190 190  NA 191 191 195 195 195 195 195 196 196 200 200
#>  [35] 200 200 200 201 201 203 203 203 203 203 203 203 203 203 203 203 203
#>  [52] 203 203 203 203 203 203 203 203 203 203 203 203 203 204 204 205 205
#>  [69] 205 206 206 211 211 211 211 211 212 212 213 213 213 213 213 213 213
#>  [86] 213 215 215 217 217 219 219 221 221 223 223 224 224 225 225 225 225
#> [103] 225 226 226 226 226 226 227 227 232 232 232 235 235 235 236 236 240
#> [120] 240 240 243 243 243 244 244 244 245 245 245 245 245 245 246 246 246
#> [137] 247 247 247 247 247 248 248 248 248 248 248 248 248 248 250 250 250
#> [154] 250 250 250 250 250 250 250 250 250 250 250 250 250 258 258 258 258
#> [171] 258 258 266 266 266 266 266 266

# here's the error: missings
readxl_result[ 350:527 , 3 ]
#>   [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#>  [24] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#>  [47] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#>  [70] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#>  [93] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> [116] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> [139] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> [162] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

@jennybc jennybc added the col_types label Jan 7, 2017

@jennybc jennybc changed the title from some cells are missing some data to NAs for cells with 'numbers as text' in a numeric column Jan 7, 2017

@MichaelChirico

This comment has been minimized.

MichaelChirico commented Feb 23, 2017

Chiming in to agree it's too picky, especially if I explicitly state col_types to be numeric, I expect coercion of numbers-as-text

@jennybc jennybc added this to TODO in jennybc Feb 26, 2017

@jennybc jennybc closed this in #277 Mar 5, 2017

jennybc added a commit that referenced this issue Mar 5, 2017

Add logical cell & col types; refactor cell typing and coercion (#277)
* Add logical cell and col types; general refactor of cell typing and coercion

Rescues xls formula dates

* Make logicalFromString() to use in xls and xlsx

* Attempt to coerce text to number; fixes #217, fixes #106

* Work on NEWS.md

* Fiddle with comments and kick appveyor

* Edit Xls[x]WorkSheet.h side-by-side for parallelism

* Edit NEWS.md

* Coerce numeric to date in a "date" column; fixes #266

* Don't use C++11 when converting string to double

* Test all the warnings

* Enhance comments and refactor xls cell typing

* Consistently use strncmp(this, that, n) == 0

* Simplify logical cell creation in xls

* Fewer parens

* Add comment, collapse two cases

* Use Rf_StringTrue() and Rf_StringFalse()

* Beef up tests of logical coercion; delete old coercion test that adds nothing

* Frontload comments re: xls & xlsx cell types and make more parallel
@ajdamico

This comment has been minimized.

ajdamico commented Mar 5, 2017

thanks very much @jennybc !!

@jennybc jennybc moved this from TODO to Done in jennybc Mar 10, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment