Error using col_names and col_types #81
> read_excel(data_file, sheet = "ES") %>% colnames
"Lfd. Nr." "Name" "Vorname"
"Geb.-datum" "OP-Datum" "Alter"
"Konv. OP-Risiko" "Card. Schock" "Rescue"
"aEuroScore" "loEuroScore" "EuroSCORE II"
"STS_Mortalitaetsrisiko" "STS_Morb./Mort." "STS Long Stay"
"STS Stroke" "STS Pronl. Vent." "STS Ren Fail"
"Syntax Score" "Rhythmus" "A-fib"
"PM" "Z.n. AKE" "Z.n. ACVB"
"Z.n. MKE/R" "Z.n. anderer OP (Perikarderoeffnung)" "Stroke, neurol. Defizit"
"pAVK" "COPD" "SPAP>50"
"pHTs" "pHTd" "pHTm"
"Diab mell" "NI" "KHK"
"Z.n.Stent" "Dialyse" "Risikofaktoren"
Now I would like to use my own column type and name specification for the 39 excel columns:
# shorthands for readxl package
num <- "numeric"; txt = "text"; date = "date"; blank = "blank"
colspec.es <- c(
  id = num,
  es_name = txt,
  es_vorname = txt,
  es_dob = date,
  es_date_op = date,
  es_age = num,
  es_konv_risk = txt,
  es_card_shock = num,
  es_rescue_indication = num,
  es_addESI = num,
  es_logESI = num,
  es_ESII = num,
  es_STS_mort = num,
  es_STS_mort_morb = num,
  es_STS_longstay = num,
  es_STS_stroke = num,
  es_STS_prolong_vent = num,
  es_STS_ren_fail = num,
  es_syntax = num,
  es_rhythm = txt,
  es_afib = num,
  es_pm = num,
  es_sp_ake = num,
  es_sp_cabg = num,
  es_sp_mk_op = num,
  es_sp_other_pericard_open = num,
  es_stroke_neuro_deficit = num,
  es_pavk = num,
  es_copd = num,
  es_spap_gt_50 = num,
  es_pHTs = num,
  es_pHTd = num,
  es_pHTm = num,
  es_diabetes = num,
  es_ni = num,
  es_khk = num,
  es_sp_stent = num,
  es_dialyse = num,
  es_risks_txt = txt
)
xls.es <- read_excel(data_file, sheet = "ES", skip = 0,
                     col_names = names(colspec.es),
                     col_types = unname(colspec.es))
This results in the error message "Error: Need one name and type for each column"
My session Info:
What do you think?
Hi, I have stumbled on that bug, too. I don't know if this is related, but when one does specify
I can confirm the point from @mr-majkel about
Works for a file with four columns, but
doesn't work for the same file.
It is rather easy to work around the problem with
On the other hand it may be a design choice to force the user to specify a column_type for each column. If that's the case, one could clarify the documentation by adding
@hadley, you can find an example file at https://dl.dropboxusercontent.com/u/7082685/Temporary/readxl_issue_81_temp_excel_file.xlsx.
There is some data in column A, and column B is a blank column which Excel somehow considers not to be blank (the issue disappears when deleting column B).
      temp
1 12.10174 NA
2 12.32619 NA
3 12.50086 NA
while specifying only one column type then obviously throws an error
read_excel('readxl_issue_81_temp_excel_file.xlsx',col_types=c('numeric'))
Error: Need one name and type for each column
The problem @meyera was experiencing is different, as he has no empty column names. As far as I understand it, you can reproduce it by
> read_excel('D:/Dropbox/Public/Temporary/readxl_issue_81_temp_excel_file.xlsx',sheet=2)
      temp test test2
1 12.10174    1     1
2 12.32619    2     2
3 12.50086    3     3
> read_excel('D:/Dropbox/Public/Temporary/readxl_issue_81_temp_excel_file.xlsx',sheet=2,col_types=rep('numeric',3))
Error: Need one name and type for each column
using the second sheet, which looks like
I suppose the first example is just a strange Excel thing, and the second one is the result of a not properly formatted table, so I'm not sure if anything needs fixing...
Anyway, thanks indeed for another awesome package!
My session info:
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
LC_COLLATE=English_United Kingdom.1252
LC_CTYPE=English_United Kingdom.1252
LC_MONETARY=English_United Kingdom.1252
LC_NUMERIC=C
LC_TIME=English_United Kingdom.1252

attached base packages:
stats graphics grDevices utils datasets methods base

other attached packages:
readxl_0.1.0

loaded via a namespace (and not attached):
Rcpp_0.11.5 tools_3.1.1
I think @rogiersbart demonstrated the problem in the second sheet example. Without col_types, read_excel() silently drops those empty columns, but with col_types a type has to be specified for each of them. If you read the file without col_types first, you don't realize that there are blank columns.
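If the true column count of the sheet (including the invisible blank columns) is already known from somewhere else, one possible workaround is to pad the type vector so that every column gets an entry. The sketch below is only that: `pad_col_types` is a hypothetical helper, not part of readxl, and it assumes that "blank" (the type string used in the shorthand at the top of this thread) is accepted for the extra columns.

```r
# Hypothetical helper (not part of readxl): extend a col_types vector so it
# has one entry per sheet column, marking the extra columns as "blank".
pad_col_types <- function(col_types, n_cols, fill = "blank") {
  n_missing <- n_cols - length(col_types)
  if (n_missing < 0) {
    stop("col_types has more entries than the sheet has columns")
  }
  c(col_types, rep(fill, n_missing))
}

# Three typed columns; the sheet is assumed to hold five columns in total.
padded_types <- pad_col_types(rep("numeric", 3), n_cols = 5)
print(padded_types)
```

The padded vector could then be passed as col_types. Whether the padded columns show up in the result or are dropped probably depends on the readxl version.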
When I noticed the problem, the blank columns were the last two columns, so it was not obvious that they were there, even when inspecting the file in Excel. Unfortunately, I haven't been able to reproduce that. If I intentionally insert blank columns after the other columns, I get an empty column in the table, as in @rogiersbart's sheet 1 example, and that's OK.
I am also being bitten by this issue. A couple of requests:
I am experiencing this issue on a publicly available xls file that has no visible blank columns. It has one blank column to the right of the data range detected using guidance here ("last cell" is DR445). Opening the file in Excel, manually deleting all the columns to the right of the visible data range (everything after DQ), saving as a new xls file, and importing that xls file does work, but is an undesirable solution (this is only one of many files to be read in and all are supposed to be identically structured, but in reality may have different numbers of invisible blank columns to the right).
I would like it if when the user specifies
The issue is reproduced below.
library(readxl)

# download data for 2013 from CA OSHPD website
download.file(url = "http://oshpd.ca.gov/hid/Products/Hospitals/QuatrlyFinanData/Qtr2013/2013_Q4R4.xls",
              destfile = "2013_Q4R4.xls", mode = "wb")

# Excel file has 445 visible rows (header + 444 data rows)
# Excel file has 121 visible columns (A:DQ)

# set column types based on data documentation
# available at http://oshpd.ca.gov/hid/Products/Hospitals/QuatrlyFinanData/QFUR2000AfterDoc.pdf
OSHPD_col_types <- c("text", "text", "numeric", "date", "date",
                     rep("text", 17 - 5), rep("numeric", 121 - 17))
length(OSHPD_col_types) # should be 121
data_2013_col_types <- read_excel("2013_Q4R4.xls", col_types = OSHPD_col_types)
# not passing any col_names and col_types works
# though with the DEFINEDNAME: notes
data_2013_nothing <- read_excel("2013_Q4R4.xls")
# trying to pass in both names and types also does not work
OSHPD_col_names <- colnames(data_2013_nothing)
length(OSHPD_col_names)
data_2013 <- read_excel("2013_Q4R4.xls", col_types = OSHPD_col_types, col_names = OSHPD_col_names)
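Since the number of invisible trailing columns is not known up front and may differ between files, one conceivable stopgap is to retry the read with a growing number of padding types. This is a rough sketch, not a tested fix: `read_with_padding` is a hypothetical helper, the reader is passed in as a function so the sketch does not depend on a particular readxl version, and it assumes that the reader signals an error on a count mismatch and that "blank" is an accepted type. Note that it retries on any error, not only on the "Need one name and type for each column" one.

```r
# Hypothetical helper: call reader(col_types) repeatedly, appending one more
# "blank" type per attempt, until the call succeeds or max_extra is reached.
read_with_padding <- function(reader, col_types, max_extra = 10, fill = "blank") {
  for (n_extra in 0:max_extra) {
    padded <- c(col_types, rep(fill, n_extra))
    result <- tryCatch(reader(padded), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
  }
  stop("reader still fails after adding ", max_extra, " extra columns: ",
       conditionMessage(result))
}

# Stand-in reader for illustration: it only accepts exactly five types,
# mimicking a sheet with three data columns and two invisible blank ones.
fake_reader <- function(types) {
  if (length(types) != 5) stop("Need one name and type for each column")
  types
}

accepted_types <- read_with_padding(fake_reader, rep("numeric", 3))
print(length(accepted_types))
```

With the file from this post, the call might look like `read_with_padding(function(types) read_excel("2013_Q4R4.xls", col_types = types), OSHPD_col_types)`, though I have not verified that against the file.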
I also have some similar experience.
and the code:
Thanks in advance! (readxl is a really great package.)