Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim, : std::bad_alloc #373

Closed
ghost opened this Issue Jul 6, 2017 · 9 comments

ghost commented Jul 6, 2017

I have an Excel file with multiple sheets, each with more than 65000 rows. I get the above-mentioned error with the latest version (1.0.0) of the package when I try read_excel(path = x, sheet = y, col_names = T). When I downgrade the package and run the same code, it ignores one of the columns. I have a few timestamp columns, so I tried adding col_types = c(rep("text"), 11), but that throws an error: Error in xls_cols(path, sheet, col_names = col_names, col_types = col_types, : col_names and col_types must have the same length, even though I clearly have 11 columns.
I'm using a machine with 16 GB of RAM and an i7 processor, so I guess RAM size should not be the issue here.
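
A side note on the col_types expression quoted above, independent of any readxl behaviour: in base R, c(rep("text"), 11) does not produce eleven "text" entries. The minimal sketch below uses only base R and shows what the expression evaluates to next to the form that was presumably intended. The variable names are illustrative and do not come from the report.

```r
# As written in the report: rep("text") is just "text",
# and c() then coerces the number 11 to the string "11".
as_written <- c(rep("text"), 11)
length(as_written)   # 2

# Presumably intended: "text" repeated once per column.
intended <- rep("text", 11)
length(intended)     # 11
```

Passing the second form to col_types would give a vector whose length matches an 11-column sheet.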

Member

jennybc commented Jul 6, 2017

So you're saying you have a std::bad_alloc error with v1.0.0 but NOT with the previous release? I.e. you can read the file w/o readxl giving you an explicit error with the previous release?

The problems you report re: ignoring last column and complaining about mismatched col_names and col_types are bugs fixed in v1.0.0.

Any chance you can provide this file to me? How was it created, i.e. was it Excel or some other tool? The last column thing makes me suspect Excel did not make this file ...

ghost commented Jul 6, 2017

I'm afraid I cannot share the file, but yes, it was generated using Neware btsda software.

jennybc commented Jul 6, 2017

There's little I can do w/o the file because there's not enough information to go on. Can you open this file in Excel and resave as .xls and as .xlsx? Does that change readxl's behaviour on it, i.e. can readxl import it after this resaving? That would be semi-informative.

I'm also still interested in confirmation or clarification on this:

So you're saying you have a std::bad_alloc error with v1.0.0 but NOT with the previous release? I.e. you can read the file w/o readxl giving you an explicit error with the previous release?

ghost commented Jul 7, 2017

Yes, in answer to this question:

"So you're saying you have a std::bad_alloc error with v1.0.0 but NOT with the previous release? I.e. you can read the file w/o readxl giving you an explicit error with the previous release?"

When I resave it as an .xlsx file, it works, but it does not work after resaving it as .xls. It even fails if I copy the contents of a single column with 65536 rows (including the column name) from one sheet of the existing file into a new file saved with the .xls extension.

jennybc commented Jul 7, 2017

OK thanks. I think that suggests the v1.0.0 fix for dropping the last column for some non-Excel-written .xls files actually creates a new problem for .xls written by Neware btsda. We actually patched the underlying libxls library to do this.

But again not sure I can do anything about this without a single specimen to inspect.

ghost commented Jul 7, 2017

I'm attaching a sample file. I hope it helps.
SampleFile.xls.zip

nassuphis commented Jan 16, 2018

I get the same error, even for very small ranges.
I believe this has to do with formatting information held in the .xls file.
I constructed 2 sheets containing no data.
However, "sheet2" in "bad_sheet4.xls" contains a lot of "fill" formatting: no data, just fill information.

Here is the code; the sheets are attached.

bad_sheet5.zip
bad_sheet4.zip

require(readxl)

# works
res <- read_xls(
  "bad_sheet5.xls",
  sheet = "sheet2",
  range = "B10:C20",
  col_names = c("B", "C"),
  col_types = "text",
  n_max = 2000
)

# works
res <- read_xls(
  "bad_sheet4.xls",
  sheet = "sheet1",
  range = "B10:C20",
  col_names = c("B", "C"),
  col_types = "text",
  n_max = 2000
)

# does not work
res <- read_xls(
  "bad_sheet4.xls",
  sheet = "sheet2",
  range = "B10:C20",
  col_names = c("B", "C"),
  col_types = "text",
  n_max = 2000
)

vkapartzianis commented Jan 16, 2018

It's a problem with XLS sheets that contain 65536 rows. The current code enters an infinite loop in that case (65535 + 1 wraps around to 0 when using 16-bit counters), and I guess all available memory eventually gets exhausted.

I've submitted a pull request that will hopefully resolve this.
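
The wraparound described above can be illustrated with a small sketch. This is not the libxls or readxl code. It is a hypothetical emulation in R, which has no native unsigned 16-bit type, so the helper inc_u16 (an invented name) uses modulo arithmetic instead. The iteration cap exists only so the demonstration stops.

```r
# Emulate incrementing an unsigned 16-bit counter: values stay in 0..65535.
inc_u16 <- function(x) (x + 1L) %% 65536L

last_row   <- 65535L   # index of the final row in a sheet with 65536 rows
row        <- 0L
iterations <- 0L
cap        <- 200000L  # safety cap so this demonstration terminates

# With a 16-bit counter, `row <= last_row` can never become FALSE,
# because the counter wraps to 0 instead of reaching 65536.
while (row <= last_row && iterations < cap) {
  row <- inc_u16(row)
  iterations <- iterations + 1L
}

inc_u16(65535L)      # 0: the counter wraps around
iterations == cap    # TRUE: only the cap ended the loop
```

A loop of this shape that also allocates memory on each pass would keep growing until allocation fails, which is consistent with the std::bad_alloc error reported here.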

jennybc added the bug label Mar 16, 2018

jennybc added a commit that referenced this issue Mar 16, 2018

jennybc commented Mar 16, 2018

@nassuphis I can't open bad_sheet5. But, after the fix in #432, I can read bad_sheet4 and get # A tibble: 0 x 0, which is to be expected, if I'm understanding you.

Once I merge #432 and close this issue, please re-install the dev version of readxl from GitHub. If your problem persists, please open a new issue.

jennybc closed this in #432 Mar 17, 2018

jennybc added a commit that referenced this issue Mar 17, 2018

Prevent infinite loop reading xls w/ 65536 rows (#432)
* Prevent infinite loop reading xls w/ 65536 rows

Fixes #373

* Add NEWS bullet