Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken Excel caused by illegal character #243

Open
anykeyh opened this issue Sep 9, 2016 · 2 comments
Open

Broken Excel caused by illegal character #243

anykeyh opened this issue Sep 9, 2016 · 2 comments

Comments

@anykeyh
Copy link

anykeyh commented Sep 9, 2016

Hello,

I'm exporting database to excel using RubyXL, and for some reason it happens a field was containing the character "\u0003". Therefor the export run smoothly but the file cannot be read by Excel anymore.

xmllint doesn't validate the worksheet either:

sheet1.xml:2175: parser error : PCDATA invalid Char value 3
361" s="0" t="str"><v/></c><c r="Z361" s="0" t="str"><v>Fédération Française

For now I'm fixing it under my local code (e.g. adding a bad chars escape mechanism on strings)

@weshatheleopard
Copy link
Owner

Can you attach the problematic file please?

@anykeyh
Copy link
Author

anykeyh commented Oct 12, 2016

I can't send you the file as it's quite confidential / production data, but here is how to rebuild a simili-case:

require 'rubyXL'

workbook = RubyXL::Workbook.new
worksheet = workbook[0]
worksheet.add_cell(0, 0, "This is a \u0003 problematic string")
workbook.write("test.xlsx")

When you try to open this excel it will complain about it being broken and the sheet where the error comes from will be blank.
I'm using excel mac 2011 but I know some colleagues has the error on windows too.

unzipping the excel and xmllinton the first workbook will raise exception, as illegal character is found.

The workaround I've build in my project is to substract the incriminated characters:

#Reject all table from 0 ... 0x1F except "\n", "\r", "\t"
KEPT_CHARS="\n\r\t".split('').map(&:ord)
REJECTED_CHARS = (0..0x1F).reject{|x| KEPT_CHARS.include?(x) }.map(&:chr).join

def safe_xls_string(x)
  x.delete(REJECTED_CHARS)
end

It's just a temporary fix which is perfect for me, but I must confess I didn't went deeper in what characters was authorized or forbidden in the XML, exception of new line, tab and carriage return, obviously useful, so it would be advisable to seek for the info, or encode differently the XML PCDATA ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants