Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Newlines are not preserved #202

Closed
pehlert opened this issue Sep 24, 2015 · 10 comments
Closed

Newlines are not preserved #202

pehlert opened this issue Sep 24, 2015 · 10 comments

Comments

@pehlert
Copy link

pehlert commented Sep 24, 2015

I tried to write a cell with multiple lines, but despite setting text_wrap to true as suggested in another tickets, my newlines are ignored and seemingly converted to spaces.

This is an extract of the code I'm using:

        worksheet.add_cell(idx + 1, 4, ticket.tasks.map(&:full_name).join("\n"))
        worksheet[idx+1][4].change_text_wrap(true)
@weshatheleopard
Copy link
Owner

require 'rubyXL'
workbook = RubyXL::Workbook.new
worksheet = workbook[0]
cell = worksheet.add_cell(0, 0, "line1\nline2")
cell.change_text_wrap(true)
workbook.write("testing_newlines.xlsx")

Works perfectly fine, the cell has a newline in it.

If you can provide a minimal code that exhibits the behavior you are talking about, please provide and reopen.

@pehlert
Copy link
Author

pehlert commented Sep 24, 2015

Many thanks for the quick response.
This is interesting. I was opening the file with Excel for Mac 2016 and "\n" was shown as space.
On Windows, everything works fine.

On Mac, I could get it working by using "\r" instead of "\n", however that was not compatible with Windows. Lastly "\r\n" works on Windows, but adds an additional space on the beginning of every new line on Mac.

I have tried to create a file sheet manually with Excel and it was cross-os compatible. Any ideas?

@weshatheleopard
Copy link
Owner

I strongly suspect that the reason is that Excel uses shared strings instead of inline values that RubyXL uses.

But, can you provide the minimal files for both platforms so I can take a look at the differences beween them? Create a new document, type "a(alt-enter)b" into the very first cell, save them and give to me.

@pehlert
Copy link
Author

pehlert commented Sep 24, 2015

Of course I can, and many thanks for your help already!

The first file was created with Excel 2013 on Windows, the second file with Excel 2016 on Mac.

https://www.dropbox.com/s/drbgwece5x5u9g2/Windows.xlsx?dl=0
https://www.dropbox.com/s/bru3wc5mzt9fzdw/Mac.xlsx?dl=0

@weshatheleopard
Copy link
Owner

Found it. The problem lies with nokogiri which escapes \r while it shouldn't.

@weshatheleopard
Copy link
Owner

I have a strong suspicion that we have with 2 minor bugs complementing each other creating a "perfect storm" condition here:

  • libxml2 has a minor bug where it escapes \r into 
, which is unnecessary because \r is a valid XML character as per spec. Normally, that would have absolutely no effect, since 
 would be unescaped by the XML parser, but...
  • on the other hand, Excel for Mac's XML parser has a minor bug where it somehow unescapes 
 too late in the process, and thus treats it not as a carriage return character, but as a whitespace character.

It's the presence of both of these bugs that creates the effect that you see; if any one of them is removed from the equation, the effect disappears.

I'm afraid I have little to no control over either.

@pehlert
Copy link
Author

pehlert commented Sep 25, 2015

This sounds to me that "\n" should work for line-break on Mac, while "\r" shouldn't, or am I mistaken here? Unfortunately, it's the other way round. Using "\r" gives proper line-break (only on Mac, not on Windows), while "\n" is converted into white-space on Mac.

@weshatheleopard
Copy link
Owner

Using "\r" gives proper line-break (only on Mac, not on Windows),

Perfect! That's the last piece of the puzzle I was missing, and now I have a clear picture of what's going on!

  • Mac's "native" line-break is \r. When loading the file, Excel replaces all the occurrences of \r\n with \r to bring the file to "Mac native" linebreaks; and when saving, does the reverse (you see, in both the files you sent me the linebreaks in that cell are encoded as \r\n);
  • Unfortunately, its parser does replacement first, and unescaping after that. So once all the \r\ns are replaced, 
\n remains that way because it does not match the pattern.Then, Excel converts 
 to \r, ending up with \r\n — and stays that way;
  • Excel treats \n as whitespace.

@pehlert
Copy link
Author

pehlert commented Sep 25, 2015

Ah! That finally makes it clear to me, too. The workaround that I have found for now is to use "\n\r" in my script, which is a little counter-intuitive but gives working line-breaks on both platforms with the only disadvantage of adding a trailing whitespace to each line which doesn't hurt in my particular use-case.

@weshatheleopard
Copy link
Owner

So, from the conversation in sparklemotion/nokogiri#1356, we conclude:

  • libxml2 tries to be excessively helpful by escaping \rs that it shouldn't;
  • Excel does unescaping after stripping — which is wrong; it should be the other way around.

Because of that, there's no good way to solve this issue. The only reasonable way I see is to persuade the authors of libxml2 to implement some sort of "quirks" flag that would tell it "do not escape \rs!"

So it looks like I don't have a good way to solve this. :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants