HTML includes XML declaration and DOCTYPE on JRuby #570

Closed
dnagir opened this issue Dec 15, 2011 · 11 comments

Comments

@dnagir

dnagir commented Dec 15, 2011

Under JRuby 1.6.5 (1.9 mode), page.html returns something that looks like:

> page.html

"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"...."

We can clearly see that <?xml version="1.0" encoding="UTF-8"?> has been inserted, which is just plain wrong.
Also, the DOCTYPE is HTML 4.01 Transitional, which is not correct either, because the page was generated with an HTML5 DOCTYPE.

This doesn't happen on MRI 1.9.3.

Any clues on how to fix it or work around it?

(Using the rack driver.)

@jnicklas
Collaborator

Why is it a problem? Are you doing something which checks the XML instruction or Doctypes? If so, why?

@dnagir
Author

dnagir commented Dec 15, 2011

Because it is not displayed by a browser, for example when using launchy.
On Dec 15, 2011 11:06 PM, "Jonas Nicklas" wrote:

Why is it a problem? Are you doing something which checks the XML
instruction or Doctypes? If so, why?



@jnicklas
Collaborator

I still don't understand. The problem is that it is not shown; how is that a problem?

@dnagir
Author

dnagir commented Dec 15, 2011

The generated HTML cannot be displayed by the browser because the first line is an XML declaration, so the document is treated as XML. But the second line is an HTML doctype, which makes the whole document invalid in either format, and thus the browser can't display the page.

Even more than that, the doctype is not the same as the one declared in the views.

Have I explained the problem?

This is definitely a bug which doesn't seem to happen on MRI.

On 15/12/2011, at 23:28, Jonas Nicklas wrote:

I still don't understand, the problem is that it is not shown, how is that a problem?



@jnicklas
Collaborator

Hmm, strange. Browsers should generally ignore an XML declaration, except that it may throw the browser into quirks mode IIRC. Have you tried this in multiple browsers?

@dnagir
Author

dnagir commented Dec 19, 2011

I don't know why browsers should ignore that. Why are we even discussing that...
It's obviously just plain wrong and incorrect. The browsers are not to blame for that.

And it doesn't seem to work in Chrome, Safari and FF.

On 19/12/2011, at 20:03, Jonas Nicklas wrote:

Hmm, strange. Browsers should generally ignore an XML declaration, except that it may throw the browser into quirks mode IIRC. Have you tried this in multiple browsers?



@jnicklas
Collaborator

jnicklas commented Jan 3, 2012

No, it's not obviously just plain wrong. In fact the HTML5 specification requires HTML documents using the XML syntax to specify an XML declaration at the top of the document. See: http://www.w3.org/TR/html-markup/syntax.html#character-encoding.

Now in this case, the doctype is HTML 4.01 Transitional, which is not an XML-based format. However, in some old browsers, adding the XML declaration had some effect on quirks/standards mode. I think it still does, but I can't remember.

Furthermore, Capybara has nothing to do with adding this XML declaration; that's Nokogiri's doing. It seems a bit idiotic for Nokogiri to add this XML declaration, but I don't think they are wrong in adding it. In any case, this is not something we can fix in Capybara.

@jnicklas jnicklas closed this as completed Jan 3, 2012
@dnagir
Author

dnagir commented Jan 3, 2012

Maybe you can help the guys at Nokogiri solve this issue by explaining better where it misbehaves?

@dnagir
Author

dnagir commented Jan 5, 2012

@jnicklas actually it is a problem with Capybara.

I was trying to make a minimal repro for Nokogiri, but it just worked.

The issue is this:

page.html.lines.first # "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
page.body.lines.first # "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
page.source.lines.first # "<!DOCTYPE html>\n"

And the save_and_open_page method uses body rather than source, and body is the one that comes wrapped with the XML declaration.
I think Capybara.save_page should use source instead of body. We really want to save the whole page, not only what's inside the body.
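
To check which accessor is affected without opening a browser, a small predicate along these lines may help. This is only a sketch in plain Ruby: `xml_declaration?` and the sample strings are hypothetical and not part of Capybara or Nokogiri; in a real test you would pass `page.body` and `page.source` instead.

```ruby
# Hypothetical helper, not part of Capybara: reports whether a markup
# string begins with an XML declaration such as <?xml version="1.0"?>.
XML_DECLARATION = /\A\s*<\?xml\b[^>]*\?>/

def xml_declaration?(markup)
  markup.match?(XML_DECLARATION)
end

# Sample strings that mirror the first lines reported in this issue.
body_like   = %(<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n)
source_like = "<!DOCTYPE html>\n<html></html>\n"

puts xml_declaration?(body_like)   # prints true
puts xml_declaration?(source_like) # prints false
```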

@dnagir
Author

dnagir commented Jan 5, 2012

The workaround is NOT to use page.save_and_open_page. Instead one should do:

require 'capybara/util/save_and_open_page'
Capybara.save_and_open_page(page.source)

To fix it, I believe source should be used instead of body when calling Capybara.save_page.
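
If only the body string is at hand, another possible stopgap is to strip the leading declaration before writing the file. The following is a minimal sketch in plain Ruby under that assumption; `strip_xml_declaration` and `write_page` are hypothetical names, not Capybara API, and this does nothing about the rewritten DOCTYPE.

```ruby
require 'tmpdir'

# Hypothetical helper: removes a leading XML declaration, if present,
# and leaves any other markup untouched.
def strip_xml_declaration(markup)
  markup.sub(/\A\s*<\?xml\b[^>]*\?>\s*/, '')
end

# Hypothetical helper: writes the cleaned markup under dir and returns the path.
def write_page(markup, dir, name = 'page.html')
  path = File.join(dir, name)
  File.write(path, strip_xml_declaration(markup))
  path
end

Dir.mktmpdir do |dir|
  sample = %(<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html>\n<html></html>\n)
  path = write_page(sample, dir)
  puts File.read(path).lines.first # prints "<!DOCTYPE html>"
end
```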

@gaizka

gaizka commented Jan 8, 2012

Hi there!

This is happening to me, too. I just upgraded from Capybara 0.4 to 1.1. Everything works OK on my laptop (ruby 1.9.2p180), but my specs fail on my CI server (ruby 1.9.2-p0, 64-bit machine). I'll try to make a simple failing test.

The problem, as you said, is that there's an extra "", so some parsing by capybara breaks.

Nokogiri has been kept at the same version (1.5.0), so I think this is something related to Capybara.
