XHTML exceptions for tags #10

Closed
technosophos opened this Issue · 20 comments

2 participants

@technosophos
Owner

Certain tags in XHTML cannot use a unary (self-closing) form. QueryPath needs to handle those cases in the xhtml() method.

Examples:

@bangpound

If the xhtml() method were to call the document save method with the LIBXML_NOEMPTYTAG option, this could be handled.

http://www.php.net/manual/en/domdocument.save.php
http://www.php.net/manual/en/domdocument.savexml.php
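
For reference, here is a minimal sketch of what LIBXML_NOEMPTYTAG does when passed to DOMDocument::saveXML(). It uses plain DOM only and is not QueryPath's implementation; the sample markup and variable names are made up for illustration.

```php
<?php
// Sample markup (hypothetical) with two empty elements.
$doc = new DOMDocument();
$doc->loadXML('<div><p/><br/></div>');

// Default serialization collapses empty elements into unary form.
$default = $doc->saveXML($doc->documentElement);

// With LIBXML_NOEMPTYTAG every empty element gets an explicit closing tag.
$expanded = $doc->saveXML($doc->documentElement, LIBXML_NOEMPTYTAG);

echo $default, "\n";  // <div><p/><br/></div>
echo $expanded, "\n"; // <div><p></p><br></br></div>
```

Note that the option expands every empty element, including ones such as br that are normally written in unary form, so some post-processing is still needed for those.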

@technosophos

Very good idea. This is now done.

SHA 15515fb

@technosophos

Looks like a single regex could probably do all of the heavy lifting. I think this could be a very compelling solution to an otherwise annoying problem.

Thoughts on what the "right" list of tags to collapse is? Do you think the list in the comment you pointed to is complete?
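
A rough sketch of the single-regex idea, assuming the input is the expanded output produced with LIBXML_NOEMPTYTAG. The tag list and the function name collapse_empty_tags() are hypothetical placeholders, not the list or code that ended up in QueryPath.

```php
<?php
// Collapse <tag ...></tag> back to <tag ... /> for a given list of tags.
// $tags is an illustrative list only; the "right" list is the open question.
function collapse_empty_tags($markup, array $tags) {
  $alternatives = implode('|', array_map('preg_quote', $tags));
  // Group 1: tag name. Group 2: attributes, if any.
  // Only matches when the opening tag is immediately followed by its close.
  $pattern = '#<(' . $alternatives . ')\b([^>]*?)></\1>#i';
  return preg_replace($pattern, '<$1$2 />', $markup);
}

$tags = array('br', 'hr', 'img', 'input', 'meta', 'link');
$input = '<div></div><br></br><img src="a.png"></img>';
$output = collapse_empty_tags($input, $tags);

echo $output, "\n"; // <div></div><br /><img src="a.png" />
```

Tags that are not in the list, such as div here, keep their explicit closing tag. An attribute value containing a literal ">" would confuse this pattern, so it is only suitable for serializer output where that character is escaped.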

@technosophos

I checked in a solution for the <br />, <hr /> etc tags. If you wanna give it a go, it's on GitHub now.

I'm looking into what the best solution to the CDATA one is. I have to make sure that the output always remains compatible with an XML parser, which means I can't remove the CDATA section without replacing it with... something. (Otherwise, a strict XML parser will choke on GT (>) and AND (&&) operators, single and double quotes, and the like.)

@technosophos

One solution that will keep it XML compatible but also HTML compatible: http://javascript.about.com/library/blxhtml.htm
https://developer.mozilla.org/en/properly_using_css_and_javascript_in_xhtml_documents

I'm thinking that the about.com one will work correctly in the greatest number of cases, so I will probably modify your regex to do that. Does that sound about right?
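
One widely used way to stay compatible with both kinds of parser is to keep the CDATA section but hide its markers inside JavaScript comments. The sketch below assumes that is the technique meant here; the function name comment_out_cdata() is hypothetical, it only handles script elements, and it is not the code that was checked in.

```php
<?php
// Rewrite <script><![CDATA[ ... ]]></script> so that the CDATA markers sit
// inside // comments. An XML parser still sees a CDATA section, while an
// HTML parser sees two harmless comment lines around the script body.
function comment_out_cdata($markup) {
  $pattern = '#(<script\b[^>]*>)\s*<!\[CDATA\[(.*?)\]\]>\s*(</script>)#is';
  return preg_replace_callback($pattern, function ($m) {
    return $m[1] . "\n//<![CDATA[\n" . $m[2] . "\n//]]>\n" . $m[3];
  }, $markup);
}

$input = '<script><![CDATA[if (a < b && c) { x(); }]]></script>';
$output = comment_out_cdata($input);
echo $output, "\n";
```

For style elements the same idea would need CSS comment syntax (/* ... */) around the markers instead of //.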

@technosophos

I'm trying to reproduce your other reported error, too. (the one about the double-slash on the last element)

@technosophos

Have I got the solution for you!

The code I just checked in allows you to configure how the CDATA section gets replaced. So in the $options, you can add this:

 $options = array('escape_xhtml_js_css_sections' => QueryPath::JS_CSS_ESCAPE_NONE);
 qp($html, $css, $options)->xhtml();

That will simply remove the CDATA parts.

@technosophos

Can you give me an example doc that generates this problem?

Input: <br /> <img />
Output looks like this: <br /> <img / />

I can't seem to reproduce it here.

@technosophos

Yes, I can see variants of the error now. I wonder if my regex is mis-matching something in the doctype declaration.

@technosophos

Ah, I found it.

The regular expression did not appropriately handle cases where the tag was unary already, and (oddly enough) libxml added a few on its own. So in the end, it was a two-char fix to the regular expression.
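
To illustrate this class of bug, here is a hypothetical reconstruction. Neither pattern is the regular expression from the actual commit, and the tag list is made up; the sketch only shows how a collapse pattern that ignores already-unary tags can produce the doubled slash reported above.

```php
<?php
// Hypothetical tag list and patterns, for illustration only.
$tags = 'br|hr|img';

// Naive pattern: treats everything before ">" as attributes, which includes
// the trailing " /" when the tag is already unary.
$naive = '#<(' . $tags . ')\b([^>]*)>(?:</\1>)?#i';

// Adjusted pattern: captures attributes lazily and consumes an optional "/"
// before ">", so an existing slash is not copied into the replacement.
$fixed = '#<(' . $tags . ')\b([^>]*?)\s*/?>(?:</\1>)?#i';

$input = '<br></br> <img />';

$naiveOutput = preg_replace($naive, '<$1$2 />', $input);
$fixedOutput = preg_replace($fixed, '<$1$2 />', $input);

echo $naiveOutput, "\n"; // <br /> <img / />
echo $fixedOutput, "\n"; // <br /> <img />
```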

@technosophos

I'm going to mark this as "closed" (which is a sure-fire way of finding a new related bug). Please re-open if any additional XHTML creation errors are found.

@sdboyer sdboyer referenced this issue from a commit in sdboyer/querypath
@technosophos Fixed issue #10, fixed docs.
- The XHTML methods now do not collapse tags in unary fashion.
- Made sure to update docs for 2.1
- Credited Emily for her new contributions
- Updated unit tests
15515fb
@sdboyer sdboyer referenced this issue from a commit in sdboyer/querypath
@technosophos Fixing CDATA issue in #10. 2728bfe
This issue was closed.