Skip to content
This repository has been archived by the owner on Jan 21, 2020. It is now read-only.

Add a way to customize the load command to run loadHTML or pass addition... #6

Closed
wants to merge 2 commits into from

Conversation

BernhardPosselt
Copy link
Contributor

@BernhardPosselt BernhardPosselt commented Oct 3, 2014

Add a way to customize the load command to run loadHTML or pass additional parameters to loadXML

Example:

$xmlDom = XmlSecurity::scan($xml, $dom, LIBXML_COMPACT);
$htmlDom = XmlSecurity::scanHtml($html, $dom, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

@ezimuel

@BernhardPosselt
Copy link
Contributor Author

For reference #5

@BernhardPosselt
Copy link
Contributor Author

A different solution would be to provide 2 additional parameters like

<?php
$libXMLOptions = LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD | LIBXML_NONET;  // default LIBXML_NONET
$loadHTML = true  // default false;

XmlSecurity::scan($xml, $dom, $libXMLOptions, $loadHTML);

But I think the callback is the most versatile solution since the whole library is essentially just here to securely wrap around an XML load call and if the loadXML and loadHTML receive additional arguments the code does not have to be changed. Also anything above 3 parameters is usually not a good idea for readability

@LukasReschke @MorrisJobke

@BernhardPosselt
Copy link
Contributor Author

Any updates on this? Please review

@weierophinney
Copy link
Member

The one issue I see with this is the fact that inside scanXml(), one of the key features is that we're passing the LIBXML_NONET callback to the appropriate load*() function. This flag ensures that no external entities are loaded when we create/populate the DOM document instance. If we allow passing a callback, that removes this particular part of the security scan functionality -- and thus makes it less secure.

@Raydiation we need a solution that does not bypass that particular aspect of the code.

@BernhardPosselt
Copy link
Contributor Author

Well the only other solution I can see then is to go for XmlSecurity::scan($xml, $dom, $libXmlOptions, $useLoadHtml);

and use $libXmlOptions | LIBXML_NONET inside the method. Would that solution work for you?

@weierophinney
Copy link
Member

@Raydiation another possibility is a new scanHtml() method, and moving the logic that occurs after we get the DOM instance into a helper method. This would reduce the number of arguments necessary, and give distinct scanners for HTML vs XML, mirroring DOMDocument itself.

@BernhardPosselt
Copy link
Contributor Author

Right, makes sense :)

Will do

@BernhardPosselt
Copy link
Contributor Author

Ok, usage is now:

$libXmlConstants = LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD;

XmlSecurity::scan($xml, $dom, $libXmlConstants);
XmlSecurity::scanHtml($xml, $dom, $libXmlConstants);

The third parameter is optional and will always be binary or'd with LIBXML_NONET

@BernhardPosselt
Copy link
Contributor Author

I've updated the travis config file to update to libxml 2.7.8 to be able to use the LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD headers in the tests if thats ok

…ional parameters to loadXML

remove callable type hint because its not available in php 5.3

remove options from test to not break when using loadHTML on php 5.3 where only one parameter is expected

actually cause test to fail if code is removed

add comment and clean up test

add scanHtml method and allow to pass in libxml headers

try to fix coding style fixes

update libxml on travis to not fail the tests

remove unnecessary if
@BernhardPosselt
Copy link
Contributor Author

BTW: PR is under the same license as the whole zend framework, https://github.com/zendframework/zf2/blob/master/LICENSE.txt

@BernhardPosselt
Copy link
Contributor Author

@weierophinney everything ok now? can i get a 👍 ?

matrix:
allow_failures:
- php: hhvm

before_install:
# need to update libxml to 2.7.8 to be able to run tests using the
# LIBXML_HTML_NODEFDTD and LIBXML_HTML_NOIMPLIED libxml constant
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if those flags are not existing at all? Just a hard failure in the test suite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM yes, I'll add a check for libxml version and skip the test like mentioned in the phpunit docs, sec ;)

if (!extension_loaded('mysqli')) {
            $this->markTestSkipped(
              'The MySQLi extension is not available.'
            );
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -72,6 +72,31 @@ public function testScanDom()
$this->assertEquals($node->nodeValue, 'test');
}

/**
* @requires PHP 5.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about PHP 5.3? Can there be a test for it as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5.3 does not allow any parameters to be passed to loadHtml so LIBXML_NONET can not be passed in, see http://php.net/manual/de/domdocument.loadhtml.php

I'm unsure if this can be fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, we need 5.3 compatibility here for now :-\

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only option I see atm is to fall back to an xml parser in case php 5.3 and scanHtml is being used. This might yield weird results though so I'm unsure if this is the way to go. Or maybe throw an exception if its used on PHP 5.3 and tell the user to use loadXml instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezimuel ping on this.

Or maybe throw an exception if its used on PHP 5.3 and tell the user to use loadXml instead?

Can't really do, as there are parts of the codebase loading DOM nodes that are indeed used also in PHP 5.3 context. Writing an HTML parser is not attracting either, and would just be another huge source of trouble.

@weierophinney weierophinney added this to the 1.2.0 milestone Jan 22, 2019
@weierophinney
Copy link
Member

Since we no longer support 5.3 in this library (currently 5.6 and up), I think the last points are now moot. As such, will review again for a 1.2.0 release. I can likely do the rebase and push back to your branch, @BernhardPosselt .

weierophinney added a commit to weierophinney/ZendXml that referenced this pull request Jan 22, 2019
@weierophinney weierophinney mentioned this pull request Jan 22, 2019
@BernhardPosselt
Copy link
Contributor Author

Nah, I think we can kill it then :)

@weierophinney
Copy link
Member

@BernhardPosselt It's a great feature! I've already pulled your code and created #18 to test it - all is green, so it'll be in a 1.2.0 release in a few minutes. 😄

Thanks for the great work!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants