Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to not fetching DTDs from the network #91

Closed
trejkaz opened this Issue Dec 13, 2016 · 13 comments

Comments

Projects
None yet
3 participants
@trejkaz
Copy link

trejkaz commented Dec 13, 2016

I found us having to workaround this one:

        // This is to disable DTD lookups on the internet
        DocumentBuilderFactory dbf = XMLUnit.getControlDocumentBuilderFactory();
        dbf.setFeature("http://xml.org/sax/features/validation", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

I think that by default, at least loading external DTDs should be disabled, since I prefer to have my tests reliably pass without external factors being able to trigger failures. (I don't particularly mind validation being left turned on for DTDs which are local... and am not entirely sure why we are disabling that as well.)

@bodewig

This comment has been minimized.

Copy link
Member

bodewig commented Dec 14, 2016

Hmm, I'm not convinced this should be the default, but we could certainly make things easier for people by providing a configuration option for this so people don't need to figure out the features. The apache features you list only apply to Xerces (and probably the Xerces included within Oracle's JDKs), right?

Your code looks as if you'd still be using XMLUnit 1.x. Is this 2.x's legacy jar or "the real 1.x"?

@trejkaz

This comment has been minimized.

Copy link
Author

trejkaz commented Dec 14, 2016

Looks like it's actual 1.x.

xmlunit: [ 'xmlunit:xmlunit:1.6' ],

I tried bumping to xmlunit-matchers 2.3.0, since I generally do prefer matchers, but it seems like the API is a little bit too different to just switch over, so initially I am moving it to xmlunit-legacy 2.3.0.

As far as whether it should be the default... I dunno. I think options to promote test reproducibility should be the default, so the current default is certainly bad, unless you like flaky tests, which I certainly don't.

Maybe there is a way to have the best of both worlds. Still enable all validation, but set the factories up such that all schemas and DTDs are actually loaded locally, for instance, and if the callback doesn't return a DTD, it either fails fast or skips validation.

@bodewig bodewig added the java-legacy label Dec 14, 2016

@bodewig

This comment has been minimized.

Copy link
Member

bodewig commented Dec 14, 2016

2.x is deliberately very different to 1.x but the legacy project should be a drop-in replacement. I'm not really sure I want to cut a new 1.x release at all anymore. If the legacy jar works for you, that would be fine.

It should be pretty easy to have all DTDs available locally if you configure an entity resolver (using an XML catalog or whatever) but I don't really see what XMLUnit could do to help apart from making i easier to specify an entity resolver.

The topic hasn't really come up before, I guess most people aren't using DTDs at all or use DTDs with only local SYSTEM ids (or none at all).

I get your point about reproducibility but OTOH don't want to silently configure features of the XML parser most people have probably never heard about. IMHO things should be made explicit.

@bodewig bodewig added this to the 2.6.0 milestone Oct 18, 2017

@trejkaz

This comment has been minimized.

Copy link
Author

trejkaz commented Nov 13, 2017

In updating to the matchers style, I'm half way towards this not really mattering.

For equality checks I can just set up a document builder factory and override it:

assertThat(actual, isIdenticalTo(expected)
    .withDocumentBuilderFactory(documentBuilderFactory));

For hasXPath I'd like to do something similar, but it doesn't seem to have an equivalent at all right now.

I guess I can wrap the matcher and do the parsing myself though, and delegate the call with the node...

@bodewig

This comment has been minimized.

Copy link
Member

bodewig commented Nov 13, 2017

I intend to tackle this with 2.6.0 by providing some simplified setup of a DocumentBuilderFactory but wo't find time to do so the coming days.

The XPath related matchers use Convert.toNode without specifying a factory. It should be easy to add a setter for a factory much like the one of CompareMatcher and use that if it has been specified. I'll open a new issue for that.

@bodewig

This comment has been minimized.

Copy link
Member

bodewig commented Jan 7, 2018

#108 is now fixed in the master branch. I still plan fixing this issue and cutting a 2.6.0 release soon.

@bodewig

This comment has been minimized.

Copy link
Member

bodewig commented Apr 13, 2018

I've started adding disabling DTD loading in the branch https://github.com/xmlunit/xmlunit/tree/add-some-xxe-protection and strangely all unit tests still pass, so I wonder whether I've done something wrong. If you could spare a moment to look at my change @trejkaz I'd be grateful.

@bodewig

This comment has been minimized.

Copy link
Member

bodewig commented Apr 13, 2018

Ah, I think this is because the only tests that rely on DTD processing use the validation package which uses SAXParser rather than DocumentBuilderFactory, which I haven't covered, yet.

It feels wrong to disable DTD processing in a class dedicated to validating XML documents against a DTD, so I won't touch this, I guess. I'll just make the validate method public which allows the SaxParserFactory to be passed in.

@bodewig bodewig closed this in 1a542ec Apr 14, 2018

@bodewig

This comment has been minimized.

Copy link
Member

bodewig commented Apr 14, 2018

I think I've covered most places - and despite this issues title the legacy package has the XXE protection disabled. Feedback before I cut the 2.6.0 release would be greatly appreciated.

@trejkaz

This comment has been minimized.

Copy link
Author

trejkaz commented Apr 16, 2018

I gave the snapshot a spin on our project here, and it works for us. (I stepped in to check that the DTD parsing was really off, but also ran on all tests to see if any new failures would appear.)

@bodewig

This comment has been minimized.

Copy link
Member

bodewig commented Apr 16, 2018

great, thanks

@janssen

This comment has been minimized.

Copy link

janssen commented Jul 3, 2018

So, for those of us non-experts, how exactly in 2.6 do you compare two XML docs without validating? The W3C seems to have disabled internet XHTML validation by removing http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod, so I'm suddenly in dire need of being able to not validate.

Should we still use the code at the top of this issue?

@trejkaz

This comment has been minimized.

Copy link
Author

trejkaz commented Jul 3, 2018

I double-checked the actual 2.6.0 release and the default DocumentBuilderFactory set in the builder now turns off almost all the features we were turning off.

So now you can get away with just asserts like:

assertThat(xml, hasXPath("//D:response"));

If you were paranoid you could chain on a call to withDocumentBuilderFactory to set your own factory anyway.

(On the other hand, I went to check up on our XHTML validation, and I guess that's still working, because we must be loading those DTDs from the local filesystem already.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.