remove_blank_text and other options of XMLParser missing #102

Closed
dga-nagra opened this issue Nov 8, 2023 · 3 comments


The library doesn't provide a way to remove blank text.
With plain lxml, this is done as follows:

from lxml import etree

PARSER = etree.XMLParser(remove_blank_text=True)
res = etree.XML(string, parser=PARSER)

The same applies to every option mentioned in #23.
These are important features that should be supported.

I tried to use `XMLParser` from the original library with the `XML()` of this one. I also tried to use `DefusedXMLParser`.
Neither seems to work.

Can you provide an example of how to access these features?
Thank you.

tiran (Owner) commented Nov 8, 2023

You have to construct a custom parser with lxml.etree.XMLParser and pass it to defusedxml.lxml.XML. You should at least set resolve_entities=False. The custom element lookup is optional.

>>> import lxml.etree
>>> import defusedxml.lxml
>>>
>>> parser = lxml.etree.XMLParser(remove_blank_text=True, resolve_entities=False)
>>> lookup = lxml.etree.ElementDefaultClassLookup(defusedxml.lxml.RestrictedElement)
>>> parser.set_element_class_lookup(lookup)
>>> e = defusedxml.lxml.XML("<root><el/>  </root>", parser=parser)
>>> lxml.etree.tostring(e)
b'<root><el/></root>'
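
As a minimal sketch of what the two parser options do on their own, the snippet below uses plain lxml only, without the defusedxml.lxml wrapper and therefore without the RestrictedElement lookup. The names XML_TEXT, default_out and stripped_out are illustrative and not part of either library. It compares the default parser with a custom one on the same input.

```python
from lxml import etree

XML_TEXT = "<root><el/>  </root>"

# Default parser: the whitespace-only text after <el/> is kept as its tail.
default_root = etree.XML(XML_TEXT)
default_out = etree.tostring(default_root)

# Custom parser: drop ignorable whitespace between tags and
# leave entity references unresolved.
parser = etree.XMLParser(remove_blank_text=True, resolve_entities=False)
stripped_root = etree.XML(XML_TEXT, parser=parser)
stripped_out = etree.tostring(stripped_root)

print(default_out)
print(stripped_out)
```

This only demonstrates the lxml options. It does not add any of the extra checks that defusedxml.lxml performs.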

tiran closed this as not planned on Nov 8, 2023

dga-nagra (Author) commented Nov 8, 2023

Hi @tiran, you said in #33 that

I have deprecated the defusedxml.lxml module and will remove it in a future release.

So should I not be using the solution you provided, or have you changed your mind? I already get the deprecation warning when following your recommendation, and RestrictedElement doesn't seem to be available anywhere else.
Could you please not close this issue right away, considering that the solution you provided is deprecated?

dga-nagra (Author) commented

@tiran, is there any update on the deprecated solution?
