Find a safer way to integrate with lxml #283

Closed
scoder opened this issue Feb 6, 2024 · 5 comments · Fixed by #299

Comments

@scoder
Contributor

scoder commented Feb 6, 2024

lxml's maintainer here. Users have run into numerous issues in the past due to incompatible libxml2 versions being used by lxml and xmlsec. This is because xmlsec is usually built against system installed libraries, whereas lxml bundles libxml2 and libxslt to ease the otherwise difficult installation. Thus, both can end up using different library versions that make different assumptions about the C tree structure and its handling, as well as missing configuration state (see #239, for example). lxml has about 100x as many downloads as xmlsec, so most users don't use both together, and thus, both have different needs regarding their installation and user base.

I would like to improve the situation by making lxml and xmlsec more independent from each other.

Most xmlsec features seem to rely on serialisation, probably C14N, which can be left to lxml. Exchanging serialised byte buffers should trivially avoid compatibility issues.
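A minimal sketch of the byte-buffer exchange idea, assuming each side keeps its own parser and tree and only bytes cross the boundary. It uses the standard library's `xml.etree.ElementTree` as a stand-in so it runs without either package installed; `export_tree` and `import_tree` are hypothetical helper names, not part of lxml or xmlsec.

```python
import xml.etree.ElementTree as ET


def export_tree(root: ET.Element) -> bytes:
    """Producer side: hand over serialised bytes, never a tree object."""
    return ET.tostring(root, encoding="utf-8")


def import_tree(payload: bytes) -> ET.Element:
    """Consumer side: rebuild a private tree from the received bytes."""
    return ET.fromstring(payload)


# Hypothetical document; only `payload` crosses the library boundary.
doc = ET.fromstring('<Envelope><Data id="1">hello</Data></Envelope>')
payload = export_tree(doc)
copy = import_tree(payload)
```

With this shape, neither side depends on the other's in-memory node layout; the cost is one extra serialise and parse per exchange.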

At which places does xmlsec need actual access to libxml2 trees? How can we get them to use a safe but efficient data exchange?

@akx

akx commented Feb 7, 2024

Looks like python-xmlsec uses both the libxml APIs and the lxml C API 😅

Unfortunately it looks like this repo is pretty unmaintained at the moment (but here's hoping someone steps up to do something about that!).

@eljeffeg

I think @jimjag is gonna fork it. 🙏

@jonathangreen
Contributor

@scoder is there a way to check which version of libxml2 lxml is using via the C API?

Maybe a way to approach this is to check the libxml2 version being used by both lxml and xmlsec and raise an exception if they are not compatible.

It's not as elegant as your suggestion but it seems like an easier change to make and it would at least alert users of the issue instead of having to try to figure out what is causing a segfault.

@yhlee-tw

@jonathangreen does it have to be C API?

from lxml import etree

print("%-20s: %s" % ('libxml used',      etree.LIBXML_VERSION))
print("%-20s: %s" % ('libxml compiled',  etree.LIBXML_COMPILED_VERSION))

from https://lxml.de/1.3/FAQ.html#i-think-i-have-found-a-bug-in-lxml-what-should-i-do

That's how I am testing a few combinations on macOS today with your xmlsec.get_libxml_version() in 3191662
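The check proposed earlier in the thread could be sketched in Python roughly as follows. This is only an illustration: `ensure_same_libxml2` is a hypothetical helper, exact equality of the version tuples is an assumed compatibility rule, and `xmlsec.get_libxml_version()` is assumed to return a tuple shaped like `etree.LIBXML_VERSION`.

```python
from typing import Tuple

Version = Tuple[int, ...]


def ensure_same_libxml2(lxml_version: Version, xmlsec_version: Version) -> None:
    """Raise early instead of risking a segfault later.

    Assumption: the two libraries are only treated as compatible when
    their libxml2 version tuples are identical.
    """
    if tuple(lxml_version) != tuple(xmlsec_version):
        raise RuntimeError(
            "libxml2 version mismatch: lxml uses %s, xmlsec uses %s"
            % (
                ".".join(map(str, lxml_version)),
                ".".join(map(str, xmlsec_version)),
            )
        )


# Intended use (requires both packages, so left commented out here):
# from lxml import etree
# import xmlsec
# ensure_same_libxml2(etree.LIBXML_VERSION, xmlsec.get_libxml_version())
```

A looser rule, such as comparing only the major and minor components, would be a one-line change to the comparison.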

This was referenced Mar 21, 2024
@jonathangreen
Contributor

jonathangreen commented Mar 21, 2024

@yhlee-tw it would be much easier if we could do the comparison in Python, but unfortunately the integration with lxml is done via the lxml C API. There really isn't any Python code in this package.

I took a look at the lxml C API and I don't see a way to get the libxml2 version being used, but I was hoping @scoder or someone more familiar with the lxml C API might have a suggestion.
