python-xmlsec currently relies on passing raw xmlNodePtr objects between lxml (which builds on libxml2) and xmlsec1 (which also uses libxml2). This creates a fragile situation where different versions of libxml2 may be loaded into the same process, leading to:
- Segfaults or memory corruption due to incompatible struct layouts
- Invalid memory free errors (e.g., double-free or mismatched allocators)
- Signature verification failures caused by inconsistent parser state
- Undefined behavior from mismatched libxml2 global configuration
This occurs because:
- lxml bundles its own libxml2 and libxslt (especially in binary wheels) to ease installation for users on Windows, macOS, and some Linux platforms.
- python-xmlsec binds to xmlsec1, which in turn links to the system's libxml2.
- Pointers like xmlNodePtr created by lxml are then passed to python-xmlsec functions like tree.find_node() or SignatureContext.sign().
If the libxml2 versions are not ABI-compatible, this can easily lead to crashes, unpredictable behavior, or memory corruption.
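To check whether a given environment is exposed to this, it can help to print the libxml2 version lxml was compiled against and the one it is running with, then compare them with the version xmlsec1 links to. The following is a minimal diagnostic sketch, not part of python-xmlsec: the helper names are hypothetical, and treating only identical version tuples as "matching" is a conservative assumption, not an ABI guarantee. It uses lxml's `etree.LIBXML_COMPILED_VERSION` and `etree.LIBXML_VERSION` when lxml is installed.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.9.14' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def same_libxml2(a, b):
    """Return True only if both version tuples are identical.

    Assumption: identical versions are treated as safe; anything else is
    flagged. This is deliberately conservative and is not an ABI check.
    """
    return tuple(a) == tuple(b)


def lxml_libxml2_versions():
    """Return (compiled_version, runtime_version) for lxml's libxml2,
    or None if lxml is not installed."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return etree.LIBXML_COMPILED_VERSION, etree.LIBXML_VERSION


versions = lxml_libxml2_versions()
if versions is None:
    print("lxml is not installed")
else:
    compiled, runtime = versions
    print("lxml compiled against libxml2:", compiled)
    print("lxml running with libxml2:   ", runtime)
    if not same_libxml2(compiled, runtime):
        print("warning: lxml's compiled and runtime libxml2 versions differ")
```

The system libxml2 version that xmlsec1 links against has to be obtained separately (for example from the package manager) and can then be compared using `same_libxml2(parse_version("..."), runtime)`.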
Proposed Solution: Decoupling via Canonicalized XML
Instead of passing xmlNodePtr from lxml to python-xmlsec, we should support passing serialized XML (as bytes), ideally using Canonical XML (C14N) where appropriate. This isolates the XML parsing and memory management between the two libraries.
Example Usage
from lxml import etree
import xmlsec
doc = etree.fromstring("<Root><Signature/></Root>")
c14n_bytes = etree.tostring(doc, method="c14n", exclusive=True)
# Proposed new API:
signed_bytes = xmlsec.sign_serialized(c14n_bytes, key_file="key.pem")
# Parse back with lxml if needed
signed_doc = etree.fromstring(signed_bytes)
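The bytes-in, bytes-out boundary this proposal relies on can be sketched without either library. In the sketch below, everything is an illustrative assumption: `sign_via_bytes` and `fake_backend` are hypothetical names, the standard library's `xml.etree.ElementTree` stands in for lxml, and the backend performs no cryptography. Note also that `ElementTree.canonicalize` produces C14N 2.0, not the exclusive C14N requested in the lxml call above. The point is only that no parser object crosses the boundary, so each side can use its own libxml2.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml in this sketch


def sign_via_bytes(element, backend):
    """Serialize `element`, pass the bytes to `backend`, and reparse the result.

    `backend` is any callable taking bytes and returning bytes. It is a
    hypothetical stand-in for the proposed xmlsec.sign_serialized.
    """
    xml_text = ET.tostring(element, encoding="unicode")
    # canonicalize() returns a str (C14N 2.0) when no output file is given.
    c14n_bytes = ET.canonicalize(xml_text).encode("utf-8")
    result = backend(c14n_bytes)
    if not isinstance(result, bytes):
        raise TypeError("backend must return bytes")
    # The caller gets a fresh tree built by its own parser.
    return ET.fromstring(result)


def fake_backend(data):
    """Placeholder backend: parses the bytes itself and marks the Signature
    element. It does NOT sign anything."""
    root = ET.fromstring(data)
    signature = root.find("Signature")
    if signature is None:
        raise ValueError("no Signature element found")
    signature.set("placeholder", "true")
    return ET.tostring(root)


doc = ET.fromstring("<Root><Signature/></Root>")
signed = sign_via_bytes(doc, fake_backend)
print(ET.tostring(signed, encoding="unicode"))
```

The cost of this design is one extra serialize and parse on each call, in exchange for never sharing memory between two libxml2 builds.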